The legal definition of ownership, according to Black’s Law Dictionary, is:
“The complete dominion, title, or proprietary right in a thing or claim. Ownership may be absolute or qualified, legal or equitable.”
While this definition provides a structural framework for property rights, it raises deeper philosophical and critical questions about the nature of ownership itself.
And a fun fact: the word ‘legal’ comes from Latin. Its base, ‘lex’, is believed to derive from the Proto-Indo-European (PIE) root *leg-, meaning “to collect, gather, or choose.”
If ownership is fundamentally about gathering and choosing, then is it a necessity, a compulsion, or a mere illusion? Is it possible that the need to assert ownership is not just a legal convention but a fundamental human trait, rooted in the psychological desire for security, identity, and control? A desire which we then made ‘legally binding’.
Ownership, as an abstract concept, has long been a subject of philosophical inquiry, and its assumed legitimacy has often been challenged.
John Locke argued in his Second Treatise of Government that private property originates from labor—that by mixing one’s labor with nature, one “owns” the result. This labor theory of property suggests that ownership is not just a social construct but a natural extension of human effort. This, however, creates a paradox: could we ever own anything that wasn’t produced, but simply exists in nature?
Karl Marx viewed ownership, particularly private property in capital, as a mechanism of economic and class oppression. He distinguished personal property (objects of use) from private property (means of production), arguing that ownership is not an individual right but a social power dynamic, while at the same time stripping the individual of any rights to the things they have. The paradox this created is that we couldn’t possibly own anything, only use things. Is this right?
Philosophers like Nietzsche and Foucault challenge whether ownership is even a fixed reality. In postmodern thought, ownership is not a tangible truth but a linguistic and legal fiction—an agreement upheld by societal structures rather than an objective fact. Could we exist as a society without simply agreeing on boundaries? Could we ever reach a state in which everything is shared and thus owned by nobody? It’s fantastic that our minds can challenge reality, but that does not mean reality will go away. Neither will the need for practical solutions.
In the modern age, ownership is increasingly ambiguous. The rise of digital assets, intellectual property, and communal economies (such as the sharing economy) complicates traditional notions of property rights. We no longer just “own” things in a physical sense—our data, ideas, and even identities are subject to ownership claims by corporations and governments.
I would take these deliberations further: does an author own his own creation, or do the people who buy his work and validate its existence own it (making his creations irrelevant if nobody wants to pay for them)? The author has the copyright, but once a consumer buys his creation, they own the physical copy and, more importantly, they own its interpretation.
Foucault and Barthes both argued that “the author is dead”—that the moment a work is released, its meaning belongs to its audience, not to the person who wrote it. This suggests that ownership of an idea, a story, or a song is a shared experience, not an absolute right.
This logic applies far beyond the arts. The tech world is full of examples where an invention is legally owned by its creator but has no value or relevance until it is adopted by the market. Ideas—however brilliant—only become owned, recognized, and powerful when they are economically validated.
These deliberations, and a very inspiring discussion I had recently, have led me to ask myself: can one own an AI model?
First, I needed to dig a little deeper to understand what an AI model is in technical terms. As I am most familiar with AWS, this is where I have focused, and I have simplified things to avoid listing everything.
One of the bigger shockers to me was how much human input is needed at each stage of this process. Not all data is applicable, and while you can write many algorithms for what to avoid or correct, you still need people to decide on data quality and to monitor it / adjust the algorithms. This applies to every stage, from collection through normalization, training, fine-tuning, and output control.
Collecting the data
The first stage is data collection and storage, where massive amounts of information are gathered. This data can come from human inputs, labeled datasets, scraped text, or even artificially generated content. The data is typically stored in cloud storage services like AWS S3 or structured databases like AWS RDS or DynamoDB.
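To make this concrete, here is a minimal sketch of what landing a batch of collected text in S3 could look like with boto3; the bucket name, prefix, and record shape are made-up placeholders, not part of any real pipeline:

```python
import json
import boto3

# Hypothetical bucket/prefix names, used purely for illustration.
RAW_BUCKET = "my-ai-raw-data"
PREFIX = "scraped-text/2024-05/"

s3 = boto3.client("s3")

def store_raw_batch(batch_id: str, documents: list[dict]) -> str:
    """Store one batch of collected documents as a JSON object in S3."""
    key = f"{PREFIX}batch-{batch_id}.json"
    s3.put_object(
        Bucket=RAW_BUCKET,
        Key=key,
        Body=json.dumps(documents).encode("utf-8"),
        ContentType="application/json",
    )
    return f"s3://{RAW_BUCKET}/{key}"

# Example usage:
# uri = store_raw_batch("0001", [{"source": "forum", "text": "..."}])
```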
Normalization
Once the data is collected, it needs to go through preprocessing and cleaning. This stage involves removing irrelevant content, normalizing text, and structuring the data so the AI can use it effectively. Humans play a role here in defining what data should be included and ensuring it meets quality standards. Processing large datasets at scale requires cloud-based compute resources, such as AWS Lambda for automated data pipelines or AWS Glue for large-scale transformations. If heavy computation is needed, GPU-based EC2 instances are often used.
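As a toy illustration of the kind of human-defined rules involved, a cleaning step might look something like this; the specific rules and the minimum-length threshold are arbitrary examples, not a recommended standard:

```python
import re
import unicodedata

def normalize_text(raw: str) -> str:
    """Example normalization rules a human team might define:
    unify unicode, strip markup remnants, collapse whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"<[^>]+>", " ", text)        # drop leftover HTML tags
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return text.lower()

def keep_record(text: str, min_words: int = 5) -> bool:
    """A human-chosen quality threshold: discard fragments that are too short."""
    return len(text.split()) >= min_words

# Example: a record passes cleaning and the quality gate before training.
# cleaned = normalize_text("<p>Ownership   is   a   CONSTRUCT</p>")
# assert keep_record(cleaned)
```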
Training
The most resource-intensive part of building an AI model is training. This is where the AI “learns” by analyzing vast amounts of data, recognizing patterns, and adjusting its internal parameters (neural network weights). Training requires specialized hardware, often thousands of NVIDIA A100 or H100 GPUs, running for weeks or even months. Companies typically use AWS EC2 P5 instances, which provide GPU acceleration for deep learning. Some also use AWS SageMaker, a managed service that handles the complexity of training AI models without needing to configure infrastructure manually.
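For a sense of how this looks in practice, here is a hedged sketch of launching a training job with the SageMaker Python SDK; the IAM role, S3 paths, script name, instance count, and hyperparameters are all illustrative placeholders, and a real large-model run would use far more instances plus a distributed training setup:

```python
from sagemaker.pytorch import PyTorch

# Illustrative values only: the role ARN, S3 paths, and train.py are placeholders.
estimator = PyTorch(
    entry_point="train.py",                  # your training script
    role="arn:aws:iam::123456789012:role/SageMakerTrainingRole",
    instance_count=8,                        # hundreds+ for frontier-scale models
    instance_type="ml.p4d.24xlarge",         # 8x NVIDIA A100 GPUs per instance
    framework_version="2.1",
    py_version="py310",
    output_path="s3://my-ai-models/checkpoints/",
    hyperparameters={"epochs": 1, "per_device_batch_size": 8},
)

# Kick off training against the cleaned dataset produced earlier.
estimator.fit({"train": "s3://my-ai-clean-data/text/"})
```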
Fine-tuning
After training, the model needs fine-tuning and reinforcement, often involving human feedback. People review outputs, rank responses, and provide corrections, helping the model refine its accuracy. This process, known as Reinforcement Learning from Human Feedback (RLHF), is commonly outsourced through platforms like Amazon Mechanical Turk (basically a marketplace for tasks requiring human inputs). At this stage, the AI is adjusting to better reflect human expectations and biases, ensuring it produces more useful results.
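The core of the reward-modeling step behind RLHF fits in a few lines; this is a generic pairwise-ranking loss sketch in PyTorch (the surrounding reward model and the later policy-optimization step, e.g. PPO, are omitted):

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(chosen_scores: torch.Tensor,
                        rejected_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise loss used when training a reward model on human rankings:
    the score of the human-preferred answer should exceed the rejected one."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy example: reward-model scores for 3 prompt/response pairs.
chosen = torch.tensor([2.1, 0.3, 1.5])     # responses humans ranked higher
rejected = torch.tensor([1.0, 0.9, -0.2])  # responses humans ranked lower
loss = reward_ranking_loss(chosen, rejected)
print(float(loss))  # lower loss = model agrees more with human preferences
```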
Output control
When the AI is ready for deployment, it needs to handle inference and real-time matching. This is where it receives inputs (such as user queries), searches its trained knowledge, and generates appropriate responses. The matching happens within a trained neural network, using probability-based calculations to determine the best output. Running inference efficiently requires optimized compute resources like AWS EC2 Inf2 instances, which are designed for real-time AI workloads. If a company wants a fully managed AI service, they might deploy the model on AWS SageMaker Endpoints instead.
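A minimal sketch of calling such a deployed model, assuming it sits behind a SageMaker endpoint; the endpoint name and the request/response shape are placeholders that depend on the serving container:

```python
import json
import boto3

# "my-llm-endpoint" is a placeholder for a model deployed on SageMaker Endpoints.
runtime = boto3.client("sagemaker-runtime")

def ask_model(prompt: str) -> str:
    response = runtime.invoke_endpoint(
        EndpointName="my-llm-endpoint",
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 200}}),
    )
    payload = json.loads(response["Body"].read())
    return payload[0]["generated_text"]  # exact shape depends on the container

# print(ask_model("Can one own an AI model?"))
```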
Resources
A critical aspect of AI deployment is memory and continuous learning. Most AI models don’t have long-term memory, but they can track interactions to personalize responses. User data, interactions, and feedback loops are often stored in AWS DynamoDB or high-speed caches like Redis. This enables the AI to improve over time by recognizing repeated user behaviors and adjusting its outputs accordingly.
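A rough sketch of that feedback loop, assuming a DynamoDB table keyed by user; the table name and attribute names are invented for the example:

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

# Illustrative table: "user_id" as partition key, "ts" (timestamp) as sort key.
dynamodb = boto3.resource("dynamodb")
interactions = dynamodb.Table("chat-interactions")

def log_interaction(user_id: str, prompt: str, answer: str, feedback: str = "none"):
    """Persist one exchange so later sessions can be personalized."""
    interactions.put_item(Item={
        "user_id": user_id,
        "ts": int(time.time() * 1000),
        "prompt": prompt,
        "answer": answer,
        "feedback": feedback,
    })

def recent_history(user_id: str, limit: int = 10):
    """Fetch the latest interactions for this user, newest first."""
    resp = interactions.query(
        KeyConditionExpression=Key("user_id").eq(user_id),
        ScanIndexForward=False,
        Limit=limit,
    )
    return resp["Items"]
```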
Governance
Finally, the AI model must be continuously monitored and governed. This is where humans play another key role: ensuring that the AI doesn’t generate harmful or biased content. Monitoring and security tools like AWS CloudWatch and AWS GuardDuty track system performance and security, helping maintain compliance with regulations, while the ethical judgments themselves still rest with human reviewers.
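One small, illustrative piece of such monitoring: publishing a custom CloudWatch metric whenever a response gets flagged, so reviewers can watch trends on a dashboard. The namespace and metric names here are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def report_flagged_output(model_name: str, reason: str):
    """Publish a custom metric each time a human reviewer or automated filter
    flags a model response."""
    cloudwatch.put_metric_data(
        Namespace="AIGovernance",  # hypothetical namespace
        MetricData=[{
            "MetricName": "FlaggedResponses",
            "Dimensions": [
                {"Name": "Model", "Value": model_name},
                {"Name": "Reason", "Value": reason},
            ],
            "Value": 1.0,
            "Unit": "Count",
        }],
    )

# report_flagged_output("my-llm-endpoint", "biased-content")
```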
Cost?
Now, what does it take to build an AI model like GPT-3.5? To train a large language model from scratch, you need enormous compute power, typically requiring 1,000+ high-end GPUs, weeks of computation, and petabytes of storage. The cost of such an undertaking can range between $15M and $25M*, depending on optimizations. Running the AI after training also requires significant cloud resources, with monthly inference costs easily reaching $100k to $500k** depending on the number of users. Smaller models, such as LLaMA-2 (7B parameters), are more affordable to train, costing around $500k to $1M***.
For most companies, training an AI model from scratch is impractical. Ultimately, AI ownership is not just about building the model but also about sustaining it—ensuring it remains efficient, up-to-date, and aligned with human values.
There are some shortcuts.
Instead of training an AI model from scratch, which is costly and time-consuming, shortcuts like model cloning, transfer learning, fine-tuning, and distillation can drastically reduce effort. Model cloning involves directly using a pre-trained AI model (e.g., LLaMA-2) without modifications. Transfer learning adapts an existing model by retraining only specific layers on new data, making it faster and resource-efficient. Fine-tuning adjusts a pre-trained model with targeted data to specialize it for specific tasks. Knowledge distillation compresses a large model into a smaller, more efficient version while retaining most of its intelligence. These techniques allow organizations to leverage powerful AI without the extreme costs of full-scale training. But all of these assume that someone else has paid the price first.
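As a sketch of the transfer-learning shortcut, one common pattern is to freeze most of a pre-trained model and retrain only its last layers; the model identifier below is illustrative (LLaMA-2 weights are gated and require approval), and the attribute names follow the LLaMA architecture as exposed by the Hugging Face transformers library:

```python
from transformers import AutoModelForCausalLM

# Start from someone else's pre-trained weights.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the final two decoder layers and the output head.
for layer in model.model.layers[-2:]:
    for param in layer.parameters():
        param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable / total:.1%} of the weights instead of all of them.")
```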
Verdict
Legally, ownership is defined by proprietary rights, patents, and data control, yet philosophically, AI exists in a liminal space between creator and user, between programmed rules and emergent behavior. If an AI model is trained on collective human knowledge, shaped by interactions, and continuously evolving, does it belong to the engineers who built it, the data sources that trained it, or the users who give it meaning through engagement? Much like an author loses sole ownership of their creation the moment it is bought, an AI model—once deployed—ceases to be a static possession and instead becomes a shared entity, shaped by interpretation, interaction, and the fluidity of digital existence.
And for more practical reasons related to cost and effort – no, you are currently not likely to be able to own your own AI model.
Here are some of the assumptions I made for costs (a quick arithmetic check follows the lists):
*Big models
GPU Cost per Instance per Hour (AWS P4d – A100 – p4d.24xlarge): $30
Total GPU Instances Used: 1,000 instances
Total Training Hours (~28 days): 680 hours
Total Compute Cost (GPUs): $20,400,000
Storage Cost per Petabyte per Month: $20,000
Total Storage Cost (1 PB): $20,000
Networking Cost per Month: $50,000
ML Engineer Salary per Month (per engineer): $100,000
Number of ML Engineers: 10 engineers
Total Human Labor Cost (10 ML Engineers for 1 month): $1,000,000
Total Estimated Cost for AI Training: $21,470,000
**Inference costs
Instance Cost per Hour (AWS Inf2 – optimized for inference – inf2.48xlarge): $12
Total Instances Used: 20 instances
Total Compute Cost per Hour (20 instances): $240
Total Compute Cost for Inference (1 month, 24/7 runtime): $172,800
Storage Cost per Petabyte per Month: $20,000
Total Storage Cost (0.5 PB, model weights): $10,000
Networking Cost per Month (AWS VPC for API requests, user queries, etc.): $30,000
ML Engineer Salary per Month (1 engineer for monitoring & maintenance): $100,000
Total Human Labor Cost (1 ML Engineer for 1 month): $100,000
Total Estimated Cost for Inference per Month (20 instances): $312,800
***LLaMA-2
GPU Cost per Instance per Hour (AWS P4de – A100 – p4de.24xlarge): $40
Total GPU Instances Used: 100 instances
Total Compute Cost per Hour (100 instances): $4,000
Total Training Hours (7 days): 168 hours
Total Compute Cost for Training (7 days): $672,000
Storage Cost per Petabyte per Month: $20,000
Total Storage Cost (1 PB): $20,000
Networking Cost per Month (AWS VPC): $25,000
ML Engineer Salary per Month (1 engineer): $100,000
Total Human Labor Cost (1 ML Engineer for 1 month): $100,000
Total Estimated Cost for Training LLaMA-2 (7B) with 100 instances over 7 days: $817,000
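For transparency, the three totals follow directly from the line items above; a quick back-of-envelope check using the same figures:

```python
# Reproducing the three totals from the assumptions listed above.

# * Big model training: $30/hr x 1,000 instances x 680 hrs + storage + network + 10 engineers
big_model = 30 * 1_000 * 680 + 20_000 + 50_000 + 10 * 100_000
print(f"Big model training:  ${big_model:,}")   # $21,470,000

# ** Monthly inference: $12/hr x 20 instances x 24 hrs x 30 days + storage + network + 1 engineer
inference = 12 * 20 * 24 * 30 + 10_000 + 30_000 + 100_000
print(f"Monthly inference:   ${inference:,}")   # $312,800

# *** LLaMA-2 (7B) training: $40/hr x 100 instances x 168 hrs + storage + network + 1 engineer
llama2 = 40 * 100 * 168 + 20_000 + 25_000 + 100_000
print(f"LLaMA-2 7B training: ${llama2:,}")      # $817,000
```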