Why is Llama open sourced?
We’ll get to the answer in a bit, but just in case you need help understanding what Llama is:
Llama (Large Language Model Meta AI) is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.
On April 18, 2024, Meta released Llama-3 in two sizes: 8B and 70B parameters. The models were pre-trained on approximately 15 trillion tokens of text gathered from “publicly available sources”, with the instruct models fine-tuned on “publicly available instruction datasets, as well as over 10M human-annotated examples”. Meta plans to release multimodal models, models capable of conversing in multiple languages, and models with larger context windows. A version with 400B+ parameters is currently being trained.
So what, you might say. There’s the OG ChatGPT, there’s Claude, there’s Gemini… so one more comes along. Ho hum.
Well, if you say that, you’d be very, very wrong.
Why would you be wrong?
While the most powerful LLMs have generally been accessible only through limited APIs (if at all), Meta, in contrast, released LLaMA’s model weights to the research community under a noncommercial license.
Why does this matter? Because, as our good friend Claude explains:
“The release of the Llama model by Meta under a noncommercial license is a significant development in the field of large language models (LLMs) and artificial intelligence more broadly. Here’s why it matters:
- Accessibility: Before this, the most powerful LLMs were usually kept secret by the companies that made them. For example, OpenAI’s GPT-3 model could only be used through a paid interface, like a vending machine you put money into to get a result. By releasing Llama’s “model weights” – essentially the knowledge the AI has learned – Meta has allowed researchers and hobbyists to experiment with and build upon a cutting-edge language model. It’s like they’ve given away the recipe for a powerful tool, not just limited access to using the tool itself.
- Democratization of AI: Restricting access to top LLMs meant that only a handful of big corporations could really use this powerful technology. Imagine if only a few factories could use electricity. An open-source model changes the game significantly. It empowers way more people to explore creative uses of language models and lowers the barriers to innovation in this space. It’s like the difference between a few people having libraries versus everyone having access to all the books.
- Cost: Using LLMs through paid interfaces can get expensive quickly, putting them out of reach for many. It’s like having to rent a supercomputer every time you want to use one. With access to the model weights themselves, people can run the model on their own computers, dramatically reducing costs. This opens up experimentation to students, researchers, startups and others with limited budgets.
- Customization: When you can only access a model through an interface, you’re limited to pre-defined uses, like ordering off a set menu at a restaurant. Having the actual model provides much more flexibility to tailor and fine-tune it for specific applications and domains. This could lead to an explosion of niche language models customized for particular industries or use cases – imagine a model specifically trained to understand and generate legal jargon, or one tuned for writing poetry.
- Reproducibility and Transparency: In scientific research, it’s crucial to be able to reproduce results. Using an API is like a black box – you can’t see how the model works under the hood, you just get the output. With the model weights, the exact workings of the model can be scrutinized, enabling more robust research and understanding of how these models function. It’s like being able to examine the engine of a car instead of just looking at the exterior.
Model weights are the key to how a language model works. They’re essentially the “knowledge” the model has learned during training. In a neural network (which is what most modern language models are), the weights are the strength of the connections between the neurons. These weights determine how the model responds to a given input, like how a brain’s neural connections determine a person’s response to a question. By releasing the weights, Meta has provided the “source code” of their model, allowing others to understand how it works, modify it, and use it for their own purposes.
While the noncommercial license does place some limits on how Llama can be used (you couldn’t start a company selling access to it, for example), the release of the model is still a major shift in the AI landscape that could have far-reaching effects on research, innovation, and accessibility of this transformative technology. We’re likely to see a proliferation of new applications and rapid progress in natural language AI as a result.”
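Claude’s description of weights as “the strength of the connections between the neurons” can be sketched in a few lines of toy Python. This is a hypothetical two-weight illustration, nothing like Llama’s actual architecture, but it shows the key point: the weights alone determine the model’s response, which is why releasing them amounts to releasing the model.

```python
import math

# Toy illustration (NOT Llama's actual architecture): a "model" whose entire
# learned knowledge is a handful of weights. Given the same input, different
# weights produce different responses.
def respond(inputs, weights):
    # weighted sum of the inputs, squashed through tanh
    total = sum(x * w for x, w in zip(inputs, weights))
    return math.tanh(total)

learned_weights = [0.8, -0.5, 0.3]   # hypothetical "released" weights
x = [1.0, 0.5, -0.2]                 # some input

answer = respond(x, learned_weights)
tweaked = respond(x, [w * 2 for w in learned_weights])  # a "fine-tuned" copy

print(answer)   # the original model's response
print(tweaked)  # same input, modified weights: a different response
```

Anyone holding the weights can inspect them, change them, and re-run the model locally, which is exactly the customization and transparency argument above.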
You don’t just get the dish to eat, as Claude puts it; you get the recipe, so you can try to recreate (and modify) the dish at home. Not all of us have specialized cooking equipment at home, but those of us who do can get cooking very quickly indeed.
Speaking of cooking, have you seen this excellent series from Epicurious called 4 Levels? Chefs of varying expertise (home cook to the pros) are invited to cook the same dish, but with different ingredients and equipment.
That’s what the 8 billion, 70 billion and 400 billion parameter models are all about. Same idea (recipe), but different capabilities and “equipment”.
But why do this? If Gemini, Claude and ChatGPT are giving away basic versions for free and premium versions for 20 USD per month, then why is Meta not just giving away all versions for free… but also giving away the recipe itself?
Because game theory! (Do read the tweet linked here in its entirety; what follows is a much more concise summary):
- You can get the janta (the public) to do the debugging of the model for you.
- If social debugging and optimization of models makes AI so kickass that AI friends can replace all your friends, then whoever owns the technology to make these friends “wins” social media. With an open model, nobody owns it exclusively, because the janta is doing the work for “everybody”. So sure, maybe Mark bhau doesn’t win… but hey, nobody else does either!
- The “nobody else does” point is the really important one here, because by open sourcing these models, he is making sure that Gemini, Claude and ChatGPT have to compete against everybody out there. In other words, everybody works for Mark bhau for free: not to help Mark win, but to make sure the others don’t win.
The economics of AI is a fascinating thing to think about, quite apart from the technological capabilities of AI themselves. I hope to write more about this in the coming days, but whatay topic, with whatay complexities. Yay!
All this is based on just one tweet sourced from a ridiculously long (and that is a compliment, believe me) blog post by TheZvi on Dwarkesh’s podcast with Mark Zuckerberg. Both are worth spending a lot of time over, and I plan to do just that – and it is my sincere recommendation that you do the same.