Mark Zuckerberg Teaches Us the TMKK of Game Theory

Why is Llama open sourced?

We’ll get to the answer in a bit, but just in case you need help understanding what Llama is:

Llama (Large Language Model Meta AI) is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.
On April 18, 2024, Meta released Llama-3 with two sizes: 8B and 70B parameters. The models have been pre-trained on approximately 15 trillion tokens of text gathered from “publicly available sources” with the instruct models fine-tuned on “publicly available instruction datasets, as well as over 10M human-annotated examples”. Meta plans on releasing multimodal models, models capable of conversing in multiple languages, and models with larger context windows. A version with 400B+ parameters is currently being trained.

So what, you might say. There’s the OG ChatGPT, there’s Claude, there’s Gemini… so one more comes along. Ho hum.


Well, if you say that, you’d be very, very wrong.

Why would you be wrong?

Because of this:

Whereas the most powerful LLMs have generally been accessible only through limited APIs (if at all), Meta released LLaMA’s model weights to the research community under a noncommercial license.

Why does this matter? Because, as our good friend Claude explains:

“The release of the Llama model by Meta under a noncommercial license is a significant development in the field of large language models (LLMs) and artificial intelligence more broadly. Here’s why it matters:

  1. Accessibility: Before this, the most powerful LLMs were usually kept secret by the companies that made them. For example, OpenAI’s GPT-3 model could only be used through a paid interface, like a vending machine you put money into to get a result. By releasing Llama’s “model weights” – essentially the knowledge the AI has learned – Meta has allowed researchers and hobbyists to experiment with and build upon a cutting-edge language model. It’s like they’ve given away the recipe for a powerful tool, not just limited access to using the tool itself.
  2. Democratization of AI: Restricting access to top LLMs meant that only a handful of big corporations could really use this powerful technology. Imagine if only a few factories could use electricity. An open-source model changes the game significantly. It empowers way more people to explore creative uses of language models and lowers the barriers to innovation in this space. It’s like the difference between a few people having libraries versus everyone having access to all the books.
  3. Cost: Using LLMs through paid interfaces can get expensive quickly, putting them out of reach for many. It’s like having to rent a supercomputer every time you want to use one. With access to the model weights themselves, people can run the model on their own computers, dramatically reducing costs. This opens up experimentation to students, researchers, startups and others with limited budgets.
  4. Customization: When you can only access a model through an interface, you’re limited to pre-defined uses, like ordering off a set menu at a restaurant. Having the actual model provides much more flexibility to tailor and fine-tune it for specific applications and domains. This could lead to an explosion of niche language models customized for particular industries or use cases – imagine a model specifically trained to understand and generate legal jargon, or one tuned for writing poetry.
  5. Reproducibility and Transparency: In scientific research, it’s crucial to be able to reproduce results. Using an API is like a black box – you can’t see how the model works under the hood, you just get the output. With the model weights, the exact workings of the model can be scrutinized, enabling more robust research and understanding of how these models function. It’s like being able to examine the engine of a car instead of just looking at the exterior.

Model weights are the key to how a language model works. They’re essentially the “knowledge” the model has learned during training. In a neural network (which is what most modern language models are), the weights are the strength of the connections between the neurons. These weights determine how the model responds to a given input, like how a brain’s neural connections determine a person’s response to a question. By releasing the weights, Meta has provided the “source code” of their model, allowing others to understand how it works, modify it, and use it for their own purposes.

While the noncommercial license does place some limits on how Llama can be used (you couldn’t start a company selling access to it, for example), the release of the model is still a major shift in the AI landscape that could have far-reaching effects on research, innovation, and accessibility of this transformative technology. We’re likely to see a proliferation of new applications and rapid progress in natural language AI as a result.”
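Claude’s “connection strengths” description of weights can be made concrete with a toy sketch. The four numbers below are invented purely for illustration – a real LLM like Llama has billions of weights arranged in many layers – but the principle is the same: the output is fully determined by the input and the weights, so releasing the weights releases the model’s “knowledge”.

```python
# A tiny one-layer "network": two inputs, two outputs, four weights.
# The weights are the model's learned "knowledge" -- releasing them
# is what Meta did with Llama, just at a scale of billions of numbers.
weights = [[0.5, -1.0],
           [2.0,  0.3]]

def forward(x):
    # The output is entirely determined by the input and the weights:
    # each output is a weighted sum of the inputs.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

print(forward([1.0, 0.0]))  # -> [0.5, 2.0]
```

Anyone holding the weights can run this `forward` pass themselves, inspect the numbers, or change them – which is exactly what fine-tuning does at scale.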


You don’t just get the dish to eat, as Claude puts it – you also get the recipe, so that you can try to recreate (and modify) the dish at home. Not all of us have specialized cooking equipment at home, but those of us who do can get cooking very quickly indeed.

Speaking of cooking, have you seen this excellent series from Epicurious called 4 Levels? Chefs of varying expertise (from home cooks to professionals) are invited to cook the same dish, each with their own level of skill, ingredients and equipment.


That’s what the 8 billion, 70 billion and 400 billion parameter models are all about. Same idea (recipe), but different capabilities and “equipment”.
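As a rough, back-of-the-envelope illustration of what different “equipment” means (my own arithmetic, not Meta’s published hardware requirements), you can estimate the memory needed just to hold each model’s weights at 16-bit precision:

```python
# Back-of-the-envelope: each parameter stored in fp16 takes 2 bytes.
BYTES_PER_PARAM_FP16 = 2

def weight_memory_gb(params_in_billions):
    """Approximate gigabytes needed just to hold the weights in memory."""
    return params_in_billions * 1e9 * BYTES_PER_PARAM_FP16 / 1e9

for size in (8, 70, 400):
    print(f"{size}B params -> ~{weight_memory_gb(size):.0f} GB of weights")
```

At roughly 16 GB, the 8B model’s weights can fit on a single high-end consumer GPU; the 70B and 400B models are the professional kitchen, in Epicurious terms.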


But why do this? If Gemini, Claude and ChatGPT are giving away basic versions for free and premium versions for 20 USD per month, then why is Meta not just giving away all versions for free… but also giving away the recipe itself?

Because game theory! (Do read the tweet linked here in its entirety; what follows is a much more concise summary):

  1. You can get janta to do the debugging of the model for you.
  2. If social debugging and optimization of models makes AI so kickass that AI friends can replace all your friends, then whoever owns the technology to make these friends “wins” social media. Nobody does, because janta is doing the work for “everybody”. So sure, maybe Mark bhau doesn’t win… but hey, nobody else does either!
  3. The “nobody else does” point is the really important one here, because by open sourcing these models, he is making sure that Gemini, Claude and ChatGPT compete against everybody out there. In other words, everybody works for Mark bhau for free – not to help Mark win, but to help make sure the others don’t win.
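One way to make the “nobody wins, and that’s fine” logic concrete is a toy expected-value calculation. Every number below is invented purely for illustration – only the ordering of outcomes matters:

```python
# Toy payoffs to Meta under two strategies (numbers invented for
# illustration; only the ordering of outcomes matters).
PAYOFFS = {
    ("closed", "a rival owns the AI platform"): -10,
    ("closed", "Meta owns the AI platform"):     10,
    ("open",   "nobody owns the AI platform"):    5,
}

# Suppose Meta, starting behind its rivals, wins a closed race
# only 20% of the time.
p_win_closed = 0.2
ev_closed = (p_win_closed * PAYOFFS[("closed", "Meta owns the AI platform")]
             + (1 - p_win_closed) * PAYOFFS[("closed", "a rival owns the AI platform")])

# Open sourcing commoditizes the models: nobody owns the platform,
# and Meta keeps its distribution.
ev_open = PAYOFFS[("open", "nobody owns the AI platform")]

print(ev_closed, ev_open)  # open sourcing has the higher expected payoff
```

Under these made-up numbers, giving away the recipe beats the gamble of a closed race – the “everybody works for Mark bhau for free” outcome dominates.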

The economics of AI is a fascinating thing to think about, quite apart from the technological capabilities of AI themselves. I hope to write more about this in the coming days, but whatay topic, with whatay complexities. Yay!

All this is based on just one tweet sourced from a ridiculously long (and that is a compliment, believe me) blog post by TheZvi on Dwarkesh’s podcast with Mark Zuckerberg. Both are worth spending a lot of time over, and I plan to do just that – and it is my sincere recommendation that you do the same.

Write The Harder Version

Ben Thompson writes a lovely (as usual) essay about the latest Meta-Microsoft partnership. There’s a lot to think about and ponder in that essay, but for the moment, I want to just focus on a part of it that appears in the introduction:

That was why this Article was going to be easy: writing that Meta’s metaverse wasn’t very compelling would slot right in to most people’s mental models, prompting likes and retweets instead of skeptical emails; arguing that Meta should focus on its core business would appeal to shareholders concerned about the money and attention devoted to a vision they feared was unrealistic. Stating that Zuckerberg got it wrong would provide comfortable distance from not just an interview subject but also a company that I have defended in its ongoing dispute with Apple over privacy and advertising.
Indeed, you can sense my skepticism in the most recent episode of Sharp Tech, which was recorded after seeing the video but before trying the Quest Pro. See, that was the turning point: I was really impressed, and that makes this Article much harder to write.

https://stratechery.com/2022/meta-meets-microsoft/

When you’re writing about a particular topic, and particularly if you write often enough, you realize that there are two ways to go about it: the easy way, and the hard way. The easy way isn’t necessarily about slacking off – in fact, part of the reason it is easy is precisely because you haven’t been slacking off; you’ve been writing regularly for a long time.

Doing so – writing regularly, that is – gives you a way of thinking about what to write – a mental framework that lays out the broad contours of your write-up, a way to begin the first paragraph, and even a nice rhetorical flourish with which to end.

I speak from personal experience – every now and then, while I’m reading something, I can already see the blog post I will write about it. And this is a truly wonderful superpower – the ability to know that you can churn out a somewhat decent-ish piece about something in very short order. Which is why both writing regularly and writing with self-imposed deadlines are, on balance, good things.


But there is, alas, no such thing as a free lunch. The downside is that one also develops an inability to push oneself further. Why bother coming up with a different way of thinking about what to write, and how to go about it? And yet, sometimes, even when your intuition says that your regular mental framework will do just fine – and even when it might well be what your audience expects from you anyway – you know that you really should be framing the piece differently. Either because that is what the subject matter at hand demands, or because you’re convinced that a new, different approach will result in a better essay. You just know it in your bones.

That’s the hard bit: should you then stick to what you know and thump out a piece, or should you take the time to pause, reflect and push yourself to build out a better essay? Should you pursue that contrarian take, even though it might take longer?

And if you have a regular schedule to keep up with, the answer need not necessarily be yes. But I would argue that every now and then, it does make sense to take a step back, allow yourself the luxury of time, and write the more difficult piece instead.

Yes it will take longer, and yes it will be more tiring, but now what to do. Such is life.


All that being said, three quick points about Ben’s essay that really stood out for me:

  1. What is Mark Zuckerberg optimizing for with this move, and at what cost to himself and his firm? Why? Weirdly, it would seem as if he is pushing the technology (VR) at the cost of at least the short-term growth of his firm, and he seems to be fine with that. Huh.
  2. Who the early adopters of your service are likely to be, and how likely they are to eventually become your marketers for free, is a question that never goes away, but remains underrated.
  3. I’ve never used a VR headset, and even after reading Ben’s article, I find it difficult to see why this might take off at current costs – and those costs aren’t just monetary; they include the inconveniences and technological limitations that stand in the way of mass adoption. I just don’t get it (which, of course, is a good thing. More to learn!)

The Three Article Problem

I’ve been mulling over three separate columns/posts/interviews over the past few days. Today’s post was supposed to be me reflecting on my thoughts about all of them together, but as it turns out, I have more questions than I do thoughts.

Worse (or if you think like I do, better) I don’t even have a framework to go through these questions in my own head. That is to say, I do not have a mental model that helps me think about which questions to ask first, and which later, and why.

So this is not me copping out from writing today’s post. This is me asking all of you for help. What framework should I be using to think about these three pieces of content together?

All three posts revolve around technology, and two are about the Chinese tech crackdown. Two are about innovation in tech and America. And one of the three is, obviously, the intersection set.


The first is a write-up from Noah Smith’s Substack (which you should read, and if you can afford it, pay for. Note that I am well over my budget for subscribing to content for this year, so I don’t. But based on what I have read of his free posts, I have no hesitation in recommending it to you.)

In other words, the crackdown on China’s internet industry seems to be part of the country’s emerging national industrial policy. Instead of simply letting local governments throw resources at whatever they think will produce rapid growth (the strategy in the 90s and early 00s), China’s top leaders are now trying to direct the country’s industrial mix toward what they think will serve the nation as a whole.
And what do they think will serve the nation as a whole? My guess is: Power. Geopolitical and military power for the People’s Republic of China, relative to its rival nations.
If you’re going to fight a cold war or a hot war against the U.S. or Japan or India or whoever, you need a bunch of military hardware. That means you need materials, engines, fuel, engineering and design, and so on. You also need chips to run that hardware, because military tech is increasingly software-driven. And of course you need firmware as well. You’ll also need surveillance capability, for keeping an eye on your opponents, for any attempts you make to destabilize them, and for maintaining social control in case they try to destabilize you.

https://noahpinion.substack.com/p/why-is-china-smashing-its-tech-industry

As always, read the whole thing. But in particular, read his excerpts from Dan Wang’s letters from 2019 and 2020. It goes without saying that you should subscribe to Dan Wang’s annual letters (here are past EFE posts that mention Dan Wang). As Noah Smith says, China is optimizing for power, and is willing to pay for it by sacrificing, at least in part, the “consumer internet”.

That makes sense, in the sense that I understand the argument.


The second is an excellent column in the Economist, from its business section. Schumpeter is a column worth reading almost always, but this edition in particular was really thought-provoking. The column starts off by comparing how China and the United States of America are dealing with the influence of “big” technology firms.

As the column says, when it comes to the following:

  1. The speed with which China has dealt with the problem
  2. The scope of its tech crackdown
  3. The harshness of the punishments (fines is just one part of the Chinese government’s arsenal)

… China has America beat hollow. As Noah Smith argues, China is optimizing for power, and has done so for ages. As he mentions elsewhere in his essay, “in classic CCP fashion, it was time to smash”. Well, they have.

But the concluding paragraph of the Schumpeter column is worth savoring in full, and over multiple mugs of coffee:

But autarky carries its own risks. Already, Chinese tech darlings are cancelling plans to issue shares in America, derailing a gravy train that allowed Chinese firms listed there to reach a market value of nearly $2trn. The techlash also risks stifling the animal spirits that make China a hotbed of innovation. Ironically, at just the moment China is applying water torture to its tech giants, both it and America are seeing a flurry of digital competition, as incumbents invade each other’s turf and are taken on by new challengers. It is a time for encouragement, not crackdowns. Instead of tearing down the tech giants, American trustbusters should strengthen what has always served the country best: free markets, rule of law and due process. That is the one lesson America can teach China. It is the most important lesson of all.

https://www.economist.com/business/2021/07/24/china-offers-a-masterclass-in-how-to-humble-big-tech-right

This makes sense, in the sense that I understand the argument being made. Given what little I understand of economics and how the world works, I am in complete agreement with the idea being espoused.


The third is an interview of Mark Zuckerberg by Casey Newton of the Verge.

It is a difficult interview to read, and it is also a great argument for why we should all read more science fiction (note that the title of today’s post is a little bit meta, and that in more ways than one). Read books by Neal Stephenson. Listen to his conversation with Tyler Cowen. Read these essays by Matthew Ball.

Towards the end of the interview, Casey Newton asks Mark Zuckerberg about the role of the government, and the importance of public spaces, in the metaverse. Don’t worry right now if the concept of the metaverse seems a little abstract. Twenty years ago, driverless cars and small devices that could stream for you all of the world’s content (ever produced) also seemed a little abstract. Techno-optimism is great, I heavily recommend it to you.

Here is Mark Zuckerberg’s answer:

I certainly think that there should be public spaces. I think that’s important for having healthy communities and a healthy sphere. And I think that those spaces range from things that are government-built or administered, to nonprofits, which I guess are technically private, but are operating in the public interest without a profit goal. So you think about things like Wikipedia, which I think is really like a public good, even though it’s run by a nonprofit, not a government.
One of the things that I’ve been thinking about a lot is: there are a set of big technology problems today that, it’s almost like 50 years ago the government, I guess I’m talking about the US government here specifically, would have invested a ton in building out these things. But now in this country, that’s not quite how it’s working. Instead, you have a number of Big Tech companies or big companies that are investing in building out this infrastructure. And I don’t know, maybe that’s the right way for it to work. When 5G is rolled out, it’s tough for a startup to really go fund the tens of billions of dollars of infrastructure to go do that. So, you have Verizon and AT&T and T-Mobile do it, and that’s pretty good, I guess.
But there are a bunch of big technology problems, [like] defining augmented and virtual reality in this overall metaverse vision. I think that that’s going to be a problem that is going to require tens of billions of dollars of research, but should unlock hundreds of billions of dollars of value or more. I think that there are things like self-driving cars, which seems like it’s turning out to be pretty close to AI-complete; needing to almost solve a lot of different aspects of AI to really fully solve that. So that’s just a massive problem in terms of investment. And some of the aspects around space exploration. Disease research is still one that our government does a lot in.
But I do wonder, especially when we look at China, for example, which does invest a lot directly in these spaces, how that is kind of setting this up to go over time. But look, in the absence of that, yeah, I do think having public spaces is a healthy part of communities. And you’re going to have creators and developers with all different motivations, even on the mobile internet and internet today, you have a lot of people who are interested in doing public-good work. Even if they’re not directly funded by the government to do that. And I think that certainly, you’re going to have a lot of that here as well.
But yeah, I do think that there is this long-term question where, as a society, we should want a very large amount of capital and our most talented technical people working on these futuristic problems, to lead and innovate in these spaces. And I think that there probably is a little bit more of a balance of space, where some of this could come from government, but I think startups and the open-source community and the creator economy is going to fill in a huge amount of this as well.

https://www.theverge.com/22588022/mark-zuckerberg-facebook-ceo-metaverse-interview

I think he’s saying that the truth lies somewhere in the middle, and god knows I’m sympathetic to that argument. But who decides where in the middle? Who determines the breadth of this spectrum, governments or businesses? With what objective, over what time horizon, and with what opportunity costs?


At the moment, and as a consequence of having written all of this out, this is where I find myself:

China is optimizing for power, and is willing to give up on innovation in the consumer internet space. America is optimizing for innovation in the consumer internet space, and is willing to cede power to big tech in terms of shaping what society looks like in the near future.

Have I framed this correctly? If yes, what are the potential ramifications in China, the US and the rest of the world? What ought to be the follow-up questions? Why? Who else should I be following and reading to learn more about these issues?

I don’t have the answers to these questions, and would appreciate the help.

Thank you!