
Mark, thanks for coming on the podcast again. Yeah, happy to do it. Good to see you. You too. Last time you were here, you had  launched Llama 3. Now you’ve launched Llama 4. Well, the first version. That’s right. What’s new? What’s exciting? What’s changed? The whole field is so dynamic. I feel like a ton has changed since the last time we talked. Meta AI  has almost a billion people using it monthly now, which is pretty wild. I think this is going to be  a really big year for all of this, especially once

you get the personalization loop going, which we’re just starting to build in now, really, both from the context that all the algorithms have about what you’re interested in — feed, your profile information, your social graph information — and from what you’re interacting with the AI about. That’s going to be the next thing that’s super exciting. I’m really big on that. The modeling stuff continues to make really impressive advances too. I’m pretty happy with the first set of Llama 4 releases. We announced four models and released the

first two — the Scout and Maverick ones  — which are mid-size to small models. The most popular Llama 3 model was the 8  billion parameter one. So we’ve got one of those coming in the Llama 4 series too. Our  internal code name for it is “Little Llama.” That’s coming probably over the next few months. Scout and Maverick are good. They have some of the highest intelligence per cost you can get of  any model out there. They’re natively multimodal, very efficient, run on one host. They’re designed  to be very efficient and low latency, for a lot

of the use cases we’re building for internally.  That’s our whole thing. We build what we want, and then we open-source it so other people  can use it too. I’m excited about that. I’m also excited about the Behemoth model, which  is coming up. It’s going to be our first model that’s sort of at the frontier — more than  2 trillion parameters. As the name says, it’s quite big. We’re trying to figure out how  to make that useful for people. It’s so big that we’ve had to build a bunch of infrastructure  just to be able to post-train it ourselves.

Now we’re trying to wrap our heads around, how  does the average developer out there actually use something like this? How do we make it  useful — maybe by distilling it into models that are a reasonable size to run? Because  you’re obviously not going to want to run something like that in a consumer model. As you saw with the Llama 3 stuff last year, the initial launch was exciting and then we  just built on that over the year. 3.1 released the 405 billion model, 3.2 is when we got all the  multimodal stuff in. We basically have a roadmap

like that for this year too. So a lot going on. I’m interested to hear more about it. There’s this impression that the gap between the best  closed-source and the best open-source models has increased over the last year. I know the  full family of Llama 4 models isn’t out yet, but Llama 4 Maverick is at #35 on Chatbot  Arena. On a bunch of major benchmarks, it seems like o4-mini or Gemini 2.5 Flash  are beating Maverick, which is in the same class. What do you make of that impression? There are a few things. First, I actually

think this has been a very good year for  open source overall. If you go back to where we were last year, Llama was the only  real, super-innovative open-source model. Now you have a bunch of them in the field. In general, the prediction that this would be the year open source generally overtakes  closed source as the most used models out there, I think that’s generally on track to be true. One interesting surprise — positive in some ways, negative in others, but overall good — is that  it’s not just Llama. There are a lot of good

ones out there. I think that’s quite good. Then there’s the reasoning phenomenon, which you’re alluding to talking about o3, o4, and  other models. There’s a specialization happening. If you want a model that’s the best at  math problems, coding, or different things like those tasks, then reasoning models that  consume more test-time or inference-time compute in order to provide more intelligence are a really  compelling paradigm. And we’re building a Llama 4 reasoning model too. It’ll come out at some point. But for a lot of the applications we care about,

latency and good intelligence per cost are much  more important product attributes. If you’re primarily designing for a consumer product, people  don’t want to wait half a minute to get an answer. If you can give them a generally good answer  in half a second, that’s a great tradeoff. I think both of these are going to end up  being important directions. I’m optimistic about integrating reasoning models with the core  language models over time. That’s the direction Google has gone in with some of the more recent  Gemini models. I think that’s really promising.

But I think there’s just going to be a  bunch of different stuff that goes on. You also mentioned the whole Chatbot Arena thing,  which I think is interesting and points to the challenge around how you do benchmarking. How do  you know what models are good for which things? One of the things we’ve generally tried to do  over the last year is anchor more of our models in our Meta AI product north star use cases.  The issue with open source benchmarks, and any given thing like the LM Arena stuff, is  that they’re often skewed toward a very specific

set of use cases, which are often not actually what any normal person does in your product.

The portfolio of things they’re trying  to measure is often different from what people care about in any given product. Because of that, we’ve found that trying to optimize too much for that kind of stuff has  led us astray. It’s actually not led towards the highest quality product, the most usage, and best  feedback within Meta AI as people use our stuff. So we’re trying to anchor our north star on  the product value that people report to us, what they say that they want, and what  their revealed preferences are, and using

the experiences that we have. Sometimes  these benchmarks just don’t quite line up. I think a lot of them are quite easily gameable. On the Arena you’ll see stuff like Sonnet 3.7, which is a great model, and it’s not near the  top. It was relatively easy for our team to tune a version of Llama 4 Maverick that could  be way at the top. But the version we released, the pure model, actually has no tuning for that  at all, so it’s further down. So you just need to be careful with some of these benchmarks.  We’re going to index primarily on the products.

Do you feel like there is some benchmark which captures what you see as a north star of value to the user, which can be objectively measured between different models, and where you’d say, “I need Llama 4 to come out on top on this”? Our benchmark is basically user value in Meta AI. But you can’t compare that to other models. We might be able to, because we might be able to run other models and be able to tell. That’s one of the advantages of open source. You have a good community of folks who can poke holes in your stuff and point out, “Okay, where is

your model not good, and where is it good?” The reality at this point is that all these models are optimized for slightly different  mixes of things. Everyone is trying to go towards the same end in that all the leading  labs are trying to create general intelligence, superintelligence, whatever you call it. AI  that can lead toward a world of abundance where everyone has these superhuman tools  to create whatever they want. That leads to dramatically empowering people and  creating all these economic benefits.

However you define it, that’s what  a lot of the labs are going for. But there’s no doubt that different folks have  optimized toward different things. I think the Anthropic folks have really focused on coding and  agents around that. The OpenAI folks, I think, have gone a little more toward reasoning recently. There’s a space which, if I had to guess, I think will end up being the most used  one: quick, very natural to interact with, natively multimodal, fitting throughout your  day in the ways you want to interact with it.

I think you got a chance to play around with  the new Meta AI app that we’re releasing. One of the fun things we put in there is the  demo for the full-duplex voice. It’s early. There’s a reason why we haven’t made that the  default voice model in the app yet. But there’s something about how naturally conversational  it is that’s really fun and compelling. Being able to mix that in with the right  personalization is going to lead toward a product experience where… If you fast-forward  a few years, I think we’re just going to be

talking to AI throughout the day about  different things we’re wondering about. You’ll have your phone. You’ll talk to it  while browsing your feed apps. It’ll give you context about different stuff. It’ll answer your  questions. It’ll help you as you’re interacting with people in messaging apps. Eventually, I  think we’ll walk through our daily lives and have glasses or other kinds of AI devices and  just seamlessly interact with it all day long.

That’s the north star. Whatever the benchmarks  are that lead toward people feeling like the quality is where they want to interact with it,  that’s what will ultimately matter the most to us. I got a chance to play around with both Orion  and also the Meta AI app, and the voice mode was super smooth. It was quite impressive. On the point of what the different labs are optimizing for — to steelman their view — I think  a lot of them believe that once you fully automate software engineering and AI research, then you can  kick off an intelligence explosion. You would have

millions of copies of these software engineers  replicating the research that happened between Llama 1 and Llama 4 — that scale of improvement  again — but in a matter of weeks or months rather than years. So it really matters to just close the  loop on the software engineer, and then you can be the first to ASI. What do you make of that? I personally think that’s pretty compelling. That’s why we have a big coding  effort too. We’re working on a number of coding agents inside Meta. Because  we’re not really an enterprise software company,

we’re primarily building it for ourselves. Again, we go for a specific goal. We’re not trying to build a general developer tool. We’re trying to  build a coding agent and an AI research agent that

advances Llama research specifically. And it’s  fully plugged into our toolchain and all that. That’s important and is going to end up being  an important part of how this stuff gets done. I would guess that sometime in the next 12 to  18 months, we’ll reach the point where most of the code that’s going toward these efforts is  written by AI. And I don’t mean autocomplete. Today you have good autocomplete. You start  writing something and it can complete a section of code. I’m talking more like: you give it a  goal, it can run tests, it can improve things,

it can find issues, it writes higher quality  code than the average very good person on the team already. I think that’s going to be  a really important part of this for sure. But I don’t know if that’s the whole game. That’s  going to be a big industry, and it’s going to be an important part of how AI gets developed. But I  think there are still… One way to think about it is that this is a massive space. I don’t think  there’s just going to be one company with one optimization function that serves everyone as  best as possible. There are going to be a bunch

of different labs doing leading work in different  domains. Some will be more enterprise-focused or coding-focused. Some will be more  productivity-focused. Some will be more social or entertainment-focused. Within the assistant space, there will be some that are more informational and  productivity-focused, and some that are more companion-focused. It’s going to  be a lot of stuff that’s just fun and entertaining and shows up in your feed. There’s just a huge amount of space. Part

of what’s fun about going toward this AGI future  is that there are a bunch of common threads for what needs to get invented, but also a lot of  things that still need to be created. I think you’re going to start seeing more specialization  between different groups, if I had to guess. It’s really interesting to me that you  basically agree with the premise that there will be an intelligence explosion and  we’ll get something like superintelligence on the other end. Tell me if I’m misunderstanding  you. If that’s the case, why even bother with

personal assistants and whatever else? Why not  just get to superhuman intelligence first and then deal with everything else later? I think that’s just one aspect of the flywheel. Part of what I generally disagree  with on the fast-takeoff view is that it takes time to build out physical infrastructure. If you want to build a gigawatt cluster of compute, that just takes time. NVIDIA needs time  to stabilize their new generation of systems. Then you need to figure out the networking around  it. Then you need to build the building. You

need to get permitting. You need to get the energy. Maybe that means gas turbines or green energy; either way, there’s a whole supply chain of that stuff. We talked about this a bunch the last time I was on the podcast with you. I think some of these are just physical-world, human-time things. As you start getting more intelligence in one part of the stack, you’re just going to run into a different set of bottlenecks. That’s how engineering always works: solve one bottleneck, you get another bottleneck.

Another bottleneck in the system, or another ingredient that’s going to make this work well, is people getting used to learning and having a feedback loop with the system. These systems don’t

just show up fully formed with people magically  knowing how to use them. There’s a co-evolution that happens where people are learning how to best  use these AI assistants. At the same time, the AI assistants are learning what people care about.  Developers are making the AI assistants better. You’re building up a base of context too.  You wake up a year or two into it and the assistant can reference things you talked  about two years ago and that’s pretty cool. You couldn’t do that even if you launched the  perfect thing on day one. There’s no way it

could reference what you talked about two  years ago if it didn’t exist two years ago. So I guess my view is that there’s this huge  intelligence growth. There’s a very rapid curve on the uptake of people interacting with the  AI assistants, and the learning feedback and data flywheel around that. And then there is  also the buildout of the supply chains and infrastructure and regulatory frameworks to  enable the scaling of a lot of the physical infrastructure. At some level, all of those are  going to be necessary, not just the coding piece.

One specific example of this that I think is interesting: if you go back a few years, we had a project, I think it was on our ads team, to automate ranking experiments. That’s a pretty constrained environment. It’s not open-ended code. It’s basically, look at the whole history of the company — every experiment that any engineer has ever done in the ad system — and look at what worked, what didn’t, and what the results of those were. Then basically formulate new hypotheses for different tests that we should run that could improve the performance of the ad system.

What we basically found was that we were  bottlenecked on compute to run tests, based on the number of hypotheses. It turns  out, even with just the humans we have right now on the ads team, we already have more good ideas  to test than we actually have either compute or, really, cohorts of people to test them with. Even if you have three and a half billion people using your products, you still want each test to  be statistically significant. It needs to have hundreds of thousands or millions of people.  There’s only so much throughput you can get on

testing through that. So we’re already at the point, even with just the people we have, that we can’t really test everything that we want. Now just being able to test more things is not necessarily going to be additive to that. We need to get to the point where the average quality of the hypotheses the AI is generating is better than the ideas above the line, the ones the best humans on the team have come up with that we’re actually able to test, before it will even be marginally useful. We’ll get there, I think, pretty quickly. But

it’s not just, “Okay, cool, the thing can write code, and now all of a sudden everything is just improving massively.” There are real-world constraints that need to be overcome. Then you need to have the compute and the people to test. Then over time, as the quality creeps up, are we at a point in five or 10 years where no set of people can generate a hypothesis as good as the AI system? I don’t know, maybe. In that world, obviously that’s going to be how all the value is created. But that’s not the first step.
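To make the statistical-significance constraint concrete, here is a rough back-of-the-envelope sketch of why a single ranking experiment can need millions of users. It uses the standard two-proportion sample-size approximation; the baseline rate, the lift being detected, and the confidence and power levels are made-up illustrative numbers, not anything from Meta’s actual experimentation setup.

```python
from math import ceil

def users_per_arm(baseline_rate: float, relative_lift: float,
                  z_alpha: float = 1.96, z_power: float = 0.84) -> int:
    """Rough per-arm sample size to detect a given lift in a two-proportion A/B test."""
    delta = baseline_rate * relative_lift            # absolute effect size we want to detect
    variance = baseline_rate * (1 - baseline_rate)   # Bernoulli variance at the baseline rate
    return ceil(2 * (z_alpha + z_power) ** 2 * variance / delta ** 2)

# Detecting a 1% relative lift on a 2% baseline metric at ~95% confidence and 80% power
print(users_per_arm(0.02, 0.01))  # roughly 7.7 million users per arm
```

At that kind of sample size, even a few billion daily users only support a limited number of mutually exclusive experiments running at once, which is the throughput ceiling being described.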

So if you buy this view, that this  is where intelligence is headed, the reason to be bullish on Meta is obviously  that you have all this distribution. You can also use that to learn more things that can  be useful for training. You mentioned the Meta AI app now has a billion active users. Not the app. The app is a standalone thing that we’re just launching now. It’ll be fun  for people who want to use it. It’s a cool experience. We can talk about that too because  we’re experimenting with some new ideas in there

that I think are novel and worth talking through. But I’m mostly talking about our apps. Meta AI is actually most used in WhatsApp. WhatsApp is  mostly used outside of the U.S. We just passed like a hundred million people in the US, but  it’s not the primary messaging system in the US, iMessage is. So people in the U.S. probably  tend to underestimate Meta AI usage somewhat. But part of the reason the standalone app is  going to be so important is because the US, for a lot of reasons, is one of the most important  countries. And the fact that WhatsApp is the main

way people are using Meta AI and that’s not  the main messaging system in the US means we need another way to build a first-class  experience that’s really in front of people. And I guess, to finish the question, the bearish  case would be that if the future of AI is less about just answering your questions and more  about being a virtual coworker, then it’s not clear how Meta AI inside of WhatsApp gives  you the relevant training data to make a fully autonomous programmer or remote worker.  In that case, does it not matter that much who

has more distribution right now with LLMs? Again, I just think there are going to be different things. Imagine you were sitting  at the beginning of the development of the internet and you asked, “What’s going to  be the main internet thing? Is it going to be knowledge work or massive consumer apps?” You got both. You don’t have to choose one. The world is big and complicated. Does one company  build all of that stuff? Normally the answer is no. But to your question, people do not code in  WhatsApp for the most part. And I don’t foresee

that people starting to write code in WhatsApp  is going to be a major use case. Although I do think people are going to ask AI to do a lot of  things that result in the AI coding without them necessarily knowing it. That’s a separate thing. We do have a lot of people who are writing code at Meta and they use Meta AI. We have  this internal thing called MetaMate, and a number of different coding and AI research  agents that we’re building around that. That has its own feedback loop and I think it can get  quite good for accelerating those efforts.

But again, there are going to be a lot of things.  AI is almost certainly going to unlock a massive revolution in knowledge work and code. I also  think it’s going to be the next generation of search and how people get information,  and do more complex information tasks. I also think it’s going to be fun. People are  going to use it to be entertained. A lot of the internet today is memes and humor. We have this  amazing technology at our fingertips. It’s amazing and funny when you think about how much of human  energy just goes toward entertaining ourselves,

designing, pushing culture forward, and  finding humorous ways to explain cultural phenomena that we observe. I think that’s almost  certainly going to be the case in the future. Look at the evolution of things like Instagram  and Facebook. If you go back 10, 15, 20 years ago, it was text. Then we all got phones with cameras,  and most of the content became photos. Then the mobile networks got good enough that if you  wanted to watch a video on your phone, it wasn’t just buffering the whole time. So that got good. Over the last 10 years, most of the content has

moved toward video at this point. Today, most  of the time spent on Facebook and Instagram is on video. But do you think in five years we’re  just going to be sitting in our feed and consuming media that’s just video? No, it’s going to be  interactive. You’ll be scrolling through your feed. There will be content that maybe looks  like a Reel to start. But you can talk to it, or interact with it, and it talks back,  or it changes what it’s doing. Or you can jump into it like a game and interact  with it. That’s all going to be AI.

My point is that there are going to be all  these different things. We’re ambitious, so we’re working on a bunch of them. But I don’t  think any one company is going to do all of it. On this point about AI-generated content  and AI interactions, already people have meaningful relationships with AI therapists,  AI friends, maybe more. This is just going to get more intense as these AIs become more  unique, more personable, more intelligent, more spontaneous, more funny, and so forth. People are going to have relationships with

AI. How do we make sure these  are healthy relationships? There are a lot of questions that you only can  really answer as you start seeing the behaviors. Probably the most important upfront thing is  just to ask that question and care about it at each step along the way. But I also think being  too prescriptive upfront and saying, “We think these things are not good” often cuts off value. People use stuff that’s valuable for them. One of my core guiding principles in designing  products is that people are smart. They know

what’s valuable in their lives. Every  once in a while, something bad happens in a product and you want to make sure you  design your product well to minimize that. But if you think something someone is doing  is bad and they think it’s really valuable, most of the time in my experience, they’re right  and you’re wrong. You just haven’t come up with the framework yet for understanding why the thing  they’re doing is valuable and helpful in their life. That’s the main way I think about it. I do think people are going to use AI for a

lot of these social tasks. Already, one of the  main things we see people using Meta AI for is talking through difficult conversations they  need to have with people in their lives. “I’m having this issue with my girlfriend. Help me  have this conversation.” Or, “I need to have a hard conversation with my boss at work. How do I  have that conversation?” That’s pretty helpful. As the personalization loop kicks in and the  AI starts to get to know you better and better, that will just be really compelling. Here’s one stat from working on social media

for a long time that I always think is crazy. The  average American has fewer than three friends, fewer than three people they would consider  friends. And the average person has demand for meaningfully more. I think it’s something like 15  friends or something. At some point you’re like, “All right, I’m just too busy,  I can’t deal with more people.” But the average person wants more connection  than they have. There’s a lot of concern people raise like, “Is this going to replace real-world, in-person connections?” And my default

is that the answer to that is probably not.  There are all these things that are better about physical connections when you can have  them. But the reality is that people just don’t have as much connection as they want. They feel  more alone a lot of the time than they would like. So I think a lot of these things — things that  today might have a little bit of stigma around

them — over time, we’ll find the vocabulary as  a society to articulate why they are valuable, why the people who are doing them are rational for  doing it, and how it is actually adding value to their lives. But also the field is very early.  There are a handful of companies doing virtual therapists, virtual girlfriend-type stuff. But  it’s very early. The embodiment in those things is still pretty weak. You open it up and it’s  just an image of the therapist or the person you’re talking to. Sometimes there’s some very  rough animation, but it’s not an embodiment.

You’ve seen the stuff we’re working on in Reality  Labs, where you have the Codec Avatars and it actually feels like a real person. That’s  where it’s going. You’ll be able to have an always-on video chat with the AI. The gestures are  important too. More than half of communication, when you’re actually having a conversation, is not  the words you speak. It’s all the nonverbal stuff. I did get a chance to check out Orion the other  day, and I thought it was super impressive. I’m mostly optimistic about the technology. Generally,  like you mentioned, I’m pretty libertarian

about this. If people are doing something, they  probably think it’s good for them. Although, I actually don’t know if it’s the  case that if somebody is using TikTok, they would say that they’re happy with how much  time they’re spending on TikTok or something. I’m mostly optimistic about it in the sense that  if we’re going to be living in this future world of AGI, we need to be upgrading our capabilities  too, with tools like this. And just generally, there can be more beauty in the world if you  can see Studio Ghibli everywhere or something.

I was worried about one of the flagship use cases that your team showed me. I’m sitting at the breakfast table and on the periphery of my vision is just a bunch of Reels that are scrolling by. Maybe in the future, my AI girlfriend is on the other side of the screen or something. So I am worried that we’re just removing all the friction that stands between us and getting totally reward-hacked by our technology. How do we make sure this is not what ends up happening in five years? Again, I think people have a good sense of what they want. That experience you saw was just a

demo to show multitasking and holograms. I agree,  I don’t think the future is one where you have stuff that’s trying to compete for your attention  in the corner of your vision all the time. I don’t think people would like that too much. As we’re designing these glasses, it’s actually one of the things that we’re really  mindful of. Probably the number one thing the glasses need to do is get out of the way and be  good glasses. As an aside, I think that’s part of the reason why the Ray-Ban Meta product has  done so well. It’s great for listening to music,

taking phone calls, taking photos and videos. The  AI is there when you want it. But when you don’t, it’s just a good-looking pair of glasses that  people like. It gets out of the way well. I would guess that’s going to be a very important  design principle for the augmented reality future. The main thing that I see here is this. It’s kind  of crazy that, for how important the digital world is in all of our lives, the only way we access it  is through these physical, digital screens. You have your phone, your computer. You can put a big  TV on your wall. It’s this huge physical thing.

It just seems like we’re at the point with  technology where the physical and digital world should really be fully blended. That’s  what holographic overlays allow you to do. But I agree. I think a big part of the design  principles around that will be around how you’ll be interacting with people. You’ll be able  to bring digital artifacts into those interactions and do cool things very seamlessly. If I want to show you something, here’s a screen. We can interact with it. It  can be 3D. We can play with it. You want to

play a card game? All right, here’s a deck of  cards. We can play with it. If two of us are physically together and we have a third friend  who’s hologramming in, they can participate too. But in that world too — just as you don’t want  your physical space to be cluttered because it wears on you psychologically — I don’t think  people are going to want their digital-physical space to feel that way either.  That’s more of an  aesthetic norm that will have to get worked out, but I think we’ll figure that out. Going back to the AI conversation,

you were mentioning how big of a bottleneck  the physical infrastructure can be. Related to other open-source models, like DeepSeek and  so forth, DeepSeek right now has less compute than a lab like Meta and you could argue  that it’s competitive with the Llama models. If China is better at physical  infrastructure, industrial scale-ups, getting more power and more data centers online,  how worried are you that they might beat us here? It’s a real competition. You’re seeing industrial  policies really play out. China is bringing online

more power. Because of that, the US really needs  to focus on streamlining the ability to build data centers and produce energy. Otherwise, I  think we’ll be at a significant disadvantage.

At the same time, some of the export controls  on things like chips, I think you can see how they’re clearly working in a way. There was  all the conversation with DeepSeek about, “Oh, they did all these very impressive  low-level optimizations.” And the reality is, they did and that is impressive. But then you ask, “Why did they have to do that, when none of the American labs did it?” It’s  because they’re using partially nerfed chips that are the only ones NVIDIA is allowed  to sell in China because of the export

controls. DeepSeek basically had to spend  a bunch of their calories and time doing low-level infrastructure optimizations  that the American labs didn’t have to do. Now, they produced a good result on text. DeepSeek  is text-only. The infrastructure is impressive. The text result is impressive. But every new  major model that comes out now is multimodal. It’s image, it’s voice. Theirs isn’t. Now the question is, why is that the case? I don’t think it’s because they’re  not capable of doing it. It’s because they

had to spend their calories on doing these infrastructure optimizations to overcome the fact that there were these export controls. But when you compare Llama 4 with DeepSeek — I mean, our reasoning model isn’t out yet, so the R1 comparison isn’t clear yet — we’re basically in the same ballpark on all the text stuff that DeepSeek is doing, but with a smaller model. So

the cost-per-intelligence is lower with what we’re doing for Llama on text. On the multimodal side, we’re effectively leading, and multimodality just doesn’t exist in their models. So the Llama 4 models, when you compare them to what DeepSeek is doing, are good. I think people will generally prefer to use the Llama 4 models. But there’s this interesting contour where it’s clearly a good team doing stuff over there. And you’re right to ask about the accessibility of power, the accessibility of compute and chips, because the work that you’re

seeing different labs do and the way it’s  playing out is somewhat downstream of that.

So Sam Altman recently tweeted that OpenAI is going to release an open-source SOTA reasoning model. I think part of the tweet was that they won’t do anything silly, like say you can only use it if you have fewer than 700 million users. DeepSeek has the MIT license, whereas I think a couple of the contingencies in the Llama license require you to say “built with Llama” on applications using it, or require that any model you train using Llama has to begin with the word “Llama.” What do you think about the license? Should it be less onerous for developers?

Look, we basically pioneered the open-source  LLM thing. So I don’t consider the license to be onerous. When we were starting to push on open  source, there was this big debate in the industry. Is this even a reasonable thing to do? Can you  do something that is safe and trustworthy with open source? Will open source ever be able to be  competitive enough that anyone will even care? Basically, when we were answering those  questions a lot of the hard work was done by the teams at Meta. There were  other folks in the industry but really,

the Llama models were the ones that broke open  this whole open-source AI thing in a huge way.

If we’re going to put all this energy into it,  then at a minimum, if you’re going to have these large cloud companies — like Microsoft and Amazon  and Google — turn around and sell our model, then we should at least be able to have a  conversation with them before they do that around

what kind of business arrangement we should have. Our goal with the license, we’re generally not trying to stop people from using the model. We  just think that if you’re one of those companies, or if you’re Apple, just come talk to  us about what you want to do. Let’s find a productive way to do it together.  I think that’s generally been fine. Now, if the whole open-source part of the  industry evolves in a direction where there are a lot of other great options and the license  ends up being a reason why people don’t want to

use Llama, then we’ll have to reevaluate the strategy and what it makes sense to do at that point. But I don’t think we’re there. That’s not, in practice, something we’ve seen, with companies coming to us and saying, “We don’t want to use this because your license says if you reach 700 million people, you have to come talk to us.” So far, that’s been more something we’ve heard from open-source purists like, “Is this as clean of an open-source model as you’d like it to be?” That debate has existed since the beginning of open source. All the GPL license stuff

versus other things, do you need to make it  so that anything that touches open source has to be open source too? Or can people take  it and use it in different ways? I’m sure there will continue to be debates around this. But if you’re spending many billions of dollars training these models, I think asking the other  companies — the huge ones that are similar in size and can easily afford to have a relationship  with us — to talk to us before they use it seems like a pretty reasonable thing. If it turns out that other models are also

really good. There’s a bunch of good open-source  models. So that part of your mission is fulfilled, and maybe other models are better at coding. Is there a world where you just say, “Look, the open-source ecosystem is healthy. There’s plenty  of competition. We’re happy to just use some other model, whether it’s for internal software  engineering at Meta or deploying to our apps. We don’t necessarily need to build with Llama”? Again, we do a lot of things. Let’s take a step back. The reason why we’re building our own  big models is because we want to be able to

build exactly what we want. None of the other  models in the world are exactly what we want. If they’re open source, you can take them and  fine-tune them in different ways. But you still have to deal with the model architectures.  And they make different size tradeoffs that affect latency and inference cost. At the scale  that we operate at, that stuff really matters. We made the Llama Scout and Maverick models  certain sizes for a specific reason. They fit on a host and we wanted certain latency  — especially for the voice models that we’re

working on — that we want to pervade everything  we’re doing from the glasses to all of our apps to the Meta AI app and all that stuff. There’s a level of control of your own destiny that you only get when you build the stuff  yourself. That said, AI is going to be used in every single thing that every company does. When  we build a big model, we also have to choose which internal use cases we’re going to optimize for. So does that mean for certain things we might say, “Okay, maybe Claude is better for building this  specific development tool that this team is

using”? All right, cool then use that. Great. We  don’t want to fight with one hand tied behind our back. We’re doing a lot of different stuff. You also asked, would it not be important anymore because other people are doing  open source? On this, I’m a little more worried. You have to ask yourself this. For anyone who  shows up now and is doing open source — now that we have done it — would they still be  doing open source if we weren’t doing it? I think there are a handful of folks who see  the trend that more and more development is

going toward open source, and they’re like, “Oh crap, we need to be on this train or else we’re going to lose.” If you only have a closed-model API, increasingly a lot of developers don’t want that. So you’re seeing a bunch of other players start to do some work in open source. But it’s unclear if it’s dabbling, or fundamental for them the way that it has been for us. A good example is what’s going on with Android. Android started off as the open-source thing. There’s not really any open-source alternative. Over time, Android has just gotten more and more closed.

So if you’re us, you need to worry that if we  stop pushing the industry in this direction, all these other people… Maybe they’re only really  doing it because they’re trying to compete with us and the direction we’re pushing things. They  already showed their revealed preference for what they would do if open source didn’t exist.  And it wasn’t open source. We just need to be careful about relying on that continued  behavior for the future of the technology that we’re going to build at the company. Another thing I’ve heard you mention is that

it’s important that the standard gets built around American models like Llama. I wanted to understand your logic there. With certain kinds of networks, it is the case that the Apple App Store has a big contingency around what gets built on it. But it doesn’t seem like, if you built some sort of scaffold for DeepSeek, you couldn’t have easily just switched it over to Llama 4, especially since things change between generations: Llama 3 wasn’t MoE and Llama 4 is. So things are changing between generations of models as well. What’s the reason for thinking things will get

built out in this contingent  way on a specific standard? I’m not sure, what do you mean by contingent? As in, it’s important that people are building for Llama rather than for LLMs in  general, because that will determine what the standard is in the future. Look, I think these models encode values and ways of thinking about the world. We had this interesting experience early on, where we took an early version of Llama and translated  it. I think it was French, or some other language.

The feedback we got from French people was, “This sounds like an American who learned to speak French. It doesn’t sound like a French person.” And we were like, “What do you mean, does it not speak French well?” No, it speaks French fine. It was just that the way it thought about the world seemed slightly American. So I think there are these subtle things that get built into the models. Over time, as models get more sophisticated, they should be able to embody different value sets across the world. So maybe that’s

not a particularly sophisticated example,  but I think it illustrates the point.

Some of the models we’ve seen in testing, especially ones coming out of China, have certain values encoded in them. And it’s not just a light fine-tune to change that. Now,

language models — or something that has a kind  of world model embedded in it — have more values. Reasoning, I guess, you could say has values too.  But one of the nice things about reasoning models is they’re trained on verifiable problems. Do you  need to be worried about cultural bias if your model is doing math? Probably not. I think  the chance that some reasoning model built elsewhere is going to incept you by solving  a math problem in a devious way seems low. But there’s a whole different set of issues  around coding, which is the other verifiable

domain. You need to worry about waking up one day  and if you’re using a model that has some tie to another government, can it embed vulnerabilities  in code that their intelligence organizations could exploit later? In some future version you’re  using a model that came from another country and it’s securing your systems. Then you wake  up and everything is just vulnerable in a way that that country knows about and you don’t.  Or it turns on a vulnerability at some point. Those are real issues. I’m very interested in  studying this because I think one of the main

things that’s interesting about open source is  the ability to distill models. For most people, the primary value isn’t just taking a model  off the shelf and saying, “Okay, Meta built this version of Llama. I’m going to take it and  I’m going to run it exactly in my application.” No, your application isn’t doing anything  different if you’re just running our thing. You’re at least going to fine-tune it, or try to  distill it into a different model. When we get to stuff like the Behemoth model, the whole value  is being able to take this very high amount of

intelligence and distill it down into a smaller  model that you’re actually going to want to run. This is the beauty of distillation. It’s one of  the things that I think has really emerged as a very powerful technique over the last year, since  the last time we sat down. I think it’s worked better than most people would have predicted. You  can basically take a model that’s much bigger, and capture probably 90 or 95% of its intelligence,  and run it in something that’s 10% of the size. Now, do you get 100% of the intelligence?  No. But 95% of the intelligence at 10% of

the cost is pretty good for a lot of things. The other thing that’s interesting is that now, with this more varied open-source community, it’s  not just Llama. You have other models too. You have the ability to distill from multiple sources.  So now you can basically say, “Okay, Llama’s really good at this. Maybe its architecture is  really good because it’s fundamentally multimodal, more inference-friendly, more efficient. But let’s  say this other model is better at coding.” Okay, great. You can distill from both of them and  build something that’s better than either

individually, for your own use case. That’s cool. But you do need to solve the security problem of knowing that you can distill it in a way that’s  safe and secure. This is something that we’ve been researching and have put a lot of time into.  What we’ve basically found is that anything that’s language is quite fraught. There’s just a lot of  values embedded into it. Unless you don’t care about taking on the values from whatever model  you’re distilling from, you probably don’t want to just distill a straight language world model. On reasoning, though, you can get a lot of the way

there by limiting it to verifiable domains, and  running code cleanliness and security filters. Whether it’s using Llama Guard open source,  or the Code Shield open source tools that we’ve done, things that allow you to incorporate  different input into your models and make sure that both the input and the output are secure. Then it’s just a lot of red teaming. It’s

having experts who are looking at the model and asking, “Alright, is this model doing anything after distillation that we don’t want?” I think with the combination of those techniques, you can probably distill on the reasoning side for verifiable domains quite securely. That’s something I’m pretty confident about and something we’ve done a lot of research around. But I think this is a very big question. How do you do good distillation? Because there’s so much value to be unlocked. But at the same time, I do think there is some fundamental bias embedded in different models.
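As a concrete illustration of the distillation idea discussed here, below is a minimal sketch of the standard soft-label distillation objective, in which a small student model is trained to match a larger teacher’s output distribution. This is the generic textbook formulation, not Meta’s actual Behemoth-to-smaller-model pipeline; the vocabulary size and batch shape are arbitrary.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between the teacher's softened distribution and the student's."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)         # soft targets from the big model
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 positions over a 32k-token vocabulary (sizes are illustrative).
teacher_logits = torch.randn(4, 32000)                       # frozen large "teacher"
student_logits = torch.randn(4, 32000, requires_grad=True)   # small "student" being trained
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```

In practice this term is usually mixed with an ordinary next-token cross-entropy loss and the temperature is tuned, but the key point is that the student learns from the teacher’s full output distribution rather than from hard labels, which is how much of a far larger model’s capability can be carried into a much smaller one.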

Speaking of value to be unlocked, what do you think the right way to monetize AI will be? Obviously digital ads are quite lucrative. But as a fraction of total GDP, it’s small compared to all remote work. Even if you can increase productivity without replacing work, that’s still worth tens of trillions of dollars. Is it possible that ads might not be it? How do you think about this? Like we were talking about before, there’s going to be all these different applications,

and different applications  tend toward different things. Ads are great when you want to offer  people a free service. Because it’s free, you need to cover it somehow. Ads solve this  problem where a person does not need to pay for something. They can get something that is amazing  for free. Also by the way, with modern ad systems, a lot of the time people think the ads  add value to the thing if you do it well.

You need to be good at ranking and you need to  have enough liquidity of advertising inventory. If you only have five advertisers in the  system, no matter how good you are at ranking, you may not be able to show something to  someone that they’re interested in. But if you have a million advertisers in the system,  then you’re probably going to be able to find something pretty compelling, if you’re good at  picking out the different needles in the haystack that that person is going to be interested in. So that definitely has its place. But there are

also clearly going to be other business  models as well, including ones that just have higher costs so it doesn’t even  make sense to offer them for free. By the way, there have always been business models like this. There’s a reason why social media is free and ad-supported, but then if you want to  watch Netflix or ESPN or something, you need to pay for that. The content that’s going  into that, they need to produce it, and that’s very expensive for them to produce. They probably  could not have enough ads in the service in order

to make up for the cost of producing the content.  Basically, you just need to pay to access it. The trade-off is fewer people do it. Instead  of billions, you’re talking about hundreds of millions of people using those services. There’s a  value switch there. I think it’s similar here. Not everyone is going to want a software engineer,  or a thousand software engineering agents, or whatever it is. But if you do, that’s  something you’re probably going to be willing to pay thousands, or tens of thousands,  or hundreds of thousands of dollars for.

That just speaks to the diversity of  different things that need to get created. There are going to be business models at  each point along the spectrum. At Meta,

for the consumer piece we definitely want to  have a free thing. I’m sure that will end up being ad-supported. But I also think we’re going  to want to have a business model that supports people using arbitrary amounts of compute to do  even more amazing things than what it would make sense to offer in the free service. For that, I’m  sure we’ll end up having a premium service. But I think our basic values on this are that we want  to serve as many people in the world as possible.

How do you keep track of all these different projects, some of which we’ve talked about today? I’m sure there are many I don’t even know about. As the CEO overseeing everything, there’s a big spectrum between going to the Llama team and saying, “Here are the hyperparameters you should use,” versus just giving a mandate like, “Go make the AI better.” And there are so many different projects. How do you think about the way in which you can best deliver your value-add and oversee all these things?

A lot of what I spend my time on is trying to get awesome people onto the teams. There’s that, and then there’s stuff that cuts across teams. You build Meta AI, and you want to get it into WhatsApp or Instagram. Okay, now I need to get those teams to talk together. Then there are a bunch of questions like, “Do you want the thread for Meta AI in WhatsApp to feel like other WhatsApp threads, or do you want it to feel like other AI chat experiences?” There are different idioms for those. So there are all these interesting questions that need

to get answered around how does this stuff  basically fit into everything we’re doing? Then there’s a whole other part of what we’re  doing, which is pushing on the infrastructure. If you want to stand up a gigawatt cluster,  first of all, that has a lot of implications for the way we’re doing infrastructure buildouts.  It has political implications for how you engage with the different states where you’re building  that stuff. It has financial implications for the company in terms of: “All right, there’s a  lot of economic uncertainty in the world. Do we

double down on infrastructure right now? If so,  what other trade-offs do we want to make around the company?” Those are the kinds of decisions  that are tough for other people to really make.

Then there’s this question around taste and  quality. When is something good enough that we want to ship it? In general, I’m the steward  of that for the company. Although we have a lot of other people who I think have good taste as  well and are also filters for different things.

Those are basically the areas. AI is interesting  because, more than some of the other stuff that we do, it is more research and model-led than really  product-led. You can’t just design the product that you want and then try to build the model to  fit into it. You really need to design the model first and the capabilities that you want, and  then you get some emergent properties. Then it’s, “Oh, you can build some different stuff because  this turned out in a certain way.” At the end of the day, people want to use the best model. That’s partially why, when we’re talking about

building the most personal AI, the best voice,  the best personalization — and also a very smart experience with very low latency — those are  the things that we need to design the whole system to build. That’s why we’re working on  full-duplex voice. That’s why we’re working on personalization to both have good memory  extraction from your interactions with AI, but also to be able to plug into all the  other Meta systems. That’s why we design the specific models that we design, to have the kind  of size and latency parameters that they do.

Speaking of politics, there’s been this  perception that some tech leaders have been aligning with Trump. You and others donated  to his inaugural event and were on stage with him and I think you settled a lawsuit  that resulted in them getting $25 million. I wonder what’s going on here? Does it feel like  the cost of doing business with an administration? What’s the best way to think about this? My view on this is that he’s the President of the United States. Our default, as an American  company, should be to try to have a productive

relationship with whoever is running the  government. We’ve tried to offer support to previous administrations as well. I’ve been  pretty public with some of my frustrations with the previous administration,  how they basically did not engage with us or the business community more broadly. Frankly, that’s going to be necessary to make progress on some of these things. We’re not going  to be able to build the level of energy that we need if you don’t have a dialogue, and if they’re  not prioritizing trying to do those things.

A lot of people want to write this story about  what direction people are going. We’re trying to build great stuff, and we want to have a  productive relationship with people. That’s how I see it. It is also how I would guess most others  see it, but obviously, I can’t speak for them. You’ve spoken out about how you’ve  rethought some of the ways in which you engage and defer to the government, in  terms of moderation stuff in the past. How are you thinking about AI governance? Because  if AI is as powerful as we think it might be,

the government will want to get involved. What  is the most productive approach to take there, and what should the government be thinking about? I guess in the past, most of the comments that I made were in the context of content moderation.  It’s been an interesting journey over the last 10 years on this. It’s obviously been an  interesting time in history. There have been novel questions raised about online content moderation. Some of those have led to productive new systems getting built, like our AI systems to detect  nation-states trying to interfere in each other’s

elections. I think we will continue building  that stuff out, and that has been net positive. With some other stuff, we went down some  bad paths. I just think the fact-checking thing was not as effective as Community Notes  because it’s not an internet-scale solution. There weren’t enough fact-checkers, and people  didn’t trust the specific fact-checkers. You want a more robust system. So I think what we got  with Community Notes is the right one on that. But my point on this was more that historically,  I probably deferred a little too much to

either the media and their critiques, or to the government, on things that they did not really have authority over. But just as a central figure, I think we tried to build systems where maybe we wouldn’t have to make all of the content moderation decisions ourselves or something. I guess part of the growth process over the last 10 years is realizing, “Okay, we’re a meaningful company. We need to own the decisions that we need to make. We should listen to feedback from people, but we shouldn’t defer too much to people who do not actually have authority over this. Because

at the end of the day, we’re in the seat, and  we need to own the decisions that we make.”

It’s been a maturation process, and in  some ways painful, but I think we’re probably a better company for it. Will tariffs increase the cost of building data centers in the US and  shift buildouts to Europe and Asia? It is really hard to know how that plays  out. I think we’re probably in the early innings on that, and it’s very hard to know. What is your single highest-leverage hour in a week? What are you doing in that hour? I don’t know. Every week is a little bit

different. It’s probably got to be the case  that the most leveraged thing you do in a week is not the same thing each week. Or else,  by definition, you should probably spend more than one hour doing that thing every week. I don’t know. Part of the fun of this job, and also of the industry being so dynamic, is  that things really move around. The world is very different now than it was at the beginning  of the year, or even six months ago, or in the middle of last year. I think a lot has advanced  meaningfully. A lot of cards have been turned

over since the last time that we sat down. I think that was about a year ago, right? Yeah. I guess what you were saying earlier is that recruiting people is a super high-leverage thing you do. It’s very high-leverage, yeah.

You talked about these models being mid-level  software engineers by the end of the year. What would be possible if, say, software  productivity increased like 100x in two years? What kinds of things could be  built that can’t be built right now? What kinds of things? That’s an interesting  question. One theme of this conversation is that the amount of creativity that’s going  to be unlocked is going to be massive.

If you look at the overall arc of human  society and the economy over 100 or 150 years, it’s basically people going from being primarily  agrarian — with most human energy going toward just feeding ourselves — to that becoming a  smaller and smaller percent. And the things that take care of our basic physical needs have become  a smaller and smaller percent of human energy. That shift has led to two impacts: one is that  more people are doing creative and cultural pursuits. The second is that more people, in  general, spend less time working and more time on

entertainment and culture. I think that is almost  certainly going to continue as this goes on. This isn’t the 1-2 year thing of what happens when  you have a super powerful software engineer. But over time, if everyone has these superhuman  tools to create a ton of different stuff, you’re going to get incredible diversity. Part  of it is going to be solving hard problems: solving diseases, advancing science, developing  new technology that makes our lives better. But I would guess that a lot of it is going  to end up being cultural and social pursuits

and entertainment. I would guess the  world is going to get a lot funnier, weirder, and quirkier, the way that memes on  the internet have gotten over the last 10 years. I think that adds a certain richness and  depth. In funny ways, it actually helps you connect better with people. Now all day long,  I just find interesting stuff on the internet and send it in group chats to the people I care  about, who I think are going to find it funny. The media that people can produce  today to express very nuanced,

specific cultural ideas is really cool.  That’ll continue to get built out. It does advance society in a bunch of ways, even if it’s  not the “hard science” way of curing a disease.

If you think about it, the Meta social  media view of the world is that yeah, people are going to spend a lot more time doing  that stuff in the future. It’s going to be a lot better, and it’s going to help you connect,  because it’ll help express different ideas. The world is going to get more complicated,  but our technology, our cultural technology, to express these very complicated things — in a very  kind of funny little clip or whatever — is going to get so much better. I think that’s all great. I don’t know about next year. One other thought

that I think is interesting to cover is  that I tend to think that, for at least the foreseeable future, this is going to  lead to more demand for people doing work, not less. Now, people have a choice of  how much time they want to spend working.

I’ll give you one interesting example we were  talking about recently. We have almost three and a half billion people using our services  every day. One question we’ve struggled with forever is how do we provide customer support? Today, you can write an email, but we’ve never seriously been able to contemplate having  voice support where someone can just call in. I guess that’s maybe one of the artifacts  of having a free service. The revenue per person isn’t high enough to have an  economic model where people can call in.

But also, with three and a half billion  people using your service every day, the number of calls would be massive. It’d be like  the biggest call center in the world. It would be like $10 or $20 billion a year to staff that.  So we’ve never thought too seriously about it, because it always seemed like there was  no way that could make sense. But now, as AI gets better, you’re going to get to a place  where AI can handle a bunch of people’s issues. Not all of them — maybe 10 years from now it can  handle all of them — but thinking about a 3-5 year

time horizon, it will be able to handle a bunch.  It’s kind of like a self-driving car. They can handle a bunch of terrain, but they’re not  doing the whole route by themselves yet in most cases. People thought truck-driving jobs  were going to go away, but there’s actually more truck-driving jobs now than when we first started  talking about self-driving cars 20 years ago.

Going back to the customer support thing, it wouldn’t make sense to staff out calling for everyone. But let’s say AI can handle 90% of that. Then if it can’t, it kicks it off to a person. If you get the cost of providing that service down to one-tenth of what it would’ve otherwise been, then maybe now it actually makes sense to do it. That would be cool. So the net result is that I actually think we’re probably going to hire more customer support people.
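To spell out the arithmetic behind that “one-tenth” figure, here is a trivial cost sketch. The annual staffing figure is the midpoint of the “$10 or $20 billion a year” ballpark mentioned above, and the 90% deflection rate is the hypothetical from the same passage; both are illustrative, not actual Meta numbers.

```python
# Back-of-the-envelope: what AI deflection does to a human-only support cost.
full_human_cost = 15e9      # midpoint of the "$10 or $20 billion a year" ballpark (illustrative)
ai_deflection_rate = 0.90   # hypothetical share of issues the AI resolves on its own

residual_human_cost = full_human_cost * (1 - ai_deflection_rate)
print(f"Cost of human-handled remainder: ${residual_human_cost / 1e9:.1f}B per year")  # ~$1.5B
```

At roughly a tenth of the original cost, a support channel that was previously a non-starter can start to make economic sense, which is the argument being made here.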

The common belief is that AI will automate jobs away. But that hasn’t really been how the history of technology has worked. Usually, you create things that take away 90% of the work, and that leads you to want more people, not less. To close off the interview… I’ve been playing devil’s advocate on a bunch of points, and I really appreciate you being a good sport about it. But I do think there’s no upper bound to how much beauty there can be in the world, especially if there are billions of AIs optimizing the amount of beauty you can see, the amount of connection you can have, and so forth. I’m pretty optimistic about it.

Final question: Who is the one person in the  world today who you most seek out for advice? Oh, man. I feel like part of my style  is that I like having a breadth of advisors. It’s not just one person. We’ve got a great team. There are people at the company, people on our board. There  are a lot of people in the industry who are doing new stuff. There’s not a single person.  But it’s fun. Also, when the world is dynamic, just having a reason to work with people you like  on cool stuff… To me, that’s what life is about.

Great note to close on. Thanks for doing this. Yeah, thank you.
