
Mark, thanks for coming on the podcast again. Yeah, happy to do it. Good to see you. You too. Last time you were here, you had  launched Llama 3. Now you’ve launched Llama 4. Well, the first version. That’s right. What’s new? What’s exciting? What’s changed? The whole field is so dynamic. I feel like a ton has changed since the last time we talked. Meta AI  has almost a billion people using it monthly now, which is pretty wild. I think this is going to be  a really big year for all of this, especially once

you get the personalization loop going, which we’re just starting to build in now, really, both from the context that all the algorithms have about what you’re interested in — feed, your profile information, your social graph information — and from what you’re interacting with the AI about. That’s going to be the next thing that’s super exciting. I’m really big on that. The modeling stuff continues to make really impressive advances too. I’m pretty happy with the first set of Llama 4 releases. We announced four models and released the

first two — the Scout and Maverick ones  — which are mid-size to small models. The most popular Llama 3 model was the 8  billion parameter one. So we’ve got one of those coming in the Llama 4 series too. Our  internal code name for it is “Little Llama.” That’s coming probably over the next few months. Scout and Maverick are good. They have some of the highest intelligence per cost you can get of  any model out there. They’re natively multimodal, very efficient, run on one host. They’re designed  to be very efficient and low latency, for a lot

of the use cases we’re building for internally.  That’s our whole thing. We build what we want, and then we open-source it so other people  can use it too. I’m excited about that. I’m also excited about the Behemoth model, which  is coming up. It’s going to be our first model that’s sort of at the frontier — more than  2 trillion parameters. As the name says, it’s quite big. We’re trying to figure out how  to make that useful for people. It’s so big that we’ve had to build a bunch of infrastructure  just to be able to post-train it ourselves.

Now we’re trying to wrap our heads around, how  does the average developer out there actually use something like this? How do we make it  useful — maybe by distilling it into models that are a reasonable size to run? Because  you’re obviously not going to want to run something like that in a consumer model. As you saw with the Llama 3 stuff last year, the initial launch was exciting and then we  just built on that over the year. 3.1 released the 405 billion model, 3.2 is when we got all the  multimodal stuff in. We basically have a roadmap

like that for this year too. So a lot going on. I’m interested to hear more about it. There’s this impression that the gap between the best  closed-source and the best open-source models has increased over the last year. I know the  full family of Llama 4 models isn’t out yet, but Llama 4 Maverick is at #35 on Chatbot  Arena. On a bunch of major benchmarks, it seems like o4-mini or Gemini 2.5 Flash  are beating Maverick, which is in the same class. What do you make of that impression? There are a few things. First, I actually

think this has been a very good year for  open source overall. If you go back to where we were last year, Llama was the only  real, super-innovative open-source model. Now you have a bunch of them in the field. In general, the prediction that this would be the year open source generally overtakes  closed source as the most used models out there, I think that’s generally on track to be true. One interesting surprise — positive in some ways, negative in others, but overall good — is that  it’s not just Llama. There are a lot of good

ones out there. I think that’s quite good. Then there’s the reasoning phenomenon, which you’re alluding to talking about o3, o4, and  other models. There’s a specialization happening. If you want a model that’s the best at  math problems, coding, or different things like those tasks, then reasoning models that  consume more test-time or inference-time compute in order to provide more intelligence are a really  compelling paradigm. And we’re building a Llama 4 reasoning model too. It’ll come out at some point. But for a lot of the applications we care about,

latency and good intelligence per cost are much  more important product attributes. If you’re primarily designing for a consumer product, people  don’t want to wait half a minute to get an answer. If you can give them a generally good answer  in half a second, that’s a great tradeoff. I think both of these are going to end up  being important directions. I’m optimistic about integrating reasoning models with the core  language models over time. That’s the direction Google has gone in with some of the more recent  Gemini models. I think that’s really promising.

But I think there’s just going to be a  bunch of different stuff that goes on. You also mentioned the whole Chatbot Arena thing,  which I think is interesting and points to the challenge around how you do benchmarking. How do  you know what models are good for which things? One of the things we’ve generally tried to do  over the last year is anchor more of our models in our Meta AI product north star use cases.  The issue with open source benchmarks, and any given thing like the LM Arena stuff, is  that they’re often skewed toward a very specific

set of use cases, which are often not actually what any normal person does in your product.

The portfolio of things they’re trying  to measure is often different from what people care about in any given product. Because of that, we’ve found that trying to optimize too much for that kind of stuff has  led us astray. It’s actually not led towards the highest quality product, the most usage, and best  feedback within Meta AI as people use our stuff. So we’re trying to anchor our north star on  the product value that people report to us, what they say that they want, and what  their revealed preferences are, and using

the experiences that we have. Sometimes  these benchmarks just don’t quite line up. I think a lot of them are quite easily gameable. On the Arena you’ll see stuff like Sonnet 3.7, which is a great model, and it’s not near the  top. It was relatively easy for our team to tune a version of Llama 4 Maverick that could  be way at the top. But the version we released, the pure model, actually has no tuning for that  at all, so it’s further down. So you just need to be careful with some of these benchmarks.  We’re going to index primarily on the products.

Do you feel like there is some benchmark which captures what you see as a north star of value to the user, which can be objectively measured between different models, and where you’d say, “I need Llama 4 to come out on top on this”? Our benchmark is basically user value in Meta AI. But you can’t compare that to other models. We might be able to, because we might be able to run other models and be able to tell. That’s one of the advantages of open source. You have a good community of folks who can poke holes in your stuff and point out, “Okay, where is

your model not good, and where is it good?” The reality at this point is that all these models are optimized for slightly different  mixes of things. Everyone is trying to go towards the same end in that all the leading  labs are trying to create general intelligence, superintelligence, whatever you call it. AI  that can lead toward a world of abundance where everyone has these superhuman tools  to create whatever they want. That leads to dramatically empowering people and  creating all these economic benefits.

However you define it, that’s what  a lot of the labs are going for. But there’s no doubt that different folks have  optimized toward different things. I think the Anthropic folks have really focused on coding and  agents around that. The OpenAI folks, I think, have gone a little more toward reasoning recently. There’s a space which, if I had to guess, I think will end up being the most used  one: quick, very natural to interact with, natively multimodal, fitting throughout your  day in the ways you want to interact with it.

I think you got a chance to play around with  the new Meta AI app that we’re releasing. One of the fun things we put in there is the  demo for the full-duplex voice. It’s early. There’s a reason why we haven’t made that the  default voice model in the app yet. But there’s something about how naturally conversational  it is that’s really fun and compelling. Being able to mix that in with the right  personalization is going to lead toward a product experience where… If you fast-forward  a few years, I think we’re just going to be

talking to AI throughout the day about  different things we’re wondering about. You’ll have your phone. You’ll talk to it  while browsing your feed apps. It’ll give you context about different stuff. It’ll answer your  questions. It’ll help you as you’re interacting with people in messaging apps. Eventually, I  think we’ll walk through our daily lives and have glasses or other kinds of AI devices and  just seamlessly interact with it all day long.

That’s the north star. Whatever the benchmarks  are that lead toward people feeling like the quality is where they want to interact with it,  that’s what will ultimately matter the most to us. I got a chance to play around with both Orion  and also the Meta AI app, and the voice mode was super smooth. It was quite impressive. On the point of what the different labs are optimizing for — to steelman their view — I think  a lot of them believe that once you fully automate software engineering and AI research, then you can  kick off an intelligence explosion. You would have

millions of copies of these software engineers  replicating the research that happened between Llama 1 and Llama 4 — that scale of improvement  again — but in a matter of weeks or months rather than years. So it really matters to just close the  loop on the software engineer, and then you can be the first to ASI. What do you make of that? I personally think that’s pretty compelling. That’s why we have a big coding  effort too. We’re working on a number of coding agents inside Meta. Because  we’re not really an enterprise software company,

we’re primarily building it for ourselves. Again, we go for a specific goal. We’re not trying to build a general developer tool. We’re trying to  build a coding agent and an AI research agent that

advances Llama research specifically. And it’s  fully plugged into our toolchain and all that. That’s important and is going to end up being  an important part of how this stuff gets done. I would guess that sometime in the next 12 to  18 months, we’ll reach the point where most of the code that’s going toward these efforts is  written by AI. And I don’t mean autocomplete. Today you have good autocomplete. You start  writing something and it can complete a section of code. I’m talking more like: you give it a  goal, it can run tests, it can improve things,

it can find issues, it writes higher quality  code than the average very good person on the team already. I think that’s going to be  a really important part of this for sure. But I don’t know if that’s the whole game. That’s  going to be a big industry, and it’s going to be an important part of how AI gets developed. But I  think there are still… One way to think about it is that this is a massive space. I don’t think  there’s just going to be one company with one optimization function that serves everyone as  best as possible. There are going to be a bunch

of different labs doing leading work in different  domains. Some will be more enterprise-focused or coding-focused. Some will be more  productivity-focused. Some will be more social or entertainment-focused. Within the assistant space, there will be some that are more informational and  productivity-focused, and some that are more companion-focused. It’s going to  be a lot of stuff that’s just fun and entertaining and shows up in your feed. There’s just a huge amount of space. Part

of what’s fun about going toward this AGI future  is that there are a bunch of common threads for what needs to get invented, but also a lot of  things that still need to be created. I think you’re going to start seeing more specialization  between different groups, if I had to guess. It’s really interesting to me that you  basically agree with the premise that there will be an intelligence explosion and  we’ll get something like superintelligence on the other end. Tell me if I’m misunderstanding  you. If that’s the case, why even bother with

personal assistants and whatever else? Why not  just get to superhuman intelligence first and then deal with everything else later? I think that’s just one aspect of the flywheel. Part of what I generally disagree  with on the fast-takeoff view is that it takes time to build out physical infrastructure. If you want to build a gigawatt cluster of compute, that just takes time. NVIDIA needs time  to stabilize their new generation of systems. Then you need to figure out the networking around  it. Then you need to build the building. You

need to get permitting. You need to get the energy. Maybe that means gas turbines or green energy; either way, there’s a whole supply chain of that stuff. We talked about this a bunch the last time I was on the podcast with you. I think some of these are just physical-world, human-time things. As you start getting more intelligence in one part of the stack, you’re just going to run into a different set of bottlenecks. That’s how engineering always works: solve one bottleneck, you get another bottleneck.

Another bottleneck in the system, or another ingredient that’s going to make this work well, is people getting used to learning and having a feedback loop with the system. These systems don’t

just show up fully formed with people magically  knowing how to use them. There’s a co-evolution that happens where people are learning how to best  use these AI assistants. At the same time, the AI assistants are learning what people care about.  Developers are making the AI assistants better. You’re building up a base of context too.  You wake up a year or two into it and the assistant can reference things you talked  about two years ago and that’s pretty cool. You couldn’t do that even if you launched the  perfect thing on day one. There’s no way it

could reference what you talked about two  years ago if it didn’t exist two years ago. So I guess my view is that there’s this huge  intelligence growth. There’s a very rapid curve on the uptake of people interacting with the  AI assistants, and the learning feedback and data flywheel around that. And then there is  also the buildout of the supply chains and infrastructure and regulatory frameworks to  enable the scaling of a lot of the physical infrastructure. At some level, all of those are  going to be necessary, not just the coding piece.

One specific example of this that I think is interesting: if you go back a few years, we had a project, I think it was on our ads team, to automate ranking experiments. That’s a pretty constrained environment. It’s not open-ended code. It’s basically, look at the whole history of the company — every experiment that any engineer has ever done in the ad system — and look at what worked, what didn’t, and what the results of those were. Then basically formulate new hypotheses for different tests that we should run that could improve the performance of the ad system.

What we basically found was that we were  bottlenecked on compute to run tests, based on the number of hypotheses. It turns  out, even with just the humans we have right now on the ads team, we already have more good ideas  to test than we actually have either compute or, really, cohorts of people to test them with. Even if you have three and a half billion people using your products, you still want each test to  be statistically significant. It needs to have hundreds of thousands or millions of people.  There’s only so much throughput you can get on

testing through that. So we’re already at the point, even with just the people we have, that we can’t really test everything that we want. Now just being able to test more things is not necessarily going to be additive to that. We need to get to the point where the average quality of the hypotheses the AI is generating is better than the ideas above the line, the ones the best humans on the team have come up with that we’re actually able to test, before it will even be marginally useful. We’ll get there, I think, pretty quickly. But

it’s not just, “Okay, cool, the thing can write code, and now all of a sudden everything is just improving massively.” There are real-world constraints that need to be overcome. Then you need to have the compute and the people to test. Then over time, as the quality creeps up, are we at a point in five or 10 years where no set of people can generate a hypothesis as good as the AI system? I don’t know, maybe. In that world, obviously that’s going to be how all the value is created. But that’s not the first step.
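To make the statistical-significance constraint concrete, here is a rough back-of-the-envelope sketch of why a single ranking experiment can need millions of users. It uses the standard two-proportion sample-size approximation; the baseline rate, the lift being detected, and the confidence and power levels are made-up illustrative numbers, not anything from Meta’s actual experimentation setup.

```python
from math import ceil

def users_per_arm(baseline_rate: float, relative_lift: float,
                  z_alpha: float = 1.96, z_power: float = 0.84) -> int:
    """Rough per-arm sample size to detect a given lift in a two-proportion A/B test."""
    delta = baseline_rate * relative_lift            # absolute effect size we want to detect
    variance = baseline_rate * (1 - baseline_rate)   # Bernoulli variance at the baseline rate
    return ceil(2 * (z_alpha + z_power) ** 2 * variance / delta ** 2)

# Detecting a 1% relative lift on a 2% baseline metric at ~95% confidence and 80% power
print(users_per_arm(0.02, 0.01))  # roughly 7.7 million users per arm
```

At that kind of sample size, even a few billion daily users only support a limited number of mutually exclusive experiments running at once, which is the throughput ceiling being described.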

So if you buy this view, that this  is where intelligence is headed, the reason to be bullish on Meta is obviously  that you have all this distribution. You can also use that to learn more things that can  be useful for training. You mentioned the Meta AI app now has a billion active users. Not the app. The app is a standalone thing that we’re just launching now. It’ll be fun  for people who want to use it. It’s a cool experience. We can talk about that too because  we’re experimenting with some new ideas in there

that I think are novel and worth talking through. But I’m mostly talking about our apps. Meta AI is actually most used in WhatsApp. WhatsApp is  mostly used outside of the U.S. We just passed like a hundred million people in the US, but  it’s not the primary messaging system in the US, iMessage is. So people in the U.S. probably  tend to underestimate Meta AI usage somewhat. But part of the reason the standalone app is  going to be so important is because the US, for a lot of reasons, is one of the most important  countries. And the fact that WhatsApp is the main

way people are using Meta AI and that’s not  the main messaging system in the US means we need another way to build a first-class  experience that’s really in front of people. And I guess, to finish the question, the bearish  case would be that if the future of AI is less about just answering your questions and more  about being a virtual coworker, then it’s not clear how Meta AI inside of WhatsApp gives  you the relevant training data to make a fully autonomous programmer or remote worker.  In that case, does it not matter that much who

has more distribution right now with LLMs? Again, I just think there are going to be different things. Imagine you were sitting  at the beginning of the development of the internet and you asked, “What’s going to  be the main internet thing? Is it going to be knowledge work or massive consumer apps?” You got both. You don’t have to choose one. The world is big and complicated. Does one company  build all of that stuff? Normally the answer is no. But to your question, people do not code in  WhatsApp for the most part. And I don’t foresee

that people starting to write code in WhatsApp  is going to be a major use case. Although I do think people are going to ask AI to do a lot of  things that result in the AI coding without them necessarily knowing it. That’s a separate thing. We do have a lot of people who are writing code at Meta and they use Meta AI. We have  this internal thing called MetaMate, and a number of different coding and AI research  agents that we’re building around that. That has its own feedback loop and I think it can get  quite good for accelerating those efforts.

But again, there are going to be a lot of things.  AI is almost certainly going to unlock a massive revolution in knowledge work and code. I also  think it’s going to be the next generation of search and how people get information,  and do more complex information tasks. I also think it’s going to be fun. People are  going to use it to be entertained. A lot of the internet today is memes and humor. We have this  amazing technology at our fingertips. It’s amazing and funny when you think about how much of human  energy just goes toward entertaining ourselves,

designing, pushing culture forward, and  finding humorous ways to explain cultural phenomena that we observe. I think that’s almost  certainly going to be the case in the future. Look at the evolution of things like Instagram  and Facebook. If you go back 10, 15, 20 years ago, it was text. Then we all got phones with cameras,  and most of the content became photos. Then the mobile networks got good enough that if you  wanted to watch a video on your phone, it wasn’t just buffering the whole time. So that got good. Over the last 10 years, most of the content has

moved toward video at this point. Today, most  of the time spent on Facebook and Instagram is on video. But do you think in five years we’re  just going to be sitting in our feed and consuming media that’s just video? No, it’s going to be  interactive. You’ll be scrolling through your feed. There will be content that maybe looks  like a Reel to start. But you can talk to it, or interact with it, and it talks back,  or it changes what it’s doing. Or you can jump into it like a game and interact  with it. That’s all going to be AI.

My point is that there are going to be all  these different things. We’re ambitious, so we’re working on a bunch of them. But I don’t  think any one company is going to do all of it. On this point about AI-generated content  and AI interactions, already people have meaningful relationships with AI therapists,  AI friends, maybe more. This is just going to get more intense as these AIs become more  unique, more personable, more intelligent, more spontaneous, more funny, and so forth. People are going to have relationships with

AI. How do we make sure these  are healthy relationships? There are a lot of questions that you only can  really answer as you start seeing the behaviors. Probably the most important upfront thing is  just to ask that question and care about it at each step along the way. But I also think being  too prescriptive upfront and saying, “We think these things are not good” often cuts off value. People use stuff that’s valuable for them. One of my core guiding principles in designing  products is that people are smart. They know

what’s valuable in their lives. Every  once in a while, something bad happens in a product and you want to make sure you  design your product well to minimize that. But if you think something someone is doing  is bad and they think it’s really valuable, most of the time in my experience, they’re right  and you’re wrong. You just haven’t come up with the framework yet for understanding why the thing  they’re doing is valuable and helpful in their life. That’s the main way I think about it. I do think people are going to use AI for a

lot of these social tasks. Already, one of the  main things we see people using Meta AI for is talking through difficult conversations they  need to have with people in their lives. “I’m having this issue with my girlfriend. Help me  have this conversation.” Or, “I need to have a hard conversation with my boss at work. How do I  have that conversation?” That’s pretty helpful. As the personalization loop kicks in and the  AI starts to get to know you better and better, that will just be really compelling. Here’s one stat from working on social media

for a long time that I always think is crazy. The  average American has fewer than three friends, fewer than three people they would consider  friends. And the average person has demand for meaningfully more. I think it’s something like 15  friends or something. At some point you’re like, “All right, I’m just too busy,  I can’t deal with more people.” But the average person wants more connection  than they have. There’s a lot of concern people raise like, “Is this going to replace real-world, in-person connections?” And my default

is that the answer to that is probably not.  There are all these things that are better about physical connections when you can have  them. But the reality is that people just don’t have as much connection as they want. They feel  more alone a lot of the time than they would like. So I think a lot of these things — things that  today might have a little bit of stigma around

them — over time, we’ll find the vocabulary as  a society to articulate why they are valuable, why the people who are doing them are rational for  doing it, and how it is actually adding value to their lives. But also the field is very early.  There are a handful of companies doing virtual therapists, virtual girlfriend-type stuff. But  it’s very early. The embodiment in those things is still pretty weak. You open it up and it’s  just an image of the therapist or the person you’re talking to. Sometimes there’s some very  rough animation, but it’s not an embodiment.

You’ve seen the stuff we’re working on in Reality  Labs, where you have the Codec Avatars and it actually feels like a real person. That’s  where it’s going. You’ll be able to have an always-on video chat with the AI. The gestures are  important too. More than half of communication, when you’re actually having a conversation, is not  the words you speak. It’s all the nonverbal stuff. I did get a chance to check out Orion the other  day, and I thought it was super impressive. I’m mostly optimistic about the technology. Generally,  like you mentioned, I’m pretty libertarian

about this. If people are doing something, they  probably think it’s good for them. Although, I actually don’t know if it’s the  case that if somebody is using TikTok, they would say that they’re happy with how much  time they’re spending on TikTok or something. I’m mostly optimistic about it in the sense that  if we’re going to be living in this future world of AGI, we need to be upgrading our capabilities  too, with tools like this. And just generally, there can be more beauty in the world if you  can see Studio Ghibli everywhere or something.

I was worried about one of the flagship use cases that your team showed me. I’m sitting at the breakfast table and on the periphery of my vision is just a bunch of Reels that are scrolling by. Maybe in the future, my AI girlfriend is on the other side of the screen or something. So I am worried that we’re just removing all the friction that stands between us and getting totally reward-hacked by our technology. How do we make sure this is not what ends up happening in five years? Again, I think people have a good sense of what they want. That experience you saw was just a

demo to show multitasking and holograms. I agree,  I don’t think the future is one where you have stuff that’s trying to compete for your attention  in the corner of your vision all the time. I don’t think people would like that too much. As we’re designing these glasses, it’s actually one of the things that we’re really  mindful of. Probably the number one thing the glasses need to do is get out of the way and be  good glasses. As an aside, I think that’s part of the reason why the Ray-Ban Meta product has  done so well. It’s great for listening to music,

taking phone calls, taking photos and videos. The  AI is there when you want it. But when you don’t, it’s just a good-looking pair of glasses that  people like. It gets out of the way well. I would guess that’s going to be a very important  design principle for the augmented reality future. The main thing that I see here is this. It’s kind  of crazy that, for how important the digital world is in all of our lives, the only way we access it  is through these physical, digital screens. You have your phone, your computer. You can put a big  TV on your wall. It’s this huge physical thing.

It just seems like we’re at the point with  technology where the physical and digital world should really be fully blended. That’s  what holographic overlays allow you to do. But I agree. I think a big part of the design  principles around that will be around how you’ll be interacting with people. You’ll be able  to bring digital artifacts into those interactions and do cool things very seamlessly. If I want to show you something, here’s a screen. We can interact with it. It  can be 3D. We can play with it. You want to

play a card game? All right, here’s a deck of  cards. We can play with it. If two of us are physically together and we have a third friend  who’s hologramming in, they can participate too. But in that world too — just as you don’t want  your physical space to be cluttered because it wears on you psychologically — I don’t think  people are going to want their digital-physical space to feel that way either.  That’s more of an  aesthetic norm that will have to get worked out, but I think we’ll figure that out. Going back to the AI conversation,

you were mentioning how big of a bottleneck  the physical infrastructure can be. Related to other open-source models, like DeepSeek and  so forth, DeepSeek right now has less compute than a lab like Meta and you could argue  that it’s competitive with the Llama models. If China is better at physical  infrastructure, industrial scale-ups, getting more power and more data centers online,  how worried are you that they might beat us here? It’s a real competition. You’re seeing industrial  policies really play out. China is bringing online

more power. Because of that, the US really needs  to focus on streamlining the ability to build data centers and produce energy. Otherwise, I  think we’ll be at a significant disadvantage.

At the same time, some of the export controls  on things like chips, I think you can see how they’re clearly working in a way. There was  all the conversation with DeepSeek about, “Oh, they did all these very impressive  low-level optimizations.” And the reality is, they did and that is impressive. But then you ask, “Why did they have to do that, when none of the American labs did it?” It’s  because they’re using partially nerfed chips that are the only ones NVIDIA is allowed  to sell in China because of the export

controls. DeepSeek basically had to spend  a bunch of their calories and time doing low-level infrastructure optimizations  that the American labs didn’t have to do. Now, they produced a good result on text. DeepSeek  is text-only. The infrastructure is impressive. The text result is impressive. But every new  major model that comes out now is multimodal. It’s image, it’s voice. Theirs isn’t. Now the question is, why is that the case? I don’t think it’s because they’re  not capable of doing it. It’s because they

had to spend their calories on doing these infrastructure optimizations to overcome the fact that there were these export controls. But when you compare Llama 4 with DeepSeek — I mean, our reasoning model isn’t out yet, so the R1 comparison isn’t clear yet — we’re basically in the same ballpark on all the text stuff that DeepSeek is doing, but with a smaller model. So

the cost-per-intelligence is lower with what we’re doing for Llama on text. On the multimodal side, we’re effectively leading, and multimodality just doesn’t exist in their models. So the Llama 4 models, when you compare them to what DeepSeek is doing, are good. I think people will generally prefer to use the Llama 4 models. But there’s this interesting contour where it’s clearly a good team doing stuff over there. And you’re right to ask about the accessibility of power, the accessibility of compute and chips, because the work that you’re

seeing different labs do and the way it’s  playing out is somewhat downstream of that.

So Sam Altman recently tweeted that OpenAI is going to release an open-source SOTA reasoning model. I think part of the tweet was that they won’t do anything silly, like say you can only use it if you have fewer than 700 million users. DeepSeek has the MIT license, whereas I think a couple of the contingencies in the Llama license require you to say “built with Llama” on applications using it, or require that any model you train using Llama has to begin with the word “Llama.” What do you think about the license? Should it be less onerous for developers?

Look, we basically pioneered the open-source  LLM thing. So I don’t consider the license to be onerous. When we were starting to push on open  source, there was this big debate in the industry. Is this even a reasonable thing to do? Can you  do something that is safe and trustworthy with open source? Will open source ever be able to be  competitive enough that anyone will even care? Basically, when we were answering those  questions a lot of the hard work was done by the teams at Meta. There were  other folks in the industry but really,

the Llama models were the ones that broke open  this whole open-source AI thing in a huge way.

If we’re going to put all this energy into it,  then at a minimum, if you’re going to have these large cloud companies — like Microsoft and Amazon  and Google — turn around and sell our model, then we should at least be able to have a  conversation with them before they do that around

what kind of business arrangement we should have. Our goal with the license, we’re generally not trying to stop people from using the model. We  just think that if you’re one of those companies, or if you’re Apple, just come talk to  us about what you want to do. Let’s find a productive way to do it together.  I think that’s generally been fine. Now, if the whole open-source part of the  industry evolves in a direction where there are a lot of other great options and the license  ends up being a reason why people don’t want to

use Llama, then we’ll have to reevaluate the strategy and what it makes sense to do at that point. But I don’t think we’re there. That’s not, in practice, something we’ve seen, with companies coming to us and saying, “We don’t want to use this because your license says if you reach 700 million people, you have to come talk to us.” So far, that’s been more something we’ve heard from open-source purists like, “Is this as clean of an open-source model as you’d like it to be?” That debate has existed since the beginning of open source. All the GPL license stuff

versus other things, do you need to make it  so that anything that touches open source has to be open source too? Or can people take  it and use it in different ways? I’m sure there will continue to be debates around this. But if you’re spending many billions of dollars training these models, I think asking the other  companies — the huge ones that are similar in size and can easily afford to have a relationship  with us — to talk to us before they use it seems like a pretty reasonable thing. If it turns out that other models are also

really good. There’s a bunch of good open-source  models. So that part of your mission is fulfilled, and maybe other models are better at coding. Is there a world where you just say, “Look, the open-source ecosystem is healthy. There’s plenty  of competition. We’re happy to just use some other model, whether it’s for internal software  engineering at Meta or deploying to our apps. We don’t necessarily need to build with Llama”? Again, we do a lot of things. Let’s take a step back. The reason why we’re building our own  big models is because we want to be able to

build exactly what we want. None of the other  models in the world are exactly what we want. If they’re open source, you can take them and  fine-tune them in different ways. But you still have to deal with the model architectures.  And they make different size tradeoffs that affect latency and inference cost. At the scale  that we operate at, that stuff really matters. We made the Llama Scout and Maverick models  certain sizes for a specific reason. They fit on a host and we wanted certain latency  — especially for the voice models that we’re

working on — that we want to pervade everything  we’re doing from the glasses to all of our apps to the Meta AI app and all that stuff. There’s a level of control of your own destiny that you only get when you build the stuff  yourself. That said, AI is going to be used in every single thing that every company does. When  we build a big model, we also have to choose which internal use cases we’re going to optimize for. So does that mean for certain things we might say, “Okay, maybe Claude is better for building this  specific development tool that this team is

using”? All right, cool then use that. Great. We  don’t want to fight with one hand tied behind our back. We’re doing a lot of different stuff. You also asked, would it not be important anymore because other people are doing  open source? On this, I’m a little more worried. You have to ask yourself this. For anyone who  shows up now and is doing open source — now that we have done it — would they still be  doing open source if we weren’t doing it? I think there are a handful of folks who see  the trend that more and more development is

going toward open source, and they’re like, “Oh crap, we need to be on this train or else we’re going to lose.” If you only have a closed-model API, increasingly a lot of developers don’t want that. So you’re seeing a bunch of other players start to do some work in open source. But it’s unclear if it’s dabbling, or fundamental for them the way that it has been for us. A good example is what’s going on with Android. Android started off as the open-source thing. There’s not really any open-source alternative. Over time, Android has just gotten more and more closed.

So if you’re us, you need to worry that if we  stop pushing the industry in this direction, all these other people… Maybe they’re only really  doing it because they’re trying to compete with us and the direction we’re pushing things. They  already showed their revealed preference for what they would do if open source didn’t exist.  And it wasn’t open source. We just need to be careful about relying on that continued  behavior for the future of the technology that we’re going to build at the company. Another thing I’ve heard you mention is that

it’s important that the standard gets built around American models like Llama. I wanted to understand your logic there. With certain kinds of networks, it is the case that the Apple App Store has a big contingency around what gets built on it. But it doesn’t seem like, if you built some sort of scaffold for DeepSeek, you couldn’t have easily just switched it over to Llama 4, especially since things change between generations: Llama 3 wasn’t MoE and Llama 4 is. So things are changing between generations of models as well. What’s the reason for thinking things will get

built out in this contingent  way on a specific standard? I’m not sure, what do you mean by contingent? As in, it’s important that people are building for Llama rather than for LLMs in  general, because that will determine what the standard is in the future. Look, I think these models encode values and ways of thinking about the world. We had this interesting experience early on, where we took an early version of Llama and translated  it. I think it was French, or some other language.

The feedback we got from French people was, “This sounds like an American who learned to speak French. It doesn’t sound like a French person.” And we were like, “What do you mean, does it not speak French well?” No, it speaks French fine. It was just that the way it thought about the world seemed slightly American. So I think there are these subtle things that get built into the models. Over time, as models get more sophisticated, they should be able to embody different value sets across the world. So maybe that’s

not a particularly sophisticated example,  but I think it illustrates the point.

Some of the models we’ve seen in testing, especially ones coming out of China, have certain values encoded in them. And it’s not just a light fine-tune to change that. Now,

language models — or something that has a kind  of world model embedded in it — have more values. Reasoning, I guess, you could say has values too.  But one of the nice things about reasoning models is they’re trained on verifiable problems. Do you  need to be worried about cultural bias if your model is doing math? Probably not. I think  the chance that some reasoning model built elsewhere is going to incept you by solving  a math problem in a devious way seems low. But there’s a whole different set of issues  around coding, which is the other verifiable

domain. You need to worry about waking up one day  and if you’re using a model that has some tie to another government, can it embed vulnerabilities  in code that their intelligence organizations could exploit later? In some future version you’re  using a model that came from another country and it’s securing your systems. Then you wake  up and everything is just vulnerable in a way that that country knows about and you don’t.  Or it turns on a vulnerability at some point. Those are real issues. I’m very interested in  studying this because I think one of the main

things that’s interesting about open source is  the ability to distill models. For most people, the primary value isn’t just taking a model  off the shelf and saying, “Okay, Meta built this version of Llama. I’m going to take it and  I’m going to run it exactly in my application.” No, your application isn’t doing anything  different if you’re just running our thing. You’re at least going to fine-tune it, or try to  distill it into a different model. When we get to stuff like the Behemoth model, the whole value  is being able to take this very high amount of

intelligence and distill it down into a smaller  model that you’re actually going to want to run. This is the beauty of distillation. It’s one of  the things that I think has really emerged as a very powerful technique over the last year, since  the last time we sat down. I think it’s worked better than most people would have predicted. You  can basically take a model that’s much bigger, and capture probably 90 or 95% of its intelligence,  and run it in something that’s 10% of the size. Now, do you get 100% of the intelligence?  No. But 95% of the intelligence at 10% of

the cost is pretty good for a lot of things. The other thing that’s interesting is that now, with this more varied open-source community, it’s  not just Llama. You have other models too. You have the ability to distill from multiple sources.  So now you can basically say, “Okay, Llama’s really good at this. Maybe its architecture is  really good because it’s fundamentally multimodal, more inference-friendly, more efficient. But let’s  say this other model is better at coding.” Okay, great. You can distill from both of them and  build something that’s better than either

individually, for your own use case. That’s cool. But you do need to solve the security problem of knowing that you can distill it in a way that’s  safe and secure. This is something that we’ve been researching and have put a lot of time into.  What we’ve basically found is that anything that’s language is quite fraught. There’s just a lot of  values embedded into it. Unless you don’t care about taking on the values from whatever model  you’re distilling from, you probably don’t want to just distill a straight language world model. On reasoning, though, you can get a lot of the way

there by limiting it to verifiable domains, and  running code cleanliness and security filters. Whether it’s using Llama Guard open source,  or the Code Shield open source tools that we’ve done, things that allow you to incorporate  different input into your models and make sure that both the input and the output are secure. Then it’s just a lot of red teaming. It’s

having experts who are looking at the model and asking, “Alright, is this model doing anything after distillation that we don’t want?” I think with the combination of those techniques, you can probably distill on the reasoning side for verifiable domains quite securely. That’s something I’m pretty confident about and something we’ve done a lot of research around. But I think this is a very big question. How do you do good distillation? Because there’s so much value to be unlocked. But at the same time, I do think there is some fundamental bias embedded in different models.
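As a concrete illustration of the distillation idea discussed here, below is a minimal sketch of the standard soft-label distillation objective, in which a small student model is trained to match a larger teacher’s output distribution. This is the generic textbook formulation, not Meta’s actual Behemoth-to-smaller-model pipeline; the vocabulary size and batch shape are arbitrary.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between the teacher's softened distribution and the student's."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)         # soft targets from the big model
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 positions over a 32k-token vocabulary (sizes are illustrative).
teacher_logits = torch.randn(4, 32000)                       # frozen large "teacher"
student_logits = torch.randn(4, 32000, requires_grad=True)   # small "student" being trained
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```

In practice this term is usually mixed with an ordinary next-token cross-entropy loss and the temperature is tuned, but the key point is that the student learns from the teacher’s full output distribution rather than from hard labels, which is how much of a far larger model’s capability can be carried into a much smaller one.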

Speaking of value to be unlocked, what do you think the right way to monetize AI will be? Obviously digital ads are quite lucrative. But as a fraction of total GDP, it’s small compared to all remote work. Even if you can increase productivity without replacing work, that’s still worth tens of trillions of dollars. Is it possible that ads might not be it? How do you think about this? Like we were talking about before, there’s going to be all these different applications,

and different applications  tend toward different things. Ads are great when you want to offer  people a free service. Because it’s free, you need to cover it somehow. Ads solve this  problem where a person does not need to pay for something. They can get something that is amazing  for free. Also by the way, with modern ad systems, a lot of the time people think the ads  add value to the thing if you do it well.

You need to be good at ranking and you need to  have enough liquidity of advertising inventory. If you only have five advertisers in the  system, no matter how good you are at ranking, you may not be able to show something to  someone that they’re interested in. But if you have a million advertisers in the system,  then you’re probably going to be able to find something pretty compelling, if you’re good at  picking out the different needles in the haystack that that person is going to be interested in. So that definitely has its place. But there are

also clearly going to be other business  models as well, including ones that just have higher costs so it doesn’t even  make sense to offer them for free. By the way, there have always been business models like this. There’s a reason why social media is free and ad-supported, but then if you want to  watch Netflix or ESPN or something, you need to pay for that. The content that’s going  into that, they need to produce it, and that’s very expensive for them to produce. They probably  could not have enough ads in the service in order

to make up for the cost of producing the content.  Basically, you just need to pay to access it. The trade-off is fewer people do it. Instead  of billions, you’re talking about hundreds of millions of people using those services. There’s a  value switch there. I think it’s similar here. Not everyone is going to want a software engineer,  or a thousand software engineering agents, or whatever it is. But if you do, that’s  something you’re probably going to be willing to pay thousands, or tens of thousands,  or hundreds of thousands of dollars for.

That just speaks to the diversity of  different things that need to get created. There are going to be business models at  each point along the spectrum. At Meta,

for the consumer piece we definitely want to  have a free thing. I’m sure that will end up being ad-supported. But I also think we’re going  to want to have a business model that supports people using arbitrary amounts of compute to do  even more amazing things than what it would make sense to offer in the free service. For that, I’m  sure we’ll end up having a premium service. But I think our basic values on this are that we want  to serve as many people in the world as possible.

How do you keep track of all these different projects, some of which we’ve talked about today? I’m sure there are many I don’t even know about. As the CEO overseeing everything, there’s a big spectrum between going to the Llama team and saying, “Here are the hyperparameters you should use,” versus just giving a mandate like, “Go make the AI better.” And there are so many different projects. How do you think about the way in which you can best deliver your value-add and oversee all these things?

A lot of what I spend my time on is trying to get awesome people onto the teams. There’s that, and then there’s stuff that cuts across teams. You build Meta AI, and you want to get it into WhatsApp or Instagram. Okay, now I need to get those teams to talk together. Then there are a bunch of questions like, “Do you want the thread for Meta AI in WhatsApp to feel like other WhatsApp threads, or do you want it to feel like other AI chat experiences?” There are different idioms for those. So there are all these interesting questions that need

to get answered around how does this stuff  basically fit into everything we’re doing? Then there’s a whole other part of what we’re  doing, which is pushing on the infrastructure. If you want to stand up a gigawatt cluster,  first of all, that has a lot of implications for the way we’re doing infrastructure buildouts.  It has political implications for how you engage with the different states where you’re building  that stuff. It has financial implications for the company in terms of: “All right, there’s a  lot of economic uncertainty in the world. Do we

double down on infrastructure right now? If so,  what other trade-offs do we want to make around the company?” Those are the kinds of decisions  that are tough for other people to really make.

Then there’s this question around taste and  quality. When is something good enough that we want to ship it? In general, I’m the steward  of that for the company. Although we have a lot of other people who I think have good taste as  well and are also filters for different things.

Those are basically the areas. AI is interesting  because, more than some of the other stuff that we do, it is more research and model-led than really  product-led. You can’t just design the product that you want and then try to build the model to  fit into it. You really need to design the model first and the capabilities that you want, and  then you get some emergent properties. Then it’s, “Oh, you can build some different stuff because  this turned out in a certain way.” At the end of the day, people want to use the best model. That’s partially why, when we’re talking about

building the most personal AI, the best voice,  the best personalization — and also a very smart experience with very low latency — those are  the things that we need to design the whole system to build. That’s why we’re working on  full-duplex voice. That’s why we’re working on personalization to both have good memory  extraction from your interactions with AI, but also to be able to plug into all the  other Meta systems. That’s why we design the specific models that we design, to have the kind  of size and latency parameters that they do.

Speaking of politics, there’s been this  perception that some tech leaders have been aligning with Trump. You and others donated  to his inaugural event and were on stage with him and I think you settled a lawsuit  that resulted in them getting $25 million. I wonder what’s going on here? Does it feel like  the cost of doing business with an administration? What’s the best way to think about this? My view on this is that he’s the President of the United States. Our default, as an American  company, should be to try to have a productive

relationship with whoever is running the  government. We’ve tried to offer support to previous administrations as well. I’ve been  pretty public with some of my frustrations with the previous administration,  how they basically did not engage with us or the business community more broadly. Frankly, that’s going to be necessary to make progress on some of these things. We’re not going  to be able to build the level of energy that we need if you don’t have a dialogue, and if they’re  not prioritizing trying to do those things.

A lot of people want to write this story about  what direction people are going. We’re trying to build great stuff, and we want to have a  productive relationship with people. That’s how I see it. It is also how I would guess most others  see it, but obviously, I can’t speak for them. You’ve spoken out about how you’ve  rethought some of the ways in which you engage and defer to the government, in  terms of moderation stuff in the past. How are you thinking about AI governance? Because  if AI is as powerful as we think it might be,

the government will want to get involved. What  is the most productive approach to take there, and what should the government be thinking about? I guess in the past, most of the comments that I made were in the context of content moderation.  It’s been an interesting journey over the last 10 years on this. It’s obviously been an  interesting time in history. There have been novel questions raised about online content moderation. Some of those have led to productive new systems getting built, like our AI systems to detect  nation-states trying to interfere in each other’s

elections. I think we will continue building  that stuff out, and that has been net positive. With some other stuff, we went down some  bad paths. I just think the fact-checking thing was not as effective as Community Notes  because it’s not an internet-scale solution. There weren’t enough fact-checkers, and people  didn’t trust the specific fact-checkers. You want a more robust system. So I think what we got  with Community Notes is the right one on that. But my point on this was more that historically,  I probably deferred a little too much to

either the media and their critiques, or to the government, on things that they did not really have authority over. But just as a central figure, I think we tried to build systems where maybe we wouldn’t have to make all of the content moderation decisions ourselves or something. I guess part of the growth process over the last 10 years is realizing, “Okay, we’re a meaningful company. We need to own the decisions that we need to make. We should listen to feedback from people, but we shouldn’t defer too much to people who do not actually have authority over this. Because

at the end of the day, we’re in the seat, and  we need to own the decisions that we make.”

It’s been a maturation process, and in  some ways painful, but I think we’re probably a better company for it. Will tariffs increase the cost of building data centers in the US and  shift buildouts to Europe and Asia? It is really hard to know how that plays  out. I think we’re probably in the early innings on that, and it’s very hard to know. What is your single highest-leverage hour in a week? What are you doing in that hour? I don’t know. Every week is a little bit

different. It’s probably got to be the case  that the most leveraged thing you do in a week is not the same thing each week. Or else,  by definition, you should probably spend more than one hour doing that thing every week. I don’t know. Part of the fun of this job, and also of the industry being so dynamic, is  that things really move around. The world is very different now than it was at the beginning  of the year, or even six months ago, or in the middle of last year. I think a lot has advanced  meaningfully. A lot of cards have been turned

over since the last time that we sat down. I think that was about a year ago, right? Yeah. I guess what you were saying earlier is that recruiting people is a super high-leverage thing you do. It’s very high-leverage, yeah.

You talked about these models being mid-level  software engineers by the end of the year. What would be possible if, say, software  productivity increased like 100x in two years? What kinds of things could be  built that can’t be built right now? What kinds of things? That’s an interesting  question. One theme of this conversation is that the amount of creativity that’s going  to be unlocked is going to be massive.

If you look at the overall arc of human  society and the economy over 100 or 150 years, it’s basically people going from being primarily  agrarian — with most human energy going toward just feeding ourselves — to that becoming a  smaller and smaller percent. And the things that take care of our basic physical needs have become  a smaller and smaller percent of human energy. That shift has led to two impacts: one is that  more people are doing creative and cultural pursuits. The second is that more people, in  general, spend less time working and more time on

entertainment and culture. I think that is almost  certainly going to continue as this goes on. This isn’t the 1-2 year thing of what happens when  you have a super powerful software engineer. But over time, if everyone has these superhuman  tools to create a ton of different stuff, you’re going to get incredible diversity. Part  of it is going to be solving hard problems: solving diseases, advancing science, developing  new technology that makes our lives better. But I would guess that a lot of it is going  to end up being cultural and social pursuits

and entertainment. I would guess the  world is going to get a lot funnier, weirder, and quirkier, the way that memes on  the internet have gotten over the last 10 years. I think that adds a certain richness and  depth. In funny ways, it actually helps you connect better with people. Now all day long,  I just find interesting stuff on the internet and send it in group chats to the people I care  about, who I think are going to find it funny. The media that people can produce  today to express very nuanced,

specific cultural ideas is really cool.  That’ll continue to get built out. It does advance society in a bunch of ways, even if it’s  not the “hard science” way of curing a disease.

If you think about it, the Meta social  media view of the world is that yeah, people are going to spend a lot more time doing  that stuff in the future. It’s going to be a lot better, and it’s going to help you connect,  because it’ll help express different ideas. The world is going to get more complicated,  but our technology, our cultural technology, to express these very complicated things — in a very  kind of funny little clip or whatever — is going to get so much better. I think that’s all great. I don’t know about next year. One other thought

that I think is interesting to cover is  that I tend to think that, for at least the foreseeable future, this is going to  lead to more demand for people doing work, not less. Now, people have a choice of  how much time they want to spend working.

I’ll give you one interesting example we were  talking about recently. We have almost three and a half billion people using our services  every day. One question we’ve struggled with forever is how do we provide customer support? Today, you can write an email, but we’ve never seriously been able to contemplate having  voice support where someone can just call in. I guess that’s maybe one of the artifacts  of having a free service. The revenue per person isn’t high enough to have an  economic model where people can call in.

But also, with three and a half billion  people using your service every day, the number of calls would be massive. It’d be like  the biggest call center in the world. It would be like $10 or $20 billion a year to staff that.  So we’ve never thought too seriously about it, because it always seemed like there was  no way that could make sense. But now, as AI gets better, you’re going to get to a place  where AI can handle a bunch of people’s issues. Not all of them — maybe 10 years from now it can  handle all of them — but thinking about a 3-5 year

time horizon, it will be able to handle a bunch.  It’s kind of like a self-driving car. They can handle a bunch of terrain, but they’re not  doing the whole route by themselves yet in most cases. People thought truck-driving jobs  were going to go away, but there’s actually more truck-driving jobs now than when we first started  talking about self-driving cars 20 years ago.

Going back to the customer support thing, it wouldn’t make sense to staff out calling for everyone. But let’s say AI can handle 90% of that. Then if it can’t, it kicks it off to a person. If you get the cost of providing that service down to one-tenth of what it would’ve otherwise been, then maybe now it actually makes sense to do it. That would be cool. So the net result is that I actually think we’re probably going to hire more customer support people.
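To spell out the arithmetic behind that “one-tenth” figure, here is a trivial cost sketch. The annual staffing figure is the midpoint of the “$10 or $20 billion a year” ballpark mentioned above, and the 90% deflection rate is the hypothetical from the same passage; both are illustrative, not actual Meta numbers.

```python
# Back-of-the-envelope: what AI deflection does to a human-only support cost.
full_human_cost = 15e9      # midpoint of the "$10 or $20 billion a year" ballpark (illustrative)
ai_deflection_rate = 0.90   # hypothetical share of issues the AI resolves on its own

residual_human_cost = full_human_cost * (1 - ai_deflection_rate)
print(f"Cost of human-handled remainder: ${residual_human_cost / 1e9:.1f}B per year")  # ~$1.5B
```

At roughly a tenth of the original cost, a support channel that was previously a non-starter can start to make economic sense, which is the argument being made here.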

The common belief is that AI will automate jobs away. But that hasn’t really been how the history of technology has worked. Usually, you create things that take away 90% of the work, and that leads you to want more people, not less. To close off the interview… I’ve been playing devil’s advocate on a bunch of points, and I really appreciate you being a good sport about it. But I do think there’s no upper bound to how much beauty there can be in the world, especially if there are billions of AIs optimizing the amount of beauty you can see, the amount of connection you can have, and so forth. I’m pretty optimistic about it.

Final question: Who is the one person in the  world today who you most seek out for advice? Oh, man. I feel like part of my style  is that I like having a breadth of advisors. It’s not just one person. We’ve got a great team. There are people at the company, people on our board. There  are a lot of people in the industry who are doing new stuff. There’s not a single person.  But it’s fun. Also, when the world is dynamic, just having a reason to work with people you like  on cool stuff… To me, that’s what life is about.

Great note to close on. Thanks for doing this. Yeah, thank you.
