The Shifting Dynamics & Meta-Moats of AI
As AI continues to progress, we must update our frameworks for company building across the short, mid, and long term.
The full post with footnotes can be read on my website.
—
Perhaps the best target for computer scientists and engineers looking to build new systems is not to find intelligences that humans lack. Instead, it is to identify the skills that generate outsized income and build machines that allow many more people to benefit from those skills.
Since we started investing in AI at Compound over 8 years ago, we have held fairly constant (albeit increasingly data-informed and nuanced) investment views. These views encapsulate everything from technology timelines, to the optimal types of startups certain types of founders should start within AI, to where value will accrue amongst companies, to inflection points of the industry, to the areas most ripe (and under-appreciated) for disruption or enablement, and more.
This can be partially summarized in the simplistic framing below from 2017, which has been published many times in our annual letters over the years (2018, 2019, 2022):
Despite this high-level framing, our conviction back then was perhaps less deep on the multi-variable dynamics that founders needed to navigate while company building in AI. This was largely because all of AI felt quite novel, with few consensus winners for years, as no company in the industry had meaningfully inflected commercially. There were very few “best practices” or “table stakes” operating principles.1
AI technical approaches (and perhaps the entire industry) have become increasingly consensus-driven, with Transformers leading the way and the community coalescing quite quickly around a variety of iterations over time (CoT, RAG, MoE, Model Merging, etc.).2
There are of course emerging developments via new approaches like RWKV, SSMs, and more; however, it feels more likely this work will be done by skunkworks teams inside large orgs or by newer orgs aimed at disruption, while the consensus premise of performance at incumbents (and most AI labs founded in the past 3 years) will be based on compute, data, and scale.3
This monotony naturally happens in industries that have fairly monolithic leaders and also experience large inflows of talent looking to be “shown the way” as they acclimate to a new category or industry.
With this inflection of commercial and technical success, we have new learnings surrounding company and moat building in AI in 2024 and onward.
ON SPEED
For better or for worse, every single company will face material competition in AI over the next few years either directly or via death by a thousand cuts. There will be areas that attract tons of short-term capital, open source projects that attempt to cut away at closed-source competitors, and products/companies that over-promise and under-deliver, but still create a large amount of noise. Thus, in the short term, a main goal for a startup is to continue to stand out amongst this noise.
...in reality, AI companies are perhaps some of the most complex businesses being built in tech in some time. Doing core AI model R&D requires playing 4D chess across research communities, capital accumulation and deployment, talent acquisition, competitive understanding, and commercialization.
Due to the complexity of AI companies and the category at large, the nuances of building moats get split across short-, mid-, and long-term dynamics.
Once a startup is able to create enough early moats, it then faces the need to maintain velocity over multiple years. Again, capital will continue to flow into the space, creating competition across multiple vectors for your business for longer than many appreciate, and as model performance commoditizes, incumbents will also begin to be late movers in some areas.4
Effectively, what this means is that founders have to do all of the things great companies do in software, but faster than everyone else, in a space that is moving with uncertain progress and speed. And they have to do it at a very high level, potentially implementing, or in some cases creating, things that have never been created before.
The industry at large seems to have noticed this; however, few companies are able to execute at a rate that outpaces the natural industry-wide compression and erosion of advantages, often leading to a ton of companies with similar goals competing on the same, eventually inconsequential, axes of competition.5
In some ways, OpenAI is the company which fired the starting gun and is a good example of the speed dynamic.
After the ChatGPT experiment (supposedly undertaken as an attempt to front-run competitors) proved such a success, a small team at OpenAI quickly became a much larger one. Having been a slow-moving research org for many years, the company saw the necessity of becoming a faster-shipping product org with tighter feedback loops once the proverbial chasm was crossed with Transformers and GPT-2.6 We’ll come back to this.
Other orgs like Runway (a Compound portfolio company) and Perplexity are great examples of organizations with elite speed at their core. Both companies rose from seemingly nowhere (to those not paying close attention), shipping a wide range of features and shifting the narrative from utilizing open-source models or APIs to proprietary ones as their product feature sets deepened with great velocity.
Maintaining this velocity doesn’t just mean founders must do it over 12 to 24 months before “growing up” into larger organizations and asymptoting. Instead, in AI, the best companies will have to be elite sprinters while also being the fastest marathon runners.7
Naturally, the prize at the end of the race will be immensely valuable, but this can go wrong in many different ways, and the risk of getting distracted fighting the wrong battles is a constant.
If I had to guess, this is where we will see a majority of companies fail: chasing false signs of early, durable PMF and spinning their wheels playing short-term games, only to get leapfrogged or destroyed by other startups or companies playing long-term games (again, 4D chess).
The most common version of this is building a product that creates incremental value on top of short-term base model performance, instead of leveraging increases in base model performance to materially increase product value.
Sam Altman framed a way to understand this with a simple heuristic recently in an interview:
“When we just do our fundamental job, which is make the model and its tooling better with every crank, then you get the ‘OpenAI killed my startup’ meme. If you’re building something on GPT-4 that a reasonable observer would say ‘if GPT-5 is as much better over GPT-4 as GPT-4 was over GPT-3’…not because we don’t like you but just because we like have a mission…we’re going to steamroll you.”
In the same interview, Brad Lightcap said to “ask the company whether a 100x improvement in the model is something they’re excited about.”
Another fail state of short-term games is building products with a skeuomorphic framing, making something that is “x, but with AI” instead of building a novel thing that is “y” because of AI.
Because of these potentially destructive distractions (along with many others not discussed here), we believe founders need to build structures that enable their teams to fight against these pitfalls from both a cultural perspective and a shipping and execution perspective.
In addition, they must build infrastructure that keeps them from being rate-limited in areas like compute, data, and talent, all backed by capital that allows them to avoid throttling down the speed of product iteration and being disrupted, given the lack of fully unassailable moats in AI today.
Put simplistically, building an AI company is not just about survival.
DATA FLYWHEELS
Multimodality and scaling have taught us that a large amount of the ROI from AI can come from “connecting the dots” of existing data today. The early canonical example of this in LLMs is that teaching models coding tasks improves performance across a variety of seemingly unrelated tasks. We have now seen similar dynamics across multimodality as companies push from LLMs or “multimodal models” to World Models in order to increase the performance and world understanding brought on by image, audio, video, and more.
Pair this with the continual obliteration of context window limits, and what this means is that pre-training a large model and then fine-tuning it with data is perhaps not the short- to mid-term moat that people once believed.
That said, LLMs have not materially shown intelligence outside of their training data distribution, and across a variety of tasks beyond language we have seen that data and scale are nowhere near maximized as drivers of performance, unlike the more traditional natural language understanding enabled early on by datasets like The Pile and more.
This has caused us to progressively prefer companies that aim to own more of the stack than originally anticipated, up to and including owning the entire stack and building an AI-enabled operating company.8
Companies in industries where data is structurally locked within incumbents with little computational excellence (healthcare, bio, industrials) are also increasingly attractive, if you are able to navigate partnerships and more.
These dynamics allow a subset of companies to gather unique, large-scale data for a variety of tasks, and/or build custom workflows that allow their business to generate novel datasets9 and, most importantly, close the loop with real-world proof and feedback, bridging a new form of the Sim2Real gap, perhaps best framed as the Bits-to-Atoms gap.
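As a loose illustration of the shape of that closed loop (and not a description of any particular company’s pipeline), here is a minimal sketch in Python: deploy, capture real-world outcomes as labeled examples, retrain, repeat. Every name and the toy “quality” metric are hypothetical stand-ins chosen only to show the flywheel structure.

```python
# Minimal, hypothetical sketch of a closed-loop data flywheel.
# All names and the toy quality metric are illustrative assumptions.
from dataclasses import dataclass, field
import random


@dataclass
class Flywheel:
    dataset: list = field(default_factory=list)
    model_quality: float = 0.5  # toy proxy for model performance

    def collect_interactions(self, n: int) -> list:
        # Workflow-generated examples: each record pairs an input with a
        # real-world outcome signal (the "proof and feedback" step).
        return [{"input": i, "outcome": random.random() < self.model_quality}
                for i in range(n)]

    def train(self) -> None:
        # Stand-in for retraining: quality improves with the share of
        # positive, verified outcomes accumulated in the dataset.
        if self.dataset:
            wins = sum(r["outcome"] for r in self.dataset)
            self.model_quality = min(0.99, 0.5 + 0.5 * wins / len(self.dataset))

    def spin(self, rounds: int = 3, batch: int = 100) -> None:
        for _ in range(rounds):
            self.dataset += self.collect_interactions(batch)  # deploy + gather feedback
            self.train()                                      # retrain on that feedback
            print(f"examples={len(self.dataset)} quality~{self.model_quality:.2f}")


if __name__ == "__main__":
    Flywheel().spin()
```

The point of the sketch is the compounding step: the dataset only grows through the product being used in the real world, which is the part a competitor cannot replicate by pre-training alone.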
SYNTHETIC DATA
…then we can take another step back. With enough specific modules and dynamic beings in a simulated environment, we can better understand how all types of robots (cars, delivery robots, social robots, etc.) will interact autonomously with our real (and digital) world.
How to drive 10 billion miles in an autonomous vehicle, 2017
The promise of synthetic data has been ever-present across every era of AI, and just like in any other AI hype cycle, it increasingly feels important to have a view on the role it may play in a given sector and company progressing towards superutility.
While we were initially turned on to these use cases as they related to AVs back in 2016, the world seems more convinced than ever of the importance of synthetic data for AGI, ASI, or whatever next-order meaningful inflections of performance we see across embodied and non-embodied intelligence. Because of this, again, it wouldn’t shock us if, across modalities, a few different organizations become world-class at generating their own data and use that as a compounding advantage.10
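To make “world-class at generating their own data” slightly more concrete, here is a minimal, hypothetical sketch of one common pattern: a generator proposes candidate examples, a verifier filters them, and only the verified samples are added to the training pool. All functions are toy stand-ins under our own assumptions, not any lab’s actual method.

```python
# Minimal, hypothetical sketch of a generate-then-filter synthetic data loop.
import random


def generate_candidates(n: int) -> list[int]:
    # Stand-in generator: in practice a model or simulator would go here
    # (e.g. rendered driving scenes or sampled model rollouts).
    return [random.randint(0, 100) for _ in range(n)]


def verify(sample: int) -> bool:
    # Stand-in verifier: a unit test, physics check, human rater, or
    # real-world outcome that filters out low-quality synthetic samples.
    return sample % 2 == 0


def grow_training_pool(pool: list[int], rounds: int, per_round: int) -> list[int]:
    for r in range(rounds):
        kept = [s for s in generate_candidates(per_round) if verify(s)]
        pool.extend(kept)
        print(f"round {r}: kept {len(kept)}/{per_round}, pool size {len(pool)}")
    return pool


if __name__ == "__main__":
    grow_training_pool([], rounds=3, per_round=50)
```

The compounding advantage, if it exists, lives in the verifier and the generator, not the raw volume: whoever has the best filter for their domain can keep growing a dataset competitors cannot cheaply reproduce.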