AI apps startups resemble football’s Premier League clubs—the money comes in and flows straight out. These startups need to use their focus on unit economics to inform their technology choices. Stability AI is a case in point.
The breakthroughs in large language models (LLMs) during 2020–22 have resulted in a wave of artificial intelligence (AI) application startups that hope to commercialize this extraordinary technology. In 2023, they face a squeeze: on the one hand, growing compute demand from training these models and serving inference to a growing audience; on the other, an increasingly skeptical funding environment. The danger is that the AI applications ecosystem may come to resemble football’s Premier League clubs: businesses where the money comes in from media rights, sponsorship, and gate takings on one side and flows straight out to the players and their agents on the other. In AI, the players and their agents are the hardware vendors and the hyperscalers. AI apps startups need to focus on unit economics and use this focus to inform their technology choices down to the metal and to fundamental AI model design. Iconic startup Stability AI is a case in point.
The Premier League economy of AI
Top-level football clubs have a problem. Winning, or even just staying in the game, requires recruiting top players who are well aware of their unique status. And these players are supported by professional agents when they negotiate with the clubs. Consequently, whether the money comes in from investors, ticket sales, media rights, merchandising, or sponsorships, it tends to end up in the players’ pockets. The result is that even successful clubs tend to be financially fragile. Football continues, though, because it has a fan base that is willing to spend serious money—either at the micro-level by paying TV subscriptions and going to the matches or at the macro-level by subsidizing clubs as trophy assets.
In the AI sector at the moment, there is a growing micro-level fan base of enthusiasts, but rather than supporting the clubs, these enthusiasts are usually subsidized by them through free tiers, previews, and developer or creative programs. The macro-level fan base is the investor ecosystem; as in football, investors tend to be a much fickler source of funding than fans. So who are the players and their agents?
The AI startup cost model
Companies custom-building large AI models face three drivers of cost:
- R&D staff: This involves hiring in some of the hottest labor markets in the world.
- Model training: This involves prolonged (days to months) batch runs on large clusters of flagship GPUs or AI ASICs.
- Model inference: This involves scaling out the application that runs inference against the model to match demand, usually on CPUs or lower-tier GPUs.
The first of these scales with the complexity of the model, but only weakly (e.g., pulling more data tokens from a source such as The Pile or Common Crawl does not require more staff). The second scales very strongly with both the size and the complexity of the model and the frequency with which it is trained, while the third scales primarily with demand and secondarily with the size of the trained model.
Training runs are opportunities to improve model performance or incorporate new features, so the frequency with which a company can retrain its models essentially determines how fast it can improve. It is no surprise, then, that the industry looks like a race to train models. The problem in this race, though, is that model size impacts two out of the three cost drivers and is multiplied by the frequency of training runs, which is itself driven by competition and fundamental innovation. Therefore, money raised from investors and any customer revenue are irresistibly drawn toward the players (the silicon vendors) and their agents (the hyperscale cloud providers).
The issue, in the end, is also a familiar one from football. On the one hand, the challenge is to increase revenue from the fan base so as to escape from burning the owners’ cash. As well as converting fans into paying customers, it is possible to expand the base, for example, by adding movie VFX studios or advertising agencies as customers. On the other hand, though, there is still the problem of keeping the revenue from running right out of the door. The good news is that as long as the effective price per inference request is greater than the incremental cloud cost of serving it, growth in the user base is self-financing. But the bad news is that this gets harder with increasing model sizes. In addition, the contribution margin remaining has to finance both maintenance training runs (e.g., retraining to incorporate new data or to fix problems detected by monitoring) and development training runs (e.g., ones to implement new features and fundamental improvements). The key problem of the Premier League economy is its terrible unit economics.
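The self-financing condition described above can be made concrete with a small calculation. The following is a minimal sketch only: the function names are hypothetical, and the price, cost, and request volume in the usage lines are invented for illustration rather than taken from any company. Only the $600k cost per training run echoes a figure quoted later in this article.

```python
def contribution_margin(price_per_request: float,
                        cloud_cost_per_request: float,
                        requests: int) -> float:
    """Revenue left over after paying the incremental inference bill."""
    return (price_per_request - cloud_cost_per_request) * requests


def training_runs_funded(margin: float, cost_per_training_run: float) -> int:
    """Whole training runs the margin can pay for (0 if the margin is not positive)."""
    if margin <= 0:
        return 0
    return int(margin // cost_per_training_run)


# Hypothetical inputs: 1.0 cent charged and 0.4 cents of cloud cost per request,
# 550 million requests a year, and $600,000 per training run.
margin = contribution_margin(0.010, 0.004, 550_000_000)
runs = training_runs_funded(margin, 600_000)
print(f"margin: ${margin:,.0f}, training runs funded: {runs}")
```

If the cloud cost per request rises above the price, as it can when model size grows, the margin turns negative and no training runs are funded from revenue at all.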
Case study: Stability AI
Possibly the AI startup standout in 2022 with its Stable Diffusion open-source image generator, Stability AI made more waves in October 2022 when it raised another $101m in venture capital (VC) funding at a $1bn valuation. Playing no small part was related media coverage that put Stability’s model training expenses at $50m and described a cluster of 4,000 NVIDIA A100 GPUs on Amazon Web Services (AWS) P4d instances running for a month. On the face of it, this sounds like an extreme example of football economics, with funding being consumed at a remarkable rate: the new money might be burned through in two years or so. However, Stability founder Emad Mostaque has quoted very different numbers publicly, saying that the model training cost “$600k” at AWS quoted prices using 256 A100s for “150k” hours.
AWS rents an instance with eight A100s for $32.77/hour, so 32 of those, or 256 GPUs, come to about $1,049/hour just for compute. If the figure of 150,000 hours refers to the total runtime of the individual GPUs rather than of the cluster, this implies a training runtime of 24.4 days, a bit less than a month, and a bill of roughly $614,000 per run, consistent with the quoted $600k. Allowing for some slack between runs, 10–12 iterations a year is a realistic ceiling (about 15 if runs were strictly back to back), and 12 runs at $600k each generate an aggregate compute bill of $7.2m. The $50m figure might be accounted for if the company was training multiple versions of the model in parallel as part of model development.
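The arithmetic in the paragraph above can be reproduced in a few lines. This sketch uses only the figures quoted in the text and makes the same assumption, namely that the 150,000 hours are GPU-hours rather than cluster-hours. The variable names are illustrative.

```python
HOURLY_RATE_PER_INSTANCE = 32.77  # USD per hour for an eight-A100 instance, as quoted
GPUS_PER_INSTANCE = 8
CLUSTER_GPUS = 256
TOTAL_GPU_HOURS = 150_000         # assumption: GPU-hours, not cluster-hours

instances = CLUSTER_GPUS // GPUS_PER_INSTANCE        # instances needed for the cluster
cluster_rate = instances * HOURLY_RATE_PER_INSTANCE  # USD per hour for the whole cluster
wall_clock_hours = TOTAL_GPU_HOURS / CLUSTER_GPUS    # elapsed hours for one run
wall_clock_days = wall_clock_hours / 24
cost_per_run = cluster_rate * wall_clock_hours       # list-price cost of one run
runs_per_year_back_to_back = 365 / wall_clock_days   # ceiling with no gaps between runs

print(f"instances: {instances}")
print(f"cluster rate: ${cluster_rate:,.2f}/hour")
print(f"runtime: {wall_clock_days:.1f} days")
print(f"cost per run: ${cost_per_run:,.0f}")
print(f"back-to-back runs per year: {runs_per_year_back_to_back:.1f}")
```

At list prices this gives a per-run cost a little above the “$600k” Mostaque quoted, and a theoretical ceiling of just under 15 back-to-back runs a year, so a figure of 10–12 runs leaves room for gaps between runs.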
Mostaque also says Stability negotiated a substantial discount after the fact, something AWS is routinely willing to do for very large customers. As model training is a batch workload, Stability could probably also benefit from the heavily discounted prices AWS offers for Reserved Instances by booking ahead of time and possibly from Spot Instance pricing if the training algorithm tolerates interruptions. Whatever benefit the company got from cost optimization like this would proportionately reduce the costs above.
One thing Stability definitely does, though, is invest in improved training algorithms, infrastructure, and model development to keep the key units of scale down. If the theory described above is correct, much of the quoted $50m spend went on model development rather than just training. The key units of scale are the training runtime, which controls how quickly or expensively the model can improve, and the model size in terms of RAM, which both controls how quickly the inference serving bill increases with demand and feeds back into the training bill. This matters to Stability in two ways: first, through the cost base of its DreamStudio online service; and second, because Stable Diffusion users can run the application locally on their own hardware.
Is the future at the edge, after all?
Running on the user’s machine is the cheapest possible way to provide inference; it costs the provider nothing, scales naturally with the user base, and is therefore a massive boost to unit economics. It follows that model developers should target the accelerators that might be available: various gaming- or workstation-grade GPUs and Apple’s Neural Engines. Since Apple moved the MacBook line to its own silicon, all the MacBooks get either a seven- or an eight-core GPU and a 16-core Neural Engine accelerator. The accelerator offers 15.8 TOPS of performance, increased to 22 TOPS in the very latest Macs. Stable Diffusion is designed to run with 5GB of VRAM, which fits in enthusiast-grade GPUs such as the NVIDIA GTX series. Apple, meanwhile, has released its own implementation that ports the model to the Neural Engine architecture. Omdia forecasts that 1.4 billion AI accelerators of various kinds will ship across PCs, tablets, and smartphones in 2023, approaching 2 billion in 2027 (see Figure 1).
Figure 1: AI accelerators are shipping to PCs and smartphones in real scale
Source: Omdia, AI Chipsets for Edge Forecast Report – 2022 Database
In its documentation, Apple describes the model as consisting of four distinct neural networks in a pipeline and 1.27 billion parameters. This makes it a very frugal model for its capabilities. The compute-optimal scaling logic in DeepMind’s Training Compute-Optimal Large Language Models paper on the Chinchilla LLM indicates that for a given model and performance target, there is a trade-off between the model size in parameters, training effort (whether in terms of a longer training run or a faster one), and the volume of training data. We can save on model size by training with more data, use less training data by training for longer, or reduce training runtime against a given data set by accepting a bigger model.
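One leg of this trade-off can be sketched with the approximation used in the Chinchilla paper, in which training compute is roughly six floating-point operations per parameter per training token. The sketch below holds the compute budget fixed and shows how shrinking the model frees up budget for more data. It is an assumption-laden illustration: the function names are hypothetical, the approximation was derived for dense language models rather than image generators, and it says nothing about whether the smaller model reaches the same performance.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: about 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def tokens_for_budget(flops_budget: float, params: float) -> float:
    """Training tokens affordable for a given model size at a fixed compute budget."""
    return flops_budget / (6.0 * params)


# Chinchilla itself: 70 billion parameters trained on 1.4 trillion tokens.
budget = training_flops(70e9, 1.4e12)

# Halving the model size at the same budget doubles the data it can be trained on.
half_size_tokens = tokens_for_budget(budget, 35e9)

print(f"budget: {budget:.3g} FLOPs")
print(f"tokens at half the model size: {half_size_tokens:.3g}")
```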
In the case of Stable Diffusion, it seems that Stability chose to peg the model size at levels that make running inference on the local machine possible and, indeed, the best option. Having set a 5GB footprint as the binding constraint, Chinchilla logic implies the company maxed out on either training runtime or training data. We know that Stability was a substantial contributor to the LAION-Aesthetics project, a 2 billion image dataset that was extensively labeled by human raters. Given what we know about the training runs, this sounds like a key element of the project’s success. Compute-optimal scaling takes the fundamental design of the model as given, but improving that design is also an option.
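A rough consistency check on the footprint is possible from the parameter count Apple reports. The sketch below counts weights only and assumes 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit floats. It ignores activations and any runtime overhead, and it is not a claim about how the 5GB figure was actually arrived at. The function name is hypothetical.

```python
PARAMS = 1.27e9  # parameter count as described in Apple's documentation


def weight_footprint_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


fp32_gb = weight_footprint_gb(PARAMS, 4)  # 32-bit floats
fp16_gb = weight_footprint_gb(PARAMS, 2)  # 16-bit floats

print(f"fp32 weights: {fp32_gb:.2f} GB")
print(f"fp16 weights: {fp16_gb:.2f} GB")
```

On these assumptions the full-precision weights come to roughly 5GB and half precision halves that, which is in line with the model fitting on enthusiast-grade hardware.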
Much comment about Stability has centered on the decision to release Stable Diffusion as open-source software. Although this has led to an impressive burst of derived innovations, perhaps the most interesting aspect of the company is its focus on building models that work on PC hardware through a combination of Chinchilla logic, fundamental model development, and dataset shaping. Using the increasing wealth of accelerator hardware deployed to the edge may well be the best way to save AI startups’ unit economics and escape from the Premier League economy of AI.
Omdia, AI Chipsets for Edge Forecast Report – 2022 Database (August 2022)
Atila Orhon, Michael Siracusa, Aseem Wadhwa, Stable Diffusion with Core ML on Apple Silicon, Apple (December 2022)
Kyle Wiggers, “Stability AI, the startup behind Stable Diffusion, raises $101M,” TechCrunch (October 2022)
Kyle Wiggers, “This startup is setting a DALL-E 2-like AI free, consequences be damned,” TechCrunch (August 2022)
Stability AI, SEC filing (October 2022)
Amazon Web Services, EC2 Pricing: Reserved Instances (January 2023)
Emad Mostaque, remarks on Stable Diffusion training, Twitter (August 28, 2022)
Jordan Hoffmann et al., Training Compute-Optimal Large Language Models, arXiv (March 2022)
Alexander Harrowell, Principal Analyst, Advanced Computing