Evolution is very powerful and very slow. A new startup thinks it can harness its might — without the millennia of waiting around.
The startup EvolutionaryScale launched Tuesday with a $142 million seed round from tech-focused investors, and ambitions to condense hundreds of millions of years of biological trial-and-error into a work week.
It was founded by a team of former Meta scientists led by Alex Rives, who developed a large language model called evolutionary scale modeling, or ESM, that could predict the three-dimensional structure of a protein just by knowing its sequence. It rivals the better-known AlphaFold model, developed by scientists at Google DeepMind.
Last year, Rives’ team left tech giant Meta to start their own company and develop the latest version of ESM — one that could generate proteins never seen in nature. Dubbed ESM3, it debuted Tuesday in a preprint showing how it created new glow-in-the-dark proteins based on the green fluorescent proteins that make jellyfish light up.
The new protein isn’t better or brighter than the jellyfish’s — but it shares a mere 58% of its amino acid sequence with its natural counterpart. The EvolutionaryScale team estimates a change that dramatic would take 500 million years to happen in nature.
The glow-in-the-dark exercise was more academic than business-focused, but Rives told Endpoints News he believes ESM3 could have medical and environmental applications, such as creating antibody drugs or enzymes that help break down plastic or capture carbon. The startup is planning on building even more powerful models, he added.
“We will really be able to think much more creatively about designing biology and will have the tools to be able to approach biology not through painstaking trial and error, but through engineering from first principles,” Rives said.
Favoring partnerships and software sales over pipeline
EvolutionaryScale is part of the growing field of de novo protein design, where some scientists believe AI will help companies design protein-based drugs more quickly, or find new leads against traditionally tough targets and diseases.
Other heavyweights include Flagship Pioneering’s Generate:Biomedicines, which has raised over $750 million, as well as Xaira Therapeutics, which debuted earlier this year with more than $1 billion in initial funding to advance models from the lab of co-founder and biochemist David Baker.
While Generate and Xaira are focused on developing their own pipelines, EvolutionaryScale’s strategy includes plans to partner with drugmakers as well as sell its software as a microservice via Nvidia. A pared-back version of ESM3 will also be available for free to academics.
“We really want to build tools to enable scientists to understand and design biology,” Rives said. “What we’re not doing is building a vertically integrated drug discovery company.”
The lack of specifics reflects the early stage of the venture, which started up about a year ago. EvolutionaryScale has about 20 employees and doesn’t have a CEO. Rives said they aren’t looking for one, either, and he plans to return to academia this fall as an assistant professor at MIT and a member of the Broad Institute of MIT and Harvard while keeping his role as chief scientist.
“We see ourselves first and foremost as a frontier research lab,” Rives said.
The $142 million seed came from a small group of investors, led by Lux Capital and a duo of tech founders turned investors: Nat Friedman, the former GitHub CEO, and Daniel Gross, whose search engine startup Cue was acquired by Apple. Amazon Web Services and NVentures, the corporate venture capital arm of Nvidia, also participated.
Rives co-leads the startup with two fellow ex-Meta scientists: Vice President of Engineering Tom Sercu, who worked on Meta’s fundamental AI research team and on IBM’s Watson, and Chief Technology Officer Salvatore Candido, a leader of Google’s internet-via-balloon subsidiary Loon, which shut down in 2021. The trio makes up EvolutionaryScale’s board, along with Lux Capital co-founder and managing partner Josh Wolfe.
Synthetic data helped fuel 98-billion-parameter model
EvolutionaryScale’s focus, for now, is its ESM3 model, which was trained on nearly 2.8 billion proteins, many of them known only from the genetic sequences of microbes, with structures predicted by AI itself.
Rives believes bigger is better when it comes to AI. The power of text generators like ChatGPT increased tremendously as more data were fed into them. But AI leaders have sounded the alarm on running out of fresh data to feed these hungry models. The problem is even greater for biology, where data are scarcer and far more expensive to produce.
Some companies such as Generate and Xaira believe that conducting lab experiments to get more data is necessary to improve their protein-creating models. Rives said that EvolutionaryScale will have its own labs too, but he emphasized the importance of using protein sequences, readily obtained in bulk from thousands of microbes that inhabit every pocket of the earth, without ever having to grow or study the organisms in a lab.
“Low-cost gene sequencing is just kind of unlocking a broad understanding of protein sequence diversity across life. So that’s really the core of the data that our model is trained on,” he said.
Rives’ team previously used AI to predict the structures of such proteins known only by their sequences. These so-called synthetic data were key to ESM3. “It really suggests a path to continue to add capabilities and additional scale and data to these models,” Rives said.
His first version of ESM had about 700 million parameters — the internal gears of an AI model, determined by the training data. The latest version has 98 billion parameters and was trained with one trillion teraflops of compute. “As far as we know, this is the most compute that has gone into training a language model for biology,” Rives said.
Rives said the company is working on even bigger models, and that its current funding should last about two years.
“A lot of the funding is going towards building that next-generation model at an even greater level of scale and compute,” he said.