The key to nature’s intelligence is artificial intelligence

16 06 2023 | George Darrah


In just four short weeks over March and April, LLMs became commoditised. Since Meta's 'leak' of its foundational LLM, LLaMA, there has been a Cambrian explosion of open-source models and associated tooling built on LLaMA's foundation. Start-ups whose moat is anchored in the strength of their algorithms now look more precarious than ever.

For tech founders today, focusing on building a proprietary, differentiated dataset is therefore more important than ever. The public models can go incredibly deep in areas where there is a large, annotated and publicly available corpus, so it's simply not true to call ChatGPT wide but shallow. Ask it for a medical diagnosis or about the nuances of English versus American contract law, and you'll see it can go deeper, and far faster, than the nerdiest PhD (even if it isn't always 100% accurate). But bring it into a realm of sparse data, with little prior analysis to draw on, and watch it respond with a shrug emoji. The model is the same; it's simply hamstrung by the lack of data.

All this means there is a colossal opportunity for start-up building in areas where:

  1. Data is poor and difficult to collect; and

  2. Immense and commercially relevant problems can be solved if this data is collected and fed into the latest algorithms.

Nature is uniquely suited to this opportunity. Where life exists, chaos reigns. Detecting patterns in messy biological datasets, and then making predictions based on those patterns, is a perfect use case for AI.

The commercial opportunity to leverage nature's intelligence is immense. Humanity has been tweaking nature's behaviour for millennia to produce food and materials. And now we'll need more efficient ways to produce more food and more materials, alongside recovering biodiversity. Perhaps most significantly, though, we need to understand nature's mechanisms at an ecosystem and planetary scale, enhancing our ability to mitigate and adapt to climate change.

Some biological data is surprisingly easy to collect. Sequencing DNA is cheap, and the Earth can now be scanned in near real time for algal blooms in the oceans and forest growth on land. But for most biological data there remains a twofold problem: high-quality, granular biological data is difficult both to collect and to curate.

Better data acquisition typically looks like hardware innovation combined with AI. The AI helps the hardware get used more efficiently. For biological data acquisition, this could look like:

  1. Intelligently running experiments in the laboratory, so that experimental design and execution evolve based on near real-time integration of experimental data as it arrives (closed-loop optimisation). Examples include Melonfrost's novel bioreactor, Synthace's digital lab backbone, and Hoxton Farms' computer-vision-powered cell counting platform.

  2. Collecting data on nature's interactions at an ecosystem and even planetary scale, then optimising your data collection spatially and temporally using data from your previous sampling (a minimal sketch of this kind of loop follows this list). An example from agriculture: soil carbon, the world's largest terrestrial organic carbon store, is expensive to monitor, so companies like Miraterra are using AI to streamline their physical sampling. NatureMetrics is building a similar optimisation engine for biodiversity sampling using eDNA.

  3. Discovering the value of other, lower-cost data in enriching, or potentially even supplanting, new sensor deployments. For example, Basecamp Research collect environmental metadata wherever they find protein-encoding DNA sequences. It turns out that knowing the pH, salinity and temperature of the environment in which a protein evolved results in more accurate predictions of protein function.

  4. For a few applications, data can be acquired synthetically – such as using AI to create 3D renderings of zebras to train a visual classification model – but this doesn’t generalise well to most other types of nature data.
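To make the closed-loop pattern in points 1 and 2 concrete, here is a minimal sketch of uncertainty-driven ('active') sampling. Everything in it is hypothetical: the simulated field, the measurement function and the nearest-neighbour uncertainty proxy are illustrative stand-ins, not Miraterra's, NatureMetrics' or anyone else's actual method.

    import numpy as np

    # Toy active-sampling loop: measure the sites where the current model
    # is least certain, rather than sampling on a fixed grid. All of this
    # is simulated for illustration.

    rng = np.random.default_rng(42)

    # Candidate sampling locations on a 20 x 20 field (coordinates in [0, 1]).
    xs, ys = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
    candidates = np.column_stack([xs.ravel(), ys.ravel()])

    def true_soil_carbon(p):
        """Simulated ground truth: a smooth spatial signal plus noise."""
        return np.sin(3 * p[0]) * np.cos(2 * p[1]) + rng.normal(0, 0.05)

    def predict_with_uncertainty(points, sampled_xy, sampled_z, k=5):
        """Nearest-neighbour prediction, with neighbour spread plus
        distance as a crude uncertainty proxy."""
        preds, uncerts = [], []
        for p in points:
            d = np.linalg.norm(sampled_xy - p, axis=1)
            idx = np.argsort(d)[:k]
            preds.append(sampled_z[idx].mean())
            # Uncertain where nearby measurements disagree or are far away.
            uncerts.append(sampled_z[idx].std() + d[idx].mean())
        return np.array(preds), np.array(uncerts)

    # Seed with a handful of random measurements.
    sampled_idx = list(rng.choice(len(candidates), size=5, replace=False))
    sampled_z = [true_soil_carbon(candidates[i]) for i in sampled_idx]

    # Closed loop: each new sample is chosen using all previous samples.
    for _ in range(20):
        xy = candidates[sampled_idx]
        _, uncert = predict_with_uncertainty(candidates, xy, np.array(sampled_z))
        uncert[sampled_idx] = -np.inf  # don't resample visited sites
        nxt = int(np.argmax(uncert))   # go where we know least
        sampled_idx.append(nxt)
        sampled_z.append(true_soil_carbon(candidates[nxt]))

    print(f"Sampled {len(sampled_idx)} of {len(candidates)} sites adaptively.")

The pattern is the same whether the 'measurement' is a field sample or a wet-lab experiment: fit a model, find where it is least certain, sample there, and refit.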

As with most applications of AI, the principle of optimising data acquisition isn't new. But it's only now that AI is cheap and easy enough to use that early-stage founders can start applying these models to interpret patterns in the highly complex systems found across nature.

Acquiring data is only one side of the coin – the next step, turning data into knowledge, is the painstaking work of accurately labelling data points. Highly contextualised data produces more powerful outcomes, because models can be fine-tuned more precisely. This needs expert curators who find joy in structuring and labelling these datasets before feeding them into stacks of carefully crafted algorithms. Biologists and computer scientists with grit need only apply.
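As a toy illustration of what such a contextualised, labelled data point might look like (echoing the Basecamp Research example above), here is a hypothetical record schema. The field names and values are invented for illustration and don't reflect any company's real format.

    from dataclasses import dataclass, asdict
    import json

    # Hypothetical schema: one 'contextualised' data point, i.e. a raw
    # observation enriched with environmental metadata and an
    # expert-assigned label.

    @dataclass
    class ProteinObservation:
        sequence: str          # protein-encoding DNA sequence
        ph: float              # pH of the sampling environment
        salinity_psu: float    # practical salinity units
        temperature_c: float   # degrees Celsius
        label: str             # expert-curated functional annotation

    record = ProteinObservation(
        sequence="ATGGCT...",  # truncated, purely illustrative
        ph=6.8,
        salinity_psu=35.0,
        temperature_c=4.2,
        label="cold-adapted hydrolase (curator-assigned)",
    )

    # Serialised records like this are what fine-tuning pipelines consume:
    # the metadata gives the model context the raw sequence alone lacks.
    print(json.dumps(asdict(record), indent=2))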

It’s slightly ironic that to understand our own living system, nature, we’ll need AI. We are just at the very beginning of a journey towards the greatest prize, a dynamic and granular understanding of how the planetary biome creates conditions habitable for life on Earth, from a single virus up to the global ecosystem.

If you’re building a start-up with a data moat at the intersection between nature and climate, please get in touch.  

Special thanks to co-contributors Amy Varney, Matt Goldstein and Philipp Lorenz

Nature's intelligence: a high-resolution image of a mycelium network under the soil, mirroring connections in a neural net or neurons in the brain. Photo credit: Loreto Oyarte Galvez


EMAIL: contact@systemiqcapital.earth
