AI Models at the Edge

Today, most large language models are run by making requests over the network to a provider like OpenAI, which has several disadvantages. You have to trust the entire chain of custody (e.g. the network stack, the provider, their subprocessors). It can be slow or flaky, and therefore impractical for certain operations (e.g. voice inference or large volumes of text). It can also be expensive: providers charge per API call, and experiments can result in a surprising bill (my useless fine-tuned OpenAI model cost $36).

Open-source tools for running AI models locally (or “at the edge”) are being built to solve these problems. Utilities like ggml and llama.cpp let you run models on commodity hardware like your laptop, or even a phone. They also bring all the advantages of open-source AI: as more people get access to these models, more people push the boundary of where they can run and improve the tooling along the way.
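
To make this concrete, here’s a minimal sketch of running a model locally with the llama-cpp-python bindings (a Python wrapper around llama.cpp); the model path and prompt are placeholders, and you’d point it at whatever quantized weights you’ve downloaded:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path below is a placeholder: point it at any quantized
# weights you have downloaded locally.
from llama_cpp import Llama

# Load the model once; inference runs entirely on local hardware,
# with no network calls and no per-request billing.
llm = Llama(model_path="./models/llama-7b-q4_0.gguf")

# Generate a completion, much like a hosted API would.
output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```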

I’m very excited about the interest in AI at the edge, because driving the cost to near zero (at least for folks with the skill set to tinker) and removing the latency and flakiness the network stack creates will encourage more people to experiment and build new things.