By far the best installation and running experience for a large language model locally is llamafile. The model weights and an inference server are packaged into a single binary that runs across multiple operating systems.
It turns a complicated series of steps for compiling, fine-tuning, and running into a single download and a single command: `sh -c ./path/to/llamafile`.
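A minimal sketch of the whole flow, assuming the LLaVA llamafile from the project's releases (the URL and filename here are illustrative; grab the real ones from the release page for the model you pick):

```sh
# Download a llamafile release (example filename; check the release page)
curl -LO https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile/resolve/main/llava-v1.5-7b-q4.llamafile

# Mark it executable (macOS/Linux)
chmod +x llava-v1.5-7b-q4.llamafile

# Run it; the sh -c wrapper sidesteps shells that balk at the binary format.
# By default this starts a local web UI at http://localhost:8080
sh -c ./llava-v1.5-7b-q4.llamafile
```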
This is ideal if you just want to try out a model and see how it performs. For me, it was interesting to see that my M1 MacBook gets ~11 tokens per second out of the box, and the LLaVA model is good enough for basic tasks.
I hope they make more models available to try!
See also:
- This might simplify personal infrastructure applications where you want to avoid sending sensitive data over the wire but can supply the context to a locally running LLM (see the sketch after this list)
- Convenience is king, even for developers who are used to yak shaving
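Since the llamafile binary serves an OpenAI-compatible HTTP API (it builds on llama.cpp's server), a local application can hand sensitive context to the model without anything leaving the machine. A minimal sketch, assuming the default port 8080 and the `/v1/chat/completions` endpoint; the `model` field is mostly cosmetic for a single-model server:

```sh
# Query the locally running llamafile server; the prompt (and any
# private context embedded in it) never leaves the machine.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local",
        "messages": [
          {"role": "user", "content": "Summarize this private note: ..."}
        ]
      }'
```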