Rather than converting intermediate steps to text at every stage of a chain-of-thought process, new research suggests that large language models can reason in a latent space, working directly with the model's internal representations. Besides improving responses that require deeper reasoning, latent-space reasoning is faster because it skips the repeated tokenization and text generation at each step.
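As a rough illustration of the idea, here is a minimal PyTorch sketch (not the paper's implementation; the model choice, prompt, and number of latent steps are assumptions). Instead of decoding each reasoning step to a token, it feeds the model's final hidden state back in as the next input embedding, and only decodes to text at the end:

```python
# Sketch of latent-space ("continuous thought") reasoning.
# Illustrative only: gpt2 and the step count are arbitrary choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM that accepts inputs_embeds
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "Problem: 17 * 24 = ?"
ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids)  # (1, seq, hidden)

n_latent_steps = 4  # "think" for a few steps without emitting tokens
with torch.no_grad():
    for _ in range(n_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        # Take the last position's final-layer hidden state...
        last_hidden = out.hidden_states[-1][:, -1:, :]
        # ...and append it as the next "token" embedding, skipping decode/re-encode.
        embeds = torch.cat([embeds, last_hidden], dim=1)

    # Only now decode back to text for an answer token.
    out = model(inputs_embeds=embeds)
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
print(tok.decode(next_id[0]))
```

An off-the-shelf model hasn't been trained to use these continuous thoughts, so the sketch only shows the mechanics; the point is that each latent step avoids a full tokenize-and-generate round trip.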
Since OpenAI introduced the o1 model and previewed o3, models that use chain-of-thought-style processing have shown strong results, with o3 outperforming all other models on the ARC Prize benchmark. That performance comes at a large cost in tokens and time, though. Latent-space reasoning seems likely to improve on both.
Read Training Large Language Models to Reason in a Continuous Latent Space.
See also:
- Consciousness is categories
- Associative thinking gives rise to creativity
- If this takes off, reasoning will be model-specific and opaque, leading to even mushier systems