Retrieval-Augmented Generation (RAG)

  • LLM Latency Is Output-Size Bound

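    As it stands today, LLM applications have noticeable latency, but much of that latency is output-size bound rather than input-size bound. Output tokens are generated one at a time, while the input prompt is processed largely in parallel, so the amount of text that goes into a prompt matters far less for latency than the amount of text that comes out.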
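
    To make the asymmetry concrete, here is a rough back-of-the-envelope sketch. The class name `LatencyModel` and every per-token cost in it are hypothetical, illustrative assumptions rather than measurements of any particular model or provider; only the relative effect of input and output size is the point.

```python
from dataclasses import dataclass


@dataclass
class LatencyModel:
    """Toy latency model. All constants are illustrative assumptions."""
    prefill_s_per_token: float = 0.0002  # input tokens, processed largely in parallel
    decode_s_per_token: float = 0.02     # output tokens, generated one at a time
    overhead_s: float = 0.2              # fixed network and queueing cost

    def estimate(self, input_tokens: int, output_tokens: int) -> float:
        """Return the estimated end-to-end latency in seconds."""
        return (self.overhead_s
                + input_tokens * self.prefill_s_per_token
                + output_tokens * self.decode_s_per_token)


model = LatencyModel()
base = model.estimate(input_tokens=1000, output_tokens=200)
more_input = model.estimate(input_tokens=2000, output_tokens=200)   # double the prompt
more_output = model.estimate(input_tokens=1000, output_tokens=400)  # double the answer

print(f"baseline:      {base:.2f} s")
print(f"2x input:      {more_input:.2f} s")
print(f"2x output:     {more_output:.2f} s")
```

    Under these assumed costs, doubling the prompt adds only a small fraction to the total, while doubling the output nearly doubles it. Real numbers vary by model and deployment, so the constants should be replaced with measured values before drawing conclusions.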