This note does not have a description yet.
Links to this note
-
LLM Latency Is Output-Size Bound
As it stands today, LLM applications have noticeable latency but much of the latency is output-size bound rather than input-size bound. That means the amount of text that goes into a prompt does not matter.