Pareto Principle

Roughly 80% of the effects come from 20% of the causes. This distribution seems to repeat in many problem spaces where there is an imbalance between inputs and outputs: a small share of the inputs accounts for most of the output.
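One way to check whether a data set follows this pattern is to sort the causes by size and measure what share of the total the largest 20% contribute. The sketch below is only illustrative: the function name `top_share` and the defect counts are made up for the example, not taken from any real data set.

```python
def top_share(values, fraction=0.2):
    """Return the share of the total contributed by the largest `fraction` of values."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values, reverse=True)
    # Size of the top group, rounded to a whole number of causes (at least one).
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical data: defects attributed to each of ten modules.
defects = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]
share = top_share(defects)  # share of all defects caused by the top 2 modules
```

A result near 0.8 for `fraction=0.2` suggests a Pareto-like imbalance. Real data rarely lands on exactly 80/20, so the ratio is best read as a rule of thumb rather than a law.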

Given a corpus of language, the frequency of a word (how many times it shows up in the corpus) is inversely proportional to its rank in the frequency table of words. This relationship is known as Zipf's law. For example, ‘the’ (rank 1) appears about 2x as often as ‘of’ (rank 2): roughly 7% and 3.5% of all words respectively. This is closely related to the Pareto principle, since a small fraction of the distinct words accounts for most of the word occurrences.
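The rank-frequency relationship described above (commonly called Zipf's law) can be checked on any text by counting words, ranking them, and comparing each observed share with the top word's share divided by the rank. This is a minimal sketch: the helper names `rank_frequency` and `zipf_prediction` are hypothetical, and the tokenizer is a deliberately crude assumption (lowercase runs of letters and apostrophes).

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, share_of_all_words) tuples, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)
    return [
        (rank, word, count / total)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_prediction(top_share, rank):
    """Share predicted for a given rank if frequency is proportional to 1 / rank."""
    return top_share / rank


# With 'the' at about 7% of all words, rank 2 is predicted at about 3.5%.
predicted_of = zipf_prediction(0.07, 2)
```

On a large corpus, the table from `rank_frequency` can be compared row by row against `zipf_prediction(table[0][2], rank)`. The fit is usually only approximate, and it tends to be weakest for the rarest words.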