You don’t understand a distributed computing problem until you get it to fit on a single machine first.
Speeding up a computation can be approached in three ways: going high (vertical scaling, e.g. more RAM and faster CPUs/storage), going wide (distributing the work across machines), or going deep (refactoring the computation itself).
I saw this happen at work: an engineer took a Spark job distributed over many machines and rewrote it as a pipeline of Unix commands on one large machine. It produced the same output, faster than the distributed version.
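To make that concrete, here is a minimal sketch of the kind of rewrite, assuming the job was a count-by-key aggregation over a tab-separated log. The file name `events.tsv`, the delimiter, and the column layout are made up for illustration; the actual job was something else.

```sh
# Hypothetical stand-in for a distributed count-by-key job:
# count records per key, largest groups first.
cut -f1 events.tsv |   # keep only the key column (column 1, tab-separated)
  LC_ALL=C sort |      # byte-order sort; GNU sort spills to disk if data exceeds RAM
  uniq -c |            # collapse each run of identical keys into a count
  sort -rn |           # order by count, descending
  head -20             # show the 20 largest groups
```

Because GNU sort falls back to an external merge sort on disk, a pipeline like this can handle datasets larger than RAM on a single box, with none of the scheduling, serialization, and shuffle overhead a cluster adds.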