2025-01-14T16:26
The Hazy Research lab, creators of FlashAttention, combines systems software engineering expertise with ML algorithm development. The algebra of an ML algorithm usually admits many mathematically equivalent ways to compute the same result. The best path to take, then, is the one that makes the most of the underlying hardware. This, in essence, is how FlashAttention was invented.
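To make the "equivalent equations" point concrete, here is a minimal NumPy sketch of two ways to compute the same attention output. The first materializes the full score matrix. The second streams over blocks of keys and values with a running (online) softmax, which is the kind of rearrangement FlashAttention builds on. This is not the actual FlashAttention kernel, which is a fused GPU kernel that also tiles over queries and keeps its working set in on-chip memory. The function names and the block size are my own, chosen only for illustration.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference version: builds the full (n_q, n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block_size=4):
    """Same math, but only ever holds one (n_q, block_size) tile of scores."""
    n_q, d = q.shape
    running_max = np.full((n_q, 1), -np.inf)  # max score seen so far, per query
    running_sum = np.zeros((n_q, 1))          # softmax denominator so far
    acc = np.zeros((n_q, v.shape[-1]))        # unnormalized output so far

    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        scores = q @ k_blk.T / np.sqrt(d)

        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        # Rescale earlier partial results to the new max (this is 0 on the first block).
        scale = np.exp(running_max - new_max)
        p = np.exp(scores - new_max)

        running_sum = running_sum * scale + p.sum(axis=-1, keepdims=True)
        acc = acc * scale + p @ v_blk
        running_max = new_max

    return acc / running_sum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((3, 8))
    k = rng.standard_normal((10, 8))
    v = rng.standard_normal((10, 5))
    print(np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v)))
```

Both functions return the same values up to floating-point error. What differs is the memory access pattern: the tiled version never needs the whole score matrix at once, and that is the property a hardware-aware implementation can exploit.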
Since so much of ML is driven by the bitter lesson, knowing how to make use of hardware as efficiently as possible is incredibly important. In my opinion, a lot of the frontier of ML research and application right now is in software systems.
In any case, my favorite contribution from Hazy Research is their writing on GPU optimization, à la "GPUs Go Brrr".