The ordering of some instructions looks suboptimal because, when len == 32, operations such as hv use nearly every available register, which leaves little room to reorder them. A separate function for len < 32 is possible, but it is unlikely to make a significant difference, so I have not added one yet.