It appears the sam3x8e has a primitive cache prefetch mechanism (it
prefetches 32 bytes at a time aligned to a 16 byte boundary).
Aligning the main loop in timer_dispatch_many() to a 16 byte boundary
significantly improves performance.
Signed-off-by: Kevin O'Connor <kevin@koconnor.net>