Friday, September 10, 2010

Profiler tips

A profiler points us to the instructions where most of the time is spent. But given the complexity of modern processors, how do we interpret that?
  • First look at the function (call) graph to make sure that the "fan-in" and "fan-out" of the hot function, that is, who calls it and what it calls, are as expected. If not, then check the logic.
  • If the hot instructions are memory loads, then we need to look at how our data is laid out in memory and how we are accessing it. On the instruction side, we need to check whether prefetching is being used to our advantage.
  • If they are branches, then we have to see whether we are making good use of the branch prediction unit. Maybe the condition needs to be rewritten, or compiler hints given, to make the branch more predictable.
  • If they are compute instructions, then we need to look at how the hot inner code is written and rewrite it to make better use of the instruction set.
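
To make the memory-load and branch cases concrete, here is a minimal sketch in C. The function names, the flat row-major matrix layout, and the assumption that negative values are rare are all made up for illustration. The UNLIKELY macro wraps __builtin_expect, which is a GCC/Clang extension rather than standard C, so the sketch falls back to a plain condition on other compilers.

```c
#include <stddef.h>

/* Hypothetical example: the matrix is stored as one flat array in
   row-major order, so element (r, c) lives at m[r * cols + c]. */

/* Cache-friendly traversal: consecutive iterations read adjacent
   addresses, so every byte of each fetched cache line gets used. */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same result, but consecutive loads are cols * sizeof(int) bytes apart.
   On a large matrix a profiler would typically flag this load as hot. */
long sum_col_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}

/* Compiler hint for a branch we ASSUME is rarely taken.
   __builtin_expect is GCC/Clang-specific; elsewhere this is a no-op. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Sums the non-negative entries, telling the compiler that negative
   entries are the uncommon case so it can lay out the common path
   as the fall-through. */
long sum_skipping_negatives(const int *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))
            continue;
        total += v[i];
    }
    return total;
}
```

Both traversal functions return the same sum, so the only way to see the difference is to time them or profile them on a matrix much larger than the cache. The hint in the last function only helps if the assumption about the data actually holds, so it is worth re-profiling after adding it.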