Monday, September 15, 2008

A little bit of __attribute__((always_inline)) goes a long way

Previously, with my SPU programs, I've been relying on heavy, gratuitous use of the --param option to set various inlining thresholds absurdly high - the result being large programs that take a long time to compile, but run quite fast.

The alternative is a little more precision - working out where the compiler isn't inlining something that would benefit from being inlined (e.g. the code that handles software cache hits) and forcing it to do so using always_inline.
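
As a rough illustration of the idea (not my actual cache code), here is a minimal sketch of a direct-mapped software cache where only the hit path is forced inline. Everything in it is made up for the example: the names cache_read, cache_miss and sum_bytes, the line size and line count, and the backing_store array that stands in for main memory (on a real SPU the memcpy would be a DMA transfer).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define LINE_BYTES    128u
#define NUM_LINES     64u
#define BACKING_BYTES (64u * 1024u)

static uint8_t  backing_store[BACKING_BYTES]; /* stands in for main memory */
static uint8_t  cache_data[NUM_LINES][LINE_BYTES];
static uint32_t cache_tag[NUM_LINES];
static uint8_t  cache_valid[NUM_LINES];
static unsigned miss_count;

/* Cold path: deliberately kept out of line so it does not bloat callers. */
static __attribute__((noinline)) uint8_t *cache_miss(uint32_t addr)
{
    uint32_t base = addr & ~(LINE_BYTES - 1u);
    uint32_t idx  = (addr / LINE_BYTES) % NUM_LINES;

    assert(base + LINE_BYTES <= BACKING_BYTES);
    /* A real SPU cache would issue a DMA here instead of memcpy. */
    memcpy(cache_data[idx], &backing_store[base], LINE_BYTES);
    cache_tag[idx]   = base;
    cache_valid[idx] = 1;
    miss_count++;
    return &cache_data[idx][addr - base];
}

/* Hot path: small, called constantly, so force it into every caller. */
static inline __attribute__((always_inline)) uint8_t cache_read(uint32_t addr)
{
    uint32_t base = addr & ~(LINE_BYTES - 1u);
    uint32_t idx  = (addr / LINE_BYTES) % NUM_LINES;

    if (__builtin_expect(cache_valid[idx] && cache_tag[idx] == base, 1))
        return cache_data[idx][addr - base]; /* hit: a few instructions */
    return *cache_miss(addr);                /* miss: call the slow path */
}

/* Example hot loop: the hit check is inlined here, the miss handler is not. */
static uint32_t sum_bytes(uint32_t start, uint32_t count)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < count; i++)
        total += cache_read(start + i);
    return total;
}
```

The split is the point: the hit check is tiny and runs on every access, so it is the part worth forcing inline, while the miss handler stays a normal function call and the rest of the inlining decisions are left to the compiler's default heuristics.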

The result? Faster compilation, smaller programs and (so far) programs that run as fast or faster. The compiler generally knows what it's doing when it comes to inlining - there are just some silly little, very hot cache routines that it doesn't handle well.
