Previously, with my SPU programs, I've been relying on heavy, gratuitous use of the --param option to set various inlining thresholds absurdly high. The result: large programs that take a long time to compile, but run quite fast.
The alternative is a little more precision: working out where the compiler isn't inlining something that would benefit from being inlined (such as the handling of software cache hits) and forcing it to do so with the always_inline attribute.
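To make that concrete, here is a minimal sketch of the idea, not my actual cache code. It assumes a toy direct-mapped software cache, and every name in it (cache_read, cache_fill, backing_store, the sizes) is made up for illustration. The hit path is forced inline with always_inline, while the miss path is kept out of line with noinline so it doesn't bloat every call site.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and names, chosen only for this illustration. */
#define CACHE_LINES 64
#define LINE_SIZE   128

static uint32_t cache_tags[CACHE_LINES];
static uint8_t  cache_valid[CACHE_LINES];
static uint8_t  cache_data[CACHE_LINES][LINE_SIZE];
static uint8_t  backing_store[1 << 16];   /* stands in for main memory */
static unsigned miss_count;

/* Cold path: a miss is rare and slow anyway, so keep it out of line. */
static __attribute__((noinline)) void cache_fill(uint32_t line, uint32_t tag)
{
    memcpy(cache_data[line], &backing_store[tag * LINE_SIZE], LINE_SIZE);
    cache_tags[line] = tag;
    cache_valid[line] = 1;
    miss_count++;
}

/* Hot path: the hit check is tiny and runs constantly, so force it inline
 * rather than relying on the compiler's inlining heuristics. */
static inline __attribute__((always_inline)) uint8_t cache_read(uint32_t addr)
{
    uint32_t tag  = addr / LINE_SIZE;
    uint32_t line = tag % CACHE_LINES;

    /* Tell the compiler that a miss is the unlikely case. */
    if (__builtin_expect(!cache_valid[line] || cache_tags[line] != tag, 0))
        cache_fill(line, tag);

    return cache_data[line][addr % LINE_SIZE];
}
```

With something like this, only the handful of hot routines carry the attribute, and the inlining thresholds can go back to their defaults for everything else.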
The result? Faster compilation, smaller programs and (so far) programs that run as fast or faster. The compiler generally knows what it's doing when it comes to inlining; there are just a few silly little, very hot cache routines that it doesn't handle well.
Monday, September 15, 2008