Again, if its speed you want, you can't beat look up tables no matter how hard you try. A 256 byte table will work just fine, and it's four add statements with possibly a shift, but the shift too can be bypassed. Observe: int bitcount(long *value) { char *c; c=(char *) value; // convert long pointer to a char pointer. return table[c[0]]+table[c[1]]+table[c[2]]+table[c[3]]; } The above may be slightly less efficient than a XOR, ADD and SHIFT operation that the original function showed, however this is CPU dependant. For a 16 bit: int bitcount(int *value) { char *c; c=(char *) value; return table[c[0]]+table[c[1]]; } This will kick the ass of that call, because there's a single add and only two memory fetches. Further, for a single byte, you can implement this as a macro function which gets rid of all the overhead: #define bitcount(value) table[value] Granted, this wastes memory, but it depends on whether you're willing to trade clarity for speed. The three above functions assume lots of things about the bit size and such, yes, but that's not the point. They are CLEAR in their functionality, and FAST. The eight line function I showed is also clear in functionality, but is slower. Personally I'd rather have clarity than speed. I'm not interested in breaking cyphers as much as I am in writing them, so brute force isn't something I'd look to using. I've seen far too much weird code in my time to want to use that "simple" ADD/XOR/SHIFT function. As "simple" as it seems, there are alternatives. IF you want a really high speed method of counting bits, do it in hardware with a dedicated chip and shove it up the parallel port or directly on the machine's bus. If you're trying to break cyphers, you will undoubtedly do this. If you are not, it's far safer to write clean, clear, precise understandable code which won't require a second or thrid glance even with comments. (That of course is how this got started in the first place... the Cray Opcode that did this. :-) }