Friday, December 5, 2008

12-05-08 - lrotl



Well I found one x86 ASM widget. I've always known you could do nice fast barrel rotations on x86 but
thought they were inaccessible from C. Huzzah! Stdlib has a function "_lrotl()" which is exactly what you
want, and happily it is one of the magic functions the MSVC recognizes in their compiler and turns into
assembly with all goodness. (They also have custom handling for strcmp, memcpy, etc.)



Oh, I noticed lrotl in OpenSSL which seems to have a ton of good
code for different hashes/checksums/digests/whatever-the-fuck-you-call-them's.



As a test I tried it on Sean's hash, which is quite good and fast for C strings :



RADINLINE U32 stb_hash(const char *str)
{
U32 hash = 0;
while (*str)
{
hash = (hash << 7) + (hash >> 25) + *str++;
}
return hash;
}

RADINLINE U32 stb_hash_rot(const char *str)
{
U32 hash = 0;
while (*str)
{
hash = _lrotl(hash,7) + *str++;
}
return hash;
}

stb_hash : 6.43 clocks per char
stb_hash_rot : 3.24 clocks per char



Woot! Then I also remembered something neat I saw today at
Paul Hsieh's Assembly Lab .
A quick check for whether a 32 bit word has any null byte in it :



#define has_nullbyte(x) (((x) - 0x01010101) & ( ~(x) & 0x80808080 ) )



Which can of course be used to make an unrolled stb_hash :



RADINLINE U32 stb_hash_rot_dw(const char *str)
{
U32 hash = 0;

while ( ! has_nullbyte( *((U32 *)str) ) )
{
hash = _lrotl(hash,7) + str[0];
hash = _lrotl(hash,7) + str[1];
hash = _lrotl(hash,7) + str[2];
hash = _lrotl(hash,7) + str[3];
str += 4;
}
while (*str)
{
hash = _lrotl(hash,7) + *str++;
}
return hash;
}

stb_hash_rot_dw : 2.50 clocks



So anyway, I'm getting distracted by pointless nonsense, but it's nice to know lrotl works.
(and yes, yes, you could be faster still by switching the hash algorithm to something that works
directly on dwords)

No comments:

Post a Comment