Monday, December 8, 2008

12-08-08 - DXTC Summary



I thought I should fix some things that were wrong or badly said in my original DXTC posts :



DXTC Part 1

DXTC Part 2

DXTC Part 3

DXTC Part 4




On the "ryg" coder : there was a bug/typo in the implementation I was using which gave bogus results, so you should
just ignore the numbers in those tables. See for correction :

Molly Rocket Forum on Ryg DXTC coder
. Also I should note he does have some neat code in there. The color finder is indeed
very fast; it is an approximation (not 100% right) but the quality is very close to perfect. Also his "RefineBlock" which does
the Simon Brown style optimize-end-from-indices is a very clever fast implementation that collapses a lot of work. I like the
way he uses the portions of one 32 bit word to accumulate three counters at a time.



I also mentioned in those posts that the optimized version of the Id "real time DXTC" bit math was "wrong". Well, yes, it
is wrong in the sense that it doesn't give you exactly the same indeces, but apparently that was an intentional approximation
by JMP, and in fact it's a very good one. While it does pick different indeces than the exact method, it only does so in cases
where the error is zero or close to zero. On most real images the actual measured error of this approximation is exactly zero,
and it is faster.



So, here are some numbers on a hard test set for different index finders :



exact : err: 31.466375 , clocks: 1422.256522

id : err: 31.466377 , clocks: 1290.232239
diff: 0.000002

ryg : err: 31.466939 , clocks: 723.051241
diff: 0.000564

ryg-t : err: 31.466939 , clocks: 645.445860
diff: 0.000564



You can see the errors for all of them are very small. "ryg-t" is a new one I made which uses a table to turn the dot
product checks into indexes, so that I can eliminate the branches. Start with the "ryg" dot product code and change the inner loop to :



const int remap[8] = { 0 << 30,2 << 30,0 << 30,2 << 30,3 << 30,3 << 30,1 << 30,1 << 30 };

for(int i=0;i < 16;i++)
{
int dot = block.colors[i].r*dirr + block.colors[i].g*dirg + block.colors[i].b*dirb;

int bits =( (dot < halfPoint) ? 4 : 0 )
| ( (dot < c0Point) ? 2 : 0 )
| ( (dot < c3Point) ? 1 : 0 );

mask >>= 2;
mask |= remap[bits];
}



I should note that these speed numbers are for pure C obvious implementations and if you really cared about speed you should
use SSE and who knows what would win there.




Now this last bit is a little half baked but I thought I'd toss it up. It's a little bit magical to me that Ryg's Mul8Bit (which is
actually Jim Blinn's) also happens to produce perfect quantization to 565 *including the MSB shifting into LSB reproduction*.



I mentioned before that the MSB shifting into LSB thing is actually "wrong" in that it would hurt RMSE on purely random data,
because it is making poor use of the quantization bins. That is, for random data, to quantize [0,255] -> 32 values (5 bits)
you should have quantization bins that each hold 8 values, and whose reconstruction is at the middle of each bin. That is,
you should reconstruct to {4,12,20,...} Instead we reconstruct to {0,8,...247,255} - the two buckets at the edges only get
4 values, and there are some other ones that get 9 values. Now in practice this is a good thing because your original data
is *not* random - it's far more likely to have exact 0 and 255 values in the input, so you want to reproduce those exactly.
So anyway, it's not a uniform quantizer on the range [0,255]. In fact, it's closer to a uniform quantizer on the range
[-4,259].





I think it might actually just be a numerical coincidence in the range [0,255].

The correct straight-forward quantizer for the DXTC style colors is

return (32*(val+4))/(256+8);

for R. Each quantization bin gets 8 values except the top and bottom which only get 4. That's equivalent to quantizing the range [-4,256+4) with a uniform 8-bin quantizer.

Now

1/(256 + 8) = 1/256 * 1/(1 + 8/256)

We can do the Taylor series expansion of 1/(1+x) for small x on the second term and we get ( 1 - 8/256 + 64/256/256) up to powers of x^2

So we have

( (32*val+128) * ( 1 - 8/256 + 64/256/256) ) >> 8

And if we do a bunch of reduction we get

return ( 31*val+124 + ((8*val+32)>>8) ) >> 8

If we compare this to Mul8bit :

return ( 31*val+128 + ((val*31 + 128)>>8)) >> 8;

it's not exactly the same math, but they are the same for all val in [0,255]





But I dunno. BTW another way to think about all this is to imagine your pixels are an 8.inf fixed point rep of an original floating
point pixel, and you should replicate the 8 bit value continuously. So 0 = 0, 255 = FF.FFFFFF.... = 1.0 exactly , 127 = 7F.7F7F7F..



BTW this reminds me :
When I do Bmp -> FloatImage conversions I used to do (int + 0.5)/256 , that is 0 -> 0.5/256 , 255 -> 255.5/256 .
I no longer do that, I do 0->0, and 255 -> 1.0 , with a 1/255 quantizer.

No comments:

Post a Comment