Tuesday, November 18, 2008

11-18-08 - DXTC

I've been doing a little work on DXT1 compression. The main reason I'm looking at it is because I want to do something
*better* than DXT1, and before I do that, I want to make sure I'm doing the DXTC right.

For now let's gather prior art :

The only real decision the coder gets to make is what the two 565 endpoints are. The other details you just have to get right -
reconstruct the palette interpolants right, and assign the 16 indices to the closest palette entries right. For the moment I
don't care about speed, I just want to make sure the indices are actually the best choices.

All the papers by Ignacio and
J.M.P. van Waveren are modern classics. One thing of note : the "simplified" index finder that JMP has in his original paper
is wrong. On page 12 he says "Evaluating the above expression reveals that the sub expression ( !b3 & b4 ) can be
omitted because it does not significantly contribute to the final result" - apparently that is not true, because when I use
the "rewritten" index finder it produces bad indexes. His original version of the index finder bit twiddle is fine.

One thing I've found with the index finding is that the integer and nonlinear effects are pretty strong and any ideas you have
about planar geometry fail if you assume that your 4 points are colinear (they are not actually).
For example, an obvious idea is to do a dot product test to pick between the 4 points. First dot
product with the middle separating plane, then the two planes between the next pairs of points. This doesn't work. Part of
the problem is because of the bad rounding in the generation of the 1/3 point. The JMP method is actually comparing vs the
true distances, so that part is good.

In more detail, an obvious idea is to do something like this :

int dot = (color - c0) * (c1 - c0);

int bmid = (dot > mid01);
int b02 = (dot > mid02);
int b31 = (dot > mid31);

// using the DXT1 palette order label 0,2,3,1

int threebit = (bmid<<2) + (b02<<1) + b31;

int index = remap[threebit];

// remap is an 8-entry table that remaps from 3 bits to 2 bits to give an index

That "works" just as much as the "ryg" code works or the "Extreme DXT" method of finding indexes work - which is to say that
it doesn't work.

FastDXT appears to be an implementation of the id "Real Time DXT"

Ericsson Texture Compression
is similar to DXTC but different; this is a pretty good paper, there are some interesting ideas here like the T and H
blocks. It gets slightly better quality in some cases. It's obvious that you can beat DXTC by having a base color and a
log-scaled delta color, rather than two end points. The two 565 end points is wasteful; you could for example do a 777
base color and a 444 log scaled delta.

Tight Frame Normal Map Compression is similar to
DXT5 (really to 3Dc) with some small improvement on normal maps.

Extreme DXT Compression by Peter Uliciansky is indeed
super fast. However, it has some errors. The division method that he uses to assign the indeces is not
correct. You can improve it by offsetting the end points correctly to put the division quantization boundaries in the
right place, but even then it's still not exactly correct unless the diagonal of the color bbox is along {1,1,1}. The
error from this is pretty small and he's going for max speed, so it's semi reasonable what he does.
Note that using the full range of the bbox for the division
but insetting it to make the dxt1 colors (as he does) improves the indeces slightly, but it's not actually the correct range scale.

In more detail :

To find indices for the DXTC colors c0 and c1

You should use

base = c0 + (c0 - c1)/6

range = (c1 - c0) * 4/3

index = (color - base) * 4 / range;


index = (color - base) * 3 / (c1 - c0);

not :

index = (color - c0) * 4 / (c1 - c0);

But even if you do that it's still wrong because the planes through color space do not actually separate the colors from
their closest palette entry.

MSDN compressed format documentation now has the
right rounding for the 1/3 points, but the old DirectX8 docs have the wrong rounding. The old docs said

color_2 = (2*color_0 + 1*color_1 + 1)/3
color_3 = (1*color_0 + 2*color_1 + 1)/3

Which is actually better, but is not what the standard or the hardware does.

Simon Brown's Squish is very good. His Cluster
Fit idea is elegant and clever and produces quite good quality.

There's a thread on MollyRocket
about DXTC with some convenient code from "ryg". It's nice
simple code, but it has some errors. The way he finds the indices is the dot product method which is
wrong. Also the way he computes the 1/3 and 2/3 colors for the palette using LerpRGB is wrong, it
rounds differently that the hardware really does.

One thing that some simple code gets wrong is the way to quantize from 24 bit to 565. You can't
just take the top bits, because the 565 values are not reconstructed to the middle of their bins.
You need to correct for the offsetting. You can either use the true range, which is something like -4 to 259,
or you can skew the bits in the quantizer. The "ryg" code has a perfect quantizer that's very neat
in the "As16Bit" function. An example is that the value 250 should map to 30 in 5 bits, not

BTW the YCoCg-DXT5 method is pretty good on current hardware, but I'm not really looking at it. It's obviously
inherently wasteful, it's just a way of making use of the current hardware, but it's not really what you'd want
to be doing. It's also very large, at around 2.6 bits per byte (8 bits per pixel).

More advanced topics to come.

No comments:

Post a Comment