Tuesday, November 18, 2008

11-18-08 - DXTC



I've been doing a little work on DXT1 compression. The main reason I'm looking at it is because I want to do something
*better* than DXT1, and before I do that, I want to make sure I'm doing the DXTC right.



For now let's gather prior art :



The only real decision the coder gets to make is what the two 565 endpoints are. The other details you just have to get right -
reconstruct the palette interpolants right, and assign the 16 indices to the closest palette entries right. For the moment I
don't care about speed, I just want to make sure the indices are actually the best choices.



All the papers by Ignacio and
J.M.P. van Waveren are modern classics. One thing of note : the "simplified" index finder that JMP has in his original paper
is wrong. On page 12 he says "Evaluating the above expression reveals that the sub expression ( !b3 & b4 ) can be
omitted because it does not significantly contribute to the final result" - apparently that is not true, because when I use
the "rewritten" index finder it produces bad indexes. His original version of the index finder bit twiddle is fine.



One thing I've found with the index finding is that the integer and nonlinear effects are pretty strong, and any ideas you have
about planar geometry fail if you assume that your 4 points are collinear (they are not, in fact).
For example, an obvious idea is to do a dot product test to pick between the 4 points. First dot
product with the middle separating plane, then the two planes between the next pairs of points. This doesn't work. Part of
the problem is the bad rounding in the generation of the 1/3 point. The JMP method is actually comparing vs the
true distances, so that part is good.




In more detail, an obvious idea is to do something like this :

// "*" here means a vector dot product; mid01, mid02, mid31 are
// precomputed dot-space thresholds at the midpoints between
// neighboring palette entries

int dot = (color - c0) * (c1 - c0);

int bmid = (dot > mid01);
int b02 = (dot > mid02);
int b31 = (dot > mid31);

// using the DXT1 palette order label 0,2,3,1

int threebit = (bmid<<2) + (b02<<1) + b31;

int index = remap[threebit];

// remap is an 8-entry table that remaps from 3 bits to 2 bits to give an index




That "works" just as much as the "ryg" code works or the "Extreme DXT" method of finding indexes work - which is to say that
it doesn't work.



FastDXT appears to be an implementation of the id "Real Time DXT"
paper.



Ericsson Texture Compression
(ETC2)
is similar to DXTC but different; this is a pretty good paper, there are some interesting ideas here like the T and H
blocks. It gets slightly better quality in some cases. It's obvious that you can beat DXTC by having a base color and a
log-scaled delta color, rather than two end points. Storing two 565 end points is wasteful; you could for example do a 777
base color and a 444 log-scaled delta.



Tight Frame Normal Map Compression is similar to
DXT5 (really to 3Dc) with some small improvement on normal maps.



Extreme DXT Compression by Peter Uliciansky is indeed
super fast. However, it has some errors. The division method that he uses to assign the indices is not
correct. You can improve it by offsetting the end points correctly to put the division quantization boundaries in the
right place, but even then it's still not exactly correct unless the diagonal of the color bbox is along {1,1,1}. The
error from this is pretty small and he's going for max speed, so it's semi-reasonable what he does.
Note that using the full range of the bbox for the division
but insetting it to make the dxt1 colors (as he does) improves the indices slightly, but it's not actually the correct range scale.



In more detail :

To find indices for the DXTC colors c0 and c1

You should use

base = c0 + (c0 - c1)/6

range = (c1 - c0) * 4/3

index = (color - base) * 4 / range;

or

index = (color - base) * 3 / (c1 - c0);

not :

index = (color - c0) * 4 / (c1 - c0);
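As a sketch, here's what the corrected scheme looks like on actual RGB triples. This is my own illustration, not code from any of the papers; the names (find_index, linear_to_dxt) are made up. Projecting onto the c0->c1 axis with a dot product and offsetting by len2/6 puts the division boundaries at the palette midpoints 1/6, 1/2, 5/6 along the axis:

```c
/* Corrected division-based index finder (illustrative sketch).
   linear_to_dxt converts the linear position 0,1,2,3 along the axis
   to the DXT1 index order 0,2,3,1. */
static const int linear_to_dxt[4] = { 0, 2, 3, 1 };

int find_index(const int color[3], const int c0[3], const int c1[3])
{
    int dot = 0, len2 = 0;
    for (int i = 0; i < 3; i++) {
        int a = c1[i] - c0[i];
        dot  += (color[i] - c0[i]) * a;   /* project onto the c0->c1 axis */
        len2 += a * a;
    }
    if (len2 == 0) return 0;              /* degenerate block: c0 == c1 */

    /* t = dot/len2 is the position in [0,1]; boundaries are at
       1/6, 1/2, 5/6, so q = floor((t + 1/6) / (1/3)) */
    int q = (6*dot + len2) / (2*len2);
    if (q < 0) q = 0;
    if (q > 3) q = 3;
    return linear_to_dxt[q];
}
```

Even with the boundaries in the right place, this inherits the caveat that planes in dot space are not exactly the nearest-palette-entry regions.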



But even if you do that it's still wrong because the planes through color space do not actually separate the colors from
their closest palette entry.



MSDN compressed format documentation now has the
right rounding for the 1/3 points, but the old DirectX8 docs have the wrong rounding. The old docs said



color_2 = (2*color_0 + 1*color_1 + 1)/3
color_3 = (1*color_0 + 2*color_1 + 1)/3



Which is actually better, but is not what the standard or the hardware does.
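On one 8-bit component the two roundings look like this (my function names; as I read the current docs the divide truncates, but check the docs for your target, since hardware is allowed some tolerance here):

```c
/* 1/3-point palette interpolant from 8-bit endpoint components. */
int third_spec(int a, int b) { return (2*a + b) / 3; }      /* truncating divide: spec/hardware */
int third_old (int a, int b) { return (2*a + b + 1) / 3; }  /* old DX8 docs: better, but wrong */
```

The two differ whenever 2*a + b is 2 mod 3, e.g. a = 1, b = 0 gives 0 vs 1.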



Simon Brown's Squish is very good. His Cluster
Fit idea is elegant and clever and produces quite good quality.




There's a thread on MollyRocket
about DXTC with some convenient code from "ryg". It's nice
simple code, but it has some errors. The way he finds the indices is the dot product method, which is
wrong. Also the way he computes the 1/3 and 2/3 colors for the palette using LerpRGB is wrong; it
rounds differently than the hardware really does.



One thing that some simple code gets wrong is the way to quantize from 24 bit to 565. You can't
just take the top bits, because the 565 values are not reconstructed to the middle of their bins.
You need to correct for the offsetting. You can either use the true range, which is something like -4 to 259,
or you can skew the bits in the quantizer. The "ryg" code has a perfect quantizer that's very neat
in the "As16Bit" function. An example is that the value 250 should map to 30 in 5 bits, not
31.
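Concretely, the quantizer should compute round(v*31/255), not v>>3, because the 5-bit value reconstructs by bit replication. A sketch of the two-shift exact-/255 trick (the same idea as the "ryg" quantizer; my function names):

```c
/* Quantize an 8-bit value to 5 or 6 bits so that bit-replication
   reconstruction lands as close as possible: q = round(v*31/255).
   (t + (t>>8)) >> 8 is an exact rounded divide by 255 in this range. */
int quant5(int v) { int t = v*31 + 128; return (t + (t >> 8)) >> 8; }
int quant6(int v) { int t = v*63 + 128; return (t + (t >> 8)) >> 8; }
```

This reproduces the example above: 250 maps to 30, while the naive 250>>3 gives 31.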



BTW the YCoCg-DXT5 method is pretty good on current hardware, but I'm not really looking at it. It's obviously
inherently wasteful, it's just a way of making use of the current hardware, but it's not really what you'd want
to be doing. It's also very large, at around 2.6 bits per byte (8 bits per pixel).



More advanced topics to come.

Friday, November 7, 2008

11-07-08 - Randoms



I'm really sick and kind of out of it, so this might not be totally coherent, but here goes :



I randomly found this :
Nick Chapman's renderer blog ; he's just some dude who wrote
a ray tracer and blogged about it; and now apparently it's a commercial ray tracer
called Indigo. Anyway the blog is amusing and has lots of links to good papers and such. I've always wanted to write a physically
accurate raytracer with all-frequency light propagation. It seems pretty easy and fun.



Drew asked me about random numbers so I was searching around and found
Pixelero . Pixelero has lots of really awesome Flash code. In particular he's
got some nice graphs of the distributions you can make by doing some simple transforms on random numbers :
here . More on
this in a second.



GoodPracticeRNG by D.T. Jones is a good short simple paper on
random number generation and talks about the basic issues for uniform random generation.



Drew asked about generating gaussian or other probability distributions of random numbers. I'm mainly interested in approximate
distributions that are tweakable and useful for random events in games. We don't really want true Gaussians that have infinite
domain; we want something that's kind of like a Gaussian over a finite range, with a hump in the middle and lower at the edges.



There are lots of methods for this in the literature, but it's mainly pretty hard core and exact stuff. I'm interested in clever little
hacky tricks. If you want some background in the serious stuff, the most awesome reference is :
Non-Uniform Random Variate Generation book by Luc Devroye .
This handbook seems to be a short version of some
of the same material. The beginning is a good read for anyone.
( new link )



I'll briefly mention that if you want a true Gaussian, everyone likes the polar form of the Box-Muller transform. You can look on Wikipedia for
more info about this stuff.



Transforms are a handy way to distort probability distributions. Typically you'll be generating a uniform random with a standard random
number generator, then you want to transform it to create a nonuniform distribution.



Say you want to generate a random variable X with a probability distribution P(X). You want to do a random draw where the chance of each
is P(X). To do this you sum the P's to make a cumulative probability function, C(X) = P(X) + C(X - step). (or C(X) = integral P(X)).
I assume we're on the interval [0,1] so C(0) = 0 and C(1) = 1.



Now you just generate a random R = rand() in [0,1] , and you find what "bucket" it goes in, that is find X such that C(X - step) < R < C(X).



That search could be slow, but notice that if we could invert C, then it would just be a function call : X = C^-1(R).



Now, some cumulative probability functions are explicitly invertible. Luc Devroye lists a few common useful ones in his book. Others are
not invertible directly, but by adding an extra variable or munging them somehow they are invertible, such as the Gaussian with the Box-Muller
transform.



Low-order polynomials are invertible, but they're a pain. If your probability distribution is cubic, like a bezier curve, the CDF is quartic,
and to invert it you have to solve a quartic equation analytically. That's possible but ugly. For example, a simple hump probability would
be P(x) = x*(1-x) (unnormalized). The CDF is 3x^2 - 2x^3 (normalized). You can invert that directly by solving the cubic, but it's ugly.



A handy simple case is to approximate the CDF as piecewise linear. Then you can trivially invert each piecewise linear portion. So you just
look at R and select the linear region you're in and then do a simple slope linear invert thing to make X.
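A sketch of that piecewise-linear inversion (my own illustration; it assumes the CDF is tabulated at n+1 evenly spaced X values on [0,1]):

```c
/* Piecewise-linear inverse CDF.  cdf[] holds C(X) at n+1 evenly spaced
   X values on [0,1], with cdf[0] = 0, cdf[n] = 1, nondecreasing.
   Finds the bucket containing r and linearly inverts within it. */
double invert_cdf(const double *cdf, int n, double r)
{
    int i = 0;
    while (i < n-1 && cdf[i+1] < r) i++;       /* cdf[i] <= r <= cdf[i+1] */
    double span = cdf[i+1] - cdf[i];
    double f = (span > 0.0) ? (r - cdf[i]) / span : 0.0;
    return (i + f) / n;                        /* X in [0,1] */
}
```

With a uniform table like {0, 0.5, 1} this is just the identity; a table that rises quickly early (e.g. {0, 0.8, 1}) concentrates the draws in the first half.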



Another approach is to just to play directly with C^-1. C^-1 is a kind of unit interval remapper. It has the properties C^-1(0) = 0 and
C^-1(1) = 1. (for a symmetric probability distribution it also has other constraints). We should be quite familiar with unit interval
remappers and we can just play around with things here. You can tweak directly on C^-1 and see what kind of shape of probability distribution
that gives you.



For example, C^(-1)(t) = t , the identity function, is just uniform constant probability. Anywhere that C^-1 is flatter corresponds to
higher probability, anywhere it's steep corresponds to low probability.
To make something like a Gaussian that has low
probability at the edges and high probability in the middle, what you want is a symmetric function that's steep at first, then flattens out,
then is steep again. Like a "smooth step" but reflected across the 45 degree axis.



One such function is "sinh". You can look at graphs of sinh to see
what I mean. It's not ideal, I'm sure there are better choices to play with, but sinh works. (btw you might think x^3 would work; no it
does not, because it becomes completely flat at X = 0 which corresponds to infinite probability).



You can create various shapes with sinh by using different parts of it : curve = sinh( x * K ) / sinh( K ) , where K is a parameter that
affects the curve. As K -> 0 it becomes linear, as K gets larger it becomes more severe.



Another option is to use a cubic curve. For a symmetric probability distribution you also have that C^-1(0.5) = 0.5 and you only need to
design half the curve; the other half is given by reflection. You thus can design with f(0) = 0, f(0.5) = 0.5, and then set the slope you
want at 0 and 0.5, and that gives you a cubic. The slope inversely corresponds to setting the probability P(X) at those points.
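One way to sketch that half-curve design (my own construction, using a Hermite cubic; the names are made up): pick the end slopes s0 = f'(0) and s1 = f'(0.5), build the cubic on [0, 0.5], and reflect for the other half. s0 = s1 = 1 gives the identity (uniform); a small s1 flattens the middle, which concentrates probability there.

```c
#include <math.h>

/* Hermite cubic on [0,0.5] with f(0)=0, f(0.5)=0.5 and chosen end
   slopes s0, s1; the other half is the point reflection 1 - f(1-t). */
double cubic_remap(double t, double s0, double s1)
{
    int flip = (t > 0.5);
    if (flip) t = 1.0 - t;
    double u   = 2.0 * t;                 /* remap [0,0.5] to [0,1] */
    double h10 = u*((u - 2.0)*u + 1.0);   /* Hermite basis u^3 - 2u^2 + u */
    double h01 = u*u*(3.0 - 2.0*u);       /* 3u^2 - 2u^3 */
    double h11 = u*u*(u - 1.0);           /* u^3 - u^2 */
    /* p(u) = h01*0.5 + 0.5*(h10*s0 + h11*s1) ; the f(0)=0 term drops out,
       and the 0.5 on the slopes is the interval length */
    double f = 0.5*h01 + 0.5*(h10*s0 + h11*s1);
    return flip ? 1.0 - f : f;
}
```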




The mathematics of combining random numbers is quite interesting. Maybe if I wasn't sick this would be more obvious and less
interesting, but right now it seems really cute.



A random unit float is a box filter. That is, if you plot the probability distribution it's a square step on [0,1].



Adding random numbers is equivalent to convolution of the probability distributions.



Add two random unit floats and you get a linear step filter, that's a convolution of a box with a box. If you add another
random unit, you convolve with the box again and get a quadratic step filter. Add another random you get a cubic. Note
that these are also the polynomial approximations of the Gaussian. I believe I've talked about building up the Gaussian
from polynomials like this before. In any case it's in my Filter.h in cblib.
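The sum-of-boxes construction is a one-liner (a sketch; the mean is N/2 and the variance is N/12):

```c
#include <stdlib.h>

/* Sum of N unit randoms: the density is a box filter convolved with
   itself N-1 times, a degree-(N-1) polynomial hump heading toward a
   Gaussian as N grows. */
double sum_rand(int N)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += rand() / (RAND_MAX + 1.0);
    return s;
}
```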



To lerp probability distributions you randomly select. eg. lerp(P1,P2,t) is : if ( rand() < t ) return rand1 else return rand2.



Pixelero showed some interesting stuff with multiplying randoms. Apparently addition is convolution, I'm not sure what
multiplication does exactly.



You can also create distributions by doing random selection vs. other randoms, stuff like x = rand1(); if x < rand2() return x;
By using functions on the conditional you can create lots of shapes, but I'm not sure how to understand what kind of shapes you get.
Luc Devroye's book has got some complex conditional tests like this that generate all kinds of shapes of probability distributions.



Some other good links I found :

basic Gaussian rand ,
stocc source code with lots of random generators ,
anuran automatic code gen for random generators