Monday, December 29, 2008

12-29-08 - Junk

Bill Clinton was perhaps the best Republican president this country has ever had.

Israel's overreactions to nuisance attacks from Hamas has made them a liability to US global interests.
It is no longer a tenable political position
to stand by them. This is not a question of who's right or wrong or morality, it's simply realpolitik.

Any reasonable investment these days returns far less than inflation (eg. a CD is around 3% which is not enough).
Any money you have sitting around is losing
value in real dollars. We may well be in a period similar to 1964 to 1984 where we could go 10-20 years with
investments either showing no gain or a loss in real dollars. The only good option is to spend immediately on hookers and blow
before the money loses more value.

Capitol Hill Snow Slideshow at Seattle Weekly ;
most of these pictures are from right outside our house. I love the way people go nuts here when the weather's wacky.
When it dumps snow nobody drives, everything is cancelled, and people just play. (people go nuts here in the summer
too when the days are so long and it's too hot to sleep, everyone stays up late and wanders outside). We were flying that day, but I
just wanted to cancel it and stay home and have snowball fights and drink cocoa. It's funny walking around and
seeing the cars abandoned on the steep hills where people realized they couldn't make it (why wouldn't you just
go back down to the bottom of the hill and park properly?).

I hate video games that change my resolution to go full screen. My laptop runs on a 1920x1200 external LCD. Any
res except native looks like pure ass (the exception is 1600x1200 or 800x600 in center mode, not stretched).
Furthermore, because I run over analog VGA, changing res causes the monitor to go through its autoadjust which
takes a solid 30 seconds or os. That means absolutely any resolution change pisses me off (and it has to adjust when the
game starts, then again when I quit, and when it crashes, which is like inevitable, it always leaves me in fucking 800x600
desktop mode and screws up all my icons; URG).

For my own 3d work
I've always used the method of just grabbing the desktop res and going fullscreen with that res (then let the user down-res if they want,
but never change the res unless they ask for it).
But my GPU is too slow to run most games at full 1920x1200. The way that I'd like to see more games handle this
is to let me stay in 1920x1200 but render only a portion of the screen. eg. 1600x1200 centered would be okay,
but hell you could also just let me do a 1600x1000 wide aspect centered window and that would be fine too.
Actually while I'm at it, I just wish most games and 3d apps didn't go fullscreen AT ALL without me telling them
to. The default initial startup should always be windowed mode (since I'm going to have to watch it load for
half an hour, I'd like to keep the use of my web browser thank you very much).

One of the weird things about Vegas is the way it's just a complete shambles as soon as you step off the strip.
In fact, they don't even try to hide it much on the strip, like even at the Bellagio if you look behind a bush
you'll see the walls are some kind of paper-mache plaster kind of thing and there are just gaps and dented spots
all over. If you look out the back windows there will be the heating/cooling plants, piles of construction equipment,
etc. Off the strip it really reminds me of Cancun or Mexico in general - piles of dirt, construction equipment
everywhere, tons of stuff half built that nobody seems to be working on actively.

I hate that I'm writing all this nonsense. I hate sitting at the computer all the time, it gives me no happiness. I just don't know what else to do with myself. It's miserable
outside. My shoulders are wrecked, probably forever, I can't work out and I'm fat. I feel like I'm not doing good work right now and
I'm racking my brain trying to fix my coding process. When my work suffers I get depressed since it's about the only thing that gives
me any sense of self worth.

I'm trying to avoid doing allocations in CINIT ; we've made our own allocator now just fail if you call it at cinit,
but that's not catching everything for me because of the difficulty of making sure you've replaced every bit of code
that tries to use "new" or "malloc" or whatever. One annoying thing I found : almost every STL container can be declared
static in a file and it doesn't do any initialization, the constructor just zeros out some members - but hash_map actually
does an alloc :

static std::hash_map<int,int> s_hash;

will allocate a 193 member table. That sucks. Empty containers should not do any allocations IMO since there are many
usage patterns where you might declare a container but never actually put anything in it. (note that of course in
real Oodle I don't use any STL, this was just in some test code).

Does the gmail "Report Spam" actually do anything? In any case I'm pretty happy with my gmail email pass-through.

When I see a "baby on board" sticker, it makes me really want to ram my car into the person. Is that wrong?
Also, 1980 called, they want their sticker back.

I mentioned before how I think the bilateral filter is a pretty mediocre way of denoising or super-res'ing images,
because it basically gives you a piecewise constant model. On the other hand, it's a great way to deal with things
that actually ARE piecewise constant - and lighting in graphics is pretty close to piecewise constant. (a symmetric
bilateral filter can do piecewise linear ; really what you want is piecewise quadratic). There are some good new
papers around about new realtime GI techniques.

The new Fog Creek Office is pretty nice, and as
usual Joel is right on about spending money to make programmers more productive. I disagree with some of the details,
they don't really need to spend so much on fancy design and the common areas, but stuff like the mechanical height
adjustable desks is super awesome. Of course you've got to offer a lot of perks to get people to work on FogBugz.
ZOMG that's got to be some of the more boring programming work in the universe. Also, WTF how many people does it
take to make fucking FogBugz !? We evaluated a bunch of bug trackers at OddWorld, and FogBugz was okay, but it's very
simple and very restrictive, it's not overloaded with crazy features the way the competitors are;
it literally looks like something written by 1-3 people (1 Casey or 3 normal people).
I guess it's because if you're a programmer working on FogBugz you only spend 1 hour a day actually working and you
spend the rest of the day dreaming about quitting or killing yourself.

Sunday, December 28, 2008

12-28-08 - Capitalism Reform

A lot of the problems of the corporate system can be traced back to the fact that the individual actors cannot be held responsible.
Executives basically have carte blanche to commit crimes behind the corporate veil, and the liability is subsumed by the corporation.
Corporations can be spun off with debts and legal liabilities, corporations can be dissolved and reformed, moved to other countries
or states where the laws treat them better, etc.

The basic idea of capitalism is that if individuals make decisions in their own best interest, the total value of the system is maximized.
That does not hold true when you have a government structure which allows individuals to take the profit and pass on the risk (such as
all the mortgage brokers who got commisions on loans but passed on the actual loan risk to others ; if they actually had to offer the loan
capital themselves they would have been more careful and this never would have happened).

I've been thinking about this a while, and something that occurs to me is that there's really no need for the whole idea of a "corporation",
nor is a there a need for "stock". The corporation is an artificial construct which acts like an individual but actually has many more rights
than an individual.

Who needs corporations? Instead the companies should just be owned by individuals. In the end, that individual owner is fully responsible
for the actions of the company. The company can be sold to another, but the legal liability for actions during that time sticks with the
single owner. Corporations can no longer do things like pay out dividends when they have massive debt. Sure, they can still pay their owner,
but there is no "corporate veil" - the owner is liable for all the corporation's debts, so the money is not being hidden away in a place it
can't be retreived.

Who need stocks? Stockholders (and brokers) don't have the ability to really analyze companies and pick stocks well, they're just gambling.
What do companies need stocks for? To raise cash for expansion. Well they can do that just by issuing bonds. Individuals can buy the bonds
if they have faith in the company. The company gets cash, and the individuals get a return on their investment and capital flows to where
it's needed just fine with no need for stock. The good thing about this is the company has a straightforward debt to the bondholders, and
the bondholders are offered a clear risk vs return in buying the bond.

Furthermore, bond issues should go through a rating company, and that rating company should be required to insure those bonds! We got rid of
stocks so there's no longer a problem of all the bogus stock hawkers, but you still might have a problem of bogus risk ratings on bonds. That's
easy to fix - make the bond rater take the penalty for mis-rating the bonds. Boom rating qualities will be far more accurate and more conservative.
The insurance goes like this : when you buy a bond you can choose to just buy it normally without insurance, but if you choose, the bond rater
is required to give you insurance at a fixed fee based on the rating that they gave, eg. AAA= 1% fee, Baa = 3% fee or whatever; the point is that by
assigning a rating they are basically picking how much vig they would need to profitably insure that bond.

You can still have things like "hedge funds" - the hedge fund manger just personally owns the funds, various people give him big loans, he runs
the money and pays back the profit.

Now I'd also like to get rid of the Fed entirely and get rid of the FDIC but there may be more complaints about that. Getting rid of the FDIC
is relatively easy, you just require banks to purchase insurance at market rate from various insurers instead of giving it away for basically free.
Also getting rid of corporations means the owners are personally responsible for debts if the bank defaults so they would be less likely to default.

ADDENDUM : yes I know this is not practical or realistic, but I do think it's interesting as a thought experiment. I believe most of the
reasons that people cite for why "we need stocks" are bogus.

1. "We need stocks to create enough aggregate wealth to fund big projects" ; not true, in fact even today those kinds of big capital-intensive
projects don't raise money through stocks, they raise money from hedge funds and other large private investors; it's always been that way.

2. "We need stocks to reward entrepreneurs and early employees" ; if you mean that you need stocks to randomly massive over-reward some people
and not others, then perhaps, but really this is just a big randomizer. Without stocks entrepreneurs can still get a huge payday by selling
their company, or if it's actually a profitable company, they can just keep ownership of it and make a profit with it! NO WAI. In fact, the
real benefit of stocks here is for people who create bogus non-profitable companies and somehow manage to convince others that it has good
prospects and the stock should be worth more than it is. Early employees could easily be given profit-sharing agreements or conditional bonds
which would reward them handsomely. In fact, making all these rewards more accurately tied to the real value of the company would improve
the efficiency of capital markets.

3. "Stocks provide a way for the average individual to invest and grow their wealth" ; Individual selective stock ownership has been pretty
widely shown to be a bad thing. People don't have the time or skill to invest wisely, and so are best off just investing in large funds.
You could just as easily invest in large funds without stock ownership.

In fact, so far as I can tell, the biggest benefit of stocks and public corporations is in fact the exact thing I'm trying to eliminate -
that is, responsibility for actions. If the individual running the company really had to take responsibility, they would be far less willing
to take risks. Lots of exciting companies might never have happened because they were too risky. The ability to create a diffuse corporate
veil is in fact very beneficial in some ways, because some amount of illogical risk is actually good for the economy. (it's an old saying
that if people really understood how hard and risky it was to start a company, nobody would ever do it).

Let me illustrate foolish financial risk taking through an analogy to gambling. (Gambling with an edge is just "investing").

Consider a simple thought experiment. You are given the opportunity to bet on a coin flip where you an edge (in this example you win 53%
of the time). You start with 1.0 monies. You get to bet a fixed fraction of your bankroll on each flip. eg. you could bet 1/4 of your
bankroll on every flip, or 1/5 on each flip, but you aren't allowed to change the fraction. Note that as long as you pick a fraction < 100%
you can never go broke. You must always bet all 100 flips.

What is the best fraction to bet ? Well, if you only care about the *average* return, the best fraction is 100%. That should be obvious if
you think about it a second. Every extra dollar that you can put in this bet means more profit on average, so you should put as much as possible.
But at the same time, if you bet 100% you almost always go broke. In fact only once in 2^100 do you make any profit at all.

It's a little unclear exactly what metric to use to decide which strategy is best, so lets go ahead and look some numbers :

winPercent : 53% , num bets = 100, num trials = 262144

fraction average sdev profit% median
1/ 2 19.22 1051.56 1.69% 0.00
1/ 3 7.24 422.89 13.48% 0.02
1/ 4 4.43 58.10 24.17% 0.18
1/ 5 3.30 20.66 30.92% 0.44
1/ 6 2.70 10.43 38.30% 0.67
1/ 7 2.35 5.79 46.10% 0.85
1/ 8 2.11 3.94 46.03% 0.97
1/10 1.82 2.30 54.18% 1.10
1/12 1.64 1.61 53.96% 1.17
1/14 1.53 1.24 61.69% 1.19
1/16 1.45 0.99 61.75% 1.20
1/19 1.37 0.77 61.94% 1.19
1/22 1.31 0.62 61.76% 1.18
1/25 1.27 0.53 62.10% 1.17
1/29 1.23 0.43 69.36% 1.16
1/33 1.20 0.37 69.22% 1.15
1/38 1.17 0.31 69.21% 1.13
1/43 1.15 0.27 69.40% 1.12
1/49 1.13 0.23 69.25% 1.11

fraction = fraction of bankroll placed on each bet
average = average final bankroll
sdev = standard deviation
profit = %% of people in the trial who had any profit at all
median = median average bankroll

(note the sdev for the 1/2 and 1/3 bet cases is very inaccurate ; I foolishly did a monte carlo simulation rather than
just directly counting it which actually is easy to do with the binomial theorem; the average return is exact)

It's easy to see in the table that the average profit is maximize for very large (risky) bets. But if you look at the profit% to see what
portion of the population is benefiting, you see that in the risky bet cases only a tiny part of the population is benefiting and the
high average is just because a very few people got very lucky.

It seems to me just intuitively that maximizing the median seems to produce a pretty solid strategy. For a win percent of 53% that corresponds
to a bet fraction of 1/16. Another sort of reasonable approach might be to choose the point where > 50% of the population shows a profit,
which would be at 1/10. (these both also occur right around the area where the sdev becomes smaller than the mean, which might be another
good heuristic for picking a strategy). Using much smaller fractions doesn't really increase the profit% very much, while using larger fractions very
rapidly increases the variance and descreases the profit%.

In any case I believe the analogy to finance is amusing. In a simple experiment like this it's trivial to see that when everyone is taking
too much risk, it might be very good for the *whole*, but it's bad for almost every individual. It can be very profitable for a tiny portion
of the population. It's also trivial to see here that any measure of the sum of the population (such as the GDP) is pretty useless when
really measuring how well your strategy for the economy is working.

ASIDE: I used the fast median code from here . There are basically
two sane fast ways to find a median. In most cases the fastest is the histogram method, which is basically just a radix sort,
assuming you have 32 bit values or something reasonable like that.
Basically you do a radix sort, but rather than read the values back
out to a sorted array, you just step through the histogram until you've seen half the values and that's your median. If you can't do
a radix for some reason, the other sane way is basically like quicksort. You pick a pivot and shuffle values, and then rather than
descending to both sides you only need to descend to one side. This is actually still O(N) , it's not N log N, because N + N/2 + N/4 ... = 2*N
which is of course O(N). Both of these methods can find the Kth largest element, the median is just a special case.

Friday, December 26, 2008

12-26-08 - In Defense of Less Typing

In the C++ rant we got a bit into the discussion of "that just reduces typing". It's become a common wisdom
these days that "anything that just reduces typing is not of significant importance in programming". This is
sort of a reaction to the bad old days where developers put too much emphasis on reducing the time it took to
make a first draft of the code. Now people like to repeat the plathitudes that "first draft is only 10% of
dev time, debuggability and readability and maintainability are what really matter".

Yes, yes, that is all true, but I think people miss the forest for the trees here in some cases.

Every character you type is a chance for a bug or a mistake. For one thing there are typos, but there are
also just brain farts. The less you type when writing a piece of code, the more likely it is to be correct.

(I should emphasize the fact that reducing code duplication is very good for many reasons that I don't go into detail
much in this rant, and those are mainly the main reason to merge duplicate code; I'm talking about cases where the
code is not exactly duplicated, but is similar, or where you have a choice between making a very simple API which requires
a lot of typing by the client, or a more complex API which has very simple usage for the client).

A lot of good programmers now are adopting the idea of exposing simple minimal C-style APIs that leave usage up to
the client. There are a lot of things to like about that (for example, Sean's stb.h style thing for simple functionality is
in fact wonderfully easy to integrate and steal snippets from),
but there are also bad things about it. I think good programmers overestimate their ability to write simple
usage code without mistakes. For example, you might think that you don't need a class to encapsulate a 32-bit Color, you
can easily just use a 4-byte array or pass around a dword and do the shifts by hand - but I have seen bug after bug
from small mistakes in that kind of code, because if you write the same thing over and over, or copy-paste and try to change
a bunch of code by hand, there is some small chance of mistake each time you do it.

It's funny to me that good programmers in game dev are going in two directions at the same time. One direction is to make code
very easy to follow by simple visual inspection. Many major people in game dev are pretty high on this these days.
The basic idea is to use C-style imperative programming, make the action of each line obvious to simple visual inspection, reduce
segmentation of code into function calls and prefer simple long linear code blocks (rather than factored out functions, conditionals,
objects, etc).
There are certainly merits to that. The other
direction people are going is custom metaprogramming and local language redefinition. Many of these people want coding languages where you can locally redefine the rules of the language and do custom markup for things like
network mirroring, thread fiber spawning, local-global state memory variable differentiation, etc. This kind of stuff would make
code completely impossible to understand by simple visual inspection without intimate undestanding of how all the extra meta-language
rules work. These ideas also have a lot of merit, because writing micro-threaded massively parallel code in plain C-style C++ is
really difficult and ugly, and having custom features would make it much better - but these two directions are totally conflicting.

While I'm ranting about opposite directions, let me also rant about the idea that something is a good idea for "library design" but
not for within your app (or vice versa). IMO Coding = Library Design. Most of the complex code that I write is in "libraries" for
myself. Libraries are just chunks of functionality that you want to expose in a compact interface. Well, that's what you should be
doing all the time. Coding is just writing a bunch of libraries, then the "app" is just tying together the "libraries".

So, for example, Casey's excellent talk about good library design (things like exposing multiple levels of interface from very simple to
nitty gritty, and not forcing a usage pattern on the client) are just good ways to write code *always*.

I don't trust the me of one year ago, nor do I trust the me of one year in the future. I need to write API's
for myself that make me write the right code. Part of that is all the things I've often written about before
(self-validating API's, API's that are impossible to use wrong), but part of it is just plain less typing.
If the API makes me (the client) write a whole bunch of code to do the simple things I often want to do - that
makes it far more likely I will do it wrong.

Also I believe the illusion of choice is a horrible thing. If there's really only one or two reasonable ways to use a system, then just
expose that to me. Don't give me what looks like a bunch of flexible components, but they only really work right if you do one specific

Addendum : okay I'm bored of this topic and I'm sure you are too, but I feel like I started it so I should wrap it up a bit more.

Paul Graham has this thing "Succinctness is Power" that's sort of similar to this
rant. As usual he writes it well, but I think he's a little bit wrong. The issue that I believe is important, which is what I'm trying
to talk about here is :

Reducing the number of choices that the programmer has to make in order to write correct code.

Part of that is reducing typing - but the crucial thing is reducing typing when there is actual choice in the alternatives. That is, if
it's something you *have* to type, that's not bad. For example a very verbose language like COBOL is not inherently bad due to its verbosity
(cobol is horrible for other reasons).

Making code that works correctly with minimal typing (and makes compile errors if you use it wrong) is the goal. So part of what I'm getting
at here is using short common words that it's easy for the programmer to get right, using highly constrained classes instead of very general ones, etc.

Part of the credo goes like this :

remove the option to do things that you never want to do

make it hard to do things that you rarely want to do

make it easy to do the right thing

As an example - iterators are cool even when they save almost no work. Say for example you have something like a doubly linked list class.
Many of the simple C guys would say "hey you can just write code to traverse the linked list", and you write client code like :

for(Node * n = list.head; n != list.head; n = n->next)

That's easy right, you're a good programmer, you can just write that loop. No. You can't, that's what I'm trying to say with this rant.
I mean, yes, you can, but you had a 1% chance of introducing a bug because you had to write code.

Writing code is bad.

Instead you could make your Node look like an iterator. You could give it standard names like begin() and end(). Now instead of writing
code you can just use for_each , or even just copy-paste or spit out a loop that you've done a million times :

for(Node::iterator it = list.begin(); it != list.end(); ++it)

Is safer because it's standard. On a similar note, using a constrained object like an iterator is safer than using an int, because every
time you use an int people get tempted to do evil things. How many bugs have I seen because people try to get clever with their loop
iteration? Maybe they count backwards for efficiency and use and unsigned type by mistake. Or they pull the ++i out of the for() and then
forget to do it due to a continue. Or they use the "i" outside of the for loop and bugs get introduced.

Lots of people are anti-OOP these days. I love OOP ; no, not deep inheritance trees and lots of data inheritance, and whatnot, but the basic
idea of coding in terms of objects that impose constraints and conditions on their use. The best way for me to program is to build components
and helpers which make expressing the program easy.

Tuesday, December 16, 2008

12-16-08 - Libraries and Cinit

I need a kind of mini class-factory for Oodle. This is for when I load a paging bundle that's full of various resources,
I need to make each one. (BTW there will be a "low level" usage pattern for Oodle where you just load a paging bundle and
the game gets a handle to the bundle and does whatever it wants with it. This is for the "high level" layer that automates
a lot of things for you but is optional).

I guess what I'm going to have to do is require the game to give me a creator function pointer that knows how to make all
the objects. eg. it would give me something like

void * (*fp_factory) (int objectType);

and fp_factory would point to something like

void * GameObjectFactory(int type)
case OBJECT_PLAYER : return (void *) new Player;
case OBJECT_ENEMY: ...

Or something. As a game developer I hate that kind of global registry, because when you add a new object type you have
to remember to go update this global registry file, which becomes a frequent merge disaster for source control, etc. I really
like having everything you need to make a game object in a single CPP file. That means objects should be self-registering.

The way I usually do self-registering objects is with a little class that runs at cinit. Something like :

#define IMPLEMENT_FACTORY_PRODUCT(classname) namespace { ClassFactory::Registrar classname##registrar( classname::Factory , typeid(classname) ); }

then in Player.cpp you have :


That's all fine and dandy, but it's not so cool for me as a library maker.

For one thing, doing work during CINIT is kind of naughty as a library. I've been trying to make it a rule for myself that
I don't use object constructors to do cinit work the way I sometimes like to do. It's a perfectly normal thing to do in C++,
and if you are writing C++ code you should make all your systems instantiate-on-use like proper singletons so that CINIT
objects work - BUT as a library maker I can't require the clients to have done all that right. In particular if I do something
like allocations during CINIT, they might run before the client does its startup code to install custom allocators or whatever.

For another thing, there are problems with the linker and cinit in libraries. The linker can drop objects even though they are
doing cinit calls that register them in global factory databases. There are various tricks to prevent this, but they are
platform dependent and it is a little evil bit of spaghetti to get the client involved in.

I guess I probably also shouldn't rely on "typeid" or "dynamic_cast" or any other runtime type information existing either since
people like to turn that off in the compiler options for no good reason (it has basically zero cost if you don't use it). So without
that stuff I pretty much just have to rely on the game to give me type info manually anyway.

Bleh, I'm just rambling now...

Monday, December 15, 2008

12-15-08 - Denoising

I've been playing around with denoising images a tiny bit. There's a ton of papers on this and I've barely only
scratched the surface, but it strikes me that the majority of the researches seem to be doing silly things that
are ill-conceived.

Almost all of them work in the same basic way. They create a prediction of what the current pixel should be
with a local smooth predictor, let's call that 'S'. Then they take the difference from the actual pixel value 'A'.
If the difference is greater than a certain threshold, |S - A| > T , they replace the pixel value with 'S'.

That's just very wrong. It assumes that images can be modeled with a single-lobe Gaussian probability distribution.
In fact, images are better modeled as a blend of several lobes with different means. That is, there is not one single
smooth generator, but many, which are switched or blended between based on some state.

Any single-lobe predictor will incorrectly identify some valid image detail as noise.

I like to make it clear that the problem has two parts : deciding if a pixel is noise or not noise, and then filling in
a replacement value if you decide that the pixel is noise.

My feeling is that the second part is actually not the important or difficult part. Something like a median filter or
a bilateral filter is probably an okay way to fill in a value once you decide that a pixel is noise. But the first
part is tricky and as I've said any simple weighted linear predictor is no good.

Now, ideally we would have a rich model for the statistical generation of images. But I wrote about that before when
talking about Image Doubling (aka Super-Resolution), and we're still very far from that.

In the mean time, the best thing we have at the moment, I believe, is the heuristic modes of something like CALIC, or
the Augural Image Zooming paper, or Pyramid Workshop or TMW. Basically these methods have 6 - 100 simple models of
local image neighborhoods. For example the basic CALIC models are : {locally smooth}, {vertical edge}, {horizontal edge},
{45 degree edge}, {135 degree edge}, {local digital}, {pattern/texture}. The neighborhood is first classified to one
of the heuristic models, and then a prediction is made using that model.

We can thus propose a simple heuristic noise detection algorithm :

Bayesian Noise Detection :

N = current local neighborhood
A = actual pixel value

P(M|N) = probability of model M given neighborhood N
P(A|M) = probability of pixel A was generated by model M


P(A|N) = argmax{M} P(A|M) * P(M|N)

then classify A as noise if

P(A|N) < T

for some threshold T

(I don't specify how the P's are normalized because it just changes the scaling of T,
but they should be normalized in the same way for the whole image)

Note that a very crucial thing is that we are using the argmax on models, NOT the average on models. What we're saying is
that if *any* of our heuristic local models had a high likelihood of generating the value A, then we do not consider it noise.
The only values that are noise are ones which are unlikely under *all* models.

In a totally hand-wavey heuristic way, this is just saying that if a pixel is within threshold of being a locally smooth
value, or an edge value, or a texture, etc. then it's not noise. If it fits none of those models within threshold, it's

I started by looking at the Median Filter and the Bilateral Filter. There have been some cool recent papers on fast
Median Filtering :

Constant Time Median Filter

Weiss Log(r) Median Filter

Fast Bilateral Filter ; Sylvain Paris and Fr�do Durand +
good references

Siggraph Course on Bilateral Filtering

Those are all very worth reading even though I don't think it's actually super useful. The fast median filter approaches use
cool ways of turning an operation over a sliding window into incremental operations that only process values getting added in and removed
as you step the window. Median filter is a weird thing that works surprisingly well actually, but it does create a weird sort of Nagel-drawing
type of look, with nothing but smooth gradients and hard edges. In fact it's a pretty good toon-shading process.

BTW the fast median filters seem useless for denoising, since they really only matter for large r (radius of the filter), and for denoising you
really only want something like a 5x5 window, at which size a plain brute force median is faster.

Bilateral filter actually sort of magically does some of the heuristic cases that I've talked about it. Basically it makes a prediction using
a filter weighted by distance and also value difference. So similar values contribute and disparate values don't. That actually does a sort of
"feature selection". That is, if your pixel value is close to other pixel values in a vertical edge, then the bilateral filter will weight
strongly on those other similar pixel values and ignore the other off-edge pixels. That's pretty good, and the results are in fact decent,
but if you think about it more you see the bilateral filter is just a ghetto approximation of what you really want. Weighting based on pixel
value difference is not the right way to blend models, it makes no distinction about the context of that value difference - eg. it doesn't
care if the value difference comes from a smooth gradient or a sudden step. As others have noted, the Bilateral Filter makes the
image converge towards piecewise-constant, which is obviously wrong. Getting towards piecewise linear would be better, piecewise bicubic
would be better still - but even that is just the very easiest beginning of what the heuristic estimator can do.

NL Means is a denoising algorithm which is a bit closer to
the right idea; he's definitely aware of the issues. However, the actual NL means method is poor. It relies on closely
matching neighborhoods to form good predictions, which anyone who's worked in image compression or super-resolution knows
is not a good approach. The problem is there are simply too many possible values in reasonably sized neighborhoods. That
is, even for a moderately sized neighborhood like 5x5, you have 2^8^25 possible values = 2^200. No matter how much you train,
the space is too sparse. It may seem from the NL Means formulation that you're weighting in various neighbors, but in reality
in practice you only find a few neighbors that are reasonably close and they get almost all of the weight, and they might not
even be very close. It's like doing K-means with 2^200 possible values - not good.

There's a lot of work on Wavelet Denoising which I haven't really read. There are some obvious appealing aspects of that. With wavelets
you can almost turn an image into a sum of {smooth}+{edges}+{texture}+{noise} and then just kill the noise. But I don't really like the
idea of working in wavelet domain, because you wind up affecting a lot of pixels. Most of the noise I care about comes from cameras,
which means the noise is in fact isolated to 1 or 2 adjacent pixels. I also don't like all the research papers that want to use 9x9 or
larger local windows. Real images are very local, their statistics change very fast, and pulling in shit from a 9x9 window is just going
to mess them up. IMO a 5x5 window is the reasonable size for typical image resolutions of today.

BTW one thing I've noticed with my camera noise images is that the fucking camera JPEG makes the noise so much harder to remove. The
noise looks like it's originally just true single pixel noise, but when it goes through the camera JPEG, that single-pixel peak is
really unfriendly to the DCT, so it gets smeared out, and you wind up having noise lumps that look like little DCT shapes. To specifically
denoise photos that have gone through JPEG you probably have to work on 8x8 blocks and work directly on the DCT coefficients.
(also the Bayer pattern demosaic obviously spreads noise as well; ideally you'd get to work on the raw taps before jpeg, before
the camera denoise, and before the Bayer demosaic).

ADDENDUM : a lot of the denoise people seem to be trying to perfect the Playboy Centerfold algorithm, that makes photos look
extremely smoothed and airbrushed. Often if you're not sure a pixel is noise it's best to leave it alone. Also, all the methods
that use a pixel-magnitude threshold value for noise are wrong. The threshold for noise needs to be context sensitive. That is,
in smooth parts of the image, you might be able to say that a pixel is probably noise when it's off from expectation by only 1 or 2
pixel values. In chaotic textures parts of the image, a pixel might be off by 20 values or more and you still might not be able to
say it's noise. The correct parameter to expose to the user is a *confidence*. That is, I want to do something like replace all
pixels which the algorithm is >= 90% confident it can improve.

Another problem I've seen with the majority of the denoisers is that they create structures from noise. If you run them on just
staticy junk, they will form local flat junks, or linear bits or weird feathery patterns. This is because even in random noise
there will be little bits that have similar values so they become seeds to create structures. This is very bad, the weird structures
that are formed by this "denoising" are much worse than the original static, which is pretty inoffensive to the eye.

Marc sent me the link to GREYCstoration a free denoiser based on
the image manifold PDE research. I don't like that this technique is becoming known as "PDE" - PDE just means partial differential equation;
in this case it's a diffusion equation, in particular a variant of anisotropic diffusion with different diffusion rates based on
the local curvature - eg. it diffuses across smooth areas and not across edges. (that's actually an old basic technique,
the new thing he does is the diffusion follows contour lines (but is actually 2d, just weighted) and works on all components).
It looks pretty decent. It's actually more
exciting to me for super-resolution, it looks like it does a pretty good job of image super-resolution.

Monday, December 8, 2008

12-08-08 - DXTC Summary

I thought I should fix some things that were wrong or badly said in my original DXTC posts :

DXTC Part 1

DXTC Part 2

DXTC Part 3

DXTC Part 4

On the "ryg" coder : there was a bug/typo in the implementation I was using which gave bogus results, so you should
just ignore the numbers in those tables. See for correction :

Molly Rocket Forum on Ryg DXTC coder
. Also I should note he does have some neat code in there. The color finder is indeed
very fast; it is an approximation (not 100% right) but the quality is very close to perfect. Also his "RefineBlock" which does
the Simon Brown style optimize-end-from-indices is a very clever fast implementation that collapses a lot of work. I like the
way he uses the portions of one 32 bit word to accumulate three counters at a time.

I also mentioned in those posts that the optimized version of the Id "real time DXTC" bit math was "wrong". Well, yes, it
is wrong in the sense that it doesn't give you exactly the same indeces, but apparently that was an intentional approximation
by JMP, and in fact it's a very good one. While it does pick different indeces than the exact method, it only does so in cases
where the error is zero or close to zero. On most real images the actual measured error of this approximation is exactly zero,
and it is faster.

So, here are some numbers on a hard test set for different index finders :

exact : err: 31.466375 , clocks: 1422.256522

id : err: 31.466377 , clocks: 1290.232239
diff: 0.000002

ryg : err: 31.466939 , clocks: 723.051241
diff: 0.000564

ryg-t : err: 31.466939 , clocks: 645.445860
diff: 0.000564

You can see the errors for all of them are very small. "ryg-t" is a new one I made which uses a table to turn the dot
product checks into indexes, so that I can eliminate the branches. Start with the "ryg" dot product code and change the inner loop to :

const int remap[8] = { 0 << 30,2 << 30,0 << 30,2 << 30,3 << 30,3 << 30,1 << 30,1 << 30 };

for(int i=0;i < 16;i++)
int dot = block.colors[i].r*dirr + block.colors[i].g*dirg + block.colors[i].b*dirb;

int bits =( (dot < halfPoint) ? 4 : 0 )
| ( (dot < c0Point) ? 2 : 0 )
| ( (dot < c3Point) ? 1 : 0 );

mask >>= 2;
mask |= remap[bits];

I should note that these speed numbers are for pure C obvious implementations and if you really cared about speed you should
use SSE and who knows what would win there.

Now this last bit is a little half baked but I thought I'd toss it up. It's a little bit magical to me that Ryg's Mul8Bit (which is
actually Jim Blinn's) also happens to produce perfect quantization to 565 *including the MSB shifting into LSB reproduction*.

I mentioned before that the MSB shifting into LSB thing is actually "wrong" in that it would hurt RMSE on purely random data,
because it is making poor use of the quantization bins. That is, for random data, to quantize [0,255] -> 32 values (5 bits)
you should have quantization bins that each hold 8 values, and whose reconstruction is at the middle of each bin. That is,
you should reconstruct to {4,12,20,...} Instead we reconstruct to {0,8,...247,255} - the two buckets at the edges only get
4 values, and there are some other ones that get 9 values. Now in practice this is a good thing because your original data
is *not* random - it's far more likely to have exact 0 and 255 values in the input, so you want to reproduce those exactly.
So anyway, it's not a uniform quantizer on the range [0,255]. In fact, it's closer to a uniform quantizer on the range

I think it might actually just be a numerical coincidence in the range [0,255].

The correct straight-forward quantizer for the DXTC style colors is

return (32*(val+4))/(256+8);

for R. Each quantization bin gets 8 values except the top and bottom which only get 4. That's equivalent to quantizing the range [-4,256+4) with a uniform 8-bin quantizer.


1/(256 + 8) = 1/256 * 1/(1 + 8/256)

We can do the Taylor series expansion of 1/(1+x) for small x on the second term and we get ( 1 - 8/256 + 64/256/256) up to powers of x^2

So we have

( (32*val+128) * ( 1 - 8/256 + 64/256/256) ) >> 8

And if we do a bunch of reduction we get

return ( 31*val+124 + ((8*val+32)>>8) ) >> 8

If we compare this to Mul8bit :

return ( 31*val+128 + ((val*31 + 128)>>8)) >> 8;

it's not exactly the same math, but they are the same for all val in [0,255]

But I dunno. BTW another way to think about all this is to imagine your pixels are an 8.inf fixed point rep of an original floating
point pixel, and you should replicate the 8 bit value continuously. So 0 = 0, 255 = FF.FFFFFF.... = 1.0 exactly , 127 = 7F.7F7F7F..

BTW this reminds me :
When I do Bmp -> FloatImage conversions I used to do (int + 0.5)/256 , that is 0 -> 0.5/256 , 255 -> 255.5/256 .
I no longer do that, I do 0->0, and 255 -> 1.0 , with a 1/255 quantizer.

Friday, December 5, 2008

12-05-08 - lrotl

Well I found one x86 ASM widget. I've always known you could do nice fast barrel rotations on x86 but
thought they were inaccessible from C. Huzzah! Stdlib has a function "_lrotl()" which is exactly what you
want, and happily it is one of the magic functions the MSVC recognizes in their compiler and turns into
assembly with all goodness. (They also have custom handling for strcmp, memcpy, etc.)

Oh, I noticed lrotl in OpenSSL which seems to have a ton of good
code for different hashes/checksums/digests/whatever-the-fuck-you-call-them's.

As a test I tried it on Sean's hash, which is quite good and fast for C strings :

RADINLINE U32 stb_hash(const char *str)
U32 hash = 0;
while (*str)
hash = (hash << 7) + (hash >> 25) + *str++;
return hash;

RADINLINE U32 stb_hash_rot(const char *str)
U32 hash = 0;
while (*str)
hash = _lrotl(hash,7) + *str++;
return hash;

stb_hash : 6.43 clocks per char
stb_hash_rot : 3.24 clocks per char

Woot! Then I also remembered something neat I saw today at
Paul Hsieh's Assembly Lab .
A quick check for whether a 32 bit word has any null byte in it :

#define has_nullbyte(x) (((x) - 0x01010101) & ( ~(x) & 0x80808080 ) )

Which can of course be used to make an unrolled stb_hash :

RADINLINE U32 stb_hash_rot_dw(const char *str)
U32 hash = 0;

while ( ! has_nullbyte( *((U32 *)str) ) )
hash = _lrotl(hash,7) + str[0];
hash = _lrotl(hash,7) + str[1];
hash = _lrotl(hash,7) + str[2];
hash = _lrotl(hash,7) + str[3];
str += 4;
while (*str)
hash = _lrotl(hash,7) + *str++;
return hash;

stb_hash_rot_dw : 2.50 clocks

So anyway, I'm getting distracted by pointless nonsense, but it's nice to know lrotl works.
(and yes, yes, you could be faster still by switching the hash algorithm to something that works
directly on dwords)

Tuesday, November 18, 2008

11-18-08 - DXTC

I've been doing a little work on DXT1 compression. The main reason I'm looking at it is because I want to do something
*better* than DXT1, and before I do that, I want to make sure I'm doing the DXTC right.

For now let's gather prior art :

The only real decision the coder gets to make is what the two 565 endpoints are. The other details you just have to get right -
reconstruct the palette interpolants right, and assign the 16 indices to the closest palette entries right. For the moment I
don't care about speed, I just want to make sure the indices are actually the best choices.

All the papers by Ignacio and
J.M.P. van Waveren are modern classics. One thing of note : the "simplified" index finder that JMP has in his original paper
is wrong. On page 12 he says "Evaluating the above expression reveals that the sub expression ( !b3 & b4 ) can be
omitted because it does not significantly contribute to the final result" - apparently that is not true, because when I use
the "rewritten" index finder it produces bad indexes. His original version of the index finder bit twiddle is fine.

One thing I've found with the index finding is that the integer and nonlinear effects are pretty strong and any ideas you have
about planar geometry fail if you assume that your 4 points are colinear (they are not actually).
For example, an obvious idea is to do a dot product test to pick between the 4 points. First dot
product with the middle separating plane, then the two planes between the next pairs of points. This doesn't work. Part of
the problem is because of the bad rounding in the generation of the 1/3 point. The JMP method is actually comparing vs the
true distances, so that part is good.

In more detail, an obvious idea is to do something like this :

int dot = (color - c0) * (c1 - c0);

int bmid = (dot > mid01);
int b02 = (dot > mid02);
int b31 = (dot > mid31);

// using the DXT1 palette order label 0,2,3,1

int threebit = (bmid<<2) + (b02<<1) + b31;

int index = remap[threebit];

// remap is an 8-entry table that remaps from 3 bits to 2 bits to give an index

That "works" just as much as the "ryg" code works or the "Extreme DXT" method of finding indexes work - which is to say that
it doesn't work.

FastDXT appears to be an implementation of the id "Real Time DXT"

Ericsson Texture Compression
is similar to DXTC but different; this is a pretty good paper, there are some interesting ideas here like the T and H
blocks. It gets slightly better quality in some cases. It's obvious that you can beat DXTC by having a base color and a
log-scaled delta color, rather than two end points. The two 565 end points is wasteful; you could for example do a 777
base color and a 444 log scaled delta.

Tight Frame Normal Map Compression is similar to
DXT5 (really to 3Dc) with some small improvement on normal maps.

Extreme DXT Compression by Peter Uliciansky is indeed
super fast. However, it has some errors. The division method that he uses to assign the indeces is not
correct. You can improve it by offsetting the end points correctly to put the division quantization boundaries in the
right place, but even then it's still not exactly correct unless the diagonal of the color bbox is along {1,1,1}. The
error from this is pretty small and he's going for max speed, so it's semi reasonable what he does.
Note that using the full range of the bbox for the division
but insetting it to make the dxt1 colors (as he does) improves the indeces slightly, but it's not actually the correct range scale.

In more detail :

To find indices for the DXTC colors c0 and c1

You should use

base = c0 + (c0 - c1)/6

range = (c1 - c0) * 4/3

index = (color - base) * 4 / range;


index = (color - base) * 3 / (c1 - c0);

not :

index = (color - c0) * 4 / (c1 - c0);

But even if you do that it's still wrong because the planes through color space do not actually separate the colors from
their closest palette entry.

MSDN compressed format documentation now has the
right rounding for the 1/3 points, but the old DirectX8 docs have the wrong rounding. The old docs said

color_2 = (2*color_0 + 1*color_1 + 1)/3
color_3 = (1*color_0 + 2*color_1 + 1)/3

Which is actually better, but is not what the standard or the hardware does.

Simon Brown's Squish is very good. His Cluster
Fit idea is elegant and clever and produces quite good quality.

There's a thread on MollyRocket
about DXTC with some convenient code from "ryg". It's nice
simple code, but it has some errors. The way he finds the indices is the dot product method which is
wrong. Also the way he computes the 1/3 and 2/3 colors for the palette using LerpRGB is wrong, it
rounds differently that the hardware really does.

One thing that some simple code gets wrong is the way to quantize from 24 bit to 565. You can't
just take the top bits, because the 565 values are not reconstructed to the middle of their bins.
You need to correct for the offsetting. You can either use the true range, which is something like -4 to 259,
or you can skew the bits in the quantizer. The "ryg" code has a perfect quantizer that's very neat
in the "As16Bit" function. An example is that the value 250 should map to 30 in 5 bits, not

BTW the YCoCg-DXT5 method is pretty good on current hardware, but I'm not really looking at it. It's obviously
inherently wasteful, it's just a way of making use of the current hardware, but it's not really what you'd want
to be doing. It's also very large, at around 2.6 bits per byte (8 bits per pixel).

More advanced topics to come.

Friday, November 7, 2008

11-07-08 - Randoms

I'm really sick and kind of out of it, so this might not be totally coherent, but here goes :

I randomly found this :
Nick Chapman's renderer blog ; he's just some dude who wrote
a ray tracer and blogged about it; and now apparently it's A commercial ray tracer
called Indigo
. Anyway the blog is amusing and has lots of links to good papers and such. I've always wanted to write a physically
accurate raytracer with all-frequency light propagation. It seems pretty easy and fun.

Drew asked me about random numbers so I was searching around and found
Pixelero . Pixelero has lots of rally awesome Flash code. In particular he's
got some nice graphs of the distributions you can make by doing some simple transforms on random numbers :
here . More on
this in a second.

GoodPracticeRNG by D.T. Jones is a good short simple paper on
random number generation and talks about the basic issues for uniform random generation.

Drew asked about generating gaussian or other probability distributions of random numbers. I'm mainly interested in approximate
distributions that are tweakable and useful for random events in games. We don't really want true Gaussians that have infinite
domain, we want something that's kind of like a Gaussian over a finite range, with a hump in the middle at lower at the edges.

There are lots of methods for this in the literature, but it's mainly pretty hard core and exact stuff. I'm interested in clever little
hacky tricks. If you want some background in the serious stuff, the most awesome reference is :
Non-Uniform Random Variate Generation book by Luc Devoye .
this handbook seems to be a short version of some
of the same material. The beginning is a good read for anyone.
( new link )

I'll briefly mention if you want a true Gaussian everyone likes the polar form of the Box-Muller transform. You can look on Wikipedia for
more info about this stuff.

Transforms are a handy way to distort probability distributions. Typically you'll be generating a uniform random with a standard random
number generator, then you want to transform it to create a nonuniform distribution.

Say you want generate a random variable X with a probability distribution P(X). You want to do a random draw where the chance of each
is P(X). To do this you sum the P's to make a cumulative probability function, C(X) = P(X) + C(X - step). (or C(X) = integral P(X)).
I assume we're on the interval [0,1] so C(0) = 0 and C(1) = 1.

Now you just generate a random R = rand() in [0,1] , and you find what "bucket" it goes in, that is find X such that C(X - step) < R < C(X).

That search could be slow, but notice that if we could invert C, then it would just be a function call : X = C^-1(R).

Now, some cumulative probability functions are explicitly invertible. Luc Devoye lists a few common useful ones in his book. Others are
not invertible directly, but by adding an extra variable or munging them somehow they are invertible, such as the Gaussian with the Box-Muller

Low-order polynomials are invertible, but they're a pain. If your probability distribution is cubic, like a bezier curve, the CDF is quartic,
and to invert it you have to solve a quartic equation analytically. That's possible but ugly. For example, a simple hump probability would
be P(x) = x*(1-x) (unnormalized). The CDF is 3x^2 - 2x^3 (normalized). You can invert that directly by solving the cubic, but it's ugly.

A handle simple case is to approximate the CDF as piecewise linear. Then you can trivially invert each piecewise linear portion. So you just
look at R and select the linear region you're in and then do a simple slope linear invert thing to make X.

Another approach is to just to play directly with C^-1. C^-1 is a kind of unit interval remapper. It has the properties C^-1(0) = 0 and
C^-1(1) = 1. (for a symmetric probability distribution it also has other constraints). We should be quite familiar with unit interval
remappers and we can just play around with things here. You can tweak directly on C^-1 and see what kind of shape of probability distribution
that gives you.

For example, C^(-1)(t) = t , the identity function, is just uniform constant probability. Anywhere that C^-1 is flatter corresponds to
higher probability, anywhere it's steep corresponds to low probability.
To make something like a Gaussian that has low
probability at the edges and high probability in the middle, what you want is a symmetric function that's steep at first, then flattens out,
then is steep again. Like a "smooth step" but reflected across the 45 degree axis.

One such function is "sinh". You can look at graphs of sinh to see
what I mean. It's not ideal, I'm sure there are better choices to play with, but sinh works. (btw you might think x^3 would work; no it
does not, because it becomes completely flat at X = 0 which corresponds to infinite probability).

You can create various shapes with sinh by using different parts of it : curve = sinh( x * K ) / sinh( K ) , where K is a parameter that
affects the curve. As K -> 0 it becomes linear, as K gets larger it becomes more severe.

Another option is to use a cubic curve. For a symmetric probability distribution you also have that C^-1(0.5) = 0.5 and you only need to
design have the curve, the other half is given by reflection. You thus can design with f(0) = 0, f(0.5) = 0.5, and then set the slope you
want at 0 and 0.5 and that gives you a cubic. The slope inversely corresponds to setting the probability P(X) at those points.

The mathematics of combining random numbers is quite interesting. Maybe if I wasn't sick this would be more obvious and less
interesting, but right now it seems really cute.

A random unit float is a box filter. That is, if you plot the probability distribution it's a square step on [0,1].

Adding random numbers is equivalent to convolution of the probability distributions.

Add two random unit floats and you get a linear step filter, that's a convolution of a box with a box. If you add another
random unit, you convolve with the box again and get a quadratic step filter. Add another random you get a cubic. Note
that these are also the polynomial approximations of the Gaussian. I believe I've talked about building up the Gaussian
from polynomials like this before. In any case it's in my Filter.h in cblib.

To lerp probability distributions you random select. eg. lerp(P1,P2,t) is if ( rand() < t ) return rand1 else return rand2.

Pixelero showed some interesting stuff with multiplying randoms. Apparently addition is convolution, I'm not sure what
multiplication does exactly.

You can also create distributions by doing random selection vs. other randoms, stuff like x = rand1(); if x < rand2() return x;
By using functions on the conditional you can create lots of shapes, but I'm not sure how to understand what kind of shapes you get.
Luc Devoye's book has got some complex conditional tests like this that generate all kinds of shapes of probability distributions.

Some other good links I found :

basic Gaussian rand ,
stocc source code with lots of random generators ,
anuran automatic code gen for random generators