I've been playing around with denoising images a tiny bit. There's a ton of papers on this and I've barely

scratched the surface, but it strikes me that the majority of the researchers seem to be doing silly things that

are ill-conceived.

Almost all of them work in the same basic way. They create a prediction of what the current pixel should be

with a local smooth predictor, let's call that 'S'. Then they take the difference from the actual pixel value 'A'.

If the difference is greater than a certain threshold, |S - A| > T, they replace the pixel value with 'S'.
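In code, that basic scheme looks something like this (a minimal Python/numpy sketch; the 3x3 box average standing in for the smooth predictor 'S' is just one arbitrary choice):

```python
import numpy as np

def naive_denoise(img, T):
    # Predict each pixel with a local smooth predictor S (here the
    # average of the 8 surrounding pixels), then replace the actual
    # value A with S whenever |S - A| > T.
    img = img.astype(np.float64)
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = img[y-1:y+2, x-1:x+2]
            S = (window.sum() - img[y, x]) / 8.0  # smooth prediction
            A = img[y, x]
            if abs(S - A) > T:
                out[y, x] = S
    return out
```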

That's just very wrong. It assumes that images can be modeled with a single-lobe Gaussian probability distribution.

In fact, images are better modeled as a blend of several lobes with different means. That is, there is not one single

smooth generator, but many, which are switched or blended between based on some state.

Any single-lobe predictor will incorrectly identify some valid image detail as noise.

I'd like to make it clear that the problem has two parts : deciding if a pixel is noise or not noise, and then filling in

a replacement value if you decide that the pixel is noise.

My feeling is that the second part is actually not the important or difficult part. Something like a median filter or

a bilateral filter is probably an okay way to fill in a value once you decide that a pixel is noise. But the first

part is tricky and, as I've said, any simple weighted linear predictor is no good.

Now, ideally we would have a rich model for the statistical generation of images. But I wrote about that before when

talking about Image Doubling (aka Super-Resolution), and we're still very far from that.

In the mean time, the best thing we have at the moment, I believe, is the heuristic modes of something like CALIC, or

the Augural Image Zooming paper, or Pyramid Workshop or TMW. Basically these methods have 6 - 100 simple models of

local image neighborhoods. For example the basic CALIC models are : {locally smooth}, {vertical edge}, {horizontal edge},

{45 degree edge}, {135 degree edge}, {local digital}, {pattern/texture}. The neighborhood is first classified to one

of the heuristic models, and then a prediction is made using that model.
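A toy classifier in the spirit of CALIC's gradient-based mode selection might look like this (Python sketch; the thresholds and the reduced model set are made up for illustration, not CALIC's actual values):

```python
import numpy as np

def classify_neighborhood(n):
    # n is a 3x3 neighborhood around the pixel being predicted.
    dh = abs(float(n[1, 0]) - float(n[1, 2]))  # variation along x
    dv = abs(float(n[0, 1]) - float(n[2, 1]))  # variation along y
    if dh < 8 and dv < 8:
        return "smooth"
    if dh > 2 * dv:
        return "vertical edge"    # intensity changes horizontally
    if dv > 2 * dh:
        return "horizontal edge"  # intensity changes vertically
    return "texture"
```

A real implementation would also handle the diagonal edge and pattern/texture modes, and then predict using the chosen model.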

We can thus propose a simple heuristic noise detection algorithm :

Bayesian Noise Detection :

N = current local neighborhood

A = actual pixel value

P(M|N) = probability of model M given neighborhood N

P(A|M) = probability that pixel value A was generated by model M

let

P(A|N) = argmax{M} P(A|M) * P(M|N)

then classify A as noise if

P(A|N) < T

for some threshold T

(I don't specify how the P's are normalized because it just changes the scaling of T,

but they should be normalized in the same way for the whole image)

Note that a very crucial thing is that we are using the argmax on models, NOT the average on models. What we're saying is

that if *any* of our heuristic local models had a high likelihood of generating the value A, then we do not consider it noise.

The only values that are noise are ones which are unlikely under *all* models.
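A sketch of the detector (Python; the Gaussian form for P(A|M) and the (predict, sigma, prior) model interface are my assumptions, since the scheme above leaves the per-model likelihood open):

```python
import numpy as np

def is_noise(A, neighborhood, models, T):
    # models is a list of (predict, sigma, prior) tuples:
    #   predict(n) -> the model's predicted pixel value
    #   sigma      -> the model's spread
    #   prior(n)   -> stands in for P(M|N)
    best = 0.0
    for predict, sigma, prior in models:
        mean = predict(neighborhood)
        # Gaussian likelihood of A under this model, P(A|M)
        p_a_given_m = np.exp(-0.5 * ((A - mean) / sigma) ** 2)
        best = max(best, p_a_given_m * prior(neighborhood))
    # noise iff no model explains the value well
    return best < T
```

With a single {locally smooth} model this degenerates to the threshold test criticized at the top; the whole point is that the max runs over many models.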

In a totally hand-wavey heuristic way, this is just saying that if a pixel is within threshold of being a locally smooth

value, or an edge value, or a texture, etc. then it's not noise. If it fits none of those models within threshold, it's

noise.

I started by looking at the Median Filter and the Bilateral Filter. There have been some cool recent papers on fast

Median Filtering :

Constant Time Median Filter

Weiss Log(r) Median Filter

Fast Bilateral Filter ; Sylvain Paris and Frédo Durand + good references

Siggraph Course on Bilateral Filtering

Those are all very much worth reading, even though I don't think they're actually super useful here. The fast median filter approaches use

cool ways of turning an operation over a sliding window into incremental operations that only process values getting added in and removed

as you step the window. Median filter is a weird thing that works surprisingly well actually, but it does create a weird sort of Nagel-drawing

type of look, with nothing but smooth gradients and hard edges. In fact it's a pretty good toon-shading process.

BTW the fast median filters seem useless for denoising, since they really only matter for large r (radius of the filter), and for denoising you

really only want something like a 5x5 window, at which size a plain brute force median is faster.
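For reference, the brute force version is tiny (Python sketch, with replicate-border handling assumed):

```python
import numpy as np

def median5x5(img):
    # Plain brute-force 5x5 median filter; at this window size the
    # fancy O(1)/O(log r) schemes don't pay off.
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    pad = np.pad(img, 2, mode='edge')  # replicate the border
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(pad[y:y+5, x:x+5])
    return out
```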

Bilateral filter actually sort of magically does some of the heuristic cases that I've talked about. Basically it makes a prediction using

a filter weighted by distance and also value difference. So similar values contribute and disparate values don't. That actually does a sort of

"feature selection". That is, if your pixel value is close to other pixel values in a vertical edge, then the bilateral filter will weight

strongly on those other similar pixel values and ignore the other off-edge pixels. That's pretty good, and the results are in fact decent,

but if you think about it more you see the bilateral filter is just a ghetto approximation of what you really want. Weighting based on pixel

value difference is not the right way to blend models, it makes no distinction about the context of that value difference - eg. it doesn't

care if the value difference comes from a smooth gradient or a sudden step. As others have noted, the Bilateral Filter makes the

image converge towards piecewise-constant, which is obviously wrong. Getting towards piecewise linear would be better, piecewise bicubic

would be better still - but even that is just the very easiest beginning of what the heuristic estimator can do.
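For concreteness, a minimal 5x5 bilateral filter (Python sketch; the sigma values are illustrative):

```python
import numpy as np

def bilateral5x5(img, sigma_s=2.0, sigma_r=20.0):
    # Weights fall off with both spatial distance (sigma_s) and value
    # difference (sigma_r), so similar values contribute and disparate
    # values don't - the "feature selection" effect described above.
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    pad = np.pad(img, 2, mode='edge')
    ys, xs = np.mgrid[-2:3, -2:3]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            window = pad[y:y+5, x:x+5]
            rng = np.exp(-((window - img[y, x])**2) / (2 * sigma_r**2))
            wts = spatial * rng
            out[y, x] = (wts * window).sum() / wts.sum()
    return out
```

Note that the range weight only sees the raw value difference, which is exactly the complaint : it can't tell a smooth gradient from a sudden step.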

NL Means is a denoising algorithm which is a bit closer to

the right idea; he's definitely aware of the issues. However, the actual NL means method is poor. It relies on closely

matching neighborhoods to form good predictions, which anyone who's worked in image compression or super-resolution knows

is not a good approach. The problem is there are simply too many possible values in reasonably sized neighborhoods. That

is, even for a moderately sized neighborhood like 5x5, you have (2^8)^25 = 2^200 possible values. No matter how much you train,

the space is too sparse. It may seem from the NL Means formulation that you're weighting in various neighbors, but in reality

in practice you only find a few neighbors that are reasonably close and they get almost all of the weight, and they might not

even be very close. It's like doing K-means with 2^200 possible values - not good.
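The core of NL Means, for reference (Python sketch of a single pixel, ignoring borders; h is the similarity falloff):

```python
import numpy as np

def nl_means_pixel(img, y, x, patch=2, search=5, h=10.0):
    # Weight every pixel in a search window by how closely its
    # surrounding patch matches this pixel's patch. patch=2 gives the
    # 5x5 neighborhoods discussed above.
    img = np.asarray(img, dtype=np.float64)
    p = patch
    ref = img[y-p:y+p+1, x-p:x+p+1]
    num = den = 0.0
    for yy in range(max(p, y - search), min(img.shape[0] - p, y + search + 1)):
        for xx in range(max(p, x - search), min(img.shape[1] - p, x + search + 1)):
            cand = img[yy-p:yy+p+1, xx-p:xx+p+1]
            d2 = ((ref - cand) ** 2).mean()  # patch distance
            w = np.exp(-d2 / (h * h))        # similarity weight
            num += w * img[yy, xx]
            den += w
    return num / den
```

In practice the exp() makes all but the few closest patches nearly weightless, which is the failure mode described above.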

There's a lot of work on Wavelet Denoising which I haven't really read. There are some obvious appealing aspects of that. With wavelets

you can almost turn an image into a sum of {smooth}+{edges}+{texture}+{noise} and then just kill the noise. But I don't really like the

idea of working in wavelet domain, because you wind up affecting a lot of pixels. Most of the noise I care about comes from cameras,

which means the noise is in fact isolated to 1 or 2 adjacent pixels. I also don't like all the research papers that want to use 9x9 or

larger local windows. Real images are very local, their statistics change very fast, and pulling in shit from a 9x9 window is just going

to mess them up. IMO a 5x5 window is the reasonable size for typical image resolutions of today.

BTW one thing I've noticed with my camera noise images is that the fucking camera JPEG makes the noise so much harder to remove. The

noise looks like it's originally just true single pixel noise, but when it goes through the camera JPEG, that single-pixel peak is

really unfriendly to the DCT, so it gets smeared out, and you wind up having noise lumps that look like little DCT shapes. To specifically

denoise photos that have gone through JPEG you probably have to work on 8x8 blocks and work directly on the DCT coefficients.

(also the Bayer pattern demosaic obviously spreads noise as well; ideally you'd get to work on the raw taps before jpeg, before

the camera denoise, and before the Bayer demosaic).

ADDENDUM : a lot of the denoise people seem to be trying to perfect the Playboy Centerfold algorithm, that makes photos look

extremely smoothed and airbrushed. Often if you're not sure a pixel is noise it's best to leave it alone. Also, all the methods

that use a pixel-magnitude threshold value for noise are wrong. The threshold for noise needs to be context sensitive. That is,

in smooth parts of the image, you might be able to say that a pixel is probably noise when it's off from expectation by only 1 or 2

pixel values. In chaotically textured parts of the image, a pixel might be off by 20 values or more and you still might not be able to

say it's noise. The correct parameter to expose to the user is a *confidence*. That is, I want to do something like replace all

pixels which the algorithm is >= 90% confident it can improve.
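A context-sensitive threshold can be as simple as scaling by local activity (Python sketch; the base and k knobs are made up, and a real confidence would come from the model likelihoods):

```python
import numpy as np

def context_threshold(neighborhood, base=2.0, k=1.0):
    # In smooth regions (std ~ 0) a deviation of 1-2 values is already
    # suspicious; in busy texture the threshold grows with the local
    # standard deviation.
    return base + k * np.std(neighborhood)
```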

Another problem I've seen with the majority of the denoisers is that they create structures from noise. If you run them on just

staticky junk, they will form local flat chunks, or linear bits, or weird feathery patterns. This is because even in random noise

there will be little bits that have similar values so they become seeds to create structures. This is very bad, the weird structures

that are formed by this "denoising" are much worse than the original static, which is pretty inoffensive to the eye.

Marc sent me the link to GREYCstoration, a free denoiser based on

the image manifold PDE research. I don't like that this technique is becoming known as "PDE" - PDE just means partial differential equation;

in this case it's a diffusion equation, in particular a variant of anisotropic diffusion with different diffusion rates based on

the local curvature - eg. it diffuses across smooth areas and not across edges. (that's actually an old basic technique,

the new thing he does is that the diffusion follows contour lines (but is actually 2d, just weighted) and works on all components).

It looks pretty decent. It's actually more

exciting to me for super-resolution, it looks like it does a pretty good job of image super-resolution.
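The "old basic technique" it builds on can be sketched as one explicit Perona-Malik diffusion step (Python; this simple conductance variant is the basic anisotropic diffusion mentioned above, not GREYCstoration's curvature-driven flow):

```python
import numpy as np

def perona_malik_step(img, k=15.0, dt=0.2):
    # Diffuse strongly where gradients are small (smooth areas) and
    # weakly across large gradients (edges).
    img = np.asarray(img, dtype=np.float64)
    # differences toward the four neighbors, with replicated borders
    n = np.vstack([img[:1], img[:-1]]) - img
    s = np.vstack([img[1:], img[-1:]]) - img
    e = np.hstack([img[:, 1:], img[:, -1:]]) - img
    w = np.hstack([img[:, :1], img[:, :-1]]) - img
    def g(d):  # conductance: near 1 in smooth areas, near 0 at edges
        return np.exp(-(d / k) ** 2)
    return img + dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
```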
