Monday, December 17, 2012

Image Compression using DCT (Discrete Cosine Transform)

It has been a while , so I decided to write something from memory to keep this blog alive. Currently I am stuck with some differential equations,i really want to solve that by hand. The more i dig, it becomes more complicated.

DCT aka Discrete Cosine Transform is similar to Fourier transform but uses only Cosine functions.
Its output is just scalar values , not complex numbers as in Discrete Fourier Transform. In image processing point of view it can be used to compress data. Jpeg format uses this technique to compress data. Besides this you can use this property to any field , where you want to get only some influencing data out of your sample.

There exists different kind of kernel functions for DCT. Out of which common form is shown below.
X_k =
 \sum_{n=0}^{N-1} x_n \cos \left[\frac{\pi}{N} \left(n+\frac{1}{2}\right) k \right] \quad \quad k = 0, \dots, N-1.

I copied this from wiki. If you carefully examine you can visualize this as projecting your data on the cosine.

Let us now apply DCT over an array of numbers
result = FourierDCT[{1, 10, 2, 3, 4, 2, 5, 7, 2, 10, 10, 11, 2}]
{19.1372, -3.7222, 1.37076, 2.10949, -2.33856, 2.01418, -4.42612, 0.769566, -2.33272, -3.44945, -3.3978, 1.07119, -2.33659}

This is the result , now lets take only first 10 from result array. That is .
len = Length[result];  // assign length (13) to variable len.
compressed  = Take[result, 10]

{19.1372, -3.7222, 1.37076, 2.10949, -2.33856, 2.01418, -4.42612, 0.769566, -2.33272, -3.44945}

Now lets take Inverse DCT of this compressed data which is expected to give our valuable original data back. Note original array 'result' contained  13 numbers  So we need to fill the remaining items with 0's.

FourierDCT[PadRight[compressed, len], 3];
This will output as below.
{1.682, 8.26, 4.01, 1.549, 4.43, 2.42, 4.41, 6.876, 3.40, 7.369, 13.12, 8.47, 2.96}
Note the similarity with original data , We just did a simple compression on input data and recovered successfully. This is a lossy compression technique. Applying this on image will not reduce the overall quality of image. I mean it is still possible to understand the original image.

Let us now apply this example on an image. This is our original image, Did you able to recognize him? if you are a German you must. He is the prince of mathematicians.

This image's dimensions is {220,282} and GrayScale.

Lets now take only first 100 rows from this image and compress it


dctResult = FourierDCT[ImageData[ image, "Byte"]];
Length[dctResult];
// Here we are only taking first 100 rows from image , that is we are compressing it
compressed = Take[dctResult, {1, 100}];

Now let us recreate this compressed data again , and see how it looks like
padded = PadRight[compressed, {282, 220}];
resImage = FourierDCT[padded, 3];
Image[resImage, "byte"]



It is not that bad, still you can recognize him.
How about taking 200 rows from the original image. See the output below . It is actually good. We just saved (282-200) * 220 bytes by this simple compression. (282 is original image height, 220 is width)




I used mathematica program here. Hopes still you could get the idea.
Now again , I am asking , Did you recognize this genius ? if not he is Carl Friedrich Gauss, the Legend.