|
|
 |
 |
 |
 |
 |
|
| |
Vector quantization and orthogonal transformation coding are classic schemes of lossy compression. Here orthogonal transformation coding, the principle method used today, and its elementary techniques will be explained. |
|
 |
| |
 |
|
| |
As shown in Fig. 1.4, a monochrome image is divided into square-shaped blocks consists of n pixels, and those pixel values are expressed as a sequence p1, p 2,..., pn. As there is a wide distribution across the range of values from 0 to 255 in no particular order, simply performing scale quantization and entropy coding will not compress the image effectively. However, because neighboring pixels tend to have a strong pixel value correlation in the case of natural images, by applying appropriate linear transformation (n-n matrix) to the n-dimension vector made of n pixel values, a new component can be computed that statistically satisfies the following:
|q1| >|q2| > |q3| > ............... > |qn|
When scale quantization is applied to the above inequality, one obtains
|q1/Q| > |q2/Q| > |q3/Q| > ....> |qk/Q| > 0 = ........... = 0
where zero will appear repeatedly towards the end of the series. Therefore by outputting a symbol that means "ends with non-zero coefficient" at the beginning of a sequence ending in zero, the code volume can be compressed even further. |
|
| |
| |
|
|
| |
| Example: When a monochrome image (see Fig. 1.5 (a)) is divided into 1-2 blocks and each block is plotted with 2-dimensional coordinates that represent pixel values p1 and p2, the scatter diagram shown in Fig. 1.5 (b) is obtained. Most of the dots are distributed around the linear central axis p1= p2. It is clearly seen that in many of the blocks there is a strong pixel value correlation between neighboring pixels. Also note that even if scalar quantization is performed on p1 and p2 as are at this point, there will be few components that will become zero. Whereas, by rotating the horizontal axis 45·〈 counterclockwise and producing q1and q2 by rotational transformation |
 |
| and overlay on the p1 = p2 line, q2 approaches zero. (See Fig. 1.5 (c)). Therefore, scalar quantization will result in most q2 values becoming zero. |
|
|
|
|
|
 |
| |
As in the rotational transformation of the example, when the length and angle remains the same and just the coordinate axis is rotated or reflected in a linear transformation, this is referred to as an orthogonal transformation. From this we conclude that orthogonal transformation coding is a technique that increases the effectiveness of scalar quantization and entropy encoding by taking pixel values and grouping them in blocks and then orthogonally transforming elements from largest to smallest. Examples of orthogonal transformation include the Hadamard transform like in Fig. 1.4 in which component values of the matrix consists of -1 or 1, and the discrete cosign transform whose component values smoothly changes on a image plane. In particular, the first row vector (u, v) = (0, 0) of the orthogonal matrix gives the average pixel value for the blocks (DC component) and other vectors give the frequency component (AC component) from the bottom to the top. |
|
| |
| |
When data loss is significant in scalar quantization we see the appearance of distortions called false contours, but the following types of distortions characteristic of orthogonal transformation coding can also arise. |
|
| |
| ・Block noise : |
Distortions seen in the shape of the block due to unaligned pixel values near the block borders. |
|
|
| |
| ・Mosquito noise : |
Distortions where the shape of the whole block is distorted in response to lost high-frequency components in blocks where a large gap in pixel values exists. |
|
|
| |
 |
|
| |
 |
|
| |
When orthogonal transformation is performed for a block with n pixels, one DC component and n-1 AC components are obtained from one block. The DC component is the average block pixel value, and so if all of these are brought together a compressed image of the original image is obtained. (See Fig. 1.5) That means that DC components still retain characteristics of the natural image and the values of neighboring pixels are very close. Then, by lining up all DC components from left to right and taking the difference between any given value and the value to the left of it (prediction differential) as the predicted value, you get values approaching zero. The technique used to enhance the efficiency of entropy encoding using this prediction differential is called prediction coding (DPCM: Differential Pulse Code Modulation). |
|
| |
|
|
| |
 |
|
| |
| Example: Fig. 1.7 shows the frequency of occurrence of all pixel values and prediction differentials for the monochrome image (see Fig. 1.5 (a)). The pixel values show a broad distribution, but we see that the prediction differentials are approaching zero. |
 |
 |
|
|
|
|
 |
| |
 |
|
| |
 |
|
| |
When any sort of orthogonal transformation is made to 2-2 blocks, one DC component and 3 AC components are obtained from one block. Also, as an image can be compressed to half its size from all the DC components, it is possible to reconstruct an image the same size as the original one by interpolating the compressed image. (See Fig. 1.8) Considering the interpolated image as a prediction image, AC components that approach zero can be obtained by applying orthogonal transformation to the prediction differentials. This scheme is called AC component estimation. |
|
| |
|
|
| |
 |
|
 |
| |
| Example: Fig. 1.9 (a) shows the case where an image compressed to half its size has simply been enlarged, and when it has been interpolated using an appropriate technique. When the three AC components were sought using Hadamard transformation on 2-2 blocks (for (u,v)=(0,0),(0,1)(1,0),(1,1) in Fig. 1.4) to give a prediction image in both cases, the graph showing the difference in the frequency of occurrence in each is that shown in Fig. 1.9 (b). There we see that although there was no difference in the frequency of occurrence of the AC component for (u, v) = (1, 1), the values for the other two AC components are approaching zero. |
 |
|
|
|
|
 |
| |
 |
|
| |
 |
|
| |
The following series of symbols made up of "0" and "1",
00000111000111110000111111000000011
is expressed by a 1-bit fixed length, and therefore it cannot be compressed using variable-length coding like Huffman coding. But if we note the length of runs of consecutive "0"s and "1"s and convert the series into a series of 3-bit fixed-length codewords as follows,
53354672
then the code volume can be compressed from 35 bits to 24 bits. In this way, when certain symbols have a tendency to appear consecutively, the technique used to overwrite these lengths of consecutive symbols is referred to as run-length coding. In particular in symbol series where there is a strong tendency for runs of "0"s to occur, the scheme used to express just the "0"s in a length, is called zero run-length coding. |
|
| |
 |
|
| |
 |
|
| |
By breaking up natural images into blocks and grouping together pixel values by block and using an appropriate orthogonal transformation technique, components that approach zero are obtained. In the same way as this, RGB components in color images can also be converted into components that approach zero using an appropriate 3-dimensional orthogonal transformation. More specifically, although there are various orthogonal transformations out there, the YCbCr coordinate system
Y = 0.299 R + 0.587 G + 0.114 B
Cb = 0.564 (B - Y) + 128
Cr = 0.713 (R - Y) + 128
Which is established under the digital video component standard ITU-R BT.601 is widely used. Here the Y component represents brightness (luminance component) and Cb and Cr represent the difference between B and Y and R and Y respectively (color-difference component). Also, in terms of the color-difference component, it is known that even if neighboring pixels have the same value, the effect on picture quality is minimal, and a sub-sample that trims the same distance from in between pixels is also at work. The following classifications are made according to the number of trimmed pixels.
·
4:4:4 Format: All pixel values used
·
4:2:2 Format: The same pixel value used in 2 horizontal pixels.
·
4:1:1 Format: The same pixel value used in 4 horizontal pixels.
·
4:2:0 Format: The same pixel value used in 2 horizontal -2 vertical pixels. |
|
| |
|
|
| |
Further reading
Please refer to the following references for more detailed explanations. |
|
|
 |
|
 |
 |
| |
|
|
 |
|
|
|
 |
|
|
|