Introduction
In this design problem, you will implement a decoder of a simple image compression algorithm. The details of the compression algorithm, as well as an explanation of its MATLAB model, is discussed in the following sections. The end-goal of the exercise is to implement the decoder using a generic 90nm standard cell library.
Contents
Image Compression Algorithm
Although you will implement the decoder, the best way to understand the decoding process is to study the encoding process. All processing is done in 8 x 8 pixel blocks. For the decoder, we limit the image size to QCIF resolution, which is 176 x 144 pixels. Thus the decoder will be processing a total of 396 blocks. Take note that this can easily be extended to any arbitrary resolution, as long as the image can be divided in to 8 x 8 pixel blocks.
In order to simplify the design, we will limit processing to the luminance (Y) component only. Full color images can be processed by executing 3 samples in parallel, one for each component. Since the processes are the same for all components, it will be a redundant exercise to process all three samples.
Each Y pixel is encoded in 8-bits, hence a bit can have 256 different values. White and black are encoded at the boundaries of this 8-bit range. Let’s refer to this representation as Y_src.
Each 8×8 block undergoes three main processing steps: (1) Discrete Cosine Transform, (2) Quantization, and (3) Run-length Encoding. The output of the last step is an encoded bitstream which serves as the final representation of the encoded image.
Discrete Cosine Transform
The Discrete Cosine Transform (DCT) represents the 8×8 block in a different domain, similar to how the fourier transform represents data into the frequency domain from the time domain. For most natural images, the DCT transforms the 8×8 block into a sparse matrix, with most values set to 0. The advantage of having a sparse matrix will be apparent in run-length encoding. In general, run-length encoding represents data with repeated values in a smaller number of bits than data with different values, hence compression is achieved. For this exercise, we achieve the DCT by using a DCT matrix, shown below:
dct_matrix = | 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 | | 0.4904 0.4157 0.2778 0.0975 -0.0975 -0.2778 -0.4157 -0.4904 | | 0.4619 0.1913 -0.1913 -0.4619 -0.4619 -0.1913 0.1913 0.4619 | | 0.4157 -0.0975 -0.4904 -0.2778 0.2778 0.4904 0.0975 -0.4157 | | 0.3536 -0.3536 -0.3536 0.3536 0.3536 -0.3536 -0.3536 0.3536 | | 0.2778 -0.4904 0.0975 0.4157 -0.4157 -0.0975 0.4904 -0.2778 | | 0.1913 -0.4619 0.4619 -0.1913 -0.1913 0.4619 -0.4619 0.1913 | | 0.0975 -0.2778 0.4157 -0.4904 0.4904 -0.4157 0.2778 -0.0975 |
The DCT matrix was generated using the MATLAB function dctmtx. Take note that dct_matrix contains rational numbers, so you may have to consider what kind of number representation you are going to use in your system.
To generate the transform of Y_src, we first normalize it by dividing all encoded values by 256. This generates a new matrix, Y_orig, which has a range of 0 to ~1 (as opposed to Y_src which has a range of 0 to 255). We then define the transform, Y_dct, as the product of dct_matrix, Y_orig, and the transpose of dct_matrix, shown below:
Y_dct = dct_matrix * Y_orig * dct_matrix'
Take note that this operation is not commutative. Since the operation is matrix multiplication, changing the order of the operands will definitely change the result.
Quantization
After producing the discrete cosine transform Y_dct, the next step is to quantize the values. Quantization reduces the number of bits needed to represent the data, but results in lossy compression since truncated data is lost. The quantization factor is not fixed for all of the elements in the 8×8 block. Elements near the top left of Y_dct have smaller quantization factors, while those at the opposite side have larger quantization factors. This further exaggerates the sparsity of the matrix.
For this exercise, quantization is achieved by performing a scalar division of Y_dct by the quantization matrix q_y, which we define as:
q_y = | 22 15 14 22 171 192 208 222 | | 16 16 20 26 174 217 221 213 | | 20 17 22 33 207 216 233 215 | | 20 24 30 40 208 258 248 223 | | 25 30 51 77 231 150 141 243 | | 33 48 76 88 249 143 155 264 | | 67 88 108 120 141 166 165 139 | | 99 126 130 135 154 138 141 274 |
After division, the resulting value is rounded off to produce a quantized matrix Y_q, which now contains only integer values.
Run-length Encoding
The final step in encoding the 8×8 block is to scan it in a zig-zag pattern then encode the values using run-length encoding. Because of the properties of DCT and quantization, the resulting Y_q should mostly contain zeros, with the non-zero values concentrated in the top-left of the block. Scanning in a zig-zag fashion thus positions these non-zero values at the beginning of the vector, while locating the zeros at the end.
For example, the matrix elements of Y_q are numbered 1-8 in the first row, 9-16 in the second, 17-24 in the third, and so on. Scanning produces a vector with the following order of the matrix elements: 1, 2, 9, 17, 10, 3, 4, 11, 18, 25, and so on. Let’s refer to this vector as Y_zig.
If we were to represent the 8×8 block raw, we would need 8 * 8 * 8 = 512 bits. However, because DCT and quantization produces a very sparse matrix, we can use run-length encoding to reduce the needed bits significantly. Run-length encoding encodes non-zero values raw, but then encodes the following run of zeros as a single value instead of representing them individually. For example, the vector we want to encode is 25, 0, 0, 0, 0, 3, 0, 0. Instead of using 8 * 8 bits to represent the whole 8-element vector, we instead represent a non-zero value + zero run as a pair of numbers. In this example, it would be (25, 4),(3, 2). If each run of zeros is represented as 3-bits (since there are only 8 elements in a vector), then we would only need (8 + 3) * 2 bits to encode the vector.
For this exercise, we use a similar scheme for run-length encoding. We encode the non-zero value in 8-bits, and the run of zeros in 6-bits, for a total of 14-bits for each non-zero element + zero run pair. Let’s assume that we wish to encode our values in 8-bit unsigned notation. However, since Y_zigĀ may contain negative values. To remedy this, let us shift the values of the vector up by half the scale of the pixel representation (128), such that the run of zeros is now a run of 128’s. For example, (25, 4), (3, 2) would now be represented as (153, 4), (131, 2). The final bitstream would then be 10011001 000100 10000011 000010.
Each 8×8 block will have an associated run-length-encoded bitstream. Because the number of elements in the encoded bitstream is not fixed, we terminate the bitstream using the value (0, 0), which is encoded as 00000000 000000. The overall bistream will contain the individual bitstreams in raster order (left-to-right, top-to-bottom).
Decoder
As mentioned earlier, the system you will be implementing is the decoder and not the encoder. Decoding is basically just the same as encoding, except that the processes are performed in reverse:
- Decode a run-length-encoded bitstream into raw, zig-zag scanned values. (Y_zig_dec)
- Reconstruct the quantized 8×8 block by reversing the zig-zag scan process. (Y_q_dec)
- Reverse the quantization process by performing scalar multiplication of Y_q to Y_q_dec. (Y_iq)
- Reverse the DCT process by performing the following: dct_matrix’ * Y_iq * dct_matrix. (Y_dec_norm)
- Remove normalization to restore the value of the decoded 8×8 block to the correct scale, which is 0 to 255. (Y_dec)
MATLAB Model
To further illustrate the encoding and decoding process, a MATLAB model can be obtained here. Just run the modelrun.m script to perform the whole encoding and decoding process. For convenience, all variables are named in the same way as what was described in the previous sections. You can also try to use a different image example by just changing the source image file specified in the modelrun.m script.
Number Representation
As observed in the discussion, some intermediate values of the encoding/decoding process are non-integers. I leave it up to you to decide on how to best represent these values. You can use floating-point, but generally it is highly recommended to just use fixed-point to save on logic. However, be sure to retain a high degree of accuracy, as large values of deviation from the MATLAB model will incur deductions in scoring.