
GPU Experiment 3

Arjun Kesani, 31410497; Venkatesh Sridharan, 31413670

Q1: Note that the MATLAB code has five sections. Explain clearly the operations done in each
section of the code.

Section 1 defines the constants that will be used.

Section 2 calculates the DCT matrix and writes it column-wise to a text file.

Section 3 writes the DCT matrix row-wise to another text file.

Section 4 reads the image, extracts the sub-image, and writes it to a text file.

Section 5 calculates the regular and the scaled-quantized versions of the transform and displays them as the Theoretical and Expected results, respectively.

Q2: Pick three elements of the DCT matrix, calculate its elements using (14.2), multiply them by
quantization factor, Q, and compare them with their corresponding entries in “dct.txt” and
“dct_transpose.txt.” Show that they are equal (with a reasonable quantization error!) to each
other.

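The first three elements of the first column of the DCT matrix are 0.3536, 0.4904 and 0.4619.

Scaled by Q = 1000, they become 353.6, 490.4 and 461.9.

The corresponding entries in the text files are 354, 490 and 462, which agree with the scaled values to within a rounding error of at most 0.5.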

Q3: Why are the elements in Q2 not exactly the same? Are we doing just multiplication, or are we doing
something more?

The elements are not exactly the same because, after multiplying by Q, each value is rounded to the nearest integer (for example, 490.4 becomes 490 and 461.9 becomes 462).

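Q4: Which variables represent the matrices B and Φ in the code? Provide your reasoning with
your answer.

b and dev_b (the host and device copies) represent B, and c and dev_c represent Φ. We know this from the two kernel launches:

multiply<<<blockDim,N>>>( dev_at, dev_b, dev_x );

multiply<<<blockDim,N>>>( dev_x, dev_a, dev_c );

In the first launch dev_b is multiplied by the transposed DCT matrix dev_at, so it holds the input block B. The second launch multiplies the intermediate result dev_x by dev_a and writes to dev_c, so dev_c holds Φ.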


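Q5: Show that this is equivalent to (14.5).

multiply<<<blockDim,N>>>( dev_at, dev_b, dev_x );

multiply<<<blockDim,N>>>( dev_x, dev_a, dev_c );

The first kernel call gives X = A'*B/Q.

The second kernel call gives Φ = X*A/Q = A'*B*A/Q^2.

Here A and A' stand for the matrices as stored in dev_a and dev_at. Assuming these hold the Q-scaled entries read from the text files, the factor 1/Q^2 cancels the scaling carried by A' and A, leaving the unscaled transform of B.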
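
A CPU reference for the two-step product can make the cancellation of Q easier to check. This is a minimal sketch, not the lab's kernel: scaled_multiply and transform are hypothetical names, matrices are assumed to be row-major, and the division by Q inside each multiplication is taken from the formulas in the answer above.

```c
#include <assert.h>

/* out = (p * q) / scale for row-major n x n matrices.
 * ASSUMPTION: CPU stand-in for the multiply kernel; the division by the
 * quantization factor follows the report's X = A'*B/Q. */
void scaled_multiply(const float *p, const float *q, float *out,
                     int n, float scale)
{
    assert(n > 0 && scale != 0.0f);
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += p[row * n + k] * q[k * n + col];
            out[row * n + col] = sum / scale;
        }
    }
}

/* phi = ((at * b) / scale) * a / scale, with x as scratch space.
 * at and a are expected to hold entries already multiplied by scale. */
void transform(const float *at, const float *a, const float *b,
               float *x, float *phi, int n, float scale)
{
    scaled_multiply(at, b, x, n, scale);   /* first launch:  X   */
    scaled_multiply(x, a, phi, n, scale);  /* second launch: Phi */
}
```

If a and at hold Q times an orthogonal matrix and its transpose, the output has no leftover factor of Q. For instance, with Q times the identity in both, phi comes out equal to b.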

Q6: How do we make sure that we do not exceed the pre-allocated memory space for the array m
of type float? What happens if the data file contains more than N^2 lines?

A counter in the while loop ensures that at most N*N elements are read, so nothing is written past the end of m. If the data file contains more than N^2 lines, only the first N^2 lines are read and the rest are ignored.
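
A bounded read loop of this kind can be sketched as follows. The function name read_matrix and its signature are hypothetical; the lab code's actual loop may be written differently.

```c
#include <stdio.h>

/* Read at most max_count floats from fp into m and return how many were
 * stored. The counter is tested before fscanf runs, so m[max_count] is
 * never written even if the file has extra lines. */
int read_matrix(FILE *fp, float *m, int max_count)
{
    int count = 0;
    while (count < max_count && fscanf(fp, "%f", &m[count]) == 1)
        ++count;
    return count;
}
```

Calling read_matrix(fp, m, N * N) on a file with more than N*N values leaves the extra values unread. A return value smaller than N*N indicates a file that is too short.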

Q7: Compile the code and run it. Show that the results match with the expected result. How can
you improve the expected result and make it closer to the theoretical result? State the
modification, do it, compile the code, and show that you actually reduced the error.

Theoretical Result

729.2368   -9.6953  165.2654  -23.4958   91.6099   19.7154   60.8633   40.4326
122.4807   78.3133    1.1055   -5.7550   -2.9000    1.8442   -2.9100    6.6953
133.2608    8.9482  -41.8019  -48.1487   -9.0634  -15.6713   -7.4661   -5.9354
 29.0202  -41.3986  -52.9816  -20.0031   -5.7922    2.4324    0.6491    0.1461
 56.8547   -8.3817  -16.7897   24.9657   16.5809    5.7814    0.9052    9.9045
 42.0635   -2.0454    1.1394   17.0512   14.6140   -5.8286   -7.2513    5.4120
 56.3506   -7.3933    1.3281    9.6004   16.8015   -4.1217   -4.1891    5.4726
 47.2290   -5.3825   -1.9240    6.9931    2.6257    3.7168    3.2242    3.6918

C =

729.76   -9.77  165.81  -22.93   91.74   19.60   60.84   40.60
122.52   78.32    1.11   -5.74   -2.92    1.85   -2.93    6.76
133.80    8.89  -41.63  -48.13   -9.00  -15.71   -7.47   -5.89
 29.55  -41.49  -52.93  -20.06   -5.73    2.40    0.67    0.14
 56.99   -8.42  -16.76   25.03   16.64    5.77    0.90    9.95
 42.04   -1.99    1.11   17.09   14.61   -5.85   -7.28    5.43
 56.34   -7.38    1.32    9.68   16.82   -4.15   -4.22    5.50
 47.51   -5.37   -1.88    7.01    2.65    3.72    3.23    3.72

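Modification: increase the quantization constant Q from 1000 to 10000. The scaled DCT entries then keep one more significant digit before rounding, and the largest absolute deviation from the theoretical result drops from about 0.57 (Q = 1000) to about 0.08 (Q = 10000).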

Modified C =

729.23   -9.70  165.26  -23.48   91.68   19.74   60.87   40.45
122.48   78.31    1.11   -5.75   -2.89    1.85   -2.91    6.70
133.27    8.95  -41.80  -48.15   -9.05  -15.67   -7.47   -5.93
 29.04  -41.40  -52.98  -20.00   -5.79    2.43    0.64    0.15
 56.93   -8.38  -16.78   24.96   16.60    5.78    0.91    9.91
 42.09   -2.04    1.14   17.05   14.62   -5.83   -7.25    5.42
 56.36   -7.39    1.32    9.60   16.81   -4.12   -4.19    5.47
 47.25   -5.38   -1.92    7.00    2.63    3.72    3.22    3.69
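
One way to put a number on the improvement is the largest absolute difference from the theoretical values. The sketch below uses only the first row of each matrix from this report; max_abs_error and the array names are hypothetical helpers, not part of the lab code.

```c
#include <math.h>
#include <stddef.h>

/* First row of each result matrix, copied from the report. */
static const double theory_row1[8] = {729.2368, -9.6953, 165.2654, -23.4958,
                                      91.6099, 19.7154, 60.8633, 40.4326};
static const double q1000_row1[8]  = {729.76, -9.77, 165.81, -22.93,
                                      91.74, 19.60, 60.84, 40.60};
static const double q10000_row1[8] = {729.23, -9.70, 165.26, -23.48,
                                      91.68, 19.74, 60.87, 40.45};

/* Largest |reference[i] - measured[i]| over count elements. */
double max_abs_error(const double *reference, const double *measured,
                     size_t count)
{
    double worst = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double err = fabs(reference[i] - measured[i]);
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

For the first row this gives about 0.57 with Q = 1000 and about 0.07 with Q = 10000. The same function can be applied to all 64 entries by passing the full matrices.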

Q8: Compile the code. Run it. Include the measured time in your report. Run it five more times.
Report the measured times. Do they differ? What might be the reason?

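Measured times: 0.273920 ms, 0.272960 ms, 0.272096 ms, 0.276448 ms, 0.270048 ms and 0.276288 ms.

The times differ slightly from run to run: they range from 0.270048 ms to 0.276448 ms, a spread of about 0.0064 ms (roughly 2% of the mean). Run-to-run variation in the time spent on synchronization might account for the differences.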
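
The spread of the six runs can be summarized with a small helper. This is only an illustrative sketch; timing_stats and summarize are hypothetical names and are not part of the lab code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double min;
    double max;
    double mean;
} timing_stats;

/* Minimum, maximum and mean of count timing samples (count must be > 0). */
timing_stats summarize(const double *ms, size_t count)
{
    assert(count > 0);
    timing_stats s = {ms[0], ms[0], 0.0};
    double sum = 0.0;
    for (size_t i = 0; i < count; ++i) {
        if (ms[i] < s.min) s.min = ms[i];
        if (ms[i] > s.max) s.max = ms[i];
        sum += ms[i];
    }
    s.mean = sum / (double)count;
    return s;
}
```

Applied to the six measured times, max minus min is about 0.0064 ms and the mean is about 0.2736 ms.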
