
BOX-AND-WHISKER NOTES

A box-and-whisker plot (often simply called a box plot) is a graphical
way of showing data. It is useful for quickly finding outliers: data
points out of line with the rest of the data set.

Suppose we want to construct a box plot of the following test scores:

50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

If they're not already in numerical order, first arrange them in
ascending order, because the median and quartiles are read off the
sorted list.

First, we need to construct the "box." To do so, we must find the
upper and lower quartiles and the median. The median is the number in
the middle of our set (when arranged in numerical order). The upper
and lower quartiles are the values 1/4 of the way from the top or
bottom of our set; here we take each one as the median of the half of
the data above or below the median, leaving the median itself out.
In our example:

50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100
            ^               ^               ^
           L.Q.           Median           U.Q.

So the lower quartile is 77, the median is 83, and the upper quartile
is 85.
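The quartiles can be checked with a short Python sketch. It assumes the "median of each half" convention (with an odd count, the middle value is left out of both halves). This is one of several quartile conventions, and it happens to reproduce the 77 and 85 used in this example; the function name `quartiles` is illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumed convention: each quartile is the median of the lower or upper
    half of the sorted data; with an odd count the middle value is excluded
    from both halves.
    """
    data = sorted(values)          # arrange in ascending order first
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # skip the middle value when n is odd
    return median(lower), median(data), median(upper)


scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
lq, med, uq = quartiles(scores)
print(lq, med, uq)  # 77 83 85
```

Other conventions (for example, including the median in both halves) can give slightly different quartiles for the same data.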

To draw the box, we'll put a scale on the x-axis and draw a box from
the lower quartile to the upper quartile. We'll add a vertical line to
mark the median, like so:

                          LQ     M UQ
                           +-------+
                           |     | |
                           +-------+
^.........^.........^.........^.........^.........^.........^
50        60        70        80        90        100       110

where LQ = Lower Quartile, M = Median, UQ = Upper Quartile.

Now we add "fences." First, we compute the interquartile range (IQR):
IQR = UQ - LQ, so in our example IQR = 85 - 77 = 8. The inner fences
are 1.5*IQR below the L.Q. and 1.5*IQR above the U.Q. For our
example, the inner fences are at:

77 - 1.5*8 = 77 - 12 = 65
and at 85 + 1.5*8 = 85 + 12 = 97

We'll mark these with a dotted line (I'll use colons ":"). Sometimes
the fences are not drawn on the box plot, but we'll put them in so we
can see where they are:

              LIF         LQ     M UQ         UIF
               :           +-------+           :
               :           |     | |           :
               :           +-------+           :
^.........^.........^.........^.........^.........^.........^
50        60        70        80        90        100       110

where LIF = Lower Inner Fence, UIF = Upper Inner Fence.
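The inner-fence arithmetic is easy to check in code. This minimal sketch uses a hypothetical helper, `inner_fences`, that takes the quartiles 77 and 85 from the example as inputs.

```python
def inner_fences(lq, uq):
    """Return (IQR, lower inner fence, upper inner fence)."""
    iqr = uq - lq                  # interquartile range
    lower = lq - 1.5 * iqr         # 1.5 * IQR below the lower quartile
    upper = uq + 1.5 * iqr         # 1.5 * IQR above the upper quartile
    return iqr, lower, upper


iqr, lif, uif = inner_fences(77, 85)
print(iqr, lif, uif)  # 8 65.0 97.0
```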


There is also a set of outer fences. These are 3*IQR below the L.Q.
and 3*IQR above the U.Q. For our example, the outer fences are at:

77 - 3*8 = 77 - 24 = 53
and at 85 + 3*8 = 85 + 24 = 109

We'll mark these with another dotted line. The outer fences are always
twice as far from the box as the inner fences (3*IQR versus 1.5*IQR).
Here's what we have so far:

  LOF         LIF         LQ     M UQ         UIF         UOF
   :           :           +-------+           :           :
   :           :           |     | |           :           :
   :           :           +-------+           :           :
^.........^.........^.........^.........^.........^.........^
50        60        70        80        90        100       110

where LOF = Lower Outer Fence, UOF = Upper Outer Fence.
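Both sets of fences, and the "twice as far out" relationship, can be checked with a small sketch. The helper name `fences` is hypothetical; it takes the example's quartiles as inputs.

```python
def fences(lq, uq):
    """Return ((lower inner, upper inner), (lower outer, upper outer))."""
    iqr = uq - lq
    inner = (lq - 1.5 * iqr, uq + 1.5 * iqr)   # 1.5 * IQR beyond the box
    outer = (lq - 3 * iqr, uq + 3 * iqr)       # 3 * IQR beyond the box
    return inner, outer


inner, outer = fences(77, 85)
print("inner:", inner, "outer:", outer)  # inner: (65.0, 97.0) outer: (53, 109)
# The outer fence sits twice as far from the box as the inner fence.
print(77 - outer[0] == 2 * (77 - inner[0]))  # True
```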

Now we add the "whiskers." Find the smallest data value that lies
inside (to the right of) the Lower Inner Fence. Mark it with an X and
draw a line connecting it to the box. Similarly, find the largest data
value that lies inside (to the left of) the Upper Inner Fence, mark it
with an X, and connect it to the box as well. In our example, the
whiskers end at 73 (the first value above 65) and 95 (the last value
below 97). Our plot now looks like this:

  LOF         LIF         LQ     M UQ         UIF         UOF
   :           :           +-------+           :           :
   :           :       X---|     | |---------X :           :
   :           :           +-------+           :           :
^.........^.........^.........^.........^.........^.........^
50        60        70        80        90        100       110
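Finding the whisker ends amounts to taking the smallest and largest values inside the inner fences. This minimal sketch uses a hypothetical helper, `whisker_ends`, and assumes that a value lying exactly on a fence counts as inside.

```python
def whisker_ends(values, lower_fence, upper_fence):
    """Return the smallest and largest values inside the inner fences.

    Assumption: a value exactly on a fence is treated as inside it.
    """
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)


scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
print(whisker_ends(scores, 65, 97))  # (73, 95)
```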

Finally, we have to mark the outliers. Values between the inner and
outer fences are called "suspect outliers." We mark them with an
asterisk "*".

Values outside the outer fences are called "highly suspect outliers."
We mark them with an "o". In our example, we have two suspect
outliers: the 60 and the 100. We also have one highly suspect outlier:
the 50. Once we mark these on our plot, we're finished:

  LOF         LIF         LQ     M UQ         UIF         UOF
   :           :           +-------+           :           :
o  :      *    :       X---|     | |---------X :  *        :
   :           :           +-------+           :           :
^.........^.........^.........^.........^.........^.........^
50        60        70        80        90        100       110
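Sorting values into suspect and highly suspect outliers is a pair of comparisons against the fences. This sketch uses a hypothetical helper, `classify_outliers`, and again assumes that a value exactly on a fence is not an outlier.

```python
def classify_outliers(values, lq, uq):
    """Split values beyond the fences into suspect and highly suspect outliers.

    Assumption: a value lying exactly on a fence is treated as inside it.
    """
    iqr = uq - lq
    lif, uif = lq - 1.5 * iqr, uq + 1.5 * iqr   # inner fences
    lof, uof = lq - 3 * iqr, uq + 3 * iqr       # outer fences
    suspect, highly_suspect = [], []
    for v in sorted(values):
        if v < lof or v > uof:
            highly_suspect.append(v)   # beyond an outer fence: plotted as "o"
        elif v < lif or v > uif:
            suspect.append(v)          # between inner and outer fences: "*"
    return suspect, highly_suspect


scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
suspect, highly_suspect = classify_outliers(scores, 77, 85)
print("suspect:", suspect)                # suspect: [60, 100]
print("highly suspect:", highly_suspect)  # highly suspect: [50]
```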

We could "erase" the fences and labels, but I'd probably leave them in
so that the person looking at the graph can see where they are. If we
erase them, we'll have:
                           +-------+
o         *            X---|     | |---------X    *
                           +-------+
^.........^.........^.........^.........^.........^.........^
50        60        70        80        90        100       110
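The whole construction can be automated. The sketch below is an illustration, not the author's method. The function name `render_box_plot`, the scale of one character per unit, and the quartile convention (medians of the lower and upper halves) are all assumptions, chosen so that the output reproduces the hand-drawn plot with the fences erased. It also assumes that `lo` and `hi` are multiples of 10 and that every value lies between them.

```python
from statistics import median


def render_box_plot(values, lo, hi):
    """Return the lines of an ASCII box plot, one character per unit.

    Assumes lo <= min(values), max(values) <= hi, and that lo and hi are
    multiples of 10 so the tick marks fall on round numbers.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    # Quartiles: medians of the lower and upper halves (middle value excluded).
    lq, med, uq = median(data[:half]), median(data), median(data[half + n % 2:])
    iqr = uq - lq
    lif, uif = lq - 1.5 * iqr, uq + 1.5 * iqr   # inner fences
    lof, uof = lq - 3 * iqr, uq + 3 * iqr       # outer fences
    inside = [v for v in data if lif <= v <= uif]

    def col(v):
        return round(v - lo)    # column index on the one-char-per-unit scale

    width = hi - lo + 1
    edge = [" "] * width        # top and bottom edges of the box
    mid = [" "] * width         # row holding whiskers, median and outliers
    for c in range(col(lq), col(uq) + 1):
        edge[c] = "-"
    edge[col(lq)] = edge[col(uq)] = "+"

    # Whiskers run from the box out to the extreme values inside the fences.
    for c in range(col(min(inside)), col(lq)):
        mid[c] = "-"
    for c in range(col(uq) + 1, col(max(inside)) + 1):
        mid[c] = "-"
    mid[col(min(inside))] = mid[col(max(inside))] = "X"
    for v in (lq, med, uq):
        mid[col(v)] = "|"

    # Outliers: "*" between inner and outer fences, "o" beyond the outer ones.
    for v in data:
        if v < lof or v > uof:
            mid[col(v)] = "o"
        elif v < lif or v > uif:
            mid[col(v)] = "*"

    scale = "".join("^" if (x - lo) % 10 == 0 else "." for x in range(lo, hi + 1))
    labels = ""
    for x in range(lo, hi + 1, 10):
        labels = labels.ljust(x - lo) + str(x)   # label under each tick

    rows = ["".join(edge), "".join(mid), "".join(edge), scale, labels]
    return [row.rstrip() for row in rows]


scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
print("\n".join(render_box_plot(scores, 50, 110)))
```

On the example scores, this should print the same picture as the plot with the fences and labels erased.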

As you can see, this plot quickly gives an idea of what our data look
like. About half the values lie between 77 and 85, the middle of the
data set is at 83, the "reasonable" range of the data goes from 73 to
95, and we have three suspect data values at 50, 60, and 100.

A nice feature of this kind of plot is that all the computations are
relatively simple. We never had to do anything more than add,
subtract, and multiply by 1.5 and 3.

Statistics questions:

1 i. A company that manufactures fertilizer gave its employees an
attitude test.
The scores obtained were:

70, 26, 85, 34, 57, 37, 50, 48, 47, 70, 46, 78, 50, 51, 63, 51, 62, 56, 57, 40, 62, 53,
63, 51, 90, 68, 14, 47, 46, 24, 63

a. Draw a stem-and-leaf plot of the scores.
[2 marks]
b. Calculate the median and the interquartile range (IQR).
[2 marks]

c. The company's management has decided that its employees need to be sent for retraining.
Identify which employees should be sent on the retraining course.
How did you decide?
[3 marks]

d. Construct a box-and-whisker plot and mark any outliers with *.
[2 marks]

e. The company's supervisor says that management does not actually need to worry, because
75% of the employees scored above 63%. Comment on this.
[2 marks]
ii. A farmer kept data on the tomatoes he grew using fertilizer type A and fertilizer
type B. The following are box-and-whisker plots of the weight of the tomatoes (in
grams) grown with the two types of fertilizer.

Which type of fertilizer would you advise the farmer to use, and why?
[3 marks]
