
CHARACTERISTICS OF A HIGHER ORDER THINKING SKILLS ASSESSMENT FOR THE ENVIRONMENT LESSON

Kusuma Wardany
Nahdlatul Ulama Lampung University
kusuma.wardany@ymail.com

ABSTRACT
This research aims to determine the characteristics of a Higher Order Thinking Skills assessment. The instrument was developed following the steps of Borg & Gall's development model. The draft test was validated by qualified validators and teachers, then tried out on a small group of students. The trial results were analysed quantitatively with the Quest program to determine reliability, difficulty level, distinguishing points, and distractor effectiveness. After revisions following the small-group trial, a field trial was conducted. Based on the field trial, the Higher Order Thinking Skills assessment instrument on the environment topic for senior high school students in Surakarta has high validity and reliability, with interpretation scores ranging from fair to very high. On the environment topic, the difficulty levels are 0,68% difficult, 89,65% medium, and 9,6% easy (multiple choice), and 60% difficult and 40% moderate (essay); the distinguishing points have proportions of 6,89% less, 86,20% enough, and 6,89% good (multiple choice), and 40% enough, 51,11% good, and 8,88% very good (essay). The results show that the Higher Order Thinking Skills assessment instrument on the environment topic is able to measure students' higher order thinking skills.

Keywords: Higher Order Thinking Skills, assessment, environment topic

INTRODUCTION

Assessment or evaluation is a general term that covers the entire procedure used to obtain information about student learning outcomes (observation, ranking, paper-and-pencil testing) and to make judgments about the learning process (Gronlund & Linn, 1995). In education, assessment is defined as a procedure used to obtain information to measure students' level of knowledge and skills, whose results will be used for evaluation purposes (Reynolds, Livingston, & Willson, 2010).

Higher Order Thinking Skill is a thinking skill that requires not only remembering but also other, higher skills. Indicators for measuring Higher Order Thinking Skills include analysing (C4), evaluating (C5), and creating (C6) (Anderson & Krathwohl, 2001). Higher Order Thinking Skills are thinking skills that occur when someone takes new information together with information already stored in memory, then connects that information and delivers it in order to achieve a goal or give the answers needed (Lewis & Smith, 1993). This is in line with the characteristics of 21st century community skills
1 Nahdlatul Ulama Lampung University, Indonesia. Email: Infoistech@unulampung.ac.id
published by the Partnership for 21st Century Skills, which identifies that students in the 21st century must be able to develop the competitive skills needed in this century, focusing on developing Higher Order Thinking Skills such as critical thinking, problem solving, communication skills, ICT (information and communication technology) literacy, information literacy, and media literacy (Basuki & Haryanto, 2012).

The results of observations of randomly chosen high schools in Surakarta, covering tests and examinations such as National Exams, Final-term Exams, Middle-term Exams, School Exams, and Daily Tests, as well as the textbooks that teachers and students use for ecosystem and environmental materials, show that the problems and questions are still in the low cognitive domain (lower order thinking skills). The low percentage of Higher Order Thinking Skills questions is an indicator of the students' low cognitive level in school.

Assessments that measure Higher Order Thinking Skills can use subjective and objective test forms. A subjective test is an essay test, a kind of test that needs a description answer in the test-taker's own words. In essay tests, students are required to think about and use what they know regarding the questions that must be answered. An objective test is a kind of test that consists of true-false items, multiple choice, completion, and matching (Suwandi, 2009).

To be able to state that a learning result is good or bad, successful or failed, the data obtained must be truly reliable and accurate so that the determination taken is not wrong. If the data is wrong, then the results of the assessment will be wrong and, consequently, the decision will be wrong. Therefore, a good test tool is necessary so that the data obtained is accurate (Subali, 2010).

Based on this, the problem in this research is: what are the characteristics of the Higher Order Thinking Skill assessment on environmental material that will be tested on grade XI high school students? The purpose of this research is to find out the characteristics of the Higher Order Thinking Skill assessment on environmental material that will be tested on grade XI high school students.

METHODS

The instrument was prepared based on the steps of Borg & Gall's (1983) development research. The data analysis techniques used are qualitative and quantitative. Qualitative analysis is done through a review to determine the content validity of the test instrument, namely the alignment between the questions in the test and the indicators prepared previously. Quantitative analysis is carried out using the Quest program. The aspects analysed quantitatively are the reliability, the level of difficulty of the test items, the distinguishing points, and the effectiveness of the distractors.

Reliability

Reliability is often called the degree of consistency (constancy). When a measuring instrument has high reliability, it means that even if the assessment is done repeatedly with that instrument, it will produce the same, or almost the same, information as required (Purwanto, 2010). The reliability of the test instruments is
also reviewed from the results of the Quest program analysis.

Level of Difficulty

The difficulty level of a test item (denoted by P) is the proportion of all students who answer the item correctly. The number that indicates the difficulty or ease of an item is called the difficulty index (p). The magnitude of the difficulty index is between 0.00 and 1.00. An item with a difficulty index of 0.0 is too difficult, whereas an index of 1.0 shows that the item is too easy. The level of difficulty of the test instrument is also viewed from the results of the Quest program analysis.

Distinguishing Points

The distinguishing point is the ability of a question to distinguish between high- and low-ability students in answering the analysed question correctly. The distinguishing point of the test instrument is also viewed from the results of the Quest program analysis.

RESULTS AND DISCUSSIONS

Results

Qualitative analysis was conducted to review the test items in terms of the material, construction, and language aspects, so that the validity of the test instrument was obtained.

Table 1. Results of assessment instruments validation from expert validators on environmental material

No | Validation                   | Average Score (%) | Conversion | Criteria
1  | Material Expert              | 100               | A          | Very Good
2  | Evaluation Instruments Expert| 80,95             | B          | Good
3  | Senior Teacher               | 85                | B          | Good
4  | Practitioner Teacher         | 92,73             | A          | Very Good

Limited Trial (Small Group Trial)

Table 2. Conclusion of the Small Trial Analysis of the Environmental Test Material

Categories | Item Numbers                                                          | Total
Accepted   | 4, 7, 10, 11, 14, 19, 22, 23, 26, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 | 22 (55%)
Revised    | 1, 2, 3, 5, 6, 8, 9, 12, 13, 15, 16, 17, 18, 20, 21, 24, 25, 29      | 18 (45%)
Rejected   | -                                                                     | -
Total      |                                                                       | 40

Field Trial Results


The results of the field trials were conducted on five schools in Surakarta, namely
SMA 2, SMA 3, SMA 4, SMA 6, and SMA 7 Surakarta in the 2015/2016 academic year
with the previous environmental subject in Grade X.

1) Validity

Table 3. Validity of Test-Items in Environmental Materials
(Conclusion columns: Valid, Invalid; Interpretation columns: Very High, High, Fair, Low)

School  | Valid         | Invalid     | Very High     | High        | Fair      | Low
SMA 2   | 32            | 6           | 35            | 3           | -         | -
SMA 3   | 35            | 3           | 34            | 1           | 3         | -
SMA 4   | 36            | 2           | 35            | 3           | -         | -
SMA 6   | 38            | -           | 35            | 2           | 1         | -
SMA 7   | 38            | -           | 37            | 1           | -         | -
Total   | 179 (94,21%)  | 11 (5,78%)  | 176 (92,63%)  | 10 (5,26%)  | 4 (2,1%)  | -
Average | 35,8 (18,84%) | 2,2 (1,15%) | 35,2 (18,56%) | 2 (1,05%)   | 0,8 (0,42%) | -

2) Reliability

Table 4. Reliability of Test-Items about Environment

School | Test Form | Reliability Value | Note
SMA 2  | MC        | 0,77              | High
SMA 2  | Essay     | 0,68              | High
SMA 3  | MC        | 0,44              | Fair
SMA 3  | Essay     | 0,74              | High
SMA 4  | MC        | 0,81              | Very High
SMA 4  | Essay     | 0,64              | High
SMA 6  | MC        | 0,76              | High
SMA 6  | Essay     | 0,82              | Very High
SMA 7  | MC        | 0,81              | Very High
SMA 7  | Essay     | 1,00              | Very High
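Coefficients like those in Table 4 come from the Quest program; for dichotomously scored multiple-choice data, a comparable classical estimate is KR-20 (Cronbach's alpha for 0/1 items). The following is a minimal sketch with hypothetical responses, not the paper's actual computation.

```python
import statistics

def kr20(matrix):
    """KR-20 reliability for a persons x items matrix of 0/1 scores."""
    k = len(matrix[0])                        # number of items
    totals = [sum(row) for row in matrix]
    var_total = statistics.pvariance(totals)  # variance of total scores
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / len(matrix)
        pq += p * (1 - p)                     # item variance p*q
    return k / (k - 1) * (1 - pq / var_total)

# Hypothetical: 5 students answering 4 multiple-choice items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```

Values near 1.0 indicate that repeated administration would rank students almost identically, which is the sense of "constancy" used in the Methods section.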

3) Level of Difficulty

Table 5. Difficulty level of Environment materials (Multiple Choice)

Table 6. Difficulty level of Environment materials (Essay)


4) Distinguishing Points

Table 7. Distinguishing Points of Environment Material (Multiple Choice)

Table 8. Distinguishing Points of Environment Material (Essay)

5) The Effectiveness of Distractor

Table 9. The Effectiveness of Environmental Material Field Test-Items

Table 10. Summary of Higher Order Thinking Skills test-items assessment on environment materials


Discussions

Before the instrument was tested in the field, it was analysed qualitatively. The instrument review was carried out theoretically or descriptively to check the readability of the instrument and for content validation. The descriptive review covered the material, construction, and language aspects. In this study, the descriptive instrument review was carried out by evaluation instrument experts, material expert lecturers, linguists, and practitioner teachers. The descriptive study found several questions that did not meet the criteria, so they had to be revised.

1. Limited Trial (Small Group Trial)

After the test instrument was analysed descriptively and validated by the expert validators, the instrument was tried out on class XI students at SMA Karanganyar 2. Quantitative analysis in trial I was done using MicroCat ITEMAN version 3.00, which automatically analyses the level of difficulty, distinguishing points, reliability of the questions, and other statistics. The results of the overall instrument analysis in trial I can be seen in Table 2. Based on Table 2, it can be concluded that in the small group trial of the environmental material, 22 items (55%) were accepted and 18 items (45%) were revised, out of a total of 40 test items. Small group trials are a stage of limited trials. Suparman (2012) mentions that the importance of limited trials is to find out how easily students understand the material in the developed assessment instrument, which parts of the assessment instrument are difficult to understand and why, and which test items are not relevant to the material presented. The data obtained in the small group trial (assessments and student responses) were compiled and analysed to revise the product.

2. Field Trial Results

The field trials were conducted in five schools in Surakarta, namely SMA 2, SMA 3, SMA 4, SMA 6, and SMA 7 Surakarta, with Environment as the subject matter.

1) Validity

Validity is the most important requirement in an assessment tool. An assessment technique can be said to have high validity if the assessment or test technique measures what it is actually intended to measure (Sukiman, 2012). Test items can be said to be valid if they have great support for the total score (Arikunto, 2007); an item has high validity if the score on the item correlates with the total score. The validity test of the test items was done using the Quest program.

Table 3 shows the results of the test-item validity test: 38 test items, or about 94,21% of the test items in each school, are stated as valid with an average of 18,84%, and 5,78% are invalid with an average of 1,15%. In addition, the interpretation of the test items in the very high category is about 92,63% with an 18,52% average, in the high category about 5,26% with an average of 1,05%, and fair about 2,1% with an average of 0,42%.

2) Reliability

Reliability is the determination of an assessment tool. Tests or assessment tests are said to be reliable if the results are trustworthy,
consistent and stable. Reliability testing was done using the Quest program. Reliability is the determination or constancy of an evaluation tool (Sudjana, 2001), while Singarimbun and Effendi (2008) state that reliability is an index showing how far a measuring device can be trusted and relied on.

Table 4 shows the reliability of the test items on the environmental material in each school: in SMA 2, the multiple choice items scored about 0,77, interpreted as high, and the essay form about 0,68, also high; in SMA 3, the multiple choice form scored 0,44, interpreted as fair, and the essay 0,74, high; in SMA 4, the multiple choice form scored 0,81, very high, and the essay 0,64, high; in SMA 6, the multiple choice form scored 0,76, high, and the essay 0,82, very high; in SMA 7, the multiple choice form scored 0,81, very high, and the essay 1,00, very high.

3) Level of Difficulty

The level of difficulty is the chance of answering an item correctly at a certain ability level, usually expressed as an index. The difficulty index ranges from 0,000 to 1,000 (Aiken, 1994). The greater the difficulty index obtained from the calculation, the easier the item. An item with TK = 0.00 means that no students answered correctly, and TK = 1.00 means that all students answered correctly. The difficulty index is calculated for each item number. In principle, the average score obtained by students on the item in question is called the level of difficulty of the item (Nitko, 1996).

Table 5 shows the level of difficulty of the environment material in the form of multiple choice questions in each school: 0.68% of the items are stated as difficult with an average of 0.13%, 89.65% are stated as medium with an average of 17.93%, and 9.6% are stated as easy with an average of 1.93%. Table 6 shows the level of difficulty of the environmental material in the form of essay questions in each school: 60% are stated as difficult with an average of 12%, and 40% are stated as moderate with an average of 8%. The table does not show any easy items in any school.

4) Distinguishing Points

The distinguishing point is the ability of an assessment instrument to distinguish students who belong to the clever (upper) group from students belonging to the lesser (lower) group. Distinguishing points aim to discriminate between high-ability (clever) and low-ability (not clever) testees. A question that has a high D value means that the item is able to distinguish students who master the subject matter from students who do not (Sukiman, 2012).

Table 7 shows the distinguishing points of the environment material in the form of multiple choice questions: about 6.89% of the items are stated as less with an average of 1.37%, 86.20% are stated as enough with an average of 17.24%, and 6.89% are declared good with an average of 1.37%. The table does not indicate any distinguishing points stated as very good.
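The difficulty index P and the distinguishing point D described above can be sketched in their standard classical form: P is the proportion answering correctly, and D is the difference in that proportion between an upper and a lower group ranked by total score. This is an illustrative sketch with hypothetical data, not the Quest program's Rasch-based estimates; the 27% group fraction is a conventional choice, not taken from the paper.

```python
def difficulty_index(item_scores):
    """P = proportion of all students answering the item correctly.
    P near 0.0 -> too difficult; P near 1.0 -> too easy."""
    return sum(item_scores) / len(item_scores)

def distinguishing_point(item_scores, total_scores, fraction=0.27):
    """D = P(upper group) - P(lower group), where the groups are the top
    and bottom `fraction` of students ranked by total score."""
    n = max(1, int(len(total_scores) * fraction))
    ranked = sorted(zip(total_scores, item_scores), reverse=True)
    upper = [item for _, item in ranked[:n]]   # highest total scores
    lower = [item for _, item in ranked[-n:]]  # lowest total scores
    return sum(upper) / n - sum(lower) / n

# Hypothetical: 10 students' scores on one item and their total scores.
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [40, 38, 35, 33, 30, 28, 25, 20, 37, 15]
print(difficulty_index(item))             # 0.6 -> medium difficulty
print(distinguishing_point(item, totals)) # 1.0 -> strong discrimination
```

Here the item is of medium difficulty and cleanly separates the groups: every upper-group student answers correctly and no lower-group student does.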
Table 8 shows the distinguishing power of the environmental material in the form of essay questions: about 40% of the items are stated as enough with an average of 8%, 51.11% are stated as good with an average of 10.22%, and 8.88% are stated as very good with an average of 1.77%. However, the table does not show distinguishing points stated as less in any school.

If a test item's distinguishing point cannot distinguish the two ability groups, the possible reasons can be as follows:
1) The answer key of the test item is incorrect.
2) The item has two or more correct answers.
3) The measured competence is unclear.
4) The distractor does not work.
5) The material asked is too difficult, so many students guess.
6) Most students who understand the material presented in the question think that there is wrong information in the test item. (Ministry of National Education, 2008)

5) The Effectiveness of the Distractor

In multiple choice questions there are alternative answers (options) called distractors. In a good test item, the distractor will be chosen evenly by the students who answer wrongly; on the contrary, if the test item is not good, the distractor will be chosen unevenly. A distractor is considered good if the number of students who choose it is the same as, or close to, the ideal number. According to Surapranata (2005), a distractor can be said to function well if it is selected by at least 5% of the test participants.

Table 9 shows the effectiveness of the distractors in the environment material in each school: 61.36% function well with an average of 12.26%, and 38.61% function poorly with an average of 8.4%. Thus, the items whose distractors function effectively can be used. According to Aprianto (2008), several factors influence whether or not a distractor functions: the problem is too easy, the subject matter gives clues to the answer key, or the students already know the material being asked too well.

In Table 10, several questions, namely 2, 4, 5, 11, 13, 14, 16, and 20, are invalid; this means that these questions do not measure students' abilities. The validity interpretation is very high at the maximum and low at the minimum; the items have a moderate level of difficulty, and the maximum distinguishing point is enough.

The results of the field test show that the developed assessment instruments are in the good category. The assessment instruments on environmental material have validity and reliability with very high average interpretations, and at least enough, on the environmental material. The environment material has a difficulty level with proportions of 0.68% difficult, 89.65% moderate, and 9.6% easy (multiple choice), and 60% difficult and 40% moderate (essay); it has distinguishing power with proportions of 6.89% less, 86.20% enough, and 6.89% good (multiple choice), and 40% enough, 51.11% good, and 8.88% very good (essay).
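Surapranata's (2005) 5% rule quoted above can be checked mechanically: count how often each wrong option is chosen and flag options below the threshold. The sketch below uses hypothetical response data; it is one simple way to implement the rule, not the procedure the paper used.

```python
from collections import Counter

def distractor_report(choices, key, threshold=0.05):
    """For one multiple-choice item, report whether each distractor
    (wrong option) was chosen by at least `threshold` of test takers."""
    n = len(choices)
    counts = Counter(choices)
    report = {}
    for option in sorted(set(choices) | {key}):
        if option == key:
            continue                       # the answer key is not a distractor
        share = counts[option] / n
        report[option] = (share, share >= threshold)
    return report

# Hypothetical: 40 students answering one item whose key is "B".
answers = ["B"] * 24 + ["A"] * 8 + ["C"] * 7 + ["D"] * 1
for option, (share, ok) in distractor_report(answers, key="B").items():
    print(option, round(share, 3), "functions" if ok else "does not function")
```

In this example distractors A and C function well, while D attracts only 2.5% of test takers and would be revised or replaced.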
The Higher Order Thinking Skills aspects in the developed instruments consist of 3 indicators according to Anderson & Krathwohl (2001), namely: students are able to analyse, students are able to evaluate, and students are able to create; and they use 4 aspects of the cognitive dimension, consisting of factual, conceptual, procedural and metacognitive knowledge. Test items compiled from indicators of these aspects mostly consist of statements containing cases or problems that occur in our environment and everyday life, to stimulate students to think critically in solving the problem.

Compared with the test instrument developed by Emi Rofiah (2013), this instrument correlates with the HOTS instrument developed here but uses HOTS indicators with different aspects, namely aspects of high-level thinking skills consisting of critical thinking with 6 indicators: students are able to ask questions, revise wrong concepts, plan strategies, evaluate decisions, criticize statements, and evaluate decisions. Test items compiled from indicators of this aspect mostly consist of statements containing problems, to stimulate students to think critically in solving the problem. The creative thinking aspect consists of 12 indicators: students are able to formulate equations, build relationships between concepts, propose new ideas, compile concepts in the form of schemes, describe ideas, dare to experiment, organize concepts, produce something new, design experiments, modify concepts with new things, combine concepts coherently, and change equations. Test items that test creative thinking skills mostly ask students to solve questions in the form of pictures and present problems that can bring out students' creativity. The problem solving aspect consists of 11 indicators: students are able to identify problems, state a causal relationship, apply concepts appropriate to the problem, have curiosity, create charts or pictures to solve a problem, explain several possible solutions, think openly, make decisions, work carefully, dare to speculate, and reflect on the effectiveness of the problem solving process.

This test instrument is very relevant to, and comparable with, the test instrument developed by PISA: the questions to be answered are multiple choice questions, starting from choosing one simple alternative answer, like answering a yes/no question, up to alternatives that are rather complex, such as responding to several presented choices. In questions that require a description answer, students are asked to answer with a short answer in the form of words or phrases, then a fairly long answer in the form of a description limited by the number of sentences, and answers in the form of an open description. PISA measures real-life skills related to reading, mathematics and science, with a focus on everyday life and on fields where science is used, such as health, earth and environment, and technology, so that students are trained to use logic and analytical abilities.

Based on the analysis and the statements above, it can be concluded that the Higher Order Thinking Skills assessment instrument on environment material is able to measure students' high-level thinking skills.
CONCLUSIONS

The conclusions that can be inferred from the results of the research on the development of instruments for evaluating Higher Order Thinking Skills on ecosystem and environment material are as follows:

1. The results show that in the field test, the HOTS assessment has validity and reliability with very high average interpretations, and at least enough, on environment material.
2. The questions on environment material have a difficulty level with proportions of 0.68% difficult, 89.65% moderate, and 9.6% easy (multiple choice), while 60% are difficult and 40% moderate (essay); they have distinguishing points with proportions of 6.89% less, 86.20% enough, and 6.89% good (multiple choice), and 40% enough, 51.11% good, and 8.88% very good (essay).

The suggestions for further researchers developing instruments for assessing Higher Order Thinking Skills are:

1. Subsequent research is expected to conduct multi-location testing of the assessments prepared in this research.
2. The assessment needs to be developed in other forms, such as two-tier, three-tier, four-tier, and other kinds of multiple choice.

REFERENCES

Aiken, Lewis R. (1994). Psychological Testing and Assessment (Eighth Edition).

Anderson, W. L. & Krathwohl, D. R. (2001). A Taxonomy for Learning, Teaching and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. USA: Addison Wesley Longman.

Aprianto. (2008). Implementasi Sistem Manajemen Mutu ISO 9001:2008 Berdasarkan Pada 8 (delapan) Prinsip Manajemen Mutu pada SMK Negeri 13 Bandung. Bandung: Universitas Teknologi Bandung.

Arikunto, Suharsimi. (2007). Dasar-Dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara.

Basuki, Ismet, & Hariyanto. (2012). Assesmen Penelitian. Bandung: PT Remaja Rosdakarya Offset.

Borg, W. R. and Gall, M. D. (1983). Educational Research: An Introduction (4th Ed). New York and London: Longman Inc.

Depdiknas. (2008). Panduan Analisis Butir Soal. Jakarta: Dirjen Manajemen.

Gronlund, M. S. and Linn, Joyce. (1995). Measurement and Assessment in Teaching. Upper Saddle River, New Jersey: Prentice-Hall, Pearson Education.

Lewis, A., & Smith, D. (1993). Defining Higher Order Thinking. Theory Into Practice, XXXII(3), 131-137.

Nitko, Anthony J. (1996). Educational Assessment of Students (Second Edition). Englewood Cliffs, Ohio: Merrill, an imprint of Prentice Hall.

Partnership for 21st Century Skill. (2009). 21st Century Skills Map. http://science.nsta.org/ps/Final21stCenturyMapScience.pdf. Accessed 1 March 2015.

Purwanto. (2010). Evaluasi Hasil Belajar. Yogyakarta: Pustaka Pelajar.

Reynolds, Livingston, & Willson. (2010). Measurement and Assessment in Education. Upper Saddle River, New Jersey: Pearson Education, Inc.

Rofiah, E., Nonoh S. A., & Elvin Y. E. (2013). Penyusunan Instrumen Tes Kemampuan Berfikir Tingkat Tinggi Fisika Pada Siswa SMP. Jurnal Pendidikan Fisika. ISSN: 2338-0691. Surakarta: FKIP Fisika UNS.

Singarimbun, Masri & Sofian Effendi. (2008). Metode Penelitian Survei. Jakarta: LP3ES.

Subali, Bambang. (2010). Penilaian, Evaluasi dan Remediasi Pembelajaran Biologi. Yogyakarta: UNY.

Sudjana, Nana. (2010). Penilaian Hasil Proses Belajar Mengajar. Bandung: Remaja Rosdakarya.

Sukiman. (2012). Pengembangan Media Pembelajaran. Yogyakarta: Pendajogja.

Suparman, Atwi. (2012). Desain Instruksional Modern. Bandung: Erlangga.

Surapranata, S. (2005). Analisis, Validitas, Reliabilitas dan Interpretasi Hasil Tes: Implementasi Kurikulum 2004. Bandung: Remaja Rosdakarya Offset.

Suwandi, Sarwiji. (2009). Model Asesmen Dalam Pembelajaran. Surakarta: Mata Padi Presindo.
