ED 211 061                                                  IR 009 676

DOCUMENT RESUME

AUTHOR        Tatsuoka, Kikumi; Linn, Robert L.
TITLE         Indices for Detecting Unusual Item Response Patterns
              in Personnel Testing: Links Between Direct and
              Item-Response-Theory Approaches. Computerized
              Adaptive Testing and Measurement.
INSTITUTION   Illinois Univ., Urbana. Computer-Based Education
              Research Lab.
SPONS AGENCY  Office of Naval Research, Washington, D.C.
              Psychological Sciences Div.
REPORT NO     CERL-RR-81-5
PUB DATE      Aug 81
CONTRACT      N000-14-79-C-0752
NOTE          38p.; For related document, see IR 009 679.
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   Arithmetic; *Computer Assisted Testing; *Latent Trait
              Theory; *Matrices; *Statistical Analysis; Testing
              Problems
IDENTIFIERS   *Response Patterns; S P Curve Theory

ABSTRACT
     Two distinct approaches, one based on item response
theory and the other based on observed item responses and standard
summary statistics, have been proposed to identify unusual response
patterns. A link between these two approaches is provided by showing
certain correspondences between Sato's S-P Curve Theory and item
response theory. This link makes possible several extensions of
Sato's caution index that take advantage of the results of item
response theory. Several such indices are introduced and their use
illustrated by application to a set of achievement test data. Two of
the newly introduced extended indices were found to be very effective
for purposes of identifying persons who consistently use an erroneous
rule in attempting to solve signed number arithmetic problems. The
potential importance of this result is briefly discussed, and 15
references are listed. (Author/LLS)
***********************************************************************
*    Reproductions supplied by EDRS are the best that can be made     *
*                    from the original document.                      *
***********************************************************************
U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)
This document has been reproduced as
received from the person or organization
originating it.
Minor changes have been made to improve
reproduction quality.
Points of view or opinions stated in this document
do not necessarily represent official NIE
position or policy.
University of Illinois
Computer-based Education Research Laboratory
Urbana, Illinois
INDICES FOR DETECTING UNUSUAL ITEM RESPONSE
PATTERNS IN PERSONNEL TESTING: LINKS
BETWEEN DIRECT AND ITEM-RESPONSE-THEORY APPROACHES
by
Kikumi Tatsuoka
and
Robert L. Linn
Computerized Adaptive Testing and Measurement
Research Report 81-5
August 1981
"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY
D. Bitzer
TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."
Acknowledgment
The authors wish to acknowledge the kind cooperation
extended to us by the people involved with this report.
Bob Baillie programmed the lessons and data collection and
analysis routines, along with his assistant, David Dennis.
Mary Klein gave insight and meaning to many things as a
teacher of the children whom we seek to help. Roy Lipschut
did the layouts and Louise Brodie did the typing.
Abstract

Two distinct approaches, one based on item response theory and the
other based on observed item responses and standard summary statistics,
have been proposed to identify unusual response patterns. A link
between these two approaches is provided by showing certain
correspondences between Sato's S-P Curve Theory and item response
theory. This link makes possible several extensions of Sato's caution
index that take advantage of the results of item response theory.
Several such indices are introduced and their use illustrated by
application to a set of achievement test data. Two of the newly
introduced extended indices were found to be very effective for purposes
of identifying persons who consistently use an erroneous rule in
attempting to solve signed-number arithmetic problems. The potential
importance of this result is briefly discussed.
Introduction
Several authors have recently shown an interest in using
information from patterns of response to test items to extract
information not contained in the total score. A variety of purposes have
been envisioned for use of the additional information. Wright (1977),
for example, refers to identification of "guessing, sleeping, fumbling,
and plodding" (p. 110) from the plots of residual item scores based on
the differences between item responses and the expected responses for an
individual based on the Rasch model. Levine and Rubin (1979) discuss
response patterns that are "so atypical... that his or her aptitude test
score fails to be a completely appropriate measure" (p. 269). Sato
(1975) proposed a "caution" index which is intended to identify students
whose total scores on a test must be treated with caution. Tatsuoka and
Tatsuoka (1980) and Harnisch and Linn (1981) have discussed the
relationship of response patterns to instructional experiences and the
possible use of item response pattern information to help diagnose the
types of errors a student is making.

Indices of the degree to which an individual's pattern of responses
is unusual are conveniently classified into two general types: those
that use item response theory (IRT) to identify unusual patterns and
those that rely only on observed item responses and standard summary
statistics based on those responses (e.g., the number or proportion of
people in a norm group answering an item correctly). The work of Wright
(1979) and of Levine and Rubin (1979) are examples of approaches based
on IRT while the work of Sato (1975), Tatsuoka and Tatsuoka (1980), and
Harnisch and Linn (1981) are of the latter type.
The primary purpose of this paper is to develop a link between
these two general approaches. More specifically, we will show a
correspondence between Sato's (1975) S-P Curve theory and test response
curves and "group response curves" developed from IRT. Also, Sato's
Caution Index defined in the S-P curve theory is generalized into a
continuous domain utilizing IRT. That is, S-P curve theory and the
Caution Index are originally developed in a discrete domain of 0 - 1
scoring, but this study extends the theory to a more general case of
probabilities.

Several different generalized versions of the caution index are
presented. Results of applying these indices suggest that there are two
categories. One set of indices functions in a manner similar to Sato's
original index. The other set functions more like Tatsuoka and
Tatsuoka's Individual Consistency Index in that it successfully
distinguishes examinees who make consistent errors in responding to test
items.
We first briefly review Sato's S-P Curve theory. Next, a group
response curve (GRC) is developed for the one parameter logistic model.
The GRC is based on the dualistic nature of the one parameter logistic
model which depends on the choice of fixed and random parameters in the
model. We then present an extended caution index with several special
cases which are applicable to IRT. The cases of two and three parameter
logistic models are briefly discussed with special attention given to
problems with person and group response curves in these models.
Finally, we discuss applications of the new caution indices for the
detection of anomalous response patterns.

S-P Curve Theory
Sato's (1975) caution index is applicable to either an item or an
individual examinee. In either form the index is conveniently obtained
from an especially arranged table of binary item scores referred to as
an "S-P Table". The S-P table, the associated S-P curves and various
indices such as the caution index are widely used in Japan for diagnosing
student performance, detecting aberrant response patterns and for
assessing the quality of a test or instructional sequence.

The S-P table is a data matrix in which the students (represented
by rows) have been arranged in descending order of their total test
scores from top to bottom and the items (represented by columns) have
been arranged in ascending order of difficulty from left to right. A
hypothetical S-P table is shown in Table 1. The solid stair-step line
is called an S-curve, which is short for Student curve. For each person,
represented by a given row, a vertical line is drawn to the right of the
nth cell from the left where n is the number of correct answers obtained
by that person. The S-curve is then obtained by connecting the right
edge of the nth cell of each row. The P-curve is drawn in an analogous
fashion by counting down from the top the number of cells equal to the
number of students who correctly answered the item corresponding to a
given column. The P-curve for the data in Table 1 is shown by the
dashed line.
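The arrangement just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `sp_table` and the 4 x 3 data are hypothetical, and ties are broken by original order, which is an assumption the text does not address.

```python
import numpy as np

def sp_table(scores):
    """Sort a binary students x items matrix into S-P order."""
    scores = np.asarray(scores)
    # Students (rows): descending total score; stable sort keeps tie order.
    row_order = np.argsort(-scores.sum(axis=1), kind="stable")
    # Items (columns): descending number correct, i.e. ascending difficulty.
    col_order = np.argsort(-scores.sum(axis=0), kind="stable")
    table = scores[np.ix_(row_order, col_order)]
    s_curve = table.sum(axis=1)  # row i: number of cells left of the S-curve
    p_curve = table.sum(axis=0)  # column j: number of cells above the P-curve
    return table, s_curve, p_curve

# Hypothetical 4 students x 3 items (illustrative data only).
y = [[0, 1, 0],
     [1, 1, 1],
     [0, 1, 1],
     [0, 0, 0]]
table, s_curve, p_curve = sp_table(y)
print(table)
print("S-curve offsets:", s_curve)
print("P-curve offsets:", p_curve)
```

The two offset vectors are all that is needed to draw the stair-step curves: the S-curve passes to the right of cell `s_curve[i]` in row i, and the P-curve passes below cell `p_curve[j]` in column j.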
Insert Table 1 about here

Let $y_{ij}$ be the binary response for student (row) i to item (column)
j of the S-P table. Row and column sums are denoted by $y_{i\cdot}$ and $y_{\cdot j}$
respectively. The total number of ones in the S-P table is denoted $y_{\cdot\cdot}$
and the proportion of correct responses by $p_{i\cdot}$, $p_{\cdot j}$ and $p_{\cdot\cdot}$ for the row,
column and entire table respectively. As can be seen in Table 1, the
S-curve is the step function ogive of the cumulative distribution
function of total scores, $y_{i\cdot}$, for the 15 students and the P-curve is
the corresponding function of $y_{\cdot j}$, the number of right answers for the 10
items.

Table 1
A Hypothetical Score Matrix ($y_{ij}$) and
S- (solid line) and P- (dotted line) Curves

[15 x 10 binary score matrix with S- and P-curves; row and column margins follow.]

subject i    y_i.    p_i.
    1         10     1.0
    2          9     0.9
    3          8     0.8
    4          6     0.6
    5          6     0.6
    6          6     0.6
    7          5     0.5
    8          5     0.5
    9          5     0.5
   10          5     0.5
   11          4     0.4
   12          4     0.4
   13          3     0.3
   14          2     0.2
   15          1     0.1

item j      1     2     3     4     5     6     7     8     9    10
y_.j       13    11    10     9     8     8     6     5     5     4    y_.. = 79
p_.j      .87   .73   .67   .60   .53   .53   .40   .33   .33   .27    p_.. = .527
Insert Table 2 about here

If the S-curve is held invariant and all the 0's to the left of the
S-curve are changed to 1's and all the 1's to the right of the same
curve to 0's, the result, shown in Table 2, is called a
perfect S-curve. The entries in Table 2 are denoted $M^{s}_{ij}$. Similarly a
perfect P-curve will be obtained and the entries in the new table are
denoted by $M^{p}_{ij}$. As can be seen, $M^{s}_{i\cdot} = y_{i\cdot}$ for all i, which corresponds
to the fact that the S-curve is unchanged as the result of changing the
cell entries from $y_{ij}$ to $M^{s}_{ij}$. The values of the column sums for Tables
1 and 2, i.e., $y_{\cdot j}$ and $M^{s}_{\cdot j}$, are not in general equal, however.
Sato (1975) defined a Caution Index for subject i by taking the ratio
of two covariances. The numerator of the ratio is the covariance of
observed row vector i, $(y_{ij})$, $j = 1, \ldots, n$, and the sum-of-column vector,
$(y_{\cdot j})$, $j = 1, 2, \ldots, n$, and the denominator is the covariance of the
corresponding scores (assuming the S-curve is perfect) $(M^{s}_{ij})$, $j = 1, \ldots, n$, and
the column-sum vector $(y_{\cdot j})$, $j = 1, 2, \ldots, n$. More specifically, the
caution index $C_i$ for the subject i is given by

$$C_i = 1 - \frac{\sum_{j=1}^{n} (y_{ij} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})}{\sum_{j=1}^{n} (M^{s}_{ij} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})} \tag{1}$$
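As a rough illustration of the person caution index, the sketch below computes $C_i = 1 - \sum_j (y_{ij} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot}) / \sum_j (M^{s}_{ij} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})$ for every row of a small matrix that is assumed to be in S-P order already. The function names and the 4 x 4 data are hypothetical; rows with all 0's or all 1's have a zero denominator and come out as `nan`, a choice made here rather than one stated in the text.

```python
import numpy as np

def perfect_s_rows(y):
    """Perfect S-table: row i has y_i. ones followed by zeros."""
    y = np.asarray(y)
    n = y.shape[1]
    totals = y.sum(axis=1)
    return (np.arange(n)[None, :] < totals[:, None]).astype(float)

def caution_index(y):
    """Sato's caution index C_i for each row of an S-P ordered 0/1 matrix."""
    y = np.asarray(y, dtype=float)
    m = perfect_s_rows(y)
    col_sums = y.sum(axis=0)                 # y_.j
    p_i = y.mean(axis=1, keepdims=True)      # p_i.
    p_all = y.mean()                         # p_..
    num = ((y - p_i) * (col_sums - p_all)).sum(axis=1)
    den = ((m - p_i) * (col_sums - p_all)).sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - num / den

# Hypothetical S-P ordered data: row 3 (index 2) misses an easy item
# and gets a hard one right.
y = [[1, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 1, 0, 1],
     [1, 0, 0, 0]]
c = caution_index(y)
print(c)
```

Rows that coincide with their perfect pattern get an index of 0, and the index grows as the observed pattern departs from it.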
Table 2
Perfect S-curve Obtained by Changing 1's to the Right
of S-curve to 0 and 0's to the Left to 1

                          item j
subject i    1   2   3   4   5   6   7   8   9  10    y_i.   M^s_i.
    1        1   1   1   1   1   1   1   1   1   1     10     10
    2        1   1   1   1   1   1   1   1   1   0      9      9
    3        1   1   1   1   1   1   1   1   0   0      8      8
    4        1   1   1   1   1   1   0   0   0   0      6      6
    5        1   1   1   1   1   1   0   0   0   0      6      6
    6        1   1   1   1   1   1   0   0   0   0      6      6
    7        1   1   1   1   1   0   0   0   0   0      5      5
    8        1   1   1   1   1   0   0   0   0   0      5      5
    9        1   1   1   1   1   0   0   0   0   0      5      5
   10        1   1   1   1   1   0   0   0   0   0      5      5
   11        1   1   1   1   0   0   0   0   0   0      4      4
   12        1   1   1   1   0   0   0   0   0   0      4      4
   13        1   1   1   0   0   0   0   0   0   0      3      3
   14        1   1   0   0   0   0   0   0   0   0      2      2
   15        1   0   0   0   0   0   0   0   0   0      1      1

y_.j        13  11  10   9   8   8   6   5   5   4     79
M^s_.j      15  14  13  12  10   6   3   3   2   1            79

Perfect P-Curve Obtained by Changing 1's Below
the P-Curve to 0 and 0's Above to 1

                          item j
subject i    1   2   3   4   5   6   7   8   9  10    y_i.   M^p_i.
    1        1   1   1   1   1   1   1   1   1   1     10     10
    2        1   1   1   1   1   1   1   1   1   1      9     10
    3        1   1   1   1   1   1   1   1   1   1      8     10
    4        1   1   1   1   1   1   1   1   1   1      6     10
    5        1   1   1   1   1   1   1   1   1   0      6      9
    6        1   1   1   1   1   1   1   0   0   0      6      7
    7        1   1   1   1   1   1   0   0   0   0      5      6
    8        1   1   1   1   1   1   0   0   0   0      5      6
    9        1   1   1   1   0   0   0   0   0   0      5      4
   10        1   1   1   0   0   0   0   0   0   0      5      3
   11        1   1   0   0   0   0   0   0   0   0      4      2
   12        1   0   0   0   0   0   0   0   0   0      4      1
   13        1   0   0   0   0   0   0   0   0   0      3      1
   14        0   0   0   0   0   0   0   0   0   0      2      0
   15        0   0   0   0   0   0   0   0   0   0      1      0

y_.j        13  11  10   9   8   8   6   5   5   4     79
M^p_.j      13  11  10   9   8   8   6   5   5   4            79
and the caution index, $C_j$, for item j is given by

$$C_j = 1 - \frac{\sum_{i=1}^{N} (y_{ij} - p_{\cdot j})(y_{i\cdot} - p_{\cdot\cdot})}{\sum_{i=1}^{N} (M^{p}_{ij} - p_{\cdot j})(y_{i\cdot} - p_{\cdot\cdot})}$$

The second term of the caution index for item j is the ratio of two
covariances: the numerator is the covariance of column vector j, $(y_{ij})$,
and $(y_{i\cdot})$, $i = 1, \ldots, N$, and the denominator is the covariance of the vectors
$(y_{i\cdot})$ and $(M^{p}_{ij})$, $i = 1, 2, \ldots, N$. The value of the denominator is considered
as a norm value to standardize the numerator.
It can be said that this ratio in the above caution index is equal
to the ratio of the traditional discriminating index, $r_j$, total-item
correlation, to the standardized (or ideal in a sense illustrated in
Table 2) discriminating index, $r_j'$, for item j. That is,

$$\frac{\operatorname{cov}_j(y_{ij}, y_{i\cdot})}{\operatorname{cov}_j(M^{p}_{ij}, y_{i\cdot})} = \frac{\operatorname{cov}_j(y_{ij}, y_{i\cdot}) \,/\, [\sigma_j(y_{ij})\,\sigma(y_{i\cdot})]}{\operatorname{cov}_j(M^{p}_{ij}, y_{i\cdot}) \,/\, [\sigma_j(M^{p}_{ij})\,\sigma(y_{i\cdot})]} = \frac{r_j}{r_j'}$$

It is clear that $\sum_i (y_{ij} - p_{\cdot j})^2 = \sum_i (M^{p}_{ij} - p_{\cdot j})^2$ because the number of
1's in column j is invariant as can be seen in Tables 1 and 2, so the
number of 1's in the column vector j, $(M^{p}_{ij})$ and $(y_{ij})$, are the same.
Therefore, the two variances $\sigma_j^2(y_{ij})$ and $\sigma_j^2(M^{p}_{ij})$ are equal.
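The equality between the covariance ratio and the ratio of item-total correlations can be checked numerically. The sketch below is illustrative only: the function names and the 4 x 4 matrix are hypothetical, the matrix is assumed to be in S-P order, and every item is assumed to have at least one right and one wrong answer so the correlations are defined.

```python
import numpy as np

def item_caution_index(y):
    """Sato's caution index C_j for each column of an S-P ordered 0/1 matrix."""
    y = np.asarray(y, dtype=float)
    N = y.shape[0]
    # Perfect P-table: column j has y_.j ones stacked at the top.
    m = (np.arange(N)[:, None] < y.sum(axis=0)[None, :]).astype(float)
    totals = y.sum(axis=1, keepdims=True)     # y_i.
    p_j = y.mean(axis=0, keepdims=True)       # p_.j
    p_all = y.mean()                          # p_..
    num = ((y - p_j) * (totals - p_all)).sum(axis=0)
    den = ((m - p_j) * (totals - p_all)).sum(axis=0)
    return 1.0 - num / den, m

def one_minus_correlation_ratio(y, m):
    """1 - r_j / r_j', where r is the item-total correlation."""
    y = np.asarray(y, dtype=float)
    totals = y.sum(axis=1)
    n_items = y.shape[1]
    r_obs = np.array([np.corrcoef(y[:, j], totals)[0, 1] for j in range(n_items)])
    r_ideal = np.array([np.corrcoef(m[:, j], totals)[0, 1] for j in range(n_items)])
    return 1.0 - r_obs / r_ideal

# Hypothetical S-P ordered data.
y = [[1, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 1, 0, 1],
     [1, 0, 0, 0]]
c_j, m = item_caution_index(y)
alt = one_minus_correlation_ratio(y, m)
print(c_j, alt)
```

Both routes give the same values because the observed and perfect columns contain the same number of 1's and therefore have equal variances.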
The .Extended Caution Index in Conjunction With Response Theory
Test and Group Response Curves: One Parameter Logistic Model

According to the one parameter logistic model, the item response
curve may be written

$$P_{b_j}(\theta) = \frac{1}{1 + \exp[-D(\theta - b_j)]}, \quad j = 1, 2, \ldots, n,$$

where $\theta$ is the latent ability, $b_j$ is the difficulty of item j and D is
a constant which is set equal to 1.7 for convenience of comparison to
the normal ogive model (see Lord & Novick, 1968, p. 400). In the above
equation, $b_j$ is fixed and $\theta$ is a random variable.
Although in practice the number of items, n, is a finite number,
it is useful to consider b as a continuous variable. By holding $\theta_i$ fixed
and treating b as a continuous variable, the dual function, $S_{\theta_i}(b)$, of
the one parameter logistic function may be defined,

$$S_{\theta_i}(b) = \frac{1}{1 + \exp[-D(\theta_i - b)]}, \quad i = 1, 2, \ldots, N.$$

Of course, the expression

$$\frac{1}{1 + \exp[-D(\theta_i - b_j)]}$$

may be considered to be a function of either $\theta$ or b. By choice of which
variable is fixed, the function may be used to define either the item
response curve, $P_{b_j}(\theta)$, or the person response curve, $S_{\theta_i}(b)$ [see Lumsden,
1978; Weiss, 1977]. Hence, the variable described within the
parenthesis of the function is considered as a random variable and the
subscript variable is a fixed variable.
The curves for the pair of functions, $P_{b_j}(\theta)$ and $S_{\theta_i}(b)$, are
symmetric about the vertical axis at $\theta = \theta_0$ (or equivalently $b = b_0$)
provided $\theta_0 = b_0$. As illustrated in Figure 1, however, the item
response curve (IRC) and the person response curve (PRC) intersect at
$(\theta_0 + b_0)/2$ if $\theta_0 \neq b_0$.
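The duality between the two curves, and the intersection point, can be illustrated with a short sketch. It assumes D = 1.7 as the scaling constant; the function names and the values of $\theta_0$ and $b_0$ are hypothetical.

```python
import math

D = 1.7  # scaling constant (assumed positive, as in the normal-ogive comparison)

def irc(theta, b):
    """Item response curve P_b(theta): b is fixed, theta varies."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))

def prc(b, theta):
    """Person response curve S_theta(b): theta is fixed, b varies."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))

theta0, b0 = 1.0, -0.5          # hypothetical person and item
x = (theta0 + b0) / 2.0         # claimed intersection on the common scale

print(irc(x, b0), prc(x, theta0))            # equal at the midpoint
print(irc(-1.0, b0) < irc(1.0, b0))          # IRC increases in theta
print(prc(-1.0, theta0) > prc(1.0, theta0))  # PRC decreases in b
```

At the midpoint both exponents reduce to $-D(\theta_0 - b_0)/2$, which is why the two curves meet there.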
Insert Figure 1 about here

Figure 1. Person Response Curve (monotonically decreasing) and Item Response Curve
(monotonically increasing) with the Mean Values of $\theta_0$ and $b_0$, One Parameter
Logistic. (Horizontal axis: $\theta$ or b scale.)
Addition and subtraction, an inner product of two functions in the
same family (i.e., in one of the two families $\{P_{b_1}(\theta), P_{b_2}(\theta), \ldots, P_{b_n}(\theta)\}$
or $\{S_{\theta_1}(b), \ldots, S_{\theta_N}(b)\}$ in this paper), the norm of a function and the
distance of any two functions in the same family will be defined below.

Definition. Addition and subtraction of two functions $P_{b_1}(\theta)$ and
$P_{b_2}(\theta)$, or $S_{\theta_1}(b)$ and $S_{\theta_2}(b)$, is defined as pairwise addition or
subtraction of the two. That is,

$$(P_{b_1} \pm P_{b_2})(\theta) \equiv P_{b_1}(\theta) \pm P_{b_2}(\theta)$$

and

$$(S_{\theta_1} \pm S_{\theta_2})(b) \equiv S_{\theta_1}(b) \pm S_{\theta_2}(b).$$

Definition. An inner product (or the sum of the cross products)
of the two functions is the sum of pairwise products $P_{b_1}(\theta_i) P_{b_2}(\theta_i)$
[or equivalently $S_{\theta_1}(b_j) S_{\theta_2}(b_j)$] or, more generally, the integration
of the product of the two functions with respect to $\theta$ (or b). Thus

$$[P_{b_1}(\theta), P_{b_2}(\theta)] = \sum_{i=1}^{N} P_{b_1}(\theta_i) P_{b_2}(\theta_i) \quad \text{or} \quad \int P_{b_1}(\theta) P_{b_2}(\theta)\, d\theta$$

and

$$[S_{\theta_1}(b), S_{\theta_2}(b)] = \sum_{j=1}^{n} S_{\theta_1}(b_j) S_{\theta_2}(b_j) \quad \text{or} \quad \int S_{\theta_1}(b) S_{\theta_2}(b)\, db.$$

Definition. The squared norms of functions $P_b(\theta)$ and $S_\theta(b)$ are given by the
inner product of themselves. Thus, we have

$$\|P_b\|^2 = [P_b(\theta), P_b(\theta)] = \sum_{i=1}^{N} P_b^2(\theta_i) \quad \text{or} \quad \int P_b^2(\theta)\, d\theta$$

and

$$\|S_\theta\|^2 = [S_\theta(b), S_\theta(b)] = \sum_{j=1}^{n} S_\theta^2(b_j) \quad \text{or} \quad \int S_\theta^2(b)\, db.$$

Definition. The squared distance of two functions $P_{b_1}(\theta)$ and $P_{b_2}(\theta)$
[or $S_{\theta_1}(b)$ and $S_{\theta_2}(b)$] is the inner product of their difference.
That is,

$$\|P_{b_1} - P_{b_2}\|^2 = [P_{b_1}(\theta) - P_{b_2}(\theta),\; P_{b_1}(\theta) - P_{b_2}(\theta)] = \|P_{b_1}\|^2 + \|P_{b_2}\|^2 - 2(P_{b_1}, P_{b_2})$$

and

$$\|S_{\theta_1} - S_{\theta_2}\|^2 = [S_{\theta_1}(b) - S_{\theta_2}(b),\; S_{\theta_1}(b) - S_{\theta_2}(b)] = \|S_{\theta_1}\|^2 + \|S_{\theta_2}\|^2 - 2(S_{\theta_1}, S_{\theta_2}).$$

By using the notation of integration,

$$\|P_{b_1} - P_{b_2}\|^2 = \int [P_{b_1}(\theta) - P_{b_2}(\theta)]^2\, d\theta$$

or

$$\|S_{\theta_1} - S_{\theta_2}\|^2 = \int [S_{\theta_1}(b) - S_{\theta_2}(b)]^2\, db.$$
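In the discrete (summation) form, these definitions amount to ordinary vector operations on the curves evaluated at a finite set of points. The sketch below is a minimal illustration under that reading; the helper names, D = 1.7, and the ability values are assumptions made for the example.

```python
import numpy as np

D = 1.7  # assumed scaling constant

def irc(theta, b):
    """P_b(theta) evaluated at an array of theta values."""
    theta = np.asarray(theta, dtype=float)
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def inner(f_vals, g_vals):
    """Inner product as the sum of pairwise products."""
    return float(np.sum(f_vals * g_vals))

def sq_norm(f_vals):
    """Squared norm: inner product of a function with itself."""
    return inner(f_vals, f_vals)

def sq_distance(f_vals, g_vals):
    """Squared distance: inner product of the difference with itself."""
    diff = f_vals - g_vals
    return inner(diff, diff)

thetas = np.array([-1.0, 0.0, 1.0])   # hypothetical ability values
p1 = irc(thetas, b=-0.5)
p2 = irc(thetas, b=0.5)

lhs = sq_distance(p1, p2)
rhs = sq_norm(p1) + sq_norm(p2) - 2.0 * inner(p1, p2)
print(lhs, rhs)
```

The two printed numbers agree, which is the expansion of the squared distance given in the last definition.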
With these definitions, we are ready to introduce the dual concept
of the Test Response Curve (Lord, 1980; Lord and Novick, 1968). This is the
Group Response Curve as an average function of N different Person
Response Curves. The Test Response Curve (TRC) is an average function
of n IRC's defined as

$$T(\theta) = \frac{1}{n} \sum_{j=1}^{n} P_{b_j}(\theta).$$

Similarly, the Group Response Curve (GRC) is an average function of N
PRC's, that is,

$$G(b) = \frac{1}{N} \sum_{i=1}^{N} S_{\theta_i}(b).$$
Illustrative PRC's and IRC's for 100 hypothetical persons were
generated by randomly sampling 100 values of $\theta$ from a unit normal
distribution. The resulting TRC for the simulated 100 item test is
shown as the monotonically increasing function in Figure 2.

Insert Figure 2 about here

The curve that is a monotonically decreasing function is the PRC of $\theta = 0$,
denoted by $S_0(b)$. The curve represented by +'s is a Group Response
Curve which is obtained by taking the pointwise mean of 100 PRC's over
the randomly generated 100 b values. That is,

$$G(b_j) = \frac{1}{N} \sum_{i=1}^{N} S_{\theta_i}(b_j), \quad N = 100.$$

As the number of b values approaches infinity, G(b) in the figure
will be a smooth curve, monotonically decreasing, and moreover, if the
number of $\theta$ values is also very large then G(b) will be a symmetric
curve of $T(\theta)$ about the vertical line of $\theta = b = 0$. With this figure, $\theta_i$,
$i = 1, 2, \ldots, 100$, and $b_j$, $j = 1, 2, \ldots, 100$, are randomly chosen from N(0,1) so
their means are not exactly zero. It can be shown numerically that $T(\theta)$
and G(b) reach 1/2 at $\theta = \bar{b}$ and $b = \bar{\theta}$ respectively.
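Let us denote the average of $T(\theta_i)$, $i = 1, \ldots, N$, by $\bar{T}$,

$$\bar{T} = \frac{1}{N} \sum_{i=1}^{N} T(\theta_i)$$

and the average of $G(b_j)$, $j = 1, \ldots, n$, by $\bar{G}$,

$$\bar{G} = \frac{1}{n} \sum_{j=1}^{n} G(b_j).$$

Then $\bar{T} = \bar{G}$, because

$$\bar{T} = \frac{1}{N} \sum_{i=1}^{N} T(\theta_i) = \frac{1}{nN} \sum_{j=1}^{n} \sum_{i=1}^{N} \frac{1}{1 + \exp[-D(\theta_i - b_j)]} = \frac{1}{n} \sum_{j=1}^{n} G(b_j) = \bar{G}.$$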
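The simulation described above, and the identity between the two averages, can be reproduced in outline as follows. This is a sketch under stated assumptions: D = 1.7, a fixed random seed chosen only for reproducibility, and hypothetical function names; it does not reproduce the authors' actual random draws.

```python
import numpy as np

D = 1.7                              # assumed scaling constant
rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
thetas = rng.standard_normal(100)    # 100 simulated persons
bs = rng.standard_normal(100)        # 100 simulated items

def prob(theta, b):
    """One parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def trc(theta):
    """Test response curve T(theta): mean of the n IRCs at theta."""
    return prob(theta, bs).mean()

def grc(b):
    """Group response curve G(b): mean of the N PRCs at b."""
    return prob(thetas, b).mean()

t_bar = np.mean([trc(t) for t in thetas])   # average of T(theta_i)
g_bar = np.mean([grc(b) for b in bs])       # average of G(b_j)
print(t_bar, g_bar)
```

Both averages are the grand mean of the same N x n array of probabilities, taken in a different order, so they agree up to floating-point rounding.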
Definition of Various Extended Caution Indices

Sato's (1975) S-curve may be viewed as a discrete test response
curve. The perfect S-curve divides 1's and 0's into two mutually
exclusive areas with 1's under the curve and 0's above it. Note,
however, that direct correspondence in this way involves a reordering of
the subjects from low to high rather than from high to low as typically
presented by Sato and as was shown in Table 2. $T(\theta)$ represents the average
probability of correctly answering items on the test when a person's
ability is equal to $\theta$. The analogy between the S-curve and a TRC may be
seen by considering an alternative N by n score matrix with real numbers
based on IRT rather than binary scores. More specifically, let
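$$PM_{ij} = P_{\hat{b}_j}(\hat{\theta}_i)$$

where $\hat{\theta}_i$ is an estimated ability parameter, $\theta$, for person i and $\hat{b}_j$ is an
estimated item parameter for item j under the condition that

$$\sum_{j=1}^{n} P_{\hat{b}_j}(\hat{\theta}_i) = \sum_{j=1}^{n} y_{ij}.$$

Since $P_{\hat{b}_j}(\hat{\theta}_i) = S_{\hat{\theta}_i}(\hat{b}_j)$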
for fixed i and j, the cells of the probability matrix $(PM_{ij})$ are also
equal to $S_{\hat{\theta}_i}(\hat{b}_j)$. If the rows and columns of this matrix are arranged
in the manner of the S-P table and columnwise sums of the cell entries
are obtained, the result is N times $G(\hat{b}_j)$, which corresponds to the
P-curve. Similarly, n times $T(\hat{\theta}_i)$ corresponding to the S-curve may be
obtained by summing the cell entries for each row.

Selected rows and columns of a probability matrix $(PM_{ij})$ are
illustrated in Table 3 for a 32 item test involving the subtraction of
signed numbers that was administered to a sample of 127 students
(Tatsuoka & Tatsuoka, 1981). Also shown in Table 3 are the values of
the estimated item and ability parameters and the test and group
response curves evaluated at those estimated parameter values (i.e.,
$T(\hat{\theta}_i)$ and $G(\hat{b}_j)$ respectively).
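A probability matrix of this kind, together with its row and column averages, can be built directly from parameter estimates. The sketch below assumes the one parameter logistic form with D = 1.7; the function name and the parameter values are hypothetical and are not the estimates reported in Table 3.

```python
import numpy as np

D = 1.7  # assumed scaling constant

def probability_matrix(theta_hat, b_hat):
    """PM[i, j] = P_{b_j}(theta_i) = S_{theta_i}(b_j) for all persons and items."""
    theta = np.asarray(theta_hat, dtype=float)[:, None]   # N x 1
    b = np.asarray(b_hat, dtype=float)[None, :]           # 1 x n
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

theta_hat = [-1.0, 0.0, 0.5, 1.2]   # hypothetical ability estimates (N = 4)
b_hat = [-0.8, 0.0, 0.9]            # hypothetical difficulty estimates (n = 3)

pm = probability_matrix(theta_hat, b_hat)
t_vals = pm.mean(axis=1)            # T(theta_i): row sums divided by n
g_vals = pm.mean(axis=0)            # G(b_j): column sums divided by N
print(np.round(pm, 3))
print(np.round(t_vals, 3), np.round(g_vals, 3))
```

Row sums of `pm` play the role of the S-curve (n times T) and column sums play the role of the P-curve (N times G).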
Table 3
The 127 x 32 Probability Matrix ($PM_{ij}$) for
Signed-Number Subtraction Problems

                                  item j
subject i       1       2    ...     15      16    ...     31      32    $T(\hat{\theta}_i)$   $\hat{\theta}_i$
    1         .000    .001   ...    .040    .002   ...    .017    .082     .026     -1.114
    2         .000    .002   ...    .061    .004   ...    .035    .129     .038     -0.916
    3         .000    .002   ...    .061    .004   ...    .035    .129     .038     -0.722
   ...
   60         .549    .635   ...    .783    .871   ...    .969    .969     .809       .700
   61         .567    .647   ...    .789    .878   ...    .970    .970     .816       .710
   62         .568    .648   ...    .789    .878   ...    .970    .970     .817       .714
   ...
   88         .860    .854   ...    .832    .962   ...    .994    .993     .968      1.22?
   ...
  127        1.000   1.000   ...   1.000   1.000   ...   1.000   1.000    1.000

$G(\hat{b}_j)$      .327    .570   ...    .691    .708   ...    .837    .843
$\hat{b}_j$        -.467  (illegible) ... -.044    .021   ...    .289    .378
Insert Table 3 about here
Before introducing the extended caution index, it is useful to
compare the S and P curves for the data from which the estimates in
Table 3 were obtained with their counterparts, i.e., n times $T(\hat{\theta}_i)$ and N
times $G(\hat{b}_j)$. The two comparisons, S with $nT(\hat{\theta}_i)$ and P with $NG(\hat{b}_j)$, are
provided in Figures 3 and 4 respectively. The tic marks on the
horizontal axis in Figure 3 indicate the location of the $\hat{\theta}$'s for the 127
students in the study. The tic marks in Figure 4 show the values of $\hat{b}_j$
for the 32 items. The close correspondence between the two pairs of
curves is apparent. The number of items and the limited range of values
that $\hat{b}_j$ assumes for these data obviously limit the evaluation of the
correspondence between the curves in Figure 4, however.
Insert Figures 3 & 4 about here
Given the parallels between the S-P curves and the GRC and TRC, the
extension of the caution index for use with the latter curves is
relatively straightforward. There are, however, several natural ways
in which the extension can be made. Possibly the most obvious extension
is to simply replace the term $(M^{s}_{ij} - p_{i\cdot})$ in the denominator of
equation (1) by its counterpart from the $PM_{ij}$ matrix, i.e.,

$$[PM_{ij} - T(\hat{\theta}_i)] = [S_{\hat{\theta}_i}(\hat{b}_j) - T(\hat{\theta}_i)].$$

With the above substitution, our first extended caution index, $C1_i$, is defined as

$$C1_i = 1 - \frac{\sum_{j} (y_{ij} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})}{\sum_{j} [S_{\hat{\theta}_i}(\hat{b}_j) - T(\hat{\theta}_i)](y_{\cdot j} - p_{\cdot\cdot})}$$
Figure 4. Comparison of P-curve (Converted to the Proportion Correct to Item)
With the Group Response Curve for the Data in Table 3. (Vertical axis: P/N or
G(b), from 0.0 to 1.0; horizontal axis: $\theta$ or b scale.)
The numerator divided by n, i.e., the covariance of $(y_{ij})$ and $(y_{\cdot j})$,
can be expanded to the sum of

$$\frac{1}{n} \sum_{j} y_{ij}\, y_{\cdot j} \quad \text{and} \quad -p_{\cdot\cdot}\, p_{i\cdot}.$$

The value of the second term does not depend on a person's response to
each item but depends on his/her total score. As long as the total
score is fixed, the anomaly of response patterns will not be detected by
this value. This value varies between persons, so if two persons have
the same achievement level $\theta_i$, then the judgment regarding the extent
to which each response pattern deviates from the norm depends only on
the first term of the numerator. Since the denominator is a normalized
constant for a fixed value, $\theta_i$, it is unlikely that a particular
aberrant response pattern produced by an individual whose achievement
level is $\theta_i$ will affect the denominator.

Thus, it is natural to expect that if both the quantities are replaced
by the inner products of the two row vectors $(y_{ij})$ and $(y_{\cdot j})$ for $j = 1,
2, \ldots, n$, the value of the index will be affected by the degree of anomaly of
individual response patterns. Moreover, calculation of inner products
is easier than that of covariances. Let us define four other natural
extensions of the Caution index as follows.
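Definition. Four alternative definitions of the extended caution
index for person i are:

$$C2_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, y_{\cdot j}}{\sum_{j=1}^{n} S_{\hat{\theta}_i}(\hat{b}_j)\, y_{\cdot j}}$$

$$C3_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, S_{\hat{\theta}_i}(\hat{b}_j)}{\sum_{j=1}^{n} G(\hat{b}_j)\, S_{\hat{\theta}_i}(\hat{b}_j)}$$

$$C4_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, G(\hat{b}_j)}{\sum_{j=1}^{n} S_{\hat{\theta}_i}(\hat{b}_j)\, G(\hat{b}_j)}$$

and

$$C5_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, S_{\hat{\theta}_i}(\hat{b}_j)}{\sum_{j=1}^{n} S^{2}_{\hat{\theta}_i}(\hat{b}_j)}$$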
The denominators of the four indices are considered as normalizing
constants but the characteristics of the numerators will be divided into
two categories. The indices in the first category, $C2_i$ and $C4_i$, give
measures that are more group dependent, because they are the sums of
cross products of the corresponding elements of the observed vector
$(y_{ij})$ and the column-sum total vector $(y_{\cdot j})$, and the Group Response Curve $G(\hat{b}_j)$,
respectively. They measure the relationship of an observed response
pattern for a person i to a normed variable derived from the group the
person i belongs to. Thus these indices have a similar function to the
Norm Conformity Index, NCI, defined in Tatsuoka & Tatsuoka (1980). The
remaining indices, $C3_i$ and $C5_i$, are more individually oriented. That
means the quantities obtained from $C3_i$ and $C5_i$ reflect the extent to which a
person i's response pattern $(y_{ij})$ relates to a theoretically derived PRC
at the fixed level of $\theta_i$. Thus, it can be said that the indices $C3_i$ and
$C5_i$ are similar to the Individual Consistency Index (Tatsuoka &
Tatsuoka, 1980).
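The four inner-product indices can be sketched as below. The formulas coded here are one reading of the definitions, stated in the comments, and should be treated as assumptions: each index is one minus a ratio of inner products, with the observed vector in the numerator and the person response curve in the denominator. D = 1.7, the function name, and the data are likewise illustrative.

```python
import numpy as np

D = 1.7  # assumed scaling constant

def extended_caution_indices(y, theta_hat, b_hat):
    """Return (C2, C3, C4, C5) for every person, under the assumed forms:
       C2 = 1 - sum(y * y_col) / sum(S * y_col)
       C3 = 1 - sum(y * S)     / sum(G * S)
       C4 = 1 - sum(y * G)     / sum(S * G)
       C5 = 1 - sum(y * S)     / sum(S * S)
    where S is the person response curve and G the group response curve."""
    y = np.asarray(y, dtype=float)
    theta = np.asarray(theta_hat, dtype=float)[:, None]
    b = np.asarray(b_hat, dtype=float)[None, :]
    s = 1.0 / (1.0 + np.exp(-D * (theta - b)))   # S_theta_i(b_j), N x n
    g = s.mean(axis=0)                           # G(b_j)
    y_col = y.sum(axis=0)                        # y_.j
    c2 = 1.0 - (y * y_col).sum(axis=1) / (s * y_col).sum(axis=1)
    c3 = 1.0 - (y * s).sum(axis=1) / (g * s).sum(axis=1)
    c4 = 1.0 - (y * g).sum(axis=1) / (s * g).sum(axis=1)
    c5 = 1.0 - (y * s).sum(axis=1) / (s * s).sum(axis=1)
    return c2, c3, c4, c5

# Two hypothetical persons with the same ability and the same total score:
# the first misses the hardest item, the second misses the easiest one.
theta_hat = [0.0, 0.0]
b_hat = [-1.0, 0.0, 1.0]
y = [[1, 1, 0],
     [0, 1, 1]]
c2, c3, c4, c5 = extended_caution_indices(y, theta_hat, b_hat)
print(c3, c5)
```

In this toy case the second person's pattern runs against the person response curve, so the individually oriented indices come out larger for that person even though the total scores are equal.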
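These extended caution indices for person i will be easily altered
to those for item j:

$$C2_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, y_{i\cdot}}{\sum_{i=1}^{N} P_{\hat{b}_j}(\hat{\theta}_i)\, y_{i\cdot}}$$

$$C3_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, P_{\hat{b}_j}(\hat{\theta}_i)}{\sum_{i=1}^{N} P_{\hat{b}_j}(\hat{\theta}_i)\, T(\hat{\theta}_i)}$$

$$C4_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, T(\hat{\theta}_i)}{\sum_{i=1}^{N} P_{\hat{b}_j}(\hat{\theta}_i)\, T(\hat{\theta}_i)}$$

and

$$C5_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, P_{\hat{b}_j}(\hat{\theta}_i)}{\sum_{i=1}^{N} P^{2}_{\hat{b}_j}(\hat{\theta}_i)}$$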
Similarly, the indices $C3_j$ and $C5_j$ are potentially useful for detecting
anomalous response patterns in comparison with item j's IRC while $C2_j$
and $C4_j$ are potentially useful indices for purposes of identifying items
of which patterns deviate from that of the test, TRC.

The Case of Two and Three Parameter Logistic Models

Problems in Person Response Curves and Group Response Curves
Person Response Curves for the one parameter logistic model are
represented by smooth monotonically decreasing functions defined over
the difficulties of the infinitely many items. But the PRC for the two
parameter logistic model is no longer a smooth, monotonically decreasing
curve. Figure 5 provides the graph of the Person Response Curve for the
ability level of $\theta = 0$ as well as the Test Response Curve of the two
parameter logistic model where ability measures $\theta_i$, $i = 1, 2, \ldots, 100$, were
randomly sampled from a normal (0,1) distribution, the difficulties $b_j$,
$j = 1, 2, \ldots, 100$, were also randomly sampled from a normal (0,1)
distribution and the item discrimination indices, $a_j$, $j = 1, \ldots, 100$, were
drawn from the uniform distribution on the interval (0.8, 1). The Test
Response Curve and Person Response Curves are given by

$$T(\theta) = \frac{1}{n} \sum_{j=1}^{n} P_{b_j}(\theta)$$

and

$$S_{\theta_0}(b) = \frac{1}{1 + \exp[-Da(\theta_0 - b)]}$$

for a fixed $\theta_0$ and variable b.
Insert Figure 5 about here

The dotted line (+++) in the figure is the Group Response Curve of a
hundred subjects. Although each PRC oscillates locally, especially
around the origin, the GRC (the mean curve of these PRCs) becomes fairly
smooth and almost monotonically decreasing. Since $b_j$, $j = 1, \ldots, 100$, are
randomly selected from N(0,1), a larger oscillation of the PRC around the
mean 0 is expected. But the GRC is expected to be smoother as the number of
students and items increases.
Insert Figure 6 about here

Figure 6 is the graph of the TRC, GRC, and PRC of $\theta = 0$ for the
three parameter logistic model. The parameters $\theta_i$, $b_j$ and $a_j$ were
generated by the same method as that of the two parameter model; then
fifty c-values of 0.15 and 50 of 0.20 were randomly assigned to the
100 pairs of $a_j$ and $b_j$ to make the three parameter logistic model.
It seems that the smoothness of the GRC for the three parameter
logistic model is about the same, differing only as expected in terms of
the lower asymptote. A larger number of subjects will be needed for the
three parameter case in order to obtain a smoother GRC.
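The simulation design described for the two and three parameter models can be sketched as follows. This is an illustration under assumptions, not a reproduction of the authors' figures: D = 1.7, the random seed, and the helper names are choices made here, and the draws will not match those behind Figures 5 and 6.

```python
import numpy as np

D = 1.7                               # assumed scaling constant
rng = np.random.default_rng(1)        # arbitrary seed, for reproducibility only
N = n = 100

thetas = rng.standard_normal(N)                        # abilities ~ N(0, 1)
bs = np.sort(rng.standard_normal(n))                   # difficulties ~ N(0, 1), sorted for plotting
a = rng.uniform(0.8, 1.0, size=n)                      # discriminations ~ U(0.8, 1)
c = rng.permutation(np.repeat([0.15, 0.20], n // 2))   # fifty 0.15's and fifty 0.20's

def p3(theta, a, b, c=0.0):
    """Three parameter logistic; c = 0 gives the two parameter model."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def upward_steps(curve):
    """Number of places where a curve over increasing b goes up."""
    return int(np.sum(np.diff(curve) > 0))

prc_1pl = p3(0.0, 1.0, bs)            # PRC at theta = 0, one parameter model
prc_2pl = p3(0.0, a, bs)              # PRC at theta = 0, two parameter model
grc_2pl = p3(thetas[:, None], a[None, :], bs[None, :]).mean(axis=0)
grc_3pl = p3(thetas[:, None], a[None, :], bs[None, :], c[None, :]).mean(axis=0)

print("upward steps, 1PL PRC:", upward_steps(prc_1pl))
print("upward steps, 2PL PRC:", upward_steps(prc_2pl))
print("upward steps, 2PL GRC:", upward_steps(grc_2pl))
print("lowest point of 3PL GRC:", grc_3pl.min())
```

With unequal discriminations the PRC depends on the product $a_j(\theta_0 - b_j)$ rather than on $b_j$ alone, so it can rise between neighbouring difficulties; averaging over persons damps these reversals, and the guessing parameter lifts the lower asymptote of the three parameter GRC.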
The definition of the extended caution indices may be applied more
generally to the two and three parameter logistic models in essentially
the same manner as it was developed for the one parameter model.

Note that the arrangement of rows and columns according to the
orders of the proportions correct (p values) for n items and the total
scores for N subjects is essential to determine S-P curves, and the
values of $M^{p}_{ij}$ and $M^{s}_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, \ldots, n$. With our extended caution
indices, the arrangements of rows and columns in monotonic order of the
probability are no longer necessary.
Application of New Indices for the Detection of Anomalous Responses

There is evidence that student errors on certain types of arithmetic
problems are frequently quite systematic (Brown and Burton, 1978;
Birenbaum and Tatsuoka, 1980; Davis and McKnight, 1980). That is, students
seem to consistently apply erroneous algorithms in attempting to answer
a problem of a particular form. Sometimes erroneous or incomplete rules
result in the right answer. For example, a student who consistently
treats a multiplication sign as if it were an addition sign would get
the right answer to the problem 2 x 2 = 4, but would get it for the
wrong reason. A score of zero for using the wrong operation would be a
better reflection of the student's ability to multiply than a score of
one for answering "4" to the item.
Birenbaum and Tatsuoka (1980) have demonstrated that the customary zero-
one scoring of incorrect and correct answers can give the appearance of
higher dimensionality and cause difficulty in attempting to apply IRT when
students consistently apply erroneous rules to the addition and subtraction
of signed numbers. The difficulties result from the fact that several erroneous
rules frequently yield the right answer for some problems. Right answers
for the wrong reasons not only cause problems in applying IRT, but more
importantly they can result in misleading scores and make it difficult to
diagnose what a student is doing wrong.
By painstaking work Tatsuoka and her colleagues (Birenbaum and
Tatsuoka, 1980; Birenbaum, 1981) were able to identify several erroneous
rules that were consistently applied by certain students. Birenbaum and
Tatsuoka (1980) reanalyzed their data after converting ones to zeroes
for items that students got right for the wrong reasons. That is, an
item score was changed from one to zero if (1) a student was identified
as consistently applying an erroneous rule and (2) application of that
erroneous rule would lead to the correct answer for the particular item
in question. Analysis of the resulting modified data indicated that the
data were more nearly unidimensional and there was good evidence that
IRT was more applicable to the modified data than to the original data.
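The rescoring rule can be illustrated with the multiplication example used earlier in this section. The sketch below is hypothetical throughout: the item set, the function names, and the assumption that the student has already been identified as a consistent user of the erroneous rule (condition 1) are all made for the example, and it is not the procedure's actual implementation.

```python
def correct_answer(a, b):
    """The intended operation: multiplication."""
    return a * b

def addition_bug(a, b):
    """Erroneous rule from the text's illustration: treat 'x' as '+'."""
    return a + b

def raw_scores(items, answers):
    """Customary zero-one scoring of the answers."""
    return [int(ans == correct_answer(a, b)) for (a, b), ans in zip(items, answers)]

def rescore(items, scores, bug):
    """Change 1 to 0 wherever the erroneous rule also yields the right answer.
    Intended only for students already identified as consistent users of `bug`."""
    return [0 if s == 1 and bug(a, b) == correct_answer(a, b) else s
            for (a, b), s in zip(items, scores)]

items = [(2, 2), (3, 4), (0, 0), (1, 5)]            # hypothetical items
answers = [addition_bug(a, b) for a, b in items]    # a student who always applies the bug
scores = raw_scores(items, answers)
modified = rescore(items, scores, addition_bug)
print(scores, modified)
```

The raw scores credit the student for 2 x 2 and 0 x 0, where adding happens to give the product; the modified scores remove that credit.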
Anomalous response patterns can sometimes be found by conducting an
intuitive error analysis or by clinical interviews. Both approaches
require enormous effort. Brown and Burton (1978) and Tatsuoka et al.
(1980) have developed computerized approaches to error analysis. But
these methods are expensive and were based on extensive work with highly
specific item content.
Tatsuoka and Tatsuoka (1981) demonstrated an index, called the
individualized consistency index (ICI), which was shown to be useful in
detecting a variety of erroneous rules of operation in signed-number
addition and subtraction problems. Using the ICI to detect examinees
who are apt to have a misconception saves considerable effort because
only examinees so identified have their item responses routed to the
detailed error-diagnostic system. Application of the ICI is limited,
however, because it requires repeated measures, i.e., several items
based on an identical item form, within the test. Such repetition is
not common on most tests.
As will be seen below, an index similar to ICI, C3i, not only
avoids the repeated-measures limitation but is apparently more effective
for purposes of detecting anomalous response patterns resulting from the
consistent application of an erroneous rule. Tatsuoka & Tatsuoka
(1981) listed the erroneous rules of operation ("bugs") detected
by ICI. The 32 response patterns resulting from these bugs are
classified in Group A. The rest of the 103 response patterns are
classified into two groups according to the error-diagnostic system, SIGNBUG.
Group B consists of 7 response patterns that probably result from using one
or two erroneous rules inconsistently; Group C consists of response patterns
that reflect adequate use of the right rule of operation and/or give no
indication of systematic errors. The
errors observed in Group C are apparently just random errors. The
estimated item and person ability parameters needed to compute the
extended caution indices were obtained by the computer program GETAB
(Robert Baillie, 1979), using Birenbaum & Tatsuoka's modified dataset.
Distributions of the indices C2i and C3i are displayed in Figures 6
and 7 respectively. Only members of groups A and B (persons who
consistently used an erroneous rule) and of group C (persons who made a
substantial number of errors but whose errors were not the result of
consistent use of an erroneous rule) are included in the distributions
shown in Figures 6 and 7. In both figures, persons in groups A and B are
depicted by shaded boxes and those in group C by unshaded boxes.
Insert Figures 7 and 8 about here
As can be seen in Figure 6, C2i does not provide any basis for
distinguishing persons who are consistently using an erroneous rule from
those who aren't. The two groups are distinguished almost perfectly,
however, by the magnitude of C3i (see Figure 7). Indeed, there is
almost no overlap between the two groups. All 39 members of Groups A and B
have values of C3i of .05 or higher, whereas only two of the 88 members
of group C have positive values of C3i; the rest of the members of
group C have values of C3i of zero or below. Thus, C3i may be used to
identify with a high degree of accuracy those persons who consistently
use an erroneous rule.
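As a minimal sketch of how such a screening rule might be applied, the following Python fragment flags examinees whose C3i value reaches the .05 level mentioned above. The function name, the dictionary layout, and the four example values are hypothetical; the cutoff is simply the value reported for this data set and should not be read as a general constant.

```python
from typing import Dict, List

CUTOFF = 0.05  # value reported in the text for this data set; not a general constant


def flag_consistent_rule_users(c3: Dict[str, float], cutoff: float = CUTOFF) -> List[str]:
    """Return the ids of persons whose C3 value is at or above the cutoff."""
    return sorted(pid for pid, value in c3.items() if value >= cutoff)


# Hypothetical C3 values for four examinees (not data from the study).
c3_values = {"p01": 0.53, "p02": -0.27, "p03": 0.02, "p04": 0.05}
flagged = flag_consistent_rule_users(c3_values)
print(flagged)  # ['p01', 'p04']
```

Only the flagged examinees would then need to be passed on to a detailed error-diagnostic procedure.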
As might be expected from a comparison of the coefficients, C4i
works in a fashion quite similar to C2i, and C5i works much like C3i in
terms of the ability of these indices to distinguish members of groups A,
B and C. It is clear that C2i and C4i are not useful for detecting
anomalous response patterns resulting from consistent application of an
erroneous rule. These indices may be useful for other tasks for which
NCI or Van der Flier's index (Harnisch & Linn, 1981) have been found to
be useful. The third and fifth indices (C3i and C5i), however, are quite
effective for purposes of detecting persons who make consistent errors.
Insert Table 4 about here
Table 4 shows a summary of t-statistics comparing the means on the
four generalized caution indices and ICI in the two groups: A and B
combined versus C by itself. The t-value for index 2 is not significant
Figure 7. Histogram of Index C2i: The Shaded Area Represents the
Members of Groups A and B, Using Some Erroneous Rules. N=127
Figure 8. Histogram of Index C3i: The Black Area Represents the Members
of Group A and the Shaded Area the Members of Group B, Using Some
Erroneous Rules. The White Area Is for Those Using the Right Rule. N=127
Table 4
A Summary of t-statistics Comparing the Means on the
Four Generalized Caution Indices and ICI in the Two Groups

                   Group A or B   Group C    t-value       p
N                       39           88
Index 2   Mean          --         -.0065      .689      .4980
          S.D.         .0929        .0306
Index 3   Mean         .5310       -.2688    -19.293   < .00005
          S.D.         .2444        .1300
Index 4   Mean          --         -.0045       --         --
          S.D.         .0650         --
Index 5   Mean         .5091       -.2643    -17.467   < .00005
          S.D.         .2615        .1350
ICI       Mean         .9223        .8144     -7.121   < .00005
          S.D.         .0645        .1058

Note. Entries shown as -- are not legible in the source copy.
but all others are significant. Index 1 is excluded from the analysis
because the denominator of this index becomes infinity when all items
are correctly answered by all examinees.
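The report does not say which form of the t-statistic was used. As a check under that caveat, the separate-variance form (often called Welch's t), applied to the Index 5 summary statistics printed in Table 4, gives a value that agrees with the tabled -17.467 to two decimal places. The function name below is illustrative, and the sketch is not a reconstruction of the authors' actual computation.

```python
import math


def separate_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t computed from summary statistics without pooling variances.

    t = (mean1 - mean2) / sqrt(sd1^2 / n1 + sd2^2 / n2)
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Index 5 summary statistics as printed in Table 4.
# Group C is entered first so that the sign matches the tabled value.
t_index5 = separate_variance_t(-0.2643, 0.1350, 88, 0.5091, 0.2615, 39)
print(round(t_index5, 2))  # -17.47
```

Because the tabled means and standard deviations are rounded to four places, small discrepancies in the last digit are to be expected when other rows are recomputed this way.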
Discussion
As was shown above, the caution index, which Sato developed based
solely on a comparison of observed item responses to group responses, can be
readily extended to theory-based estimates of person and group response
probabilities. The caution index is a linear transformation of the
covariance of a person's response pattern with one or another of the
theoretical curves computed using item-response theory. Alternatively,
the extended caution indices may be viewed as linear transformations of
the distance between a person's response pattern and a theoretical curve
(either the person response curve, as in the case of C3i and C5i, or the
group response curve, as in the case of C4i).
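To make the covariance formulation concrete, the following Python sketch computes Sato's original caution index in its usual form from the S-P literature: one minus the ratio of the covariance between the observed response pattern and the item totals to the covariance between the Guttman pattern with the same total score and the item totals. The function names and toy data are illustrative assumptions. The extended indices C2i through C5i are defined earlier in this paper with theory-based curves in place of observed quantities, and they are not reproduced here.

```python
from typing import List, Sequence


def covariance(a: Sequence[float], b: Sequence[float]) -> float:
    """Population covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def caution_index(responses: Sequence[int], item_totals: Sequence[float]) -> float:
    """Sato-style caution index for one person.

    responses[j] is 1 for a correct answer to item j, else 0.
    item_totals[j] is the number of examinees who answered item j correctly.
    """
    score = sum(responses)
    # Guttman pattern: the `score` easiest items (largest totals) are correct.
    easiest_first = sorted(range(len(responses)), key=lambda j: -item_totals[j])
    guttman: List[int] = [0] * len(responses)
    for j in easiest_first[:score]:
        guttman[j] = 1
    denominator = covariance(guttman, item_totals)
    if denominator == 0:
        raise ValueError("index undefined: Guttman pattern has zero covariance")
    return 1.0 - covariance(responses, item_totals) / denominator


totals = [9, 7, 5, 3, 1]  # hypothetical item totals, easiest item first
print(round(caution_index([1, 1, 1, 0, 0], totals), 6))  # 0.0
print(round(caution_index([0, 0, 1, 1, 1], totals), 6))  # 2.0
```

A response pattern that matches the Guttman pattern yields zero, and the index grows as correct answers shift toward the harder items. The sketch raises an error when the denominator is zero, which happens, for example, when every item has the same total.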
The application of the extended caution indices that were
introduced in this paper provided strong evidence that the indices that
depend on the distance between a person's response pattern and their
theoretical person response curve (i.e., C3i and C5i) are quite
effective for purposes of identifying persons who consistently use an
erroneous rule in answering signed-number arithmetic problems. This is
a potentially important result that deserves further investigation with
other data sets involving different types of achievement test data. If
additional research yields similar results, these indices may have
considerable instructional utility because instruction can be made much
more specific once it is determined that a student is consistently
making an error as the result of a particular misconception.
References
Birenbaum, M. Error analysis - It does make a difference. Doctoral
dissertation, University of Illinois at Urbana-Champaign, 1981.
Birenbaum, M., & Tatsuoka, K. K. The use of information from wrong
responses in measuring students' achievement (Research Report 80-1).
Urbana, Ill.: University of Illinois, Computer-based Education
Research Laboratory, 1980.
Brown, J. S., & Burton, R. R. Diagnostic models for procedural bugs
in basic mathematics skills. Cognitive Science, 1978, 2, 155-192.
Davis, R. B., & McKnight, C. The influence of semantic content on
algorithmic behavior. The Journal of Mathematical Behavior, 1980, 3, 39.
Harnisch, D. L., & Linn, R. L. Analysis of item response patterns:
Questionable test data and dissimilar curriculum practices.
The Journal of Educational Measurement, 1981, in press.
Levine, M. V., & Rubin, D. B. Measuring the appropriateness of
multiple-choice test scores. Journal of Educational Statistics,
1979, 4, 269-290.
Lord, F. M. Application of item response theory to practical testing
problems. Hillsdale, N.J.: Erlbaum, 1980.
Lord, F. M., & Novick, M. R. Statistical theories of mental test
scores. Reading: Addison-Wesley, 1968.
Lumsden, J. Tests are perfectly reliable. British Journal of Mathematical
and Statistical Psychology, 1978, 31, 19-26.
Sato, T. The construction and interpretation of S-P tables.
Tokyo: Meiji Tosho, 1975 (in Japanese).
Tatsuoka, K. K., & Tatsuoka, M. M. Detection of aberrant response
patterns and their effect on dimensionality (Research Report 80-4).
Urbana, Ill.: University of Illinois, Computer-based Education
Research Laboratory, 1980.
Tatsuoka, K. K., & Tatsuoka, M. M. Spotting erroneous rules of
operation by the Individual Consistency Index (Research Report 81-4).
Urbana, Ill.: University of Illinois, Computer-based Education
Research Laboratory, 1981.
Tatsuoka, K. K., Birenbaum, M., Tatsuoka, M. M., & Baillie, R. A psychometric
approach to error analysis on response patterns (Research
Report 80-3). Urbana, Ill.: University of Illinois, Computer-based
Education Research Laboratory, 1980.
Trabin, T. E., & Weiss, D. J. The person response curve: Fit of
individuals to item characteristic curve models (Research
Report 79-7). Minneapolis: University of Minnesota, Department
of Psychology, Psychometric Methods Program, 1979.
Wright, B. D., & Stone, M. H. Best test design, Rasch measurement.
Chicago: The University of Chicago, Mesa Press, 1979.