
Full text of "ERIC ED211061: Indices for Detecting Unusual Item Response Patterns in Personnel Testing: Links Between Direct and Item-Response-Theory Approaches. Computerized Adaptive Testing and Measurement."



ED 211 061                                                  IR 009 676

DOCUMENT RESUME

AUTHOR        Tatsuoka, Kikumi; Linn, Robert L.
TITLE         Indices for Detecting Unusual Item Response Patterns
              in Personnel Testing: Links Between Direct and
              Item-Response-Theory Approaches. Computerized
              Adaptive Testing and Measurement.
INSTITUTION   Illinois Univ., Urbana. Computer-Based Education
              Research Lab.
SPONS AGENCY  Office of Naval Research, Washington, D.C.
              Psychological Sciences Div.
REPORT NO     CERL-RR-81-5
PUB DATE      Aug 81
CONTRACT      N000-14-79-C-0752
NOTE          38p.; For related document, see IR 009 679.

EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   Arithmetic; *Computer Assisted Testing; *Latent Trait
              Theory; *Matrices; *Statistical Analysis; Testing
              Problems
IDENTIFIERS   *Response Patterns; S P Curve Theory

ABSTRACT
Two distinct approaches, one based on item response theory and the
other based on observed item responses and standard summary statistics,
have been proposed to identify unusual response patterns. A link between
these two approaches is provided by showing certain correspondences
between Sato's S-P Curve Theory and item response theory. This link
makes possible several extensions of Sato's caution index that take
advantage of the results of item response theory. Several such indices
are introduced and their use illustrated by application to a set of
achievement test data. Two of the newly introduced extended indices were
found to be very effective for purposes of identifying persons who
consistently use an erroneous rule in attempting to solve signed number
arithmetic problems. The potential importance of this result is briefly
discussed, and 15 references are listed. (Author/LLS)

***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made      *
*   from the original document.                                       *
***********************************************************************



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or
organization originating it.

Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily
represent official NIE position or policy.



University of Illinois 

Computer-based Education Research Laboratory 
Urbana, Illinois 



INDICES FOR DETECTING UNUSUAL ITEM RESPONSE
PATTERNS IN PERSONNEL TESTING: LINKS
BETWEEN DIRECT AND ITEM-RESPONSE-THEORY APPROACHES

by

Kikumi Tatsuoka

and

Robert L. Linn

Computerized Adaptive Testing and Measurement
Research Report 81-5
August 1981



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY

D. Bitzer

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



Acknowledgment 

The authors wish to acknowledge the kind cooperation
extended to us by the people involved with this report.
Bob Baillie programmed the lessons and the data collection and
analysis routines, along with his assistant, David Dennis.
Mary Klein gave insight and meaning to many things as a
teacher of the children whom we seek to help. Roy Lipschut
did the layouts and Louise Brodie did the typing.



Abstract

Two distinct approaches, one based on item response theory and the
other based on observed item responses and standard summary statistics,
have been proposed to identify unusual response patterns. A link
between these two approaches is provided by showing certain
correspondences between Sato's S-P Curve Theory and item response
theory. This link makes possible several extensions of Sato's caution
index that take advantage of the results of item response theory.
Several such indices are introduced and their use illustrated by
application to a set of achievement test data. Two of the newly
introduced extended indices were found to be very effective for purposes
of identifying persons who consistently use an erroneous rule in
attempting to solve signed-number arithmetic problems. The potential
importance of this result is briefly discussed.



Introduction 



Several authors have recently shown an interest in using
information from patterns of response to test items to extract
information not contained in the total score. A variety of purposes have
been envisioned for use of the additional information. Wright (1977),
for example, refers to identification of "guessing, sleeping, fumbling,
and plodding" (p. 110) from the plots of residual item scores based on
the differences between item responses and the expected responses for an
individual based on the Rasch model. Levine and Rubin (1979) discuss
response patterns that are "so atypical ... that his or her aptitude test
score fails to be a completely appropriate measure" (p. 269). Sato
(1975) proposed a "caution" index which is intended to identify students
whose total scores on a test must be treated with caution. Tatsuoka and
Tatsuoka (1980) and Harnisch and Linn (1981) have discussed the
relationship of response patterns to instructional experiences and the
possible use of item response pattern information to help diagnose the
types of errors a student is making.

Indices of the degree to which an individual's pattern of responses
is unusual are conveniently classified into two general types: those
that use item response theory (IRT) to identify unusual patterns and
those that rely only on observed item responses and standard summary
statistics based on those responses (e.g., the number or proportion of
people in a norm group answering an item correctly). The work of Wright
(1979) and of Levine and Rubin (1979) are examples of approaches based
on IRT, while the work of Sato (1975), Tatsuoka and Tatsuoka (1980), and
Harnisch and Linn (1981) are of the latter type.

The primary purpose of this paper is to develop a link between
these two general approaches. More specifically, we will show a
correspondence between Sato's (1975) S-P Curve theory and test response
curves and "group response curves" developed from IRT. Also, Sato's
Caution Index defined in the S-P curve theory is generalized into a
continuous domain utilizing IRT. That is, S-P curve theory and the
Caution Index were originally developed in a discrete domain of 0-1
scoring, but this study extends the theory to the more general case of
probabilities.

Several different generalized versions of the caution index are
presented. Results of applying these indices suggest that there are two
categories. One set of indices functions in a manner similar to Sato's
original index. The other set functions more like Tatsuoka and
Tatsuoka's Individual Consistency Index in that it successfully
distinguishes examinees who make consistent errors in responding to test
items.

We first briefly review Sato's S-P Curve theory. Next, a group
response curve (GRC) is developed for the one parameter logistic model.
The GRC is based on the dualistic nature of the one parameter logistic
model, which depends on the choice of fixed and random parameters in the
model. We then present an extended caution index with several special
cases which are applicable to IRT. The cases of two and three parameter
logistic models are briefly discussed, with special attention given to
problems with person and group response curves in these models.
Finally, we discuss applications of the new caution indices for the
detection of anomalous response patterns.



S-P Curve Theory

Sato's (1975) caution index is applicable to either an item or an
individual examinee. In either form the index is conveniently obtained
from a specially arranged table of binary item scores referred to as
an "S-P Table". The S-P table, the associated S-P curves and various
indices such as the caution index are widely used in Japan for diagnosing
student performance, detecting aberrant response patterns and for
assessing the quality of a test or instructional sequence.

The S-P table is a data matrix in which the students (represented
by rows) have been arranged in descending order of their total test
scores from top to bottom and the items (represented by columns) have
been arranged in ascending order of difficulty from left to right. A
hypothetical S-P table is shown in Table 1. The solid stair-step line
is called an S-curve, which is short for Student curve. For each person,
represented by a given row, a vertical line is drawn to the right of the
nth cell from the left, where n is the number of correct answers obtained
by that person. The S-curve is then obtained by connecting the right
edge of the nth cell of each row. The P-curve is drawn in an analogous
fashion by counting down from the top the number of cells equal to the
number of students who correctly answered the item corresponding to a
given column. The P-curve for the data in Table 1 is shown by the
dashed line.
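
The arrangement just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sp_table`, the use of NumPy, and the small 3 x 3 score matrix are not part of the report, and ties are broken by original order, which the report does not specify.

```python
import numpy as np

def sp_table(Y):
    """Arrange a binary score matrix as an S-P table (illustrative sketch).

    Rows (students) are sorted by descending total score and columns
    (items) by descending number correct, i.e. ascending difficulty.
    """
    Y = np.asarray(Y, dtype=int)
    row_order = np.argsort(-Y.sum(axis=1), kind="stable")
    col_order = np.argsort(-Y.sum(axis=0), kind="stable")
    table = Y[np.ix_(row_order, col_order)]
    # S-curve: for each row, how many cells from the left lie before the line.
    s_curve = table.sum(axis=1)
    # P-curve: for each column, how many cells from the top lie above the line.
    p_curve = table.sum(axis=0)
    return table, s_curve, p_curve

# Hypothetical 3-student, 3-item score matrix (not data from the report).
Y = [[0, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
table, s_curve, p_curve = sp_table(Y)
```

The two curves are stored here simply as the row and column totals of the sorted table, since each curve is fully determined by those counts.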

Insert Table 1 about here
Let $y_{ij}$ be the binary response for student (row) i to item (column)
j of the S-P table. Row and column sums are denoted by $y_{i.}$ and $y_{.j}$
respectively. The total number of ones in the S-P table is denoted $y_{..}$
and the proportion of correct responses by $p_{i.}$, $p_{.j}$ and $p_{..}$ for the row,
column and entire table respectively. As can be seen in Table 1, the
S-curve is the step function ogive of the cumulative distribution
function of total scores, $y_{i.}$, for the 15 students and the P-curve is
the corresponding function of $y_{.j}$, the number of right answers for the 10
items.

Table 1

A Hypothetical Score Matrix ($y_{ij}$) and
S- (solid line) and P- (dotted line) Curves

[15 subjects (rows) by 10 items (columns). The individual cell entries
and the drawn curves are not recoverable from this copy; the marginal
values are listed below.]

subject i    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
y_i.        10    9    8    6    6    6    5    5    5    5    4    4    3    2    1
p_i.       1.0  0.9  0.8  0.6  0.6  0.6  0.5  0.5  0.5  0.5  0.4  0.4  0.3  0.2  0.1

item j       1    2    3    4    5    6    7    8    9   10
y_.j        13   11   10    9    8    8    6    5    5    4
p_.j       .87  .73  .67  .60  .53  .53  .40  .33  .33  .27

y_.. = 79        p_.. = .527



Insert Table 2 about here

If the S-curve is held invariant and all the 0's to the left of the
S-curve are changed to 1's and all the 1's to the right of the same
curve to 0's, the result is the S-P table shown in Table 2, which is
called a perfect S-curve. The entries in Table 2 are denoted $M^s_{ij}$.
Similarly, a perfect P-curve will be obtained and the entries in the new
table are denoted by $M^p_{ij}$. As can be seen, $M^s_{i.} = y_{i.}$ for all i, which
corresponds to the fact that the S-curve is unchanged as the result of
changing the cell entries from $y_{ij}$ to $M^s_{ij}$. The values of the column sums
for Tables 1 and 2, i.e., $y_{.j}$ and $M^s_{.j}$, are not in general equal, however.

Sato (1975) defined a Caution Index for subject i in terms of the ratio
of two covariances. The numerator of the ratio is the covariance of the
observed row vector i, $(y_{ij})$, $j = 1, \ldots, n$, and the column-sum vector,
$(y_{.j})$, $j = 1, 2, \ldots, n$, and the denominator is the covariance of the
corresponding scores (assuming the S-curve is perfect), $(M^s_{ij})$,
$j = 1, \ldots, n$, and the column-sum vector $(y_{.j})$, $j = 1, 2, \ldots, n$. More
specifically, the caution index $C_i$ for subject i is given by

\[
C_i = 1 - \frac{\sum_{j=1}^{n} (y_{ij} - p_{i.})(y_{.j} - p_{..})}
               {\sum_{j=1}^{n} (M^s_{ij} - p_{i.})(y_{.j} - p_{..})}
\qquad (1)
\]
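
A minimal sketch of equation (1) in Python follows. The function name `sato_caution_index`, the NumPy usage, and the small score matrix are illustrative assumptions rather than anything given in the report. Because each centered row $(y_{ij} - p_{i.})$ sums to zero over j, the constant subtracted from $y_{.j}$ does not change either sum, so the sketch centers the column totals at their own mean.

```python
import numpy as np

def sato_caution_index(Y):
    """Caution index C_i of equation (1) for every row of a binary matrix Y.

    Rows whose perfect pattern has zero covariance with the column totals
    (for example all-correct rows) come out as nan.
    """
    Y = np.asarray(Y, dtype=float)
    n_items = Y.shape[1]
    col_tot = Y.sum(axis=0)                      # y_.j
    row_tot = Y.sum(axis=1)                      # y_i.
    # Rank items from easiest (rank 0) to hardest; ties keep original order.
    order = np.argsort(-col_tot, kind="stable")
    rank = np.empty(n_items, dtype=int)
    rank[order] = np.arange(n_items)
    # Perfect S pattern: a 1 on each of the y_i. easiest items, 0 elsewhere.
    M = (rank[None, :] < row_tot[:, None]).astype(float)
    p_i = row_tot / n_items                      # p_i.
    col_dev = col_tot - col_tot.mean()           # centering constant is immaterial
    num = ((Y - p_i[:, None]) * col_dev).sum(axis=1)
    den = ((M - p_i[:, None]) * col_dev).sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - num / den

# Hypothetical data: three Guttman-type rows and one unusual row.
Y = [[1, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 1, 0]]
C = sato_caution_index(Y)
```

In this toy matrix the first three rows follow the perfect pattern exactly and receive an index of 0, while the last row, which misses the first item but answers the hardest of the three attempted items, receives 0.5.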



Table 2

Perfect S-curve Obtained by Changing 1's to the Right
of the S-curve to 0 and 0's to the Left to 1

                         item j
subject i    1   2   3   4   5   6   7   8   9  10    y_i.   M^s_i.
    1        1   1   1   1   1   1   1   1   1   1     10      10
    2        1   1   1   1   1   1   1   1   1   0      9       9
    3        1   1   1   1   1   1   1   1   0   0      8       8
    4        1   1   1   1   1   1   0   0   0   0      6       6
    5        1   1   1   1   1   1   0   0   0   0      6       6
    6        1   1   1   1   1   1   0   0   0   0      6       6
    7        1   1   1   1   1   0   0   0   0   0      5       5
    8        1   1   1   1   1   0   0   0   0   0      5       5
    9        1   1   1   1   1   0   0   0   0   0      5       5
   10        1   1   1   1   1   0   0   0   0   0      5       5
   11        1   1   1   1   0   0   0   0   0   0      4       4
   12        1   1   1   1   0   0   0   0   0   0      4       4
   13        1   1   1   0   0   0   0   0   0   0      3       3
   14        1   1   0   0   0   0   0   0   0   0      2       2
   15        1   0   0   0   0   0   0   0   0   0      1       1

y_.j        13  11  10   9   8   8   6   5   5   4     79
M^s_.j      15  14  13  12  10   6   3   3   2   1     79



Perfect P-Curve Obtained by Changing 1's Below
the P-Curve to 0 and 0's Above to 1

                         item j
subject i    1   2   3   4   5   6   7   8   9  10    y_i.   M^p_i.
    1        1   1   1   1   1   1   1   1   1   1     10      10
    2        1   1   1   1   1   1   1   1   1   1      9      10
    3        1   1   1   1   1   1   1   1   1   1      8      10
    4        1   1   1   1   1   1   1   1   1   1      6      10
    5        1   1   1   1   1   1   1   1   1   0      6       9
    6        1   1   1   1   1   1   1   0   0   0      6       7
    7        1   1   1   1   1   1   0   0   0   0      5       6
    8        1   1   1   1   1   1   0   0   0   0      5       6
    9        1   1   1   1   0   0   0   0   0   0      5       4
   10        1   1   1   0   0   0   0   0   0   0      5       3
   11        1   1   0   0   0   0   0   0   0   0      4       2
   12        1   0   0   0   0   0   0   0   0   0      4       1
   13        1   0   0   0   0   0   0   0   0   0      3       1
   14        0   0   0   0   0   0   0   0   0   0      2       0
   15        0   0   0   0   0   0   0   0   0   0      1       0

y_.j        13  11  10   9   8   8   6   5   5   4     79
M^p_.j      13  11  10   9   8   8   6   5   5   4     79



and the caution index, $C_j$, for item j is given by

\[
C_j = 1 - \frac{\sum_{i=1}^{N} (y_{ij} - p_{.j})(y_{i.} - p_{..})}
               {\sum_{i=1}^{N} (M^p_{ij} - p_{.j})(y_{i.} - p_{..})}
\]

The second term of the caution index for item j is the ratio of two
covariances: the numerator is the covariance of the column vector j, $(y_{ij})$,
and $(y_{i.})$, $i = 1, \ldots, N$, and the denominator is the covariance of the vectors
$(y_{i.})$ and $(M^p_{ij})$, $i = 1, 2, \ldots, N$. The value of the denominator is considered
as a norm value to standardize the numerator.

It can be said that this ratio in the above caution index is equal
to the ratio of the traditional discriminating index, $r_j$, the item-total
correlation, to the standardized (or ideal, in the sense illustrated in
Table 2) discriminating index, $r_j'$, for item j. That is,

\[
\frac{\mathrm{cov}(y_{ij}, y_{i.})}{\mathrm{cov}(M^p_{ij}, y_{i.})}
= \frac{\mathrm{cov}(y_{ij}, y_{i.}) \,/\, [\sigma(y_{ij})\,\sigma(y_{i.})]}
       {\mathrm{cov}(M^p_{ij}, y_{i.}) \,/\, [\sigma(M^p_{ij})\,\sigma(y_{i.})]}
= \frac{r_j}{r_j'}
\]

It is clear that $\sum_i (y_{ij} - p_{.j})^2 = \sum_i (M^p_{ij} - p_{.j})^2$ because the number of
1's in column j is invariant, as can be seen in Tables 1 and 2, so the
number of 1's in the column vectors j, $(M^p_{ij})$ and $(y_{ij})$, are the same.
Therefore, the two variances $\sigma^2(y_{ij})$ and $\sigma^2(M^p_{ij})$ are equal.

The Extended Caution Index in Conjunction With Response Theory

Test and Group Response Curves: One Parameter Logistic Model.

According to the one parameter logistic model, the item response
curve may be written

\[
P_{b_j}(\theta) = \frac{1}{1 + \exp[-D(\theta - b_j)]}, \qquad j = 1, 2, \ldots, n,
\]

where $\theta$ is the latent ability, $b_j$ is the difficulty of item j and D is
a constant which is set equal to 1.7 for convenience of comparison to
the normal ogive model (see Lord & Novick, 1968, p. 400). In the above
equation, $b_j$ is fixed and $\theta$ is a random variable.

Although in practice the number of items, n, is a finite number,
it is useful to consider b as a continuous variable. By holding $\theta_i$ fixed
and treating b as a continuous variable, the dual function, $S_{\theta_i}(b)$, of
the one parameter logistic function may be defined,

\[
S_{\theta_i}(b) = \frac{1}{1 + \exp[-D(\theta_i - b)]}, \qquad i = 1, 2, \ldots, N.
\]

Of course, the expression

\[
\frac{1}{1 + \exp[-D(\theta_i - b_j)]}
\]

may be considered to be a function of either $\theta$ or b. By choice of which
variable is fixed, the function may be used to define either the item
response curve, $P_{b_j}(\theta)$, or the person response curve, $S_{\theta_i}(b)$ [see Lumsden,
1978; Weiss, 1977]. Hence, the variable described within the
parentheses of the function is considered as a random variable and the
subscript variable is a fixed variable.

The curves for the pair of functions, $P_{b_0}(\theta)$ and $S_{\theta_0}(b)$, are
symmetric about the vertical axis at $\theta = \theta_0$ (or equivalently $b = b_0$)
provided $\theta_0 = b_0$. As illustrated in Figure 1, however, the item
response curve (IRC) and the person response curve (PRC) intersect at
$(\theta_0 + b_0)/2$ if $\theta_0 \neq b_0$.
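
The duality between the two curves can be checked numerically. The sketch below is illustrative only: the helper names `logistic_1pl`, `irc` and `prc` and the parameter values 1.0 and -0.5 are assumptions, not taken from the report. It evaluates the same logistic expression once with b fixed (an IRC) and once with $\theta$ fixed (a PRC), and confirms that the two curves cross at the midpoint $(\theta_0 + b_0)/2$.

```python
import math

D = 1.7  # scaling constant used in the report

def logistic_1pl(theta, b):
    """One parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))

def irc(b_j):
    """Item response curve: b_j is fixed, theta varies."""
    return lambda theta: logistic_1pl(theta, b_j)

def prc(theta_i):
    """Person response curve: theta_i is fixed, b varies."""
    return lambda b: logistic_1pl(theta_i, b)

theta0, b0 = 1.0, -0.5            # hypothetical values with theta0 != b0
x = (theta0 + b0) / 2.0           # claimed intersection point on the common scale
irc_at_x = irc(b0)(x)             # increasing curve evaluated at x
prc_at_x = prc(theta0)(x)         # decreasing curve evaluated at x
```

At x the IRC argument is $x - b_0$ and the PRC argument is $\theta_0 - x$, which are equal at the midpoint, so the two values coincide.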



Insert Figure 1 about here

[Figure 1. Person Response Curve (monotonically decreasing) and Item
Response Curve (monotonically increasing) with the Mean Values of
$\theta_0$ and $b_0$, One Parameter Logistic. Horizontal axis: $\theta$ or b scale.]

Addition and subtraction, an inner product of two functions in the
same family (i.e., in one of the two families $\{P_{b_1}(\theta), P_{b_2}(\theta), \ldots, P_{b_n}(\theta)\}$
or $\{S_{\theta_1}(b), \ldots, S_{\theta_N}(b)\}$ in this paper), the norm of a function and the
distance of any two functions in the same family will be defined below.

Definition. Addition and subtraction of two functions $P_{b_1}(\theta)$ and
$P_{b_2}(\theta)$, or $S_{\theta_1}(b)$ and $S_{\theta_2}(b)$, is defined as pairwise addition or
subtraction of the two. That is,

\[
(P_{b_1} \pm P_{b_2})(\theta) \equiv P_{b_1}(\theta) \pm P_{b_2}(\theta)
\quad\text{and}\quad
(S_{\theta_1} \pm S_{\theta_2})(b) \equiv S_{\theta_1}(b) \pm S_{\theta_2}(b).
\]

Definition. An inner product (or the sum of the cross products)
of the two functions is the sum of pairwise products $P_{b_1}(\theta_i) P_{b_2}(\theta_i)$
[or equivalently $S_{\theta_1}(b_j) S_{\theta_2}(b_j)$] or, more generally, the integration
of the product of the two functions with respect to $\theta$ (or b). Thus,

\[
[P_{b_1}(\theta), P_{b_2}(\theta)] = \sum_{i=1}^{N} P_{b_1}(\theta_i) P_{b_2}(\theta_i)
\quad\text{or}\quad \int P_{b_1}(\theta) P_{b_2}(\theta)\, d\theta
\]

and

\[
[S_{\theta_1}(b), S_{\theta_2}(b)] = \sum_{j=1}^{n} S_{\theta_1}(b_j) S_{\theta_2}(b_j)
\quad\text{or}\quad \int S_{\theta_1}(b) S_{\theta_2}(b)\, db.
\]

Definition. The squared norms of functions $P_b(\theta)$ and $S_\theta(b)$ are given by the
inner product of themselves. Thus, we have

\[
\|P_b\|^2 = [P_b(\theta), P_b(\theta)] = \sum_{i=1}^{N} P_b^2(\theta_i)
\quad\text{or}\quad \int P_b^2(\theta)\, d\theta
\]

and

\[
\|S_\theta\|^2 = [S_\theta(b), S_\theta(b)] = \sum_{j=1}^{n} S_\theta^2(b_j)
\quad\text{or}\quad \int S_\theta^2(b)\, db.
\]

Definition. The squared distance of two functions $P_{b_1}(\theta)$ and $P_{b_2}(\theta)$
[or $S_{\theta_1}(b)$ and $S_{\theta_2}(b)$] is the inner product of their difference
with itself. That is,

\[
\|P_{b_1} - P_{b_2}\|^2 = [P_{b_1}(\theta) - P_{b_2}(\theta),\; P_{b_1}(\theta) - P_{b_2}(\theta)]
= \|P_{b_1}\|^2 + \|P_{b_2}\|^2 - 2\,[P_{b_1}, P_{b_2}]
\]

and

\[
\|S_{\theta_1} - S_{\theta_2}\|^2 = [S_{\theta_1}(b) - S_{\theta_2}(b),\; S_{\theta_1}(b) - S_{\theta_2}(b)]
= \|S_{\theta_1}\|^2 + \|S_{\theta_2}\|^2 - 2\,[S_{\theta_1}, S_{\theta_2}].
\]

By using the notation of integration,

\[
\|P_{b_1} - P_{b_2}\|^2 = \int [P_{b_1}(\theta) - P_{b_2}(\theta)]^2\, d\theta
\quad\text{or}\quad
\|S_{\theta_1} - S_{\theta_2}\|^2 = \int [S_{\theta_1}(b) - S_{\theta_2}(b)]^2\, db.
\]
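
The discrete (summation) forms of these definitions translate directly into array operations. The following Python sketch is a hedged illustration: the grid of 61 ability values, the two item difficulties, and the helper names `inner`, `sq_norm` and `sq_distance` are all assumptions made for the example. It checks that the squared distance expands into the two squared norms minus twice the inner product.

```python
import numpy as np

D = 1.7

def irc(theta, b):
    """One parameter logistic item response curve evaluated on an array."""
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def inner(f, g):
    """Inner product as the sum of pairwise products on a common grid."""
    return float(np.sum(f * g))

def sq_norm(f):
    """Squared norm: inner product of a function with itself."""
    return inner(f, f)

def sq_distance(f, g):
    """Squared distance: squared norm of the pointwise difference."""
    return inner(f - g, f - g)

theta = np.linspace(-3.0, 3.0, 61)     # hypothetical ability values
P1 = irc(theta, -0.5)                  # hypothetical item with b = -0.5
P2 = irc(theta, 0.8)                   # hypothetical item with b = 0.8

lhs = sq_distance(P1, P2)
rhs = sq_norm(P1) + sq_norm(P2) - 2.0 * inner(P1, P2)
```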

With these definitions, we are ready to introduce the dual concept
of the Test Response Curve (Lord, 1980; Lord and Novick, 1968). This is the
Group Response Curve as an average function of N different Person
Response Curves. The Test Response Curve (TRC) is an average function
of n IRC's defined as

\[
T(\theta) = \frac{1}{n} \sum_{j=1}^{n} P_{b_j}(\theta).
\]

Similarly, the Group Response Curve (GRC) is an average function of N
PRC's, that is,

\[
G(b) = \frac{1}{N} \sum_{i=1}^{N} S_{\theta_i}(b).
\]

Illustrative PRC's and IRC's for 100 hypothetical persons were
generated by randomly sampling 100 values of $\theta$ from a unit normal
distribution. The resulting TRC for the simulated 100 item test is
shown as the monotonically increasing function in Figure 2.

Insert Figure 2 about here

The curve that is a monotonically decreasing function is the PRC of $\theta = 0$,
denoted by $S_0(b)$. The curve represented by +'s is a Group Response
Curve which is obtained by taking the pointwise mean of 100 PRC's over
the randomly generated 100 b values. That is,

\[
G(b_j) = \frac{1}{N} \sum_{i=1}^{N} S_{\theta_i}(b_j), \qquad N = 100.
\]

As the number of b values approaches infinity, G(b) in the figure
will be a smooth curve, monotonically decreasing, and moreover, if the
number of $\theta$ values is also very large, then G(b) will be a symmetric
curve of $T(\theta)$ about the vertical line of $\theta = b = 0$. With this figure, $\theta_i$,
$i = 1, 2, \ldots, 100$ and $b_j$, $j = 1, 2, \ldots, 100$ are randomly chosen from N(0,1), so
their means are not exactly zero. It can be shown numerically that $T(\theta)$
and G(b) reach 1/2 at $\theta = \bar{b}$ and $b = \bar{\theta}$ respectively.

Let us denote the average of $T(\theta_i)$, $i = 1, \ldots, N$, by $\bar{T}$,

\[
\bar{T} = \frac{1}{N} \sum_{i=1}^{N} T(\theta_i),
\]

and the average of $G(b_j)$, $j = 1, \ldots, n$, by $\bar{G}$,

\[
\bar{G} = \frac{1}{n} \sum_{j=1}^{n} G(b_j).
\]

Then $\bar{T} = \bar{G}$, because

\[
\bar{T} = \frac{1}{N} \sum_{i=1}^{N} T(\theta_i)
= \frac{1}{nN} \sum_{j=1}^{n} \sum_{i=1}^{N} \frac{1}{1 + \exp[-D(\theta_i - b_j)]}
= \frac{1}{n} \sum_{j=1}^{n} G(b_j) = \bar{G}.
\]
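
A small simulation in the spirit of the one described above can make the TRC, the GRC and the identity between their averages concrete. This is a sketch under assumptions: the random seed, the use of NumPy's `default_rng`, and the variable names are illustrative and do not reproduce the authors' original simulation.

```python
import numpy as np

D = 1.7
rng = np.random.default_rng(0)            # arbitrary seed, chosen for repeatability
theta = rng.standard_normal(100)          # 100 hypothetical abilities from N(0,1)
b = rng.standard_normal(100)              # 100 hypothetical difficulties from N(0,1)

# N x n matrix of one parameter logistic probabilities for every person-item pair.
PM = 1.0 / (1.0 + np.exp(-D * (theta[:, None] - b[None, :])))

T = PM.mean(axis=1)                       # T(theta_i): average over the n items
G = PM.mean(axis=0)                       # G(b_j): average over the N persons

T_bar = T.mean()                          # average of T over persons
G_bar = G.mean()                          # average of G over items
```

Both averages are the grand mean of the same matrix taken in a different order, which is why they agree. Sorting G by b shows the decreasing GRC, and sorting T by $\theta$ shows the increasing TRC.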

Definition of Various Extended Caution Indices

Sato's (1975) S-curve may be viewed as a discrete test response
curve. The perfect S-curve divides 1's and 0's into two mutually
exclusive areas with 1's under the curve and 0's above it. Note,
however, that direct correspondence in this way involves a reordering of
the subjects from low to high rather than from high to low as typically
presented by Sato and as was shown in Table 2. The TRC, $T(\theta)$, represents
the average probability of correctly answering items on the test when a
person's ability is equal to $\theta$. The analogy between the S-curve and a TRC
may be seen by considering an alternative N by n score matrix with real
numbers based on IRT rather than binary scores. More specifically, let

\[
PM_{ij} = P_{\hat{b}_j}(\hat{\theta}_i)
\]

where $\hat{\theta}_i$ is an estimated ability parameter, $\theta$, for person i and $\hat{b}_j$ is an
estimated item parameter for item j under the condition that

\[
\sum_{j=1}^{n} P_{\hat{b}_j}(\hat{\theta}_i) = \sum_{j=1}^{n} y_{ij}.
\]

Since $P_{b_j}(\theta_i) = S_{\theta_i}(b_j)$ for fixed i and j, the cells of the probability
matrix $(PM_{ij})$ are also equal to $S_{\hat{\theta}_i}(\hat{b}_j)$. If the rows and columns of
this matrix are arranged in the manner of the S-P table and columnwise
sums of the cell entries are obtained, the result is N times $G(b_j)$, which
corresponds to the P-curve. Similarly, n times $T(\theta_i)$, corresponding to
the S-curve, may be obtained by summing the cell entries for each row.
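
The condition that a person's row of probabilities sums to the observed raw score can be illustrated with a simple root search. The sketch below is an assumption-laden illustration, not the estimation procedure used in the report: it treats the item difficulties as known, uses plain bisection, and the names `expected_score` and `ability_for_score` are hypothetical.

```python
import math

D = 1.7

def expected_score(theta, b):
    """Row sum of the probability matrix: n times T(theta) for difficulties b."""
    return sum(1.0 / (1.0 + math.exp(-D * (theta - bj))) for bj in b)

def ability_for_score(raw_score, b, lo=-10.0, hi=10.0, tol=1e-10):
    """Find theta whose expected score equals raw_score, by bisection.

    No finite solution exists for a raw score of 0 or n, so those are rejected.
    """
    if not 0 < raw_score < len(b):
        raise ValueError("raw score must lie strictly between 0 and n")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # expected_score is increasing in theta, so move the matching bound.
        if expected_score(mid, b) < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

b = [-1.0, 0.0, 1.0]                      # hypothetical item difficulties
theta_hat = ability_for_score(2, b)       # ability matching a raw score of 2
theta_mid = ability_for_score(1.5, b)     # symmetric case, solution near 0
```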

Selected rows and columns of a probability matrix $(PM_{ij})$ are
illustrated in Table 3 for a 32 item test involving the subtraction of
signed numbers that was administered to a sample of 127 students
(Tatsuoka & Tatsuoka, 1981). Also shown in Table 3 are the values of
the estimated item and ability parameters and the test and group
response curves evaluated at those estimated parameter values (i.e.,
$T(\theta_i)$ and $G(b_j)$ respectively).

Insert Table 3 about here

Table 3

The 127 x 32 Probability Matrix $(PM_{ij})$ for
Signed-Number Subtraction Problems

[The table lists selected rows (subjects 1, 2, 3, 60, 61, 62, 88 and 127)
and selected columns (items 1, 2, 15, 16, 31 and 32), together with
$T(\theta_i)$ and $\theta_i$ for each listed subject and $G(b_j)$ and $b_j$ for each listed
item. The cell values are not reliably legible in this copy.]
Before introducing the extended caution index, it is useful to
compare the S and P curves for the data from which the estimates in
Table 3 were obtained with their counterparts, i.e., n times $T(\theta_i)$ and N
times $G(b_j)$. The two comparisons, S with $nT(\theta_i)$ and P with $NG(b_j)$, are
provided in Figures 3 and 4 respectively. The tic marks on the
horizontal axis in Figure 3 indicate the location of the $\theta$'s for the 127
students in the study. The tic marks in Figure 4 show the values of $b_j$
for the 32 items. The close correspondence between the two pairs of
curves is apparent. The number of items and the limited range of values
that $b_j$ assumes for these data obviously limits the evaluation of the
correspondence between the curves in Figure 4, however.

Insert Figures 3 & 4 about here

Given the parallels between the S-P curves and the GRC and TRC, the
extension of the caution index for use with the latter curves is
relatively straightforward. There are, however, several natural ways
in which the extension can be made. Possibly the most obvious extension
is to simply replace the term $(M^s_{ij} - p_{i.})$ in the denominator of
equation (1) by its counterpart from the $PM_{ij}$ matrix, i.e.,
$[PM_{ij} - T(\theta_i)] = [S_{\theta_i}(b_j) - T(\theta_i)]$.
With the above substitution, our first extended caution index, $C1_i$, is defined

\[
C1_i = 1 - \frac{\sum_{j=1}^{n} (y_{ij} - p_{i.})(y_{.j} - p_{..})}
                {\sum_{j=1}^{n} [S_{\theta_i}(b_j) - T(\theta_i)](y_{.j} - p_{..})}
\]
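
The first extended index can be sketched as follows. The code is illustrative: the function names `prob_matrix` and `c1`, the ability and difficulty values, and the four-person score matrix are assumptions introduced for the example. As in equation (1), the constant subtracted from $y_{.j}$ does not affect either sum because both row vectors are centered, so the column totals are centered at their own mean.

```python
import numpy as np

D = 1.7

def prob_matrix(theta, b):
    """N x n matrix of one parameter logistic probabilities S_{theta_i}(b_j)."""
    theta = np.asarray(theta, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def c1(Y, theta, b):
    """First extended caution index C1_i for every person (row) of Y."""
    Y = np.asarray(Y, dtype=float)
    PM = prob_matrix(theta, b)
    col_tot = Y.sum(axis=0)                       # y_.j
    col_dev = col_tot - col_tot.mean()            # centering constant is immaterial
    num = ((Y - Y.mean(axis=1, keepdims=True)) * col_dev).sum(axis=1)
    den = ((PM - PM.mean(axis=1, keepdims=True)) * col_dev).sum(axis=1)
    return 1.0 - num / den

# Hypothetical data: persons 1 and 3 share theta = 0 but answer in opposite patterns.
b = [-1.0, -0.5, 0.5, 1.0]
theta = [-1.0, 0.0, 1.0, 0.0]
Y = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]]
C1 = c1(Y, theta, b)
```

The two persons at the same ability level share a denominator, so the person who misses the easy items and answers the hard ones receives the larger index.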



[Figure 4. Comparison of P-curve (Converted to the Proportion Correct to
Item) With the Group Response Curve for the Data in Table 3. Vertical
axis: P/N or G(b), from 0.0 to 1.0; horizontal axis: $\theta$ or b scale.]



The numerator divided by n, i.e., the covariance of $(y_{ij})$ and $(y_{.j})$,
can be expanded to the sum of

\[
\frac{1}{n} \sum_{j=1}^{n} y_{ij}\, y_{.j} \quad\text{and}\quad -p_{..}\, p_{i.}.
\]

The value of the second term does not depend on a person's response to
each item but depends on his/her total score. As long as the total
score is fixed, the anomaly of response patterns will not be detected by
this value. This value varies between persons, so if two persons have
the same achievement level $\theta_i$, then the judgment regarding the extent
to which each response pattern deviates from the norm depends only on
the first term of the numerator. Since the denominator is a normalized
constant for a fixed value, $\theta_i$, it is unlikely that a particular
aberrant response pattern produced by an individual whose achievement
level is $\theta_i$ will affect the denominator.

Thus, it is natural to expect that if both the quantities are replaced
by the inner products of the two row vectors $(y_{ij})$ and $(y_{.j})$ for $j = 1,
2, \ldots, n$, the value of $C1_i$ will be affected by the degree of anomaly of
individual response patterns. Moreover, calculation of inner products
is easier than that of covariances. Let us define four other natural
extensions of the caution index as follows.

Definition. Four alternative definitions of the extended caution
index for person i are:

\[
C2_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, y_{.j}}{\sum_{j=1}^{n} S_{\theta_i}(b_j)\, y_{.j}}
\]

\[
C3_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, S_{\theta_i}(b_j)}{\sum_{j=1}^{n} G(b_j)\, S_{\theta_i}(b_j)}
\]

\[
C4_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, G(b_j)}{\sum_{j=1}^{n} S_{\theta_i}(b_j)\, G(b_j)}
\]

and

\[
C5_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, S_{\theta_i}(b_j)}{[\text{denominator not legible in this copy}]}
\]
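
The inner-product indices can be sketched in Python as below. Several things here are assumptions: the function names, the example data, and the exact numerators and denominators of $C2_i$, $C3_i$ and $C4_i$, which are read from a partly garbled scan with the help of the surrounding description (observed responses against the column totals, the PRC, and the GRC respectively). $C5_i$ is omitted because its denominator is not legible in this copy.

```python
import numpy as np

D = 1.7

def prob_matrix(theta, b):
    """N x n matrix of one parameter logistic probabilities S_{theta_i}(b_j)."""
    theta = np.asarray(theta, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def extended_indices(Y, theta, b):
    """Return (C2, C3, C4) for every person, using plain inner products."""
    Y = np.asarray(Y, dtype=float)
    S = prob_matrix(theta, b)        # person response curves evaluated at each b_j
    G = S.mean(axis=0)               # group response curve G(b_j)
    col_tot = Y.sum(axis=0)          # observed column totals y_.j
    c2 = 1.0 - (Y @ col_tot) / (S @ col_tot)
    c3 = 1.0 - (Y * S).sum(axis=1) / (S @ G)
    c4 = 1.0 - (Y @ G) / (S @ G)
    return c2, c3, c4

# Hypothetical data: persons 1 and 3 share theta = 0 but answer in opposite patterns.
b = [-1.0, -0.5, 0.5, 1.0]
theta = [-1.0, 0.0, 1.0, 0.0]
Y = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]]
c2, c3, c4 = extended_indices(Y, theta, b)
```

Replacing the observed responses by the model probabilities themselves makes the numerator and denominator of C2 and C4 identical, so both indices are then exactly zero; this is a convenient sanity check on an implementation.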



The denominators of the four indices are considered as normalizing
constants, but the characteristics of the numerators will be divided into
two categories. The indices in the first category, $C2_i$ and $C4_i$, give
measures that are more group dependent, because they are the sums of
cross products of the corresponding elements of the observed vector
$(y_{ij})$ and the column-sum total vector $(y_{.j})$, and the Group Response Curve
$G(b_j)$, respectively. They measure the relationship of an observed response
pattern for a person i to a normed variable derived from the group the
person i belongs to. Thus these indices have a similar function to the
Norm Conformity Index, NCI, defined in Tatsuoka & Tatsuoka (1980). The
remaining indices, $C3_i$ and $C5_i$, are more individually oriented. That
means the quantities obtained from $C3_i$ and $C5_i$ reflect the extent to which
a person i's response pattern $(y_{ij})$ relates to a theoretically derived PRC
at the fixed level of $\theta_i$. Thus, it can be said that the indices $C3_i$ and
$C5_i$ are similar to the Individual Consistency Index (Tatsuoka &
Tatsuoka, 1980).

These extended caution indices for person i will be easily altered
to those for item j:

\[
C2_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, y_{i.}}{\sum_{i=1}^{N} P_{b_j}(\theta_i)\, y_{i.}}
\]

\[
C3_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, P_{b_j}(\theta_i)}{\sum_{i=1}^{N} P_{b_j}(\theta_i)\, T(\theta_i)}
\]

\[
C4_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, T(\theta_i)}{\sum_{i=1}^{N} P_{b_j}(\theta_i)\, T(\theta_i)}
\]

and

\[
C5_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, P_{b_j}(\theta_i)}{[\text{denominator not legible in this copy}]}
\]



Similarly, the indices $C3_j$ and $C5_j$ are potentially useful for detecting
anomalous response patterns in comparison with item j's IRC, while $C2_j$
and $C4_j$ are potentially useful indices for purposes of identifying items
whose patterns deviate from that of the test, the TRC.

The Case of Two and Three Parameter Logistic Models

Problems in Person Response Curves and Group Response Curves

Person Response Curves for the one parameter logistic model are
represented by smooth monotonically decreasing functions defined over
the difficulties of the infinitely many items. But the PRC for the two
parameter logistic model is no longer a smooth, monotonically decreasing
curve. Figure 5 provides the graph of the Person Response Curve for the
ability level of $\theta = 0$ as well as the Test Response Curve of the two
parameter logistic model, where ability measures $\theta_i$, $i = 1, 2, \ldots, 100$, were
randomly sampled from a normal (0,1) distribution, the difficulties $b_j$,
$j = 1, 2, \ldots, 100$, were also randomly sampled from a normal (0,1)
distribution and the item discrimination indices, $a_j$, $j = 1, \ldots, 100$, were
drawn from the uniform distribution on the interval (0.8, 1). The Test
Response Curve and Person Response Curves are given by

\[
T(\theta) = \frac{1}{n} \sum_{j=1}^{n} P_{b_j}(\theta)
\]

and

\[
S_{\theta_0}(b) = \frac{1}{1 + \exp[-D a (\theta_0 - b)]}
\]

for a fixed $\theta_0$ and variable b, where a is the discrimination of the
item whose difficulty is b.
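
Why the two parameter PRC need not be monotone can be seen with just two items. In the sketch below the function name `prc_2pl` and the two (a, b) pairs are hypothetical values chosen for illustration; they are not taken from the report's simulation.

```python
import math

D = 1.7

def prc_2pl(theta0, a, b):
    """Two parameter logistic probability for a person at theta0 on item (a, b)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta0 - b)))

# Two hypothetical items for a person at theta0 = 0. The second item is
# harder (larger b) but less discriminating (smaller a).
easier_item = prc_2pl(0.0, a=1.0, b=0.50)
harder_item = prc_2pl(0.0, a=0.8, b=0.55)
```

Here the exponent $D a (b - \theta_0)$ is 0.85 for the first item and 0.748 for the second, so the harder item has the higher probability of a correct response. Plotted against b alone, the PRC therefore rises locally instead of falling, which is the oscillation described for Figure 5.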



Insert Figure 5 about he re 
The dotted line (+++) in the figure is the Group Response Curve of a 
hundred s.ub jects. XAlthough^ach locally oscillated, especially 

abound the origin, the GRC (the mean curv^of these PRCs) becomes fairly 
smooth and almost monotonically decreasing. Since b-j, j=l,...100 are 
randomly selected from N(0,1), a larger oscillation of PRC around the 
mean 0 is expected. But GRC is expected to be smoother as the number of 
students and it^etife increase to a larger number. 

Insert Figure 6 about h ere 

Figure 6 is the graph of the TRC, the GRC, and the PRC of θ = 0 for the
three parameter logistic model. The parameters θ_i, b_j and a_j were
generated by the same method as that of the two parameter model; then
fifty c-values of 0.15 and fifty of 0.20 were randomly assigned to the
100 pairs of a_j and b_j to make the three parameter logistic model.

It seems that the smoothness of the GRC for the three parameter
logistic model is about the same, differing only as expected in terms of
the lower asymptote. A larger number of subjects will be needed for the
three parameter case in order to obtain a smoother GRC.
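
The simulation described above can be sketched as follows. This is a minimal illustration rather than the program used for Figures 5 and 6: the function names, the random seed, and the value D = 1.7 (the conventional logistic scaling constant) are assumptions, and `roughness` is a hypothetical helper that simply counts upward steps in a curve ordered by item difficulty.

```python
import math
import random

D = 1.7  # assumed scaling constant; the text only writes "D"


def p_correct(theta, a, b, c=0.0):
    """Three parameter logistic probability; c = 0 gives the two parameter model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate(n_persons=100, n_items=100, three_pl=False, seed=0):
    """Draw abilities and item parameters as described in the text."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    if three_pl:
        half = n_items // 2
        cs = [0.15] * half + [0.20] * (n_items - half)
        rng.shuffle(cs)  # random assignment of c-values to (a, b) pairs
    else:
        cs = [0.0] * n_items
    items = [(rng.uniform(0.8, 1.0), rng.gauss(0.0, 1.0), c) for c in cs]
    items.sort(key=lambda item: item[1])  # order items by difficulty b
    return thetas, items


def prc(theta0, items):
    """Person Response Curve: probabilities for one fixed ability over the items."""
    return [p_correct(theta0, a, b, c) for a, b, c in items]


def grc(thetas, items):
    """Group Response Curve: the mean of the PRCs, item by item."""
    curves = [prc(theta, items) for theta in thetas]
    return [sum(column) / len(thetas) for column in zip(*curves)]


def trc(theta, items):
    """Test Response Curve: mean probability over items at ability theta."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items) / len(items)


def roughness(curve):
    """Number of upward steps in a curve ordered by increasing difficulty."""
    return sum(1 for left, right in zip(curve, curve[1:]) if right > left)


thetas, items = simulate(three_pl=False)
print("upward steps, PRC at theta = 0:", roughness(prc(0.0, items)))
print("upward steps, GRC:", roughness(grc(thetas, items)))
```

With a common discrimination for every item the PRC has no upward steps, which is the one parameter case; unequal a_j values are what allow the local oscillation discussed above.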

The definition of the extended caution indices may be applied more
generally to the two and three parameter logistic models in essentially
the same manner as it was developed for the one parameter model.
Note that the arrangement of rows and columns according to the
orders of the proportions correct (p values) for n items and the total
scores for N subjects is essential to determine S-P curves, and the
values of MP and MS, i=1,2,...,N, j=1,...,n. With our extended caution
indices, the arrangement of rows and columns in monotonic order of the
probability is no longer necessary.

Application of New Indices for the Detection of Anomalous Responses

There is evidence that student errors on certain types of arithmetic
problems are frequently quite systematic (Brown and Burton, 1978;
Birenbaum and Tatsuoka, 1980; Davis and McKnight, 1980). That is, students
seem to consistently apply erroneous algorithms in attempting to answer
a problem of a particular form. Sometimes erroneous or incomplete rules
result in the right answer. For example, a student who consistently
treats a multiplication sign as if it were an addition sign would get
the right answer to the problem 2 x 2 = 4, but would get it for the
wrong reason. A score of zero for using the wrong operation would be a
better reflection of the student's ability to multiply than a score of
one for answering "4" to the item.

Birenbaum and Tatsuoka (1980) have demonstrated that the customary zero-one
scoring of incorrect and correct answers can give the appearance of
higher dimensionality and cause difficulty in attempting to apply IRT when
students consistently apply erroneous rules to the addition and subtraction
of signed numbers. The difficulties result from the fact that several erroneous
rules frequently yield the right answer for some problems. Right answers
for the wrong reasons not only cause problems in applying IRT, but more
importantly they can result in misleading scores and make it difficult to
diagnose what a student is doing wrong.

By painstaking work Tatsuoka and her colleagues (Birenbaum and 
Tatsuoka, 1980; Birenbaum, 1981) were able to identify several erroneous 
rules that were consistently applied by certain students. Birenbaum and 
Tatsuoka (1980) reanalyzed their data after converting ones to zeroes 
for items that students got right for the wrong reasons. That is, an 
item score was changed from one to zero if (1) a student was identified
as consistently applying an erroneous rule and (2) application of that
erroneous rule would lead to the correct answer for the particular item
in question. Analysis of the resulting modified data indicated that the
data were more nearly unidimensional and there was good evidence that 
IRT was more applicable to the modified data than to the original data. 
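
The two-condition rescoring rule can be illustrated with a small sketch. It uses the multiplication-treated-as-addition example from earlier in this section rather than the signed-number items of the actual study, and the function names, the item list, and the flag `uses_bug_consistently` are all hypothetical.

```python
def correct_rule(x, y):
    """The intended operation for the illustrative items: multiplication."""
    return x * y


def addition_bug(x, y):
    """Erroneous rule from the text: the multiplication sign is read as addition."""
    return x + y


def rescore(items, answers, uses_bug_consistently, bug=addition_bug, key=correct_rule):
    """Zero-one scoring, with a one changed to zero when both conditions hold:
    (1) the student consistently applies the erroneous rule, and
    (2) that rule happens to give the correct answer on the item."""
    scores = []
    for (x, y), answer in zip(items, answers):
        score = 1 if answer == key(x, y) else 0
        if score == 1 and uses_bug_consistently and bug(x, y) == key(x, y):
            score = 0  # right answer for the wrong reason
        scores.append(score)
    return scores


items = [(2, 2), (3, 4), (0, 0), (5, 1)]
answers = [addition_bug(x, y) for x, y in items]  # a student who always adds

original = rescore(items, answers, uses_bug_consistently=False)  # [1, 0, 1, 0]
modified = rescore(items, answers, uses_bug_consistently=True)   # [0, 0, 0, 0]
print(original, modified)
```

Condition (1) has to come from a separate diagnosis of the student; the sketch takes it as given and only applies condition (2) item by item.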

Anomalous response patterns can sometimes be found by conducting an
intuitive error analysis or by clinical interviews. Both approaches
require enormous effort. Brown and Burton (1978) and Tatsuoka et al.
(1980) have developed computerized approaches to error analysis. But
these methods are expensive and were based on extensive work with highly
specific item content.

Tatsuoka and Tatsuoka (1981) demonstrated an index, called the
individualized consistency index (ICI), which was shown to be useful in
detecting a variety of erroneous rules of operation in signed-number
addition and subtraction problems. Using the ICI to detect examinees
who are apt to have a misconception saves considerable effort because
only examinees so identified have their item responses routed to the
detailed error-diagnostic system. Application of the ICI is limited,
however, because it requires repeated measures, i.e., several items
based on an identical item form, within the test. Such repetition is
not common on most tests.

As will be seen below, an index similar to the ICI, C3i, not only
avoids the repeated measure limitation but is apparently more effective
for purposes of detecting anomalous response patterns resulting from the
consistent application of an erroneous rule. Tatsuoka & Tatsuoka
(1981) showed a list of erroneous rules of operation ("bugs") detected
by the ICI. The 32 response patterns resulting from these bugs are
classified in Group A. The rest of the 103 response patterns are
classified into two groups according to the error-diagnostic system, SIGNBUG.
Group B consists of 7 responses which are probably using one or two
erroneous rules inconsistently; Group C, of those responding adequately using the
right rule of operation and/or with no indication of systematic errors. The
errors observed in Group C are apparently just random errors. The
estimated item and person ability parameters needed to compute the
extended caution indices were obtained by the computer program GETAB
(Robert Baillie, 1979), using Birenbaum & Tatsuoka's modified dataset.

Distributions of the indices C2i and C3i are displayed in Figures 7
and 8 respectively. Only members of groups A and B (persons who
consistently used an erroneous rule) and of group C (persons who made a
substantial number of errors but whose errors were not the result of
consistent use of an erroneous rule) are included in the distributions
shown in Figures 7 and 8. In both figures, persons in groups A and B are
depicted by shaded boxes and those in group C by unshaded boxes.






Insert Figures 7 and 8 about here

As can be seen in Figure 7, C2i does not provide any basis for
distinguishing persons who are consistently using an erroneous rule from
those who aren't. The two groups are distinguished almost perfectly,
however, by the magnitude of C3i (see Figure 8). Indeed, there is
almost no overlap between the two groups. All 39 members of Groups A and B
have values of C3i of .05 or higher, whereas only two of the 88 members
of group C have positive values of C3i; the rest of the members of
group C have values of C3i smaller than .05. Thus, C3i may be used to
identify with a high degree of accuracy those persons who consistently
use an erroneous rule.

As might be expected from a comparison of the coefficients, C4i
works in a fashion quite similar to C2i, and C5i works much like C3i in
terms of the ability of these indices to distinguish members of groups A,
B and C. It is clear that C2i and C4i are not useful for detecting
anomalous response patterns resulting from consistent application of an
erroneous rule. These indices may be useful for other tasks for which
NCI or Van der Flier's index (Harnisch & Linn, 1981) have been found to
be useful. The third and fifth indices (C3i and C5i), however, are quite
effective for purposes of detecting persons who make consistent errors.



Insert Table 4 about here 



Table 4 shows a summary of t-statistics comparing the means on the
four generalized caution indices and ICI in the two groups: A and B
combined versus C by itself. The t-value for index 2 is not significant



Figure 7. Histogram of Index C2i: The Shaded Area Represents the
Members of Groups A and B, Using Some Erroneous Rules. N=127



Figure 8. Histogram of Index C3i: The Black Area Represents the Members
of Group A and the Shaded Area Those of Group B, Using Some Erroneous Rules.
The White Area Is for Those Using the Right Rule. N=127






Table 4

A Summary of t-Statistics Comparing the Means on the
Four Generalized Caution Indices and ICI in the Two Groups

Indices             Group A or B     Group C      t Value        p

N                        39             88

Index 2    Mean      [illegible]      -.0065        .689       .4980
           S.D.         .0929†         .0306

Index 3    Mean         .5310         -.2688     -19.293     < .00005
           S.D.         .2444          .1300

Index 4    Mean      [illegible]      -.0045   [illegible]  [illegible]
           S.D.         .0650†      [illegible]

Index 5    Mean         .5091         -.2643     -17.467     < .00005
           S.D.         .2615          .1350

ICI        Mean         .9223          .8144      -7.121     < .00005
           S.D.         .0645          .1058

† Only one of the two Group A or B entries for this index is legible in
the source; its placement in the S.D. row is uncertain.





















but all others are significant. Index 1 is excluded from the analysis
because the denominator of this index becomes infinity when all items
are correctly answered by all examinees.
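
As a check on the arithmetic of Table 4, the t-values can be recomputed from the reported means, standard deviations and group sizes (39 and 88). The source does not state which t formula was used; the sketch below assumes a separate-variance (Welch-type) statistic and takes the difference as Group C minus Groups A and B so that the sign matches the table. Under those assumptions the Index 5 row is reproduced to three decimals, and the Index 3 and ICI rows agree only approximately (to within about 0.07). The function and variable names are hypothetical.

```python
import math


def separate_variance_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t for (mean_2 - mean_1) using unpooled variances (an assumed formula)."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_2 - mean_1) / standard_error


# Rows of Table 4 with complete entries:
# (mean, S.D.) for Groups A and B combined, then (mean, S.D.) for Group C.
rows = {
    "Index 3": (0.5310, 0.2444, -0.2688, 0.1300),
    "Index 5": (0.5091, 0.2615, -0.2643, 0.1350),
    "ICI": (0.9223, 0.0645, 0.8144, 0.1058),
}

t_values = {
    name: separate_variance_t(mean_ab, sd_ab, 39, mean_c, sd_c, 88)
    for name, (mean_ab, sd_ab, mean_c, sd_c) in rows.items()
}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```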

Discussion

As was shown above, the caution index, which Sato developed based
solely on a comparison of observed item responses to group responses, can be
readily extended to theory based estimates of person and group response
probabilities. The caution index is a linear transformation of the
covariance of a person's response pattern with one or another
theoretical curve computed using item-response theory. Alternatively,
the extended caution indices may be viewed as linear transformations of
the distance between a person's response pattern and a theoretical curve
(either the person response curve, as in the case of C3i and C5i, or the
group response curve, as in the case of C4i).
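
The covariance view can be made concrete with a short sketch. It uses the form in which Sato's caution index is commonly written, one minus the ratio of the covariance of the observed response pattern with a reference curve to the covariance of the Guttman (perfectly ordered) pattern of the same total score with that curve. The exact definitions of C2 through C5 are given earlier in the report and are not reproduced here; `curve` is a placeholder for whichever reference vector is used (observed proportions correct, or IRT-based probabilities), and the function names are hypothetical.

```python
def covariance(u, v):
    """Population covariance of two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


def caution_index(responses, curve):
    """1 - cov(responses, curve) / cov(Guttman pattern, curve).

    The Guttman pattern puts the person's k correct answers on the k items
    with the highest values of `curve`.
    """
    k = sum(responses)
    easiest_first = sorted(range(len(curve)), key=lambda j: -curve[j])
    guttman = [0] * len(curve)
    for j in easiest_first[:k]:
        guttman[j] = 1
    denominator = covariance(guttman, curve)
    if denominator == 0:
        raise ValueError("index undefined: all-right, all-wrong, or flat curve")
    return 1.0 - covariance(responses, curve) / denominator


curve = [0.9, 0.8, 0.6, 0.4, 0.2]                   # illustrative probabilities
typical = caution_index([1, 1, 1, 0, 0], curve)     # Guttman pattern: index 0
anomalous = caution_index([0, 0, 1, 1, 1], curve)   # reversed pattern: index > 1
print(typical, anomalous)
```

Because the denominator depends only on the total score and the curve, the index is a linear function of the covariance in the numerator for persons with the same total score.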

The application of the extended caution indices that were
introduced in this paper provided strong evidence that the indices that
depend on the distance between a person's response pattern and their
theoretical person response curve (i.e., C3i and C5i) are quite
effective for purposes of identifying persons who consistently use an
erroneous rule in answering signed-number arithmetic problems. This is
a potentially important result that deserves further investigation with
other data sets involving different types of achievement test data. If
additional research yields similar results, these indices may have
considerable instructional utility because instruction can be made much
more specific once it is determined that a student is consistently
making an error as the result of a particular misconception.






References 


Birenbaum, M. Error Analysis — It does make a difference . Doctoral 
Dissertation, University of Illinois at Urbana-Champaign, 1981. 

Birenbaum, M., & Tatsuoka, K. K. The use of information from wrong
responses in measuring students' achievement (Research Report 80-1).
Urbana, Ill.: University of Illinois, Computer-based Education
Research Laboratory, 1980.

Brown, J. S., & Burton, R. R. Diagnostic models for procedural bugs
in basic mathematics skills. Cognitive Science, 1978, 2, 155-192.

Davis, R. B., & McKnight, C. The influence of semantic content on
algorithmic behavior. The Journal of Mathematical Behavior, 1980, 3, 39.

Harnisch, D. L., & Linn, R. L. Analysis of item response patterns:
Questionable test data and dissimilar curriculum practices.
The Journal of Educational Measurement, 1981, in press.

Levine, M. V., & Rubin, D. B. Measuring the appropriateness of
multiple-choice test scores. Journal of Educational Statistics,
1979, 4, 269-290.



Lord, F. M. Application of item response theory to practical testing
problems. Hillsdale, N.J.: Erlbaum, 1980.

Lord, F. M., & Novick, M. R. Statistical theories of mental test
scores. Reading: Addison-Wesley, 1968.

Lumsden, J. Tests are perfectly reliable. British Journal of Mathematical
and Statistical Psychology, 1978, 31, 19-26.

Sato, T. The construction and interpretation of S-P tables.
Tokyo: Meiji Tosho, 1975 (in Japanese).

Tatsuoka, K. K., & Tatsuoka, M. M. Detection of aberrant response
patterns and their effect on dimensionality (Research Report 80-4).
Urbana, Ill.: University of Illinois, Computer-based Education
Research Laboratory, 1980.

Tatsuoka, K. K., & Tatsuoka, M. M. Spotting erroneous rules of
operation by the Individual Consistency Index (Research Report 81-4).
Urbana, Ill.: University of Illinois, Computer-based Education
Research Laboratory, 1981.

Tatsuoka, K. K., Birenbaum, M., Tatsuoka, M. M., & Baillie, R. A psychometric
approach to error analysis on response patterns (Research
Report 80-3). Urbana, Ill.: University of Illinois, Computer-based
Education Research Laboratory, 1980.



Trabin, T. E., & Weiss, D. J. The person response curve: Fit of
individuals to item characteristic curve models (Research
Report 79-7). Minneapolis: University of Minnesota, Department
of Psychology, Psychometric Methods Program, 1979.

Wright, B. D., & Stone, M. H. Best test design, Rasch measurement.
Chicago: The University of Chicago, Mesa Press, 1979.


