# Full text of "ERIC ED211061: Indices for Detecting Unusual Item Response Patterns in Personnel Testing: Links Between Direct and Item-Response-Theory Approaches. Computerized Adaptive Testing and Measurement."


DOCUMENT RESUME

ED 211 061    IR 009 676

AUTHOR: Tatsuoka, Kikumi; Linn, Robert L.
TITLE: Indices for Detecting Unusual Item Response Patterns in Personnel Testing: Links Between Direct and Item-Response-Theory Approaches. Computerized Adaptive Testing and Measurement.
INSTITUTION: Illinois Univ., Urbana. Computer-Based Education Research Lab.
SPONS AGENCY: Office of Naval Research, Washington, D.C. Psychological Sciences Div.
REPORT NO: CERL-RR-81-5
PUB DATE: Aug 81
CONTRACT: N00014-79-C-0752
NOTE: 38p.; For related document, see IR 009 679.
EDRS PRICE: MF01/PC02 Plus Postage.
DESCRIPTORS: Arithmetic; *Computer Assisted Testing; *Latent Trait Theory; *Matrices; *Statistical Analysis; Testing Problems
IDENTIFIERS: *Response Patterns; S P Curve Theory
ABSTRACT: The résumé abstract repeats the authors' abstract given with the report and adds that 15 references are listed. (Author/LLS)

Reproductions supplied by EDRS are the best that can be made from the original document.
U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it. Minor changes have been made to improve reproduction quality. Points of view or opinions stated in this document do not necessarily represent official NIE position or policy.

University of Illinois
Computer-based Education Research Laboratory
Urbana, Illinois

INDICES FOR DETECTING UNUSUAL ITEM RESPONSE PATTERNS IN PERSONNEL TESTING: LINKS BETWEEN DIRECT AND ITEM-RESPONSE-THEORY APPROACHES

by Kikumi Tatsuoka and Robert L. Linn

Computerized Adaptive Testing and Measurement
Research Report 81-5
August 1981

"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY D. Bitzer TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)."

Acknowledgment

The authors wish to acknowledge the kind cooperation extended to us by the people involved with this report. Bob Baillie programmed the lessons and the data collection and analysis routines, along with his assistant, David Dennis. Mary Klein gave insight and meaning to many things as a teacher of the children whom we seek to help. Roy Lipschut did the layouts and Louise Brodie did the typing.

Abstract

Two distinct approaches, one based on item-response theory and the other based on observed item responses and standard summary statistics, have been proposed to identify unusual response patterns. A link between these two approaches is provided by showing certain correspondences between Sato's S-P Curve Theory and item response theory. This link makes possible several extensions of Sato's caution index that take advantage of the results of item response theory. Several such indices are introduced and their use illustrated by application to a set of achievement test data.
Two of the newly introduced extended indices were found to be very effective for purposes of identifying persons who consistently use an erroneous rule in attempting to solve signed-number arithmetic problems. The potential importance of this result is briefly discussed.

Introduction

Several authors have recently shown an interest in using information from patterns of response to test items to extract information not contained in the total score. A variety of purposes have been envisioned for use of the additional information. Wright (1977), for example, refers to identification of "guessing, sleeping, fumbling, and plodding" (p. 110) from the plots of residual item scores based on the differences between item responses and the expected responses for an individual based on the Rasch model. Levine and Rubin (1979) discuss response patterns that are "so atypical ... that his or her aptitude test score fails to be a completely appropriate measure" (p. 269). Sato (1975) proposed a "caution" index which is intended to identify students whose total scores on a test must be treated with caution. Tatsuoka and Tatsuoka (1980) and Harnisch and Linn (1981) have discussed the relationship of response patterns to instructional experiences and the possible use of item response pattern information to help diagnose the types of errors a student is making.

Indices of the degree to which an individual's pattern of responses is unusual are conveniently classified into two general types: those that use item response theory (IRT) to identify unusual patterns and those that rely only on observed item responses and standard summary statistics based on those responses (e.g., the number or proportion of people in a norm group answering an item correctly). The work of Wright (1979) and of Levine and Rubin (1979) are examples of approaches based on IRT, while the work of Sato (1975), Tatsuoka and Tatsuoka (1980), and Harnisch and Linn (1981) are of the latter type.
The primary purpose of this paper is to develop a link between these two general approaches. More specifically, we will show a correspondence between Sato's (1975) S-P Curve theory and the test response curves and "group response curves" developed from IRT. Also, Sato's Caution Index defined in the S-P curve theory is generalized into a continuous domain utilizing IRT. That is, S-P curve theory and the Caution Index were originally developed in a discrete domain of 0-1 scoring, but this study extends the theory to a more general case of probabilities. Several different generalized versions of the caution index are presented. Results of applying these indices suggest that they fall into two categories. One set of indices functions in a manner similar to Sato's original index. The other set functions more like Tatsuoka and Tatsuoka's Individual Consistency Index in that it successfully distinguishes examinees who make consistent errors in responding to test items.

We first briefly review Sato's S-P Curve theory. Next, a group response curve (GRC) is developed for the one parameter logistic model. The GRC is based on the dualistic nature of the one parameter logistic model, which depends on the choice of fixed and random parameters in the model. We then present an extended caution index with several special cases which are applicable to IRT. The cases of two and three parameter logistic models are briefly discussed, with special attention given to problems with person and group response curves in these models. Finally, we discuss applications of the new caution indices for the detection of anomalous response patterns.

S-P Curve Theory

Sato's (1975) caution index is applicable to either an item or an individual examinee. In either form the index is conveniently obtained from a specially arranged table of binary item scores referred to as an "S-P Table".
The S-P table, the associated S-P curves, and various indices such as the caution index are widely used in Japan for diagnosing student performance, detecting aberrant response patterns, and assessing the quality of a test or instructional sequence.

The S-P table is a data matrix in which the students (represented by rows) have been arranged in descending order of their total test scores from top to bottom and the items (represented by columns) have been arranged in ascending order of difficulty from left to right. A hypothetical S-P table is shown in Table 1. The solid stair-step line is called an S-curve, which is short for Student curve. For each person, represented by a given row, a vertical line is drawn to the right of the nth cell from the left, where n is the number of correct answers obtained by that person. The S-curve is then obtained by connecting the right edge of the nth cell of each row. The P-curve is drawn in an analogous fashion by counting down from the top the number of cells equal to the number of students who correctly answered the item corresponding to a given column. The P-curve for the data in Table 1 is shown by the dashed line.

Let $y_{ij}$ be the binary response of student (row) i to item (column) j of the S-P table. Row and column sums are denoted by $y_{i.}$ and $y_{.j}$ respectively. The total number of ones in the S-P table is denoted $y_{..}$, and the proportions of correct responses by $P_{i.}$, $P_{.j}$ and $P_{..}$ for the row, column and entire table respectively. As can be seen in Table 1, the S-curve is the step-function ogive of the cumulative distribution function of total scores, $y_{i.}$, for the 15 students, and the P-curve is the corresponding function of $y_{.j}$, the number of right answers for the 10 items.

Table 1
A Hypothetical Score Matrix $(y_{ij})$ and S- (solid line) and P- (dotted line) Curves

(The individual 0/1 cell entries and the drawn curves are not legible in the source; only the marginal totals are reproduced.)

subject i    y_i.    P_i.
    1         10     1.0
    2          9     0.9
    3          8     0.8
    4          6     0.6
    5          6     0.6
    6          6     0.6
    7          5     0.5
    8          5     0.5
    9          5     0.5
   10          5     0.5
   11          4     0.4
   12          4     0.4
   13          3     0.3
   14          2     0.2
   15          1     0.1

item j      1     2     3     4     5     6     7     8     9    10
y_.j       13    11    10     9     8     8     6     5     5     4     y_.. = 79
P_.j      .87   .73   .67   .60   .53   .53   .40   .33   .33   .27     P_.. = .527

If the S-curve is held invariant and all the 0's to the left of the S-curve are changed to 1's and all the 1's to the right of the same curve to 0's, the result is the S-P table shown in Table 2, which is called a perfect S-curve. The entries in Table 2 are denoted $M^s_{ij}$. Similarly, a perfect P-curve is obtained, and the entries in the new table are denoted by $M^p_{ij}$. As can be seen, $M^s_{i.} = y_{i.}$ for all i, which corresponds to the fact that the S-curve is unchanged as the result of changing the cell entries from $y_{ij}$ to $M^s_{ij}$. The values of the column sums for Tables 1 and 2, i.e., $y_{.j}$ and $M^s_{.j}$, are not in general equal, however.

Table 2
Perfect S-Curve Obtained by Changing 1's to the Right of the S-Curve to 0 and 0's to the Left to 1

subject i   item j:  1  2  3  4  5  6  7  8  9 10    y_i.   M^s_i.
    1                1  1  1  1  1  1  1  1  1  1     10     10
    2                1  1  1  1  1  1  1  1  1  0      9      9
    3                1  1  1  1  1  1  1  1  0  0      8      8
    4                1  1  1  1  1  1  0  0  0  0      6      6
    5                1  1  1  1  1  1  0  0  0  0      6      6
    6                1  1  1  1  1  1  0  0  0  0      6      6
    7                1  1  1  1  1  0  0  0  0  0      5      5
    8                1  1  1  1  1  0  0  0  0  0      5      5
    9                1  1  1  1  1  0  0  0  0  0      5      5
   10                1  1  1  1  1  0  0  0  0  0      5      5
   11                1  1  1  1  0  0  0  0  0  0      4      4
   12                1  1  1  1  0  0  0  0  0  0      4      4
   13                1  1  1  0  0  0  0  0  0  0      3      3
   14                1  1  0  0  0  0  0  0  0  0      2      2
   15                1  0  0  0  0  0  0  0  0  0      1      1
 y_.j               13 11 10  9  8  8  6  5  5  4     79
 M^s_.j             15 14 13 12 10  6  3  3  2  1            79

Perfect P-Curve Obtained by Changing 1's Below the P-Curve to 0 and 0's Above to 1

subject i   item j:  1  2  3  4  5  6  7  8  9 10    y_i.   M^p_i.
    1                1  1  1  1  1  1  1  1  1  1     10     10
    2                1  1  1  1  1  1  1  1  1  1      9     10
    3                1  1  1  1  1  1  1  1  1  1      8     10
    4                1  1  1  1  1  1  1  1  1  1      6     10
    5                1  1  1  1  1  1  1  1  1  0      6      9
    6                1  1  1  1  1  1  1  0  0  0      6      7
    7                1  1  1  1  1  1  0  0  0  0      5      6
    8                1  1  1  1  1  1  0  0  0  0      5      6
    9                1  1  1  1  0  0  0  0  0  0      5      4
   10                1  1  1  0  0  0  0  0  0  0      5      3
   11                1  1  0  0  0  0  0  0  0  0      4      2
   12                1  0  0  0  0  0  0  0  0  0      4      1
   13                1  0  0  0  0  0  0  0  0  0      3      1
   14                0  0  0  0  0  0  0  0  0  0      2      0
   15                0  0  0  0  0  0  0  0  0  0      1      0
 y_.j               13 11 10  9  8  8  6  5  5  4     79
 M^p_.j             13 11 10  9  8  8  6  5  5  4            79

Sato (1975) defined a Caution Index for subject i in terms of the ratio of two covariances. The numerator of the ratio is the covariance of the observed row vector i, $(y_{ij})$, $j = 1, \dots, n$, and the sum-of-column vector $(y_{.j})$, $j = 1, 2, \dots, n$; the denominator is the covariance of the corresponding scores assuming the S-curve is perfect, $(M^s_{ij})$, $j = 1, \dots, n$, and the column-sum vector $(y_{.j})$, $j = 1, 2, \dots, n$. More specifically, the caution index $C_i$ for subject i is given by

$$C_i = 1 - \frac{\sum_{j=1}^{n} (y_{ij} - P_{i.})(y_{.j} - P_{..})}{\sum_{j=1}^{n} (M^s_{ij} - P_{i.})(y_{.j} - P_{..})} \qquad (1)$$

and the caution index $C_j$ for item j is given by

$$C_j = 1 - \frac{\sum_{i=1}^{N} (y_{ij} - P_{.j})(y_{i.} - P_{..})}{\sum_{i=1}^{N} (M^p_{ij} - P_{.j})(y_{i.} - P_{..})}$$

The second term of the caution index for item j is the ratio of two covariances. The numerator is the covariance of the column vector j, $(y_{ij})$, and $(y_{i.})$, $i = 1, \dots, N$, and the denominator is the covariance of the vectors $(y_{i.})$ and $(M^p_{ij})$, $i = 1, 2, \dots, N$. The value of the denominator is considered as a norm value to standardize the numerator. It can be said that this ratio in the above caution index is equal to the ratio of the traditional discriminating index $r_j$, the total-item correlation, to the standardized (or ideal, in a sense illustrated in Table 2) discriminating index $r'_j$ for item j. That is,

$$\frac{\mathrm{cov}_j(y_{ij}, y_{i.})}{\mathrm{cov}_j(M^p_{ij}, y_{i.})} = \frac{\mathrm{cov}_j(y_{ij}, y_{i.}) \,/\, [\sigma_j(y_{ij})\,\sigma(y_{i.})]}{\mathrm{cov}_j(M^p_{ij}, y_{i.}) \,/\, [\sigma_j(M^p_{ij})\,\sigma(y_{i.})]} = \frac{r_j}{r'_j}$$

It is clear that $\sum_i (y_{ij} - P_{.j})^2 = \sum_i (M^p_{ij} - P_{.j})^2$ because the number of 1's in column j is invariant, as can be seen in Tables 1 and 2, so the numbers of 1's in the column vectors $(M^p_{ij})$ and $(y_{ij})$ are the same. Therefore, the two variances $\sigma^2_j(y_{ij})$ and $\sigma^2_j(M^p_{ij})$ are equal.

The Extended Caution Index in Conjunction With Response Theory

Test and Group Response Curves: One Parameter Logistic Model

According to the one parameter logistic model, the item response curve may be written

$$P_{b_j}(\theta) = \frac{1}{1 + \exp[-D(\theta - b_j)]}, \qquad j = 1, 2, \dots, n,$$

where $\theta$ is the latent ability, $b_j$ is the difficulty of item j and D is a constant which is set equal to 1.7 for convenience of comparison to the normal ogive model (see Lord & Novick, 1968, p. 400). In the above equation, $b_j$ is fixed and $\theta$ is a random variable. Although in practice the number of items, n, is a finite number, it is useful to consider b as a continuous variable. By holding $\theta_i$ fixed and treating b as a continuous variable, the dual function, $S_{\theta_i}(b)$, of the one parameter logistic function may be defined,

$$S_{\theta_i}(b) = \frac{1}{1 + \exp[-D(\theta_i - b)]}, \qquad i = 1, 2, \dots, N.$$

Of course, the expression

$$\frac{1}{1 + \exp[-D(\theta_i - b_j)]}$$

may be considered to be a function of either $\theta$ or b. By choice of which variable is fixed, the function may be used to define either the item response curve, $P_{b_j}(\theta)$, or the person response curve, $S_{\theta_i}(b)$ (see Lumsden, 1978; Weiss, 1977). Hence, the variable within the parentheses of the function is considered as a random variable and the subscript variable is a fixed variable.

The curves for the pair of functions $P_{b_0}(\theta)$ and $S_{\theta_0}(b)$ are symmetric about the vertical axis at $\theta = \theta_0$ (or equivalently $b = b_0$) provided $\theta_0 = b_0$. As illustrated in Figure 1, however, the item response curve (IRC) and the person response curve (PRC) intersect at $(\theta_0 + b_0)/2$ if $\theta_0 \neq b_0$.

Insert Figure 1 about here
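The duality between the item response curve and the person response curve can be checked numerically. The following is a minimal sketch, assuming the one parameter logistic form with D = 1.7; the function names and the values of theta0, b0 and c are illustrative and do not come from the report. It verifies that the two curves take the same value at the midpoint (theta0 + b0)/2 and that they mirror each other when theta0 equals b0.

```python
import math

D = 1.7  # scaling constant that aligns the logistic with the normal ogive


def logistic(theta, b, d=D):
    """One parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-d * (theta - b)))


def item_response_curve(b0):
    """P_{b0}(theta): the difficulty b0 is fixed, theta is the variable."""
    return lambda theta: logistic(theta, b0)


def person_response_curve(theta0):
    """S_{theta0}(b): the ability theta0 is fixed, b is the variable."""
    return lambda b: logistic(theta0, b)


# Illustrative values only; they are not taken from the report.
theta0, b0 = 0.5, -0.3
irc = item_response_curve(b0)
prc = person_response_curve(theta0)

# On the common theta-or-b scale the two curves cross at the midpoint.
crossing = (theta0 + b0) / 2.0
gap_at_crossing = abs(irc(crossing) - prc(crossing))

# When theta0 == b0 == c, the curves mirror each other about x = c.
c = 0.2
mirror_gap = max(
    abs(item_response_curve(c)(c + t) - person_response_curve(c)(c - t))
    for t in (0.1, 0.5, 1.0, 2.0)
)

# The IRC rises and the PRC falls along the scale.
grid = [x / 10.0 for x in range(-30, 31)]
irc_increasing = all(irc(u) < irc(v) for u, v in zip(grid, grid[1:]))
prc_decreasing = all(prc(u) > prc(v) for u, v in zip(grid, grid[1:]))

print(crossing, gap_at_crossing, mirror_gap, irc_increasing, prc_decreasing)
```

Both curves are built from the same two-argument function, which is the point of the dual definition: only the choice of which argument is held fixed differs.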
Figure 1. Person Response Curve (monotonically decreasing) and Item Response Curve (monotonically increasing) with the mean values of $\theta_0$ and $b_0$, one parameter logistic model. (Horizontal axis: $\theta$ or b scale.)

Addition and subtraction, an inner product of two functions in the same family (i.e., in one of the two families $\{P_{b_1}(\theta), P_{b_2}(\theta), \dots, P_{b_n}(\theta)\}$ or $\{S_{\theta_1}(b), \dots, S_{\theta_N}(b)\}$ in this paper), the norm of a function and the distance of any two functions in the same family will be defined below.

Definition. Addition and subtraction of two functions $P_{b_1}(\theta)$ and $P_{b_2}(\theta)$, or $S_{\theta_1}(b)$ and $S_{\theta_2}(b)$, is defined as pairwise addition or subtraction of the two. That is,

$$(P_{b_1} \pm P_{b_2})(\theta) \equiv P_{b_1}(\theta) \pm P_{b_2}(\theta)$$

and

$$(S_{\theta_1} \pm S_{\theta_2})(b) \equiv S_{\theta_1}(b) \pm S_{\theta_2}(b).$$

Definition. An inner product (or the sum of the cross products) of the two functions is the sum of pairwise products $P_{b_1}(\theta_i) P_{b_2}(\theta_i)$ [or equivalently $S_{\theta_1}(b_j) S_{\theta_2}(b_j)$] or, more generally, the integration of the product of the two functions with respect to $\theta$ (or b). Thus,

$$[P_{b_1}(\theta), P_{b_2}(\theta)] = \sum_{i=1}^{N} P_{b_1}(\theta_i) P_{b_2}(\theta_i) \quad \text{or} \quad \int P_{b_1}(\theta) P_{b_2}(\theta)\, d\theta$$

and

$$[S_{\theta_1}(b), S_{\theta_2}(b)] = \sum_{j=1}^{n} S_{\theta_1}(b_j) S_{\theta_2}(b_j) \quad \text{or} \quad \int S_{\theta_1}(b) S_{\theta_2}(b)\, db.$$

Definition. The squared norms of functions $P_b(\theta)$ and $S_\theta(b)$ are given by the inner product of themselves. Thus, we have

$$\|P_b\|^2 = [P_b(\theta), P_b(\theta)] = \sum_{i=1}^{N} P_b^2(\theta_i) \quad \text{or} \quad \int P_b^2(\theta)\, d\theta$$

and

$$\|S_\theta\|^2 = [S_\theta(b), S_\theta(b)] = \sum_{j=1}^{n} S_\theta^2(b_j) \quad \text{or} \quad \int S_\theta^2(b)\, db.$$

Definition. The squared distance of two functions $P_{b_1}(\theta)$ and $P_{b_2}(\theta)$ [or $S_{\theta_1}(b)$ and $S_{\theta_2}(b)$] is the inner product of their difference with itself. That is,

$$\|P_{b_1} - P_{b_2}\|^2 = [P_{b_1}(\theta) - P_{b_2}(\theta),\; P_{b_1}(\theta) - P_{b_2}(\theta)] = \|P_{b_1}\|^2 + \|P_{b_2}\|^2 - 2[P_{b_1}, P_{b_2}]$$

and

$$\|S_{\theta_1} - S_{\theta_2}\|^2 = [S_{\theta_1}(b) - S_{\theta_2}(b),\; S_{\theta_1}(b) - S_{\theta_2}(b)] = \|S_{\theta_1}\|^2 + \|S_{\theta_2}\|^2 - 2[S_{\theta_1}, S_{\theta_2}].$$

By using the notation of integration,

$$\|P_{b_1} - P_{b_2}\|^2 = \int [P_{b_1}(\theta) - P_{b_2}(\theta)]^2\, d\theta \quad \text{or} \quad \|S_{\theta_1} - S_{\theta_2}\|^2 = \int [S_{\theta_1}(b) - S_{\theta_2}(b)]^2\, db.$$

With these definitions, we are ready to introduce the dual concept of the Test Response Curve (Lord, 1980; Lord and Novick, 1968). This is the Group Response Curve, an average function of N different Person Response Curves. The Test Response Curve (TRC) is an average function of n IRCs defined as

$$T(\theta) = (1/n) \sum_{j=1}^{n} P_{b_j}(\theta).$$

Similarly, the Group Response Curve (GRC) is an average function of N PRCs, that is,

$$G(b) = (1/N) \sum_{i=1}^{N} S_{\theta_i}(b).$$

Illustrative PRCs and IRCs for 100 hypothetical persons were generated by randomly sampling 100 values of $\theta$ from a unit normal distribution. The resulting TRC for the simulated 100 item test is shown as the monotonically increasing function in Figure 2.

Insert Figure 2 about here

The curve that is a monotonically decreasing function is the PRC of $\theta = 0$, denoted by $S_0(b)$. The curve represented by +'s is a Group Response Curve which is obtained by taking the pointwise mean of 100 PRCs over the randomly generated 100 b values. That is,

$$G(b_j) = (1/100) \sum_{i=1}^{100} S_{\theta_i}(b_j).$$

As the number of b values approaches infinity, G(b) in the figure will be a smooth, monotonically decreasing curve; moreover, if the number of $\theta$ values is also very large, then G(b) will be a mirror image of $T(\theta)$ about the vertical line $\theta = b = 0$. In this figure, $\theta_i$, $i = 1, 2, \dots, 100$, and $b_j$, $j = 1, 2, \dots, 100$, are randomly chosen from N(0,1), so their means are not exactly zero. It can be shown numerically that $T(\theta)$ and $G(b)$ reach 1/2 at points close to, but not exactly at, zero.

Let us denote the average of $T(\theta_i)$, $i = 1, \dots, N$, by $\bar{T}$,

$$\bar{T} = (1/N) \sum_{i=1}^{N} T(\theta_i),$$

and the average of $G(b_j)$, $j = 1, \dots, n$, by $\bar{G}$,

$$\bar{G} = (1/n) \sum_{j=1}^{n} G(b_j).$$

Then $\bar{T} = \bar{G}$, because

$$\bar{T} = (1/N) \sum_{i=1}^{N} T(\theta_i) = (1/nN) \sum_{i=1}^{N} \sum_{j=1}^{n} \frac{1}{1 + \exp[-D(\theta_i - b_j)]} = (1/n) \sum_{j=1}^{n} G(b_j) = \bar{G}.$$

Definition of Various Extended Caution Indices

Sato's (1975) S-curve may be viewed as a discrete test response curve.
The perfect S-curve divides 1's and 0's into two mutually exclusive areas, with 1's under the curve and 0's above it. Note, however, that direct correspondence in this way involves a reordering of the subjects from low to high rather than from high to low as typically presented by Sato and as was shown in Table 2. The TRC, $T(\theta)$, represents the average probability of correctly answering items on the test when a person's ability is equal to $\theta$.

The analogy between the S-curve and a TRC may be seen by considering an alternative N by n score matrix with real numbers based on IRT rather than binary scores. More specifically, let

$$PM_{ij} = P_{\hat{b}_j}(\hat{\theta}_i)$$

where $\hat{\theta}_i$ is an estimated ability parameter, $\theta$, for person i and $\hat{b}_j$ is an estimated item parameter for item j, under the condition that $\sum_{j=1}^{n} P_{\hat{b}_j}(\hat{\theta}_i) = \sum_{j=1}^{n} \cdots$ (the right-hand side of this condition is illegible in the source). Since

$$P_{b_j}(\theta_i) = S_{\theta_i}(b_j)$$

for fixed i and j, the cells of the probability matrix $(PM_{ij})$ are also equal to $S_{\hat{\theta}_i}(\hat{b}_j)$. If the rows and columns of this matrix are arranged in the manner of the S-P table and columnwise sums of the cell entries are obtained, the result is N times $G(\hat{b}_j)$, which corresponds to the P-curve. Similarly, n times $T(\hat{\theta}_i)$, corresponding to the S-curve, may be obtained by summing the cell entries for each row.

Selected rows and columns of a probability matrix $(PM_{ij})$ are illustrated in Table 3 for a 32 item test involving the subtraction of signed numbers that was administered to a sample of 127 students (Tatsuoka & Tatsuoka, 1981). Also shown in Table 3 are the values of the estimated item and ability parameters and the test and group response curves evaluated at those estimated parameter values (i.e., $T(\hat{\theta}_i)$ and $G(\hat{b}_j)$ respectively).

Table 3
The 127 x 32 Probability Matrix $(PM_{ij})$ for Signed-Number Subtraction Problems

(Entries are given as read from a degraded scan. The last digit of the ability estimate for person 88, the ability estimate for person 127, and the column order of the item parameter row are not legible.)

person i   item j:   1       2     ...    15      16    ...    31      32     T(θ_i)     θ_i
    1               .000    .001   ...   .040    .002   ...   .017    .082     .026    -1.114
    2               .000    .002   ...   .061    .004   ...   .035    .129     .038    -0.916
    3               .000    .002   ...   .061    .004   ...   .035    .129     .038    -0.722
   ...
   60               .549    .635   ...   .783    .871   ...   .969    .969     .809     0.700
   61               .567    .647   ...   .789    .878   ...   .970    .970     .816     0.710
   62               .568    .648   ...   .789    .878   ...   .970    .970     .817     0.714
   ...
   88               .860    .854   ...   .832    .962   ...   .994    .993     .968     1.22?
   ...
  127              1.000   1.000   ...  1.000   1.000   ...  1.000   1.000    1.000       ?
 G(b_j)             .327    .570   ...   .691    .708   ...   .837    .843
 b_j                legible values, column order unclear: -.467, -.044, .021, .289, .378

Before introducing the extended caution index, it is useful to compare the S and P curves for the data from which the estimates in Table 3 were obtained with their counterparts, i.e., n times $T(\hat{\theta}_i)$ and N times $G(\hat{b}_j)$. The two comparisons, S with $nT(\hat{\theta}_i)$ and P with $NG(\hat{b}_j)$, are provided in Figures 3 and 4 respectively. The tic marks on the horizontal axis in Figure 3 indicate the location of the $\hat{\theta}$'s for the 127 students in the study. The tic marks in Figure 4 show the values of $\hat{b}_j$ for the 32 items. The close correspondence between the two pairs of curves is apparent. The number of items and the limited range of values that $\hat{b}_j$ assumes for these data obviously limit the evaluation of the correspondence between the curves in Figure 4, however.

Insert Figures 3 & 4 about here

Figure 4. Comparison of P-curve (Converted to the Proportion Correct to Item) With the Group Response Curve for the Data in Table 3. (Vertical axis: P/N or G(b); horizontal axis: $\theta$ or b scale.)

Given the parallels between the S-P curves and the GRC and TRC, the extension of the caution index for use with the latter curves is relatively straightforward. There are, however, several natural ways in which the extension can be made. Possibly the most obvious extension is to simply replace the term $(M^s_{ij} - P_{i.})$ in the denominator of equation (1) by its counterpart from the $PM_{ij}$ matrix, i.e.,

$$[PM_{ij} - T(\hat{\theta}_i)] = [S_{\hat{\theta}_i}(\hat{b}_j) - T(\hat{\theta}_i)].$$

With the above substitution, our first extended caution index, $C1_i$, is defined

$$C1_i = 1 - \frac{\sum_{j=1}^{n} (y_{ij} - P_{i.})(y_{.j} - P_{..})}{\sum_{j=1}^{n} [S_{\hat{\theta}_i}(\hat{b}_j) - T(\hat{\theta}_i)](y_{.j} - P_{..})}$$

The numerator divided by n, i.e., the covariance of $(y_{ij})$ and $(y_{.j})$, can be expanded to the sum of $(1/n) \sum_j y_{ij} y_{.j}$ and $-P_{..} P_{i.}$. The value of the second term does not depend on a person's response to each item but only on his or her total score. As long as the total score is fixed, the anomaly of response patterns will not be detected by this value. This value varies between persons, so if two persons have the same achievement level $\theta_i$, then the judgment regarding the extent to which each response pattern deviates from the norm depends only on the first term of the numerator. Since the denominator is a normalizing constant for a fixed value $\theta_i$, it is unlikely that a particular aberrant response pattern produced by an individual whose achievement level is $\theta_i$ will affect the denominator. Thus, it is natural to expect that if both quantities are replaced by the inner products of the two row vectors $(y_{ij})$ and $(y_{.j})$ for $j = 1, 2, \dots, n$, the value of $C1_i$ will be affected by the degree of anomaly of individual response patterns. Moreover, calculation of inner products is easier than that of covariances. Let us define four other natural extensions of the caution index as follows.

Definition. Four alternative definitions of the extended caution index for person i are:

$$C2_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, y_{.j}}{\sum_{j=1}^{n} S_{\hat{\theta}_i}(\hat{b}_j)\, y_{.j}}$$

$$C3_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, S_{\hat{\theta}_i}(\hat{b}_j)}{\sum_{j=1}^{n} G(\hat{b}_j)\, S_{\hat{\theta}_i}(\hat{b}_j)}$$

$$C4_i = 1 - \frac{\sum_{j=1}^{n} y_{ij}\, G(\hat{b}_j)}{\sum_{j=1}^{n} S_{\hat{\theta}_i}(\hat{b}_j)\, G(\hat{b}_j)}$$

and $C5_i$, whose numerator is $\sum_{j=1}^{n} y_{ij}\, S_{\hat{\theta}_i}(\hat{b}_j)$, as in $C3_i$; the denominator of $C5_i$ is illegible in the source.

The denominators of the four indices are considered as normalizing constants, but the numerators fall into two categories.
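The indices defined so far can be made concrete with a small computation. The following is a minimal sketch under stated assumptions: the formulas for Sato's index and for C1 through C4 follow the reading of the definitions above (C5 is left out because its denominator cannot be read in the source), and the score matrix `Y`, the abilities `thetas` and the difficulties `bs` are hypothetical values chosen for illustration, not data or estimates from the report. It compares a Guttman-type response pattern with its reversal for two persons of equal ability and also checks that the grand mean of the probability matrix is the same whether rows or columns are averaged first.

```python
import math

D = 1.7


def prob(theta, b):
    """1PL probability; serves as both P_b(theta) and S_theta(b)."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def dot(u, v):
    """Inner product (sum of cross products) of two vectors."""
    return sum(p * q for p, q in zip(u, v))


def column_sums(Y):
    return [sum(row[j] for row in Y) for j in range(len(Y[0]))]


def sato_index(Y, i):
    """Sato's caution index for person i of a binary score matrix Y."""
    n = len(Y[0])
    col = column_sums(Y)
    centre = sum(col) / n  # any constant works: the row deviations sum to 0
    total = sum(Y[i])
    p_i = total / n
    # Perfect row: ones on the `total` most frequently solved items.
    easiest = sorted(range(n), key=lambda j: -col[j])[:total]
    perfect = [1 if j in easiest else 0 for j in range(n)]
    num = sum((Y[i][j] - p_i) * (col[j] - centre) for j in range(n))
    den = sum((perfect[j] - p_i) * (col[j] - centre) for j in range(n))
    return 1.0 - num / den


def extended_indices(Y, thetas, bs, i):
    """C1 to C4 for person i, as read from the definitions in the text."""
    N, n = len(Y), len(bs)
    col = column_sums(Y)
    centre = sum(col) / n
    y = Y[i]
    p_i = sum(y) / n
    S = [prob(thetas[i], b) for b in bs]                    # PRC of person i
    T = sum(S) / n                                          # T(theta_i)
    G = [sum(prob(t, b) for t in thetas) / N for b in bs]   # GRC
    c1_num = sum((y[j] - p_i) * (col[j] - centre) for j in range(n))
    c1_den = sum((S[j] - T) * (col[j] - centre) for j in range(n))
    return {
        "C1": 1.0 - c1_num / c1_den,
        "C2": 1.0 - dot(y, col) / dot(S, col),
        "C3": 1.0 - dot(y, S) / dot(G, S),
        "C4": 1.0 - dot(y, G) / dot(S, G),
    }


# Hypothetical data: 5 persons by 4 items, items ordered from easy to hard.
Y = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],  # person 1: Guttman-type pattern
    [0, 0, 1, 1],  # person 2: same total score, reversed pattern
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
thetas = [0.8, 0.0, 0.0, -0.9, 2.0]  # assumed ability values
bs = [-1.5, -0.5, 0.5, 1.5]          # assumed item difficulties

sato_guttman = sato_index(Y, 1)
sato_reversed = sato_index(Y, 2)
ext_guttman = extended_indices(Y, thetas, bs, 1)
ext_reversed = extended_indices(Y, thetas, bs, 2)

# Grand mean of the probability matrix: mean of T(theta_i) vs mean of G(b_j).
PM = [[prob(t, b) for b in bs] for t in thetas]
T_bar = sum(sum(row) / len(bs) for row in PM) / len(thetas)
G_bar = sum(sum(row[j] for row in PM) / len(PM) for j in range(len(bs))) / len(bs)

print(sato_guttman, sato_reversed)
print(ext_guttman)
print(ext_reversed)
```

In this toy example every index is larger for the reversed pattern than for the Guttman-type pattern. That is a property of these particular numbers and is not evidence about how the indices behave on real data.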
The indices in the first category, $C2_i$ and $C4_i$, give measures that are more group dependent, because they are the sums of cross products of the corresponding elements of the observed vector $(y_{ij})$ with the column-sum vector $(y_{.j})$ and with the Group Response Curve $G(b_j)$, respectively. They measure the relationship of an observed response pattern for a person i to a normed variable derived from the group the person belongs to. Thus these indices have a function similar to the Norm Conformity Index, NCI, defined in Tatsuoka & Tatsuoka (1980). The remaining indices, $C3_i$ and $C5_i$, are more individually oriented. That is, the quantities obtained from $C3_i$ and $C5_i$ reflect the extent to which person i's response pattern $(y_{ij})$ relates to a theoretically derived PRC at the fixed level of $\theta_i$. Thus, it can be said that the indices $C3_i$ and $C5_i$ are similar to the Individual Consistency Index (Tatsuoka & Tatsuoka, 1980).

These extended caution indices for person i are easily altered to those for item j:

$$C2_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, y_{i.}}{\sum_{i=1}^{N} P_{\hat{b}_j}(\hat{\theta}_i)\, y_{i.}}$$

$$C3_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, P_{\hat{b}_j}(\hat{\theta}_i)}{\sum_{i=1}^{N} P_{\hat{b}_j}(\hat{\theta}_i)\, T(\hat{\theta}_i)}$$

$$C4_j = 1 - \frac{\sum_{i=1}^{N} y_{ij}\, T(\hat{\theta}_i)}{\sum_{i=1}^{N} P_{\hat{b}_j}(\hat{\theta}_i)\, T(\hat{\theta}_i)}$$

and $C5_j$, which is obtained from $C5_i$ in the same manner (its expression is not legible in the source).

Similarly, the indices $C3_j$ and $C5_j$ are potentially useful for detecting anomalous response patterns in comparison with item j's IRC, while $C2_j$ and $C4_j$ are potentially useful for identifying items whose patterns deviate from that of the test, the TRC.

The Case of Two and Three Parameter Logistic Models

Problems in Person Response Curves and Group Response Curves

Person Response Curves for the one parameter logistic model are represented by smooth, monotonically decreasing functions defined over the difficulties of infinitely many items. But the PRC for the two parameter logistic model is no longer a smooth, monotonically decreasing curve.
Figure 5 provides the graph of the Person Response Curve for the ability level $\theta = 0$ as well as the Test Response Curve of the two parameter logistic model, where the ability measures $\theta_i$, $i = 1, 2, \dots, 100$, were randomly sampled from a normal (0,1) distribution, the difficulties $b_j$, $j = 1, 2, \dots, 100$, were also randomly sampled from a normal (0,1) distribution, and the item discrimination indices $a_j$, $j = 1, \dots, 100$, were drawn from the uniform distribution on the interval (0.8, 1). The Test Response Curve and Person Response Curves are given by

$$T(\theta) = (1/n) \sum_{j=1}^{n} P_{b_j}(\theta)$$

and

$$S_{\theta_0}(b) = \frac{1}{1 + \exp[-Da(\theta_0 - b)]}$$

for a fixed $\theta_0$ and variable b, where a is the discrimination of the item whose difficulty is b.

Insert Figure 5 about here

The dotted line (+++) in the figure is the Group Response Curve of a hundred subjects. Although each PRC oscillates locally, especially around the origin, the GRC (the mean curve of these PRCs) becomes fairly smooth and almost monotonically decreasing. Since $b_j$, $j = 1, \dots, 100$, are randomly selected from N(0,1), a larger oscillation of the PRC around the mean 0 is expected. But the GRC is expected to be smoother as the numbers of students and items increase.

Insert Figure 6 about here

Figure 6 is the graph of the TRC, the GRC, and the PRC of $\theta = 0$ for the three parameter logistic model. The parameters $\theta_i$, $b_j$ and $a_j$ were generated by the same method as for the two parameter model; then fifty c-values of 0.15 and fifty of 0.20 were randomly assigned to the 100 pairs of $a_j$ and $b_j$ to make the three parameter logistic model. It seems that the smoothness of the GRC for the three parameter logistic model is about the same, differing only, as expected, in terms of the lower asymptote. A larger number of subjects will be needed for the three parameter case in order to obtain a smoother GRC.

The definition of the extended caution indices may be applied more generally to the two and three parameter logistic models in essentially the same manner as it was developed for the one parameter model.
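The loss of monotonicity in the two parameter case can be reproduced with a handful of items. The following is a minimal sketch: the six items, their discriminations (chosen inside the interval (0.8, 1) mentioned in the text) and the five abilities are hypothetical and fixed by hand rather than sampled, so the run is deterministic. It counts the places where a curve that should fall from one item to the next harder item actually rises.

```python
import math

D = 1.7


def prob_2pl(theta, a, b):
    """Two parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def reversals(values):
    """Count adjacent pairs where a curve that should fall actually rises."""
    return sum(1 for u, v in zip(values, values[1:]) if v > u)


# Hypothetical items sorted by difficulty, with unequal discriminations.
bs = [-1.0, -0.9, -0.1, 0.1, 0.9, 1.0]
a_s = [0.82, 0.98, 0.90, 0.90, 0.98, 0.82]
# Hypothetical abilities, symmetric about zero.
thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]

theta0 = 0.0
# PRC of theta0 under the one parameter model (all discriminations equal to 1).
prc_1pl = [prob_2pl(theta0, 1.0, b) for b in bs]
# PRC of theta0 under the two parameter model.
prc_2pl = [prob_2pl(theta0, a, b) for a, b in zip(a_s, bs)]
# GRC: pointwise mean of the five two parameter PRCs.
grc = [
    sum(prob_2pl(t, a, b) for t in thetas) / len(thetas)
    for a, b in zip(a_s, bs)
]

rev_1pl = reversals(prc_1pl)
rev_2pl = reversals(prc_2pl)
rev_grc = reversals(grc)
print(rev_1pl, rev_2pl, rev_grc)
```

With these particular values the single PRC reverses direction at two adjacent item pairs while the averaged curve does not, which is in line with the smoothing described above. A different choice of items or abilities could leave some reversals in the GRC as well.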
Note that the arrangement of rows and columns according to the order of the proportions correct (p values) for n items and the total scores for N subjects is essential to determine S-P curves and the values of $M^p_{ij}$ and $M^s_{ij}$, $i = 1, 2, \dots, N$, $j = 1, \dots, n$. With our extended caution indices, the arrangement of rows and columns in monotonic order of the probability is no longer necessary.

Application of New Indices for the Detection of Anomalous Responses

There is evidence that student errors on certain types of arithmetic problems are frequently quite systematic (Brown and Burton, 1978; Birenbaum and Tatsuoka, 1980; Davis and McKnight, 1980). That is, students seem to consistently apply erroneous algorithms in attempting to answer a problem of a particular form. Sometimes erroneous or incomplete rules result in the right answer. For example, a student who consistently treats a multiplication sign as if it were an addition sign would get the right answer to the problem 2 x 2 = 4, but would get it for the wrong reason. A score of zero for using the wrong operation would be a better reflection of the student's ability to multiply than a score of one for answering "4" to the item.

Birenbaum and Tatsuoka (1980) have demonstrated that the customary zero-one scoring of incorrect and correct answers can give the appearance of higher dimensionality and cause difficulty in attempting to apply IRT when students consistently apply erroneous rules to the addition and subtraction of signed numbers. The difficulties result from the fact that several erroneous rules frequently yield the right answer for some problems.
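The multiplication example can be turned into a short enumeration. The following is a minimal sketch: the item pool of single-digit products is hypothetical, and the erroneous rule is the one named in the text (a multiplication sign treated as an addition sign). It lists the items that a consistent user of the rule would still answer correctly under zero-one scoring.

```python
def correct_rule(a, b):
    """The intended operation: multiplication."""
    return a * b


def erroneous_rule(a, b):
    """The bug from the text: the multiplication sign is read as addition."""
    return a + b


# Hypothetical item pool: all products of two single-digit numbers.
items = [(a, b) for a in range(10) for b in range(10)]

# Items on which the erroneous rule happens to give the right answer.
lucky_items = [
    (a, b) for a, b in items if erroneous_rule(a, b) == correct_rule(a, b)
]

# Zero-one score that a consistent user of the bug would earn on this pool.
score = len(lucky_items)
print(lucky_items, score, len(items))
```

On this pool the rule agrees with multiplication only for 0 x 0 and 2 x 2, so the score of 2 out of 100 consists entirely of right answers obtained for the wrong reason.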
Right answers for the wrong reasons not only cause problems in applying IRT, but more 29 25 importantly they can result in misleading scores and make it difficult to diagnose what a student is doing wrongs By painstaking work Tatsuoka and her colleagues (Birenbaum and Tatsuoka, 1980; Birenbaum, 1981) were able to identify several erroneous rules that were consistently applied by certain students. Birenbaum and Tatsuoka (1980) reanalyzed their data after converting ones to zeroes for items that students got right for the wrong reasons. That is, an item score was changed from one to zero if (1), a student was identified as consistently applying an erroneous rule and (2) application of that erroneous rule would lead to the correct answer for the particular item in qu£&tion. Analysis of the resulting modified data indicated that the data were more nearly unidimensional and there was good evidence that IRT was more applicable to the modified data than to the original data. Anomalous response patterns can sometimes be found by conducting an intuitive error analysis or by clinical interviews. Both/approaches require enormous effort. Brown and Burton (1978) and Tatsuoka et al. (1*980) have dev€T^xed cumputerized approaches to error analysis. But these methods' are expensive and were based on extensive work with highly specific item content. , Tatsuoka and Tatsuoka (1981) demonstrated an index, called the individualized consistency index (ICI) Which was shown to be useful in detecting a variety of erroneous rules of operation of signed—number addition and subtraction problems. Using the ICI to detect examinees who are apt to hafre a misconception saves considerable effort because only examinees so identified Jiave their item responses routed t^ the detailed error-diagnostic system. 
Application of the ICI is limited, however, because it requires repeated measures, i.e., several items based on an identical item form, within the test. Such repetition is not common on most tests. As will be seen below, the index similar to ICI, C3i, not only avoids the repeated-measure limitation but is apparently more effective for purposes of detecting anomalous response patterns resulting from the consistent application of an erroneous rule.

Tatsuoka & Tatsuoka (1981) showed a list of erroneous rules of operation ("bugs") detected by ICI. The 32 response patterns resulting from these bugs are classified in Group A. The rest of the 103 response patterns are classified into two groups according to the error-diagnostic system, SIGNBUG. Group B consists of 7 responses which are probably using one or two erroneous rules inconsistently; Group C consists of those responding adequately using the right rule of operation and/or showing no indication of systematic errors. The errors observed in Group C are apparently just random errors.

The estimated item and person ability parameters needed to compute the extended caution indices were obtained by the computer program GETAB (Robert Baillie, 1979), using Birenbaum & Tatsuoka's modified dataset. Distributions of the indices C2i and C3i are displayed in Figures 6 and 7, respectively. Only members of groups A and B (persons who consistently used an erroneous rule) and of group C (persons who made a substantial number of errors but whose errors were not the result of consistent use of an erroneous rule) are included in the distributions shown in Figures 6 and 7. In both figures, persons in groups A and B are depicted by shaded boxes and those in group C by unshaded boxes.

Insert Figures 6 and 7 about here

As can be seen in Figure 6, C2i does not provide any basis for distinguishing persons who are consistently using an erroneous rule from those who are not.
The two groups are distinguished almost perfectly, however, by the magnitude of C3i (see Figure 7). Indeed, there is almost no overlap between the two groups. All 39 members of Groups A and B have values of C3i of .05 or higher, whereas only two of the 88 members of group C have positive values of C3i and the rest of the members of group C have values of C3i as large as .05. Thus, C3i may be used to identify with a high degree of accuracy those persons who consistently use an erroneous rule.

As might be expected from a comparison of the coefficients, C4i works in a fashion quite similar to C2i, and C5i works much like C3i in terms of the ability of these indices to distinguish members of groups A, B and C. It is clear that C2i and C4i are not useful for detecting anomalous response patterns resulting from consistent application of an erroneous rule. These indices may be useful for other tasks for which NCI or Van der Flier's index (Harnisch & Linn, 1981) have been found to be useful. The third and fifth indices (C3i and C5i), however, are quite effective for purposes of detecting persons who make consistent errors.

Insert Table 4 about here

Table 4 shows a summary of t-statistics comparing the means on the four generalized caution indices and ICI in the two groups: A and B combined versus C by itself. The t-value for index 2 is not significant, but all others are significant. Index 1 is excluded from the analysis because the denominator of this index becomes infinity when all items are correctly answered by all examinees.

Figure 6. Histogram of Index C2i: The Shaded Area Represents the Members of Groups A and B, Using Some Erroneous Rules. N=127

Figure 7. Histogram of Index C3i: The Black Area Represents the Members of Group A and the Shaded Area Those of Group B, Using Some Erroneous Rules. The White Area Is for Those Using the Right Rule. N=127

Table 4
A Summary of t-Statistics Comparing the Means on the Four Generalized Caution Indices and ICI in the Two Groups

                   Group A or B    Group C        t value        p
N                  39              88
Index 2   Mean     [illegible]     -.0065         .689           .4980
          S.D.     .0929           .0306
Index 3   Mean     .5310           -.2688         -19.293        < .00005
          S.D.     .2444           .1300
Index 4   Mean     [illegible]     -.0045         [illegible]    [illegible]
          S.D.     .0650           [illegible]
Index 5   Mean     .5091           -.2643         -17.467        < .00005
          S.D.     .2615           .1350
ICI       Mean     .9223           .8144          -7.121         < .00005
          S.D.     .0645           .1058

Discussion

As was shown above, the caution index, which Sato developed based solely on a comparison of observed item responses to group responses, can be readily extended to theory-based estimates of person and group response probabilities. The caution index is a linear transformation of the covariance of a person's response pattern with one or another of the theoretical curves computed using item-response theory. Alternatively, the extended caution indices may be viewed as linear transformations of the distance between a person's response pattern and a theoretical curve (either the person response curve, as in the case of C3i and C5i, or the group response curve, as in the case of C4i).
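As a rough check on the group comparisons summarized in Table 4, a t-statistic can be recomputed from the tabled means, standard deviations, and group sizes. The sketch below assumes a separate-variance (unpooled) form of the statistic; the report does not state which form was used, so this is an assumption, and the function name is hypothetical. The sign of the result depends only on the order of subtraction.

```python
import math


def separate_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """t-statistic for two independent groups without pooling the variances.

    This form is an assumption made for illustration; the source does not
    say how its t-values were computed.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Summary statistics as read from Table 4 (Groups A and B combined vs. Group C).
t_index3 = separate_variance_t(0.5310, 0.2444, 39, -0.2688, 0.1300, 88)
t_index5 = separate_variance_t(0.5091, 0.2615, 39, -0.2643, 0.1350, 88)
t_ici = separate_variance_t(0.9223, 0.0645, 39, 0.8144, 0.1058, 88)

print(round(t_index3, 3), round(t_index5, 3), round(t_ici, 3))
```

Under this assumption the magnitudes for Indices 3 and 5 come out close to the tabled values, and the ICI value falls somewhat below the tabled one; small discrepancies are to be expected because the tabled means and standard deviations are rounded.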
The application of the extended caution indices that were introduced in this paper provided strong evidence that the indices that depend on the distance between a person's response pattern and their theoretical person response curve (i.e., C3i and C5i) are quite effective for purposes of identifying persons who consistently use an erroneous rule in answering signed-number arithmetic problems. This is a potentially important result that deserves further investigation with other data sets involving different types of achievement test data. If additional research yields similar results, these indices may have considerable instructional utility because instruction can be made much more specific once it is determined that a student is consistently making an error as the result of a particular misconception.

References

Birenbaum, M. Error analysis - It does make a difference. Doctoral dissertation, University of Illinois at Urbana-Champaign, 1981.

Birenbaum, M., & Tatsuoka, K. K. The use of information from wrong responses in measuring students' achievement (Research Report 80-1). Urbana, Ill.: University of Illinois, Computer-based Education Research Laboratory, 1980.

Brown, J. S., & Burton, R. R. Diagnostic models for procedural bugs in basic mathematics skills. Cognitive Science, 1978, 2, 155-192.

Davis, R. B., & McKnight, C. The influence of semantic content on algorithmic behavior. The Journal of Mathematical Behavior, 1980, 3, 39.

Harnisch, D. L., & Linn, R. L. Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. The Journal of Educational Measurement, 1981, in press.

Levine, M. V., & Rubin, D. B. Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 1979, 4, 269-290.

Lord, F. M. Applications of item response theory to practical testing problems. Hillsdale, N.J.: Erlbaum, 1980.

Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. Reading: Addison-Wesley, 1968.

Lumsden, J. Tests are perfectly reliable. British Journal of Mathematical and Statistical Psychology, 1978, 31, 19-26.

Sato, T. The construction and interpretation of S-P tables. Tokyo: Meiji Tosho, 1975 (in Japanese).

Tatsuoka, K. K., & Tatsuoka, M. M. Detection of aberrant response patterns and their effect on dimensionality (Research Report 80-4). Urbana, Ill.: University of Illinois, Computer-based Education Research Laboratory, 1980.

Tatsuoka, K. K., & Tatsuoka, M. M. Spotting erroneous rules of operation by the Individual Consistency Index (Research Report 81-4). Urbana, Ill.: University of Illinois, Computer-based Education Research Laboratory, 1981.

Tatsuoka, K. K., Birenbaum, M., Tatsuoka, M. M., & Baillie, R. A psychometric approach to error analysis on response patterns (Research Report 80-3). Urbana, Ill.: University of Illinois, Computer-based Education Research Laboratory, 1980.

Trabin, T. E., & Weiss, D. J. The person response curve: Fit of individuals to item characteristic curve models (Research Report 79-7). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, 1979.

Wright, B. D., & Stone, M. H. Best test design, Rasch measurement. Chicago: The University of Chicago, Mesa Press, 1979.