11. ANALYSIS OF ROTATED BOOKLETS AND PARTIAL CREDIT ITEMS
11.1. Overview
Use the PILOT2 sample data set to carry out this exercise.
The answer key for this test is in the Excel workbook ItemDataAllTests, in the sheet named PILOT2.
Continuing with the national assessment scenario introduced in previous chapters, this walkthrough follows the continued development of the test instrument with two new elements: piloting items using a balanced rotated booklet design, and using short-answer test items that have been scored with partial credit. Apart from the initial analysis specifications and the treatment of the last four items in PILOT2 (discussed below), the remainder of the workflow follows the same procedures that were explained in the previous walkthroughs. As with the previous chapter, this walkthrough will focus on the unique requirements of the balanced rotated booklet design (see page 12) and partial credit item analysis (see Anderson and Morgan, 2008).
11.2. Step 1: LOADING DATA
The analysis
begins with the “Response data analysis”
workflow. In the examinee response data interface, select the PILOT2 sample data file. These data represent
a three-booklet design, with 104 test items administered to 712 respondents. Not all of the test items are included in each booklet.
This type of situation might arise if the national assessment steering committee
requested that the test should be quite long to cover an extensive curriculum. To reduce student fatigue, each student would only be administered a booklet containing
a subset of the items. In the data shown in Figure 11.1, the third column contains a variable labelled "BOOKLETID". This variable contains values of 1, 2, or 3, denoting the booklet administered to each student.
In addition to the alphabetic item responses and the missing response code of 9 (not shown in Figure 11.1), you will notice a frequently occurring value of 7, which indicates that a specific item was not in the booklet assigned to a particular student. The data in Figure 11.1, for instance, indicate that the booklet for the student with PILOT2STDID=2 did not contain item MATHC2058. The code of 7 will be treated as omitted and will not affect a student's test results. Click "Next>>" to continue.
Figure 11.1 Student responses for PILOT2 data
On the item data loading interface,
load the PILOT2 item data from the ItemDataAllTests.xls file, as shown in Figure 11.2, in which the data pane has been scrolled to the bottom. The bottom four rows of the item data file contain data for partial credit items. The scoring key for these items contains the information required
to assign different
numeric scores to the various codes in the response file, depending on the quality of a student’s
response to individual partial credit items. In this
example, the scoring key reflects the manual scoring of each item; the code of 1 is scored as 1, 2 as 2, and 3 as 3. It is not necessary to provide a score for a code of 0. If a response code is not in the answer key and is not treated
as missing, it will be assigned a score of 0.
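The logic of this scoring key can be sketched with a few lines of code. The fragment below is not part of IATA; it is a minimal Python illustration, using hypothetical names, of how a key like the one for the partial credit items maps raw response codes to numeric scores: codes listed in the key receive their assigned scores, codes declared as missing or omitted are handled separately, and any other code receives 0.

    # Minimal illustration (not IATA code) of applying a partial credit scoring key.
    # The key lists only the codes that earn points; any other non-missing code scores 0.
    scoring_key = {"1": 1, "2": 2, "3": 3}   # hypothetical key for one partial credit item
    missing_code = "9"                       # missing response, scored per the missing treatment
    omit_code = "7"                          # item not present in the student's booklet

    def score_response(code):
        """Return a numeric score, or None when the response should not be scored."""
        if code == omit_code:
            return None                      # excluded from the student's results
        if code == missing_code:
            return 0                         # in this sketch, missing is scored as incorrect
        return scoring_key.get(code, 0)      # unlisted codes (such as "0") score 0

    print([score_response(c) for c in ["3", "1", "0", "9", "7"]])   # -> [3, 1, 0, 0, None]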
Figure 11.2 Item answer keys and metadata for PILOT2 data
Confirm that you have loaded the correct response
data and item data, then click the “Next>>” button to continue to the analysis specifications.
11.3. Step 2: ANALYSIS SPECIFICATIONS
Under Select ID (optional), enter PILOT2STDID. In the Specify missing treatment specifications in the previous examples, the box in the column headed “Incorrect” was checked only when there were missing data. Now, because of the rotated booklet design, you must also use the Specify missing treatment settings to declare an omit code, indicating that some responses are not to be scored. Check the value of ‘7’ in the “Do Not Score” column, as shown in Figure 11.3. When the value of ‘7’ is encountered in the response data for a particular student, IATA will ignore the item, so it will not affect the student’s results. Similarly, participants with response codes of ‘7’ for an item will not affect the estimation of statistics or parameters for that item.
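The practical effect of the “Do Not Score” setting can be illustrated with a short sketch. This is not IATA’s implementation; it is a hypothetical Python fragment showing the principle that responses coded ‘7’ are removed before an item’s facility (proportion correct) is computed, so students who never saw the item do not distort its statistics.

    # Hypothetical sketch (not IATA code): computing an item facility while ignoring
    # the "Do Not Score" code used for items that were absent from a student's booklet.
    responses = ["A", "7", "B", "A", "7", "9", "A"]   # made-up responses to a single item
    key, omit_code = "A", "7"

    scored = [r for r in responses if r != omit_code]  # rows coded '7' are excluded entirely
    # The missing code "9" stays in the denominator because it is scored as incorrect here.
    facility = sum(r == key for r in scored) / len(scored)
    print(round(facility, 2))   # 3 correct out of 5 scored responses -> 0.6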
Figure 11.3 Analysis specifications for rotated booklets with partial
credit items, PILOT2 data
When you have entered the analysis specifications, click the “Next>>” button to continue; the analysis will begin automatically. The computational time is affected more by the number of test items than by the number of students in the data, so this analysis will take longer to run than those in the previous walkthroughs.
11.4. Step 3: ITEM ANALYSIS RESULTS
When IATA finishes running the analysis, the results will be displayed as in Figure 11.4, where the table has been scrolled down to display MATHC2003, which has been assigned a warning symbol. The results indicate that this item has a weak relationship with proficiency; students tend to have a 0.61 probability of responding correctly, regardless of their level of proficiency. IATA suggests that this weakness may be the result of misleading requirements, which tend to be associated with poorly functioning distractors. In other words, when students do not understand the requirements of the item, or there is no unambiguously correct response, they tend to guess. The distractor analysis table displays the data underlying this summary, showing that option D is the only distractor eliciting the desired behaviour (the empty column with the header “7 Omit” is a reminder that the code of “7” was not allowed to influence the estimation). To prevent MATHC2003 from potentially reducing the accuracy of the analysis results, remove the checkmark beside the item name and click Analyze to update the results. IATA will notify you that it will recalibrate the partial credit items; click Yes to proceed.
Figure 11.4 Item analysis results
for PILOT2 data, item MATHC2003
At the bottom of the table on the left, you can see the different
rows that were created automatically by IATA for each of the item scores for each of the partial credit items. For rows that represent
scores of partial credit items (where the “Name” column contains the “@” symbol followed by an integer),
the statistics are estimated as if
each score were a single correct/incorrect item where the correct answer is any score value
greater than or equal to the selected
score. IATA will create an additional set of statistical results for each partial credit score that is provided in the scoring key for an item. For example,
with a partial credit item having non-zero scores of 1 and 2, the item facility for the score of 1 (“ItemName@1”) would describe the proportion of students with item scores greater than or equal to 1, and the item facility for the score of 2 (“ItemName@2”) would describe the proportion of students with item scores greater than or equal to 2 (in this case, only those with a score of 2).
In the distractor
analysis table in Figure 11.5, note that MATHSA001@2 uses codes of both 2 and 3 as keyed responses; for MATHSA001@1, codes 1, 2 and 3 would be used as keyed responses. The item facilities are always larger for lower scores of an item because they include all the students who were assigned higher scores. For example, the results presented in Figure 11.5 highlight the results for item MATHSA001 with the score of 2 selected. For this item, the score of 1 (MATHSA001@1) has a PVal of 0.90; the score of 2 (MATHSA001@2) has a PVal of 0.77; and the score of 3 (MATHSA001@3) has a PVal of 0.45.
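The pattern in these PVal values can be reproduced with a simple calculation. The sketch below is a hypothetical Python illustration (the scores are invented, so the figures differ from the IATA output): the facility for each score level of a partial credit item is the proportion of students whose item score is greater than or equal to that level, which is why the values necessarily decrease as the score level increases.

    # Hypothetical sketch (not IATA code): item facilities for each score level
    # of a partial credit item, computed as the proportion with score >= k.
    item_scores = [3, 2, 1, 2, 0, 3, 2, 1, 2, 3]   # made-up scores for one item

    n = len(item_scores)
    for k in (1, 2, 3):
        pval = sum(s >= k for s in item_scores) / n
        print(f"MATHSA001@{k}: PVal = {pval:.2f}")
    # The printed facilities are non-increasing in k (here 0.90, 0.70, 0.30),
    # mirroring the 0.90 >= 0.77 >= 0.45 pattern reported for MATHSA001.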
Figure 11.5 Partial credit
item response function, PILOT2 data,
item MATHSA001, score=2
Although the statistics describe each score separately, the IRFs for partial credit items reflect the fact that a student can only be assigned a single score value. The IRF for a partial credit item is represented as a set of item category characteristic curves (ICCCs), one for each item score. When a row corresponding to a specific score is selected, the graph illustrates the ICCC of the selected score in bold. An ICCC expresses the probability that a respondent at a given level of proficiency will be assigned a particular score, exclusive of all other scores. As shown in Figure 11.5, as proficiency increases, the probability of each intermediate score value first increases and then decreases, as students become more likely to achieve the higher score values. Additional description of IRFs for partial-credit items is found on page 156.
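For readers who want a mathematical anchor for these curves, one widely used formulation is the partial credit model. The manual does not state which model IATA implements internally, so the expression below should be read as a general illustration rather than as IATA's exact specification. For an item with maximum score m, discrimination a, and step parameters b_1, ..., b_m, the probability that a student with proficiency \theta is assigned score k can be written as

    P(X = k \mid \theta) = \frac{\exp\left( \sum_{j=1}^{k} a\,(\theta - b_j) \right)}
                                {\sum_{h=0}^{m} \exp\left( \sum_{j=1}^{h} a\,(\theta - b_j) \right)},
    \qquad k = 0, 1, \ldots, m,

where the empty sum for k = 0 is taken to equal zero. Under this formulation, the curve for the lowest score decreases with proficiency, the curve for the highest score increases, and the curves for intermediate scores rise and then fall, consistent with the bell-shaped ICCC shown for MATHSA001, score 2, in Figure 11.5.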
Although there are no simple rules for analyzing
an IRF for a partial credit item, a useful partial credit scoring scheme tends to have the property that each score value will
have the greatest
probability of being selected over a certain
range of proficiency.
For example, the first score value for item MATHSA001 has been assigned
a caution symbol. The ICCC indicates that the probability of being assigned a score of 1 is never higher than the probability of any other score, at any level of proficiency. The score value of 2 for MATHSA001, illustrated by the bell-shaped curve with a peak around -0.5, is the most likely score value for all students with below-average
proficiency. These results indicate that the score value of 1 does not provide useful information because it is statistically indistinguishable from the score value of 2. With most partial credit items, the highest category is typically the most useful, as human markers tend to be able to clearly identify
completely correct or completely incorrect responses, but are less able to accurately
distinguish between degrees of partial correctness.
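The diagnostic described above, identifying score values that are never the most likely response at any proficiency level, can also be sketched numerically. The fragment below is a hypothetical Python illustration using the partial credit model formulation given earlier (the parameters are invented and this is not an IATA function): it evaluates the ICCCs over a grid of proficiency values and reports any score that is never modal, which is the situation flagged for score 1 of MATHSA001.

    # Hypothetical sketch (not IATA code): find score values of a partial credit item
    # that are never the most likely ("modal") category at any proficiency level.
    import math

    def pcm_probs(theta, a, steps):
        """Category probabilities under a partial credit model (illustrative only)."""
        numerators = [math.exp(sum(a * (theta - b) for b in steps[:k]))
                      for k in range(len(steps) + 1)]
        total = sum(numerators)
        return [v / total for v in numerators]

    a, steps = 1.0, [-0.9, -1.6, 1.0]           # made-up parameters for a 0-3 item
    thetas = [x / 10 for x in range(-40, 41)]   # proficiency grid from -4 to +4

    modal = set()
    for t in thetas:
        probs = pcm_probs(t, a, steps)
        modal.add(probs.index(max(probs)))      # score with the highest ICCC at this theta
    never_modal = sorted(set(range(len(steps) + 1)) - modal)
    print("Scores never most likely:", never_modal)   # -> [1] with these parameters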
One of the main tasks of analysing
partial credit items in pilot testing
is to determine whether all of the score values are useful and how to improve the scoring process. For example, if a score category (such as 1) has a low probability of being assigned, there are two main possibilities: either 1) no or very few respondents produced responses that correspond to that score, or 2) the scorers are not able to identify which responses should be assigned that score. In the first case, the responses associated
with the score should be consolidated with an adjacent
score category (such as merging scores of 1 and 2 into one score category of either 1 or 2), and in the second case, the problem may
be remediated with more intensive
or standardized training of scorers. Thus,
while the results of dichotomous item analyses are mainly relevant
to item writers and test developers, the results of partial credit analysis should also be shared with the teams
responsible for scoring student test booklets.
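The first remedy, consolidating adjacent score categories, amounts to editing the scoring key so that two response codes map to the same score. The fragment below is a hypothetical Python illustration of the idea (it is not an IATA feature; in practice you would revise the scoring key in the item data file and the scoring instructions given to markers): the original key assigns distinct scores to codes 1, 2, and 3, while the collapsed key scores codes 1 and 2 identically and renumbers the top category so that the scores remain consecutive.

    # Hypothetical sketch (not IATA code): collapsing score categories 1 and 2
    # of a partial credit item into a single category.
    original_key = {"1": 1, "2": 2, "3": 3}    # key as piloted
    collapsed_key = {"1": 1, "2": 1, "3": 2}   # codes 1 and 2 now earn the same score

    def rescore(code, key):
        # Codes not in the key and not treated as missing are scored 0.
        return key.get(code, 0)

    codes = ["0", "1", "2", "3"]
    print([rescore(c, original_key) for c in codes])    # -> [0, 1, 2, 3]
    print([rescore(c, collapsed_key) for c in codes])   # -> [0, 1, 1, 2]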
After reviewing the results of the analysis,
you can proceed with replicating the analyses that were demonstrated in the previous chapters. The specification and interpretation of the remaining
tasks in the workflow are largely the same as presented
in the previous walkthroughs. There are two new considerations for this example: a) interpretation of dimensional analysis
with rotated booklets,
and b) the specification required for partial credit items in IATA’s item selection interface.
If the test administration uses multiple test booklets containing
different sets of items, IATA can only perform the dimensional analysis
if there are a sufficient number of shared items between booklets. If the rotation
is complex, the general principle
for successful design is to prevent test items or blocks of items from being completely
orphaned. For example,
if there are three blocks of test items (A, B, and C), and three test
forms (1, 2, and 3), then test form 1 should contain blocks A/B, test form 2 should contain blocks B/C, and test form 3 should contain blocks C/A. Because the blocks are fully rotated, there are no orphaned items. Conversely, if test form 1 contains blocks A/B, test form 2 contains blocks A/B, and test form 3 contains blocks B/C, block C is orphaned from block A, and therefore it is not possible to estimate the correlations between items in block A and items in block C. If you do have orphaned item blocks and you wish to perform a dimensional analysis, you must remove the orphaned items from the analysis or perform the dimensional analysis
correlations between items in block A and items in block B. If you do have orphaned item blocks and you wish to perform a dimensional analysis, you must remove the orphaned items from the analysis or perform the dimensional analysis
on each test form separately. However, once you have sufficient evidence to confirm that the collective set of items is assessing
a single dimension,
you may include all items back in the analysis
for IRT item parameter and score estimation.
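The “no orphaned blocks” principle can be checked mechanically before data collection. The sketch below is a hypothetical Python illustration (not an IATA function): it lists every pair of item blocks that never appears together in any test form, and these are exactly the pairs for which between-block correlations cannot be estimated.

    # Hypothetical sketch (not IATA code): detect pairs of item blocks that never
    # appear together in any test form ("orphaned" pairs).
    from itertools import combinations

    def orphaned_pairs(forms):
        blocks = sorted(set().union(*forms.values()))
        covered = {frozenset(pair) for form in forms.values()
                   for pair in combinations(sorted(form), 2)}
        return [pair for pair in combinations(blocks, 2)
                if frozenset(pair) not in covered]

    # Fully rotated design from the text: every pair of blocks shares a form -> no orphans.
    print(orphaned_pairs({1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "A"}}))   # -> []

    # Flawed design from the text: blocks A and C never appear together.
    print(orphaned_pairs({1: {"A", "B"}, 2: {"A", "B"}, 3: {"B", "C"}}))   # -> [('A', 'C')]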
To run the item selection properly with partial credit items, you must first manually include all of the partial credit item scores by checking the item scores for each item, and you must count each score value as a separate item when entering the total number of ‘items’. Thus,
if you wish to select 10 items, and one of those items is a partial credit item with two score categories, then you need to specify a selection of 11 ‘items’ and manually preselect the item score values for your desired partial credit item. Repeating
the analyses discussed
in previous chapters with the PILOT2 data is left as an independent exercise.
For reference, the item data results of this analysis
walkthrough are included in the ItemDataAllTests.xls file, in the worksheet named “ReferenceP2.”
11.5. SUMMARY
In this chapter you performed an analysis of a balanced
rotated booklet pilot test with partial credit items. You used the “Do Not Score” missing
data treatment to exclude
certain
response codes from analysis and used complex answer keys to specify multiple non-zero scores for test items. You examined IRFs of partial credit items, which are composed of multiple item category characteristic curves (ICCCs).
In the following
chapter, you will be introduced to the use of formal statistical linking to adjust the scales of one test administration to be comparable to the results of a second test administration where the tests share common items.