11. ANALYSIS OF ROTATED BOOKLETS AND PARTIAL CREDIT ITEMS
11.1. Overview
Use the PILOT2 sample data set to carry out this exercise.
The answer key for this test is in the Excel workbook ItemDataAllTests, in the sheet named PILOT2.
Continuing with the national assessment scenario introduced in previous chapters, this walkthrough follows the continued development of the test instrument with two new elements: piloting items using a balanced rotated booklet design, and using short-answer test items that have been scored with partial credit. Apart from the initial analysis specifications and the treatment of the last four items in PILOT2 (discussed below), the remainder of the workflow follows the same procedures that were explained in the previous walkthroughs. As with the previous chapter, this walkthrough will focus on the unique requirements of the balanced rotated booklet design (see page 12) and partial credit item analysis (see Anderson and Morgan, 2008).
11.2. Step 1: LOADING DATA
The analysis
begins with the “Response data analysis”
workflow. In the examinee response data interface, select the PILOT2 sample data file. These data represent
a three-booklet design, with 104 test items administered to 712 respondents. Not all of the test items are included in each booklet.
This type of situation might arise if the national assessment steering committee
requested that the test should be quite long to cover an extensive curriculum. To reduce student fatigue, each student would only be administered a booklet containing
a subset of the items. In the data shown in Figure 11.1, the third column contains a variable labelled "BOOKLETID". This variable contains values of 1, 2, or 3, denoting the booklet administered to each student.
In addition to the alphabetic item responses and the missing response code of 9 (not shown in Figure 11.1), you will notice a frequently occurring value of 7, which indicates that a specific item was not in the booklet assigned to a particular student. The data in Figure 11.1, for instance, indicate that the booklet for the student with PILOT2STDID=2 did not contain item MATHC2058. The code of 7 will be treated as omitted and will not affect a student's test results. Click "Next>>" to continue.
Figure 11.1 Student responses for PILOT2 data
On the item data loading interface,
load the PILOT2 item data from the ItemDataAllTests.xls file, as shown in Figure 11.2, in which the data pane has been scrolled to the bottom. The bottom four rows of the item data file contain data for partial credit items. The scoring key for these items contains the information required
to assign different
numeric scores to the various codes in the response file, depending on the quality of a student’s
response to individual partial credit items. In this
example, the scoring key reflects the manual scoring of each item; the code of 1 is scored as 1, 2 as 2, and 3 as 3. It is not necessary to provide a score for a code of 0. If a response code is not in the answer key and is not treated
as missing, it will be assigned a score of 0.
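The logic of this scoring key can be sketched with a few lines of code. The fragment below is not part of IATA; it is a minimal Python illustration, using hypothetical names, of how a key like the one for the partial credit items maps raw response codes to numeric scores: codes listed in the key receive their assigned scores, codes declared as missing or omitted are handled separately, and any other code receives 0.

    # Minimal illustration (not IATA code) of applying a partial credit scoring key.
    # The key lists only the codes that earn points; any other non-missing code scores 0.
    scoring_key = {"1": 1, "2": 2, "3": 3}   # hypothetical key for one partial credit item
    missing_code = "9"                       # missing response, scored per the missing treatment
    omit_code = "7"                          # item not present in the student's booklet

    def score_response(code):
        """Return a numeric score, or None when the response should not be scored."""
        if code == omit_code:
            return None                      # excluded from the student's results
        if code == missing_code:
            return 0                         # in this sketch, missing is scored as incorrect
        return scoring_key.get(code, 0)      # unlisted codes (such as "0") score 0

    print([score_response(c) for c in ["3", "1", "0", "9", "7"]])   # -> [3, 1, 0, 0, None]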
Figure 11.2 Item answer keys and metadata for PILOT2 data
Confirm that you have loaded the correct response
data and item data, then click the “Next>>” button to continue to the analysis specifications.
11.3. Step 2: ANALYSIS SPECIFICATIONS
Under Select ID (optional), enter PILOT2STDID. In the Specify missing treatment specifications in the previous examples, the box in the column headed “Incorrect” was checked only when there were missing data. Now, because of the rotated booklet design, you must also use the Specify missing treatment settings to declare an omit code, indicating that some responses are not to be scored. Check the value of ‘7’ in the “Do Not Score” column, as shown in Figure 11.3. When the value of ‘7’ is encountered in the response data for a particular student, IATA will ignore the item, so it will not affect the student’s results. Similarly, participants with response codes of ‘7’ for an item will not affect the estimation of statistics or parameters for that item.
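The practical effect of the “Do Not Score” setting can be illustrated with a short sketch. This is not IATA’s implementation; it is a hypothetical Python fragment showing the principle that responses coded ‘7’ are removed before an item’s facility (proportion correct) is computed, so students who never saw the item do not distort its statistics.

    # Hypothetical sketch (not IATA code): computing an item facility while ignoring
    # the "Do Not Score" code used for items that were absent from a student's booklet.
    responses = ["A", "7", "B", "A", "7", "9", "A"]   # made-up responses to a single item
    key, omit_code = "A", "7"

    scored = [r for r in responses if r != omit_code]  # rows coded '7' are excluded entirely
    # The missing code "9" stays in the denominator because it is scored as incorrect here.
    facility = sum(r == key for r in scored) / len(scored)
    print(round(facility, 2))   # 3 correct out of 5 scored responses -> 0.6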
Figure 11.3 Analysis specifications for rotated booklets with partial
credit items, PILOT2 data
When you have entered the analysis specifications, click the “Next>>” button to continue; the analysis will begin automatically. The computational time is affected more by the number of test items than by the number of students in the data, so this analysis will take longer to run than those in the previous walkthroughs.
11.4. Step 3: ITEM ANALYSIS RESULTS
When IATA finishes running the analysis, the results will be displayed as in Figure 11.4, where the table has been scrolled down to display MATHC2003, which has been assigned a warning symbol. The results indicate that this item has a weak relationship with proficiency; students tend to have a 0.61 probability of responding correctly, regardless of their level of proficiency. IATA suggests that this weakness may be the result of misleading requirements, which tend to be associated with poorly functioning distractors. In other words, when students do not understand the requirements of the item, or there is no unambiguously correct response, they tend to guess. The distractor analysis table displays the data underlying this summary, showing that option D is the only distractor eliciting the desired behaviour (the empty column with the header “7 Omit” is a reminder that the code of “7” was not allowed to influence the estimation). To prevent MATHC2003 from potentially reducing the accuracy of the analysis results, remove the checkmark beside the item name and click Analyze to update the results. IATA will notify you that it will recalibrate the partial credit items; click Yes to proceed.
Figure 11.4 Item analysis results
for PILOT2 data, item MATHC2003
At the bottom of the table on the left, you can see the different
rows that were created automatically by IATA for each of the item scores for each of the partial credit items. For rows that represent
scores of partial credit items (where the “Name” column contains the “@” symbol followed by an integer),
the statistics are estimated as if
each score were a single correct/incorrect item where the correct answer is any score value
greater than or equal to the selected
score. IATA will create an additional set of statistical results for each partial credit score that is provided in the scoring key for an item. For example,
with a partial credit item having non-zero scores of 1 and 2, the item facility for the score of 1 (“ItemName@1”) would describe the proportion of students with item scores greater than or equal to 1, and the item facility for the score of 2 (“ItemName@2”) would describe the proportion of students with item scores greater than or equal to 2 (in this case, only those with a score of 2).
In the distractor
analysis table in Figure 11.5, note that MATHSA001@2 uses codes of both 2 and 3 as keyed responses; for MATHSA001@1, codes 1, 2 and 3 would be used as keyed responses. The item facilities are always larger for lower scores of an item because they include all the students who were assigned higher scores. For example, the results presented in Figure 11.5 highlight the results for item MATHSA001 with the score of 2 selected. For this item, the score of 1 (MATHSA001@1) has a PVal of 0.90; the score of 2 (MATHSA001@2) has a PVal of 0.77; and the score of 3 (MATHSA001@3) has a PVal of 0.45.
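The pattern in these PVal values can be reproduced with a simple calculation. The sketch below is a hypothetical Python illustration (the scores are invented, so the figures differ from the IATA output): the facility for each score level of a partial credit item is the proportion of students whose item score is greater than or equal to that level, which is why the values necessarily decrease as the score level increases.

    # Hypothetical sketch (not IATA code): item facilities for each score level
    # of a partial credit item, computed as the proportion with score >= k.
    item_scores = [3, 2, 1, 2, 0, 3, 2, 1, 2, 3]   # made-up scores for one item

    n = len(item_scores)
    for k in (1, 2, 3):
        pval = sum(s >= k for s in item_scores) / n
        print(f"MATHSA001@{k}: PVal = {pval:.2f}")
    # The printed facilities are non-increasing in k (here 0.90, 0.70, 0.30),
    # mirroring the 0.90 >= 0.77 >= 0.45 pattern reported for MATHSA001.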
Figure 11.5 Partial credit
item response function, PILOT2 data,
item MATHSA001, score=2
Although the statistics describe each score separately, the IRFs for partial credit items reflect the fact that a student can only be assigned a single score value. The IRF for a partial credit item is represented as a set of item category characteristic curves (ICCCs), one for each item score. When a row corresponding to a specific score is selected, the graph illustrates the ICCC of the selected score in bold. An ICCC expresses the probability that a respondent at a given level of proficiency will be assigned a particular score, exclusive of all other scores. As shown in Figure 11.5, as proficiency increases, the probability of each intermediate score value first increases and then decreases, as students become more likely to achieve the higher score values. Additional description of IRFs for partial-credit items is found on page 156.
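For readers who want a mathematical anchor for these curves, one widely used formulation is the partial credit model. The manual does not state which model IATA implements internally, so the expression below should be read as a general illustration rather than as IATA's exact specification. For an item with maximum score m, discrimination a, and step parameters b_1, ..., b_m, the probability that a student with proficiency \theta is assigned score k can be written as

    P(X = k \mid \theta) = \frac{\exp\left( \sum_{j=1}^{k} a\,(\theta - b_j) \right)}
                                {\sum_{h=0}^{m} \exp\left( \sum_{j=1}^{h} a\,(\theta - b_j) \right)},
    \qquad k = 0, 1, \ldots, m,

where the empty sum for k = 0 is taken to equal zero. Under this formulation, the curve for the lowest score decreases with proficiency, the curve for the highest score increases, and the curves for intermediate scores rise and then fall, consistent with the bell-shaped ICCC shown for MATHSA001, score 2, in Figure 11.5.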
Although there are no simple rules for analyzing
an IRF for a partial credit item, a useful partial credit scoring scheme tends to have the property that each score value will
have the greatest
probability of being selected over a certain
range of proficiency.
For example, the first score value for item MATHSA001 has been assigned
a caution symbol. The ICCC indicates that the probability of being assigned a score of 1 is never higher than the probability of any other score, at any level of proficiency. The score value of 2 for MATHSA001, illustrated by the bell-shaped curve with a peak around -0.5, is the most likely score value for all students with below-average
proficiency. These results indicate that the score value of 1 does not provide useful information because it is statistically indistinguishable from the score value of 2. With most partial credit items, the highest category is typically the most useful, as human markers tend to be able to clearly identify
completely correct or completely incorrect responses, but are less able to accurately
distinguish between degrees of partial correctness.
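The diagnostic described above, identifying score values that are never the most likely response at any proficiency level, can also be sketched numerically. The fragment below is a hypothetical Python illustration using the partial credit model formulation given earlier (the parameters are invented and this is not an IATA function): it evaluates the ICCCs over a grid of proficiency values and reports any score that is never modal, which is the situation flagged for score 1 of MATHSA001.

    # Hypothetical sketch (not IATA code): find score values of a partial credit item
    # that are never the most likely ("modal") category at any proficiency level.
    import math

    def pcm_probs(theta, a, steps):
        """Category probabilities under a partial credit model (illustrative only)."""
        numerators = [math.exp(sum(a * (theta - b) for b in steps[:k]))
                      for k in range(len(steps) + 1)]
        total = sum(numerators)
        return [v / total for v in numerators]

    a, steps = 1.0, [-0.9, -1.6, 1.0]           # made-up parameters for a 0-3 item
    thetas = [x / 10 for x in range(-40, 41)]   # proficiency grid from -4 to +4

    modal = set()
    for t in thetas:
        probs = pcm_probs(t, a, steps)
        modal.add(probs.index(max(probs)))      # score with the highest ICCC at this theta
    never_modal = sorted(set(range(len(steps) + 1)) - modal)
    print("Scores never most likely:", never_modal)   # -> [1] with these parameters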
One of the main tasks of analysing
partial credit items in pilot testing
is to determine whether all of the score values are useful and how to improve the scoring process. For example, if a score category (such as 1) has a low probability of being assigned, there are two main possibilities: either 1) no or very few respondents produced responses that correspond to that score, or 2) the scorers are not able to identify which responses should be assigned that score. In the first case, the responses associated
with the score should be consolidated with an adjacent
score category (such as merging scores of 1 and 2 into one score category of either 1 or 2), and in the second case, the problem may
be remediated with more intensive
or standardized training of scorers. Thus,
while the results of dichotomous item analyses are mainly relevant
to item writers and test developers, the results of partial credit analysis should also be shared with the teams
responsible for scoring student test booklets.
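The first remedy, consolidating adjacent score categories, amounts to editing the scoring key so that two response codes map to the same score. The fragment below is a hypothetical Python illustration of the idea (it is not an IATA feature; in practice you would revise the scoring key in the item data file and the scoring instructions given to markers): the original key assigns distinct scores to codes 1, 2, and 3, while the collapsed key scores codes 1 and 2 identically and renumbers the top category so that the scores remain consecutive.

    # Hypothetical sketch (not IATA code): collapsing score categories 1 and 2
    # of a partial credit item into a single category.
    original_key = {"1": 1, "2": 2, "3": 3}    # key as piloted
    collapsed_key = {"1": 1, "2": 1, "3": 2}   # codes 1 and 2 now earn the same score

    def rescore(code, key):
        # Codes not in the key and not treated as missing are scored 0.
        return key.get(code, 0)

    codes = ["0", "1", "2", "3"]
    print([rescore(c, original_key) for c in codes])    # -> [0, 1, 2, 3]
    print([rescore(c, collapsed_key) for c in codes])   # -> [0, 1, 1, 2]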
After reviewing the results of the analysis,
you can proceed with replicating the analyses that were demonstrated in the previous chapters. The specification and interpretation of the remaining
tasks in the workflow are largely the same as presented
in the previous walkthroughs. There are two new considerations for this example: a) interpretation of dimensional analysis
with rotated booklets,
and b) the specification required for partial credit items in IATA’s item selection interface.
If the test administration uses multiple test booklets containing
different sets of items, IATA can only perform the dimensional analysis
if there are a sufficient number of shared items between booklets. If the rotation
is complex, the general principle
for successful design is to prevent test items or blocks of items from being completely
orphaned. For example,
if there are three blocks of test items (A, B, and C), and three test
forms (1, 2, and 3), then test form 1 should contain blocks A/B, test form 2 should contain blocks B/C, and test form 3 should contain blocks C/A. Because the blocks are fully rotated, there are no orphaned items. Conversely, if test form 1 contains blocks A/B, test form 2 contains blocks A/B, and test form 3 contains blocks B/C, block C is orphaned from block A, and therefore it is not possible to estimate the correlations between items in block A and items in block C. If you do have orphaned item blocks and you wish to perform a dimensional analysis, you must remove the orphaned items from the analysis or perform the dimensional analysis
correlations between items in block A and items in block B. If you do have orphaned item blocks and you wish to perform a dimensional analysis, you must remove the orphaned items from the analysis or perform the dimensional analysis
on each test form separately. However, once you have sufficient evidence to confirm that the collective set of items is assessing
a single dimension,
you may include all items back in the analysis
for IRT item parameter and score estimation.
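The “no orphaned blocks” principle can be checked mechanically before data collection. The sketch below is a hypothetical Python illustration (not an IATA function): it lists every pair of item blocks that never appears together in any test form, and these are exactly the pairs for which between-block correlations cannot be estimated.

    # Hypothetical sketch (not IATA code): detect pairs of item blocks that never
    # appear together in any test form ("orphaned" pairs).
    from itertools import combinations

    def orphaned_pairs(forms):
        blocks = sorted(set().union(*forms.values()))
        covered = {frozenset(pair) for form in forms.values()
                   for pair in combinations(sorted(form), 2)}
        return [pair for pair in combinations(blocks, 2)
                if frozenset(pair) not in covered]

    # Fully rotated design from the text: every pair of blocks shares a form -> no orphans.
    print(orphaned_pairs({1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "A"}}))   # -> []

    # Flawed design from the text: blocks A and C never appear together.
    print(orphaned_pairs({1: {"A", "B"}, 2: {"A", "B"}, 3: {"B", "C"}}))   # -> [('A', 'C')]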
To run the item selection properly with partial credit items, you must first manually include all of the partial credit item scores by checking the item scores for each item, and you must count each score value as a separate item when entering the total number of ‘items’. Thus,
if you wish to select 10 items, and one of those items is a partial credit item with two score categories, then you need to specify a selection of 11 ‘items’ and manually preselect the item score values for your desired partial credit item. Repeating
the analyses discussed
in previous chapters with the PILOT2 data is left as an independent exercise.
For reference, the item data results of this analysis
walkthrough are included in the ItemDataAllTests.xls file, in the worksheet named “ReferenceP2.”
11.5. SUMMARY
In this chapter you performed an analysis of a balanced
rotated booklet pilot test with partial credit items. You used the “Do Not Score” missing
data treatment to exclude
certain
response codes from analysis and used complex answer keys to specify multiple non-zero scores for test items. You examined IRFs of partial credit items, which are composed of multiple item category characteristic curves (ICCCs).
In the following
chapter, you will be introduced to the use of formal statistical linking to adjust the scales of one test administration to be comparable to the results of a second test administration where the tests share common items.