CHAPTER 13. SPECIALIZED METHODS IN IATA
13.1. Overview
The previous walkthroughs provided
examples of how to use all the task interfaces in IATA. The combination of the two workflows, “Response data analysis” and “Response data analysis with linking,” will satisfy almost all of the analysis situations required by a national
assessment. However, many of these interfaces can also be accessed in separate workflows
and may be used with different input data to perform specialized analyses.
This chapter presents
an overview of the three workflows that use IATA item data files as their main input data. Also, the principle of anchoring item parameters during estimation is introduced and used in an example with response
data to demonstrate how several functions that are available
as explicit tasks, such as DIF analysis
or linking, may be performed
simply by providing
input item parameters prior to the analysis of response data.
Because
the main IATA interfaces used to access these functions
have already been discussed in previous chapters,
this chapter will focus on the considerations specific to each topic, rather than provide complete walkthroughs.
13.2. LINKING ITEM DATA
The walkthrough in Chapter 12 discussed a scenario in which the linking was performed in the same workflow as the analysis
of item response data. However,
in practice, these two activities
may occur at different times. A common practice, used in international assessments such as the Programme for International Student Assessment (PISA), is for the initial response data analysis to be completed
and to have cycle-specific scores included in the data that are shared with analysts prior to establishing the link to previous assessments. The linkage is then performed
using the item parameter
estimates from the different cycles, and then the linking constants are provided to the analysts
and the data are updated with linked scores and parameters. To perform this sequence
of operations, you can use the typical “Response data analysis with linking” workflow to conduct the initial analysis,
item parameter estimation and scoring.
Then, the “Linking
item data” workflow may be used to conduct the linking using the saved item parameters.
After selecting the workflow from the main menu, you must perform the following steps:
1. If you wish to apply the results of the linkage to existing
scores, load a data file containing the “IRTscore” variable created by IATA during analysis of the new assessment data. This step is
optional. For example, if you were manually linking the CYCLE2 results to
the CYCLE1 scale, you would load the Scored table from the saved CYCLE2 analysis
results.
2. Load the newly estimated item
parameters in an item data file. The file must contain item names as well as the
IRT parameters. For example, if you were manually linking the CYCLE2 results to
the CYCLE1 scale, you could load the ReferenceC2 table from the ItemDataAllTests.xls
IATA sample data file.
3. Load the reference item data
file. The reference item data file must contain item names and IRT parameters, and
all names must be the same as the names used in the new item data file. For example,
if you were manually linking the CYCLE2 results to the CYCLE1 scale, you could load
the ReferenceC1 table from the ItemDataAllTests.xls IATA sample data file.
4. On the common-item linking interface (shown in Figure 12.4), calculate the initial linkage.
5. Determine whether each potentially problematic item is to be removed or excluded from the estimation of the link.
6. Recalculate the linking constants.
7. Apply the linking constants to the item parameters and test scores (if available).
8. Save the analysis results.
For this workflow, if the scored data are not available to be loaded into IATA, the linking constants may be applied manually in any spreadsheet or data analysis software, using the equations presented in Chapter 15 (page 197). Alternatively, the process of finalizing the selection of items to include in the estimation of the linking constants may be performed
without loading student data with IRT scores. Afterwards, the scored student data may be loaded with the finalized linking data to generate
the linked scores as part of a final implementation of the “Linking
item data” workflow.
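If the linking constants must be applied outside of IATA, the following sketch shows one way to do so in Python with pandas, assuming the standard linear transformation with slope A and intercept B. The file names, column names, and constant values are hypothetical; the exact equations IATA uses are the ones presented in Chapter 15.

```python
# A minimal sketch, not IATA itself, of applying common-item linking constants
# to item parameters and IRT scores. File names, column names ("a", "b",
# "IRTscore"), and the constants A and B are illustrative assumptions.
import pandas as pd

A, B = 1.03, -0.12  # hypothetical slope and intercept from the common-item linkage

items = pd.read_csv("items_cycle2.csv")    # newly estimated item parameters
scored = pd.read_csv("scored_cycle2.csv")  # IRT scores from the new analysis

# Standard linear transformation onto the reference scale
items["b"] = A * items["b"] + B            # difficulties shift and rescale
items["a"] = items["a"] / A                # discriminations rescale inversely
# (the c parameter, if present, is left unchanged)
scored["IRTscore"] = A * scored["IRTscore"] + B  # scores use the same constants

items.to_csv("items_cycle2_linked.csv", index=False)
scored.to_csv("scored_cycle2_linked.csv", index=False)
```

The same arithmetic can be reproduced in any spreadsheet by applying the two constants to the b parameters and IRT scores and dividing the a parameters by the slope.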
13.3. SELECTING OPTIMAL TEST ITEMS
The previous walkthroughs discussed the selection
of test items in the context of analysing response data. However, the selection of
specific test items for inclusion in a
new assessment might happen long after response data are analysed and the assessment cycle is over. This situation
may occur if there has been a long gap between
cycles of a national assessment, or if a follow-up to a previous national assessment has recently received
funding and there was no test linking strategy
developed during the analysis of the original assessment. In these
scenarios, the test developers will
need to select anchor items from the previous assessment in order to link the new assessment results to the
existing assessment results.
The item selection
interface is available from any workflow
that analyses response data. However,
because the item selection
task only requires
items with calibrated
parameters, response data are
not strictly necessary. To perform the item
selection
using only existing
item parameters, perform
the following steps:
1. Select the “Select optimal test items” workflow from the main menu.
2. Load the item data file that was saved from a previous data analysis (typically named “Items1” if it was produced automatically by IATA). These data must include item names and IRT parameters. The data should also include information on item Level and Content. The required data and the format in which they should be saved are described in detail in section 8.3.2 (page 16); an illustrative layout is sketched after this list.
3. Follow the item selection process using
the item selection interface described in previous chapters (see Figure 9.18).
4. Save the results. Note that on the final interface of the workflow, IATA will have both the original item data that you loaded (labelled “Items1”) and distinct item data tables for each item selection you have made. The item selections are assigned the prefix “CustomTest” followed by the unique name you specified for the item selection in IATA.
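For reference, the following sketch builds a small item data table of the kind this workflow expects: item names, IRT parameters, and the optional Level and Content assignments. All item names and values here are illustrative assumptions; the authoritative description of the format is in section 8.3.2.

```python
# A hypothetical item data table for the "Select optimal test items" workflow.
# Item names, parameter values, and level/content labels are illustrative only.
import pandas as pd

item_data = pd.DataFrame(
    {
        "Name":    ["MATH001", "MATH002", "MATH003"],
        "a":       [0.85, 1.20, 0.64],    # discrimination
        "b":       [-0.40, 0.15, 1.05],   # difficulty
        "c":       [0.20, 0.18, 0.00],    # pseudo-guessing (optional)
        "Level":   [1, 2, 3],             # provisional proficiency level
        "Content": ["Number", "Algebra", "Geometry"],
    }
)
item_data.to_csv("Items1.csv", index=False)  # save for later loading into IATA
```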
13.4. DEVELOPING AND ASSIGNING PERFORMANCE STANDARDS
Setting
performance standards is an important
step in making the results of the national assessment accessible to wider audiences of stakeholders. Chapter
10 discussed the task of setting performance standards as a relatively straightforward exercise. However, in practice, setting performance standards typically requires iterative work involving review of both item contents and statistical results. The process
may take input from multiple people over a long period of time as the feedback from different stakeholders is incorporated into the process. Rather than perform response data analysis
at each stage, simply referring to the previously-estimated results will save time and reduce the chances of introducing errors through incorrectly-specified analyses. Using the “Developing and assigning performance
standards” workflow in IATA allows you to use results from previous analyses to facilitate the standard setting process. Note that using this workflow
requires that you have previously completed an analysis
of item response data and have saved the results.
To properly inform the development of performance standards, both IRT scores and item parameters should be loaded into IATA. However, the scores are optional,
as only the item parameters are used as the basis of calculations. To complete this workflow, perform
the following steps:
1. Select the “Developing and assigning performance standards” workflow from the main menu.
2. Load IRT scored student data, if they
are available. These scores should have been produced using the items that will be used to estimate the performance standards. The IRT scores must be the original
scores produced by IATA (or other software) that are on the scale of the item parameters,
without any rescaling or standardization. Although the scores are optional, having
the distribution of proficiency to compare against the provisional proficiency levels
is desirable in the standard setting process. For example, if you are using results
produced automatically by IATA, the IRT scores are contained in the “Scored” results
table.
3. Load the item data file containing the
IRT parameters and pre-assigned Level for each item. Although the item level assignments
may (and likely will) be modified, each item that will be used should have a level
assigned to it. For example, if you are using results produced automatically by
IATA, the item data are contained in the “Items1” results table.
4. Perform the standard setting procedures
described in Chapter 12. Because this procedure may be iterative, it may be necessary
to repeat steps 1-4 several times, using various Bookmark data reviews (described
on page 88).
5. At this stage, if the cut-points for the
performance standards have been finalized, you should make sure that you have loaded
the scored student data file so that you can apply
the thresholds to the scored data.
6. With the panel of stakeholders responsible
for establishing performance standards, enter final cut-points into the Threshold
column on the bottom right of the screen (the table on the performance standards
interface).
7. If the scored data have been loaded, apply the thresholds to the scored data through the performance standards interface by clicking the “Add Levels” button; the level-assignment logic this step applies is sketched after this list.
8. Save the results. In general, you should save all tables that contain data that have been modified. These include the PLevels table, which has been updated with new thresholds; the Items1 table, which may have new level assignments for items; and the Scored table, which will have level assignments for students.
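The “Add Levels” step assigns each student to the highest performance level whose threshold his or her IRT score reaches. The sketch below illustrates that logic outside of IATA, using the CYCLE1 cut-points quoted in section 13.5; the column names and the coding of scores below Level 1 as 0 are assumptions made for illustration.

```python
# An illustrative sketch (not IATA's internal code) of assigning performance
# levels from final cut-points, in the spirit of the "Add Levels" button.
# Thresholds are the CYCLE1 cut-points quoted in section 13.5; the column
# names and the coding of scores below Level 1 as 0 are assumptions.
import pandas as pd

thresholds = {1: -0.85, 2: -0.25, 3: 0.35, 4: 0.95}  # Level: cut-point (IRT scale)

scored = pd.DataFrame({"IRTscore": [-1.20, -0.50, 0.10, 0.60, 1.40]})

def assign_level(theta: float) -> int:
    """Return the highest level whose threshold the score reaches, else 0."""
    level = 0
    for lvl, cut in sorted(thresholds.items()):
        if theta >= cut:
            level = lvl
    return level

scored["Level"] = scored["IRTscore"].apply(assign_level)
print(scored)  # each score is paired with its assigned level
```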
13.5. RESPONSE DATA ANALYSIS WITH ANCHORED ITEM PARAMETERS
In previous chapters,
the working paradigm for analysing
response data assumed that all IRT item parameters are unknown and must be estimated using the response data. Even
if item parameters were available from a previous cycle and were used to calculate linking constants, IATA first estimated all item parameters using the response data. The linking process involved first estimating new parameters and then linking the new parameters to the old parameters. However,
IATA also provides the facility to import fixed item parameters that will not be adjusted
by IATA during the analysis of response data. These are called anchored item
parameters.
Anchored item parameters are a, b and (optionally) c parameters that have been assigned values in an item data file for some test items prior to the analysis of a particular response data file, much like the anchor items used in formal linking. When the response data are analysed, the parameters of the non-anchored items are calculated, but the anchored item parameters remain fixed at their pre-specified values. Because of the iterative nature of item parameter estimation, any newly estimated results, such as the IRT parameters for non-anchored items and the IRT scores for students, will be expressed on the scale that is defined by the anchored item parameters.
This technique can be used with either the “Response data analysis” or “Response data analysis with linking”
workflows. The only difference between the use of anchored
items and the walkthroughs shown in previous chapters is that some items will
already have item parameters
assigned in the input item data file.
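As an illustration of such an input file, the sketch below builds a small item data table in which items retained from the previous cycle carry anchored a and b values while newly introduced items leave those fields blank, so that IATA will estimate them from the response data. Apart from MATHC2047, which appears in the walkthrough below, the item names, keys, and parameter values are hypothetical.

```python
# A hypothetical item data table with anchored parameters: items kept from a
# previous cycle have fixed a and b values, while new items leave them blank
# (NaN) so they will be freshly estimated. Names, keys, and values are
# illustrative only (MATHC2047 is the anchored item mentioned in the text).
import pandas as pd

item_data = pd.DataFrame(
    {
        "Name": ["MATHC2001", "MATHC2047", "MATHC3001"],
        "Key":  ["A", "C", "B"],            # answer keys for all items
        "a":    [0.92, 1.15, None],         # anchored for old items, blank for new
        "b":    [-0.31, 0.44, None],
        "c":    [0.18, 0.20, None],         # optional pseudo-guessing parameter
    }
)
print(item_data)  # blank (NaN) parameters mark the items IATA will calibrate
```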
Consider a scenario where the national
assessment committee has decided to use a test from a previous
national assessment cycle with only minor modifications. In this case, it is not necessary to perform the complete linking procedure that was described in Chapter 12. Instead, the item parameters estimated in the analysis of the previous national assessment data can be used for the majority of the items that were retained. For the few items that are newly introduced, IATA will automatically calibrate their IRT parameters such that they are expressed
on the same scale as the anchored
item parameters. The final IRT scores, which will be based on both the anchored item parameters as well as the newly calibrated items, will also be expressed
on the same scale as the anchored
item parameters.
To demonstrate this procedure, you will use the CYCLE3 sample data set. The item data for this test are in the Excel workbook ItemDataAllTests, in the sheet named CYCLE3. These data represent the third cycle of the national assessment program that you have been analysing throughout the previous chapters. For this cycle, the national assessment team decided to use the items from the CYCLE2 test after making minor modifications in content, replacing only eight of the multiple choice items and all of the short answer items. Rather than re-estimate new parameters and linking constants, the national assessment steering committee decided to use the item parameters from CYCLE2 to anchor the parameter estimates for the new items.
To perform analysis with anchored items, complete the
following steps:
1. Select the “Response data analysis” workflow
from the main menu.
2. Load the CYCLE3.xls response data (containing 2539
records and 61 variables) from the IATA sample data folder and click the
“Next>>” button.
3. Load the
ItemDataAllTests.xls file and select the CYCLE3 table as the item data. The table
contains 53 records and 7 variables. Note that, unlike the item data files used
in previous analyses, values for the a and b parameters are present for some, but
not all items, as shown in Figure 13.1. The item parameters that already have assigned
values are the anchored item parameters. Their values were produced during the
analysis of CYCLE2 data and are linked to the original scale
that was established for the CYCLE1 data. Several items with answer keys specified
but no item parameters will be assigned new item parameters that are estimated from
the response data. Because the anchored parameters were already linked to the CYCLE1
scale in the previous analysis, the newly estimated parameters in the current
analysis of CYCLE3 data will also be linked to the CYCLE1 scale.
Figure 13.1 Item data for CYCLE3 with anchored
item parameters
4. Set the identification variable to “CYCLE3STDID”, the weight variable to “CYCLE3weight”, and specify that the value “9” is to be treated as incorrect. Click the “Next>>” button to begin the analysis.
5. The results produced are shown in Figure 13.2. Note that all items now have parameters, but the anchor items maintain their original values (see Figure 13.1). Unlike test-level linking, it is now possible to see how well the anchored item parameters fit the current response data by comparing the theoretical IRF to the empirical IRF in each item’s results. For example, item MATHC2047 used anchored item parameters, so the IRF labelled “Theoretical” in Figure 13.2 is not based on the CYCLE3 data, whereas the IRF labelled “Empirical” is. If the fit between the theoretical and empirical IRFs is poor and the sample of new response data is large, then the item should not use anchored parameters. However, if the sample is small (e.g., fewer than 500), then lack of fit between the theoretical and empirical IRFs may simply be due to random error and can be ignored.
6. Because the specification of the remaining tasks in this workflow is identical to what you have performed in previous walkthroughs, reviewing or performing the remaining analyses is left as an independent exercise. Note that, because the results are automatically linked to the CYCLE1 scale, the mean and standard deviation of the IRT score may deviate from 0 and 1 (in this case, the mean = 0.02, s.d. = 1.04). There is one important consideration for scaling results that use anchored item parameters: because the IRT scores are anchored to the linked parameters from CYCLE2, you should use the “Rescale” option (IATA Page 7/10) to produce scale scores, specifying the mean and standard deviation equal to the values used when establishing the NAMscore scale in CYCLE1 (500 and 100, respectively); a worked example of this rescaling is sketched after this list. Similarly, the performance standard thresholds from CYCLE1 (Level 4: 0.95, Level 3: 0.35, Level 2: -0.25, Level 1: -0.85) may be applied directly, because the IRT score is expressed on the scale that was established with the CYCLE1 data. Press Enter when you are finished entering the last threshold value to ensure IATA updates the values properly.
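The rescaling described in the final step amounts to a simple linear transformation. The sketch below assumes that the NAMscore scale applies the CYCLE1 constants (mean 500, standard deviation 100) directly to the linked IRT scores, which is what preserves comparability across cycles; the column name and example score values are illustrative.

```python
# A hedged illustration of the scale-score transformation implied by the
# Rescale step: the CYCLE1 NAMscore constants (mean 500, s.d. 100) are applied
# directly to the linked IRT scores. Column name and values are assumptions.
import pandas as pd

MEAN, SD = 500, 100  # NAMscore scaling constants established in CYCLE1

scored = pd.DataFrame({"IRTscore": [0.02, -1.10, 0.87]})  # illustrative values
scored["NAMscore"] = MEAN + SD * scored["IRTscore"]
print(scored)  # yields NAMscore values of about 502, 390, and 587
```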
For reference, the item data results of this analysis
walkthrough are included in the ItemDataAllTests.xls file, in the worksheet named “ReferenceC3.”
Anchored parameters are particularly useful in situations where the sample size of a new assessment administration is small, the tests have substantial overlap, or response data are available from both tests. In the last scenario,
the response data should include all respondents from both cycles; the item data would include answer keys for all items, but only item parameters for the items that were used in the previous cycle would be assigned values.
13.6. SUMMARY
In this chapter,
you were introduced to several specialised uses of the IATA interfaces to which you had been introduced in previous chapters.
Each of the examples in this chapter made use of the results produced using previous analyses.
Although data analysis and reporting of national assessment results will use the scaled scores created using an arbitrary set of scaling constants (mean and standard deviation), any analyses in
IATA that involve previously-estimated IRT scores and parameters must use the raw IRT scores.
The analyses described in this chapter used results that had been produced by previous analyses. Although these input data should be preserved in the output data directories from the original analyses, if there were modifications to the data during the analysis, such as removing items from scoring or adjusting level or content assignments, it is a good idea to save all data tables for the purpose of documentation. You should write a short description of the changes made to any reference data file during the current analysis in a ReadMe text file (see Greaney and Kellaghan, 2012, Part III).