CHAPTER 13. SPECIALIZED METHODS IN IATA
13.1. Overview
The previous walkthroughs provided
examples of how to use all the task interfaces in IATA. The combination of the two workflows, “Response data analysis” and “Response data analysis with linking,” will satisfy almost all of the analysis situations required by a national
assessment. However, many of these interfaces can also be accessed in separate workflows
and may be used with different input data to perform specialized analyses.
This chapter presents
an overview of the three workflows that use IATA item data files as their main input data. Also, the principle of anchoring item parameters during estimation is introduced and used in an example with response
data to demonstrate how several functions that are available
as explicit tasks, such as DIF analysis
or linking, may be performed
simply by providing
input item parameters prior to the analysis of response data.
Because
the main IATA interfaces used to access these functions
have already been discussed in previous chapters,
this chapter will focus on the considerations specific to each topic, rather than provide complete walkthroughs.
13.2. LINKING ITEM DATA
The walkthrough in Chapter 12 discussed a scenario in which the linking was performed in the same workflow as the analysis
of item response data. However,
in practice, these two activities
may occur at different times. A common practice, used in international assessments such as the Programme for International Student Assessment (PISA), is for the initial response data analysis to be completed
and to have cycle-specific scores included in the data that are shared with analysts prior to establishing the link to previous assessments. The linkage is then performed
using the item parameter
estimates from the different cycles, and then the linking constants are provided to the analysts
and the data are updated with linked scores and parameters. To perform this sequence
of operations, you can use the typical “Response data analysis with linking” workflow to conduct the initial analysis,
item parameter estimation and scoring.
Then, the “Linking
item data” workflow may be used to conduct the linking using the saved item parameters.
After selecting the workflow from the main menu, you must perform the following steps:
1. If you wish to apply the results of the linkage to existing
scores, load a data file containing the “IRTscore” variable created by IATA during analysis of the new assessment data. This step is
optional. For example, if you were manually linking the CYCLE2 results to
the CYCLE1 scale, you would load the Scored table from the saved CYCLE2 analysis
results.
2. Load the newly estimated item
parameters in an item data file. The file must contain item names as well as the
IRT parameters. For example, if you were manually linking the CYCLE2 results to
the CYCLE1 scale, you could load the ReferenceC2 table from the ItemDataAllTests.xls
IATA sample data file.
3. Load the reference item data
file. The reference item data file must contain item names and IRT parameters, and
all names must be the same as the names used in the new item data file. For example,
if you were manually linking the CYCLE2 results to the CYCLE1 scale, you could load
the ReferenceC1 table from the ItemDataAllTests.xls IATA sample data file.
4. On the common-item linking interface (shown in Figure 12.4), calculate the initial linkage.
5. Determine whether each potentially problematic item is to be removed or excluded from the estimation of the link.
6. Recalculate the linking constants.
7. Apply the linking constants to the item parameters and test scores (if available).
8. Save the analysis results.
For this workflow, if the scored data are not available to be loaded into IATA, the linking constants may be applied manually in any spreadsheet or data analysis software, using the equations presented in Chapter 15 (page 197). Alternatively, the process of finalizing the selection of items to include in the estimation of the linking constants may be performed
without loading student data with IRT scores. Afterwards, the scored student data may be loaded with the finalized linking data to generate
the linked scores as part of a final implementation of the “Linking
item data” workflow.
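If the linking constants must be applied outside of IATA, the following sketch shows one way to do so in Python with pandas, assuming the standard linear transformation with slope A and intercept B. The file names, column names, and constant values are hypothetical; the exact equations IATA uses are the ones presented in Chapter 15.

```python
# A minimal sketch, not IATA itself, of applying common-item linking constants
# to item parameters and IRT scores. File names, column names ("a", "b",
# "IRTscore"), and the constants A and B are illustrative assumptions.
import pandas as pd

A, B = 1.03, -0.12  # hypothetical slope and intercept from the common-item linkage

items = pd.read_csv("items_cycle2.csv")    # newly estimated item parameters
scored = pd.read_csv("scored_cycle2.csv")  # IRT scores from the new analysis

# Standard linear transformation onto the reference scale
items["b"] = A * items["b"] + B            # difficulties shift and rescale
items["a"] = items["a"] / A                # discriminations rescale inversely
# (the c parameter, if present, is left unchanged)
scored["IRTscore"] = A * scored["IRTscore"] + B  # scores use the same constants

items.to_csv("items_cycle2_linked.csv", index=False)
scored.to_csv("scored_cycle2_linked.csv", index=False)
```

The same arithmetic can be reproduced in any spreadsheet by applying the two constants to the b parameters and IRT scores and dividing the a parameters by the slope.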
13.3. SELECTING OPTIMAL TEST ITEMS
The previous walkthroughs discussed the selection
of test items in the context of analysing response data. However, the selection of
specific test items for inclusion in a
new assessment might happen long after response data are analysed and the assessment cycle is over. This situation
may occur if there has been a long gap between
cycles of a national assessment, or if a follow-up to a previous national assessment has recently received
funding and there was no test linking strategy
developed during the analysis of the original assessment. In these
scenarios, the test developers will
need to select anchor items from the previous assessment in order to link the new assessment results to the
existing assessment results.
The item selection
interface is available from any workflow
that analyses response data. However,
because the item selection
task only requires
items with calibrated
parameters, response data are
not strictly necessary. To perform the item
selection
using only existing
item parameters, perform
the following steps:
1. Select the “Select optimal test items” workflow from the main menu.
2. Load the item data file that was saved from a previous data analysis (typically named “Items1” if it was produced automatically by IATA). These data must include item names and IRT parameters. The data should also include information on item Level and Content. The required data and the format in which they should be saved are described in detail in section 8.3.2 (page 16); an illustrative layout is sketched after this list.
3. Follow the item selection process using
the item selection interface described in previous chapters (see Figure 9.18).
4. Save the results. Note that on the final interface of the workflow, IATA will have both the original item data that you loaded (labelled “Items1”) and distinct item data tables for each item selection you have made. The item selections are assigned the prefix “CustomTest” followed by the unique name you specified for the item selection in IATA.
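For reference, the following sketch builds a small item data table of the kind this workflow expects: item names, IRT parameters, and the optional Level and Content assignments. All item names and values here are illustrative assumptions; the authoritative description of the format is in section 8.3.2.

```python
# A hypothetical item data table for the "Select optimal test items" workflow.
# Item names, parameter values, and level/content labels are illustrative only.
import pandas as pd

item_data = pd.DataFrame(
    {
        "Name":    ["MATH001", "MATH002", "MATH003"],
        "a":       [0.85, 1.20, 0.64],    # discrimination
        "b":       [-0.40, 0.15, 1.05],   # difficulty
        "c":       [0.20, 0.18, 0.00],    # pseudo-guessing (optional)
        "Level":   [1, 2, 3],             # provisional proficiency level
        "Content": ["Number", "Algebra", "Geometry"],
    }
)
item_data.to_csv("Items1.csv", index=False)  # save for later loading into IATA
```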
13.4. DEVELOPING AND ASSIGNING PERFORMANCE STANDARDS
Setting
performance standards is an important
step in making the results of the national assessment accessible to wider audiences of stakeholders. Chapter
10 discussed the task of setting performance standards as a relatively straightforward exercise. However, in practice, setting performance standards typically requires iterative work involving review of both item contents and statistical results. The process
may take input from multiple people over a long period of time as the feedback from different stakeholders is incorporated into the process. Rather than perform response data analysis
at each stage, simply referring to the previously-estimated results will save time and reduce the chances of introducing errors through incorrectly-specified analyses. Using the “Developing and assigning performance
standards” workflow in IATA allows you to use results from previous analyses to facilitate the standard setting process. Note that using this workflow
requires that you have previously completed an analysis
of item response data and have saved the results.
To properly inform the development of performance standards, both IRT scores and item parameters should be loaded into IATA. However, the scores are optional,
as only the item parameters are used as the basis of calculations. To complete this workflow, perform
the following steps:
1. Select the “Developing and assigning performance standards” workflow from the main menu.
2. Load IRT scored student data, if they
are available. These scores should have been produced using the items that will be used to estimate the performance standards. The IRT scores must be the original
scores produced by IATA (or other software) that are on the scale of the item parameters,
without any rescaling or standardization. Although the scores are optional, having
the distribution of proficiency to compare against the provisional proficiency levels
is desirable in the standard setting process. For example, if you are using results
produced automatically by IATA, the IRT scores are contained in the “Scored” results
table.
3. Load the item data file containing the
IRT parameters and pre-assigned Level for each item. Although the item level assignments
may (and likely will) be modified, each item that will be used should have a level
assigned to it. For example, if you are using results produced automatically by
IATA, the item data are contained in the “Items1” results table.
4. Perform the standard setting procedures
described in Chapter 12. Because this procedure may be iterative, it may be necessary
to repeat steps 1-4 several times, using various Bookmark data reviews (described
on page 88).
5. At this stage, if the cut-points for the
performance standards have been finalized, you should make sure that you have loaded
the scored student data file so that you can apply
the thresholds to the scored data.
6. With the panel of stakeholders responsible
for establishing performance standards, enter final cut-points into the Threshold
column on the bottom right of the screen (the table on the performance standards
interface).
7. If the scored data have been loaded, apply the thresholds to the scored data through the performance standards interface by clicking the “Add Levels” button; the level-assignment logic this step applies is sketched after this list.
8. Save the results. In general, you should save all tables that contain data that have been modified. These include the PLevels table, which has been updated with new thresholds; the Items1 table, which may have new level assignments for items; and the Scored table, which will have level assignments for students.
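The “Add Levels” step assigns each student to the highest performance level whose threshold his or her IRT score reaches. The sketch below illustrates that logic outside of IATA, using the CYCLE1 cut-points quoted in section 13.5; the column names and the coding of scores below Level 1 as 0 are assumptions made for illustration.

```python
# An illustrative sketch (not IATA's internal code) of assigning performance
# levels from final cut-points, in the spirit of the "Add Levels" button.
# Thresholds are the CYCLE1 cut-points quoted in section 13.5; the column
# names and the coding of scores below Level 1 as 0 are assumptions.
import pandas as pd

thresholds = {1: -0.85, 2: -0.25, 3: 0.35, 4: 0.95}  # Level: cut-point (IRT scale)

scored = pd.DataFrame({"IRTscore": [-1.20, -0.50, 0.10, 0.60, 1.40]})

def assign_level(theta: float) -> int:
    """Return the highest level whose threshold the score reaches, else 0."""
    level = 0
    for lvl, cut in sorted(thresholds.items()):
        if theta >= cut:
            level = lvl
    return level

scored["Level"] = scored["IRTscore"].apply(assign_level)
print(scored)  # each score is paired with its assigned level
```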
13.5. RESPONSE DATA ANALYSIS WITH ANCHORED ITEM PARAMETERS
In previous chapters,
the working paradigm for analysing
response data assumed that all IRT item parameters are unknown and must be estimated using the response data. Even
if item parameters were available from a previous cycle and were used to calculate linking constants, IATA first estimated all item parameters using the response data. The linking process involved first estimating new parameters and then linking the new parameters to the old parameters. However,
IATA also provides the facility to import fixed item parameters that will not be adjusted
by IATA during the analysis of response data. These are called anchored item
parameters.
Anchored item parameters are a, b and (optionally) c parameters that have been assigned values in an item data file for some test items prior to the analysis of a particular response data file, much like the anchor items used in formal linking. When the response data are analysed, the parameters of the non-anchored items are calculated, but the anchored item parameters remain fixed at their pre-specified values. Because of the iterative nature of item parameter estimation, any newly estimated results, such as the IRT parameters for non-anchored items and the IRT scores for students, will be expressed on the scale that is defined by the anchored item parameters.
This technique can be used with either the “Response data analysis” or “Response data analysis with linking”
workflows. The only difference between the use of anchored
items and the walkthroughs shown in previous chapters is that some items will
already have item parameters
assigned in the input item data file.
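As an illustration of such an input file, the sketch below builds a small item data table in which items retained from the previous cycle carry anchored a and b values while newly introduced items leave those fields blank, so that IATA will estimate them from the response data. Apart from MATHC2047, which appears in the walkthrough below, the item names, keys, and parameter values are hypothetical.

```python
# A hypothetical item data table with anchored parameters: items kept from a
# previous cycle have fixed a and b values, while new items leave them blank
# (NaN) so they will be freshly estimated. Names, keys, and values are
# illustrative only (MATHC2047 is the anchored item mentioned in the text).
import pandas as pd

item_data = pd.DataFrame(
    {
        "Name": ["MATHC2001", "MATHC2047", "MATHC3001"],
        "Key":  ["A", "C", "B"],            # answer keys for all items
        "a":    [0.92, 1.15, None],         # anchored for old items, blank for new
        "b":    [-0.31, 0.44, None],
        "c":    [0.18, 0.20, None],         # optional pseudo-guessing parameter
    }
)
print(item_data)  # blank (NaN) parameters mark the items IATA will calibrate
```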
Consider a scenario where the national
assessment committee has decided to use a test from a previous
national assessment cycle with only minor modifications. In this case, it is not necessary to perform the complete linking procedure that was described in Chapter 12. Instead, the item parameters estimated in the analysis of the previous national assessment data can be used for the majority of the items that were retained. For the few items that are newly introduced, IATA will automatically calibrate their IRT parameters such that they are expressed
on the same scale as the anchored
item parameters. The final IRT scores, which will be based on both the anchored item parameters as well as the newly calibrated items, will also be expressed
on the same scale as the anchored
item parameters.
To demonstrate this procedure, you will use the CYCLE3 sample data set. The item data for this test are in the Excel workbook ItemDataAllTests, in the sheet named CYCLE3. These data represent the third cycle of the national assessment program that you have been analysing throughout the previous chapters. For this cycle, the national assessment team decided to use the items from the CYCLE2 test after making minor modifications in content, replacing only eight of the multiple choice items and all of the short answer items. Rather than re-estimate new parameters and linking constants, the national assessment steering committee decided to use the item parameters from CYCLE2 to anchor the parameter estimates for the new items.
To perform analysis with anchored items, complete the
following steps:
1. Select the “Response data analysis” workflow
from the main menu.
2. Load the CYCLE3.xls response data (containing 2539
records and 61 variables) from the IATA sample data folder and click the
“Next>>” button.
3. Load the
ItemDataAllTests.xls file and select the CYCLE3 table as the item data. The table
contains 53 records and 7 variables. Note that, unlike the item data files used
in previous analyses, values for the a and b parameters are present for some, but
not all items, as shown in Figure 13.1. The item parameters that already have assigned
values are the anchored item parameters. Their values were produced during the
analysis of CYCLE2 data and are linked to the original scale
that was established for the CYCLE1 data. Several items with answer keys specified
but no item parameters will be assigned new item parameters that are estimated from
the response data. Because the anchored parameters were already linked to the CYCLE1
scale in the previous analysis, the newly estimated parameters in the current
analysis of CYCLE3 data will also be linked to the CYCLE1 scale.
Figure 13.1 Item data for CYCLE3 with anchored
item parameters
4. Set the identification variable to “CYCLE3STDID”, the weight variable to “CYCLE3weight”, and specify that the value “9” is to be treated as incorrect. Click the “Next>>” button to begin the analysis.
5. The results produced are shown in Figure 13.2. Note that all items now have parameters, but the anchor items maintain their original values (see Figure 13.1). Unlike test-level linking, it is now possible to see how well the anchored item parameters fit the current response data by comparing the theoretical IRF to the empirical IRF in each item’s results. For example, item MATHC2047 used anchored item parameters, so the IRF labelled “Theoretical” in Figure 13.2 is not based on the CYCLE3 data, whereas the IRF labelled “Empirical” is. If the fit between the theoretical and empirical IRFs is poor and the sample of new response data is large, then the item should not use anchored parameters. However, if the sample is small (e.g., fewer than 500), then lack of fit between the theoretical and empirical IRFs may simply be due to random error and can be ignored.
6. Because the specification of the remaining tasks in this workflow is identical to what you have performed in previous walkthroughs, reviewing or performing the remaining analyses is left as an independent exercise. Note that, because the results are automatically linked to the CYCLE1 scale, the mean and standard deviation of the IRT score may deviate from 0 and 1 (in this case, the mean = 0.02, s.d. = 1.04). There is one important consideration for scaling results that use anchored item parameters: because the IRT scores are anchored to the linked parameters from CYCLE2, you should use the “Rescale” option (IATA Page 7/10) to produce scale scores, specifying the mean and standard deviation equal to the values used when establishing the NAMscore scale in CYCLE1 (500 and 100, respectively); a worked example of this rescaling is sketched after this list. Similarly, the performance standard thresholds from CYCLE1 (Level 4: 0.95, Level 3: 0.35, Level 2: -0.25, Level 1: -0.85) may be applied directly, because the IRT score is expressed on the scale that was established with the CYCLE1 data. Press Enter when you are finished entering the last threshold value to ensure IATA updates the values properly.
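The rescaling described in the final step amounts to a simple linear transformation. The sketch below assumes that the NAMscore scale applies the CYCLE1 constants (mean 500, standard deviation 100) directly to the linked IRT scores, which is what preserves comparability across cycles; the column name and example score values are illustrative.

```python
# A hedged illustration of the scale-score transformation implied by the
# Rescale step: the CYCLE1 NAMscore constants (mean 500, s.d. 100) are applied
# directly to the linked IRT scores. Column name and values are assumptions.
import pandas as pd

MEAN, SD = 500, 100  # NAMscore scaling constants established in CYCLE1

scored = pd.DataFrame({"IRTscore": [0.02, -1.10, 0.87]})  # illustrative values
scored["NAMscore"] = MEAN + SD * scored["IRTscore"]
print(scored)  # yields NAMscore values of about 502, 390, and 587
```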
For reference, the item data results of this analysis
walkthrough are included in the ItemDataAllTests.xls file, in the worksheet named “ReferenceC3.”
Anchored parameters are particularly useful in situations where the sample size of a new assessment administration is small, the tests have substantial overlap, or response data are available from both tests. In the last scenario,
the response data should include all respondents from both cycles; the item data would include answer keys for all items, but only item parameters for the items that were used in the previous cycle would be assigned values.
13.6. SUMMARY
In this chapter,
you were introduced to several specialised uses of the IATA interfaces to which you had been introduced in previous chapters.
Each of the examples in this chapter made use of the results produced using previous analyses.
Although data analysis and reporting of national assessment results will use the scaled scores created using an arbitrary set of scaling constants (mean and standard deviation), any analyses in
IATA that involve previously-estimated IRT scores and parameters must use the raw IRT scores.
The analyses described in this chapter used results that had been produced by previous analyses. Although these input data should be preserved in the output data directories from the original analyses, if there were modifications to the data during the analysis, such as removing items from scoring or adjusting level or content assignments, it is a good idea to save all data tables for the purpose of documentation. You should write a short description of the changes made to any reference data file during the current analysis in a ReadMe text file (see Greaney and Kellaghan, 2012, Part III).