8. Introduction to IATA
8.1. OVERVIEW
The item and test analysis
software (IATA)
accompanying Part II of this book is intended to help national
assessment practitioners, researchers, and others analyze test item data as well as build effective assessment
tools. IATA was designed to offer a user-friendly
way to address many statistical considerations related to national assessments. It specifically targets those interested in analyzing test data, creating a new test from an item bank, or comparing
or scaling test items across different samples. It is likely to be useful for individuals involved in educational assessment who have experience doing item analysis
as well as others with some statistical competence but who are less familiar with the specific
statistical processes that are a feature
of national assessments.
The instructions in this book assume that you are familiar with basic computing
functions on a Windows PC, such as starting programs,
browsing directories, and opening files. The following
chapters also assume that you have installed
IATA correctly and can access the main menu in IATA. If you have not installed IATA yet
or cannot start the program,
please refer to the installation guide for IATA on
the accompanying CD. Readers of this section
of the book should also have some understanding of statistical concepts such as probability and properties of statistical
distributions.
The overarching goal of IATA is
to increase the usability and interpretability of test scores. The primary means of accomplishing this goal is to reduce the error of measurement. Error of measurement is the underlying
concept that unifies all test creation and test analysis.
A test is intended to measure a specific domain,
such as mathematics skill or reading proficiency. However, no test is perfectly
accurate. All test scores have some uncertainty; if a student were to take equivalent
versions of a test with different items, it is unlikely that his or her score would be the same across each
test. Error of measurement describes
the degree to which a student’s score on a specific test differs from his or her ‘true score,’ the score that he or she would have achieved in the absence of uncertainty. An important goal of test development from a statistical perspective is to reduce the error of measurement. To reduce error of measurement, IATA identifies problematic items that contribute to error so that they may be revised, replaced,
or removed altogether.
The second means of accomplishing this goal is to establish meaningful and consistent scales on which to report test scores.
Throughout this section of the book, the terms statistic
and parameter are used to describe characteristics of test items. A statistic
is the result of a calculation using a particular sample of students
and items. Because the value of a statistic depends on the sample, it cannot generalize to
different samples or populations that are not equivalent to the sample from which it was estimated. Consequently, test scores that are calculated
as statistics may not be directly comparable
between different tests or groups of students.
In contrast, a parameter expresses
the statistical properties
of a student or test item as functions of sample characteristics. Accordingly, parameters may be used to characterize students and items in generalizable ways that are not dependent
on particular samples. When IATA estimates parameters for students
or test items, these parameters
may be used or compared
across different tests, which allows for greater efficiency and information than if each test in a national assessment program were simply interpreted by itself.
8.2. CHAPTER PREVIEW
The chapters in Part II begin with the point in the development of a national
assessment where test items have been created and bundled into a test booklet and response data have been collected. Chapter
8 provides a review of information on data processing and formatting presented
in previous books in this series, as well as an introduction to the IATA interface, including a description of the main menu, the different interactive screen elements in IATA, and the results it produces. Chapters 9 through 13 provide detailed
walkthroughs of the main analysis workflows in IATA, which will familiarize you with each of the functions
used in IATA. These workflows are designed
to mimic the phases of development and implementation of a national assessment program,
from pilot testing to full-scale testing and follow-up
testing in subsequent
assessment cycles. Taken together, these walkthroughs will familiarize the user with how to perform practically all psychometric analyses
required for a national assessment system.
Chapter
14 provides a summary of the different
workflows and presents
examples of how each workflow
might be used in different
real-life scenarios. These chapters introduce many concepts that may be new or only partially
familiar to those with experience in educational assessment. Although some background
statistical information is presented in each section
to help interpret the results generated by the software, detailed explanations of the theory or mathematical principles behind these concepts are presented in Chapter 15.
8.3. Assessment Data
There
are two main types of data produced
by and used in the analysis
of assessments: response data and item data. Response data are produced
by the individual learners as they answer questions
on a test. A test is a specific collection
of questions that evaluate a common domain of proficiency or knowledge. Individual questions on a test are referred to as items. Test items can be multiple-choice,
short-answer, open-ended
questions, or performance tasks. Item data are produced
by analysing or reviewing items and recording
their statistical or cognitive properties. Each row in a response
data file describes
the characteristics of a learner or test-taker, whereas each row in an item data file describes the characteristics of a test item.
IATA can read and write a variety
of common data table formats (for example, Access, Excel, SPSS, delimited
text files) if they are formatted correctly.
If the data are not formatted with the correct structure, IATA will not be able to carry out the analyses. Database-compatible formats such as Access or SPSS already take care of most data formatting issues. However, if the data are stored in a less restrictive format, such as an Excel or text file, the following
conventions should be followed:
· The names of variables
should appear in the cell at the top of each column (known as the header). Each column with data must have a column header. The name of each variable must be distinct from the names of other variables in a data file. The names of variables must begin with a letter and should not contain
any spaces.
· The data range must not contain any empty rows or columns.
The data range is the rectangle
of cells that contain data, beginning with the variable
name of the first variable
to appear in the data file and ending with the value of
the last
variable in the bottom-most row.
·
The data range must begin at the first cell in the spreadsheet or file. In Excel, this
cell is labelled “A1.” In text files, this is the top-left cursor position
in the text file.
The two examples
in Figure 8.1 illustrate
incorrect and correct
data formats. In the incorrect data format, on the left, there is a blank row above the data rectangle and a blank column to the left of it. There are also blank rows and columns within the rectangle and a column containing data without a header. In the correct format,
on the right, all of the data are gathered into a single rectangle
at the top-left of the spreadsheet with no blank rows or columns.
Figure 8.1 Incorrect and correct data formatting
examples
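To make these conventions concrete, the short sketch below (written in Python, which is not required to use IATA) builds a small response data table and saves it as a comma-delimited file, one of the formats IATA can read. The file name, variable names, and response codes are purely illustrative.

import pandas as pd

# A small response data table laid out according to the conventions above:
# variable names in the first row, no blank rows or columns, and the data
# rectangle beginning at the first cell.
responses = pd.DataFrame(
    {
        "StudentID": ["S001", "S002", "S003"],
        "Gender":    ["F", "M", "F"],
        "MATH01":    ["A", "C", "B"],   # coded responses, not scores
        "MATH02":    ["D", "D", "9"],   # 9 = missing-response code
    }
)

# Writing without the row index keeps the header in the first row and the
# data range starting at the top-left cell.
responses.to_csv("example_responses.csv", index=False)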
8.3.1. Response data
Response data include the response of each student to each test item. The test results imported in the response data file must allow for automated
scoring; this means that the item response
data should include the codes representing how students actually
responded to items. For example, if the response data are from a multiple-choice test, the data should record codes representing the options endorsed
by each student (e.g., A,
B, C, D, etc.). IATA will transform the coded responses into scores using the answer
key you either enter manually
or provide as an answer key file.
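The scoring step can be pictured with the following simplified sketch; it is not IATA's actual code, and the item names, response codes, and answer key are hypothetical.

# Hypothetical answer key mapping each item name to its correct response code.
answer_key = {"MATH01": "A", "MATH02": "D"}

def score_response(item: str, response: str) -> int:
    """Return 1 if the coded response matches the keyed answer, otherwise 0."""
    return int(response == answer_key[item])

print(score_response("MATH01", "A"))  # 1 (correct)
print(score_response("MATH01", "C"))  # 0 (incorrect)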
Other
information may be stored in a response
data file that may be useful for analyzing test results. Examples of this include demographic information on variables
such as age, grade, gender, school, and region. Other useful information may be collected from questionnaires (such as student and teacher questionnaires) or administrative records.
If a stratified sample of students was used, the sample weighting for each student should also be included
on this file.
A unique identifier should be provided for each student, although
IATA will automatically produce
unique identifiers based on the record order if a unique identifier is not specified. However, if the results will be linked to other data sources, such as follow-up surveys or administrative records, it is a good idea to use a previously-defined
identifier such as student name or number to facilitate
future linkages between data sets.
All responses should be assigned
codes. For multiple-choice
items, this procedure
is straightforward, because each response option is already coded as correct or incorrect. For open-ended items, a scoring rubric is required to help score item responses using a common coding framework. Open-ended items may be scored as either correct or incorrect, or with partial credit given to different responses.
A partial credit test item has more than one score that is greater than 0. Answers to open-ended questions must have been
previously coded during the preparation of the response data. Volumes
2 and 3 of this series describe
coding procedures for test items (Anderson
and Morgan 2008; Greaney and Kellaghan 2012). In order to score the response data, for most analyses,
an answer key must be loaded into IATA. An answer key is a list of response codes that indicates the correct answer(s)
for each test item. The answer key may be imported as a data file or entered manually.
If the analysis is using anchored item parameters, these anchored item parameters must be included
in the answer key file; they may not be entered manually (see Item Data,
page 16).
8.3.1.1. Treatment of Missing and Omitted Data
Missing
data occur when a student does not provide a response to a test item. When this happens,
rather than leaving the data field blank, a missing value code is used to record why the response is missing. There are two types of missing
responses: missing and omitted.
Missing codes are assigned to variables when students could have responded
to an item but chose not to do so, leaving the answer blank. Such missing data will be scored as
incorrect. In contrast,
omitted data codes are used when students were not administered an item, as when a national assessment uses a rotated
booklet design.
Codes that apply to student responses that were unreadable
or where the student has answered inappropriately, such as selecting
two multiple-choice options,
are a form of missing
response for the purpose of item distractor analysis. Depending
on the circumstances of your test administration or data processing, you must decide whether these codes will be processed as missing or omitted data. Generally, if these data errors are the result of student error, the codes should be treated
as missing and will be scored as incorrect. However, if the errors are the result of limitations in the data processing, such as imprecision in score card scanning that was not human-verified,
then the codes should be treated as omitted.
Another
use of omitted data codes occurs when a balanced
rotated booklet design is used. Balanced rotating
booklet designs involve giving different
randomly equivalent samples of items to different students,
so that not all students
answer the same test items
(see Anderson and Morgan, 2008). These designs permit extensive subject matter coverage
while limiting the amount of student test taking time. In a rotating booklet design, omitted codes must be assigned to all items for a student except those contained in the test booklet presented
to the student. Omitted codes are not normally assigned to items in situations where all students
are required to answer all of the
items.
Common conventions use specific values for the different types of non-response data. See Greaney and Kellaghan (2012) for information on response codes. Common values used are:
- 9 for missing responses, where students have not responded at all to an item;
- 8 for unscorable responses, which typically occur in multiple-choice tests when students provide multiple responses and in open-ended items when student responses are illegible; and
- 7 for omitted or not-presented items, which might be used in a rotated booklet design.
Regardless of the specific codes used, you must specify how IATA is to treat each non-response code, as either missing or omitted (see the sketch below).
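The following sketch is purely illustrative: it shows one way to think about the treatment mapping, using the conventional codes listed above. The specification itself is made in the IATA interface, not in code.

# Illustrative mapping from non-response codes to their treatment.
NON_RESPONSE_TREATMENT = {
    "9": "missing",  # no response given: scored as incorrect
    "8": "missing",  # unscorable (e.g., multiple options selected), if due to student error
    "7": "omitted",  # item not presented (rotated booklet design): excluded from scoring
}

def score_with_codes(response: str, correct_code: str):
    """Return 1/0 for scorable responses, or None for omitted (not-presented) items."""
    treatment = NON_RESPONSE_TREATMENT.get(response)
    if treatment == "omitted":
        return None   # excluded from the student's percent score
    if treatment == "missing":
        return 0      # treated as incorrect
    return int(response == correct_code)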
8.3.1.2. Item naming
It is important to assign a unique name to each item in a national
assessment program (see Anderson and Morgan 2008; Greaney
and Kellaghan 2012). All statistical
analyses performed on a test item should be linked clearly to the name or label of an item. If an item is repeated in several cycles of a national
assessment, it should retain the
same name in the data files for each cycle. For example, a mathematics item first used in 2009 might have the name M003, to indicate that it was the third item to appear in the 2009 test. If this same item is used in a 2010 test, it should still receive the
name M003, regardless of where it appears on a test. Naming items by position in a test may cause confusion when items are reused. For this reason, it is more useful to assign permanent names to test items when they are first developed,
rather than when they are first used in an assessment.
Using consistent names also facilitates linking the results of different
tests. When IATA estimates statistical linkages between tests, it matches items in the linking procedure using item names. If an item name refers to different items in the two tests being linked,
the results of the linkage will not be accurate.
Although it is possible to rename items to facilitate
the linking process,
it is simpler and less likely to introduce errors if unique item names are maintained from the start.
8.3.1.3. Variables reserved by IATA
During
the analysis of response data, IATA will calculate
a variety of different working variables. The names of these working or output variables
are restricted and should not be used as names of test items or questionnaire variables. These variables,
which IATA adds to the scored test data file, are listed in Table 8.1.
Score Name | Description
XWeight | The design weight of the case that is used during analysis (if not specified, the value is equal to 1 for all students).
Missing | This variable describes the number of items that are omitted for a student.
PercentScore | The percent score is the number of items a student answered correctly, expressed as a percentage of the total number of items administered to the student (excluding omitted response data).
PercentError | The error of measurement for the percent score (this estimate is specific to each student; its value depends on the percent score and the number of items to which a student responded).
Percentile | The percentile rank is a number between 0 and 100 that describes, for each student, the percentage of other students with lower percent scores.
RawZScore | The RawZScore is the percent score, transformed to have a mean of 0 and a standard deviation of 1 within the sample.
ZScore | This score is the normal-distribution equivalent of the percentile score. It is also referred to as the ‘bell-curve score.’ Whereas the distribution of the RawZScore depends on the distribution of the percent correct score, the ZScore distribution tends to be more perfectly bell-shaped.
IRTscore | The IRTscore is the proficiency estimate of the student; it typically has a mean and standard deviation of approximately 0 and 1, respectively. The IRTscore facilitates generalization beyond a specific sample of items because its estimation considers the statistical properties of different test items.[1]
IRTerror | The error of measurement for the IRTscore.
IRTskew | The skewness of the proficiency estimate, which indicates if the test is better at measuring the lower or upper bound of a student’s proficiency (for example, an easy test may accurately describe whether students have reached a minimum level of proficiency but may be ambiguous about exactly how high the level of proficiency actually is).
IRTkurt | The kurtosis of the proficiency estimate, which describes how precise the estimate is for a given level of error (for example, of two scores with the same measurement error, the one with the greater kurtosis is more precise).
TrueScore | This score is an estimate of a percent score that is calculated from the IRT score. It is preferable to the raw percent score because it corrects for differences in measurement error between items. This score is calculated as the average of the probability of correct response to each item, given the IRT score of the student and the parameters of the test items.
Level | This variable is an estimate of the proficiency level for a student that has been assigned based on standard setting procedures (if no standard setting procedures have been performed, the default is for all students to be assigned a value of 1).
In addition to these specific
names, you should also avoid using names that contain the
“@” symbol. This symbol is reserved for processing
partial credit items, which
are test items that have more than one possible
score value greater
than 0.
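As a rough illustration of how the classical summary scores in Table 8.1 are defined, the sketch below computes PercentScore, RawZScore, and an approximate Percentile for a few hypothetical students. The IRT-based scores require the full item response model and are not reproduced here, and IATA's own computations may differ in detail.

import numpy as np

# 1 = correct, 0 = incorrect, None = omitted (item not presented)
scored = [
    [1, 0, 1, None],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]

def percent_score(row):
    answered = [x for x in row if x is not None]   # omitted items are excluded
    return 100.0 * sum(answered) / len(answered)

percent = np.array([percent_score(r) for r in scored])
raw_z = (percent - percent.mean()) / percent.std()              # RawZScore
percentile = [100.0 * np.mean(percent < p) for p in percent]    # share of students scoring lower

print(percent)     # PercentScore for each student
print(raw_z)       # mean 0, standard deviation 1 within this sample
print(percentile)  # approximate Percentile rank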
8.3.2. Item Data
IATA produces and uses item data files with a specific format. An item data file contains all the information required to perform statistical analysis of items and may contain
the parameters used to describe
the statistical properties
of items. An item bank produced or used by IATA should contain the variables
listed in Table 8.2.
Name | (MANDATORY) The unique name of each test item.
Key | (MANDATORY) The information used to assign a numeric score to each item response, which is either the single code corresponding to the correct response or a delimited array of values that defines a variety of acceptable responses and their corresponding numerical scores.
a | (OPTIONAL) The first of three parameters that describe how performance on a test item relates to proficiency on the performance domain, referred to as the slope or discrimination parameter.
b | (OPTIONAL) The second item parameter, referred to as the location or difficulty parameter.
c | (OPTIONAL) The third parameter, referred to as the pseudo-guessing parameter.[2]
Level | (OPTIONAL) A previously assigned proficiency level for an item based on the initial item specification and expert review (values should be natural numbers, beginning with 1).
Content | (OPTIONAL) A code or description used to describe the subdomain of the curriculum, also known as a strand or thread, to which each item is most strongly aligned.
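The a, b, and c fields correspond to the parameters of a three-parameter logistic (3PL) item response model. For reference only, a common form of that model is sketched below; IATA's internal parameterization (for example, whether a scaling constant is applied) may differ.

import math

def prob_correct_3pl(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """Probability of a correct response for a student with proficiency theta,
    under a common 3PL parameterization (a = discrimination, b = difficulty,
    c = pseudo-guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A hypothetical item (parameter values in the range shown in Table 8.3 below)
# for a student of average proficiency (theta = 0):
print(round(prob_correct_3pl(0.0, a=0.34, b=0.83, c=0.01), 3))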
Table 8.3 presents examples from an item bank data file containing information about five science items named C1Sci31, C1Sci32, C1Sci33, C1Sci34, and C1Sci35. Note that the item named “C1Sci35” does not have any data in the columns labelled a, b, c, and Level. As indicated in Table 8.2, the only data fields that are mandatory are the Name and Key. If a, b, or c parameters are missing, they will be estimated during the analysis. There are many situations that may require you to enter an item data file into IATA that is missing these parameters. The most common scenario occurs when the response data for the items have never before been analysed; in this case, the item bank data file is simply being used as an answer key. Another scenario occurs when some items have parameters that have been estimated in a previous data analysis, and you wish to fix the values of these items instead of having IATA re-estimate them; in this scenario, you would leave the a, b, and c values empty only for the items for which you wish to estimate new parameters (see Chapter 15, page 119). Values for Level and Content may be either manually entered in the IATA interface or left empty.
Table 8.3 Sample section of an item data file
Name | a | b | c | Key | Level | Content
C1Sci31 | 0.34 | 0.83 | 0.01 | 3 | 3 | Scientific Reasoning
C1Sci32 | 0.46 | 0.4 | 0.12 | 4 | 2 | Physics
C1Sci33 | 0.32 | 0.31 | 0.06 | 3 | 2 | Physics
C1Sci34 | 0.18 | 0.75 | 0.16 | 1 | 3 | Biology
C1Sci35 | | | | 5 | | Environment
An item data file may also include additional variables. For example, additional
information typically stored with item data includes the question stem, statistics describing the number of times the item has been used, or a list of the tests in which each item appears.
However, any additional variables other than the seven data fields listed in Table 8.2 will not be used by IATA.
The national assessment team can use information from any source as long as they have the required item data in the format presented in Table 8.2. For example, national assessments may obtain permission
to use items from various large-scale assessments, such as those administered by the International Association for the Evaluation of Educational Achievement
(IEA), which include TIMSS and PIRLS (http://timss.bc.edu/). If items from existing large-scale
assessments are included in a national
assessment, the item parameters from the existing assessments may be used to
create an item data file that IATA can import.
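As an illustration, the sketch below assembles a minimal item data file with the layout of Table 8.3 and writes it to an Excel file that could be imported into IATA. Only Name and Key are mandatory; the parameter values shown are taken from Table 8.3, and the file name is hypothetical.

import pandas as pd

item_bank = pd.DataFrame(
    [
        # Name,      a,     b,     c,    Key, Level, Content
        ("C1Sci31",  0.34,  0.83,  0.01, "3", 3,     "Scientific Reasoning"),
        ("C1Sci35",  None,  None,  None, "5", None,  "Environment"),  # parameters to be estimated
    ],
    columns=["Name", "a", "b", "c", "Key", "Level", "Content"],
)

# Writing *.xlsx requires the openpyxl package; a comma-delimited *.csv file would also work.
item_bank.to_excel("ItemDataExample.xlsx", index=False)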
8.3.2.1. Answer key formats
In the column with the header ‘Key’ in an item bank data file, you must provide IATA with information it can use to score each item. In the simplest case, for multiple choice test items with a single correct option,
the value for each item should be the alphanumeric character corresponding to the correct option.
The value is case-sensitive, which means that, for example,
if the correct response is coded as an upper-case “A”, then the upper-case letter “A” must be provided
in the answer key; if a key value of “a” is provided, then any item responses with a value of “A” will be scored as incorrect.
In rare cases, during a process of test review, it may be determined that there is more than
one correct option to a test item. To assign more than one key value to an answer key, you must enter a list of correct
values, separated by commas. Do not enter spaces between any values or after the commas. For example, if the responses
of “A” and “C” are acceptable as correct responses
for a test item, then the value of the key for this item should be “A,C”.[3]
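The following sketch shows how such a comma-delimited key can be interpreted; it is illustrative only, and note that matching remains case-sensitive.

def score_multiple_key(response: str, key: str) -> int:
    """Score a response against a key that may list several acceptable codes."""
    acceptable = key.split(",")          # e.g., "A,C" -> ["A", "C"]
    return int(response in acceptable)

print(score_multiple_key("C", "A,C"))    # 1
print(score_multiple_key("a", "A,C"))    # 0 (case-sensitive)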
8.3.2.2. Item data formats for partial credit items
Partial credit (or graded-response) items are test items that have more than one score value. For example, instead
of being scored as 0 or 1, an item with different
levels of correctness may be scored as
0, 1,
or 2, where 0 represents an attempted response,
1 represents a partially-correct
response, and 2 represents a perfect response.
To accommodate the different score values, the answer key must be entered for each score
value that is greater than 0. If the marking scheme used for partial credit items uses scores that are all greater
than 0, then answer key information should not be entered for the lowest score value. For example,
if the possible item scores are 1, 2, and 3, then the answer key should only provide scoring information for scores 2 and 3. The format for a partial credit answer key is: <score 1>:<value list 1>;<score 2>:<value list 2>; … <score n>:<value list n>.
For example, for a partial credit item with three response codes, A, B, and C, corresponding to scores of 1, 2, and 3, respectively, the answer key for this item should be entered as “1:A;2:B;3:C”.
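A small sketch of how a key in this format can be parsed and applied is shown below; it is illustrative only and is not how IATA is implemented.

def parse_partial_credit_key(key: str) -> dict:
    """Map each response code to its numeric score, e.g. '1:A;2:B;3:C' -> {'A': 1, 'B': 2, 'C': 3}."""
    code_to_score = {}
    for entry in key.split(";"):
        score, values = entry.split(":")
        for value in values.split(","):
            code_to_score[value] = int(score)
    return code_to_score

def score_partial_credit(response: str, key: str) -> int:
    """Codes not listed in the key (such as the lowest score) are scored 0."""
    return parse_partial_credit_key(key).get(response, 0)

print(score_partial_credit("B", "1:A;2:B;3:C"))  # 2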
If a partial credit item has already been analyzed,
it will have a greater number of parameters than a regular
test item. Each score value will have a distinct
value for the b-parameter, although
the a-parameter will have the same value for all score values. These item data must be entered in a special
format. In addition to providing the main entry with
the full answer key, a new entry must be added for each score value (except for the lowest score value) as if each item score were a separate test item. The parameter
fields for the main item entry should be left blank. For example, if an item has scores of 0, 1 and 2, then a total of three rows would be required
in the item data file: one row for
the overall item, which would only have the item name and answer key, and two score-specific entries
for 1 and 2 that have name, answer key, and parameter
information.
The value of the name field for each new score-specific entry is the original item name followed by “@<score value>”. For example, if the original
item name is “TestItem”
then the name for an item score of 1 is “TestItem@1”. IATA uses an item response model that requires
that the values of the different
b-parameters be in the same order as the scores. Therefore, if there are two score entries, 1 and 2, then the b-parameter
value for score 2 must be greater
than the b-parameter for score 1, as shown in Table 8.4.
When a new row is entered for each item score, the values of the answer key field must also be specified
differently. The analysis of a partial credit item assumes that a student achieving a particular item score has also mastered
whatever level of skill that is associated
with a lower score on that item. In other words, if each score is treated as a separate test item, then a student with a high partial credit score has effectively also performed correctly
on the lower credit scores.
To manage this relationship in IATA, the answer key for each score value should list its own key value(s)
as well as the values
of any higher scores.
An example of proper partial credit item data formatting for an item with scores of 0, 1, 2 and 3 is given in Table 8.4. Note that no scoring information is provided for the lowest
score (0). The main item entry has no parameter
values and no value for Level. Because each score value could correspond to a different standard of performance, it does not make sense to have the level specified for the entire item. Even though the responses are already scored, scoring information must still be specified using the proper answer key format. In order for IATA to
score the responses properly, the answer
key must provide both the values found in the data and the score assigned to each value.
Table 8.4 Sample section of an item data file for a partial credit
item
Name | a | b | c | Key | Level | Content
PCItem001 | | | | 1:1;2:2;3:3 | | Parts of speech
PCItem001@1 | 0.61 | -0.43 | 0 | 1,2,3 | 1 | Parts of speech
PCItem001@2 | 0.61 | 0.22 | 0 | 2,3 | 1 | Parts of speech
PCItem001@3 | 0.61 | 0.74 | 0 | 3 | 2 | Parts of speech
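The row structure in Table 8.4 can be generated mechanically once the item's a-parameter, the b-parameter for each score, and the code-to-score mapping are known. The helper below is a hypothetical sketch of that expansion using the values from Table 8.4; in this example the response codes happen to equal the scores, which is why the cumulative keys are "1,2,3", "2,3", and "3".

def partial_credit_rows(name, a, b_by_score, full_key):
    """Expand a partial credit item into a main entry plus one '@<score>' entry
    per non-zero score, with cumulative answer-key values for each score."""
    scores = sorted(b_by_score)   # e.g., [1, 2, 3]; b values must increase with score
    rows = [{"Name": name, "a": None, "b": None, "c": None, "Key": full_key}]
    for s in scores:
        # Response codes equal the scores here, so the cumulative key lists this
        # score and every higher score (e.g., "2,3" for score 2).
        cumulative = ",".join(str(v) for v in scores if v >= s)
        rows.append({"Name": f"{name}@{s}", "a": a, "b": b_by_score[s], "c": 0, "Key": cumulative})
    return rows

for row in partial_credit_rows("PCItem001", 0.61, {1: -0.43, 2: 0.22, 3: 0.74}, "1:1;2:2;3:3"):
    print(row)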
8.4. Data produced by IATA
IATA produces
several data tables that contain the specifications of the current analysis and the analysis results.
In general, all results should be archived for future reference. Table 8.5 summarizes the list of data tables
produced by IATA. These data tables may be saved individually or
collectively directly from IATA into
a variety of common formats, including
Excel (*.xls/*.xlsx), SPSS (*.sav), comma-delimited
(*.csv) or tab-delimited (*.txt).
Data Tables | Description
Responses | Original response data (including non-test data) imported into IATA.
Values | Unique response codes for all test items, and an indication as to whether each response value is coded as omitted (valid skip) or missing.
Scored | Response data that have been scored as correct (1) or incorrect (0) using the specified answer key, as well as all summary scores and their standard errors.
Items1[4] | Item answer keys, and statistics related to the current analysis and item parameters.
Items2 | Item answer keys and parameters of the reference item parameter file used for linking.
MergedItems | Item-by-item matching of items in both the new and reference item parameter files used by the linking process.
Eigenvalues | The proportion of variance explained by each of the dimensions in the item responses.
PatternMatrix | The proportion of variance of each item explained by each of the dimensions underlying the item responses.
Levels | The thresholds used to define proficiency levels.
LinkingConstants | Scale transformation constants used to adjust the latent trait scale between populations or samples.
BookmarkData | An ordered list of items that can be used to facilitate standard setting or creating definitions for performance levels.
DIF_<specifications> | The results of a differential item functioning analysis, where the <specifications> portion of the table name summarizes the variable and groups compared in the analysis.
CustomTest<name> | A set of items chosen to minimize error of measurement over a specific range of proficiency. The <name> is a user-specified value.
8.5. Other output produced by IATA
In addition to the data tables described
in Table 8.5, IATA also produces
several charts, textual
summaries, and tables of results that are only displayed in the IATA interface. These results may be copied directly from IATA and pasted into other documents for future reference. The method of copying the output depends on the type of output.
For charts, right-clicking on the chart body will raise a popup menu with options
to: 1) copy the image to the clipboard, 2) save the chart image directly to a file, or 3) print the image. For results that are displayed
in tabular form, you must copy the data by first selecting
the cells, rows, or columns that you wish to copy, then copying the data either by selecting “Copy” from the right-click popup menu or by typing Ctrl+c. The copied data may be pasted to a text file or directly
into spreadsheet software such as Excel or SPSS.
8.6. Interpreting IATA results
Whenever IATA produces itemized
analysis results for different items, it will also present ‘traffic symbol’ summary
indicators that provide a general idea of how to interpret the results. There are three different symbols that IATA uses, explained in Table 8.6.
Symbol | Meaning
Green circle | Green circles indicate no major problems.
Yellow diamond | A yellow diamond indicates that the results are less than optimal. This indicator is used to suggest that modifications may be required to either the analysis specifications or the items themselves. However, the item is not introducing any significant error into the analysis results.
Red warning triangle | A red warning triangle appears beside any potentially problematic items. This indicator is used either to indicate items that could not be included in the analysis due to problems with the data or specifications, or to recommend a more detailed examination of the specifications or underlying data and test item. When this indicator appears, it does not necessarily mean that there is a problem, but it does suggest that the overall analysis results may be more accurate if the indicated test item were removed or if the analysis were re-specified.
For analyses where there are multiple pieces of information to consider when interpreting results for a specific item, such as the Item Analysis and Test Dimensionality results,
IATA will also generate interpretive statements that attempt to summarize the different statistics. These statements are intended as a useful suggestion for how to proceed. However, in any case where IATA recommends a modification to either the analysis specifications or test items, you should verify that the recommendation is appropriate by examining the statistical results and/or actual
test booklets yourself.
8.7. SAMPLE DATA
When IATA is installed
on your computer, it will create a folder on your desktop
called IATA. This folder contains
sample data that are required for the walkthrough examples in this book. There are six different
files in the sample data folder. These include five response data sets, each in Excel format, and an Excel file containing the answer keys for each of the response data sets. The files are in *.xls format in order to be compatible with older and open-source software
(depending on your computer settings, you may not be able to see the “.xls” file extension). The names and contents of
the files are:
· PILOT1(.xls)
– a response data set corresponding to a pilot test containing
multiple choice items.
· CYCLE1(.xls) – a response data set corresponding to a national
assessment
administration.
· PILOT2(.xls) – a
response data set corresponding to a pilot test containing multiple choice and partial credit items in a balanced
rotated design.
· CYCLE2(.xls) – a response data set corresponding to a national
assessment administration with items common with a previous
administration.
· CYCLE3(.xls) – a response
data set corresponding to a national assessment
administration with items common with a previous
administration.
· ItemDataAllTests(.xls) – an Excel file with multiple sheets containing answer keys
and information about the items on each of the different response data files
These
sample data are fictional data sets which were developed
with the sole intent of providing concrete examples and applications of this software.
Although they reflect typical patterns of student response and the relationships in these data are similar to those found in most large-scale assessments, the results and discussions of the analysis findings do not represent any actual national
assessment.
If you
delete any of the sample data files,
you may recover them by reinstalling IATA. The data can also be found on the accompanying CD to this book
or downloaded from the IATA website (http://www.polymetrika.org/IATA).
8.8. IATA analysis workflows and interfaces
IATA differs from many statistical analysis programs, which tend to provide a variety of
analysis functions that may be accessed individually. In contrast,
all of the analysis functions in IATA are accessed through workflows, where the results from each step in the workflow may be used to inform the specifications or interpretation of results in subsequent steps.
There are five workflows available
in IATA:
1. Response data analysis,
2. Response data analysis with linking,
3. Linking item data,
4. Selecting optimal test items, and
5. Developing and assigning performance
standards.
The workflows
reflect the different
goals that may arise in the context
of a national assessment. Listed below are some common situations that might require the different workflows:
· If you have conducted a pilot test and need detailed information about item behavior to determine the contents of the final test, you should use the “Response data analysis” workflow;
· If you have finished collecting data for the first national
assessment in a planned series of assessments, you should use the “Response data analysis” workflow;
· If you are
assigning new scale scores to a sample of students who have been administered the same test that was
used in a previous national assessment, you
should use the “Response data analysis”
workflow;
· If you have conducted
a national assessment that shares items with a previous assessment and are interested
in comparing the results of the two, you should use the “Response data analysis with linking” or “Linking item data”
workflows;
· If you wish to modify your test and need to know the best items to retain in the new test in order to maintain comparability with the previous test, you should use
the “Selecting optimal test items” workflow;
· If you have already conducted the national assessment and want
to interpret the results in a way that
is consistent with curriculum expectations, rather
than simply comparing students to each other, you should use the “Developing and assigning performance
standards” workflow.
To perform an analysis with IATA, you must select one of these workflows from the main menu. The main menu is reached by clicking
the “Main Menu” button on the bottom
right of the language selection
and registration screen that loads with IATA, shown in Figure 8.2.
Figure 8.2 Initial language selection and optional
registration for IATA
The default language for IATA is English. Registration is optional and is not required in
order to access any of the functionality that is discussed in this book. The IATA main menu is shown in Figure 8.3.
Figure 8.3 The IATA main menu
Each workflow is composed
of a set of tasks which are completed
in order. Most of the workflows
share many of the same tasks. There are 10 different tasks that IATA performs, and each task has its own interface. These tasks generally appear in the following order:
1. Loading data
2. Setting analysis specifications
3. Analyzing test items
4. Scaling test results
5. Analyzing test dimensionality
6. Analyzing differential item
functioning
7. Linking
8. Selecting optimal test items
9. Informing development of performance
standards
10. Saving results.
Not all tasks appear in all workflows. The workflows are designed so that you are only required
to perform tasks that are relevant to your analysis
goals. Table 8.7 summarizes which tasks appear in which workflows.
Table 8.7 Different tasks in IATA and the workflows in which they are used
Workflows: A = Response data analysis; B = Response data analysis with linking; C = Linking item data; D = Selecting optimal test items; E = Developing and assigning performance standards.
Task | A | B | C | D | E
1. Loading data | ● | ● | ● | ● | ●
2. Setting analysis specifications | ● | ● | | |
3. Analyzing test items | ● | ● | | |
4. Analyzing test dimensionality | ● | ● | | |
5. Analyzing differential item functioning | ● | ● | | |
6. Linking | | ● | ● | |
7. Scaling test results | ● | ● | | |
8. Selecting optimal test items | ● | ● | | ● |
9. Informing development of performance standards | ● | ● | | | ●
10. Saving results | ● | ● | ● | ● | ●
The first two workflows (A and B) are very similar in terms of their tasks, because both require the analysis
of response data, which requires
some preliminary analyses to determine whether the use of the statistical measurement models is appropriate for the response data. In contrast, the last three workflows (C, D, and E) only analyse item data. All workflows require
that data be loaded into IATA and allow you to save results.
8.9. Navigating through IATA workflows
When you select a workflow
from the IATA main menu, you will be directed
into the set of tasks for that workflow. Each task has its own interface that allows you to specify
how IATA should perform
the task and, if applicable, view the results produced after IATA has performed
the task.
At the top of each task interface, there are some elements that are common for all tasks. These elements are the instructions box and the navigation
buttons, shown in Figure
8.4. The instructions box, on the left, provides a brief summary of what specifications are required for each task and how to interpret any results. On the top-right, the buttons labelled
“<<Back” and “Next>>” allow you to review a previous task or move on to the next task by clicking on the respective
button. Note that, although IATA does not prevent you from moving back and forth through the workflow, later tasks in the workflow may not provide meaningful
results unless you have correctly
completed the earlier tasks in the workflow.
Figure 8.4 IATA task
interface instructions and navigation buttons
Regardless of the workflow in which it appears,
the general specifications for each task remain the same. The different task interfaces are
explained in detail in the example
walkthroughs in chapters 9 through 13.
8.10. Summary
In this chapter,
you reviewed the data requirements for item and test analysis and were presented with an overview
of the types of information produced by IATA. You were also introduced to the IATA interface, including
the task interfaces, main menu, and workflow navigation.
In the following
five chapters, you will learn how to use each of the interfaces
by exploring the different workflows.
Chapter 9 begins with the analysis of pilot test data. Chapter 10 continues the scenario with the analysis of data from a complete
administration of a national assessment. Chapter 11 introduces
the analysis of rotated booklet designs and the specification and interpretation of results for partial credit items.
Chapter 12 walks through the requirements and procedures for linking multiple cycles of results from a national assessment. Chapter 13 describes
the partial workflows that only analyse item data and discusses an alternate linking scenario
where existing item parameters are used to anchor the estimation of new item parameters and test scores.
[1] See Chapter 15, page 197, for more details on IRT scaling. Additional IRT scaling options are available in IATA’s advanced functionality; refer to the installation guide on the accompanying
CD.
[2] Use of the
c-parameter to describe items may cause certain functions, such as equating, to
not work properly. For most purposes,
the items are most useful if the value of the c-parameter is equal to 0 or set to 0. The 3-parameter model should only
be used by expert users who are aware of its shortcomings.
Estimation and use of the c-parameter is provided by the advanced
functionality of IATA. Refer to Chapter
15 for more details on the c-parameter. Registration of IATA, which is free, provides access to this advanced
functionality. For registration instructions, see the installation guide on the accompanying CD.
[3] This format
requirement means that commas should
never be used as answer key values.
[4] The Items1 data table produced
by IATA following an analysis of response data will serve as an item bank data file, but it also has several additional statistics. These additional statistics are discussed in the later
sections on analysis of response data and in the theoretical annex.
These statistics describe the behaviour of items in a specific sample
and are useful for informing test analysis and construction but are not required to be maintained in an item bank file that will be used by IATA.