8. Introduction to IATA
8.1. OVERVIEW
The item and test analysis
software (IATA)
accompanying Part II of this book is intended to help national
assessment practitioners, researchers, and others analyze test item data as well as build effective assessment
tools. IATA was designed to offer a user-friendly
way to address many statistical considerations related to national assessments. It specifically targets those interested in analyzing test data, creating a new test from an item bank, or comparing
or scaling test items across different samples. It is likely to be useful for individuals involved in educational assessment who have experience doing item analysis
as well as others with some statistical competence but who are less familiar with the specific
statistical processes that are a feature
of national assessments.
The instructions in this book assume that you are familiar with basic computing
functions on a Windows PC, such as starting programs,
browsing directories, and opening files. The following
chapters also assume that you have installed
IATA correctly and can access the main menu in IATA. If you have not installed IATA yet
or cannot start the program,
please refer to the installation guide for IATA on
the accompanying CD. Readers of this section
of the book should also have some understanding of statistical concepts such as probability and properties of statistical
distributions.
The overarching goal of IATA is
to increase the usability and interpretability of test scores. The primary means of accomplishing this goal is to reduce the error of measurement. Error of measurement is the underlying
concept that unifies all test creation and test analysis.
A test is intended to measure a specific domain,
such as mathematics skill or reading proficiency. However, no test is perfectly
accurate. All test scores have some uncertainty; if a student were to take equivalent
versions of a test with different items, it is unlikely that his or her score would be the same across each
test. Error of measurement describes
the degree to which a student’s score on a specific test differs from his or her ‘true score,’ the score that he or she would have achieved in the absence of uncertainty. An important goal of test development from a statistical perspective is to reduce the error of measurement. To reduce error of measurement, IATA identifies problematic items that contribute to error so that they may be revised, replaced,
or removed altogether.
The second means of accomplishing this goal is to establish meaningful and consistent scales on which to report test scores.
Throughout this section of the book, the terms statistic
and parameter are used to describe characteristics of test items. A statistic
is the result of a calculation using a particular sample of students
and items. Because the value of a statistic depends on the sample, it cannot generalize to
different samples or populations that are not equivalent to the sample from which it was estimated. Consequently, test scores that are calculated
as statistics may not be directly comparable
between different tests or groups of students.
In contrast, a parameter expresses
the statistical properties
of a student or test item as functions of sample characteristics. Accordingly, parameters may be used to characterize students and items in generalizable ways that are not dependent
on particular samples. When IATA estimates parameters for students
or test items, these parameters
may be used or compared
across different tests, which allows for greater efficiency and information than if each test in a national assessment program were simply interpreted by itself.
8.2. CHAPTER PREVIEW
The chapters in Part II begin with the point in the development of a national
assessment where test items have been created and bundled into a test booklet and response data have been collected. Chapter
8 provides a review of information on data processing and formatting presented
in previous books in this series, as well as an introduction to the IATA interface, including a description of the main menu, the different interactive screen elements in IATA, and the results it produces. Chapters 9 through 13 provide detailed
walkthroughs of the main analysis workflows in IATA, which will familiarize you with each of the functions
used in IATA. These workflows are designed
to mimic the phases of development and implementation of a national assessment program,
from pilot testing to full-scale testing and follow-up
testing in subsequent
assessment cycles. Taken together, these walkthroughs will familiarize the user with how to perform practically all psychometric analyses
required for a national assessment system.
Chapter
14 provides a summary of the different
workflows and presents
examples of how each workflow
might be used in different
real-life scenarios. These chapters introduce many concepts that may be new or only partially
familiar to those with experience in educational assessment. Although some background
statistical information is presented in each section
to help interpret the results generated by the software, detailed explanations of the theory or mathematical principles behind these concepts are presented in Chapter 15.
8.3. Assessment Data
There
are two main types of data produced
by and used in the analysis
of assessments: response data and item data. Response data are produced
by the individual learners as they answer questions
on a test. A test is a specific collection
of questions that evaluate a common domain of proficiency or knowledge. Individual questions on a test are referred to as items. Test items can be multiple-choice,
short-answer, open-ended
questions, or performance tasks. Item data are produced
by analysing or reviewing items and recording
their statistical or cognitive properties. Each row in a response
data file describes
the characteristics of a learner or test-taker, whereas each row in an item data file describes the characteristics of a test item.
IATA can read and write a variety
of common data table formats (for example, Access, Excel, SPSS, delimited
text files) if they are formatted correctly.
If the data are not formatted with the correct structure, IATA will not be able to carry out the analyses. Database-compatible formats such as Access or SPSS already take care of most data formatting issues. However, if the data are stored in a less restrictive format, such as an Excel or text file, the following
conventions should be followed:
· The names of variables
should appear in the cell at the top of each column (known as the header). Each column with data must have a column header. The name of each variable must be distinct from the names of other variables in a data file. The names of variables must begin with a letter and should not contain
any spaces.
· The data range must not contain any empty rows or columns.
The data range is the rectangle
of cells that contain data, beginning with the variable
name of the first variable
to appear in the data file and ending with the value of
the last
variable in the bottom-most row.
·
The data range must begin at the first cell in the spreadsheet or file. In Excel, this
cell is labelled “A1.” In text files, this is the top-left cursor position
in the text file.
The two examples
in Figure 8.1 illustrate
incorrect and correct
data formats. In the incorrect data format, on the left, there is a blank row above the data rectangle and a blank column to the left of it. There are also blank rows and columns within the rectangle and a column containing data without a header. In the correct format,
on the right, all of the data are gathered into a single rectangle
at the top-left of the spreadsheet with no blank rows or columns.
Figure 8.1 Incorrect and correct data formatting
examples
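To make these conventions concrete, the short sketch below (written in Python, which is not required to use IATA) builds a small response data table and saves it as a comma-delimited file, one of the formats IATA can read. The file name, variable names, and response codes are purely illustrative.

import pandas as pd

# A small response data table laid out according to the conventions above:
# variable names in the first row, no blank rows or columns, and the data
# rectangle beginning at the first cell.
responses = pd.DataFrame(
    {
        "StudentID": ["S001", "S002", "S003"],
        "Gender":    ["F", "M", "F"],
        "MATH01":    ["A", "C", "B"],   # coded responses, not scores
        "MATH02":    ["D", "D", "9"],   # 9 = missing-response code
    }
)

# Writing without the row index keeps the header in the first row and the
# data range starting at the top-left cell.
responses.to_csv("example_responses.csv", index=False)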
8.3.1. Response data
Response data include the response of each student to each test item. The test results imported in the response data file must allow for automated
scoring; this means that the item response
data should include the codes representing how students actually
responded to items. For example, if the response data are from a multiple-choice test, the data should record codes representing the options endorsed
by each student (e.g., A,
B, C, D, etc.). IATA will transform the coded responses into scores using the answer
key you either enter manually
or provide as an answer key file.
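The scoring step can be pictured with the following simplified sketch; it is not IATA's actual code, and the item names, response codes, and answer key are hypothetical.

# Hypothetical answer key mapping each item name to its correct response code.
answer_key = {"MATH01": "A", "MATH02": "D"}

def score_response(item: str, response: str) -> int:
    """Return 1 if the coded response matches the keyed answer, otherwise 0."""
    return int(response == answer_key[item])

print(score_response("MATH01", "A"))  # 1 (correct)
print(score_response("MATH01", "C"))  # 0 (incorrect)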
Other
information may be stored in a response
data file that may be useful for analyzing test results. Examples of this include demographic information on variables
such as age, grade, gender, school, and region. Other useful information may be collected from questionnaires (such as student and teacher questionnaires) or administrative records.
If a stratified sample of students was used, the sample weighting for each student should also be included
on this file.
A unique identifier should be provided for each student, although
IATA will automatically produce
unique identifiers based on the record order if a unique identifier is not specified. However, if the results will be linked to other data sources, such as follow-up surveys or administrative records, it is a good idea to use a previously-defined
identifier such as student name or number to facilitate
future linkages between data sets.
All responses should be assigned
codes. For multiple-choice
items, this procedure
is straightforward, because each response option is already coded as correct or incorrect. For open-ended items, a scoring rubric is required to help score item responses using a common coding framework. Open-ended items may be scored as either correct or incorrect, or with partial credit given to different responses.
A partial credit test item has more than one score that is greater than 0. Answers to open-ended questions must have been
previously coded during the preparation of the response data. Volumes
2 and 3 of this series describe
coding procedures for test items (Anderson
and Morgan 2008; Greaney and Kellaghan 2012). In order to score the response data, for most analyses,
an answer key must be loaded into IATA. An answer key is a list of response codes that indicates the correct answer(s)
for each test item. The answer key may be imported as a data file or entered manually.
If the analysis is using anchored item parameters, these anchored item parameters must be included
in the answer key file; they may not be entered manually (see Item Data,
page 16).
8.3.1.1. Treatment of Missing and Omitted Data
Missing
data occur when a student does not provide a response to a test item. When this happens,
rather than leaving the data field blank, a missing value code is used to record why the response is missing. There are two types of missing
responses: missing and omitted.
Missing codes are assigned to variables when students could have responded
to an item but chose not to do so, leaving the answer blank. Such missing data will be scored as
incorrect. In contrast,
omitted data codes are used when students were not administered an item, as when a national assessment uses a rotated
booklet design.
Codes that apply to student responses that were unreadable
or where the student has answered inappropriately, such as selecting
two multiple-choice options,
are a form of missing
response for the purpose of item distractor analysis. Depending
on the circumstances of your test administration or data processing, you must decide whether these codes will be processed as missing or omitted data. Generally, if these data errors are the result of student error, the codes should be treated
as missing and will be scored as incorrect. However, if the errors are the result of limitations in the data processing, such as imprecision in score card scanning that was not human-verified,
then the codes should be treated as omitted.
Another
use of omitted data codes occurs when a balanced
rotated booklet design is used. Balanced rotating
booklet designs involve giving different
randomly equivalent samples of items to different students,
so that not all students
answer the same test items
(see Anderson and Morgan, 2008). These designs permit extensive subject matter coverage
while limiting the amount of student test taking time. In a rotating booklet design, omitted codes must be assigned to all items for a student except those contained in the test booklet presented
to the student. Omitted codes are not normally assigned to items in situations where all students
are required to answer all of the
items.
Common conventions use specific values for the different types of non-response data. See Greaney and Kellaghan (2012) for information on response codes. Common values used are:
- 9 for missing responses, where students have not responded at all to an item;
- 8 for unscorable responses, which typically occur in multiple-choice tests when students provide multiple responses and in open-ended items when student responses are illegible; and
- 7 for omitted or not-presented items, which might be used in a rotated booklet design.
Regardless of the specific codes used, you must specify how IATA is to treat each non-response code, as either missing or omitted (see the sketch below).
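The following sketch is purely illustrative: it shows one way to think about the treatment mapping, using the conventional codes listed above. The specification itself is made in the IATA interface, not in code.

# Illustrative mapping from non-response codes to their treatment.
NON_RESPONSE_TREATMENT = {
    "9": "missing",  # no response given: scored as incorrect
    "8": "missing",  # unscorable (e.g., multiple options selected), if due to student error
    "7": "omitted",  # item not presented (rotated booklet design): excluded from scoring
}

def score_with_codes(response: str, correct_code: str):
    """Return 1/0 for scorable responses, or None for omitted (not-presented) items."""
    treatment = NON_RESPONSE_TREATMENT.get(response)
    if treatment == "omitted":
        return None   # excluded from the student's percent score
    if treatment == "missing":
        return 0      # treated as incorrect
    return int(response == correct_code)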
8.3.1.2. Item naming
It is important to assign a unique name to each item in a national
assessment program (see Anderson and Morgan 2008; Greaney
and Kellaghan 2012). All statistical
analyses performed on a test item should be linked clearly to the name or label of an item. If an item is repeated in several cycles of a national
assessment, it should retain the
same name in the data files for each cycle. For example, a mathematics item first used in 2009 might have the name M003, to indicate that it was the third item to appear in the 2009 test. If this same item is used in a 2010 test, it should still receive the
name M003, regardless of where it appears on a test. Naming items by position in a test may cause confusion when items are reused. For this reason, it is more useful to assign permanent names to test items when they are first developed,
rather than when they are first used in an assessment.
Using consistent names also facilitates linking the results of different
tests. When IATA estimates statistical linkages between tests, it matches items in the linking procedure using item names. If an item name refers to different items in the two tests being linked,
the results of the linkage will not be accurate.
Although it is possible to rename items to facilitate
the linking process,
it is simpler and less likely to introduce errors if unique item names are maintained from the start.
8.3.1.3. Variables reserved by IATA
During
the analysis of response data, IATA will calculate
a variety of different working variables. The names of these working or output variables
are restricted and should not be used as names of test items or questionnaire variables. These variables,
which IATA adds to the scored test data file, are listed in Table 8.1.
Score Name | Description
XWeight | The design weight of the case that is used during analysis (if not specified, the value is equal to 1 for all students).
Missing | This variable describes the number of items that are omitted for a student.
PercentScore | The percent score is the number of items a student answered correctly, expressed as a percentage of the total number of items administered to the student (excluding omitted response data).
PercentError | The error of measurement for the percent score (this estimate is specific to each student; its value depends on the percent score and the number of items to which a student responded).
Percentile | The percentile rank is a number between 0 and 100 that describes, for each student, the percentage of other students with lower percent scores.
RawZScore | The RawZScore is the percent score, transformed to have a mean of 0 and a standard deviation of 1 within the sample.
ZScore | This score is the normal-distribution equivalent of the percentile score. It is also referred to as the ‘bell-curve score.’ Whereas the distribution of the RawZScore depends on the distribution of the percent correct score, the ZScore distribution tends to be more perfectly bell-shaped.
IRTscore | The IRTscore is the proficiency estimate of the student; it typically has a mean and standard deviation of approximately 0 and 1, respectively. The IRTscore facilitates generalization beyond a specific sample of items because its estimation considers the statistical properties of different test items.[1]
IRTerror | The error of measurement for the IRTscore.
IRTskew | The skewness of the proficiency estimate, which indicates if the test is better at measuring the lower or upper bound of a student’s proficiency (for example, an easy test may accurately describe whether students have reached a minimum level of proficiency but may be ambiguous about exactly how high the level of proficiency actually is).
IRTkurt | The kurtosis of the proficiency estimate, which describes how precise the estimate is for a given level of error (for example, of two scores with the same measurement error, the one with the greater kurtosis is more precise).
TrueScore | This score is an estimate of a percent score that is calculated from the IRT score. It is preferable to the raw percent score because it corrects for differences in measurement error between items. This score is calculated as the average of the probability of correct response to each item, given the IRT score of the student and the parameters of the test items.
Level | This variable is an estimate of the proficiency level for a student that has been assigned based on standard setting procedures (if no standard setting procedures have been performed, the default is for all students to be assigned a value of 1).
In addition to these specific
names, you should also avoid using names that contain the
“@” symbol. This symbol is reserved for processing
partial credit items, which
are test items that have more than one possible
score value greater
than 0.
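As a rough illustration of how the classical summary scores in Table 8.1 are defined, the sketch below computes PercentScore, RawZScore, and an approximate Percentile for a few hypothetical students. The IRT-based scores require the full item response model and are not reproduced here, and IATA's own computations may differ in detail.

import numpy as np

# 1 = correct, 0 = incorrect, None = omitted (item not presented)
scored = [
    [1, 0, 1, None],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]

def percent_score(row):
    answered = [x for x in row if x is not None]   # omitted items are excluded
    return 100.0 * sum(answered) / len(answered)

percent = np.array([percent_score(r) for r in scored])
raw_z = (percent - percent.mean()) / percent.std()              # RawZScore
percentile = [100.0 * np.mean(percent < p) for p in percent]    # share of students scoring lower

print(percent)     # PercentScore for each student
print(raw_z)       # mean 0, standard deviation 1 within this sample
print(percentile)  # approximate Percentile rank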
8.3.2. Item Data
IATA produces and uses item data files with a specific format. An item data file contains all the information required to perform statistical analysis of items and may contain
the parameters used to describe
the statistical properties
of items. An item bank produced or used by IATA should contain the variables
listed in Table 8.2.
Name | (MANDATORY) The unique name of each test item.
Key | (MANDATORY) The information used to assign a numeric score to each item response, which is either the single code corresponding to the correct response or a delimited array of values that defines a variety of acceptable responses and their corresponding numerical scores.
a | (OPTIONAL) The first of three parameters that describe how performance on a test item relates to proficiency on the performance domain, referred to as the slope or discrimination parameter.
b | (OPTIONAL) The second item parameter, referred to as the location or difficulty parameter.
c | (OPTIONAL) The third parameter, referred to as the pseudo-guessing parameter.[2]
Level | (OPTIONAL) A previously assigned proficiency level for an item based on the initial item specification and expert review (values should be natural numbers, beginning with 1).
Content | (OPTIONAL) A code or description used to describe the subdomain of the curriculum, also known as a strand or thread, to which each item is most strongly aligned.
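The a, b, and c fields correspond to the parameters of a three-parameter logistic (3PL) item response model. For reference only, a common form of that model is sketched below; IATA's internal parameterization (for example, whether a scaling constant is applied) may differ.

import math

def prob_correct_3pl(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """Probability of a correct response for a student with proficiency theta,
    under a common 3PL parameterization (a = discrimination, b = difficulty,
    c = pseudo-guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A hypothetical item (parameter values in the range shown in Table 8.3 below)
# for a student of average proficiency (theta = 0):
print(round(prob_correct_3pl(0.0, a=0.34, b=0.83, c=0.01), 3))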
Table 8.3 presents examples from an item bank data file containing information about five science items named C1Sci31, C1Sci32, C1Sci33, C1Sci34, and C1Sci35. Note that the item named “C1Sci35” does not have any data in the columns labelled a, b, c, and Level. As indicated in Table 8.2, the only data fields that are mandatory are the Name and Key. If a, b, or c parameters are missing, they will be estimated during the analysis. There are many situations that may require you to enter an item data file into IATA that is missing these parameters. The most common scenario occurs when the response data for the items have never before been analysed; in this case, the item bank data file is simply being used as an answer key. Another scenario occurs when some items have parameters that have been estimated in a previous data analysis, and you wish to fix the values of these items instead of having IATA re-estimate them; in this scenario, you would leave the a, b, and c values empty only for the items for which you wish to estimate new parameters (see Chapter 15, page 119). Values for Level and Content may be either manually entered in the IATA interface or left empty.
Table 8.3 Sample section of an item data file
Name | a | b | c | Key | Level | Content
C1Sci31 | 0.34 | 0.83 | 0.01 | 3 | 3 | Scientific Reasoning
C1Sci32 | 0.46 | 0.4 | 0.12 | 4 | 2 | Physics
C1Sci33 | 0.32 | 0.31 | 0.06 | 3 | 2 | Physics
C1Sci34 | 0.18 | 0.75 | 0.16 | 1 | 3 | Biology
C1Sci35 | | | | 5 | | Environment
An item data file may also include additional variables. For example, additional
information typically stored with item data includes the question stem, statistics describing the number of times the item has been used, or a list of the tests in which each item appears.
However, any additional variables other than the seven data fields listed in Table 8.2 will not be used by IATA.
The national assessment team can use information from any source as long as they have the required item data in the format presented in Table 8.2. For example, national assessments may obtain permission
to use items from various large-scale assessments, such as those administered by the International Association for the Evaluation of Educational Achievement
(IEA), which include TIMSS and PIRLS (http://timss.bc.edu/). If items from existing large-scale
assessments are included in a national
assessment, the item parameters from the existing assessments may be used to
create an item data file that IATA can import.
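As an illustration, the sketch below assembles a minimal item data file with the layout of Table 8.3 and writes it to an Excel file that could be imported into IATA. Only Name and Key are mandatory; the parameter values shown are taken from Table 8.3, and the file name is hypothetical.

import pandas as pd

item_bank = pd.DataFrame(
    [
        # Name,      a,     b,     c,    Key, Level, Content
        ("C1Sci31",  0.34,  0.83,  0.01, "3", 3,     "Scientific Reasoning"),
        ("C1Sci35",  None,  None,  None, "5", None,  "Environment"),  # parameters to be estimated
    ],
    columns=["Name", "a", "b", "c", "Key", "Level", "Content"],
)

# Writing *.xlsx requires the openpyxl package; a comma-delimited *.csv file would also work.
item_bank.to_excel("ItemDataExample.xlsx", index=False)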
8.3.2.1. Answer key formats
In the column with the header ‘Key’ in an item bank data file, you must provide IATA with information it can use to score each item. In the simplest case, for multiple choice test items with a single correct option,
the value for each item should be the alphanumeric character corresponding to the correct option.
The value is case-sensitive, which means that, for example,
if the correct response is coded as an upper-case “A”, then the upper-case letter “A” must be provided
in the answer key; if a key value of “a” is provided, then any item responses with a value of “A” will be scored as incorrect.
In rare cases, during a process of test review, it may be determined that there is more than
one correct option to a test item. To assign more than one key value to an answer key, you must enter a list of correct
values, separated by commas. Do not enter spaces between any values or after the commas. For example, if the responses
of “A” and “C” are acceptable as correct responses
for a test item, then the value of the key for this item should be “A,C”.[3]
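The following sketch shows how such a comma-delimited key can be interpreted; it is illustrative only, and note that matching remains case-sensitive.

def score_multiple_key(response: str, key: str) -> int:
    """Score a response against a key that may list several acceptable codes."""
    acceptable = key.split(",")          # e.g., "A,C" -> ["A", "C"]
    return int(response in acceptable)

print(score_multiple_key("C", "A,C"))    # 1
print(score_multiple_key("a", "A,C"))    # 0 (case-sensitive)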
8.3.2.2. Item data formats for partial credit items
Partial credit (or graded-response) items are test items that have more than one score value. For example, instead
of being scored as 0 or 1, an item with different
levels of correctness may be scored as
0, 1,
or 2, where 0 represents an attempted response,
1 represents a partially-correct
response, and 2 represents a perfect response.
To accommodate the different score values, the answer key must be entered for each score
value that is greater than 0. If the marking scheme used for partial credit items uses scores that are all greater
than 0, then answer key information should not be entered for the lowest score value. For example,
if the possible item scores are 1, 2, and 3, then the answer key should only provide scoring information for scores 2 and 3. The format for a partial credit answer key is: <score 1>:<value list 1>;<score 2>:<value list 2>; … <score n>:<value list n>.
For example, for a partial credit item with three response codes, A, B, and C, corresponding to scores of 1, 2, and 3, respectively, the answer key for this item should be entered as “1:A;2:B;3:C”.
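A small sketch of how a key in this format can be parsed and applied is shown below; it is illustrative only and is not how IATA is implemented.

def parse_partial_credit_key(key: str) -> dict:
    """Map each response code to its numeric score, e.g. '1:A;2:B;3:C' -> {'A': 1, 'B': 2, 'C': 3}."""
    code_to_score = {}
    for entry in key.split(";"):
        score, values = entry.split(":")
        for value in values.split(","):
            code_to_score[value] = int(score)
    return code_to_score

def score_partial_credit(response: str, key: str) -> int:
    """Codes not listed in the key (such as the lowest score) are scored 0."""
    return parse_partial_credit_key(key).get(response, 0)

print(score_partial_credit("B", "1:A;2:B;3:C"))  # 2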
If a partial credit item has already been analyzed,
it will have a greater number of parameters than a regular
test item. Each score value will have a distinct
value for the b-parameter, although
the a-parameter will have the same value for all score values. These item data must be entered in a special
format. In addition to providing the main entry with
the full answer key, a new entry must be added for each score value (except for the lowest score value) as if each item score were a separate test item. The parameter
fields for the main item entry should be left blank. For example, if an item has scores of 0, 1 and 2, then a total of three rows would be required
in the item data file: one row for
the overall item, which would only have the item name and answer key, and two score-specific entries
for 1 and 2 that have name, answer key, and parameter
information.
The value of the name field for each new score-specific entry is the original item name followed by “@<score value>”. For example, if the original
item name is “TestItem”
then the name for an item score of 1 is “TestItem@1”. IATA uses an item response model that requires
that the values of the different
b-parameters be in the same order as the scores. Therefore, if there are two score entries, 1 and 2, then the b-parameter
value for score 2 must be greater
than the b-parameter for score 1, as shown in Table 8.4.
When a new row is entered for each item score, the values of the answer key field must also be specified
differently. The analysis of a partial credit item assumes that a student achieving a particular item score has also mastered
whatever level of skill that is associated
with a lower score on that item. In other words, if each score is treated as a separate test item, then a student with a high partial credit score has effectively also performed correctly
on the lower credit scores.
To manage this relationship in IATA, the answer key for each score value should list its own key value(s)
as well as the values
of any higher scores.
An example of proper partial credit item data formatting for an item with scores of 0, 1, 2 and 3 is given in Table 8.4. Note that no scoring information is provided for the lowest
score (0). The main item entry has no parameter
values and no value for Level. Because each score value could correspond to a different standard of performance, it does not make sense to have the level specified for the entire item. Even though the responses are already scored, scoring information must still be specified using the proper answer key format. In order for IATA to
score the responses properly, the answer
key must provide both the values found in the data and the score assigned to each value.
Table 8.4 Sample section of an item data file for a partial credit
item
Name | a | b | c | Key | Level | Content
PCItem001 | | | | 1:1;2:2;3:3 | | Parts of speech
PCItem001@1 | 0.61 | -0.43 | 0 | 1,2,3 | 1 | Parts of speech
PCItem001@2 | 0.61 | 0.22 | 0 | 2,3 | 1 | Parts of speech
PCItem001@3 | 0.61 | 0.74 | 0 | 3 | 2 | Parts of speech
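The row structure in Table 8.4 can be generated mechanically once the item's a-parameter, the b-parameter for each score, and the code-to-score mapping are known. The helper below is a hypothetical sketch of that expansion using the values from Table 8.4; in this example the response codes happen to equal the scores, which is why the cumulative keys are "1,2,3", "2,3", and "3".

def partial_credit_rows(name, a, b_by_score, full_key):
    """Expand a partial credit item into a main entry plus one '@<score>' entry
    per non-zero score, with cumulative answer-key values for each score."""
    scores = sorted(b_by_score)   # e.g., [1, 2, 3]; b values must increase with score
    rows = [{"Name": name, "a": None, "b": None, "c": None, "Key": full_key}]
    for s in scores:
        # Response codes equal the scores here, so the cumulative key lists this
        # score and every higher score (e.g., "2,3" for score 2).
        cumulative = ",".join(str(v) for v in scores if v >= s)
        rows.append({"Name": f"{name}@{s}", "a": a, "b": b_by_score[s], "c": 0, "Key": cumulative})
    return rows

for row in partial_credit_rows("PCItem001", 0.61, {1: -0.43, 2: 0.22, 3: 0.74}, "1:1;2:2;3:3"):
    print(row)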
8.4. Data produced by IATA
IATA produces
several data tables that contain the specifications of the current analysis and the analysis results.
In general, all results should be archived for future reference. Table 8.5 summarizes the list of data tables
produced by IATA. These data tables may be saved individually or
collectively directly from IATA into
a variety of common formats, including
Excel (*.xls/*.xlsx), SPSS (*.sav), comma-delimited
(*.csv) or tab-delimited (*.txt).
Data Tables | Description
Responses | Original response data (including non-test data) imported into IATA.
Values | Unique response codes for all test items, and an indication as to whether each response value is coded as omitted (valid skip) or missing.
Scored | Response data that have been scored as correct (1) or incorrect (0) using the specified answer key, as well as all summary scores and their standard errors.
Items1[4] | Item answer keys, and statistics related to the current analysis and item parameters.
Items2 | Item answer keys and parameters of the reference item parameter file used for linking.
MergedItems | Item-by-item matching of items in both the new and reference item parameter files used by the linking process.
Eigenvalues | The proportion of variance explained by each of the dimensions in the item responses.
PatternMatrix | The proportion of variance of each item explained by each of the dimensions underlying the item responses.
Levels | The thresholds used to define proficiency levels.
LinkingConstants | Scale transformation constants used to adjust the latent trait scale between populations or samples.
BookmarkData | An ordered list of items that can be used to facilitate standard setting or creating definitions for performance levels.
DIF_<specifications> | The results of a differential item functioning analysis, where the <specifications> portion of the table name summarizes the variable and groups compared in the analysis.
CustomTest<name> | A set of items chosen to minimize error of measurement over a specific range of proficiency. The <name> is a user-specified value.
8.5. Other output produced by IATA
In addition to the data tables described
in Table 8.5, IATA also produces
several charts, textual
summaries, and tables of results that are only displayed in the IATA interface. These results may be copied directly from IATA and pasted into other documents for future reference. The method of copying the output depends on the type of output.
For charts, right-clicking on the chart body will raise a popup menu with options
to: 1) copy the image to the clipboard, 2) save the chart image directly to a file, or 3) print the image. For results that are displayed
in tabular form, you must copy the data by first selecting
the cells, rows, or columns that you wish to copy, then copying the data either by selecting “Copy” from the right-click popup menu or by typing Ctrl+c. The copied data may be pasted to a text file or directly
into spreadsheet software such as Excel or SPSS.
8.6. Interpreting IATA results
Whenever IATA produces itemized
analysis results for different items, it will also present ‘traffic symbol’ summary
indicators that provide a general idea of how to interpret the results. There are three different symbols that IATA uses, explained in Table 8.6.
Symbol | Meaning
Green circle | Green circles indicate no major problems.
Yellow diamond | A yellow diamond indicates that the results are less than optimal. This indicator is used to suggest that modifications may be required to either the analysis specifications or the items themselves. However, the item is not introducing any significant error into the analysis results.
Red warning triangle | A red warning triangle appears beside any potentially problematic items. This indicator is used either to indicate items that could not be included in the analysis due to problems with the data or specifications, or to recommend a more detailed examination of the specifications or underlying data and test item. When this indicator appears, it does not necessarily mean that there is a problem, but it does suggest that the overall analysis results may be more accurate if the indicated test item were removed or if the analysis were re-specified.
For analyses where there are multiple pieces of information to consider when interpreting results for a specific item, such as the Item Analysis and Test Dimensionality results,
IATA will also generate interpretive statements that attempt to summarize the different statistics. These statements are intended as a useful suggestion for how to proceed. However, in any case where IATA recommends a modification to either the analysis specifications or test items, you should verify that the recommendation is appropriate by examining the statistical results and/or actual
test booklets yourself.
8.7. SAMPLE DATA
When IATA is installed
on your computer, it will create a folder on your desktop
called IATA. This folder contains
sample data that are required for the walkthrough examples in this book. There are six different
files in the sample data folder. These include five response data sets, each in Excel format, and an Excel file containing the answer keys for each of the response data sets. The files are in *.xls format in order to be compatible with older and open-source software
(depending on your computer settings, you may not be able to see the “.xls” file extension). The names and contents of
the files are:
· PILOT1(.xls)
– a response data set corresponding to a pilot test containing
multiple choice items.
· CYCLE1(.xls) – a response data set corresponding to a national
assessment
administration.
· PILOT2(.xls) – a
response data set corresponding to a pilot test containing multiple choice and partial credit items in a balanced
rotated design.
· CYCLE2(.xls) – a response data set corresponding to a national
assessment administration with items common with a previous
administration.
· CYCLE3(.xls) – a response
data set corresponding to a national assessment
administration with items common with a previous
administration.
· ItemDataAllTests(.xls) – an Excel file with multiple sheets containing answer keys
and information about the items on each of the different response data files
These
sample data are fictional data sets which were developed
with the sole intent of providing concrete examples and applications of this software.
Although they reflect typical patterns of student response and the relationships in these data are similar to those found in most large-scale assessments, the results and discussions of the analysis findings do not represent any actual national
assessment.
If you
delete any of the sample data files,
you may recover them by reinstalling IATA. The data can also be found on the accompanying CD to this book
or downloaded from the IATA website (http://www.polymetrika.org/IATA).
8.8. IATA analysis workflows and interfaces
IATA differs from many statistical analysis programs, which tend to provide a variety of
analysis functions that may be accessed individually. In contrast,
all of the analysis functions in IATA are accessed through workflows, where the results from each step in the workflow may be used to inform the specifications or interpretation of results in subsequent steps.
There are five workflows available
in IATA:
1. Response data analysis,
2. Response data analysis with linking,
3. Linking item data,
4. Selecting optimal test items, and
5. Developing and assigning performance
standards.
The workflows
reflect the different
goals that may arise in the context
of a national assessment. Listed below are some common situations that might require the different workflows:
· If you have conducted a pilot test and need detailed information about item behavior to determine the contents of the final test, you should use the “Response data analysis” workflow;
· If you have finished collecting data for the first national
assessment in a planned series of assessments, you should use the “Response data analysis” workflow;
· If you are
assigning new scale scores to a sample of students who have been administered the same test that was
used in a previous national assessment, you
should use the “Response data analysis”
workflow;
· If you have conducted
a national assessment that shares items with a previous assessment and are interested
in comparing the results of the two, you should use the “Response data analysis with linking” or “Linking item data”
workflows;
· If you wish to modify your test and need to know the best items to retain in the new test in order to maintain comparability with the previous test, you should use
the “Selecting optimal test items” workflow;
· If you have already conducted the national assessment and want
to interpret the results in a way that
is consistent with curriculum expectations, rather
than simply comparing students to each other, you should use the “Developing and assigning performance
standards” workflow.
To perform an analysis with IATA, you must select one of these workflows from the main menu. The main menu is reached by clicking
the “Main Menu” button on the bottom
right of the language selection
and registration screen that loads with IATA, shown in Figure 8.2.
Figure 8.2 Initial language selection and optional
registration for IATA
The default language for IATA is English. Registration is optional and is not required in
order to access any of the functionality that is discussed in this book. The IATA main menu is shown in Figure 8.3.
Figure 8.3 The IATA main menu
Each workflow is composed
of a set of tasks which are completed
in order. Most of the workflows
share many of the same tasks. There are 10 different tasks that IATA performs, and each task has its own interface. These tasks generally appear in the following order:
1. Loading data
2. Setting analysis specifications
3. Analyzing test items
4. Scaling test results
5. Analyzing test dimensionality
6. Analyzing differential item
functioning
7. Linking
8. Selecting optimal test items
9. Informing development of performance
standards
10. Saving results.
Not all tasks appear in all workflows. The workflows are designed so that you are only required
to perform tasks that are relevant to your analysis
goals. Table 8.7 summarizes which tasks appear in which workflows.
Table 8.7 Different tasks in IATA and the workflows in which they are used
Workflows: A = Response data analysis; B = Response data analysis with linking; C = Linking item data; D = Selecting optimal test items; E = Developing and assigning performance standards.
Task | A | B | C | D | E
1. Loading data | ● | ● | ● | ● | ●
2. Setting analysis specifications | ● | ● | | |
3. Analyzing test items | ● | ● | | |
4. Analyzing test dimensionality | ● | ● | | |
5. Analyzing differential item functioning | ● | ● | | |
6. Linking | | ● | ● | |
7. Scaling test results | ● | ● | | |
8. Selecting optimal test items | ● | ● | | ● |
9. Informing development of performance standards | ● | ● | | | ●
10. Saving results | ● | ● | ● | ● | ●
The first two workflows (A and B) are very similar in terms of their tasks, because both require the analysis
of response data, which requires
some preliminary analyses to determine whether the use of the statistical measurement models is appropriate for the response data. In contrast, the last three workflows (C, D, and E) only analyse item data. All workflows require
that data be loaded into IATA and allow you to save results.
8.9. Navigating through IATA workflows
When you select a workflow
from the IATA main menu, you will be directed
into the set of tasks for that workflow. Each task has its own interface that allows you to specify
how IATA should perform
the task and, if applicable, view the results produced after IATA has performed
the task.
At the top of each task interface, there are some elements that are common for all tasks. These elements are the instructions box and the navigation
buttons, shown in Figure
8.4. The instructions box, on the left, provides a brief summary of what specifications are required for each task and how to interpret any results. On the top-right, the buttons labelled
“<<Back” and “Next>>” allow you to review a previous task or move on to the next task by clicking on the respective
button. Note that, although IATA does not prevent you from moving back and forth through the workflow, later tasks in the workflow may not provide meaningful
results unless you have correctly
completed the earlier tasks in the workflow.
Figure 8.4 IATA task
interface instructions and navigation buttons
Regardless of the workflow in which it appears,
the general specifications for each task remain the same. The different task interfaces are
explained in detail in the example
walkthroughs in chapters 9 through 13.
8.10. Summary
In this chapter,
you reviewed the data requirements for item and test analysis and were presented with an overview
of the types of information produced by IATA. You were also introduced to the IATA interface, including
the task interfaces, main menu, and workflow navigation.
In the following
five chapters, you will learn how to use each of the interfaces
by exploring the different workflows.
Chapter 9 begins with the analysis of pilot test data. Chapter 10 continues the scenario with the analysis of data from a complete
administration of a national assessment. Chapter 11 introduces
the analysis of rotated booklet designs and the specification and interpretation of results for partial credit items.
Chapter 12 walks through the requirements and procedures for linking multiple cycles of results from a national assessment. Chapter 13 describes
the partial workflows that only analyse item data and discusses an alternate linking scenario
where existing item parameters are used to anchor the estimation of new item parameters and test scores.
[1] See Chapter 15, page 197, for more details on IRT scaling. Additional IRT scaling options are available in IATA’s advanced functionality; refer to the installation guide on the accompanying
CD.
[2] Use of the
c-parameter to describe items may cause certain functions, such as equating, to
not work properly. For most purposes,
the items are most useful if the value of the c-parameter is equal to 0 or set to 0. The 3-parameter model should only
be used by expert users who are aware of its shortcomings.
Estimation and use of the c-parameter is provided by the advanced
functionality of IATA. Refer to Chapter
15 for more details on the c-parameter. Registration of IATA, which is free, provides access to this advanced
functionality. For registration instructions, see the installation guide on the accompanying CD.
[3] This format
requirement means that commas should
never be used as answer key values.
[4] The Items1 data table produced
by IATA following an analysis of response data will serve as an item bank data file, but it also has several additional statistics. These additional statistics are discussed in the later
sections on analysis of response data and in the theoretical annex.
These statistics describe the behaviour of items in a specific sample
and are useful for informing test analysis and construction but are not required to be maintained in an item bank file that will be used by IATA.