Instrument Validity

Validity (a concept map shows the various types of validity)
An instrument is valid only to the extent that its scores permit appropriate inferences to be made about
1) a specific group of people for
2) specific purposes.

An instrument that is a valid measure of third graders’ math skills is probably not a valid measure of high school calculus students’ math skills. An instrument that is a valid predictor of how well students might do in school may not be a valid measure of how well they will do once they complete school. So we never say that an instrument is valid or not valid; we say it is valid for a specific purpose with a specific group of people. Validity is specific to the appropriateness of the interpretations we wish to make with the scores.

In the reliability section, we discussed a scale that consistently reported a weight of 15 pounds for someone. While it may be a reliable instrument, it is not a valid instrument to determine someone’s weight in pounds. Similarly, a measuring tape is a valid instrument to determine people’s height, but it is not a valid instrument to determine their weight.

There are three general categories of instrument validity.
Content-Related Evidence (also known as Face Validity)
Specialists in the content measured by the instrument are asked to judge the appropriateness of the items on the instrument. Do they cover the breadth of the content area (does the instrument contain a representative sample of the content being assessed)? Are they in a format that is appropriate for those using the instrument? A test that is intended to measure the quality of science instruction in fifth grade should cover material covered in the fifth grade science course in a manner appropriate for fifth graders. A national science test might not be a valid measure of local science instruction, although it might be a valid measure of national science standards.

Criterion-Related Evidence
Criterion-related evidence is collected by comparing the instrument with some future or current criterion, thus the name criterion-related. The purpose of an instrument dictates whether predictive or concurrent validity is warranted.

– Predictive Validity
If an instrument is purported to measure some future performance, predictive validity should be investigated. A comparison must be made between the instrument and some later behavior that it predicts. Suppose a screening test for 5-year-olds is purported to predict success in kindergarten. To investigate predictive validity, one would give the screening instrument to 5-year-olds prior to their entry into kindergarten. The children’s kindergarten performance would be assessed at the end of kindergarten, and a correlation would be calculated between the screening instrument scores and the kindergarten performance scores.
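
The correlation described above can be sketched in a few lines of Python. This is only an illustration: the scores are invented, the function name pearson_r is our own, and the Pearson coefficient is assumed as the correlation of interest (other coefficients may suit other kinds of scores).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data (not from any real study): screening scores collected
# before kindergarten entry, and end-of-year performance scores for the
# same eight children, listed in the same order.
screening = [12, 15, 9, 20, 17, 11, 14, 18]
kindergarten = [55, 62, 48, 80, 70, 50, 60, 75]

validity_coefficient = pearson_r(screening, kindergarten)
print(f"predictive validity coefficient r = {validity_coefficient:.2f}")
```

A coefficient near 1 would support the claim that the screening instrument predicts later kindergarten performance for this group; a coefficient near 0 would not.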

– Concurrent Validity
Concurrent validity compares scores on an instrument with current performance on some other measure. Unlike predictive validity, where the second measurement occurs later, concurrent validity requires a second measure at about the same time. Concurrent validity for a science test could be investigated by correlating scores for the test with scores from another established science test taken about the same time. Another way is to administer the instrument to two groups who are known to differ on the trait being measured by the instrument. One would have support for concurrent validity if the scores for the two groups were very different. An instrument that measures altruism should be able to discriminate those who possess it (nuns) from those who don’t (homicidal maniacs). One would expect the nuns to score significantly higher on the instrument.
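
The two-group comparison can be sketched as follows. The scores are invented, and the choice of Cohen's d (a standardized mean difference) is our assumption for summarizing how far apart the groups are; in practice a significance test such as a t-test would usually be reported as well.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical altruism scores for two groups assumed to differ on the trait.
high_altruism_group = [42, 45, 39, 47, 44, 41]
low_altruism_group = [18, 22, 15, 20, 25, 17]

d = cohens_d(high_altruism_group, low_altruism_group)
print(f"mean difference = {mean(high_altruism_group) - mean(low_altruism_group):.1f}")
print(f"Cohen's d = {d:.2f}")
```

A large positive d means the group expected to possess the trait scored well above the other group, which is the pattern that would support the instrument.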

Construct-Related Evidence
Construct validity is an ongoing process. Please refer to pages 174-176 for more information. Construct validity will not be on the test.

– Discriminant Validity
An instrument does not correlate significantly with variables from which it should differ.

– Convergent Validity
An instrument correlates highly with other variables with which it should theoretically correlate.

Del Siegle, Ph.D.
Neag School of Education – University of Connecticut