What is RELIABILITY?
Reliability is such a cherished commodity. We all want our cars, washing machines, and spouses to be reliable. The term reliability, indeed, reeks of solid goodness. It conjures up visions of a mother’s love.
In education, it is very important for assessments to be reliable. In matters related to measurement, however, reliability has a very restricted meaning. Whenever we encounter the term reliability in any assessment content, we should draw a mental equal sign between reliability and consistency, because reliability refers to the consistency with which a test measures whatever it’s measuring. Reliability = Consistency
The Standards for Educational and Psychological Testing, a joint publication of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, puts it this way: “Reliability refers to the degree to which test scores are free from errors of measurement.” In other words, the fewer the errors of measurement, the more consistently examinees’ scores will accurately reflect examinees’ actual status.
There are three varieties of consistency in educational assessment:
1. Stability – Consistency of results across different testing occasions; similar results even if the test is administered on different occasions. (e.g., suppose Maria gave her students a mid-term exam on Tuesday, but later that afternoon all the test papers were snatched. The next day, after describing to her students how the test papers were purloined by a snatcher, she asks her students to retake the mid-term exam. Because there have been no intervening events of significance, such as more instruction from Maria on the topics covered by the examination, Maria would expect her students’ Wednesday retake scores to be fairly similar to their Tuesday scores.) And that’s what the stability conception of test reliability refers to: consistency over time. If the Wednesday scores are not comparable to the Tuesday scores, then Maria’s mid-term exam would be judged to have no stability.
2. Alternate form – Consistency of results between two or more different forms of a test. It deals with the question of whether two or more allegedly equivalent test forms are, in fact, equivalent.
3. Internal Consistency – Consistency in the way an assessment instrument’s items function. This is a really different creature from stability and alternate-form reliability. Internal consistency does not focus on the consistency of examinees’ scores on the test. Rather, it deals with the extent to which the items in the educational assessment instrument are functioning in a consistent manner.
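All three varieties above are typically quantified with correlation-based indices. Here is a minimal sketch in plain Python, using entirely hypothetical scores: stability and alternate-form reliability as Pearson correlations between two sets of test scores, and internal consistency as Cronbach’s alpha computed over item-level scores.

```python
# Hypothetical sketch of the three reliability varieties (all scores invented).

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 1. Stability (test-retest): same test, two occasions.
tuesday = [78, 85, 62, 90, 71]     # hypothetical Tuesday mid-term scores
wednesday = [80, 83, 65, 88, 70]   # hypothetical Wednesday retake scores
print("stability:", round(pearson(tuesday, wednesday), 2))

# 2. Alternate form: two supposedly equivalent forms of the test.
form_a = [78, 85, 62, 90, 71]
form_b = [75, 88, 60, 92, 74]
print("alternate-form:", round(pearson(form_a, form_b), 2))

# 3. Internal consistency: Cronbach's alpha over item scores
# (rows = examinees, columns = items; 1 = correct, 0 = incorrect).
items = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = len(items[0])                                            # number of items
item_vars = [variance([row[j] for row in items]) for j in range(k)]
total_var = variance([sum(row) for row in items])            # variance of total scores
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print("internal consistency (alpha):", round(alpha, 2))
```

A high correlation between the Tuesday and Wednesday columns is what Maria would hope to see; a low alpha would suggest the items are not functioning in a consistent manner.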
What is Validity?
Teachers use the results of assessments to make decisions about students. To illustrate: if a mathematics teacher discovers early in the school year that her students know much more about mathematics than she had previously suspected, the teacher is apt to decide that the class will tackle more advanced topics than originally planned. Appropriate educational decisions depend on the accuracy of educational assessment; accurate assessments improve the quality of decisions, whereas inaccurate assessments do the opposite. That’s what validity is about. Validity = Accuracy.
Three varieties of Validity Evidence
Well, this should come as no surprise: just as there are three kinds of reliability evidence that can help teachers decide how consistently a test is measuring what it’s measuring, there are also three varieties of validity evidence that psychometricians provide to help educators determine whether their score-based inferences are valid. No single variety of evidence, all by itself, assures educators that a score-based inference is truly accurate. Here is a brief description of the three kinds of evidence that help determine whether the inference one makes from an educational assessment procedure is valid.
1. Content Related – The extent to which an assessment procedure adequately represents the content of the assessment domain being sampled.
2. Criterion Related – The degree to which performance on an assessment procedure accurately predicts the examinee’s performance on an external criterion.
3. Construct Related – The extent to which empirical evidence confirms that an inferred construct exists and that a given assessment procedure is measuring the inferred construct accurately.
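Criterion-related evidence, in particular, is often summarized as a single validity coefficient: the correlation between examinees’ scores on the assessment and their scores on the external criterion. Here is a minimal sketch with invented numbers; the predictor test scores and GPA values are purely hypothetical.

```python
# Hypothetical sketch: an aptitude test (predictor) correlated with
# first-semester GPA (the external criterion). All numbers are invented.

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

aptitude_scores = [520, 610, 450, 700, 580]  # hypothetical predictor test
first_sem_gpa = [2.8, 3.2, 2.5, 3.8, 3.0]    # hypothetical external criterion

validity_coefficient = pearson(aptitude_scores, first_sem_gpa)
print("criterion-related validity coefficient:", round(validity_coefficient, 2))
```

The closer this coefficient is to 1, the more accurately performance on the assessment predicts performance on the criterion.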
Relationship of Reliability and Validity
According to the Research Methods Knowledge Base (Reliability & Validity, where I got the images below: http://www.socialresearchmethods.net/kb/relandval.php), reliability and validity are not separate ideas; instead, they are related to each other.
Here, according to my resources, we will think of the center of the target as the concept we are trying to measure. Imagine one shot at the target for each person. If we measure the concept perfectly for a person, we hit the center of the target. If we don’t, we miss the center. The more we are off for that person, the further we are from the center.
Reliable, Not Valid – you are hitting the target consistently, but are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (it’s consistent but wrong).
I think the best example here is the analogy of Robenille “Rain” Malit, in her post “Well-aligned to the wrong objective” from the Open Forum on Alignment.
“If I use the hunting birds situation as an analogy, it would be like the teacher telling you that the goal is to shoot as much birds as you can, give and show you instructions on how use your bow and arrow well, does practice sessions with you on moving targets. You are able to shoot a lot of birds. You tie them up, bring them home and excitedly show your Ma, “Look at all the birds I killed.” Then she retorts, “Oh dear, why did you bring all those dead birds for home? They’ll just ROT here. Go bury them outside.”
Valid, Not Reliable – hits that are randomly spread across the target. You seldom hit the center of the target but, on average, you are getting the right answer for the group (but not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. You can clearly see that reliability is directly related to the variability of your measure.
Can I put it like this? Out of 50 students, only one got the correct answer and the rest were wrong. In this case, the feedback was that only one student could understand your lesson and the rest couldn’t.
Neither Reliable Nor Valid – hits are spread across the target and you are consistently missing the center.
The assessment tool was wrong, because no one got a correct answer.
Both Reliable and Valid – consistently hit the center of the target. Your measure is both reliable and valid.
The assessment tool is perfectly aligned (learning objectives, assessment, and instructional strategies are well-aligned).
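The four target pictures can also be mimicked numerically. In this hypothetical sketch, each scenario is a list of repeated “shots” at a true value of 100: the distance of the mean from the true value stands in for (in)validity, and the spread of the shots stands in for (un)reliability.

```python
# Hypothetical sketch of the target analogy. The true value being
# measured is 100; each scenario holds five invented "shots" at it.

TRUE = 100

scenarios = {
    "reliable, not valid": [80, 81, 79, 80, 80],     # tight cluster, off-center
    "valid, not reliable": [85, 115, 95, 105, 100],  # centered on average, scattered
    "neither":             [70, 120, 60, 130, 90],   # scattered and off-center
    "reliable and valid":  [99, 101, 100, 100, 100], # tight and centered
}

def bias(shots):
    """Distance of the mean shot from the true value (lack of validity)."""
    return abs(sum(shots) / len(shots) - TRUE)

def spread(shots):
    """Standard deviation of the shots (lack of reliability)."""
    m = sum(shots) / len(shots)
    return (sum((s - m) ** 2 for s in shots) / len(shots)) ** 0.5

for name, shots in scenarios.items():
    print(f"{name}: bias={bias(shots):.1f}, spread={spread(shots):.1f}")
```

The “reliable, not valid” row shows a small spread but a large bias (consistent but wrong), while the “reliable and valid” row shows both numbers near zero.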
My own understanding and example: