Hello Readers!
Welcome to my blog,
This task was assigned by Dr. Dilip Barad Sir, Head of the Department of English, M.K. Bhavnagar University.
Questions:-
Write on the validity and reliability of a test
Write on the practicality of a test
What do you understand by backwash?
Difference between assessment and evaluation
How do you define a good assessment?
1) Write on the validity and reliability of a test
Reliability
Reliability is the extent to which measurements are repeatable: when different persons perform the measurements, on different occasions, under different conditions, or with supposedly equivalent instruments that measure the same thing. In short, reliability is the consistency of measurement, or the stability of measurement across a variety of conditions under which essentially the same results should be obtained.
Methods of testing reliability
Because reliability is consistency of measurement over time, or stability of measurement over a variety of conditions, the most commonly used technique to estimate it is a measure of association: the correlation coefficient, often termed the reliability coefficient in this context. The reliability coefficient is the correlation between two or more variables (here, tests, items, or raters) that measure the same thing.
1. Test-retest reliability
Test-retest reliability refers to the temporal stability of a test from one measurement session to another. The procedure is to administer the test to a group of respondents and then administer the same test to the same respondents at a later date. The correlation between scores on the identical tests given at different times operationally defines its test-retest reliability.
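As an illustration, here is a minimal Python sketch of estimating test-retest reliability as a Pearson correlation. The scores are made-up values for two administrations of the same test to the same five students, not data from any real study.

```python
# A minimal sketch: test-retest reliability as the Pearson correlation
# between scores from two administrations of the same test.

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores: the same test given two weeks apart.
time_1 = [72, 85, 60, 90, 78]
time_2 = [70, 88, 58, 93, 75]

print(f"Test-retest reliability: {pearson_r(time_1, time_2):.2f}")
```

A coefficient near 1.0 indicates that students' relative standing barely changed between sessions, i.e., the test is temporally stable.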
2. Alternative forms reliability
The alternative-forms technique for estimating reliability is similar to the test-retest method, except that different measures of a behaviour (rather than the same measure) are collected at different times. If the correlation between the alternative forms is low, it could indicate that considerable measurement error is present, because two different scales were used.
3. Split-half method
The split-half approach is another method of testing reliability, which assumes that a number of items are available to measure a behaviour. Half of the items are combined to form one new measure and the other half to form a second new measure, giving two tests of the same behaviour. In contrast to the test-retest and alternative-forms methods, the split-half approach is usually carried out in a single testing session. The correlation between the two half-tests must then be corrected to obtain the reliability coefficient for the whole test.
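The sketch below illustrates this on a hypothetical matrix of 0/1 item marks: the items are split into odd and even halves, the half scores are correlated, and the Spearman-Brown correction r_full = 2r / (1 + r) is applied to estimate the reliability of the full-length test.

```python
# A minimal sketch of the split-half method with the Spearman-Brown
# correction, using hypothetical 0/1 item scores (one row per student).
from statistics import correlation  # Python 3.10+

item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
]

half_a = [sum(row[0::2]) for row in item_scores]  # odd-numbered items
half_b = [sum(row[1::2]) for row in item_scores]  # even-numbered items

r_halves = correlation(half_a, half_b)
r_full = (2 * r_halves) / (1 + r_halves)  # Spearman-Brown correction

print(f"Half-test correlation: {r_halves:.2f}")
print(f"Corrected whole-test reliability: {r_full:.2f}")
```

The correction is needed because each half is only half as long as the real test, and shorter tests are less reliable.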
4. Inter-rater reliability
When raters or judges are used to measure behavior, the reliability of their judgments or combined internal consistency of judgments is assessed.
The correlation between the ratings made by the two judges will tell us the reliability of either judge in the specific situation. The composite reliability of both judges, referred to as effective reliability, is calculated using the Spearman-Brown formula.
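A short sketch of this calculation, assuming two judges and made-up essay ratings: the single-judge reliability is the correlation between the two sets of ratings, and the effective reliability of the composite follows from the Spearman-Brown formula R = n*r / (1 + (n - 1)*r).

```python
# A minimal sketch of inter-rater reliability and effective (composite)
# reliability via the Spearman-Brown formula. Ratings are hypothetical.
from statistics import correlation  # Python 3.10+

judge_1 = [6, 8, 5, 9, 7, 4]
judge_2 = [5, 9, 5, 8, 6, 5]

r = correlation(judge_1, judge_2)          # reliability of a single judge
n = 2                                      # number of judges
effective_r = (n * r) / (1 + (n - 1) * r)  # composite reliability of both

print(f"Single-judge reliability:   {r:.2f}")
print(f"Effective reliability (n=2): {effective_r:.2f}")
```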
5. Internal consistency
Internal consistency concerns the reliability of the test components. It measures consistency within the instrument, asking how well a set of items measures a particular behaviour or characteristic within the test. Estimates of internal consistency are based on the average intercorrelation among all the single items within a test.
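One widely used single-administration estimate of internal consistency is Cronbach's alpha. Below is a minimal sketch with hypothetical item scores; it computes alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores).

```python
# A minimal sketch of internal consistency via Cronbach's alpha,
# using a hypothetical matrix of item scores (one row per student).
from statistics import pvariance

items = [
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [4, 5, 4, 4],
]

k = len(items[0])                                    # number of items
item_vars = [pvariance(col) for col in zip(*items)]  # variance per item
total_var = pvariance([sum(row) for row in items])   # variance of totals

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Alpha rises when the items correlate strongly with one another, which is exactly what "measuring the same characteristic" means operationally.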
How to make a test more reliable?
Writing items clearly
Making test instructions easily understood
Training the raters effectively by making the rules for scoring as explicit as possible.
Identifying candidates by number instead of by name, so that a scorer cannot unconsciously award higher scores to candidates he or she knows
Not allowing the candidate too much freedom in how they may respond
Factors affecting reliability of a test
1. Test length. Generally, the longer a test is, the more reliable it is; the Spearman-Brown prophecy formula, sketched after this list, quantifies this effect.
2. Speed. When a test is a speed test, reliability can be problematic. It is inappropriate to estimate reliability using internal consistency, test-retest, or alternate form methods.
3. Group homogeneity. In general, the more heterogeneous the group of students who take the test, the more reliable the measure will be.
4. Item difficulty. When there is little variability among test scores, the reliability will be low. Thus, reliability will be low if a test is so easy that every student gets most or all of the items correct or so difficult that every student gets most or all of the items wrong.
5. Objectivity. Objectively scored tests, rather than subjectively scored tests, show a higher reliability.
6. Test-retest interval. The shorter the time interval between two administrations of a test, the less likely that changes will occur and the higher the reliability will be.
7. Variation with the testing situation. Errors in the testing situation (e.g., students misunderstanding or misreading test directions, noise, distractions, and sickness) can cause test scores to vary.
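To illustrate factor 1, here is a small sketch of the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor of n. The 0.60 starting reliability and 20-item length are assumptions chosen purely for illustration.

```python
# A minimal sketch of the Spearman-Brown prophecy formula:
# predicted reliability when a test is made n times longer.

def prophecy(r, n):
    """Predicted reliability of a test lengthened by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

r_original = 0.60  # hypothetical reliability of a 20-item test
for n in (1, 2, 3):
    print(f"{20 * n} items -> predicted reliability {prophecy(r_original, n):.2f}")
```

Doubling the (hypothetical) test raises the predicted reliability from 0.60 to 0.75, showing why longer tests are generally more reliable.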
Validity
The term validity refers to whether or not the test measures what it claims to measure. On a test with high validity the items will be closely linked to the test's intended focus. For many certification and licensure tests this means that the items will be highly related to a specific job or occupation. If a test has poor validity then it does not measure the job-related content and competencies it ought to. When this is the case, there is no justification for using the test results for their intended purpose.
Researchers should consider several types of validity, including the following:
Statistical conclusion validity
Statistical conclusion validity pertains to the relationship being tested: it refers to inferences about whether it is reasonable to presume co-variation, given a specified alpha level and the obtained variances. Major threats to statistical conclusion validity include low statistical power, violation of statistical assumptions, low reliability of measures, unreliable treatment implementation, random irrelevancies in the experimental setting, and random heterogeneity of respondents.
Construct validity
Construct validity refers to how well you have translated or transformed a concept, idea, or behaviour (a construct) into a functioning and operating reality: the operationalization.
Translation Validity
Translation validity centers on whether the operationalization reflects the true meaning of the construct. Translation validity attempts to assess the degree to which constructs are accurately translated into the operationalization, using subjective judgment.
Face Validity
Face validity is a subjective judgment on the operationalization of a construct. For instance, one might look at a measure of reading ability, read through the paragraphs, and decide that it seems like a good measure of reading ability. Even though subjective judgment is needed throughout the research process, the aforementioned method of validation is not very convincing to others as a valid judgment. As a result, face validity is often seen as a weak form of validity.
Content validity
Content validity is a qualitative type of validity where the domain of the concept is made clear and the analyst judges whether the measures fully represent the domain. According to Bollen, for most concepts in the social sciences, no consensus exists on theoretical definitions, because the domain of content is ambiguous.
Relationship between validity and reliability of a test
If a test is unreliable, it cannot be valid
For a test to be valid, it must be reliable
However, just because a test is reliable does not mean it would be valid
Reliability is a necessary but not a sufficient condition for validity
2) Write on the practicality of a test
An effective test is practical. This means that it
Is not excessively expensive,
Stays within appropriate time constraints,
Is relatively easy to administer, and
Has a scoring/evaluation procedure that is specific and time-efficient.
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations.
3) What do you understand by backwash?
Literally, backwash is a backward flow or movement (as of water or air) produced by a propelling force, or the fluid that is moving backward. In water treatment, including water purification and sewage treatment, backwashing refers to pumping water backwards through a filter's media, sometimes with intermittent use of compressed air during the process.
In language testing, however, backwash (also called washback) refers to the effect a test has on teaching and learning. Backwash is positive when a test encourages teachers and learners to work on the abilities the course is meant to develop, and negative when preparing for the test narrows teaching to test-taking techniques at the expense of real learning.
4) Difference between assessment and evaluation
Assessment:-
To assure quality and draw out clear strategies for optimising education, we need a summative overview of all the key elements involved. This includes being clear about how we should assess and evaluate teaching-learning and student performance. In this blog, we put the terms 'assessment' and 'evaluation' under the microscope and see how we should assess learning for an efficient and optimised educational workflow.
Assessment is continuous, long-term, and feedback-based. It revolves around clearly defined goals or outcomes and whether the student achieves those goals. Evaluation, on the other hand, is the act of critically interpreting the learner's work and awarding marks/grades for it.
Evaluation:-
Evaluation is judgemental and highly result-oriented. It 'measures' the quality and integrity of the work rather than improving it; the teacher has to use the evaluation data to form a strategy for improving student performance. Evaluation also lets us know the shortfalls of the course plan and where the student failed to perform well, and it is mostly done against pre-set standards.
Under our current circumstances, evaluation is usually done through exams, projects, classroom interactions, and proficiency in skills.
5) How do you define a good assessment?
Several attempts to define good assessment have been made. There is a general agreement that good assessment (especially summative) should be:
Valid: measures what it is supposed to measure, at the appropriate level, in the appropriate domains (constructive alignment).
Fair: is non-discriminatory and matches expectations.
Transparent: processes and documentation, including assessment briefing and marking criteria, are clear.
Reliable: assessment is accurate, consistent and repeatable.
Feasible: assessment is practicable in terms of time, resources and student numbers.
Educational impact: assessment results in learning what is important and is authentic and worthwhile.
Hence the emphasis is on assessing real-life skills through real-life tasks that students will, or could, perform once they leave university. Some examples of how this can be achieved in practical terms can be found under Assessment methods.
Assessment principles
The good assessment principles below were created as part of the REAP (Re-Engineering Assessment Practices) project, which looked into re-evaluating and reforming assessment and feedback practice. This set of principles is referred to here because it serves as the basis for many assessment strategies across UK HE institutions. For each principle, a number of practical strategies are provided, giving a more pragmatic indication of how to put it into practice.
Thank you…...