Quality of assessment
High-quality assessment is essential for fair and effective education. In summative assessment, it underpins fair decisions about student selection and about course and program completion. Poor-quality assessments can lead to incorrect and potentially harmful conclusions. In formative assessment, quality is important for supporting student learning: accurate, clear, and relevant assessment information allows students to track their progress and adapt their learning strategies, and helps lecturers tailor their teaching accordingly. Examples include diagnostic quizzes, practice tasks, and feedback moments.
Based on Van de Veen (2016) and Van Berkel et al. (2023), we distinguish five key quality criteria for assessments:
- Validity
- Reliability
- Transparency
- Effectiveness
- Feasibility
Validity

A key aspect of validity is the extent to which the assessment measures what it is intended to measure. Constructive alignment (Biggs, 1996) plays an important role here: the assessment should align with the learning objectives of a course as well as with the teaching activities. Validity also refers to the accuracy, meaning, and usefulness of the conclusions drawn from test scores. In summative assessment, scores guide decisions, so low validity can lead to conclusions based on distorted or irrelevant information. It is therefore essential that the assessment reflects what it is intended to measure.

Aligned assessment
If one of your learning objectives states that students must be able to formulate a research question, it is logical to assess this through an assignment in which students formulate their own research question, rather than through a multiple-choice question about the characteristics of good research questions.

Unaligned assessment
If one of your learning objectives states that students can briefly explain the water cycle, a paper would be a less suitable assessment method than an exam with open-ended questions.

Reliability

Reliability refers to the extent to which one can trust the assessment score as a measurement. Theoretically, if a student were to take the same assessment a second time (without test effects), they should achieve the same score. Reliability is a prerequisite for a valid instrument: even if your assessment measures what it is supposed to measure, the information it provides is not usable if it is not reliable. Objectivity also relates to reliability: the outcome should not depend on who grades the assessment, but on the learning objectives, the assessment itself, and the evaluation criteria.

Reliable assessment
For a comprehensive course, the student is assessed through three assessment methods: a portfolio, a closed-question exam, and a presentation. Rubrics are used to assess both the portfolio and the presentation. The portfolio is reviewed by a second examiner, and peer feedback is used during the presentation. The use of multiple forms of assessment and multiple assessors enhances the reliability of the evaluation.

Unreliable assessment
A student gives a presentation on research about the implications of Socrates for Greek society and receives a score of 5. In another course, the same student presents research on Hegel and receives a 9. This discrepancy is striking: one would expect the scores for comparable assessments to be closer.
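A common way to put a number on objectivity is to check how closely two examiners agree when they grade the same work. The sketch below is a minimal illustration, not a prescribed procedure: it computes Cohen's kappa, one standard agreement statistic, and the examiner labels, the ten portfolio judgments, and the pass/borderline/fail scale are all invented for the example.

```python
# Minimal sketch of inter-rater agreement via Cohen's kappa.
# All data below is hypothetical.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical judgments."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters judged identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two examiners independently grade the same ten portfolios.
first_examiner  = ["pass", "pass", "fail", "pass", "borderline",
                   "pass", "fail", "pass", "borderline", "pass"]
second_examiner = ["pass", "pass", "fail", "borderline", "borderline",
                   "pass", "fail", "pass", "pass", "pass"]

print(f"kappa = {cohens_kappa(first_examiner, second_examiner):.2f}")
# Prints kappa = 0.64: substantial, but not perfect, agreement.
```

A kappa near 1 indicates strong agreement, while a value near 0 means agreement no better than chance. Shared rubrics and second examiners, as in the reliable example above, are exactly the kind of measures that push agreement up.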
Effectiveness

Students tend to tailor their learning strategies to the type and perceived importance of assessments, which makes assessment a powerful driver of learning (Van der Vleuten et al., 2012). When assessments focus mainly on factual recall, students are more likely to adopt rote learning strategies. In contrast, assessments that require critical thinking or application encourage deeper learning. If you aim for students to engage in deep learning, your assessment methods must align accordingly. Constructive alignment (Biggs, 1996) supports this by ensuring that learning objectives, both in content and level of mastery, are in line with teaching activities and assessment.

Effective assessment
During the first meeting of a new course, students are given information about the assessment of the course: they will write a paper and take an exam with open-ended questions on cases. Students use learning strategies that result in deep learning.

Ineffective assessment
During the first meeting of another course, students are given information about the assessment of the course. At the end of the course, the students are assessed by a written test with questions at Bloom's levels of 'remembering' and 'understanding'. In this case, students use learning strategies that lead to surface learning.

Transparency

Students should be informed in advance about the format of the assessment and, in the case of assignments, the criteria on which they will be evaluated. This allows them to prepare as effectively as possible to demonstrate how well they have achieved the learning objectives. After the assessment, it is important that students understand the criteria on which they were assessed and how the final grade and overall judgment were determined. This is crucial for supporting learning and maintaining fairness.

High transparency (before the assessment)
In a course, students are assessed by a test with open-ended questions and a paper. During the first teaching session, students are given information about the open-ended test and how to prepare for it. They are also informed about the requirements of the paper by means of rubrics, and they can find this information in their syllabus.

Low transparency (before the assessment)
Imagine students are given an open-ended test on German grammar without any information about the test. They should at least know how many questions will be asked and which topics the test will cover (e.g., verb conjugation, correct use of cases).

High transparency (after the assessment)
After submitting an argumentative essay in a sociology course, students receive their grades. The examiner provides a detailed rubric that was shared with students beforehand. The returned feedback includes: (1) the score for each criterion (e.g., argument clarity, use of evidence, structure, referencing), (2) specific comments tied to those criteria, and (3) a paragraph explaining the rationale behind the final grade. Students can clearly see how their work was evaluated and how the final grade was constructed. This transparency promotes trust, helps students understand how to improve, and aligns with principles of formative feedback and assessment literacy.

Low transparency (after the assessment)
After submitting an argumentative essay in a sociology course, students receive a single overall grade (e.g., 6.5/10) with a vague comment like "Good effort, but needs improvement." There is no reference to the specific criteria used in the assessment, nor is it clear how the final grade was determined. Students are left guessing about what they did well or poorly. This lack of clarity also prevents meaningful learning from the assessment.
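Transparency after an assessment is greatest when students can reconstruct their grade from the rubric themselves. The sketch below illustrates this with the sociology essay example: the four criterion names come from the example above, while the weights and the student's scores are invented for illustration.

```python
# Hypothetical rubric for the sociology essay example.
# Criterion names follow the example above; weights and scores are assumed.

rubric_weights = {
    "argument clarity": 0.30,
    "use of evidence":  0.30,
    "structure":        0.25,
    "referencing":      0.15,
}

# One student's per-criterion scores on a 10-point scale.
scores = {
    "argument clarity": 7.0,
    "use of evidence":  6.0,
    "structure":        6.5,
    "referencing":      7.5,
}

# The final grade is the weighted average of the criterion scores;
# returning this breakdown makes the grade's construction transparent.
final_grade = sum(rubric_weights[c] * scores[c] for c in rubric_weights)
for criterion, weight in rubric_weights.items():
    print(f"{criterion:<17} weight {weight:.2f}  score {scores[criterion]:.1f}")
print(f"final grade: {final_grade:.1f}/10")
```

Sharing the weights in advance and returning the per-criterion breakdown with the grade covers points (1) and (3) of the high-transparency example above.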
Feasibility

Finally, alongside all other quality criteria, consider how feasible the assessment is in terms of time, budget, and available resources. Practical constraints may require you to make informed choices and balance ideal design with what is realistically achievable.

Highly feasible
In a second-year language course with 60 students, each student writes a 1,000-word case analysis. The assessment includes a strict word limit, a clear structure, a detailed rubric, and online submission with plagiarism checking. This setup balances educational value with feasibility: the word limit keeps grading manageable, the structure and rubric streamline evaluation, and digital tools reduce administrative effort, enabling efficient grading while upholding the other quality criteria.

Barely feasible
In a second-year language course with 80 students, each student gives a 20-minute presentation followed by a 10-minute Q&A. The absence of a detailed rubric makes the assessment inconsistent and more subjective, while scheduling so many presentations consumes several weeks of class time. Instructors are also expected to provide extensive individual feedback, which adds to the workload. Although the task promotes oral skills, the heavy time demands and lack of standardization make this form of assessment barely feasible.
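A rough back-of-the-envelope calculation often exposes feasibility problems before a course starts. The sketch below compares the two formats above; the five-minute changeover buffer and the 20 minutes of grading per paper are assumptions, not figures from the examples.

```python
# Back-of-the-envelope feasibility check for the two formats above.
# The changeover buffer and per-paper grading time are assumptions.

students = 80
presentation_min = 20 + 10          # talk plus Q&A, per the example
buffer_min = 5                      # assumed changeover time between speakers
presentation_hours = students * (presentation_min + buffer_min) / 60

papers = 60
grading_min_per_paper = 20          # assumed rubric-based grading time
grading_hours = papers * grading_min_per_paper / 60

print(f"presentations: {presentation_hours:.0f} hours of scheduled class time")
print(f"case analyses: {grading_hours:.0f} hours of grading, outside class")
```

Under these assumptions, the presentations claim roughly 47 hours of scheduled class time, more than a full week of teaching, while the written format shifts about 20 hours of work outside class, which is what makes the first format barely feasible.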
References
- Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32(3), 347–364. https://doi.org/10.1007/BF00138871
- Van Berkel, H., Bax, L., Joosten-ten Brinke, D., Beekman, K., & Van Schilt-Mol, T. (2023). Toetsing in het hoger onderwijs [Assessment in higher education]. Boom.
- Van der Vleuten, C. P. M., Schuwirth, L. W. T., Driessen, E. W., Dijkstra, J., Tigelaar, D., Baartman, L. K. J., & Van Tartwijk, J. (2012). A model for programmatic assessment fit for purpose. Medical Teacher, 34(3), 205–214. https://doi.org/10.3109/0142159X.2012.652239
- Van de Veen, E. (2016). Hoe maak ik een toetsopdracht? / How to assess students through assignments. Leuven: Communicatiereeks.