21 October 2025

Knowledge item

Quality of assessment

High-quality assessment is essential for fair and effective education. In summative assessment, it underpins fair decisions about student selection and about course and programme completion. Poor-quality assessments can lead to incorrect and potentially harmful conclusions. In formative assessment, assessment quality is important for supporting student learning: providing accurate, clear, and relevant assessment information allows students to track their progress and adapt their learning strategies, and helps lecturers tailor their teaching accordingly. Examples include diagnostic quizzes, practice tasks, or feedback moments.

Based on Van de Veen (2016) and Van Berkel et al. (2023), we distinguish five key quality criteria for assessments:

  • Validity
  • Reliability
  • Effectiveness
  • Transparency
  • Feasibility

Validity

A key aspect of validity is the extent to which the assessment measures what it is intended to measure. Constructive alignment (Biggs, 1996) plays an important role here: the assessment should align with the learning objectives of a course as well as with the teaching activities. Validity also refers to the accuracy, meaning, and usefulness of the conclusions drawn from test scores. In summative assessment, scores guide decisions, so low validity can lead to conclusions based on distorted or irrelevant information.

Examples of aligned and unaligned assessments

Aligned assessment

If one of your learning objectives states that students must be able to formulate a research question, it would be logical to assess this through an assignment in which students are required to formulate their own research question, rather than through a multiple-choice question asking about the characteristics of good research questions.

Unaligned assessment

If one of your learning objectives states that students can briefly explain the water cycle, then a paper would not be the best assessment method; an exam with open-ended questions would be a better fit.

Strategies for upholding validity

  • The learning objectives of your course are a translation of one or more final qualifications of the study programme. Be aware of which final qualifications are being assessed in your course. Design an assessment matrix at the course level to match each assessment method with the intended learning objectives and the corresponding level of Bloom’s taxonomy. This helps to ensure all learning objectives are covered and that appropriate assessment types are used.
  • Use a specification table to map topics and question levels in an exam with closed-ended and open-ended questions. For assignments, a similar blueprint—an assignment specification table—can be used.
  • Ask colleagues to review your assessment matrix (at course level) or specification table (at assessment level) and provide feedback on strengths and areas for improvement.

Reliability

Reliability refers to the extent to which the assessment score can be trusted as a measurement. Theoretically, if a student were to take the same assessment a second time (without test effects), they should achieve the same score. Reliability is a prerequisite for a valid instrument: even if your assessment measures what it is supposed to measure, unreliable scores make the resulting information unusable. Objectivity also relates to reliability: the outcome should not depend on who grades the assessment, but on the learning objectives, the assessment itself, and the evaluation criteria.

Examples of (un)reliable assessments

Reliable assessment

For a comprehensive course, the student is assessed through three assessment methods: a portfolio, a closed-question exam, and a presentation. Rubrics are used to assess both the portfolio and the presentation. The portfolio is reviewed by a second examiner. Peer feedback is used during the presentation. The use of multiple forms of assessment and multiple assessors enhances the reliability of the evaluation.

Unreliable assessment

A student gives a presentation on research into the significance of Socrates for Greek society and receives a score of 5. In another course, the same student presents research on Hegel and receives a 9. This discrepancy is striking: one would expect the scores for comparable assessments to be closer.

Strategies for upholding reliability

  • Assess learning objectives using varied assessment types across the programme and at different levels of Bloom’s taxonomy to base final judgments on multiple data points and improve reliability.
  • Ask colleagues to review your exam questions (the four-eyes principle, applied before the test is administered). This helps catch unclear wording, errors, and misalignment with learning objectives or the intended level of mastery.
  • Organize assessor training and calibration sessions before grading. Align expectations, discuss sample work, and ensure consistent application of criteria across all assessors.
  • Grade anonymously and per exam question where possible to minimise bias, increase consistency, and focus on one criterion at a time.
  • Use rubrics to reduce subjective interpretation during grading. Rubrics promote consistency across assessors and over time, especially for open-ended tasks.
  • Involve a second assessor for larger or high-stakes assessments. If assessors disagree, appoint a third assessor to review and help reach a fair and consistent outcome.
  • Conduct test and item analyses (e.g., Cronbach’s alpha) for exams with multiple-choice and/or short open questions to evaluate test and item reliability; a minimal calculation sketch follows this list. Tools like Remindo can automate this process for efficiency and accuracy.
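
To illustrate that last point, the sketch below shows how Cronbach’s alpha can be computed from a matrix of item scores. It is a minimal Python example (using NumPy) with invented numbers; the helper name cronbach_alpha and the score matrix are purely illustrative assumptions, not the procedure built into Remindo or any other specific tool.

    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        """Cronbach's alpha for a (students x items) matrix of item scores."""
        scores = np.asarray(scores, dtype=float)
        n_items = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)      # variance of each item
        total_variance = scores.sum(axis=1).var(ddof=1)  # variance of students' total scores
        return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical exam: 5 students, 4 items scored 0-3
    scores = np.array([
        [2, 3, 1, 2],
        [1, 1, 2, 1],
        [3, 2, 3, 3],
        [0, 1, 1, 1],
        [2, 2, 3, 2],
    ])
    print(f"alpha = {cronbach_alpha(scores):.2f}")  # prints alpha = 0.85 for this toy data

As a rule of thumb, values around 0.7 or higher are often considered acceptable, but the appropriate threshold depends on the stakes of the decision based on the exam.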

Effectiveness

Students tend to tailor their learning strategies to the type and perceived importance of assessments, which makes assessment a powerful driver of learning (Van der Vleuten et al., 2012). When assessments focus mainly on factual recall, students are more likely to adopt rote learning strategies. In contrast, assessments that require critical thinking or application encourage deeper learning. If you aim for students to engage in deep learning, your assessment methods must align accordingly. Constructive alignment (Biggs, 1996) supports this by ensuring that learning objectives, both in content and in level of mastery, are in line with teaching activities and assessment.

Examples of (in)effective assessments

Effective assessment

During the first meeting of a new course, students are informed about how the course will be assessed: they will write a paper and take an exam with open-ended questions about cases. As a result, students adopt learning strategies that lead to deep learning.

Ineffective assessment

In another course, students are told during the first meeting how they will be assessed: at the end of the course there is a written test with questions at Bloom’s levels of ‘remembering’ and ‘understanding’. In this case, students adopt learning strategies that lead to surface learning.

Strategies for upholding effectiveness

  • Include assessment tasks that target Bloom’s higher levels—such as applying, analysing, evaluating, and creating—to encourage students to go beyond memorization.
  • Communicate the assessment methods and expectations at the start of the course. When students understand the method and purpose of assessments, they are more likely to engage meaningfully.
  • Design assessments that resemble real-life or professional situations (authentic assessment). Such tasks promote engagement, deeper understanding, and meaningful application of knowledge.
  • Understanding how students approach their studies helps you assess whether your assessment design fosters deep learning or unintentionally leads to surface learning, allowing you to make informed adjustments.

Transparency

Students should be informed in advance about the format of the assessment and, in the case of assignments, the criteria on which they will be evaluated. This allows them to prepare as effectively as possible to demonstrate how well they have achieved the learning objectives. After the assessment, it is important that students understand the criteria on which they were assessed and how the final grade and overall judgment were determined. This is crucial for supporting learning and maintaining fairness.

Examples of transparency before an assessment

 High transparency

In a course, students are assessed by a test with open-ended questions and a paper. During the first teaching session, they receive information about the open-ended test and how to prepare for it. They are also informed about the requirements for the paper by means of rubrics. This information is also available in the course syllabus.

Low transparency

Imagine students are given an open-ended test on German grammar without any prior information about the test: they do not know how many questions will be asked or which topics the test will cover (e.g., verb conjugation, correct use of cases).

Examples of transparency after an assessment

High transparency

After submitting an argumentative essay in a sociology course, students receive their grades. The examiner provides a detailed rubric that was shared with students beforehand. The returned feedback includes: (1) the score for each criterion (e.g., argument clarity, use of evidence, structure, referencing), (2) specific comments tied to those criteria, and (3) a paragraph explaining the rationale behind the final grade. Students can clearly see how their work was evaluated and how the final grade was constructed. This transparency promotes trust, helps students understand how to improve, and aligns with principles of formative feedback and assessment literacy.

Low transparency

After submitting an argumentative essay in a sociology course, students receive a single overall grade (e.g., 6.5/10) with a vague comment like “Good effort, but needs improvement.” There is no reference to the specific criteria used in the assessment, nor is it clear how the final grade was determined. Students are left guessing about what was done well or poorly. This lack of clarity also prevents meaningful learning from the assessment.

Strategies for upholding transparency

  • Clearly explain the assessment in the study guide, including assessment method, duration, deadlines, test date, resit options, and criteria. Indicate whether spelling or language use will be assessed and how.
  • Avoid vague descriptions like “multiple choice” or “open questions.” Specify the type, focus, and level of the questions to support effective preparation.
  • Indicate the weight of each question during the test by showing point values and recommended time per task.
  • Provide an answer key or completed assessment form to show how grades are calculated (a small worked example follows this list).
  • Discuss the assessment criteria in class and use examples to clarify expectations.
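
As a small worked example of the two preceding points, the sketch below shows how published point values (weights) and an answer key lead to a final grade. The helper name points_to_grade, the point values, and the linear 1–10 conversion are illustrative assumptions; actual conversion rules, for example with a pass mark or a guessing correction, differ per course and institution.

    def points_to_grade(points: float, max_points: float) -> float:
        """Linear conversion from exam points to a grade on a 1-10 scale (illustrative only)."""
        return round(1 + 9 * points / max_points, 1)

    # Hypothetical exam with three questions and their point values (weights)
    max_per_question = {"Q1": 10, "Q2": 15, "Q3": 25}
    # Points one student earned per question, graded against the answer key
    earned = {"Q1": 8, "Q2": 10, "Q3": 18}

    total_earned = sum(earned.values())         # 36 points
    total_max = sum(max_per_question.values())  # 50 points
    print(points_to_grade(total_earned, total_max))  # 1 + 9 * (36/50) = 7.5

Sharing this kind of calculation with students makes it transparent how each question contributes to the final grade.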

Feasibility

Finally, alongside the other quality criteria, consider how feasible the assessment is in terms of time, budget, and available resources. Practical constraints may require you to make informed choices and balance the ideal design with what is realistically achievable.

Examples of feasible and less feasible assessments

 Highly feasible

In a second-year language course with 60 students, each student writes a 1,000-word case analysis. The assessment includes a strict word limit, clear structure, detailed rubric, and online submission with plagiarism checking. This setup balances educational value with feasibility: the word limit keeps grading manageable, the structure and rubric streamline evaluation, and digital tools reduce administrative effort—enabling efficient grading, whilst upholding other quality criteria.

 Barely feasible

In a second-year language course with 80 students, each student gives a 20-minute presentation followed by a 10-minute Q&A. The absence of a detailed rubric makes assessment inconsistent and more subjective, while scheduling so many presentations consumes several weeks of class time. Instructors are also expected to provide extensive individual feedback, which adds to the workload. Although the task promotes oral skills, the heavy time demands and lack of standardization make this form of assessment barely feasible.

Strategies for upholding feasibility

  • Find out how many hours are allocated to teaching the course, including time for assessment. Take into account how many hours teachers can realistically spend on grading, and design the assessment strategy to fit within those limits (see the sketch after this list).
  • Set clear word or time limits to manage grading workload. Shorter exams or assignments help keep grading time realistic, especially in larger courses, and encourage students to be concise and focused.
  • Develop and apply detailed rubrics for consistent grading. Rubrics clarify expectations, reduce subjective interpretation, and speed up the grading process.
  • Use digital tools for submission, grading, and plagiarism checks. Online platforms can automate administrative tasks and help maintain academic integrity with less manual effort.
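
As a rough illustration of the first point above, the sketch below estimates grading workload and compares it with the hours available. All figures (class size, grading time per submission, budgeted hours) are invented assumptions, included purely to show the arithmetic.

    # Hypothetical course; all figures are illustrative assumptions
    students = 60
    minutes_per_submission = 20  # estimated grading time per 1,000-word essay
    available_hours = 25         # hours budgeted for grading in this course

    grading_hours = students * minutes_per_submission / 60
    within_budget = "within" if grading_hours <= available_hours else "over"
    print(f"Estimated grading time: {grading_hours:.0f} hours ({within_budget} the {available_hours}-hour budget)")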


References

  • Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32(3), 347–364. https://doi.org/10.1007/BF00138871
  • Van Berkel, H., Bax, L., Joosten-ten Brinke, D., Beekman, K., & Van Schilt-Mol, T. (2023). Toetsing in het hoger onderwijs. Boom.
  • Van der Vleuten, C. P. M., Schuwirth, L. W. T., Driessen, E. W., Dijkstra, J., Tigelaar, D., Baartman, L. K. J., & Van Tartwijk, J. (2012). A model for programmatic assessment fit for purpose. Medical Teacher, 34(3), 205–214. https://doi.org/10.3109/0142159X.2012.652239
  • Van de Veen, E. (2016). Hoe maak ik een toetsopdracht? / How to assess students through assignments. Leuven: Communicatiereeks.

 

You are free to share and adapt this material, provided you give appropriate credit and use it non-commercially. More on Creative Commons
