Background: Several different measures have been proposed to solve persistent validity problems, such as high task-sampling variability, in the assessment of students’ expertise in ‘doing science’. Such measures include working with a-priori progression models, using standardised item shells and rating manuals, augmenting the number of tasks per student and comparing different measurement methods.
Purpose: The impact of these measures on instrument validity is examined here with respect to three aspects: structural validity, generalisability and external validity.
Sample: Performance assessments were administered to 418 students (187 girls, ages 12–16) in grades 7, 8 and 9 in the two lowest performance tracks of (lower) secondary school in the Swiss canton of Zurich.
Design and methods: Students worked with printed test sheets on which they were asked to report the outcomes of their investigations. In addition to the written protocols, direct observations and interviews were used as measurement methods. Evidence of the instruments’ validity was reported using different reliability and generalisability coefficients and by comparing our results with those found in the literature.
Results: An a-priori progression model was successfully used to improve the instrument’s structural validity. The use of a standardised item shell and rating manual ensured reliable rating of the written protocols (.79 ≤ p₀ ≤ .98; .56 ≤ κ ≤ .97). Augmenting the number of tasks per student did not solve the challenge of reducing task-sampling variability. The observed performance differed from the performance assessed via the written protocols.
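For orientation, the two inter-rater statistics reported above can be read with their standard (study-independent) definitions: p₀ is the raw proportion of rating pairs on which the raters agree, and Cohen's κ corrects that proportion for the agreement expected by chance, p_e:

\kappa = \frac{p_0 - p_e}{1 - p_e}, \qquad p_e = \sum_k p_{1k}\, p_{2k}

where p_{1k} and p_{2k} are the proportions of cases each rater assigns to category k. On the usual reading, κ values between .56 and .97 indicate moderate to almost perfect agreement beyond chance.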
Conclusions: Students’ performance in doing science can be reliably assessed with instruments that show good generalisability coefficients (ρ² = 0.72 in this case). Even after implementing the different measures, task-sampling variability remains high (σ̂²_pt = 47.2%). More elaborate studies that focus on the substantive aspect of validity must be conducted to understand why students’ expertise as shown in written protocols differs so markedly from their observed performance.
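As a general sketch of how these two figures relate (a standard generalisability-theory formulation for a person × task design, not the authors' exact variance decomposition), the relative generalisability coefficient with n_t tasks per student is

\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{pt}}{n_t}}

where σ²_p is the variance between students and σ²_pt the person-by-task interaction (confounded with residual error in a one-facet design). A large σ²_pt, here estimated at 47.2% of the total variance, depresses ρ² for any single task; under such a design, it is the averaging over several tasks per student that allows a coefficient such as ρ² = 0.72 to be reached despite high task-sampling variability.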