This research aimed to (1) study the relative magnitudes of the variance components of alignment scores between science items and indicators at the lower secondary level; (2) compare the generalizability coefficients of the item-indicator alignment scores under different numbers of raters and evaluation designs; and (3) determine the number of raters needed for a reliable analysis of the item-indicator alignment. The research subjects were 1,089 science classroom test items used at the lower secondary level in schools under the Office of the Basic Education Commission in Bangkok Metropolis, and 20 expert panelists who evaluated the alignment. The research instrument was an alignment assessment rating scale. Inter-rater reliability was examined: Fleiss' kappa was 0.510, and the intra-class correlation was 0.954 (95% confidence interval). The G-coefficients were analyzed and compared using generalizability theory. The findings revealed that (1) the largest variance component was the item-indicator alignment scores, and the smallest was the raters; (2) the generalizability coefficient increased as the number of raters increased, for both the cognitive-demand evaluation and the item-indicator alignment-level evaluation; and (3) the cognitive-demand evaluation required more raters than the item-indicator alignment-level evaluation.
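The relationship reported in finding (2) can be illustrated with a minimal sketch of the relative G-coefficient for a one-facet persons-by-raters design from generalizability theory. The variance-component values below are hypothetical illustrative numbers, not the study's estimates:

```python
# Sketch: relative G-coefficient for a one-facet (object x raters) design,
# Erho^2 = var_object / (var_object + var_residual / n_raters).
# The variance components here are hypothetical, for illustration only.

def g_coefficient(var_object: float, var_residual: float, n_raters: int) -> float:
    """Relative generalizability coefficient as a function of the number of raters."""
    return var_object / (var_object + var_residual / n_raters)

# Hypothetical variance components: object-of-measurement (items) and
# item-x-rater interaction / residual.
var_items = 0.60
var_resid = 0.25

for n in (1, 2, 5, 10, 20):
    print(f"raters = {n:2d}  G = {g_coefficient(var_items, var_resid, n):.3f}")
```

Because the residual variance is divided by the number of raters, the coefficient rises monotonically as raters are added, which is the pattern the study observed for both the cognitive-demand and alignment-level evaluations.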