
Behind the Curtain

NDEO's DELTA Exam: A Deeper Look
By Dale W. Schmid
Posted: February 7, 2021

From time to time, NDEO features guest blog posts written by our members about their experiences in the fields of dance and dance education. We continue this series with “NDEO's DELTA Exam: A Deeper Look.” This is the second of a two-part blog series about NDEO’s DELTA Exam and the related OPDI course; it focuses on the DELTA exam itself, for those interested in the details of test development and field testing.


This series is written by two NDEO members who are integral to our OPDI and DELTA programs: Dr. Elizabeth McPherson, who serves as professor for OPDI-M16: Introduction to DELTA – Dance Entry Level Teacher Assessment, and Dr. Dale Schmid, who is NDEO’s DELTA Senior Project Consultant.


If you are interested in learning more about the guest blogger program or submitting an article for consideration, please visit this link.

NDEO's DELTA Exam: A Deeper Look

Dr. Elizabeth McPherson and Dr. Dale Schmid

(drawing on previous writings by Schmid)




DELTA Test Development Methodology / Technical Parameters


The quantitative analysis of DELTA field-test items relied heavily on the tools of Item Response Theory, more specifically on a subclass of the logistic model, the one-parameter logistic (Rasch) model, together with related models from Classical Test Theory. Results from the Rasch analyses of field-test items were used to inform revisions to items that did not differentiate well among examinees. The items comprising the operationalized DELTA test forms were drawn from an item pool for which there is data on item difficulty, thus enabling assignment of items to separate forms with a fair assurance of relative difficulty.
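For readers curious about the mechanics, the one-parameter logistic (Rasch) model expresses the probability of a correct response as a function of the gap between a person's ability and an item's difficulty, both on a logit scale. The sketch below is purely illustrative, with hypothetical values, and is not drawn from the DELTA analyses.

```python
import numpy as np

def rasch_probability(ability, difficulty):
    """One-parameter logistic (Rasch) model: probability of a correct response
    given person ability (theta) and item difficulty (b), both in logits."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

# Hypothetical values: an examinee of average ability (theta = 0.0)
# facing an easy item (b = -1.0) and a hard item (b = +1.5).
print(rasch_probability(0.0, -1.0))  # ~0.73
print(rasch_probability(0.0, 1.5))   # ~0.18
```

Items whose observed responses fit such a model poorly are typically the ones that do not differentiate well among examinees, which is what flagged items for revision.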

More specifically, the objective of field-testing and analysis of the accompanying performance data was to determine the extent to which DELTA is a fair and reliable indicator of pedagogic content knowledge with respect to: 

1. Content validity;

2. Construct validity (including evidence of positive inter-test and inter-item correlations, and the independent identity of skills clusters);

3. Statistical and correlational reliability with regard to item difficulty, item discrimination, and internal consistency;

4. Stratification, including evidence of external/population validity and the presence or absence of measurement bias (e.g., gender bias, racial bias, geographic bias, etc.);

5. Concurrent and predictive validity (considered together as criterion-oriented validation procedures).

It is important to note that DELTA is a criterion-referenced measure of course content. The operationalized form of DELTA is a computer-administered test form consisting of 130 unique selected response items, with and without stimulus materials. The examination takes approximately 90 minutes to administer using the testing platform SAKAI, and is proctored remotely.  

Scale and item analyses in a Classical Test Theory framework were employed to examine the overall test reliability of the (May 2014) exploratory DELTA field test, using Cronbach’s Alpha, Scale Mean, and Scale Variance to determine the comparability of forms. The Corrected Item-Total Correlation was calculated by separating students falling into the top and bottom 27%, based on their total raw score. A Pearson Product-Moment Correlation Coefficient was used to compare the linear relationship (comparability) between the two DELTA test forms, which were parallel in construct, but not necessarily in difficulty or discernment of PCK.
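As a rough illustration of the statistics named above, the sketch below computes Cronbach’s Alpha for a matrix of scored responses and the Pearson correlation between total raw scores on two forms. The arrays and values are hypothetical placeholders, not DELTA data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's Alpha for an (n_examinees x n_items) matrix of 0/1 item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical scored responses (rows = examinees, columns = items).
form_a = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]])
print("Alpha, Form A:", round(cronbach_alpha(form_a), 2))   # 0.8 for this toy data

# Pearson correlation between total raw scores on two parallel forms,
# given hypothetical totals for the same examinees on Form A and Form B.
totals_a = form_a.sum(axis=1)
totals_b = np.array([2, 3, 1, 4, 1])   # hypothetical Form B totals
print("Form A / Form B correlation:",
      round(float(np.corrcoef(totals_a, totals_b)[0, 1]), 2))
```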

Rasch analysis was applied to each of the ten subscales of DELTA (i.e., PCK Skills Clusters), serving in the role of a confirmatory factor analysis to support the construct validity of the scales. In addition to the Rasch analysis, the psychometric analysis of the performance of DELTA field-tested (and retrialed) items included Cronbach’s Alpha, Differential Item Functioning (DIF), scale and item analyses in a Classical Test Theory framework, the Corrected Item-Total Correlation, and the Pearson Product-Moment Correlation Coefficient. Finally, item difficulty was evaluated using two arithmetic means: the percentage of students who answered an item correctly, and the scale mean if an item was deleted from the test battery (indicating how the item contributed to overall test difficulty), along with the scale variance if an item was deleted.

Factor analysis of the entire scale could not be performed because there were too few students per item to provide reliable data. Nonetheless, each subscale [PCK Skills Cluster] was analyzed for the field-test forms using Cronbach’s Alpha. In addition to Cronbach’s Alpha, the following statistical information was calculated as a means of presenting a reliability analysis for the complete set of items, organized by PCK Skills Cluster and sorted by difficulty within each cluster (a computational sketch follows the list below). The information gleaned from this data was used to inform item remediation and the construction of operationalized forms of DELTA:

1. Difficulty: represented by the percentage of students who answered the item correctly;

2. Scale mean if the item was deleted (related to the difficulty of the item, this statistic indicates how the item contributes to overall test difficulty);

3. Scale variance if the item was deleted (items that contribute greater amounts of variance are preferable);

4. Corrected item-total correlation (a measure of discrimination, or how well the assessment distinguishes between higher- and lower-performing students, e.g., the correlation of the item score with the total score minus the item);

5. Cronbach’s alpha if the item was deleted. If the reliability statistic would increase without the item (becoming greater than 0.60 in this case), the item is not contributing effectively to the consistency of the scale.
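The sketch below shows one way these five item-level statistics could be computed for a matrix of dichotomously scored (0/1) responses. The function and variable names are hypothetical; this is not the software used in the DELTA analyses.

```python
import numpy as np

def item_statistics(scores):
    """Per-item statistics for an (n_examinees x n_items) matrix of 0/1 scores:
    difficulty (p-value), scale mean/variance if the item is deleted, corrected
    item-total correlation, and Cronbach's alpha if the item is deleted."""
    scores = np.asarray(scores, dtype=float)

    def alpha(m):
        k = m.shape[1]
        return (k / (k - 1)) * (1 - m.var(axis=0, ddof=1).sum() / m.sum(axis=1).var(ddof=1))

    totals = scores.sum(axis=1)
    stats = []
    for j in range(scores.shape[1]):
        rest = totals - scores[:, j]                 # total score minus the item
        stats.append({
            "difficulty": scores[:, j].mean(),       # proportion answering correctly
            "scale_mean_if_deleted": rest.mean(),
            "scale_variance_if_deleted": rest.var(ddof=1),
            "corrected_item_total_r": float(np.corrcoef(scores[:, j], rest)[0, 1]),
            "alpha_if_deleted": alpha(np.delete(scores, j, axis=1)),
        })
    return stats
```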

Field Test Analysis Findings  


Based on the analysis of field test data, there is solid evidence to indicate that DELTA is a coherent, valid, and reliable instrument for measuring PCK in dance as a demonstration of subject-matter competency. This holds particularly true with respect to DELTA’s content validity (arrived at through consensus of a national expert group of K-16 dance educators from thirteen states); its construct validity (including evidence of positive inter-test and inter-item correlations, and the independent identity of skills clusters); its statistical and correlational reliability with regard to item difficulty, item discrimination, and internal consistency; and the absence of measurement bias (e.g., gender bias, racial bias, geographic bias, etc.).

The target threshold for reliability is a minimum of 0.70 for exploratory field tests. Using scale and item analyses in a Classical Test Theory framework, the full-scale statistics for the DELTA field test revealed that the overall reliability, as indicated by Cronbach’s Alpha, was 0.84, well within the acceptable range for tests at the development stage. This reliability indicated that, overall, the items work together consistently to measure student preparation for teaching dance. Items with negative item-total correlations, indicating that a student who answers the item correctly is likely to have a lower total score (and vice versa), were relatively rare; such negative statistics are a sign that an item discriminates poorly between higher- and lower-performing students. The exam thus appears to successfully distinguish between higher- and lower-performing students. All items falling within this realm were flagged for remediation, rewritten, and retested in subsequent field trials.

With respect to the measurement of inherent test bias, a small number of items showed Differential Item Functioning (DIF) in relation to race when field-tested. For six of the seven such items, nonwhite students were more likely to answer correctly than White students. After the revision and retrial of these items, there were no signs of bias based on this analysis. Similarly, during field-testing, four items showed DIF within the category of postsecondary education level: for three of these items, undergraduates at the same overall ability level answered correctly more often than graduate students, while a positive coefficient for one field-test item indicated that graduate students were more likely to answer correctly than undergraduate students at the same ability level within their respective populations.

Unlike the items with DIF for race, the items with DIF in relation to postsecondary education level were not revised. For the three items on which undergraduates were more likely to score correctly, the conjecture was that the difference could be accounted for by the recency of instruction on that discrete topic and was not an unexpected result. For the instance in which graduate students were more likely to score the item correctly than undergraduate students, the conjecture was that the question required a level of nuanced understanding that would be atypical of expectations for undergraduates. As subsequent administrations of DELTA have taken place, these items have been monitored to ascertain whether these conjectures are borne out in larger sample sizes and over time.
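The exact DIF procedure used for DELTA is not detailed here, but the mention of coefficients is consistent with a regression-based approach. The sketch below shows one common, generic method (logistic-regression DIF) using statsmodels; the function name and data layout are assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item_correct, total_score, group):
    """Generic logistic-regression DIF check for a single item.
    item_correct: 0/1 responses to the item; total_score: ability proxy
    (e.g., total raw score); group: 0/1 indicator (e.g., undergraduate = 0,
    graduate = 1). A significant group coefficient after controlling for
    ability suggests differential item functioning."""
    X = sm.add_constant(np.column_stack([total_score, group]))
    fit = sm.Logit(np.asarray(item_correct), X).fit(disp=0)
    return fit.params[2], fit.pvalues[2]  # group coefficient and its p-value
```

In this framing, a positive group coefficient means the group coded 1 is more likely to answer the item correctly at the same ability level, mirroring the interpretation given above.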

A limitation of DELTA, as would be the case for any national entry-level dance teacher competency examination, is that there are relatively few accredited dance teacher preparation programs throughout the country. Hence, there is a small population of dance teacher preparation program completers in any given year. The small sample size creates an issue of reliability (consistency of results across the target population, over time) that will need to be addressed longitudinally.

When the time comes for additional test forms of DELTA to be administered, items common to multiple test forms will be used as “anchor items.” This step is specifically intended to help measure test reliability across different populations of examinees. Moreover, embedded (non-scored) test items have been, and will continue to be, included in all subsequent “live” test forms as part of an effort to expand the DELTA item pool, and their performance results are analyzed on an ongoing basis. These steps will help ensure that future (multiple) DELTA test forms are parallel in construct and content validity, are of equal rigor, and are free from inherent bias (Differential Item Functioning). All records of field-test item performance are maintained by NDEO within the DELTA item pool, including indicators of (Rasch) item difficulty, item discrimination, Cronbach’s Alpha and Cronbach’s Alpha if item is deleted (indicators of construct validity), and DIF (measurement bias).
  

Attitudinal Surveys


As an additional means of ensuring content and construct validity, survey instruments were employed to gauge the level of consensus among university pre-service dance education program coordinators regarding the importance of and relative degree of current alignment to ten PCK Skills Clusters embedded within three Domains of Knowledge comprising the DELTA Conceptual Framework. 

The testing parameters, including a topical outline of the ten knowledge and skills clusters comprising DELTA, were shared with all the institutions participating in DELTA field-testing. The program directors from the nineteen universities participating in field-testing were invited to complete two surveys. The first was an attitudinal survey of 28 questions using a Likert scale. The conjecture was that the analysis of the DELTA field-test data would provide insight into the program coordinators' perceptions of alignment of their course syllabi to the tenets of DELTA. More specifically, the objective of the attitudinal survey was to ascertain:

1. The extent to which the DELTA test content was covered in the college or universities’ syllabus; 

2. Whether the test items are of appropriate rigor; and  

3. Whether the items meet the litmus test of importance (i.e., critical knowledge and skills for entry-level PCK versus nice-to-know knowledge and skills), as determined by program coordinators of dance teacher preparation programs across the country.

Participation in the surveys was voluntary. Program directors from 11 of the 19 universities that participated in DELTA field-testing returned the first (anonymous) survey, in which the program directors were asked to rank order the ten PCK skills clusters into a hierarchy – or order of importance from simplest to most complex. They were instructed to base their ratings on the amount and complexity of the knowledge and skills examinees would have to bring to bear to respond correctly to the test items. 

The program directors were then asked to respond to a series of claims about DELTA using a Likert scale to comment on the developmental appropriateness of DELTA; their perception of the relative importance of each of the 10 PCK Skills Clusters in DELTA; and to address the extent of coverage of DELTA’s content within their respective institutions. In a separate portion of the survey, the program directors provided information about their own educational and professional dance backgrounds, years of experience, certifications held, number of students in their university programs etc. The function of this exercise was to “gather important validity evidence that can be obtained from an analysis of the relationship between a test’s content and the construct it is intended to measure” (AERA, APA, & NCME, 1999).  

Finally, the program coordinators from the 19 colleges and universities that participated in DELTA field-testing (as well as six content experts from the DELTA writing team) completed an item-sorting survey, in which all 201 field-test items, stripped of the key and distractors, were provided. Participants attributed the test items to one or more of the ten PCK Skills Clusters from which they perceived the questions to have been derived or to which they were related. The frequency with which the program coordinators and content experts were able to correctly attribute test items to the categories for which they were written provided preliminary evidence of content validity.
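As a simple illustration of how that attribution frequency can be tallied, the sketch below compares intended clusters with assigned clusters for a handful of hypothetical items; the cluster names and responses are invented for the example.

```python
# Hypothetical example: the cluster each item was written for versus the
# cluster a respondent assigned it to during the sorting task.
intended = ["Choreography", "Assessment", "Choreography", "Pedagogy", "Assessment"]
assigned = ["Choreography", "Assessment", "Pedagogy",     "Pedagogy", "Assessment"]

matches = sum(i == a for i, a in zip(intended, assigned))
print(f"Correct attributions: {matches}/{len(intended)} ({matches / len(intended):.0%})")
# -> Correct attributions: 4/5 (80%)
```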

Survey Findings

  

In the absence of factor analysis, the item-sorting task described above was used as a mechanism to compile preliminary evidence of construct validity across the 10 PCK Skills Clusters. The data gleaned from the analysis of the item-sorting task responses provided an indication of the degree of independence of the PCK Skills Clusters. Seven of the respondents to the attitudinal survey also completed the item-sorting task. Because the surveys were anonymous, it was not possible to know, with any certainty, whether the sampling of respondents among the dance education program directors was broadly representative of the programs participating in the DELTA field testing. However, given the number of program directors that participated in the DELTA field tests, coupled with the fact that the directors constitute an expert group, they are being considered as a proxy for the entire cadre of college and university dance teacher preparation programs nationwide for the purposes of this study.

The group of program directors that participated in the attitudinal survey and also completed the item-sorting task was somewhat smaller. Nonetheless, they too are being considered a proxy for the universe of college and university dance teacher preparation programs across the United States. The majority of respondents expressed the belief that the DELTA test items were targeted at a developmentally appropriate level; that the tested content represented important PCK and skills needed by novice teachers; and that the test items were of appropriate rigor (neither too hard nor too easy). At least half of the respondents stipulated that all or most of the PCK tested in DELTA was covered in their respective college or university dance teacher preparation program. There was less unanimity regarding the perception of how well students performed on DELTA: 45% of the respondents stipulated their students did about as well as they would have anticipated, 67% felt their students did not score as well as they expected, and 87% of the program directors felt their students exceeded performance expectations, based on the raw scores and anecdotal evidence, including student feedback.

With respect to the content of the exam, the vast majority (91%) of the respondents either agreed or strongly agreed that applied PCK of dance movement, health, and safety is critical to the success of a beginning teacher. Similarly, 82% of the respondents agreed or strongly agreed that applied PCK of dance movement practice is critical to the success of a beginning teacher. Applied PCK of choreographic forms and processes was also deemed critically important to the success of a beginning teacher by 9 of 11 respondents (81%), and 100% of the program directors disagreed or strongly disagreed that applied PCK of choreographic forms and processes has little or nothing to do with the success of a beginning teacher. As with choreographic forms, 9 of 11 respondents (81%) agreed or strongly agreed that applied PCK of dance performance and production is critical to the success of a beginning teacher. 91% of the program directors either agreed or strongly agreed that PCK in dance language, literacy, and critical analysis; applied pedagogical theory and practice; knowledge of the learner and how to accommodate the needs of individual learners; and assessment were all skill sets critical to the success of the novice teacher. PCK of dance history, and of physical learning environments and leveraging instructional resources, had slightly less support, with 82% of the program directors either agreeing or strongly agreeing that PCK in these areas was critical to the success of a beginning teacher.

The Operationalized Exam


As indicated, the current DELTA test form comprises the highest functioning items from previous field tests and field trials, based on item discrimination, item difficulty, the contribution each item made in support of the tested construct (i.e., Cronbach’s Alpha and Cronbach’s Alpha if item is deleted), and DIF. By limiting the test to one form, the number of examinees increases over time, making robust analysis more plausible. When necessary, new parallel forms will be introduced.

In all future administrations of DELTA, new or previously field-tested items that are not part of the original operationalized form will be embedded as non-scored field-test items. All records of field-test item performance are maintained by SEADAE and NDEO within the DELTA item pool, including indicators of (Rasch) item difficulty, item discrimination, Cronbach’s Alpha and Cronbach’s Alpha if item is deleted, and DIF. Such data will be kept for all field-tested items on an ongoing basis as future iterations of DELTA are developed.

Cut Scores

  

Cut scores on tests of educational achievement are best determined by people who are aware of what students should know and be able to do, based on the instruction the students have received in the subject (Zieky & Livingston, 2006; Serici, 1995). For DELTA, preliminary cut scores were established using the Modified Angoff method, which relies on a panel of experts. The Angoff method involves a binary system whereby, working individually, each expert is asked to predict whether a nominally proficient examinee (bordering between mastery and non-mastery) could answer an item correctly, using a scale of 0 to 1 (1 for a right answer and 0 for a wrong answer). This prediction is made for all of the items comprising the test, and the predicted total number of correct responses is summed. These sums for all of the expert judges are then averaged to calculate the predicted number of items the barely proficient student (BPS) could answer, and this becomes the cut score. Results from recent empirical studies indicate no statistical difference between setting cut scores using the Angoff method and the frequently used Bookmark/IRT method, although neither is completely free from controversy. As new test forms are assembled, new and existing items will be targeted around the mean item difficulty distribution to provide the means to distinguish among students that are on the cusp and to determine who belongs above and below the cut.
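A minimal sketch of the Angoff arithmetic described above, using invented judge ratings rather than actual DELTA panel data: each judge's 0/1 predictions are summed across items, and the judge totals are averaged to yield the preliminary cut score.

```python
import numpy as np

# Hypothetical ratings: rows = expert judges, columns = items.
# Each entry is the judge's prediction (1 or 0) of whether a borderline-
# proficient examinee would answer that item correctly.
ratings = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 0, 1],
])

judge_sums = ratings.sum(axis=1)   # predicted number correct, per judge
cut_score = judge_sums.mean()      # average across judges = preliminary cut score
print(judge_sums, cut_score)       # [5 5 5] 5.0 for this toy panel
```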

Conclusions


To conclude, as previously stipulated, DELTA was designed to be a criterion-referenced means of measuring teacher readiness as it pertains to the PCK of entry-level dance teachers. It will be some time, if ever, before DELTA becomes normative. It is important to note that NDEO and SEADAE are fully committed to the continuous improvement and development of DELTA as part of their ongoing efforts in service to the field of dance education. As such, NDEO and SEADAE continue to encourage state departments of education and university dance teacher preparation programs to embrace DELTA as a nationally recognized, valid measure of teacher readiness for K-12 public school dance educators. Moreover, in all future endeavors NDEO and SEADAE will remain vigilant to ensure that DELTA continues to discriminate well and is free of testing bias.

For more information about DELTA, or for a PDF version of A Validity Study of the National Dance Education Organization's Dance Entry Level Teachers' Assessment (DELTA), contact Dr. Dale Schmid, DELTA Senior Project Consultant, at dale.schmid@doe.state.nj.us.

About the OPDI Course


OPDI-M22: Using Dance Pedagogic Content Knowledge (PCK) to Drive Programmatic and Self Growth 

February 8 to April 4, 2021

Professors: Dr. Elizabeth McPherson and Dr. Dale Schmid; Tuition $350; 2 NDEO-endorsed CEUs

This course provides students with a useful conceptual framework to inspire thoughtful and informed curricular decisions about the allocation of instructional time and focus in K-16 dance education (elementary school to college) and to reflect on and renew one’s personal teaching practice. The conceptual framework explored is the 10 Pedagogic Content Knowledge (PCK) Skills Clusters that comprise the DELTA (Dance Entry Level Teacher Assessment) stemming from the National Core Arts Standards for Dance. These clusters include: 1) Performing Dance as an Intentional, Expressive Art Form (guiding principles), 2) Choreography (exploring, planning, revising), 3) Integrated Approaches to Historical, Cultural & Contemporary Dance Studies, 4) Dance Language, Literacy & Critical Analysis, 5) Pedagogical Theory & Practice, 6) Knowledge of the Learner, 7) Assessment Literacy, Evaluation & Reflective Practice, 8) School-based Policies, 9) Dance Classroom, and 10) Technical Production.  Anyone with an interest in dance education and dance teacher preparation would benefit from this course, from new teachers to seasoned dance education professionals from any teaching environment. It is designed to support and extend dance education content knowledge while expanding personal and professional expertise.


NDEO Members can register for the course here. Not yet an NDEO Member? Learn more about Membership and sign up here.

Dale Schmid is President of the State Education Agency Directors for Arts Education and a Past-President of NDEO. He is also the Senior Advisor for NDEO’s DELTA project and oversaw the development, field-testing, and psychometric analysis of the DELTA examination. Recently retired, Dr. Schmid served as the Visual & Performing Arts Coordinator for the New Jersey Department of Education from 1999 until his retirement. During his tenure, he oversaw the review and revision of every set of New Jersey Student Learning Standards in Visual and Performing Arts and was part of the team that created the 2015 National Core Arts Standards. Additionally, he was a contributing author to NDEO’s Standards for Learning and Teaching Dance in the Arts: Ages 5-18 (2005 & 2011) and a co-author of Professional Teaching Standards for Dance in Arts Education (2005 & 2011) and NDEO’s Standards for a K-12 Model Program: Opportunities to Learn in Dance Arts Education (2005 & 2011). He also served on the executive steering committees of the national Arts Education Partnership and the States Collaborative on Assessment and Student Standards/Arts Education Consortium; on the governance and advisory committees of Arts Ed NJ; and as one of NJN Public Broadcasting Authority’s Board of Commissioners. He is a member of the National Arts Education Policy Working Group (facilitated and managed by Americans for the Arts and the League of American Orchestras, Washington, DC, 2005 to the present) and the Arts Education Partnership Advisory Panel (operated under the aegis of the Education Commission of the States, Washington, DC, 2015 to the present), and has been a member of the College Board Pre-AP Arts Development Team, New York, NY, since 2015. He is also a member of the editorial board of the Journal of Movement Arts Literacy (JMAL).



Elizabeth McPherson, professor and director of the Dance Division at Montclair State University, is the editor of The Bennington School of the Dance: A History in Writings and Interviews, author of The Contributions of Martha Hill to American Dance and Dance Education, and co-author of Broadway, Balanchine, and Beyond: A Memoir.  Executive Editor for the journal Dance Education in Practice, she has written articles for various other publications including Ballet Review, the Journal of Dance Education, and the Journal of Movement Arts Literacy. She holds a BFA from Juilliard, an MA from The City College of NY and a PhD from New York University. 

