We recently published our first paper sharing validity evidence for the development of neurological emergency simulations for assessment. Are you familiar with forms of validity evidence? If you are an educator, you should be! A thread… rdcu.be/ceMm3
Steven Downing wrote a fantastic review on validity as it pertains to assessment in medical education. Let’s review the highlights! pubmed.ncbi.nlm.nih.gov/14506816/
As Downing states, validity is the sine qua non of assessment. It is approached as a hypothesis. No assessment is “valid” or “invalid” -> assessments have scores with more or less validity evidence to support interpretations.
Assessment data can be more or less valid for any specific purpose, at any specific time, for any specific population. For instance, measuring IQ by asking what’s missing from a picture of a tennis match would be less valid for examinees unfamiliar with the game.
Validity requires multiple sources of evidence. In our paper we used Messick’s framework of validity, which includes 5 sources: content, response process, internal structure, relationship to other variables, and consequences.
Content evidence: relationship between test questions and the course objectives/scientific domains that are to be assessed. Do question items adhere to evidence-based principles? Are the item-writers content experts? Are there sufficient questions to adequately sample the domain? Etc.
We had board-certified experts with subspecialty training develop cases and checklists. We based content choices on the Neurocritical Care Society’s Emergency Neurological Life Support course, cross-referenced with other relevant guidelines.
Response process: data integrity, such that sources of error associated with test administration are controlled or eliminated as far as possible. Evidence includes documentation of quality-control procedures, key validation, and rationale for scoring methods.
We pre-briefed all participants, provided sim operator training, piloted the cases, and utilized a nurse confederate to clarify orders and prompt ddx. Rating was completed using checklists and global rating scales with attention to interrater reliability (see Internal structure)
Internal structure: the statistical or psychometric characteristics of the questions or performance prompts. Includes item analysis (computes difficulty of each item, discrimination of each question, etc.), reliability testing, and evaluation for bias.
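For readers who want to see what item analysis looks like in practice, here is a minimal sketch. It is not the analysis from our paper: the response matrix is made up, the function names are hypothetical, and discrimination is computed as an item-rest correlation, which is only one of several accepted definitions.

```python
from statistics import mean, pstdev

# Hypothetical data: one row per examinee, one column per item (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
]

def item_difficulty(rows, item):
    """Proportion of examinees who answered the item correctly (higher = easier)."""
    return mean(row[item] for row in rows)

def item_discrimination(rows, item):
    """Pearson correlation between the item score and the score on all other items."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return float("nan")  # undefined when everyone scores the same
    m_item, m_rest = mean(item_scores), mean(rest_scores)
    cov = mean((x - m_item) * (y - m_rest)
               for x, y in zip(item_scores, rest_scores))
    return cov / (sd_item * sd_rest)

for i in range(3):
    print(i, round(item_difficulty(responses, i), 2),
          round(item_discrimination(responses, i), 2))
```

An item that nearly everyone gets right (or wrong), or one with discrimination near zero, tells you little about who is performing well overall.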
Although we did not do an in-depth analysis of internal structure (coming soon!), we did show, in a subset of 50 cases, 82% agreement between raters on 1073 critical action checklist items (kappa = 0.64). Global rating scale ratings were strongly correlated (Pearson correlation = 0.70).
Reliability is such an important source of validity evidence that it deserves a deeper dive. We can’t draw large conclusions from assessments without being sure that our scores are reliable and reproducible. pubmed.ncbi.nlm.nih.gov/15327684/
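Why is kappa lower than raw agreement? Kappa subtracts the agreement two raters would reach by chance alone. A minimal sketch, assuming two raters, binary checklist items, and Cohen's kappa (the ratings below are made up, not our study data, and the function names are hypothetical):

```python
from collections import Counter

# Hypothetical ratings: 1 = critical action performed, 0 = not performed.
rater_a = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

def percent_agreement(r1, r2):
    """Fraction of items on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    p_observed = percent_agreement(r1, r2)
    counts_1, counts_2 = Counter(r1), Counter(r2)
    # Chance agreement: both raters independently pick the same category.
    p_expected = sum(counts_1[c] * counts_2[c]
                     for c in set(counts_1) | set(counts_2)) / (n * n)
    if p_expected == 1:
        return float("nan")  # undefined when both raters use a single category
    return (p_observed - p_expected) / (1 - p_expected)

print(percent_agreement(rater_a, rater_b))       # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.52
```

In this toy example the raters agree on 80% of items, but because both mark most actions as performed, much of that agreement is expected by chance, so kappa comes out lower.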
Relationship to other variables: How does our assessment’s score correlate with an existing, accepted measure? Vascular neurologist scores on AIS and ICH sims should correlate, but scores on AIS and TBI sims may not.
Consequences: the impact on examinees from the assessment. High stakes exams (USMLE Step 1 for instance) have tremendous impact on futures. Passing rates and the appropriateness thereof (including process to determine cut offs) are examples of consequential validity evidence.
In our manuscript on the development of neurological emergency simulations for assessment we described content and response process evidence (with a little bit of internal structure via interrater reliability). We hope to publish evidence for other sources of validity soon!
Remember: tests have scores with more or less evidence to support interpretations that are unique to a specific purpose, time, and population. Messick’s 5 sources of validity evidence: content, response process, internal structure, relationship to other variables, and consequences.
IV glibenclamide shows promise for reducing cerebral edema and appears to be safe. PO glyburide leads to more hypoglycemia, especially if abnl renal fxn. Smaller, more frequent dosing may help. Kudos to @MikeA_42 for pushing this through to publication. sciencedirect.com/science/articl…