The problem with high-stakes testing and women in STEM

In New York City, there’s a big debate over who should gain admittance to eight elite public high schools, including the well-known Stuyvesant High School and the Bronx High School of Science. Currently, Asian-American students score high enough on an entry exam to win a considerable majority of the seats. Mayor Bill de Blasio and a new school chancellor want to bring in more black and Latino students, who make up most of the city’s school population.

This tension between demographics and academic excellence is prompting scholars to take a closer look at the data on scores and grades and how well the entry exam predicts achievement. But one researcher thinks the most consistent bias might be against girls.

“They’re trying to identify the very top students but every way you slice the data, it shows bias against girls,” said Jonathan Taylor, a research analyst at the Gender Equity Project at Hunter College of the City University of New York. “I’m not saying that the exams are racist or sexist. But we’re talking about statistical bias. The argument I’m making is that the test, as a sole criteria — it’s insufficient.”

Taylor’s data analysis has implications beyond high school admissions in New York City. It adds to a growing body of evidence that high-stakes testing based on multiple-choice questions might be one reason for the small pool of women reaching the upper echelons of math and science professions.

More on that after I explain Taylor’s study, “Fairness to Gifted Girls: Admissions to NYC’s Elite Public High Schools,” which has completed peer review and is slated for publication in the Journal of Women and Minorities in Science and Engineering. It is expected to appear in a forthcoming 2019 issue, and I was given an advance copy.

Taylor didn’t deconstruct the text or math content of the questions on the Specialized High School Admissions Test (SHSAT), which is the sole criterion for getting into one of the eight selective high schools. The “bias” here is that among students who posted identical scores on the test, girls tended to have higher grades than boys, meaning that the test is consistently underestimating the achievement of girls.

Among all test takers with the same score – regardless of whether they were admitted to a specialized high school – the subsequent ninth-grade grades of girls tended to be 4.2 points higher on a 100-point scale than those of their male peers. For example, among kids who the test indicated were of the same academic aptitude, girls might have an 89 grade-point average (GPA) versus an 85 GPA for boys. At Stuyvesant, the most competitive school to get into, the boys’ GPAs were 3.55 points lower than the girls’ GPAs among students who had identical test scores. On the flip side, among girls and boys who earned the exact same grades at Stuyvesant, girls’ entry exam scores were 6.6 points lower.

It was the same pattern at the other seven selective high schools: boys arrived with higher scores; girls earned higher grades.
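
To make the "same score, different grades" comparison concrete, here is a minimal Python sketch. It uses a handful of made-up records, not the study's data, and a simple group-by-score comparison. The article does not describe Taylor's actual statistical method, so the function name `gap_at_equal_scores` and the approach itself are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Synthetic, made-up records for illustration only:
# (test_score, gender, ninth_grade_gpa). These are NOT the study's data.
records = [
    (500, "F", 90.0), (500, "M", 86.0),
    (500, "F", 88.0), (500, "M", 84.0),
    (550, "F", 93.0), (550, "M", 89.0),
    (600, "F", 95.0), (600, "M", 90.0),
]

def gap_at_equal_scores(rows):
    """Average girls-minus-boys GPA gap among students with identical scores.

    Each score level counts once (a simple unweighted average across levels).
    """
    by_score = defaultdict(lambda: {"F": [], "M": []})
    for score, gender, gpa in rows:
        by_score[score][gender].append(gpa)
    gaps = [
        mean(groups["F"]) - mean(groups["M"])
        for groups in by_score.values()
        if groups["F"] and groups["M"]  # skip score levels missing one gender
    ]
    return mean(gaps)

gap = gap_at_equal_scores(records)
print(f"Girls' GPA advantage at equal scores: {gap:.2f} points")
```

A positive gap means that, at any given score, girls go on to earn higher grades than boys, which is the sense in which the test is said to underestimate girls' achievement.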

Taylor said the test’s bias against girls applied to Asian-American girls, too, explaining that lower achieving boys were taking slots that Asian-American girls might otherwise have earned.

Taylor began with data for all 28,000 eighth-graders who took the SHSAT in 2013. (Roughly one-third of the city’s 81,000 eighth-graders chose to take this optional exam.) Then, he compared scores for each student with their eventual ninth-grade grades. Some of these students ended up at one of the elite high schools but most didn’t.

Overall, the correlation was a loose one. Test scores predicted only 20 percent of the variation in students’ GPAs. In other words, students with the same high test scores had wildly different GPAs at school the following year. At first glance, the test doesn’t seem very good at discerning A students from B students. Seventh-grade GPAs were roughly twice as good as test scores at predicting ninth-grade achievement.
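
For readers who want the arithmetic, the short Python sketch below assumes that "20 percent of the variation" refers to the usual R-squared statistic from a simple linear fit. The article does not say so explicitly, so treat that reading as an assumption.

```python
import math

# Assumption: "20 percent of the variation" is an R-squared of 0.20.
r_squared = 0.20

# In a simple one-predictor linear fit, the correlation coefficient has
# magnitude sqrt(R-squared).
correlation = math.sqrt(r_squared)

# Share of GPA variation that test scores leave unexplained.
unexplained = 1 - r_squared

print(f"Implied correlation: about {correlation:.2f}")
print(f"Variation left unexplained: {unexplained:.0%}")
```

Under that assumption, the correlation between scores and later grades is about 0.45, and four-fifths of the differences in GPA are unrelated to the test score.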

“People say the SHSAT is objective and that grades are unreliable,” Taylor said. “Schools and teachers have different subjective grading standards and grades are all over the place. The exams were designed to be a uniform metric. It’s ironic that the exams don’t predict as well as grades.”

One might wonder if girls could be taking easier classes or not as many math and science classes once they get to high school, and perhaps that is why girls are getting higher grades. But Taylor checked and he found that girls were, in fact, well-represented in math and science classes in ninth grade and doing very well in them.

Indeed, the most startling finding is that girls were over-represented among the very top A-plus students in science and math classes at the specialized high schools, but were underrepresented among the very highest scorers on the exam. Specifically, girls accounted for only 40 percent of the top 3 percent of exam scores but they accounted for half of all the 95s and above in ninth-grade math and science classes, such as geometry, algebra, biology and physics. That’s all the more surprising because girls accounted for only about 40 percent of all the students in these elite schools, yet they earned half of the highest grades.
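
A rough way to quantify that over-representation is sketched below in Python. It treats the article's rounded figures (40 percent of students, half of the top grades) as exact and assumes both shares describe the same group of students. Both are simplifying assumptions, and the variable names are illustrative.

```python
# Shares reported in the article, treated as exact for illustration.
girls_share_of_students = 0.40
girls_share_of_top_grades = 0.50

boys_share_of_students = 1 - girls_share_of_students
boys_share_of_top_grades = 1 - girls_share_of_top_grades

# Representation ratio: share of top grades divided by share of enrollment.
# A value above 1 means over-representation.
girls_ratio = girls_share_of_top_grades / girls_share_of_students
boys_ratio = boys_share_of_top_grades / boys_share_of_students

# How much more often girls earn a top grade than boys, per student.
relative_rate = girls_ratio / boys_ratio

print(f"Girls' representation ratio: {girls_ratio:.2f}")
print(f"Boys' representation ratio: {boys_ratio:.2f}")
print(f"Girls' rate relative to boys': {relative_rate:.2f}")
```

On these numbers, girls earn top grades at about 1.25 times their share of enrollment, boys at about 0.83 times theirs, so girls earn them roughly 1.5 times as often as boys per student.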

Taylor’s findings here shed some light on a bigger debate about women in math and science. Back in 2005, Lawrence Summers, then president of Harvard, controversially suggested that the reason there were fewer women in science, technology, engineering and mathematics fields, often abbreviated as STEM, was that there is a smaller pool of highly talented women at the very, very top and that this was perhaps related to women’s “intrinsic aptitude.” By contrast, this study of exceptional students in New York City didn’t find evidence of a smaller pool of super-bright girls in STEM. Even in STEM classes at Stuyvesant, Brooklyn Tech and Bronx Science, girls were more likely than boys to post the highest A-plus grades.

This research echoes a large body of research on the SAT, the college admissions test, which has similarly found that boys outscore girls on the SAT but girls earn higher grades in college. “It’s precisely the same problem,” said Taylor. For example, a 1992 study in the Harvard Educational Review found that among 47,000 college students in 51 colleges, women who earned the same grade as men in the same math course had lower SAT math scores.

College professors are also noticing versions of this testing problem in their large lecture classes. University of Michigan physicist Timothy McKay found that even the brightest straight-A women underperformed male peers in his physics class that was based on two exam grades, but women outperformed men in lab sections where there was no testing pressure.

Why women are doing worse on high-stakes tests is a matter of conjecture. McKay speculated that something called “stereotype threat” is at play: even the brightest women may not perform at their best when they feel that they are in a stressful environment where women don’t traditionally succeed.

Taylor subscribes to a “risk aversion” theory that women prefer not to guess when they aren’t confident of an answer. Boys’ greater willingness to guess might be just enough to juice their test scores. Taylor is hoping to get access to SHSAT answer sheets to see if he can detect gender differences in unanswered questions.

This story about women in STEM was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.

Letters to the Editor

Denise T Airola says:

January 8, 2019 at 2:23 pm

This is a very important topic that isn’t discussed in the general population and yet there is a research thread among measurement specialists in education that could inform the questions raised in Taylor’s study. One good source of information would be the works of Keena Arbuthnot. Her recent book, Global Perspectives on Education Testing: Examining Fairness, High Stakes and Policy Reform, builds on her exploration of performance gaps on standardized tests among race groups and gender that she explored in her earlier publications on stereotype threat and strategy choices among different race groups and genders.

I highly recommend a follow up article that might provide a summary of Arbuthnot’s findings since this is such a pervasive issue, not just for magnet schools in NYC, but across classrooms every day. And, in learning about how different races/genders use different strategies in standardized testing scenarios, we might help all students develop a variety of strategies to approach testing when necessary. Further, if we embrace multiple measures of student competency in areas of STEM and we create collaborative learning environments where different strategies are shared among students we will see a benefit for all students.
Thank you for covering this important topic and I hope you will follow up with a look into what the measurement community is finding that may explain these differences and that those explanations are not a question of aptitude.

Tom Campbell says:

January 9, 2019 at 7:47 am

Isn’t it possible that girls are being favored in the classroom, the classroom being a more subjective environment where teachers are under pressure to show politically favorable results? The pressures and anxieties girls might feel taking a standardized test are the same ones they would experience in a classroom. It would be interesting to drill down to find the exact questions on a test that were biased against girls.

D. Busch says:

January 11, 2019 at 12:28 pm

Dear Ms. Barshay,

I found your Women in STEM article quite interesting. I have a graduate degree in engineering, and I now have a daughter who is in Advanced-Placement STEM classes. While I don’t necessarily disagree with the conclusion of your article, I can’t help but make an observation. Have the authors of the study controlled for the ‘try-hard’ factor? My personal experience is that many boys simply place a lower priority on getting the highest possible classroom grades. At our rigorous public high school, the boys have a much different attitude toward school than the girls have. While the girls have study sessions and seek help from teachers, the boys are much more likely to pursue their own interests first, and they tend to put less effort into their assignments. It does not surprise me at all that comparably able girls have better grades than boys.

This would also be supported by the fact that fewer grading opportunities lead to less difference between the genders. The boys would obviously be more concerned with their grades when they know that each test or assignment carries more weight.

My point is not to undermine the research, but hopefully to highlight a parameter to consider. As it is currently described, it could equally be concluded that the grade-focused classroom structure is biased toward girls, rather than that the standardized testing is biased against them.
