Friday, October 14, 2016

Is the European Southern Observatory Sexist?

The short answer is no. The somewhat more sophisticated statistical answer is probably not, and the slightly more certain formulation would be probably not very much.

This hasn't prevented both Science and HuffPo from reporting that the ESO is almost certainly sexist, enlisting none other than the past president of the American Astronomical Society, Meg Urry, in the effort to spin the story in that direction. "Female Astronomers Just Can’t Seem To Catch A Break," laments the Huffington Post over David Freeman's story, citing a "jarring new study" that "shows it’s tough for women to get telescope time." Science follows the same script: "ESO finds gender bias in awarding telescope time," runs the headline over Maggie Kuo's story, which begins: "Astronomers wanting time on the European Southern Observatory’s (ESO's) telescopes are less likely to get it if they’re women."

Both stories cite a recent study conducted by Ferdinando Patat of the ESO itself. Before I get into the details, I want to say that, after looking at the study myself, and discussing it with Patat, I find the media spin, and Urry's participation in it, much more jarring than the results the study presents. Think about it. Suppose you are running a world-class observatory and you decide to undertake, of your own accord, a study of the processes by which you allocate telescope time, specifically trying to detect a gender bias if there is one. Suppose your results are mainly inconclusive* about the question of gender bias but suggest pretty strongly that seniority, and experience in application writing, are key factors in deciding who gets time. Now suppose the media picks up the story and tells everyone your review process is biased by sexist views about women and, to put a button on it, relates the whole thing to growing concerns about harassment. I think maybe you'd think twice about conducting such a study or publishing the results next time!

I lost most of my respect for science writers during the Tim Hunt affair, so I'm not going to spend too much time criticizing the news coverage. What I want to do here is present this very interesting study, trying to reconstruct the care that went into designing it, and by that means arrive at a more accurate sense of what it tells us about the ESO's time allocation review process. Freeman and Kuo, it seems to me, simply haven't appreciated the beauty of this study: its mastery of its own limitations (a definition of beauty in science?).

But, to start, I want to emphasize Kuo's way of putting the conclusion: "Astronomers ... are less likely to get [telescope time] if they’re women." There's a clear causal implication there, but the study does not support it. The truth is that astronomers who get telescope time are less likely to be women, for reasons that the study does a pretty good job of identifying, if not quite nailing down. Like I say, it's probably not sexism.

Here's an obvious reason that fewer women than men might get telescope time: there are fewer women than men in astronomy. The breakdown is roughly 30/70. So, if we found that 30% of the successful applications for telescope time were written by women, there wouldn't immediately be a problem. But that's not, of course, what a study like this should look at. It should ask whether a female applicant has the same chance of getting telescope time as her male counterpart. And that's what Patat tried to do.

"The study found that female astronomers are less successful than their male counterparts at lining up critically important observing time on major telescopes," writes Freeman at HuffPo. But as I want to show in this post, this is a serious misstatement.

"Counterpart" is a crucial word here. After all, a female PhD student cannot expect to have the same chance as a male professor, for the same reason that a male PhD student can't expect an equal chance against a female professor. What this means is that even a 30% female population cannot expect 30% of the telescope time if the women are distributed mainly on the lower rungs of the academic ladder, as they would be in a field with a legacy of male dominance. Moreover, since it takes about 30 years to get to the top of the career ladder, we can expect progress on this front to be slow.

This is something Patat took into consideration, but Kuo misunderstands his procedure. "When he accounted for the career level of the proposer," she says, "the gap in success rate shrank, but not completely: The success rate for men was 22.1%, comparable with the raw data, whereas women's success rate inched up to 19.3%." As I pointed out in a footnote to my last post, "22.1%" is the result of a typo in Patat's paper, which will be corrected in a later version. The real number is 21.0%. But Kuo is in any case wrong about how the paper "accounted for" career level. The 19.3 (F) vs. 21.0 (M) success rates indicate a kind of "null hypothesis": they are what we would expect to find if there is no sexism effect but seniority plays a role. That is, they aren't an adjustment of the observed values; they are just something more realistic to compare the observations to.

It's not so much that the gap in success rates "shrinks" in view of this, but that a gap is shown to be expected. The problem, however, as Patat points out in the paper, and in his emails to me, is that 19.3 / 21.0 almost certainly underestimates the gap that we should expect because the "seniority" measure in his data is very crude. In the data, it's only possible to distinguish between students, post-docs and "professional astronomers". We already know that the last of these "bins" is where the bulk of the successes are found, and we know that seniority is widely distributed there (from one's first position to one's last year before retirement). Seniority skews male between bins, so we can assume that it also skews male within the astronomer bin.
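To make the idea of a seniority-only null concrete, here is a minimal sketch in Python. It is not Patat's actual procedure, just the simplest version of the logic: give each group the same per-bin success rates and let only the career mix differ. The per-bin rates and the astronomer shares (34% and 53%) are the ones quoted in this post; the post-doc and student shares are hypothetical placeholders I made up for illustration, so the output will not reproduce the 19.3 / 21.0 figures.

```python
# Per-bin success rates (%) quoted in this post for all proposals.
BIN_SUCCESS = {"astronomer": 23.4, "postdoc": 18.3, "student": 13.2}


def expected_rate(career_mix, bin_success=BIN_SUCCESS):
    """Success rate (%) a group would have if only career level mattered.

    career_mix maps each bin name to the fraction of the group in that bin.
    """
    total = sum(career_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("career mix must sum to 1")
    # Weighted average of the shared per-bin rates, weighted by the group's mix.
    return sum(share * bin_success[name] for name, share in career_mix.items())


# Astronomer shares (0.34, 0.53) come from the abstract quoted in this post.
# The post-doc and student shares are HYPOTHETICAL, chosen only so each mix
# sums to 1; they are not taken from the study.
WOMEN_MIX = {"astronomer": 0.34, "postdoc": 0.40, "student": 0.26}
MEN_MIX = {"astronomer": 0.53, "postdoc": 0.30, "student": 0.17}

null_f = expected_rate(WOMEN_MIX)
null_m = expected_rate(MEN_MIX)
print(f"null (F): {null_f:.1f}%   null (M): {null_m:.1f}%")
```

The point of the sketch is only that a gap appears even though both groups are given identical success rates within every bin. The gap comes entirely from the different career mixes.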

Indeed, the main conclusion of the study, as stated in the abstract, was not that there is a gender bias in ESO time allocation, but that "the disparity is related to different input distributions in terms of career level. The seniority of male PIs is significantly higher than that of female PIs, with only 34% of the female PIs being professionally employed astronomers (compared to 53% for male PIs)."

Let's look at the results more closely with this in mind.

The overall success rate for proposals is 20.7%, with professional astronomers, predictably, outperforming (23.4%) post-docs (18.3%) and post-docs outperforming students (13.2%). Overall, men (22.2%) “outperform” women (16.0%). That’s a 39% relative difference (22.2 / 16.0 ≈ 1.39), but the scare quotes are necessary because the seniority bins are not homogeneous in their gender composition, which, as our null hypothesis suggests, predicts a gap. Our crude estimate (19.3 / 21.0), which accounts for the gender composition of the bins, accounts for the first 9% of the 39% (21.0 / 19.3 ≈ 1.09).
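For anyone who wants to check the arithmetic behind the 39% and the 9%, here is a small sketch using only the success rates quoted above. The function name is mine, and "relative gap" here simply means how much larger the male rate is as a percentage of the female rate.

```python
def relative_gap(higher, lower):
    """How much larger `higher` is than `lower`, as a percentage of `lower`."""
    return (higher / lower - 1.0) * 100.0


# Observed success rates (%): men 22.2, women 16.0.
observed = relative_gap(22.2, 16.0)

# Crude seniority-only null (%): men 21.0, women 19.3.
expected = relative_gap(21.0, 19.3)

print(f"observed relative gap: {observed:.1f}%")
print(f"gap expected under the crude null: {expected:.1f}%")
print(f"left over for a better null to explain: {observed - expected:.1f} points")
```

The leftover is what a finer-grained seniority measure would have to account for before anything could be attributed to gender as such.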

Now, the data shows that the student bin doesn’t explain the difference in male and female success rates. Men and women, it turns out, perform equally well in that bin. They are also equally “far along” toward their PhD (which they are presumably still trying to earn). That is: there is no serious seniority difference in that bin and no serious gender difference in success rates. This is what suggests the hypothesis that the difference in outcomes between men and women in the higher seniority bins might be accounted for by differences in seniority, not gender differences, within the bins.

It's a hypothesis, not yet a conclusion, because it still needs to be tested. It could be tested by looking not at three crude seniority levels but at the time elapsed since earning a PhD.

Patat tells me that the next time they do such a study, they will gather data about when the applicants completed their PhD to test this hypothesis. Note, however, that, as in pay-gap studies, we must here control for time off for maternity and paternity leave, which is notoriously unequal. The actual seniority of two astronomers who earned their PhDs in 1980 is likely to systematically differ by gender.

Keep in mind, however, that this additional data will first and foremost help us to construct a better null. The interesting variable will still be the male and female success rates themselves. Once a more granular approach is taken to seniority within the bins, and on the assumption that seniority within the astronomer bin skews male (that is: the balance gets worse the higher up you go), the hypothesized 19.3 / 21.0 spread is likely to widen. And then the 16 / 22.2 spread in success becomes less "jarring," to use HuffPo's word. [That's because we really will have matched the success rates of women with their male counterparts.]

Presumably, the 18.3 (F) vs. 24.4 (M) success rates in the astronomer bin are driven mostly by seniority, not sexism. Indeed, it's not just seniority but a very specific kind of experience that is at work here. As Patat notes, people who apply again and again are more successful than people who apply only once. This, he explains, is because it's not just about whether your proposed study is well thought out. It's also about whether you pitch it in the right way to the time allocation committee. It’s not just experience in the field of astronomy that matters but experience in the particular practice of applying for telescope time, and specifically at the ESO. And, since membership in the group of successful repeat applicants depends on time spent in the field as well (just as career level does), that group will also skew male in a discipline, like astronomy, that has traditionally been (and remains) male dominated.

This post is already too long, but there is something I have to say before wrapping it up. Patat did find "a small, but statistically significant, gender-dependent behaviour" among referees. Referees tended to withhold top grades from women slightly more than from men. "Both genders show the same systematics, but they are larger for males than females." It's impossible to know, at this point, how much this reflects differences in quality between the applications, differences in the standards used in refereeing, or actual conscious or unconscious biases in the minds of (both male and female) astronomers about female astronomers. The difference is certainly very small, and until we get a more accurate null for the difference in success rates, I don't think we should suspect systematic bias as the most likely culprit.

All in all, in any case, I’m pretty sure the ESO review process is not sexist, regardless of how Science and HuffPo (and the AAS and CSWA** for that matter) want to spin it. On the data we have, the most likely hypothesis at this point is that it's all about seniority and experience, not gender. Sure, you can now make a case that this makes it all the more important to ensure that women have the same opportunities for advancement as men in astronomy. But that is actually Patat's point. To say that "an internal ESO study has found" a "gender bias in awarding telescope time", as Science reports, or, as HuffPo says, that women are less successful than "their male counterparts", in my opinion simply misstates Patat's conclusions.

________
*Ooops: this read "inclusive" when I first posted. I have fixed a number of other small typos too.

**Update (15/10/2016): I was disappointed to see the Women in Astronomy Newsletter link uncritically to the Science story, posting the first paragraph of Kuo's story, which says "Astronomers ... are less likely to get [time] if they’re women." Like I say, that seems to be a baseless attribution of causation. Also, like I say, Kuo misunderstands how seniority informs the null in the study. It would have been better to post the abstract of Patat's paper and a link to that.

4 comments:

Anonymous said...

> Referees tended to withhold top grades from women slightly more than from me.
I am sure that referees are unfair to you but ;-)

It is almost impossible to learn about gender effects from _administrative_ data and the largest problem might be that that is not well or widely appreciated.

This might be of interest http://www.mitpressjournals.org/doi/abs/10.1162/rest_a_00110?journalCode=rest#.WADyQfkwhpg as well as this http://a2jlab.org/we-are-the-a2j-lab/ (author of the first link).

Keith O'Rourke

Thomas said...

Thanks for catching that, Keith! (Strange, I'm sure I already corrected it once. But I think I must merely have noticed it and then forgotten to correct it.)

Will have a look at the paper.

Anonymous said...

A small correction: it's American Astronomical Society, not Association.

Thomas said...

Thanks. I seem to make that mistake every other time!