2016-02-13

Regarding that article about gender bias in GitHub Pull Requests

Regarding "Gender Bias In Open Source: Pull Request Acceptance Of Women Vs. Men", or even worse, regarding all the uncritical and breathless articles by the BBC, Vice, HuffPost, and so forth:

First of all, anyone who names their project "DeveloperLiberationFront" and uses an icon of a raised fist in woodcut style has already predeclared their bias away from objective truth.

Second, the authors of the paper show little knowledge of the large differences in workflow between projects, and no knowledge of the different ways that PRs are used or the different meanings of an "abandoned PR". Their definition of "project insider" is also broken: for many projects, an "insider" has write access and may never use PRs at all.

Third, despite GitHub's growing influence, just grabbing tens of thousands of GH PRs does not give you a sample that is in the slightest bit representative of open source as a whole.

Fourth, their process for computing the gender of PR authors is laughably bad, for reasons that went on for 3 paragraphs before I edited down this text.

Fifth, how many of you have heard of "p-hacking"? Or have actually computed a p value since you took that really annoying stats class in college? Did you even notice that this paper obviously did p-hacking, and then didn't even report the p values?
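For anyone who skipped that stats class, here is a rough sketch of why slicing data into many subgroups is a problem. It assumes the tests are independent and that there is no real effect anywhere, which is a simplification; the function name and the test counts are made up for illustration and do not come from the paper.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across num_tests independent
    tests, assuming every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Hypothetical numbers of subgroup comparisons, each tested at alpha = 0.05.
for k in (1, 5, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:2d} tests at alpha=0.05 -> {rate:.3f} chance of a spurious 'finding'")
```

Under those assumptions, trying twenty subgroupings gives you better-than-even odds of at least one "significant" result from pure noise.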

Finally, allow me to present the following disruption to the breathless and self-reinforcing narrative:

"So, let’s review. A non-peer-reviewed paper shows that women get more requests accepted than men. In one subgroup, unblinding gender gives women a bigger advantage; in another subgroup, unblinding gender gives men a bigger advantage. When gender is unblinded, both men and women do worse; it’s unclear if there are statistically significant differences in this regard. Only one of the study’s subgroups showed lower acceptance for women than men, and the size of the difference was 63% vs. 64%, which may or may not be statistically significant. This may or may not be related to the fact, demonstrated in the study, that women propose bigger and less useful changes on average; no attempt was made to control for this. This tiny amount of discrimination against women seems to be mostly from other women, not from men."
// ScottAlexander
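To see why "63% vs. 64%" may or may not be statistically significant, here is a minimal sketch of a pooled two-proportion z-test using only the standard library. The group sizes below are hypothetical, chosen purely to show how much the answer depends on sample size; they are not the paper's numbers.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test. Returns (z, two-sided p value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical group sizes: the same 63% vs. 64% gap at two different scales.
for n in (1_000, 100_000):
    accepted_a = round(0.63 * n)
    accepted_b = round(0.64 * n)
    z, p = two_proportion_z_test(accepted_a, n, accepted_b, n)
    print(f"n={n:>7} per group: z={z:.2f}, p={p:.4g}")
```

The same one-point gap is noise at one sample size and "significant" at another, which is exactly why the paper needed to report its test results and not just a graph.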

If this were a real paper, submitted for real peer review, a good peer review would be:

"1. Report gender-unblinding results for the entire population before you get into the insiders-vs.-outsiders dichotomy.
2. Give all numbers represented on graphs as actual numbers too.
3. Declare how many different subgroup groupings you tried, and do appropriate Bonferroni corrections.
4. Report the magnitude of the male drop vs. the female drop after gender-unblinding, test if they’re different, and report the test results.
5. Add the part about men being harder on men and vice versa, give numbers, and do significance tests.
6. Try to find an explanation for why both groups’ rates dropped with gender-unblinding. If you can’t, at least say so in the Discussion and propose some possibilities.
7. Fix the way you present “Women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable”, at the very least by adding the comparable numbers about the similar drop for men in the same sentence. Otherwise this will be the heading for every single news article about the study and nobody will acknowledge that the drop for men exists at all. This will happen anyway no matter what you do, but at least it won’t be your fault.
8. If possible, control for your finding that women’s changes are larger and less-needed and see how that affects results. If this sounds complicated, I bet you could find people here who are willing to help you.
9. Please release an anonymized version of the data."
// ScottAlexander
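The Bonferroni correction asked for in item 3 is not complicated. A minimal sketch follows, with made-up p values standing in for the ones the paper never reported:

```python
def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which p values survive a Bonferroni correction.

    Each test is compared against alpha divided by the number of tests,
    which keeps the family-wise error rate at or below alpha.
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p values from four subgroup comparisons (not from the paper).
p_values = [0.01, 0.04, 0.03, 0.005]
print(bonferroni_significant(p_values))
```

With four comparisons the per-test threshold drops from 0.05 to 0.0125, so two of these four hypothetical results that looked significant on their own no longer are.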

I am willing to bet money that a real, honest academic statistical analysis of their raw data would invalidate both their claims and their implications.