Guidelines

Before you use automated tools for gender inference, there are some crucial aspects to consider. By discussing what these tools can and cannot do, we hope to help you judge whether they are likely to be suitable for your research.

First and foremost, automated gender inference methods are designed primarily for quantitative, aggregated empirical analysis, not for studying individuals. They may be useful for population-level estimates, but any single prediction can be wrong. Consequently, our benchmarking analyses focus on performance across various large datasets rather than on qualitative evaluation of individual cases.
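The distinction between aggregate and individual-level performance can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Record` type, the labels, and the ten toy records are invented for illustration and do not come from our benchmark data or from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One hypothetical observation: an annotated label and a tool's prediction."""
    annotated_label: str
    predicted_label: str


def individual_accuracy(records):
    """Share of records where the prediction matches the annotation."""
    correct = sum(r.annotated_label == r.predicted_label for r in records)
    return correct / len(records)


def aggregate_share(records, label, use_predictions):
    """Share of records carrying `label`, using predictions or annotations."""
    values = [
        r.predicted_label if use_predictions else r.annotated_label
        for r in records
    ]
    return values.count(label) / len(values)


# Toy data (invented): four of the ten predictions are wrong.
records = [
    Record("woman", "woman"),
    Record("woman", "man"),
    Record("woman", "woman"),
    Record("man", "woman"),
    Record("man", "man"),
    Record("man", "man"),
    Record("woman", "woman"),
    Record("man", "man"),
    Record("woman", "man"),
    Record("man", "woman"),
]

accuracy = individual_accuracy(records)
annotated_share = aggregate_share(records, "woman", use_predictions=False)
predicted_share = aggregate_share(records, "woman", use_predictions=True)

print(f"individual-level accuracy: {accuracy:.2f}")
print(f"annotated share of women:  {annotated_share:.2f}")
print(f"predicted share of women:  {predicted_share:.2f}")
```

In this toy example the misclassifications happen to cancel out, so the predicted share matches the annotated share even though four individual predictions are wrong. Real errors need not cancel in this way, which is one reason to benchmark a tool on data similar to your own before relying on its aggregate estimates.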

Another significant limitation of the tools we evaluate on this portal is that they cannot infer an individual's self-identified gender. Gender is a social construct and, thus, an achieved status that must be learned and can change over time (Butler 1988, Lindsey 2015). Although the term "gender inference" is commonly used for these methods, their predictions are arguably closer to sex assigned at birth than to gender. This discrepancy is also reflected in the fact that they differentiate only between women and men and disregard non-binary genders.

To the best of our knowledge, there is currently no research on inferring non-binary gender. At best, therefore, the methods we benchmark can be applied to predict observer-ascribed gender. While observer-ascribed gender is a poor measure of gender identity (Hamidi et al. 2018), it might nevertheless be useful for many research applications, especially when the alternative would be not to study gender at all. For these applications, our benchmarking analysis aims to help researchers find the best measurement instrument, taking into account factors such as data type, image content, and image quality, all of which can affect the performance of gender inference tools (Schwemmer et al. 2020).

Check out our methods section to learn more about the inference tools we benchmark.

References

Butler, J. (1988). Performative acts and gender constitution: An essay in phenomenology and feminist theory. Theatre Journal, 40(4), 519-531. https://www.jstor.org/stable/3207893?seq=1

Hamidi, F., Scheuerman, M. K., & Branham, S. M. (2018). Gender recognition or gender reductionism. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (pp. 1-13). New York: Association for Computing Machinery. https://dl.acm.org/doi/10.1145/3173574.3173582

Lindsey, L. L. (2015). Gender roles: A sociological perspective. Routledge. https://www.routledge.com/Gender-Roles-A-Sociological-Perspective/Lindsey/p/book/9780205899685

Schwemmer, C., Knight, C., Bello-Pardo, E. D., Oklobdzija, S., Schoonvelde, M., & Lockhart, J. W. (2020). Diagnosing Gender Bias in Image Recognition Systems. Socius. https://doi.org/10.1177/2378023120967171