Identification of Gender and Sexuality of Subjects in Big Data Sets
Data abundance is now the norm. With the proliferation of digital platforms, content generated by or derived from users has grown exponentially, and with it the recognition that such large data sets can provide insight into complex real-world problems. Demographic inference, the prediction of population characteristics such as gender, age, or geography from big data sources, is emerging as a significant component of big data systems, and it promises to inform decision making in a variety of sectors and industries.
The detection of a person’s gender and/or sexuality is often a foundational component of demographic inference. This survey provides an approachable overview of the methods that researchers and other actors use to infer gender and sexuality from large data sets. The methods surveyed are categorized along two aspects: (1) the specific classifier used to infer gender, and (2) the data source analyzed for that inference.
Table of Contents
1. Survey: Summaries and analysis of identification methods
2. Explainer: Overview of how inferencing methods work
3. Explainer: Type of data being analysed for inference
4. Resources: Tools and libraries for identification of gender and sexuality
5. Glossary: Definition of key terms
About
This survey was conducted by Saman Goudarzi at the Centre for Internet and Society (CIS), India, as part of the strategic network on Big Data for Development established by the International Development Research Centre, Canada. Contributors to this work include Amber Sinha, Saumyaa Naidu, and Sumandro Chattapadhyay.
This website was created using Bootstrap and MixItUp, and is hosted on GitHub.