Datasets Summary
Dataset | Number of images | Male images | Female images
---|---|---|---
IMDB | 460,723 | 263,214 | 189,047
WIKI | 62,304 | 49,698 | 12,606
Scholar | 1,854 | 959 | 895
OUI | 19,370 | 5,770 | 6,440
Twitter | 3,326 | 1,534 | 1,792
Gender Shades | 1,270 | 740 | 566
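As a quick check on the table, the sketch below computes each dataset's male and female shares from the counts above. The shares are taken over gender-labeled images (male + female) rather than the total image count, because the two gender columns do not always sum to the number of images. The `COUNTS` dictionary and the `gender_shares` helper are illustrative names, not part of any released code.

```python
# Male and female image counts copied from the summary table.
COUNTS = {
    "IMDB": (263_214, 189_047),
    "WIKI": (49_698, 12_606),
    "Scholar": (959, 895),
    "OUI": (5_770, 6_440),
    "Twitter": (1_534, 1_792),
    "Gender Shades": (740, 566),
}


def gender_shares(male: int, female: int) -> tuple[float, float]:
    """Return (male %, female %) over the images that carry a gender label."""
    labeled = male + female
    return 100 * male / labeled, 100 * female / labeled


for name, (male, female) in COUNTS.items():
    male_pct, female_pct = gender_shares(male, female)
    print(f"{name:<14} male {male_pct:5.1f}%  female {female_pct:5.1f}%")
```

For example, the Scholar counts (959 male, 895 female) reproduce the 51.7% / 48.3% split given in that dataset's description, while the WIKI counts show a strong male skew of roughly 80%.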
Dataset Descriptions
IMDB Dataset
The dataset contains 460,723 facial images and names of the 100,000 most popular actors listed on the IMDB website. The images are sourced from movies, interviews, and film festivals. Although they vary in quality and pose, most images show full faces.
Wikipedia Dataset
The dataset contains 62,359 profile images and names of prominent figures from different fields (e.g., politics, social events, and the film industry), extracted from their Wikipedia pages. The images are high quality, well posed, and often official, and they show full faces. Together with the IMDB dataset, it is considered the largest dataset of face images with age information.
Scholar Dataset
The dataset contains images of 1,854 individuals along with their full names, countries, and gender; 51.7% are male and 48.3% female. The individuals are a random sample of academics [Karimi et al. 2016]; their images were collected from Google Images and manually labeled for gender. All images are high quality and show the individuals' full faces.
OUI-Adience Dataset
The OUI dataset is a public dataset of 19,370 facial images. The images were collected from Flickr albums uploaded from smartphones under a Creative Commons license, and they were taken in near real-world conditions without careful preparation or posing. As a result, they depict a wide variety of poses, lighting conditions, noise levels, and facial accessories.
Twitter Dataset
The Twitter dataset contains 3,326 images, of which 53.8% depict females and 46.2% depict males. It is considered the first multilingual dataset annotated with gender, age, and organization status [Wang et al. 2019]. It contains the profile images and screen names of up to 200 randomly selected Twitter users for each of the following 32 languages (ISO 639-1 codes): en, cs, fr, nl, ar, ro, bs, da, it, pt, no, es, hr, tr, de, fi, el, he, ru, bg, hu, sk, et, pl, lv, sl, lt, ga, eu, mt, is, rm, cy. The images vary widely and include slogans, icons, selfies, panoramic views, and more.
Gender Shades Dataset
The dataset contains 1,270 images of unique individuals with a range of skin types. The subjects are male and female parliamentarians from six countries: three African (South Africa, Senegal, Rwanda) and three European (Finland, Sweden, Iceland). The images are annotated with gender and skin type, which allows us to benchmark the performance of methods on fine-grained intersectional groups of gender and skin type: darker females, darker males, lighter females, and lighter males.
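The intersectional benchmark can be sketched as grouping predictions by (skin group, gender) and computing accuracy within each group. This is a minimal sketch under stated assumptions: the `Sample` record and its fields are hypothetical, and skin type is assumed to be given as a Fitzpatrick type from 1 to 6, binned into lighter (types 1 to 3) and darker (types 4 to 6), following the grouping used in the Gender Shades study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical per-image record; field names are illustrative."""
    gender: str       # ground-truth label: "male" or "female"
    fitzpatrick: int  # assumed skin-type annotation: Fitzpatrick type 1..6
    predicted: str    # classifier output: "male" or "female"


def skin_group(fitzpatrick: int) -> str:
    """Assumed binning: types 1-3 are 'lighter', types 4-6 are 'darker'."""
    if not 1 <= fitzpatrick <= 6:
        raise ValueError(f"invalid Fitzpatrick type: {fitzpatrick}")
    return "lighter" if fitzpatrick <= 3 else "darker"


def intersectional_accuracy(samples: list[Sample]) -> dict[tuple[str, str], float]:
    """Accuracy for each (skin group, gender) subgroup present in `samples`."""
    correct: dict[tuple[str, str], int] = defaultdict(int)
    total: dict[tuple[str, str], int] = defaultdict(int)
    for s in samples:
        key = (skin_group(s.fitzpatrick), s.gender)
        total[key] += 1
        correct[key] += int(s.predicted == s.gender)
    return {key: correct[key] / total[key] for key in total}


# Toy predictions, invented purely to show usage.
samples = [
    Sample("female", 5, "male"),
    Sample("female", 6, "female"),
    Sample("male", 2, "male"),
    Sample("female", 1, "female"),
]
acc = intersectional_accuracy(samples)
for (skin, gender), value in sorted(acc.items()):
    print(f"{skin} {gender}s: {value:.0%}")
```

Reporting accuracy per subgroup, rather than one overall figure, exposes gaps that an aggregate score can hide when one group dominates the data.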