Politician Map in English Wikipedia
Parsing DBpedia Data
The main goal is to retrieve a list of all politicians from DBpedia as a starting point for the Wikipedia parsing team. Additional infobox data for each politician, such as age, date of birth, and nationality, is also extracted.
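The retrieval step described above could be sketched roughly as follows. This is only an illustration, not the project's actual query: it assumes politicians are typed as dbo:Politician in the DBpedia ontology, that date of birth and nationality are available as dbo:birthDate and dbo:nationality, and that results come back in the standard SPARQL JSON results format. The sample response and the helper name parse_bindings are made up for the example, and no network request is made.

```python
import json

# Assumption: politicians are typed dbo:Politician and carry dbo:birthDate /
# dbo:nationality where known. The project's real query is not shown in the text.
POLITICIAN_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?birthDate ?nationality WHERE {
  ?person a dbo:Politician .
  OPTIONAL { ?person dbo:birthDate ?birthDate }
  OPTIONAL { ?person dbo:nationality ?nationality }
}
"""


def parse_bindings(response_text):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(response_text)
    rows = []
    for binding in data["results"]["bindings"]:
        # Each cell looks like {"type": ..., "value": ...}; keep only the value.
        # Variables that were unbound (OPTIONAL misses) are simply absent.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written, made-up sample in the SPARQL JSON results layout.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["person", "birthDate", "nationality"]},
    "results": {"bindings": [
        {
            "person": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Politician_A"},
            "birthDate": {"type": "typed-literal", "value": "1950-01-01"},
        },
        {
            "person": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Politician_B"},
        },
    ]},
})

rows = parse_bindings(SAMPLE_RESPONSE)
```

In practice the query string would be sent to a SPARQL endpoint and the response body passed to parse_bindings; age can then be derived from the date of birth where one is present.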
Parsing Wikipedia
The main goal of the Wikipedia parsing team is to retrieve the internal links between articles about politicians, using one revision per month for every year from 2001 to 2016. A set of outgoing edge lists is then generated, which represents the network of connected politicians.
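As a rough illustration of how one revision's wikitext can be turned into an outgoing edge list, the sketch below extracts internal links with a regular expression and keeps only targets that are known politicians. The regular expression, the title normalization, and the function names are assumptions for the example; the team's actual parser is not described in the text, and real wikitext (templates, redirects, nested links) needs more care than this.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target title without section or label.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize(title):
    """Simplified title normalization: underscores to spaces, first letter upper-case."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def outgoing_edges(source, wikitext, politicians):
    """Return sorted (source, target) pairs for links that point at known politicians."""
    targets = set()
    for match in WIKILINK.finditer(wikitext):
        target = normalize(match.group(1))
        if target in politicians and target != source:
            targets.add(target)  # repeated links to the same page count once
    return sorted((source, target) for target in targets)


# Placeholder titles, not real articles.
politicians = {"Politician A", "Politician B", "Politician C"}
sample = (
    "Worked with [[Politician B|B]] and [[politician_C#Career]] in [[Berlin]]. "
    "See also [[Politician B]] and [[File:x.jpg|thumb|caption]]."
)
edges = outgoing_edges("Politician A", sample, politicians)
```

Running the same function over each monthly revision would give one edge list per snapshot of the network.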
Network Analysis
The plot shows that the number of male politician Wiki pages stays significantly higher than the number of female politician Wiki pages for the whole observation period. As the total number of nodes increases over the years, so does the absolute size of the gap.
However, the relative ratio is slowly declining. As of December 2016, there are more than five times as many male politician Wiki pages as female ones, whereas in 2006 there were more than seven times as many.
The first plot shows that females link to females more than males. Males link to females less than males. The percentage of female-to-female links is the number of links from females to females divided by the total number of links from females (to males and females combined). The percentage of male-to-female links is the number of links from males to females divided by the total number of links from males (to males and females combined).
The second plot shows that males link to males more than females. Females link to males less than females. The percentage of male-to-male links is the number of links from males to males divided by the total number of links from males (to males and females combined). The percentage of female-to-male links is the number of links from females to males divided by the total number of links from females (to males and females combined).
In short, the homophily among female nodes is stronger than the homophily among male nodes.
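The four percentages defined above can be computed from an edge list and a gender lookup, as in the following minimal sketch. The function name link_shares, the gender labels, and the toy data are illustrative assumptions; shares are returned as fractions between 0 and 1 rather than percentages.

```python
from collections import Counter


def link_shares(edges, gender):
    """For each source gender, return the fraction of its links going to each target gender."""
    counts = Counter((gender[source], gender[target]) for source, target in edges)
    shares = {}
    for src in ("female", "male"):
        # Denominator: all links leaving pages of this gender.
        total = counts[(src, "female")] + counts[(src, "male")]
        for dst in ("female", "male"):
            shares[(src, dst)] = counts[(src, dst)] / total if total else 0.0
    return shares


# Toy network with made-up nodes.
gender = {"a": "female", "b": "female", "c": "male", "d": "male"}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "a"), ("d", "c")]
shares = link_shares(edges, gender)
```

Because both shares for a given source gender use the same denominator, the female-to-female and female-to-male shares always sum to one, and likewise for male sources.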
The plot shows that, according to the in-degree centrality measure, men are significantly more central.
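A minimal sketch of the comparison behind this plot: count incoming links per page, normalize, and average by gender. Dividing by the number of other nodes (n - 1) is one common normalization and is an assumption here, since the text does not say whether raw or normalized in-degrees were used; the function names and toy data are likewise illustrative.

```python
from collections import Counter


def in_degree_centrality(nodes, edges):
    """In-degree of each node divided by the number of other nodes (assumed normalization)."""
    indegree = Counter(target for _, target in edges)
    scale = len(nodes) - 1
    return {node: (indegree[node] / scale if scale > 0 else 0.0) for node in nodes}


def mean_by_gender(values, gender):
    """Average a per-node value within each gender group."""
    groups = {}
    for node, value in values.items():
        groups.setdefault(gender[node], []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


# Toy network with made-up nodes.
gender = {"a": "female", "b": "female", "c": "male", "d": "male"}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "a"), ("d", "c")]
centrality = in_degree_centrality(list(gender), edges)
means = mean_by_gender(centrality, gender)
```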
The second plot shows error bars. The distribution of in-degrees is summarized by a single data point for the mean value, with error bars representing the spread of the in-degrees. Based on the error bars, the variance of in-degrees increases over the time period, and the variance of male in-degrees increases more than that of female in-degrees.
The third plot is a bar chart of average in-degrees over the period, with error bars showing how the variance changes. In this plot, only one revision per year is considered.
The error bars show the standard error, which is calculated by dividing the standard deviation by the square root of the number of measurements that make up the mean (the number of revisions).
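The standard error calculation described above can be written in a few lines. This sketch assumes the sample standard deviation (n - 1 in the denominator); the text does not say whether the sample or population form was used, and the input values are made up.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error), where standard error = standard deviation / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    if n < 2:
        return mean, 0.0  # spread is undefined for a single measurement
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return mean, sd / math.sqrt(n)


# Made-up measurements, e.g. one average in-degree per revision.
mean, stderr = mean_and_standard_error([2, 4, 4, 6])
```

The returned pair maps directly onto a plotted point and the half-length of its error bar.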
Male politician Wiki pages are more likely to be linked to than female ones, which suggests that male politicians are on average more popular or more noticeable. Another reason is that there are more male nodes than female nodes, which, combined with the measurable level of homophily in the network (as pointed out earlier), means that male nodes receive more inlinks.
This holds over the full observation period, although the relative difference is decreasing very slowly over the years.
According to the k-core measure, men are more likely to be in well-connected subnetworks than women.
The second plot shows error bars, which indicate the distribution of k-core values. Instead of showing only a single data point for the mean value, error bars are added to represent the spread of the data. Based on the error bars, the variance of k-core values increases over the time period, and the variance of male k-core values increases more than that of female ones.
The error bars show the standard error, which is calculated by dividing the standard deviation by the square root of the number of measurements that make up the mean (the number of revisions).
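For readers unfamiliar with the k-core measure, the sketch below computes each node's core number (the largest k such that the node belongs to a subgraph in which every node has at least k neighbours) by repeatedly removing the node of lowest remaining degree. It assumes links are treated as undirected, which the text does not state, and it uses a simple quadratic-time loop for clarity rather than an optimized algorithm. Nodes without any links do not appear in the result.

```python
def core_numbers(edges):
    """Core number per node, treating every link as undirected (an assumption)."""
    adjacency = {}
    for source, target in edges:
        if source == target:
            continue  # ignore self-links
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])  # core numbers never decrease during peeling
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Toy network: a triangle (a, b, c) with one pendant node d attached to c.
gender = {"a": "female", "b": "male", "c": "male", "d": "female"}
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
core = core_numbers(edges)

# Average core number per gender, as compared in the plots.
by_gender = {}
for node, value in core.items():
    by_gender.setdefault(gender[node], []).append(value)
mean_core = {label: sum(vals) / len(vals) for label, vals in by_gender.items()}
```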
UI/UX Documentation
The UI/UX team is in charge of designing and developing a web interface that presents the network graph and statistics to end users. The team is also responsible for integrating the inputs from both the Wikipedia parsing and network analysis teams.
The team was subdivided into two groups: one to design the website framework and interface, and the other to develop the visualization of the network graph.