Using Gephi and Python to identify the strength of female professional networks
In data science terms, diversity and inclusion (D&I) has become increasingly important for people teams. Many multinationals have launched robust programs related to gender and minority development, pay equality, and disability access. Though executive sponsorship is high, the quality of data insights often trails behind that commitment. Typically, leaders rely on lagging indicators such as pay distribution charts, high potential selection, and succession plan coverage to monitor success. More advanced organizations have introduced machine learning models related to D&I outcomes. However, the reliance on historical data to train and test the algorithms often skews the results.
This is where organizational network analysis becomes powerful. Overall, it’s important to remember that the breadth, depth, and quality of an employee’s professional network is a leading indicator. From an analytics perspective, understanding how employees are building their internal and external networks is critically important. This includes outside platforms such as LinkedIn and Google Scholar, alongside internal connectors such as executive influencers and strategic locations. From a D&I perspective, the goal is to understand how employee professional networks differ. It is also to identify which company factors adversely impact gender and minority candidates, including those with disabilities.
Please consider the following question: If a male and female top performer are placed on the same succession plan, how do you know if their development plans are equivalent?
Typically, the approach is to look at planned assignments, training courses, and leadership assessments. As part of this, companies may overlay a mentoring program to drive executive exposure. Though useful, these tactics rely on company-directed initiatives, without accounting for organic growth. From a data perspective, this leads to lagging indicators, which makes it difficult for people teams to innovate.
What if you asked a follow-up question: assuming the same scenario, what is the impact if a male top performer has a stronger internal network than a female top performer?
Before looking at a single visualization, most leaders would intuitively know that the female top performer would be disadvantaged. From a D&I perspective, the goal of organizational network analysis is to track the relative strength between employee populations, while creating connection opportunities for target segments.
TECHNICAL NOTE: This article covers the basics related to network theory, including nodes, edges, and degree centrality. However, detailed coding has been excluded to keep the content strategic. If you have a question about how to use Python to establish Gephi files for LinkedIn or Email, please post a comment and I’ll follow up with a technical paper that includes programming examples.
Before Starting: Understanding Graph Theory
In summary, think of a graph as a set of data arranged into two groups: NODES and EDGES. Nodes represent the actors in an organizational network, such as individuals or organizations. In a network visualization, larger nodes typically indicate more influential actors. However, nodes can represent anything that links entities together, such as computers, cities, or political parties. Edges, by comparison, indicate the presence of a connection between two nodes. In an organizational network, examples include a supervisor-employee relationship, membership in a professional society, or an account manager interfacing with a key customer. Importantly, graph theory distinguishes between directed and undirected connections. Specifically, a directed graph has edges that indicate a one-way relationship from a source node to a target node. By contrast, an undirected graph has edges that run in both directions, allowing for movement either way between the two nodes.
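For readers who want to see the idea in code, the following is a minimal Python sketch of nodes, edges, and the directed vs. undirected distinction. The names, the sample relationships, and the `build_adjacency` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical sample data: names and relationships are invented for illustration.
nodes = ["Ana", "Ben", "Cara"]
edges = [("Ana", "Ben"), ("Ben", "Cara")]  # (source, target) pairs


def build_adjacency(nodes, edges, directed=False):
    """Map each node to the set of nodes it links to."""
    adjacency = {node: set() for node in nodes}
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # An undirected edge can be traversed in both directions.
            adjacency[target].add(source)
    return adjacency


directed_graph = build_adjacency(nodes, edges, directed=True)
undirected_graph = build_adjacency(nodes, edges, directed=False)

print(directed_graph["Ben"])    # only the nodes Ben points to
print(undirected_graph["Ben"])  # everyone Ben shares an edge with
```

In the directed version, Ben links only to Cara, because the Ana-to-Ben edge runs one way. In the undirected version, Ben is connected to both Ana and Cara.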
For HR professionals, it’s important to understand the following concepts:
Degree Centrality: counts the number of links that are tied to a node. In other words, it shows which nodes are the most connected in a network, which is a common first proxy for importance. In a network graph, you’ll recognize the key influencers by the relative size of the node, alongside the number of connectors.
Betweenness Centrality: represents the number of times a node (i.e. key influencer) acts as a bridge on the shortest path between two other nodes. Nodes with the highest betweenness centrality have the strongest control over how information flows through the network.
Network Density: compares the connections that actually exist to the total possible connections in the network. For example, if an employee network has 100 participants, then each person has the potential of connecting with 99 individuals, and the network as a whole has 4,950 possible connections (100 × 99 / 2). The density calculation evaluates how many “actual connections” have been achieved out of that total. The same ratio can be applied to an individual: a person with 15 connections has reached roughly 15% (15/99) of their potential network.
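The three concepts above can be sketched in plain Python. This is an illustrative sketch, not the calculation Gephi runs internally. It assumes a small, unweighted, undirected network stored as a dictionary of neighbour sets, and all names are hypothetical. Betweenness is computed with Brandes' algorithm and left unnormalized.

```python
from collections import deque

# Hypothetical undirected network, stored as node -> set of neighbours.
graph = {
    "Ana": {"Ben", "Cara"},
    "Ben": {"Ana", "Cara", "Dev"},
    "Cara": {"Ana", "Ben"},
    "Dev": {"Ben", "Eli"},
    "Eli": {"Dev"},
}


def degree_centrality(graph):
    """Share of the other n - 1 nodes that each node is linked to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def network_density(graph):
    """Actual edges divided by the n * (n - 1) / 2 possible undirected edges."""
    n = len(graph)
    edge_count = sum(len(neighbours) for neighbours in graph.values()) / 2
    return edge_count / (n * (n - 1) / 2)


def betweenness_centrality(graph):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in graph[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        # Walk back from the farthest nodes, accumulating each node's share.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair is visited from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


print(degree_centrality(graph))
print(network_density(graph))
print(betweenness_centrality(graph))
```

In this sample, Ben has the highest degree centrality, while Dev scores well on betweenness despite having only two links, because Dev is the only route to Eli. That contrast is the reason to track both metrics rather than degree alone.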
Step 1: Identify your target population for network analysis and segment talent pools
To conduct your D&I study, it’s important to establish both a male and female data population. To do this, analysts should identify a list of top performers at various leadership levels including front-line Supervisor, Director and Vice President. These employees should be segmented by gender, allowing for separate network graphs to be created. Importantly, the goal is to identify the key influencers, location clusters, and strategic initiatives that connect employees across internal teams. Likewise, using LinkedIn and Google Scholar helps determine the external connections that drive industry contribution. As discussed, understanding an employee’s network strength is a leading indicator of development. This is particularly important as individuals assume higher leadership roles.
Complete network graphs for both male top performers and D&I talent pools
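As a rough sketch of the segmentation in Step 1, the snippet below groups top performers by gender and leadership level. The record layout, field names, and the `segment_talent_pools` helper are assumptions made for illustration. A real HR extract will look different.

```python
from collections import defaultdict

# Hypothetical HR extract; field names and values are assumptions.
employees = [
    {"id": "E1", "gender": "F", "level": "Director", "top_performer": True},
    {"id": "E2", "gender": "M", "level": "Director", "top_performer": True},
    {"id": "E3", "gender": "F", "level": "Vice President", "top_performer": True},
    {"id": "E4", "gender": "M", "level": "Supervisor", "top_performer": False},
]


def segment_talent_pools(employees):
    """Group top performers by (gender, level) so each pool can be graphed separately."""
    pools = defaultdict(list)
    for person in employees:
        if person["top_performer"]:
            pools[(person["gender"], person["level"])].append(person["id"])
    return dict(pools)


pools = segment_talent_pools(employees)
for (gender, level), members in sorted(pools.items()):
    print(gender, level, members)
```

Keying each pool on both gender and level keeps the later comparison like-for-like, for example female Directors against male Directors.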
Step 2: Create a Gephi File for both the internal and external network
Using Python, the next step is to convert data from external platforms such as LinkedIn and Google Scholar into a format that Gephi can import. This means creating two .csv files: one listing the network’s nodes and one listing its edges. As part of this, analysts should determine if weights should be assigned to the edges to indicate relative importance. Analysts can also add attributes such as employee type, location, assigned succession plan and so forth. Once these files are created, the next step is to repeat the process for the internal network. This includes extracting company email data, alongside data from other corporate platforms such as professional societies, employee blogs, and mentoring databases. Once you have generated the Gephi files for both the external and internal networks, the final activity is to segment the files into employee populations. For D&I analysis, it’s useful to evaluate male top performers against female employees.
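A minimal sketch of the file creation step is shown below, using only Python's standard `csv` module. It assumes the column headers that Gephi's spreadsheet import conventionally recognizes (`Id` and `Label` for nodes; `Source`, `Target`, `Type`, and `Weight` for edges). The sample records, the extra attribute columns, and the `write_table` helper are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical records; in practice these would come from a platform export.
people = [
    {"Id": "E1", "Label": "Ana", "Gender": "F", "Location": "Austin"},
    {"Id": "E2", "Label": "Ben", "Gender": "M", "Location": "Boston"},
]
connections = [
    # Weight is an assumed measure of tie strength, e.g. messages exchanged.
    {"Source": "E1", "Target": "E2", "Type": "Undirected", "Weight": 3},
]


def write_table(path, rows):
    """Write a list of dicts to CSV, using the first row's keys as the header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# A temporary folder is used here; point this at a project folder in practice.
output_dir = Path(tempfile.mkdtemp())
write_table(output_dir / "nodes.csv", people)
write_table(output_dir / "edges.csv", connections)
print(output_dir)
```

Extra columns such as `Gender` and `Location` travel with each node, which makes it possible to filter or color the graph by those attributes later.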
Step 3: Analyze the network for centrality, density and betweenness (shortest path)
From an analytics perspective, it’s critical that each employee population maintains the same statistical assumptions. This means that choices regarding graph layout, degree centrality and network density calculations should be the same. Furthermore, it’s important that analysts eliminate unnecessary noise from the graph. This means filtering out nodes with limited connections, alongside sizing the network in a manner that highlights key influencers, strategic locales, and other clusters.
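One way to keep the assumptions identical is to define each filter once and apply it to every population. The sketch below removes nodes with limited connections using a single shared threshold. The `MIN_DEGREE` value, the sample networks, and the `filter_by_degree` helper are assumptions for illustration.

```python
MIN_DEGREE = 2  # assumed threshold; choose one value and reuse it for every population


def filter_by_degree(graph, min_degree):
    """Keep nodes with at least min_degree links and drop edges to removed nodes."""
    kept = {node for node, neighbours in graph.items() if len(neighbours) >= min_degree}
    return {node: graph[node] & kept for node in kept}


# Hypothetical networks, stored as node -> set of neighbours.
male_graph = {
    "M1": {"M2", "M3", "M4"},
    "M2": {"M1", "M3"},
    "M3": {"M1", "M2"},
    "M4": {"M1"},
}
female_graph = {
    "F1": {"F2", "F3"},
    "F2": {"F1", "F3"},
    "F3": {"F1", "F2"},
    "F4": set(),
}

# The same threshold is applied to both populations.
male_core = filter_by_degree(male_graph, MIN_DEGREE)
female_core = filter_by_degree(female_graph, MIN_DEGREE)
print(sorted(male_core), sorted(female_core))
```

Note that this is a single-pass filter: a node that survives may end up below the threshold once its low-degree neighbours are removed. Whether to repeat the filter is a judgment call, as long as the same choice is made for both graphs.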
Step 4: Identify themes that positively impact the networks of top performers
Once you’ve generated network graphs for both male and female employees, the next step is to evaluate their characteristics side-by-side. Importantly, the goal is to identify key themes that are strengthening certain performers, while weakening others. Though it’s common for males to have stronger connections, it’s important that analysts objectively analyze the graphs without prejudgments. Overall, the following categories are useful when evaluating network strength. For each element, analysts should calculate the delta between the male and female graphs. The goal is to identify gaps that can be addressed either through training, mentoring, or policy adjustments.
Identify the Principal Nodes (i.e. Key Influencers) and their key connections.
Evaluate the Geographic and Team Clusters that drive organizational activity.
Identify the Professional Societies and Industry Conferences that are most impactful.
Calculate the path distance between key influencers and junior employees.
Compare incumbent vs. successors, including differences in density and centrality.
Analyze the delta between the internal vs. external network strength of employees.
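Two of the checks above lend themselves to a short sketch: the path distance between a key influencer and a junior employee, and the delta between male and female metrics. The network, the metric values, and both helper functions are hypothetical. The numbers are placeholders, not results.

```python
from collections import deque


def path_distance(graph, start, goal):
    """Number of edges on the shortest path from start to goal, or None if unreachable."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return distance[current]
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


def metric_delta(male_metrics, female_metrics):
    """Male value minus female value for every metric in the male summary."""
    return {name: male_metrics[name] - female_metrics[name] for name in male_metrics}


# Hypothetical undirected network, stored as node -> set of neighbours.
graph = {
    "VP": {"Director"},
    "Director": {"VP", "Supervisor"},
    "Supervisor": {"Director", "Analyst"},
    "Analyst": {"Supervisor"},
    "Contractor": set(),
}

# Placeholder summary metrics for two populations (not real findings).
male_metrics = {"density": 0.5, "average_degree": 4.0}
female_metrics = {"density": 0.25, "average_degree": 2.5}

print(path_distance(graph, "VP", "Analyst"))
print(path_distance(graph, "VP", "Contractor"))
print(metric_delta(male_metrics, female_metrics))
```

A positive delta means the male graph scores higher on that metric. A `None` path distance flags an employee who has no route at all to the influencer, which is arguably the most important gap to surface.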
Combined, these trends should present a compelling narrative when comparing male vs. female top performers. Ideally, professional networks should maintain a similar breadth, depth, and quality for employees assigned to the same succession plan. In the case of D&I, identifying differences is a powerful starting point in designing programs that help female leaders develop. Moreover, the graphs serve as a quality check for existing programs.
Final Step: Recommendations
Once you’ve completed the Gephi analysis, the process for developing recommendations is relatively straightforward. Specifically, the objective is to identify network elements that are present in top performers, but comparatively weak or non-existent among female D&I talent pools. In this way, People Teams can design interventions that help employees break into key networks. As discussed, the presence of a robust internal and external network is a leading indicator of development. Therefore, the goal is to identify those principal influencers (Nodes), alongside the key communication clusters (Edges) that drive operations. This should include both external networks such as LinkedIn and Google Scholar, alongside internal platforms. Once complete, HR Professionals can determine which barriers exist that are limiting the quality of D&I networks. Examples include barriers to entry (e.g. lack of a mentor), missing skills, the need for a critical assignment, and/or misaligned company policies. Ultimately, the more D&I networks are calibrated against top performers, the easier it will be for employees to grow. Lastly, the visualization is both powerful and intuitive, making it easier for executives to buy in.
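The gap analysis described above can be sketched as a simple set comparison: which influencers are connected to the benchmark pool but not to the target pool? Every identifier below, including the `influencers_reached` helper and the sample edges, is hypothetical.

```python
def influencers_reached(edges, pool, influencers):
    """Return the influencers that share at least one edge with a member of the pool."""
    reached = set()
    for source, target in edges:
        if source in pool and target in influencers:
            reached.add(target)
        if target in pool and source in influencers:
            reached.add(source)
    return reached


# Hypothetical undirected edges between employees and key influencers.
edges = [("M1", "CFO"), ("M2", "COO"), ("F1", "CFO"), ("M1", "CTO")]
influencers = {"CFO", "COO", "CTO"}
benchmark_pool = {"M1", "M2"}  # e.g. male top performers
target_pool = {"F1"}           # e.g. female D&I talent pool

# Influencers the benchmark pool can reach but the target pool cannot.
gap = (influencers_reached(edges, benchmark_pool, influencers)
       - influencers_reached(edges, target_pool, influencers))
print(sorted(gap))
```

Each name in the resulting gap is a candidate for a targeted intervention, such as a mentoring match or a critical assignment.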
Network analysis is a great leading indicator of D&I progress, which can be balanced against lagging indicators such as succession planning strength and high potential selection. Taken together, People Teams can use this information to form a comprehensive view of development. In this article, female opportunities were emphasized. However, the same approach can be used to evaluate minority leaders and employees with disabilities.