5 Working with Network Data

Importing, cleaning, and managing network data from multiple sources

Author

Dr Clemens Jarnach

Published

March 18, 2026

We will not always generate our own graph objects. As social scientists, we usually want to analyse social networks observed in the real world, so we need to learn how to deal with real network data, which rarely arrives in a tidy format. This chapter addresses the practical challenge of converting raw data — such as edge lists from APIs, adjacency matrices from surveys, and bipartite data from affiliations — into usable network objects in R. You will learn how to import, validate, and manipulate network data from multiple sources.

Setup

Loading Packages

library(igraph)
library(tidyverse)
library(tidygraph)
library(ggraph)
library(ergm)      # provides florentine dataset as network class objects
library(network)   # underlies ergm; provides as.matrix() for network objects

Data Formats: A Conceptual Overview

Network data arrives in three common representations, each with distinct structural properties and appropriate use cases:

Format	Structure	Typical Source
Adjacency Matrix	\(n \times n\) matrix; cell \((i,j) = 1\) if an edge exists	Survey instruments, sociometric data
Edge List	Two-column data frame of sender–receiver pairs	API exports, relational databases, CSV files
Incidence Matrix	\(n \times m\) matrix linking actors to events	Affiliation records, membership data

The choice of representation is rarely arbitrary: it reflects the data collection instrument, the storage system, and the analytical tradition of the field. Familiarity with all three formats — and the conversion pathways between them — is essential for applied network research.
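
To make the conversion pathways concrete, the sketch below builds one small undirected graph from an edge list, recovers its adjacency matrix, and then converts back. The three-person network (ann, ben, cat) and the object names are invented for illustration and are not part of the chapter's datasets.

```r
library(igraph)

# Hypothetical toy edge list: ann-ben and ben-cat are tied
el_toy <- data.frame(
  from = c("ann", "ben"),
  to   = c("ben", "cat")
)

# Edge list -> igraph
g_toy <- graph_from_data_frame(el_toy, directed = FALSE)

# igraph -> dense adjacency matrix (sparse = FALSE returns a base R matrix)
adj_toy <- as_adjacency_matrix(g_toy, sparse = FALSE)

# Adjacency matrix -> igraph -> edge list again
g_back  <- graph_from_adjacency_matrix(adj_toy, mode = "undirected")
el_back <- as_edgelist(g_back)

adj_toy
el_back
```

Each undirected tie appears twice in the adjacency matrix (once in each triangle) but only once in the edge list, which is why the matrix must be symmetric for undirected data.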

Part 1: Adjacency Matrices — The Florentine Families

The Dataset

The Florentine families network is a canonical dataset in social network analysis, originally compiled by Padgett and Ansell (1993) from historical records of fifteenth-century Florence. It encodes marriage and business alliances among 16 elite Florentine families during the Medici’s rise to political dominance. The dataset has become a benchmark for studying how social structure shapes political outcomes.

The ergm package provides this data as a pair of network class objects — the native format of the statnet software suite — giving us a realistic starting point for demonstrating data conversion.

# Load the florentine families data
data(florentine)

# Two network objects are now available:
# flomarriage  — marriage alliance network
# flobusiness  — business partnership network

class(flomarriage)

[1] "network"

# Inspect the network object's summary
flomarriage

 Network attributes:
  vertices = 16 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 20 
    missing edges= 0 
    non-missing edges= 20 

 Vertex attribute names: 
    priorates totalties vertex.names wealth 

No edge attributes

The printed summary reveals key structural properties: 16 vertices, 20 edges, undirected, and unweighted. It also lists vertex attributes stored within the network object — including wealth and priorates (number of seats held on the city council) — which we will extract shortly.

Extracting the Adjacency Matrix

The network class is the native format of the statnet suite but is not directly compatible with igraph. The standard conversion pathway proceeds via the adjacency matrix: we extract it using as.matrix() and then pass it to igraph.

# Extract the adjacency matrix from the network object
flo_mat <- as.matrix(flomarriage)

# Inspect structure
dim(flo_mat)

[1] 16 16

flo_mat[1:6, 1:6]

           Acciaiuoli Albizzi Barbadori Bischeri Castellani Ginori
Acciaiuoli          0       0         0        0          0      0
Albizzi             0       0         0        0          0      1
Barbadori           0       0         0        0          1      0
Bischeri            0       0         0        0          0      0
Castellani          0       0         1        0          0      0
Ginori              0       1         0        0          0      0

The matrix is symmetric (confirming undirected ties), with 1s indicating marriage alliances and 0s indicating no recorded alliance. Row and column names are automatically preserved from the network object’s vertex names.
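
Properties such as symmetry can be checked in code rather than by eye before a matrix is handed to igraph. The following is a minimal sketch using base R only; the 3 x 3 matrix m is a hypothetical stand-in for flo_mat, and the particular set of checks is an assumption about what is worth testing for an undirected, unweighted network.

```r
# Hypothetical 3 x 3 adjacency matrix standing in for flo_mat
m <- matrix(
  c(0, 1, 0,
    1, 0, 1,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)

checks <- c(
  square       = nrow(m) == ncol(m),
  symmetric    = isSymmetric(m),                       # undirected ties
  binary       = all(m %in% c(0, 1)),                  # unweighted ties
  no_self_ties = all(diag(m) == 0),                    # empty diagonal
  names_match  = identical(rownames(m), colnames(m))   # same actors on both margins
)

checks
```

A failed check is not necessarily an error: an asymmetric matrix may simply indicate directed ties, in which case mode = "directed" would be the appropriate choice at the conversion step.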

Converting to igraph

igraph’s graph_from_adjacency_matrix() handles this conversion directly. The mode argument controls directionality: "undirected" collapses the symmetric matrix into a single edge per pair. The diag argument is separate: diag = FALSE tells igraph to ignore the diagonal, so no self-loops are created.

g_flo <- graph_from_adjacency_matrix(
  flo_mat,
  mode     = "undirected",
  weighted = NULL,
  diag     = FALSE
)

g_flo

IGRAPH 53f7d4d UN-- 16 20 -- 
+ attr: name (v/c)
+ edges from 53f7d4d (vertex names):
 [1] Acciaiuoli--Medici       Albizzi   --Ginori      
 [3] Albizzi   --Guadagni     Albizzi   --Medici      
 [5] Barbadori --Castellani   Barbadori --Medici      
 [7] Bischeri  --Guadagni     Bischeri  --Peruzzi     
 [9] Bischeri  --Strozzi      Castellani--Peruzzi     
[11] Castellani--Strozzi      Guadagni  --Lamberteschi
[13] Guadagni  --Tornabuoni   Medici    --Ridolfi     
[15] Medici    --Salviati     Medici    --Tornabuoni  
+ ... omitted several edges

The printed summary confirms the correct structure: 16 vertices, 20 edges, undirected, unweighted. Node names are automatically imported from the matrix dimnames.

# Verify that vertex names are preserved
V(g_flo)$name

 [1] "Acciaiuoli"   "Albizzi"      "Barbadori"    "Bischeri"    
 [5] "Castellani"   "Ginori"       "Guadagni"     "Lamberteschi"
 [9] "Medici"       "Pazzi"        "Peruzzi"      "Pucci"       
[13] "Ridolfi"      "Salviati"     "Strozzi"      "Tornabuoni"

Direct Conversion with intergraph

An alternative one-step conversion from network to igraph is available via the intergraph package:

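# install.packages("intergraph")
# Stored under a separate name so that g_flo (built above) is not overwritten;
# the later join relies on g_flo's "name" vertex attribute.
g_flo_ig <- intergraph::asIgraph(flomarriage)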

Extracting Node Attributes from the Source Object

Before converting to tbl_graph, we extract the vertex-level attributes from the original network object. These are stored internally and accessible via network::get.vertex.attribute().

# Extract vertex attributes from the network object into a tibble
family_meta <- tibble(
  name      = network::get.vertex.attribute(flomarriage, "vertex.names"),
  wealth    = network::get.vertex.attribute(flomarriage, "wealth"),
  priorates = network::get.vertex.attribute(flomarriage, "priorates")
) |>
  mutate(
    # Simplified political faction grouping for illustration
    faction = case_when(
      name == "Medici"                                                     ~ "Medici Alliance",
      name %in% c("Albizzi", "Pazzi", "Strozzi",
                  "Castellani", "Bischeri", "Peruzzi", "Guadagni")        ~ "Oligarch",
      TRUE                                                                  ~ "Other"
    )
  )

family_meta

# A tibble: 16 × 4
   name         wealth priorates faction        
   <chr>         <dbl>     <dbl> <chr>          
 1 Acciaiuoli       10        53 Other          
 2 Albizzi          36        65 Oligarch       
 3 Barbadori        55         0 Other          
 4 Bischeri         44        12 Oligarch       
 5 Castellani       20        22 Oligarch       
 6 Ginori           32         0 Other          
 7 Guadagni          8        21 Oligarch       
 8 Lamberteschi     42         0 Other          
 9 Medici          103        53 Medici Alliance
10 Pazzi            48         0 Oligarch       
11 Peruzzi          49        42 Oligarch       
12 Pucci             3         0 Other          
13 Ridolfi          27        38 Other          
14 Salviati         10        35 Other          
15 Strozzi         146        74 Oligarch       
16 Tornabuoni       48         0 Other

Converting to tbl_graph and Joining Attributes

For tidyverse-compatible workflows, we convert the igraph object to a tbl_graph and join the metadata table using standard dplyr syntax. Within a tbl_graph, activate(nodes) or activate(edges) switches the active context for subsequent operations.

tg_flo <- as_tbl_graph(g_flo) |>
  activate(nodes) |>
  left_join(family_meta, by = "name")

tg_flo

# A tbl_graph: 16 nodes and 20 edges
#
# An undirected simple graph with 2 components
#
# A tibble: 16 × 4
  name       wealth priorates faction 
  <chr>       <dbl>     <dbl> <chr>   
1 Acciaiuoli     10        53 Other   
2 Albizzi        36        65 Oligarch
3 Barbadori      55         0 Other   
4 Bischeri       44        12 Oligarch
5 Castellani     20        22 Oligarch
6 Ginori         32         0 Other   
# ℹ 10 more rows
#
# A tibble: 20 × 2
   from    to
  <int> <int>
1     1     9
2     2     6
3     2     7
# ℹ 17 more rows

# Confirm all attributes are present
tg_flo |> activate(nodes) |> as_tibble()

# A tibble: 16 × 4
   name         wealth priorates faction        
   <chr>         <dbl>     <dbl> <chr>          
 1 Acciaiuoli       10        53 Other          
 2 Albizzi          36        65 Oligarch       
 3 Barbadori        55         0 Other          
 4 Bischeri         44        12 Oligarch       
 5 Castellani       20        22 Oligarch       
 6 Ginori           32         0 Other          
 7 Guadagni          8        21 Oligarch       
 8 Lamberteschi     42         0 Other          
 9 Medici          103        53 Medici Alliance
10 Pazzi            48         0 Oligarch       
11 Peruzzi          49        42 Oligarch       
12 Pucci             3         0 Other          
13 Ridolfi          27        38 Other          
14 Salviati         10        35 Other          
15 Strozzi         146        74 Oligarch       
16 Tornabuoni       48         0 Other
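
A left_join() leaves NA in the attribute columns of any node whose name has no match in the metadata table, so it can be worth checking name coverage before joining. The sketch below does this with base R's setdiff() on an invented three-node graph; g_chk, meta_chk, and the misspelled "Bob" are hypothetical and only illustrate the check.

```r
library(igraph)

# Hypothetical graph and metadata table; "Bo" is misspelled as "Bob" in the metadata
g_chk <- graph_from_data_frame(
  data.frame(from = c("Al", "Bo"), to = c("Bo", "Cy")),
  directed = FALSE
)
meta_chk <- data.frame(
  name   = c("Al", "Bob", "Cy"),
  wealth = c(10, 20, 30)
)

# Nodes with no metadata row (these would receive NA after a left join)
missing_meta <- setdiff(V(g_chk)$name, meta_chk$name)

# Metadata rows that match no node (these are silently dropped by a left join)
unused_meta <- setdiff(meta_chk$name, V(g_chk)$name)

missing_meta
unused_meta
```

When both vectors are empty, the names in the graph and the metadata table line up exactly and the join should attach a complete set of attributes.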

Visualisation

ggraph(tg_flo, layout = "fr") +
  geom_edge_link(colour = "grey65", width = 0.8) +
  geom_node_point(aes(size = wealth, colour = faction), alpha = 0.85) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3) +
  scale_size_continuous(range = c(3, 12), name = "Wealth\n(000s lire)") +
  scale_colour_manual(
    values = c(
      "Medici Alliance" = "#2166ac",
      "Oligarch"        = "#d73027",
      "Other"           = "#888888"
    ),
    name = "Faction"
  ) +
  theme_graph(base_family = "sans") +
  labs(
    title    = "Florentine Marriage Network",
    subtitle = "Node size = family wealth; colour = political faction (Padgett & Ansell, 1993)"
  )

Padgett and Ansell (1993) argue that the Medici’s structural position, not merely their wealth, explains their political dominance. Although the Strozzi were wealthier, the Medici occupied a more central brokerage position in the marriage network, connecting families that were otherwise not directly tied to one another.

Part 2: Edge Lists — Zachary’s Karate Club

The Dataset

Zachary’s Karate Club (Zachary, 1977) is a network of 78 friendship ties among the 34 members of a university karate club, collected over two years of participant observation. During the study period, the club underwent a factional split between the instructor (node 1, “Mr. Hi”) and the administrator (node 34, “John A.”), eventually dividing into two independent clubs. This network has become one of the most widely used benchmarks for community detection algorithms, because the ground-truth group memberships are known from the ethnographic record.

# Load the built-in igraph implementation of the Zachary dataset
karate_raw <- make_graph("Zachary")

vcount(karate_raw)   # number of members

[1] 34

ecount(karate_raw)   # number of friendship ties

[1] 78

Extracting and Inspecting the Edge List

The edge list is the most common format in which network data arrives from external sources: relational databases, API responses, and CSV files all naturally produce rows of sender–receiver pairs. igraph’s as_data_frame() extracts the edge list as a base R data frame (not a tibble), giving us a realistic starting point.

# Extract edge list as a data frame
el_karate <- igraph::as_data_frame(karate_raw, what = "edges")
head(el_karate, 10)

glimpse(el_karate)

Rows: 78
Columns: 2
$ from <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,…
$ to   <dbl> 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 3…

Each row represents one undirected friendship tie. Vertex identities are defined implicitly by their appearance in the from and to columns — there is no separate node table required for the basic conversion.

Converting a data.frame to igraph

We reconstruct the igraph object from the extracted edge list, demonstrating the standard import pathway for data arriving as a flat file. Any additional columns in the data frame are automatically treated as edge attributes.

# Reconstruct igraph from the edge list data frame
g_karate <- graph_from_data_frame(
  d        = el_karate,
  directed = FALSE
)

g_karate

IGRAPH 86c67b0 UN-- 34 78 -- 
+ attr: name (v/c)
+ edges from 86c67b0 (vertex names):
 [1] 1 --2  1 --3  1 --4  1 --5  1 --6  1 --7  1 --8  1 --9  1 --11
[10] 1 --12 1 --13 1 --14 1 --18 1 --20 1 --22 1 --32 2 --3  2 --4 
[19] 2 --8  2 --14 2 --18 2 --20 2 --22 2 --31 3 --4  3 --8  3 --28
[28] 3 --29 3 --33 3 --10 3 --9  3 --14 4 --8  4 --13 4 --14 5 --7 
[37] 5 --11 6 --7  6 --11 6 --17 7 --17 9 --31 9 --33 9 --34 10--34
[46] 14--34 15--33 15--34 16--33 16--34 19--33 19--34 20--34 21--33
[55] 21--34 23--33 23--34 24--26 24--28 24--33 24--34 24--30 25--26
[64] 25--28 25--32 26--32 27--30 27--34 28--34 29--32 29--34 30--33
+ ... omitted several edges

Joining Node Metadata

Vertex-level attributes — here, the faction membership from Zachary’s original field observations — are typically stored in a separate table and must be joined to the graph after construction.

# Faction membership from Zachary (1977): 1 = Mr. Hi, 2 = John A.
faction_vec <- c(
  1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2,
  1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
)

node_meta_karate <- tibble(
  name    = as.character(1:34),
  faction = ifelse(faction_vec == 1, "Mr. Hi", "John A."),
  is_key  = name %in% c("1", "34")   # the two principal actors
)

head(node_meta_karate, 10)

# A tibble: 10 × 3
   name  faction is_key
   <chr> <chr>   <lgl> 
 1 1     Mr. Hi  TRUE  
 2 2     Mr. Hi  FALSE 
 3 3     Mr. Hi  FALSE 
 4 4     Mr. Hi  FALSE 
 5 5     Mr. Hi  FALSE 
 6 6     Mr. Hi  FALSE 
 7 7     Mr. Hi  FALSE 
 8 8     Mr. Hi  FALSE 
 9 9     John A. FALSE 
10 10    John A. FALSE

# Convert to tbl_graph and join metadata
tg_karate <- as_tbl_graph(g_karate) |>
  activate(nodes) |>
  left_join(node_meta_karate, by = "name")

tg_karate |> activate(nodes) |> as_tibble() |> head()

# A tibble: 6 × 3
  name  faction is_key
  <chr> <chr>   <lgl> 
1 1     Mr. Hi  TRUE  
2 2     Mr. Hi  FALSE 
3 3     Mr. Hi  FALSE 
4 4     Mr. Hi  FALSE 
5 5     Mr. Hi  FALSE 
6 6     Mr. Hi  FALSE
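
An alternative to joining afterwards is to hand the node table to graph_from_data_frame() through its vertices argument. This attaches attributes at construction time and, unlike an edge list alone, retains nodes that have no ties. The sketch below uses an invented four-node example; edges_nodes_demo, nodes_demo, and the isolate "d" are hypothetical.

```r
library(igraph)

# Hypothetical edge list and node table; node "d" has no ties
edges_nodes_demo <- data.frame(from = c("a", "b"), to = c("b", "c"))
nodes_demo <- data.frame(
  name    = c("a", "b", "c", "d"),
  faction = c("x", "x", "y", "y")
)

# The first column of `vertices` is taken as the vertex name;
# the remaining columns become vertex attributes
g_nodes <- graph_from_data_frame(
  d        = edges_nodes_demo,
  directed = FALSE,
  vertices = nodes_demo
)

vcount(g_nodes)                              # counts the isolate "d" as well
V(g_nodes)$faction                           # attributes follow the node table order
names(which(igraph::degree(g_nodes) == 0))   # names of isolated vertices
```

With this approach the vertex order is set by the node table rather than by order of appearance in the edge list, which is one reason to match attributes by name rather than by position.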

Visualisation

ggraph(tg_karate, layout = "fr") +
  geom_edge_link(colour = "grey80", width = 0.6, alpha = 0.7) +
  geom_node_point(aes(colour = faction, shape = is_key), size = 5) +
  geom_node_text(
    aes(label = ifelse(is_key, paste0("Node ", name), "")),
    repel = TRUE, size = 4, fontface = "bold"
  ) +
  scale_colour_manual(
    values = c("Mr. Hi" = "#2166ac", "John A." = "#d73027"),
    name = "Faction"
  ) +
  scale_shape_manual(
    values = c("TRUE" = 17, "FALSE" = 16),
    guide  = "none"
  ) +
  theme_graph(base_family = "sans") +
  labs(
    title    = "Zachary's Karate Club Network",
    subtitle = "Colour = observed faction split; triangles mark the two principal actors (nodes 1 and 34)"
  )

The two factions are visually separable even without running a community detection algorithm, illustrating that the structural division in the network preceded the formal organisational split. This is a classic finding in the sociology of organisations.

Part 3: Bipartite Networks — Seminar Attendance

The Data Structure

A bipartite (or two-mode) network connects two distinct sets of nodes — typically actors and events — where edges run exclusively between sets, never within them. In social science research, this structure arises naturally from affiliation data: researchers attending seminars, legislators co-sponsoring bills, board members sharing directorships, or consumers purchasing products.

The typical representation is the incidence matrix (also called a membership matrix): rows are actors, columns are events, and cell \((i, j) = 1\) if actor \(i\) participated in event \(j\).

Constructing the Seminar Attendance Matrix

We represent a cohort of seven researchers and their attendance at five institute events.

# Define node sets
researchers <- c(
  "Alice Smith", "Bob Jones",   "Carol McGregor",
  "David Park",  "Emma Wilson", "Frank Dow",  "Grace Kelly"
)

events <- c(
  "SNA Workshop", "AI Ethics Seminar", "Public Policy Forum",
  "DPhil Symposium", "Methods Masterclass"
)

# Incidence matrix: rows = researchers, columns = events
# Cell (i, j) = 1 if researcher i attended event j
attendance_mat <- matrix(
  c(1, 0, 1, 0, 1,
    0, 1, 1, 1, 0,
    1, 1, 0, 0, 1,
    0, 0, 1, 1, 1,
    1, 1, 0, 1, 0,
    0, 1, 1, 0, 1,
    1, 0, 0, 1, 1),
  nrow     = 7,
  byrow    = TRUE,
  dimnames = list(researchers, events)
)

attendance_mat

               SNA Workshop AI Ethics Seminar Public Policy Forum
Alice Smith               1                 0                   1
Bob Jones                 0                 1                   1
Carol McGregor            1                 1                   0
David Park                0                 0                   1
Emma Wilson               1                 1                   0
Frank Dow                 0                 1                   1
Grace Kelly               1                 0                   0
               DPhil Symposium Methods Masterclass
Alice Smith                  0                   1
Bob Jones                    1                   0
Carol McGregor               0                   1
David Park                   1                   1
Emma Wilson                  1                   0
Frank Dow                    0                   1
Grace Kelly                  1                   1

Converting to a Bipartite igraph

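igraph::graph_from_incidence_matrix() converts an incidence matrix directly to a bipartite graph object. The function assigns a type vertex attribute (FALSE for row nodes, here researchers, and TRUE for column nodes, here events), which is used by downstream layout and projection functions. Recent igraph releases rename this function to graph_from_biadjacency_matrix() and deprecate the older name, which still works but may emit a warning.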

g_bip <- graph_from_incidence_matrix(attendance_mat, directed = FALSE)

# Confirm bipartite structure
is_bipartite(g_bip)

[1] TRUE

# Inspect node types
tibble(
  name = V(g_bip)$name,
  type = ifelse(V(g_bip)$type, "Event", "Researcher")
)

# A tibble: 12 × 2
   name                type      
   <chr>               <chr>     
 1 Alice Smith         Researcher
 2 Bob Jones           Researcher
 3 Carol McGregor      Researcher
 4 David Park          Researcher
 5 Emma Wilson         Researcher
 6 Frank Dow           Researcher
 7 Grace Kelly         Researcher
 8 SNA Workshop        Event     
 9 AI Ethics Seminar   Event     
10 Public Policy Forum Event     
11 DPhil Symposium     Event     
12 Methods Masterclass Event

vcount(g_bip)   # 7 researchers + 5 events = 12 nodes

[1] 12

ecount(g_bip)   # one edge per attendance record

[1] 21

Projecting to One-Mode Networks

Bipartite networks are commonly projected onto a single mode for analysis. The researcher projection links two researchers if they attended at least one common event; the event projection links two events if they share at least one attendee. bipartite_projection() computes both simultaneously and stores the number of shared neighbours as an edge weight attribute: shared events in the researcher projection, and shared attendees in the event projection.

# Compute both projections
projections <- bipartite_projection(g_bip)

g_researchers <- projections$proj1   # researcher co-attendance network
g_events      <- projections$proj2   # event co-attendance network

# Edge weights record the number of shared events
E(g_researchers)$weight

 [1] 2 1 2 1 2 2 1 2 2 2 1 2 2 2 1 2 1 2 2 1 1
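
The projection weights have a simple matrix interpretation: for an incidence matrix B with actors in rows, the off-diagonal entries of B multiplied by its transpose count the events each pair of actors shares. The sketch below checks this against igraph on an invented 3 x 2 matrix; B, the r1 to r3 and e1 to e2 labels, and the object names are hypothetical.

```r
library(igraph)

# Hypothetical incidence matrix: 3 researchers (rows) x 2 events (columns)
B <- matrix(
  c(1, 1,
    1, 0,
    0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("r1", "r2", "r3"), c("e1", "e2"))
)

# Entry (i, j) = number of events shared by researchers i and j;
# the diagonal holds each researcher's own attendance count
co_attend <- B %*% t(B)
co_off <- co_attend
diag(co_off) <- 0

# The same weights via igraph's projection of the row nodes (type == FALSE)
# (graph_from_incidence_matrix() is named graph_from_biadjacency_matrix()
#  in recent igraph releases)
g_b    <- graph_from_incidence_matrix(B)
proj   <- bipartite_projection(g_b, which = "false")
w_proj <- as_adjacency_matrix(proj, attr = "weight", sparse = FALSE)

all(co_off == w_proj)
```

Transposing the product (t(B) %*% B) gives the corresponding counts for the event projection.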

Information Loss in Projection

Projecting a bipartite network onto one mode inevitably discards structural information. A projected edge records only how many events two researchers shared (via the weight attribute), not which events those were or who else attended them, so distinct attendance patterns can produce identical projections. For many analyses, working directly with the bipartite structure (or using the dual-projection approach of Everett and Borgatti (2013)) preserves more of the original relational signal.
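
A small base R sketch can make this loss concrete. The two incidence matrices below are invented for illustration: in the first, three researchers all attend one shared event; in the second, each pair meets at a different event. The helper project_rows() is hypothetical and simply computes the off-diagonal co-attendance counts.

```r
# Case 1: three researchers all attend one shared event
B_one <- matrix(
  c(1, 1, 1),
  nrow = 3,
  dimnames = list(c("r1", "r2", "r3"), "e1")
)

# Case 2: each pair of researchers shares a different event
B_pairs <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("r1", "r2", "r3"), c("e1", "e2", "e3"))
)

# Hypothetical helper: projected edge weights among row nodes
project_rows <- function(B) {
  P <- B %*% t(B)
  diag(P) <- 0
  P
}

# Do the two attendance patterns give the same projection?
all(project_rows(B_one) == project_rows(B_pairs))
```

Both cases project to a triangle in which every edge has weight 1, although one describes a single group meeting and the other three separate pairwise meetings.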

Visualisation

# Bipartite two-column layout
ggraph(g_bip, layout = "bipartite") +
  geom_edge_link(colour = "grey70", width = 0.6, alpha = 0.6) +
  geom_node_point(
    aes(
      colour = ifelse(type, "Event", "Researcher"),
      shape  = ifelse(type, "Event", "Researcher")
    ),
    size = 7
  ) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3) +
  scale_colour_manual(
    values = c("Researcher" = "#2166ac", "Event" = "#d73027"),
    name   = "Node Type"
  ) +
  scale_shape_manual(
    values = c("Researcher" = 16, "Event" = 15),
    name   = "Node Type"
  ) +
  theme_graph(base_family = "sans") +
  labs(
    title    = "Seminar Attendance Network (Bipartite)",
    subtitle = "Circles = researchers; squares = events; edges = attendance"
  )

# Researcher co-attendance projection
ggraph(g_researchers, layout = "fr") +
  geom_edge_link(aes(width = weight), colour = "#2166ac", alpha = 0.45) +
  geom_node_point(colour = "#2166ac", size = 7, alpha = 0.85) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3) +
  scale_edge_width(range = c(0.5, 3.5), name = "Shared\nEvents") +
  theme_graph(base_family = "sans") +
  labs(
    title    = "Researcher Co-attendance Network (Projected)",
    subtitle = "Edge weight = number of events attended together"
  )

Part 4: Importing from External Files

Reading a CSV Edge List

The most common import task in applied research is loading a flat CSV file containing an edge list. The workflow is consistent regardless of whether the file is stored locally or retrieved from a URL: read it into a tibble, inspect and clean it, then pass it to graph_from_data_frame().

For this example of reading CSV files, we use a Game of Thrones network dataset. It records character relationships in George R. R. Martin’s A Storm of Swords, the third novel in his series A Song of Ice and Fire (adapted for television by HBO as Game of Thrones). The data were originally compiled by A. Beveridge and J. Shan, “Network of Thrones,” Math Horizons Magazine, Vol. 23, No. 4 (2016), pp. 18-22.

The nodes CSV contains 107 characters, and the edges CSV contains 352 weighted relationships between them. Each weight counts how many times two characters’ names appeared within 15 words of one another in the novel.

edge_path <- "data/got-edges.csv"  

edges_raw <- read_csv(edge_path, show_col_types = FALSE)
glimpse(edges_raw)

Rows: 352
Columns: 3
$ Source <chr> "Aemon", "Aemon", "Aerys", "Aerys", "Aerys", "Aerys…
$ Target <chr> "Grenn", "Samwell", "Jaime", "Robert", "Tyrion", "T…
$ Weight <dbl> 5, 31, 18, 6, 5, 8, 5, 5, 11, 23, 9, 6, 5, 43, 7, 1…

A well-formed edge list CSV requires at minimum two columns (source and target). Additional columns are automatically treated as edge attributes by graph_from_data_frame():

# Standardise column names and apply basic cleaning
edges_clean <- edges_raw |>
  rename(from = 1, to = 2) |>         # ensure first two cols are from/to
  filter(!is.na(from), !is.na(to)) |>  # remove rows with missing endpoints
  mutate(across(c(from, to), as.character)) |>  # enforce character IDs
  distinct(from, to, .keep_all = TRUE)           # remove duplicate edges

# Build the igraph object
g_csv <- graph_from_data_frame(edges_clean, directed = FALSE)

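# The Weight column is carried over as an edge attribute. To expose it under
# the lowercase name, copy it from the cleaned data (not edges_raw, whose
# rows may no longer line up with the graph's edges after cleaning):
# E(g_csv)$weight <- edges_clean$Weight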

# Convert to tbl_graph for tidyverse workflows
tg_csv <- as_tbl_graph(g_csv)

g_csv

IGRAPH 32e482d UN-- 107 352 -- 
+ attr: name (v/c), Weight (e/n)
+ edges from 32e482d (vertex names):
 [1] Aemon  --Grenn     Aemon  --Samwell   Aerys  --Jaime    
 [4] Aerys  --Robert    Aerys  --Tyrion    Aerys  --Tywin    
 [7] Alliser--Mance     Amory  --Oberyn    Arya   --Anguy    
[10] Arya   --Beric     Arya   --Bran      Arya   --Brynden  
[13] Arya   --Cersei    Arya   --Gendry    Arya   --Gregor   
[16] Arya   --Jaime     Arya   --Joffrey   Arya   --Jon      
[19] Arya   --Rickon    Arya   --Robert    Arya   --Roose    
[22] Arya   --Sandor    Arya   --Thoros    Arya   --Tyrion   
+ ... omitted several edges
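
One limitation of distinct(from, to) for undirected data is that it treats Arya-Jon and Jon-Arya as different rows. A possible refinement, sketched below on an invented three-row edge list, is to put each pair into a canonical order before removing duplicates. The data, the helper columns lo and hi, and the choice to keep the first occurrence are all assumptions for illustration.

```r
library(dplyr)

# Hypothetical edge list: the second row repeats the first with endpoints swapped
edges_demo <- tibble(
  from   = c("Arya", "Jon",  "Arya"),
  to     = c("Jon",  "Arya", "Sansa"),
  Weight = c(5, 5, 7)
)

edges_dedup <- edges_demo |>
  mutate(
    lo   = pmin(from, to),   # alphabetically first endpoint
    hi   = pmax(from, to),   # alphabetically second endpoint
    from = lo,
    to   = hi
  ) |>
  select(-lo, -hi) |>
  distinct(from, to, .keep_all = TRUE)   # keeps the first row of each pair

edges_dedup
```

This reordering is only appropriate for undirected networks; in a directed edge list, A to B and B to A are distinct ties and should both be kept.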

Visualisation

# Copy the imported Weight edge attribute to the lowercase name that igraph
# recognises as edge weights (taken from the graph itself, so it stays
# aligned with the cleaned edges)
E(g_csv)$weight <- E(g_csv)$Weight

tg_csv <- as_tbl_graph(g_csv) |>
  activate(nodes) |>
  mutate(degree = igraph::degree(g_csv))

ggraph(tg_csv, layout = "fr") +
  geom_edge_link(aes(width = weight), colour = "grey60", alpha = 0.4) +
  geom_node_point(aes(size = degree), colour = "#2166ac", alpha = 0.85) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, 0.85), name, "")),
    repel = FALSE, size = 3
  ) +
  scale_edge_width(range = c(0.3, 3), name = "Co-occurrences") +
  scale_size_continuous(range = c(2, 10), name = "Degree") +
  theme_graph(base_family = "sans") +
  labs(
    title    = "Game of Thrones Character Network",
    subtitle = "Edge width = co-occurrence weight; node size = degree; labels show top 15% by degree"
  )

Reading a GraphML File

GraphML is an XML-based format that stores both graph topology and arbitrary node and edge attributes in a single self-describing file, which makes it a widely supported exchange format for tools such as Gephi, Cytoscape, and NetworkX. Unlike a CSV edge list, a GraphML file requires no separate attribute-joining step: all metadata travels with the network.

igraph reads GraphML directly via read_graph() with format = "graphml".

g_marvel <- read_graph("data/marvel-network.graphml", format = "graphml")

g_marvel

IGRAPH 4e72f4a U-W- 327 9891 -- 
+ attr: label (v/c), id (v/c), Edge Label (e/c), weight
| (e/n), id (e/c)
+ edges from 4e72f4a:
 [1] 1-- 2 1-- 3 1-- 4 1-- 5 1-- 6 1-- 7 1-- 8 1-- 9 1--10 1--11
[11] 1--12 1--13 1--14 1--15 1--16 1--17 1--18 1--19 1--20 1--21
[21] 1--22 1--23 1--24 1--25 1--26 1--27 1--28 1--29 1--30 1--31
[31] 1--32 1--33 1--34 1--35 1--36 1--37 1--38 1--39 1--40 1--41
[41] 1--42 1--43 1--44 1--45 1--46 1--47 1--48 1--49 1--50 1--51
[51] 1--52 1--53 1--54 1--55 1--56 1--57 1--58 1--59 1--60 1--61
[61] 1--62 1--63 1--64 1--65 1--66 1--67 1--68 1--69 1--70 1--71
+ ... omitted several edges

# Inspect what vertex and edge attributes the file contains
vertex_attr_names(g_marvel)

[1] "label" "id"

edge_attr_names(g_marvel)

[1] "Edge Label" "weight"     "id"

# Preview node attributes as a tibble
igraph::as_data_frame(g_marvel, what = "vertices") |> head(10)

   label                      id
1         Black Panther / T'chal
2               Loki [asgardian]
3              Mantis / ? Brandt
4          Iceman / Robert Bobby
5        Marvel Girl / Jean Grey
6         Cyclops / Scott Summer
7            Klaw / Ulysses Klaw
8         Human Torch / Johnny S
9           Richards, Franklin B
10             Wolverine / Logan

# Convert to tbl_graph — all attributes are carried over automatically
tg_marvel <- as_tbl_graph(g_marvel)
tg_marvel

# A tbl_graph: 327 nodes and 9891 edges
#
# An undirected simple graph with 1 component
#
# A tibble: 327 × 2
  label id                     
  <chr> <chr>                  
1 ""    Black Panther / T'chal 
2 ""    Loki [asgardian]       
3 ""    Mantis / ? Brandt      
4 ""    Iceman / Robert Bobby  
5 ""    Marvel Girl / Jean Grey
6 ""    Cyclops / Scott Summer 
# ℹ 321 more rows
#
# A tibble: 9,891 × 5
   from    to `Edge Label` weight id    
  <int> <int> <chr>         <dbl> <chr> 
1     1     2 ""               10 105996
2     1     3 ""               23 105997
3     1     4 ""               12 105998
# ℹ 9,888 more rows
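
The claim that attributes travel with the file can be checked with a round trip. The sketch below writes a small invented graph to a temporary GraphML file with write_graph() and reads it back; g_small, the dept attribute, and the temporary path are hypothetical, and the example assumes an igraph build with GraphML support.

```r
library(igraph)

# Hypothetical two-edge graph with one vertex attribute and one edge attribute
g_small <- graph_from_data_frame(
  data.frame(from = c("a", "b"), to = c("b", "c"), weight = c(2, 5)),
  directed = FALSE,
  vertices = data.frame(name = c("a", "b", "c"), dept = c("soc", "pol", "soc"))
)

# Write to a temporary GraphML file and read it back
path <- tempfile(fileext = ".graphml")
write_graph(g_small, path, format = "graphml")
g_back <- read_graph(path, format = "graphml")

vertex_attr_names(g_back)   # attribute names recovered from the file
edge_attr_names(g_back)
V(g_back)$dept
E(g_back)$weight
```

It is still worth inspecting the attribute names after import, because the exporting tool decides which attribute carries the human-readable label. In the Marvel file above, for instance, the character names are stored in id while label is empty.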

Visualisation

tg_marvel <- tg_marvel |>
  activate(nodes) |>
  mutate(degree = igraph::degree(g_marvel))

ggraph(tg_marvel, layout = "fr") +
  geom_edge_link(colour = "grey65", alpha = 0.25, width = 0.4) +
  geom_node_point(aes(size = degree), colour = "#d73027", alpha = 0.8) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, 0.93), id, "")),
    repel = FALSE, size = 3
  ) +
  scale_size_continuous(range = c(1, 10), name = "Degree") +
  theme_graph(base_family = "sans") +
  labs(
    title    = "Marvel Character Co-appearance Network",
    subtitle = "Node size = degree centrality; labels show top 7% most connected characters"
  )

Part 5: Data Validation

Before proceeding to analysis, it is good practice to validate the integrity of a newly imported network. Three pathologies are worth checking systematically: isolates (disconnected nodes), self-loops (edges from a node to itself), and fragmented components (multiple disconnected subgraphs). Each may reflect a genuine substantive feature of the network or a data import error — and the distinction matters for interpretation.

Checking for Isolates

Isolates are nodes with degree zero: actors present in the node list but with no recorded ties. They may indicate data entry errors (e.g., a respondent appearing in a node list but absent from the edge list), or they may be substantively meaningful (e.g., survey respondents who reported no social contacts).

# Using the Karate Club as our working example
isolate_idx <- which(igraph::degree(g_karate) == 0)

if (length(isolate_idx) == 0) {
  cat("No isolates detected.\n")
} else {
  cat("Isolates found at indices:", isolate_idx, "\n")
}

No isolates detected.

Checking for Self-Loops

Self-loops — edges from a node to itself — are rarely substantively meaningful in social networks and often indicate data import errors, such as duplicate identifiers that collapse into self-referential ties. igraph’s which_loop() returns a logical vector over all edges.

n_loops <- sum(which_loop(g_karate))

if (n_loops == 0) {
  cat("No self-loops detected.\n")
} else {
  cat(n_loops, "self-loop(s) found. Removing with simplify()...\n")
  # Namespace-qualified because purrr (attached by tidyverse) also exports simplify()
  g_karate_clean <- igraph::simplify(g_karate, remove.multiple = FALSE, remove.loops = TRUE)
}

No self-loops detected.
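
Because the Karate Club data contain no loops, the removal branch never runs. The sketch below exercises it on an invented edge list that contains one self-loop and one duplicated tie, and also shows how simplify() can merge duplicates while summing an edge attribute. The data and object names are hypothetical.

```r
library(igraph)

# Hypothetical edge list with a duplicated tie (a-b twice) and a self-loop (c-c)
g_messy <- graph_from_data_frame(
  data.frame(
    from   = c("a", "a", "b", "c"),
    to     = c("b", "b", "c", "c"),
    weight = c(1, 2, 1, 4)
  ),
  directed = FALSE
)

n_loops_messy <- sum(which_loop(g_messy))       # number of self-loops
n_dupes_messy <- sum(which_multiple(g_messy))   # surplus copies of repeated edges

# Drop loops, merge duplicate edges, and sum their weights
g_tidy <- igraph::simplify(
  g_messy,
  remove.multiple = TRUE,
  remove.loops    = TRUE,
  edge.attr.comb  = list(weight = "sum")
)

E(g_tidy)$weight
```

Whether duplicates should be summed, averaged, or reduced to the first occurrence depends on what the repeated rows mean in the source data, so the "sum" rule here is only one reasonable choice.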

Checking Connectedness

A connected graph has a single component: every node can reach every other node through some path. Disconnected graphs have multiple components, which has direct implications for analysis — distances between nodes in different components are undefined, and centrality measures computed on the full graph may be misleading.

is_connected(g_karate)

[1] TRUE

comps <- igraph::components(g_karate)
cat("Number of components:", comps$no, "\n")

Number of components: 1

cat("Component sizes     :", comps$csize, "\n")

Component sizes     : 34
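
When a graph is fragmented, one common response is to restrict the analysis to its largest component. The sketch below does this with components() and induced_subgraph() on an invented five-node graph; g_frag and the other object names are hypothetical, and restricting to the largest component is an analytical choice rather than a required step.

```r
library(igraph)

# Hypothetical graph: a path a-b-c plus a separate pair d-e
g_frag <- graph_from_data_frame(
  data.frame(from = c("a", "b", "d"), to = c("b", "c", "e")),
  directed = FALSE
)

comps_frag <- igraph::components(g_frag)

# membership assigns each vertex a component id;
# keep the vertices that belong to the largest component
keep   <- which(comps_frag$membership == which.max(comps_frag$csize))
g_main <- induced_subgraph(g_frag, keep)

comps_frag$no
V(g_main)$name
is_connected(g_main)
```

Any vertices dropped this way should be reported alongside the results, since they may be substantively meaningful.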

A Reusable Validation Function

Wrapping these checks into a single function makes validation a consistent step in any import pipeline.

validate_graph <- function(g, name = "graph") {
  cat("=== Network Validation:", name, "===\n\n")

  cat(sprintf("  %-18s %d\n", "Vertices:",   vcount(g)))
  cat(sprintf("  %-18s %d\n", "Edges:",      ecount(g)))
  cat(sprintf("  %-18s %s\n", "Directed:",   is_directed(g)))
  cat(sprintf("  %-18s %s\n", "Weighted:",   is_weighted(g)))
  cat(sprintf("  %-18s %s\n", "Bipartite:",  is_bipartite(g)))
  cat("\n")

  n_isolates <- sum(igraph::degree(g) == 0)
  n_loops    <- sum(which_loop(g))
  n_comps    <- igraph::components(g)$no
  connected  <- is_connected(g)

  cat(sprintf("  %-18s %d%s\n", "Isolates:",
              n_isolates, ifelse(n_isolates > 0, "  [CHECK]", "  [OK]")))
  cat(sprintf("  %-18s %d%s\n", "Self-loops:",
              n_loops,    ifelse(n_loops > 0,    "  [CHECK]", "  [OK]")))
  cat(sprintf("  %-18s %s%s\n", "Connected:",
              connected,  ifelse(!connected,
                                 paste0("  [", n_comps, " components]"),
                                 "  [OK]")))
  cat(sprintf("  %-18s %.4f\n", "Density:", edge_density(g)))
  cat("\n")

  invisible(g)
}

validate_graph(g_flo,        name = "Florentine Families (Marriage)")

=== Network Validation: Florentine Families (Marriage) ===

  Vertices:          16
  Edges:             20
  Directed:          FALSE
  Weighted:          FALSE
  Bipartite:         FALSE

  Isolates:          1  [CHECK]
  Self-loops:        0  [OK]
  Connected:         FALSE  [2 components]
  Density:           0.1667

validate_graph(g_karate,     name = "Zachary's Karate Club")

=== Network Validation: Zachary's Karate Club ===

  Vertices:          34
  Edges:             78
  Directed:          FALSE
  Weighted:          FALSE
  Bipartite:         FALSE

  Isolates:          0  [OK]
  Self-loops:        0  [OK]
  Connected:         TRUE  [OK]
  Density:           0.1390

validate_graph(g_bip,        name = "Seminar Attendance (Bipartite)")

=== Network Validation: Seminar Attendance (Bipartite) ===

  Vertices:          12
  Edges:             21
  Directed:          FALSE
  Weighted:          FALSE
  Bipartite:         TRUE

  Isolates:          0  [OK]
  Self-loops:        0  [OK]
  Connected:         TRUE  [OK]
  Density:           0.3182
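
Printed reports are convenient for reading but harder to compare across many networks. One possible variation, sketched below, returns the same checks as a one-row data frame so that several graphs can be stacked into a single table. The helper validation_row() is hypothetical, and two graphs built with igraph's own constructors stand in for the chapter's networks.

```r
library(igraph)

# Hypothetical helper: core checks from validate_graph(), returned as one row
validation_row <- function(g, name = "graph") {
  data.frame(
    network    = name,
    vertices   = vcount(g),
    edges      = ecount(g),
    isolates   = sum(igraph::degree(g) == 0),
    self_loops = sum(which_loop(g)),
    components = igraph::components(g)$no,
    density    = edge_density(g)
  )
}

# Stand-in graphs: the Zachary network and a 5-node ring with one added isolate
report <- rbind(
  validation_row(make_graph("Zachary"), "Zachary"),
  validation_row(add_vertices(make_ring(5), 1), "Ring plus isolate")
)

report
```

The resulting table can then be filtered, for example to list only the networks with isolates or with more than one component.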

Summary

This chapter has covered five import pathways for network data in R:

Source	Raw Format	igraph Constructor
`ergm` / `network` package	`network` class object	`as.matrix()` → `graph_from_adjacency_matrix()`
Flat file / database	Edge list `data.frame`	`graph_from_data_frame()`
Affiliation records	Incidence matrix	`graph_from_incidence_matrix()`
CSV file or URL	Edge list CSV	`read_csv()` → `graph_from_data_frame()`
GraphML file	GraphML (XML)	`read_graph(format = "graphml")`

Across all five pathways, the workflow follows a consistent logic: (1) load the raw data into an appropriate R object; (2) convert to igraph (or another graph object) using the relevant constructor; (3) attach any node and edge attributes that did not arrive with the data, for example via left_join() within a tbl_graph; and (4) validate the resulting network before analysis.

The tbl_graph representation from tidygraph provides a tidyverse-compatible wrapper around igraph objects, enabling dplyr pipelines for attribute manipulation and ggraph for publication-quality visualisation. Together, these tools constitute a coherent and extensible analytical stack for social network research — one that connects cleanly to both the statnet ecosystem (via intergraph) and to the broader tidyverse.