Essential programming skills for network analysis
Before we build networks, we need to ensure you’re comfortable with R itself. This chapter introduces the R programming concepts you will need throughout the workshop. If you are already comfortable with R, treat it as a quick reference. If you are new to R, work through it carefully — the material here underpins everything that follows.
All code blocks in this workshop can be run interactively. Open the .qmd source file in RStudio, place your cursor inside a code chunk, and press Ctrl+Enter (Windows/Linux) or Cmd+Enter (Mac) to run a single line, or Ctrl+Shift+Enter to run the entire chunk.
R can be used as a calculator, but its real power lies in storing and manipulating values. The assignment operator <- binds a value to a name in the current environment.
1 + 3 # arithmetic evaluation
[1] 4
2^8 # exponentiation
[1] 256
17 %% 5 # modulo (remainder)
[1] 2
17 %/% 5 # integer division
[1] 3
a <- 9 # assignment: bind 9 to the name 'a'
sqrt(a) # apply a function to an object
[1] 3
b <- sqrt(a)
a == b # logical comparison: are they equal?
[1] FALSE
a != b # not equal
[1] TRUE
a > 5 # greater than
[1] TRUE
ls() # list all objects in the current environment
[1] "a" "b"
R is case-sensitive: A and a are different objects. Object names can contain letters, digits, . and _, but must start with a letter or a dot (and a leading dot cannot be followed by a digit).
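The naming rules above can be checked directly in the console. This is a minimal sketch: the object names (`total`, `Total`, `node_count.2024`, `.hidden`) are made up for illustration, and `make.names()` is used only as a convenient way to see which strings R accepts as valid names.

```r
# R is case-sensitive: these are two distinct objects
total <- 10
Total <- 20
total == Total                     # FALSE

# Valid names: letters, digits, '.' and '_', starting with a letter or '.'
node_count.2024 <- 5
.hidden <- "names starting with a dot are not listed by ls()"

# make.names() returns its input unchanged only when it is already a valid name
candidates <- c("node_1", ".score", "2nd_node", "_tmp")
is_valid <- make.names(candidates) == candidates
is_valid                           # TRUE TRUE FALSE FALSE
```

Names that start with a digit or an underscore are rejected, which is why the last two candidates come back as FALSE.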
Every value in R has a type. The most common scalar types are:
class(3.14) # numeric (double)
[1] "numeric"
class(42L) # integer (the L suffix tells R to store the value as an integer)
[1] "integer"
class("hello") # character
[1] "character"
class(TRUE) # logical
[1] "logical"
class(2 + 3i) # complex
[1] "complex"
# Type coercion
as.numeric("3.14")
[1] 3.14
as.integer(42)
[1] 42
as.character(100)
[1] "100"
as.logical(0) # 0 is FALSE; anything non-zero is TRUE
[1] FALSE
is.na(NA) # test for missing value
[1] TRUE
A vector is the fundamental data structure in R — an ordered collection of values of the same type. Almost all operations in R are vectorised, meaning they apply element-wise without explicit loops.
x <- c(1, 3, 5) # combine values into a vector
y <- c("one", "three", "five")
# Indexing (1-based)
x[1] # first element
[1] 1
x[c(1, 3)] # first and third elements
[1] 1 5
x[-2] # all except the second
[1] 1 5
# Sequences
a <- 1:10
b <- seq(from = 0, to = 1, by = 0.25)
c_rep <- rep(c(1, 2), times = 3)
c_rep
[1] 1 2 1 2 1 2
# Vectorised operations: applied element-wise
x * 2
[1] 2 6 10
x + c(10, 20, 30)
[1] 11 23 35
# Logical operations on vectors
x > 2
[1] FALSE TRUE TRUE
any(x > 2) # is any element > 2?
[1] TRUE
all(x > 2) # are all elements > 2?
[1] FALSE
which(x > 2) # indices where condition is TRUE
[1] 2 3
# Common summary functions
length(x)
[1] 3
sum(x)
[1] 9
mean(x)
[1] 3
sd(x)
[1] 2
min(x); max(x)
[1] 1
[1] 5
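Because a vector holds values of a single type, mixing types inside c() triggers silent coercion. The sketch below illustrates this along with the recycling of a single value across a vector; the names `mixed` and `flags` are illustrative only.

```r
# Mixing types in c() coerces everything to the most general type
mixed <- c(1, "two", TRUE)
class(mixed)                       # "character"

# Logical values behave as 0/1 when used in arithmetic
flags <- c(TRUE, FALSE, TRUE)
sum(flags)                         # 2: counts the TRUE values
class(c(flags, 5))                 # "numeric"

# A length-1 value is recycled across a longer vector
x <- c(1, 3, 5)
x + 10                             # 11 13 15
```

Summing a logical vector is a common idiom for counting how many elements satisfy a condition, for example sum(x > 2).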
A matrix is a two-dimensional vector: all elements share the same type, and values are arranged in rows and columns. In network analysis, adjacency matrices are the canonical representation of graph structure — rows and columns correspond to nodes, and cell values indicate the presence (or weight) of an edge.
m <- matrix(1:25, nrow = 5, ncol = 5)
m
     [,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
# Element access
m[1, 2] # row 1, column 2
[1] 6
m[1, ] # entire first row
[1] 1 6 11 16 21
m[, 2] # entire second column
[1] 6 7 8 9 10
m[2:3, 3:5] # submatrix
     [,1] [,2] [,3]
[1,] 12 17 22
[2,] 13 18 23
# Dimensions
nrow(m); ncol(m)
[1] 5
[1] 5
dim(m)
[1] 5 5
# Matrix operations: essential for network mathematics
t(m) # transpose
     [,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
m %*% t(m) # matrix multiplication (not element-wise!)
     [,1] [,2] [,3] [,4] [,5]
[1,] 855 910 965 1020 1075
[2,] 910 970 1030 1090 1150
[3,] 965 1030 1095 1160 1225
[4,] 1020 1090 1160 1230 1300
[5,] 1075 1150 1225 1300 1375
diag(m) # diagonal elements
[1] 1 7 13 19 25
# Build an adjacency matrix manually
adj <- matrix(0, nrow = 4, ncol = 4,
dimnames = list(c("A","B","C","D"), c("A","B","C","D")))
adj["A", "B"] <- 1
adj["B", "C"] <- 1
adj["C", "D"] <- 1
adj["D", "A"] <- 1
adj # a 4-node directed cycle
  A B C D
A 0 1 0 0
B 0 0 1 0
C 0 0 0 1
D 1 0 0 0
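Once a network is stored as an adjacency matrix, basic structural quantities fall out of ordinary matrix operations. The sketch below rebuilds the same 4-node cycle and assumes the convention that rows are senders and columns are receivers; `node_ids`, `out_degree`, `in_degree` and `two_step` are names chosen here for illustration.

```r
# Rebuild the 4-node directed cycle A -> B -> C -> D -> A
node_ids <- c("A", "B", "C", "D")
adj <- matrix(0, nrow = 4, ncol = 4,
              dimnames = list(node_ids, node_ids))
adj["A", "B"] <- 1
adj["B", "C"] <- 1
adj["C", "D"] <- 1
adj["D", "A"] <- 1

out_degree <- rowSums(adj)   # edges leaving each node (row = sender)
in_degree  <- colSums(adj)   # edges arriving at each node (column = receiver)

# Entry [i, j] of adj %*% adj counts directed walks of length 2 from i to j
two_step <- adj %*% adj
two_step["A", "C"]           # 1: the walk A -> B -> C
```

In a cycle every node sends and receives exactly one edge, so both degree vectors are all ones.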
Lists are R’s most flexible container: each element can hold any object of any type or length. Many R functions return lists (model output, igraph objects, etc.), so you need to know how to navigate them.
person <- list(
name = "Alice",
age = 34,
scores = c(88, 92, 79),
active = TRUE
)
# Access by name
person$name
[1] "Alice"
person[["age"]]
[1] 34
# Access by position
person[[3]] # third element (the scores vector)
[1] 88 92 79
person[[3]][2] # second score
[1] 92
# Inspect structure
str(person)
List of 4
 $ name : chr "Alice"
 $ age : num 34
 $ scores: num [1:3] 88 92 79
 $ active: logi TRUE
length(person)
[1] 4
names(person)
[1] "name" "age" "scores" "active"
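Two points about lists tend to trip people up: the difference between single and double brackets, and how to change a list after it has been created. The following sketch uses a separate object, `member`, with a hypothetical `department` field, so it does not interfere with `person` above.

```r
member <- list(
  name = "Alice",
  age = 34,
  scores = c(88, 92, 79),
  active = TRUE
)

# [ ] returns a sub-list; [[ ]] returns the element itself
class(member["age"])      # "list"
class(member[["age"]])    # "numeric"

# Add, modify and remove elements by name
member$department <- "Research"   # hypothetical extra field
member$age <- member$age + 1
member$active <- NULL             # assigning NULL drops the element

names(member)             # "name" "age" "scores" "department"
```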
A data frame is the standard rectangular data structure in R: a list of vectors of equal length, where each vector is a column. Think of it as a spreadsheet or database table. Node attribute tables and edge lists are both naturally represented as data frames.
df <- data.frame(
id = 1:5,
name = c("Alice", "Bob", "Clara", "David", "Eva"),
degree = c(3, 7, 2, 5, 4),
active = c(TRUE, TRUE, FALSE, TRUE, FALSE),
stringsAsFactors = FALSE
)
df
  id name degree active
1 1 Alice 3 TRUE
2 2 Bob 7 TRUE
3 3 Clara 2 FALSE
4 4 David 5 TRUE
5 5 Eva 4 FALSE
# Column access
df$name
[1] "Alice" "Bob" "Clara" "David" "Eva"
df[["degree"]]
[1] 3 7 2 5 4
# Row and column indexing
df[1, ] # first row
  id name degree active
1 1 Alice 3 TRUE
df[, 3] # third column
[1] 3 7 2 5 4
df[df$active, ] # rows where active == TRUE
  id name degree active
1 1 Alice 3 TRUE
2 2 Bob 7 TRUE
4 4 David 5 TRUE
df[df$degree > 3, c("name", "degree")]
  name degree
2 Bob 7
4 David 5
5 Eva 4
# Useful inspection functions
nrow(df); ncol(df)
[1] 5
[1] 4
head(df, 3)
  id name degree active
1 1 Alice 3 TRUE
2 2 Bob 7 TRUE
3 3 Clara 2 FALSE
str(df)
'data.frame': 5 obs. of 4 variables:
 $ id : int 1 2 3 4 5
 $ name : chr "Alice" "Bob" "Clara" "David" ...
 $ degree: num 3 7 2 5 4
 $ active: logi TRUE TRUE FALSE TRUE FALSE
summary(df)
  id name degree active
Min. :1 Length:5 Min. :2.0 Mode :logical
1st Qu.:2 Class :character 1st Qu.:3.0 FALSE:2
Median :3 Mode :character Median :4.0 TRUE :3
Mean :3 Mean :4.2
3rd Qu.:4 3rd Qu.:5.0
Max. :5 Max. :7.0
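To connect data frames to networks, here is a small base R sketch that starts from an edge list and derives a node attribute table from it. The edge list is invented for illustration and is treated as undirected; `edge_df`, `degree_tab` and `node_df` are hypothetical names.

```r
# A hypothetical undirected edge list: one row per tie
edge_df <- data.frame(
  from = c("Alice", "Alice", "Bob", "Clara"),
  to   = c("Bob", "Clara", "Clara", "David"),
  stringsAsFactors = FALSE
)

# Degree = number of times a node appears at either end of an edge
degree_tab <- table(c(edge_df$from, edge_df$to))

# Convert the table into a node attribute data frame
node_df <- data.frame(
  name   = names(degree_tab),
  degree = as.integer(degree_tab),
  stringsAsFactors = FALSE
)

# Sort by degree, highest first
node_df[order(-node_df$degree), ]
```

Each edge contributes to the degree of two nodes, so the degrees sum to twice the number of edges, which makes a quick sanity check.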
x <- 15
if (x > 10) {
cat("x is greater than 10\n")
} else if (x == 10) {
cat("x is exactly 10\n")
} else {
cat("x is less than 10\n")
}
x is greater than 10
# ifelse(): vectorised conditional
scores <- c(55, 72, 48, 91, 60)
ifelse(scores >= 60, "pass", "fail")
[1] "fail" "pass" "fail" "pass" "pass"
In R, explicit loops are often avoidable thanks to vectorisation, but they are useful for iterative tasks and are worth knowing.
# for loop
for (i in 1:5) {
cat("Iteration:", i, "\n")
}
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
# while loop
count <- 0
while (count < 3) {
count <- count + 1
cat("count is", count, "\n")
}
count is 1
count is 2
count is 3
# break and next
for (i in 1:10) {
if (i == 4) next # skip this iteration
if (i == 7) break # exit the loop
cat(i, "")
}
1 2 3 5 6
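To make the point about vectorisation concrete, the sketch below computes the same result twice, once with an explicit loop and once with a single vectorised expression. The variable names are illustrative.

```r
values <- c(2, 4, 6, 8)

# Loop version: pre-allocate the result, then fill it element by element
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Vectorised version: one expression, no explicit loop
squares_vec <- values^2

identical(squares_loop, squares_vec)   # TRUE
```

Using seq_along(values) rather than 1:length(values) avoids a loop over c(1, 0) when the vector happens to be empty.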
The apply family offers a concise way to apply a function over a vector, list, or matrix — often replacing explicit loops.
m <- matrix(1:12, nrow = 3)
apply(m, 1, sum) # row sums
[1] 22 26 30
apply(m, 2, mean) # column means
[1] 2 5 8 11
nums <- list(a = 1:4, b = 5:10, c = 11:15)
sapply(nums, mean) # returns a named vector
  a b c
2.5 7.5 13.0
lapply(nums, length) # returns a list
$a
[1] 4
$b
[1] 6
$c
[1] 5
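One further member of the family worth knowing is vapply(), which behaves like sapply() but makes you declare the type and length of each result. This is a brief sketch reusing the same `nums` list; `means` and `sizes` are illustrative names.

```r
nums <- list(a = 1:4, b = 5:10, c = 11:15)

# Declare that each call must return exactly one double
means <- vapply(nums, mean, FUN.VALUE = numeric(1))
means

# Declare that each call must return exactly one integer
sizes <- vapply(nums, length, FUN.VALUE = integer(1))
sizes
```

If a function returns something that does not match FUN.VALUE, vapply() stops with an error instead of quietly returning a differently shaped object, which makes it a safer choice inside functions and scripts.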
Functions are the building blocks of reusable code. When you find yourself repeating the same operations, write a function.
# Basic function definition
greet <- function(name, greeting = "Hello") {
paste(greeting, name)
}
greet("Alice")
[1] "Hello Alice"
greet("Bob", greeting = "Welcome")
[1] "Welcome Bob"
# A more practical example: normalise a vector to [0, 1]
normalise <- function(x) {
(x - min(x)) / (max(x) - min(x))
}
centralityScores <- c(3, 7, 2, 9, 4)
normalise(centralityScores)
[1] 0.1428571 0.7142857 0.0000000 1.0000000 0.2857143
# Functions can return multiple values via a list
describe <- function(x) {
list(n = length(x), mean = mean(x), sd = sd(x), range = range(x))
}
describe(centralityScores)
$n
[1] 5
$mean
[1] 5
$sd
[1] 2.915476
$range
[1] 2 9
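Functions are also a natural place to guard against awkward inputs. The normalise() function above divides by max(x) - min(x), which is zero when all values are equal. The variant below is a hypothetical sketch, not part of the workshop code: `normalise_safe` and its choice to return zeros in that case are assumptions made for illustration.

```r
# Hypothetical variant of normalise() with basic input checks
normalise_safe <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = na.rm)
  if (rng[1] == rng[2]) {
    # All values identical: the denominator would be zero, so return zeros
    return(rep(0, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

normalise_safe(c(3, 7, 2, 9, 4))
normalise_safe(c(5, 5, 5))        # 0 0 0 instead of NaN
```

Whether zeros, NA, or an error is the right response to a constant vector depends on the analysis; the point is that the function makes the decision explicit.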
The tidyverse is a collection of R packages designed around a consistent philosophy of tidy data and readable code. The core package for data manipulation is dplyr, which provides a small set of verbs that cover the vast majority of data-wrangling tasks.
library(tidyverse)
The pipe |> (base R, since 4.1) or %>% (magrittr/tidyverse) passes the result of one expression as the first argument of the next. It allows you to write a sequence of transformations in the order they happen, which is much easier to read than nested function calls.
# Without pipe: read inside out
round(mean(c(1, 2, 3, 4, 5)), digits = 2)
[1] 3
# With pipe: read left to right
c(1, 2, 3, 4, 5) |> mean() |> round(digits = 2)
[1] 3
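The sketch below spells out the equivalence between nested and piped calls, and shows one way to pipe into a step where the value is not simply the first argument. It uses only base R (4.1 or later); `vals`, `nested`, `piped` and `centred` are illustrative names.

```r
vals <- c(72, 85, 91, 60)

# These two lines are equivalent: the pipe inserts `vals` as the first argument
nested <- round(sd(vals), digits = 1)
piped  <- vals |> sd() |> round(digits = 1)
identical(nested, piped)          # TRUE

# To use the piped value more than once, wrap the step in an
# anonymous function (the \(v) shorthand also needs R >= 4.1)
centred <- vals |> (\(v) v - mean(v))()
centred                           # -5 8 14 -17
```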
We’ll use a small node-attribute table representing actors in a network.
nodes <- tibble(
id = 1:8,
name = c("Alice","Bob","Clara","David","Eva","Frank","Grace","Hugo"),
department = c("Research","Operations","Research","HR","Operations","Research","HR","Operations"),
tenure = c(5, 2, 8, 3, 6, 1, 4, 7),
score = c(72, 85, 91, 60, 78, 55, 88, 74)
)
nodes
# A tibble: 8 × 5
id name department tenure score
<int> <chr> <chr> <dbl> <dbl>
1 1 Alice Research 5 72
2 2 Bob Operations 2 85
3 3 Clara Research 8 91
4 4 David HR 3 60
5 5 Eva Operations 6 78
6 6 Frank Research 1 55
7 7 Grace HR 4 88
8 8 Hugo Operations 7 74
# Keep only Research department members
nodes |> filter(department == "Research")
# A tibble: 3 × 5
id name department tenure score
<int> <chr> <chr> <dbl> <dbl>
1 1 Alice Research 5 72
2 3 Clara Research 8 91
3 6 Frank Research 1 55
# Multiple conditions
nodes |> filter(department == "Operations", tenure > 4)
# A tibble: 2 × 5
id name department tenure score
<int> <chr> <chr> <dbl> <dbl>
1 5 Eva Operations 6 78
2 8 Hugo Operations 7 74
# Using %in% for multiple values
nodes |> filter(department %in% c("HR", "Operations"))
# A tibble: 5 × 5
id name department tenure score
<int> <chr> <chr> <dbl> <dbl>
1 2 Bob Operations 2 85
2 4 David HR 3 60
3 5 Eva Operations 6 78
4 7 Grace HR 4 88
5 8 Hugo Operations 7 74
nodes |> select(name, department, score)
# A tibble: 8 × 3
name department score
<chr> <chr> <dbl>
1 Alice Research 72
2 Bob Operations 85
3 Clara Research 91
4 David HR 60
5 Eva Operations 78
6 Frank Research 55
7 Grace HR 88
8 Hugo Operations 74
# Drop a column with -
nodes |> select(-id)
# A tibble: 8 × 4
name department tenure score
<chr> <chr> <dbl> <dbl>
1 Alice Research 5 72
2 Bob Operations 2 85
3 Clara Research 8 91
4 David HR 3 60
5 Eva Operations 6 78
6 Frank Research 1 55
7 Grace HR 4 88
8 Hugo Operations 7 74
# Rename while selecting
nodes |> select(name, dept = department, performance = score)
# A tibble: 8 × 3
name dept performance
<chr> <chr> <dbl>
1 Alice Research 72
2 Bob Operations 85
3 Clara Research 91
4 David HR 60
5 Eva Operations 78
6 Frank Research 55
7 Grace HR 88
8 Hugo Operations 74
nodes |>
mutate(
senior = tenure >= 5,
score_z = (score - mean(score)) / sd(score), # standardise
label = paste0(name, " (", department, ")")
)
# A tibble: 8 × 8
id name department tenure score senior score_z label
<int> <chr> <chr> <dbl> <dbl> <lgl> <dbl> <chr>
1 1 Alice Research 5 72 TRUE -0.261 Alice (Researc…
2 2 Bob Operations 2 85 FALSE 0.745 Bob (Operation…
3 3 Clara Research 8 91 TRUE 1.21 Clara (Researc…
4 4 David HR 3 60 FALSE -1.19 David (HR)
5 5 Eva Operations 6 78 TRUE 0.203 Eva (Operation…
6 6 Frank Research 1 55 FALSE -1.58 Frank (Researc…
7 7 Grace HR 4 88 FALSE 0.977 Grace (HR)
8 8 Hugo Operations 7 74 TRUE -0.106 Hugo (Operatio…
nodes |> arrange(desc(score)) # highest score first
# A tibble: 8 × 5
id name department tenure score
<int> <chr> <chr> <dbl> <dbl>
1 3 Clara Research 8 91
2 7 Grace HR 4 88
3 2 Bob Operations 2 85
4 5 Eva Operations 6 78
5 8 Hugo Operations 7 74
6 1 Alice Research 5 72
7 4 David HR 3 60
8 6 Frank Research 1 55
nodes |> arrange(department, tenure) # sort by two columns
# A tibble: 8 × 5
id name department tenure score
<int> <chr> <chr> <dbl> <dbl>
1 4 David HR 3 60
2 7 Grace HR 4 88
3 2 Bob Operations 2 85
4 5 Eva Operations 6 78
5 8 Hugo Operations 7 74
6 6 Frank Research 1 55
7 1 Alice Research 5 72
8 3 Clara Research 8 91
# Overall summary
nodes |>
summarise(
n = n(),
mean_score = mean(score),
sd_score = sd(score),
max_tenure = max(tenure)
)
# A tibble: 1 × 4
n mean_score sd_score max_tenure
<int> <dbl> <dbl> <dbl>
1 8 75.4 12.9 8
# Summary by group — critical for network attribute analysis
nodes |>
group_by(department) |>
summarise(
n = n(),
mean_score = mean(score),
mean_tenure = mean(tenure)
) |>
arrange(desc(mean_score))
# A tibble: 3 × 4
department n mean_score mean_tenure
<chr> <int> <dbl> <dbl>
1 Operations 3 79 5
2 HR 2 74 3.5
3 Research 3 72.7 4.67
Joins combine two data frames on a shared key column — essential when merging node attributes with an edge list.
# Edge list: pairs of node IDs representing connections
edges <- tibble(
from = c(1, 1, 2, 3, 4, 5, 6),
to = c(2, 3, 4, 5, 6, 7, 8),
weight = c(1.2, 0.8, 1.5, 0.5, 2.0, 1.1, 0.9)
)
# Attach sender attributes to edge list
edges |>
left_join(nodes |> select(id, name, department),
by = c("from" = "id")) |>
rename(from_name = name, from_dept = department)
# A tibble: 7 × 5
from to weight from_name from_dept
<dbl> <dbl> <dbl> <chr> <chr>
1 1 2 1.2 Alice Research
2 1 3 0.8 Alice Research
3 2 4 1.5 Bob Operations
4 3 5 0.5 Clara Research
5 4 6 2 David HR
6 5 7 1.1 Eva Operations
7 6 8 0.9 Frank Research
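The same pattern extends to both endpoints of each edge: join once on the sender and once on the receiver. The sketch below assumes dplyr is installed and uses a smaller, made-up pair of tables (`nodes_small`, `edges_small`) so that it does not overwrite the `nodes` and `edges` objects above; the `same_dept` column is a hypothetical addition.

```r
library(dplyr)

nodes_small <- tibble(
  id = 1:4,
  name = c("Alice", "Bob", "Clara", "David"),
  department = c("Research", "Operations", "Research", "HR")
)
edges_small <- tibble(from = c(1, 1, 2, 3), to = c(2, 3, 4, 4))

edges_labelled <- edges_small |>
  left_join(nodes_small, by = c("from" = "id")) |>
  rename(from_name = name, from_dept = department) |>
  left_join(nodes_small, by = c("to" = "id")) |>
  rename(to_name = name, to_dept = department) |>
  mutate(same_dept = from_dept == to_dept)   # TRUE for within-department ties

edges_labelled
```

Renaming after each join keeps the sender and receiver columns distinct; without it, the second join would produce automatically suffixed names such as name.x and name.y.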
# Wide to long (pivot_longer) — useful for comparing multiple metrics
nodes |>
select(name, tenure, score) |>
pivot_longer(cols = c(tenure, score),
names_to = "metric",
values_to = "value")
# A tibble: 16 × 3
name metric value
<chr> <chr> <dbl>
1 Alice tenure 5
2 Alice score 72
3 Bob tenure 2
4 Bob score 85
5 Clara tenure 8
6 Clara score 91
7 David tenure 3
8 David score 60
9 Eva tenure 6
10 Eva score 78
11 Frank tenure 1
12 Frank score 55
13 Grace tenure 4
14 Grace score 88
15 Hugo tenure 7
16 Hugo score 74
# Long to wide (pivot_wider)
nodes |>
select(name, department) |>
mutate(present = 1) |>
pivot_wider(names_from = department, values_from = present, values_fill = 0)
# A tibble: 8 × 4
name Research Operations HR
<chr> <dbl> <dbl> <dbl>
1 Alice 1 0 0
2 Bob 0 1 0
3 Clara 1 0 0
4 David 0 0 1
5 Eva 0 1 0
6 Frank 1 0 0
7 Grace 0 0 1
8 Hugo 0 1 0
# CSV files
df <- read_csv("data/nodes.csv") # tidyverse (recommended)
df <- read.csv("data/nodes.csv") # base R
write_csv(df, "data/nodes_clean.csv")
# Excel files
library(readxl)
df <- read_excel("data/data.xlsx", sheet = 1)
# SPSS, Stata, SAS
library(haven)
df <- read_spss("data/survey.sav")
df <- read_dta("data/survey.dta")
# R's native format
saveRDS(df, "data/df.rds")
df <- readRDS("data/df.rds")
| Task | Function |
|---|---|
| Assign a value | x <- value |
| Create a vector | c(1, 2, 3) |
| Create a sequence | 1:10, seq(0, 1, 0.1) |
| Create a matrix | matrix(data, nrow, ncol) |
| Create a data frame | data.frame() / tibble() |
| Check an object’s type | class(), typeof() |
| Check dimensions | dim(), nrow(), ncol(), length() |
| Inspect structure | str(), summary(), head() |
| Filter rows | filter() |
| Select columns | select() |
| Create columns | mutate() |
| Sort rows | arrange() |
| Aggregate | group_by() + summarise() |
| Join tables | left_join(), inner_join() |
| Reshape wide → long | pivot_longer() |
| Reshape long → wide | pivot_wider() |