3  R Fundamentals

Essential programming skills for network analysis

Published: March 18, 2026

Before we build networks, we need to ensure you’re comfortable with R itself. This chapter introduces the R programming concepts you will need throughout the workshop. If you are already comfortable with R, treat it as a quick reference. If you are new to R, work through it carefully — the material here underpins everything that follows.

Tip: Running code

All code blocks in this workshop can be run interactively. Open the .qmd source file in RStudio, place your cursor inside a code chunk, and press Ctrl+Enter (Windows/Linux) or Cmd+Enter (Mac) to run the current line, or Ctrl+Shift+Enter (Cmd+Shift+Enter on Mac) to run the entire chunk.


Basic Operations and Assignment

R can be used as a calculator, but its real power lies in storing and manipulating values. The assignment operator <- binds a value to a name in the current environment.

1 + 3           # arithmetic evaluation
[1] 4
2^8             # exponentiation
[1] 256
17 %% 5         # modulo (remainder)
[1] 2
17 %/% 5        # integer division
[1] 3
a <- 9          # assignment: bind 9 to the name 'a'
sqrt(a)         # apply a function to an object
[1] 3
b <- sqrt(a)
a == b          # logical comparison: are they equal?
[1] FALSE
a != b          # not equal
[1] TRUE
a > 5           # greater than
[1] TRUE
ls()            # list all objects in the current environment
[1] "a" "b"
Note

R is case-sensitive: A and a are different objects. Object names can contain letters, digits, . and _, and must start with a letter or a dot. If a name starts with a dot, the next character cannot be a digit.
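
One caveat about == deserves a short illustration: it compares numbers exactly, so the result of floating-point arithmetic can fail an equality test that looks true on paper. The sketch below uses base R only. Checking with all.equal() is one common workaround, not the only one.

```r
# Exact comparison can be misleading for computed doubles
x <- sqrt(2)^2
x == 2                      # FALSE on typical IEEE 754 platforms

# all.equal() compares within a small numerical tolerance
isTRUE(all.equal(x, 2))     # TRUE

# sqrt(9) happens to be exactly representable, so == behaves as expected
sqrt(9) == 3                # TRUE
```

When comparing computed quantities such as centrality scores, a tolerance-based check is usually the safer default.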


Data Types

Every value in R has a type. The most common scalar types are:

class(3.14)         # numeric (double)
[1] "numeric"
class(42L)          # integer (the L suffix marks an integer literal)
[1] "integer"
class("hello")      # character
[1] "character"
class(TRUE)         # logical
[1] "logical"
class(2 + 3i)       # complex
[1] "complex"
# Type coercion
as.numeric("3.14")
[1] 3.14
as.integer(42)
[1] 42
as.character(100)
[1] "100"
as.logical(0)       # 0 is FALSE; anything non-zero is TRUE
[1] FALSE
is.na(NA)           # test for missing value
[1] TRUE
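
Coercion also happens implicitly, which is a frequent source of surprises. The following sketch, using base R only, shows what c() does when given mixed types, how typeof() differs from class(), and what a failed conversion returns. The variable names are illustrative.

```r
# c() coerces its inputs to the most general type present
mixed <- c(1L, 2.5, TRUE)
typeof(mixed)                 # "double"

with_text <- c(1, "a", TRUE)
typeof(with_text)             # "character"

# class() gives the high-level class, typeof() the internal storage type
class(3.14)                   # "numeric"
typeof(3.14)                  # "double"

# A conversion that cannot succeed returns NA (with a warning), not an error
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)                    # TRUE
```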

Vectors

A vector is the fundamental data structure in R: an ordered collection of values of the same type. Most arithmetic, comparison, and mathematical functions in R are vectorised, meaning they apply element-wise without explicit loops.

x <- c(1, 3, 5)          # combine values into a vector
y <- c("one", "three", "five")
# Indexing (1-based)
x[1]                      # first element
[1] 1
x[c(1, 3)]                # first and third elements
[1] 1 5
x[-2]                     # all except the second
[1] 1 5
# Sequences
a <- 1:10
b <- seq(from = 0, to = 1, by = 0.25)
c_rep <- rep(c(1, 2), times = 3)
c_rep
[1] 1 2 1 2 1 2
# Vectorised operations — applied element-wise
x * 2
[1]  2  6 10
x + c(10, 20, 30)
[1] 11 23 35
# Logical operations on vectors
x > 2
[1] FALSE  TRUE  TRUE
any(x > 2)                # is any element > 2?
[1] TRUE
all(x > 2)                # are all elements > 2?
[1] FALSE
which(x > 2)              # indices where condition is TRUE
[1] 2 3
# Common summary functions
length(x)
[1] 3
sum(x)
[1] 9
mean(x)
[1] 3
sd(x)
[1] 2
min(x); max(x)
[1] 1
[1] 5
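
Two behaviours of vectorised code are easy to miss: shorter vectors are recycled to match longer ones, and missing values propagate through summaries. A minimal base R sketch, with illustrative variable names:

```r
x <- c(1, 3, 5)

# Recycling: the shorter operand is repeated to the length of the longer one
x + 10                          # 11 13 15
recycled <- c(1, 2, 3, 4) * c(1, 10)
recycled                        # 1 20 3 40

# Missing values propagate unless they are removed explicitly
v <- c(2, NA, 6)
mean(v)                         # NA
mean(v, na.rm = TRUE)           # 4

# Logical indexing keeps the elements where the condition is TRUE
x[x > 2]                        # 3 5
```

Recycling is silent when the longer length is a multiple of the shorter one, so it is worth checking vector lengths when results look odd.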

Matrices

A matrix is a two-dimensional vector: all elements share the same type, and values are arranged in rows and columns. In network analysis, adjacency matrices are the canonical representation of graph structure — rows and columns correspond to nodes, and cell values indicate the presence (or weight) of an edge.

m <- matrix(1:25, nrow = 5, ncol = 5)
m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
# Element access
m[1, 2]                   # row 1, column 2
[1] 6
m[1, ]                    # entire first row
[1]  1  6 11 16 21
m[, 2]                    # entire second column
[1]  6  7  8  9 10
m[2:3, 3:5]               # submatrix
     [,1] [,2] [,3]
[1,]   12   17   22
[2,]   13   18   23
# Dimensions
nrow(m); ncol(m)
[1] 5
[1] 5
dim(m)
[1] 5 5
# Matrix operations — essential for network mathematics
t(m)                      # transpose
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25
m %*% t(m)                # matrix multiplication (not element-wise!)
     [,1] [,2] [,3] [,4] [,5]
[1,]  855  910  965 1020 1075
[2,]  910  970 1030 1090 1150
[3,]  965 1030 1095 1160 1225
[4,] 1020 1090 1160 1230 1300
[5,] 1075 1150 1225 1300 1375
diag(m)                   # diagonal elements
[1]  1  7 13 19 25
# Build an adjacency matrix manually
adj <- matrix(0, nrow = 4, ncol = 4,
              dimnames = list(c("A","B","C","D"), c("A","B","C","D")))
adj["A", "B"] <- 1
adj["B", "C"] <- 1
adj["C", "D"] <- 1
adj["D", "A"] <- 1
adj                       # a 4-node directed cycle
  A B C D
A 0 1 0 0
B 0 0 1 0
C 0 0 0 1
D 1 0 0 0
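
To connect the matrix operations to network quantities, here is a small sketch that rebuilds the same 4-node directed cycle and derives degrees and two-step paths from it. It assumes the convention used above (rows are senders, columns are receivers). The names out_deg, in_deg, und and paths2 are illustrative.

```r
lbl <- c("A", "B", "C", "D")
adj <- matrix(0, nrow = 4, ncol = 4, dimnames = list(lbl, lbl))
adj["A", "B"] <- 1
adj["B", "C"] <- 1
adj["C", "D"] <- 1
adj["D", "A"] <- 1

# Row sums count outgoing edges, column sums count incoming edges
out_deg <- rowSums(adj)
in_deg  <- colSums(adj)

# Adding the transpose gives an undirected version
# (entries stay 0/1 here because no pair has edges in both directions)
und <- adj + t(adj)
isSymmetric(und)              # TRUE

# Entry [i, j] of adj %*% adj counts directed paths of length 2 from i to j
paths2 <- adj %*% adj
paths2["A", "C"]              # 1, via A -> B -> C
```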

Lists

Lists are R’s most flexible container: each element can hold any object of any type or length. Many R functions return lists (model output, igraph objects, etc.), so you need to know how to navigate them.

person <- list(
  name    = "Alice",
  age     = 34,
  scores  = c(88, 92, 79),
  active  = TRUE
)
# Access by name
person$name
[1] "Alice"
person[["age"]]
[1] 34
# Access by position
person[[3]]               # third element (the scores vector)
[1] 88 92 79
person[[3]][2]            # second score
[1] 92
# Inspect structure
str(person)
List of 4
 $ name  : chr "Alice"
 $ age   : num 34
 $ scores: num [1:3] 88 92 79
 $ active: logi TRUE
length(person)
[1] 4
names(person)
[1] "name"   "age"    "scores" "active"
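
The difference between single and double brackets is a common stumbling block when navigating lists. The sketch below, in base R, shows that [ returns a smaller list while [[ returns the element itself, and how elements are added or removed. The email value is a made-up placeholder.

```r
person <- list(name = "Alice", age = 34, scores = c(88, 92, 79))

sub <- person["age"]      # single bracket: a list of length 1
val <- person[["age"]]    # double bracket: the element itself
class(sub)                # "list"
class(val)                # "numeric"

# Add an element by assigning to a new name (placeholder value)
person$email <- "alice@example.org"

# Remove an element by assigning NULL
person$age <- NULL
names(person)             # "name" "scores" "email"
```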

Data Frames

A data frame is the standard rectangular data structure in R: a list of vectors of equal length, where each vector is a column. Think of it as a spreadsheet or database table. Node attribute tables and edge lists are both naturally represented as data frames.

df <- data.frame(
  id     = 1:5,
  name   = c("Alice", "Bob", "Clara", "David", "Eva"),
  degree = c(3, 7, 2, 5, 4),
  active = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

df
  id  name degree active
1  1 Alice      3   TRUE
2  2   Bob      7   TRUE
3  3 Clara      2  FALSE
4  4 David      5   TRUE
5  5   Eva      4  FALSE
# Column access
df$name
[1] "Alice" "Bob"   "Clara" "David" "Eva"  
df[["degree"]]
[1] 3 7 2 5 4
# Row and column indexing
df[1, ]                   # first row
  id  name degree active
1  1 Alice      3   TRUE
df[, 3]                   # third column
[1] 3 7 2 5 4
df[df$active, ]           # rows where active == TRUE
  id  name degree active
1  1 Alice      3   TRUE
2  2   Bob      7   TRUE
4  4 David      5   TRUE
df[df$degree > 3, c("name", "degree")]
   name degree
2   Bob      7
4 David      5
5   Eva      4
# Useful inspection functions
nrow(df); ncol(df)
[1] 5
[1] 4
head(df, 3)
  id  name degree active
1  1 Alice      3   TRUE
2  2   Bob      7   TRUE
3  3 Clara      2  FALSE
str(df)
'data.frame':   5 obs. of  4 variables:
 $ id    : int  1 2 3 4 5
 $ name  : chr  "Alice" "Bob" "Clara" "David" ...
 $ degree: num  3 7 2 5 4
 $ active: logi  TRUE TRUE FALSE TRUE FALSE
summary(df)
       id        name               degree      active       
 Min.   :1   Length:5           Min.   :2.0   Mode :logical  
 1st Qu.:2   Class :character   1st Qu.:3.0   FALSE:2        
 Median :3   Mode  :character   Median :4.0   TRUE :3        
 Mean   :3                      Mean   :4.2                  
 3rd Qu.:4                      3rd Qu.:5.0                  
 Max.   :5                      Max.   :7.0                  
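
Beyond inspection, the most frequent base R tasks on a data frame are adding a column, sorting, and subsetting. A short sketch on a trimmed copy of the table above. The names above_mean, ranked and busy are illustrative.

```r
df <- data.frame(
  name   = c("Alice", "Bob", "Clara", "David", "Eva"),
  degree = c(3, 7, 2, 5, 4),
  active = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Add a column computed from an existing one (mean degree is 4.2)
df$above_mean <- df$degree > mean(df$degree)

# Sort rows by degree, highest first
ranked <- df[order(df$degree, decreasing = TRUE), ]
ranked$name               # "Bob" "David" "Eva" "Alice" "Clara"

# subset() combines row filtering and column selection
busy <- subset(df, active & degree > 3, select = c(name, degree))
busy$name                 # "Bob" "David"
```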

Control Flow

Conditionals

x <- 15

if (x > 10) {
  cat("x is greater than 10\n")
} else if (x == 10) {
  cat("x is exactly 10\n")
} else {
  cat("x is less than 10\n")
}
x is greater than 10
# ifelse() — vectorised conditional
scores <- c(55, 72, 48, 91, 60)
ifelse(scores >= 60, "pass", "fail")
[1] "fail" "pass" "fail" "pass" "pass"
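
Note that if() expects a single TRUE or FALSE, whereas ifelse() works on whole vectors. The sketch below shows one way to bridge the two: reduce a vector condition with any() before passing it to if(), and nest ifelse() calls for more than two outcomes. The cut-offs and labels are illustrative assumptions.

```r
scores <- c(55, 72, 48, 91, 60)

# Reduce a vector test to one logical value before using if()
if (any(scores < 50)) {
  status <- "at least one score below 50"
} else {
  status <- "all scores are 50 or above"
}
status

# Nested ifelse() for three outcomes, evaluated element-wise
band <- ifelse(scores >= 85, "distinction",
               ifelse(scores >= 60, "pass", "fail"))
band                      # "fail" "pass" "fail" "distinction" "pass"
```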

Loops

In R, explicit loops are often avoidable thanks to vectorisation, but they are useful for iterative tasks and are worth knowing.

# for loop
for (i in 1:5) {
  cat("Iteration:", i, "\n")
}
Iteration: 1 
Iteration: 2 
Iteration: 3 
Iteration: 4 
Iteration: 5 
# while loop
count <- 0
while (count < 3) {
  count <- count + 1
  cat("count is", count, "\n")
}
count is 1 
count is 2 
count is 3 
# break and next
for (i in 1:10) {
  if (i == 4) next       # skip this iteration
  if (i == 7) break      # exit the loop
  cat(i, "")
}
1 2 3 5 6 
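
As a small illustration of the point about vectorisation, the sketch below computes the same result with a loop and with a vectorised expression. It also uses seq_along(), which is generally safer than 1:length(x) because it yields an empty sequence for an empty vector. Variable names are illustrative.

```r
x <- c(1, 3, 5, 7)

# Loop version: pre-allocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorised version: one expression, no explicit loop
squares_vec <- x^2

identical(squares_loop, squares_vec)   # TRUE

# seq_along() on an empty vector gives nothing to iterate over
length(seq_along(numeric(0)))          # 0
```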

Apply Functions

The apply family offers a concise way to apply a function over a vector, list, or matrix — often replacing explicit loops.

m <- matrix(1:12, nrow = 3)
apply(m, 1, sum)          # row sums
[1] 22 26 30
apply(m, 2, mean)         # column means
[1]  2  5  8 11
nums <- list(a = 1:4, b = 5:10, c = 11:15)
sapply(nums, mean)        # returns a named vector
   a    b    c 
 2.5  7.5 13.0 
lapply(nums, length)      # returns a list
$a
[1] 4

$b
[1] 6

$c
[1] 5
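
Two further members of the family are worth a brief sketch. vapply() behaves like sapply() but requires you to declare the type and length of each result, which makes the output predictable. rowSums() and colSums() are dedicated shortcuts for the most common apply() calls. The spread calculation is an illustrative example.

```r
nums <- list(a = 1:4, b = 5:10, c = 11:15)

# vapply(): like sapply(), but the result type is declared up front
means <- vapply(nums, mean, FUN.VALUE = numeric(1))
means                         # a = 2.5, b = 7.5, c = 13

# An anonymous function computes a custom summary per element
spread <- sapply(nums, function(v) max(v) - min(v))
spread                        # a = 3, b = 5, c = 4

# rowSums() gives the same values as apply(m, 1, sum)
m <- matrix(1:12, nrow = 3)
isTRUE(all.equal(rowSums(m), apply(m, 1, sum)))   # TRUE
```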

Writing Functions

Functions are the building blocks of reusable code. When you find yourself repeating the same operations, write a function.

# Basic function definition
greet <- function(name, greeting = "Hello") {
  paste(greeting, name)
}

greet("Alice")
[1] "Hello Alice"
greet("Bob", greeting = "Welcome")
[1] "Welcome Bob"
# A more practical example: normalise a vector to [0, 1]
normalise <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
centralityScores <- c(3, 7, 2, 9, 4)
normalise(centralityScores)
[1] 0.1428571 0.7142857 0.0000000 1.0000000 0.2857143
# Functions can return multiple values via a list
describe <- function(x) {
  list(n = length(x), mean = mean(x), sd = sd(x), range = range(x))
}

describe(centralityScores)
$n
[1] 5

$mean
[1] 5

$sd
[1] 2.915476

$range
[1] 2 9
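
The normalise() function above divides by zero when all values are equal, returning NaN. One possible way to guard against this is sketched below. The function name normalise_safe and the choice to return 0 for a constant vector are assumptions made for illustration, and other conventions (such as returning 0.5 or NA) are equally defensible.

```r
normalise_safe <- function(x) {
  stopifnot(is.numeric(x))              # fail early on non-numeric input
  rng <- range(x, na.rm = TRUE)         # c(min, max), ignoring NA
  if (rng[1] == rng[2]) {
    # Constant input: avoid 0/0 and return zeros, keeping NA where present
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

normalise_safe(c(3, 7, 2, 9, 4))        # same result as normalise()
normalise_safe(c(4, 4, 4))              # 0 0 0 instead of NaN NaN NaN
```

The sketch assumes a non-empty vector with at least one non-missing value.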

Data Wrangling with the Tidyverse

The tidyverse is a collection of R packages designed around a consistent philosophy of tidy data and readable code. The core package for data manipulation is dplyr, which provides a small set of verbs that cover the vast majority of data-wrangling tasks.

library(tidyverse)

The Pipe Operator

The pipe |> (base R, since 4.1) or %>% (magrittr/tidyverse) passes the result of one expression as the first argument of the next. It allows you to write a sequence of transformations in the order they happen, which is much easier to read than nested function calls.

# Without pipe — read inside out
round(mean(c(1, 2, 3, 4, 5)), digits = 2)
[1] 3
# With pipe — read left to right
c(1, 2, 3, 4, 5) |> mean() |> round(digits = 2)
[1] 3
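
Because the pipe always fills the first argument, a step that needs the value elsewhere has to be wrapped. A minimal sketch using the base pipe with an anonymous function (the \(v) shorthand, available from R 4.1) is shown below. The names vals, total and shares are illustrative.

```r
vals <- c(1, 4, 9)

# Each step receives the previous result as its first argument
total <- vals |> sqrt() |> sum()
total                         # 6

# An anonymous function lets the piped value appear more than once
shares <- vals |> (\(v) v / sum(v))()
shares                        # each value as a share of the total
```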

A Working Dataset

We’ll use a small node-attribute table representing actors in a network.

nodes <- tibble(
  id         = 1:8,
  name       = c("Alice","Bob","Clara","David","Eva","Frank","Grace","Hugo"),
  department = c("Research","Operations","Research","HR","Operations","Research","HR","Operations"),
  tenure     = c(5, 2, 8, 3, 6, 1, 4, 7),
  score      = c(72, 85, 91, 60, 78, 55, 88, 74)
)
nodes
# A tibble: 8 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     1 Alice Research        5    72
2     2 Bob   Operations      2    85
3     3 Clara Research        8    91
4     4 David HR              3    60
5     5 Eva   Operations      6    78
6     6 Frank Research        1    55
7     7 Grace HR              4    88
8     8 Hugo  Operations      7    74

filter() — Keep Rows

# Keep only Research department members
nodes |> filter(department == "Research")
# A tibble: 3 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     1 Alice Research        5    72
2     3 Clara Research        8    91
3     6 Frank Research        1    55
# Multiple conditions
nodes |> filter(department == "Operations", tenure > 4)
# A tibble: 2 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     5 Eva   Operations      6    78
2     8 Hugo  Operations      7    74
# Using %in% for multiple values
nodes |> filter(department %in% c("HR", "Operations"))
# A tibble: 5 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     2 Bob   Operations      2    85
2     4 David HR              3    60
3     5 Eva   Operations      6    78
4     7 Grace HR              4    88
5     8 Hugo  Operations      7    74

select() — Choose Columns

nodes |> select(name, department, score)
# A tibble: 8 × 3
  name  department score
  <chr> <chr>      <dbl>
1 Alice Research      72
2 Bob   Operations    85
3 Clara Research      91
4 David HR            60
5 Eva   Operations    78
6 Frank Research      55
7 Grace HR            88
8 Hugo  Operations    74
# Drop a column with -
nodes |> select(-id)
# A tibble: 8 × 4
  name  department tenure score
  <chr> <chr>       <dbl> <dbl>
1 Alice Research        5    72
2 Bob   Operations      2    85
3 Clara Research        8    91
4 David HR              3    60
5 Eva   Operations      6    78
6 Frank Research        1    55
7 Grace HR              4    88
8 Hugo  Operations      7    74
# Rename while selecting
nodes |> select(name, dept = department, performance = score)
# A tibble: 8 × 3
  name  dept       performance
  <chr> <chr>            <dbl>
1 Alice Research            72
2 Bob   Operations          85
3 Clara Research            91
4 David HR                  60
5 Eva   Operations          78
6 Frank Research            55
7 Grace HR                  88
8 Hugo  Operations          74

mutate() — Create or Modify Columns

nodes |>
  mutate(
    senior   = tenure >= 5,
    score_z  = (score - mean(score)) / sd(score),   # standardise
    label    = paste0(name, " (", department, ")")
  )
# A tibble: 8 × 8
     id name  department tenure score senior score_z label          
  <int> <chr> <chr>       <dbl> <dbl> <lgl>    <dbl> <chr>          
1     1 Alice Research        5    72 TRUE    -0.261 Alice (Researc…
2     2 Bob   Operations      2    85 FALSE    0.745 Bob (Operation…
3     3 Clara Research        8    91 TRUE     1.21  Clara (Researc…
4     4 David HR              3    60 FALSE   -1.19  David (HR)     
5     5 Eva   Operations      6    78 TRUE     0.203 Eva (Operation…
6     6 Frank Research        1    55 FALSE   -1.58  Frank (Researc…
7     7 Grace HR              4    88 FALSE    0.977 Grace (HR)     
8     8 Hugo  Operations      7    74 TRUE    -0.106 Hugo (Operatio…

arrange() — Sort Rows

nodes |> arrange(desc(score))          # highest score first
# A tibble: 8 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     3 Clara Research        8    91
2     7 Grace HR              4    88
3     2 Bob   Operations      2    85
4     5 Eva   Operations      6    78
5     8 Hugo  Operations      7    74
6     1 Alice Research        5    72
7     4 David HR              3    60
8     6 Frank Research        1    55
nodes |> arrange(department, tenure)   # sort by two columns
# A tibble: 8 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     4 David HR              3    60
2     7 Grace HR              4    88
3     2 Bob   Operations      2    85
4     5 Eva   Operations      6    78
5     8 Hugo  Operations      7    74
6     6 Frank Research        1    55
7     1 Alice Research        5    72
8     3 Clara Research        8    91

summarise() and group_by() — Aggregation

# Overall summary
nodes |>
  summarise(
    n          = n(),
    mean_score = mean(score),
    sd_score   = sd(score),
    max_tenure = max(tenure)
  )
# A tibble: 1 × 4
      n mean_score sd_score max_tenure
  <int>      <dbl>    <dbl>      <dbl>
1     8       75.4     12.9          8
# Summary by group — critical for network attribute analysis
nodes |>
  group_by(department) |>
  summarise(
    n           = n(),
    mean_score  = mean(score),
    mean_tenure = mean(tenure)
  ) |>
  arrange(desc(mean_score))
# A tibble: 3 × 4
  department     n mean_score mean_tenure
  <chr>      <int>      <dbl>       <dbl>
1 Operations     3       79          5   
2 HR             2       74          3.5 
3 Research       3       72.7        4.67
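
The verbs are most useful in combination. As a sketch, the pipeline below chains several of them to find the highest-scoring member of each department among those with at least three years of tenure. The tenure cut-off and the name top_by_dept are illustrative, and slice_max() is a dplyr helper not otherwise covered in this chapter.

```r
library(dplyr)

nodes <- tibble(
  id         = 1:8,
  name       = c("Alice","Bob","Clara","David","Eva","Frank","Grace","Hugo"),
  department = c("Research","Operations","Research","HR","Operations","Research","HR","Operations"),
  tenure     = c(5, 2, 8, 3, 6, 1, 4, 7),
  score      = c(72, 85, 91, 60, 78, 55, 88, 74)
)

top_by_dept <- nodes |>
  filter(tenure >= 3) |>              # keep rows
  group_by(department) |>             # one group per department
  slice_max(score, n = 1) |>          # best score within each group
  ungroup() |>
  select(department, name, score) |>  # choose columns
  arrange(desc(score))                # sort rows

top_by_dept$name                      # "Clara" "Grace" "Eva"
```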

Joining Tables

Joins combine two data frames on a shared key column — essential when merging node attributes with an edge list.

# Edge list: pairs of node IDs representing connections
edges <- tibble(
  from   = c(1, 1, 2, 3, 4, 5, 6),
  to     = c(2, 3, 4, 5, 6, 7, 8),
  weight = c(1.2, 0.8, 1.5, 0.5, 2.0, 1.1, 0.9)
)
# Attach sender attributes to edge list
edges |>
  left_join(nodes |> select(id, name, department),
            by = c("from" = "id")) |>
  rename(from_name = name, from_dept = department)
# A tibble: 7 × 5
   from    to weight from_name from_dept 
  <dbl> <dbl>  <dbl> <chr>     <chr>     
1     1     2    1.2 Alice     Research  
2     1     3    0.8 Alice     Research  
3     2     4    1.5 Bob       Operations
4     3     5    0.5 Clara     Research  
5     4     6    2   David     HR        
6     5     7    1.1 Eva       Operations
7     6     8    0.9 Frank     Research  
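
The same pattern extends to the receiving end of each edge. The sketch below joins the node table twice, once per endpoint, and then flags edges that stay within a department. It uses a trimmed copy of the data so that it runs on its own. The names lookup, edges_full and same_dept are illustrative.

```r
library(dplyr)

nodes <- tibble(
  id         = 1:4,
  name       = c("Alice", "Bob", "Clara", "David"),
  department = c("Research", "Operations", "Research", "HR")
)
edges <- tibble(from = c(1L, 1L, 2L), to = c(2L, 3L, 4L))

lookup <- nodes |> select(id, name, department)

edges_full <- edges |>
  left_join(lookup, by = c("from" = "id")) |>      # sender attributes
  rename(from_name = name, from_dept = department) |>
  left_join(lookup, by = c("to" = "id")) |>        # receiver attributes
  rename(to_name = name, to_dept = department) |>
  mutate(same_dept = from_dept == to_dept)

edges_full$same_dept                               # FALSE TRUE FALSE
```

Renaming after each join keeps the sender and receiver columns from colliding.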

Reshaping Data

# Wide to long (pivot_longer) — useful for comparing multiple metrics
nodes |>
  select(name, tenure, score) |>
  pivot_longer(cols = c(tenure, score),
               names_to  = "metric",
               values_to = "value")
# A tibble: 16 × 3
   name  metric value
   <chr> <chr>  <dbl>
 1 Alice tenure     5
 2 Alice score     72
 3 Bob   tenure     2
 4 Bob   score     85
 5 Clara tenure     8
 6 Clara score     91
 7 David tenure     3
 8 David score     60
 9 Eva   tenure     6
10 Eva   score     78
11 Frank tenure     1
12 Frank score     55
13 Grace tenure     4
14 Grace score     88
15 Hugo  tenure     7
16 Hugo  score     74
# Long to wide (pivot_wider)
nodes |>
  select(name, department) |>
  mutate(present = 1) |>
  pivot_wider(names_from = department, values_from = present, values_fill = 0)
# A tibble: 8 × 4
  name  Research Operations    HR
  <chr>    <dbl>      <dbl> <dbl>
1 Alice        1          0     0
2 Bob          0          1     0
3 Clara        1          0     0
4 David        0          0     1
5 Eva          0          1     0
6 Frank        1          0     0
7 Grace        0          0     1
8 Hugo         0          1     0
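
pivot_longer() and pivot_wider() are inverses of each other, which a small round trip makes concrete. The sketch below uses a two-row table, with illustrative names wide, long and back, and checks that reshaping to long format and back recovers the original columns.

```r
library(dplyr)
library(tidyr)

wide <- tibble(
  name   = c("Alice", "Bob"),
  tenure = c(5, 2),
  score  = c(72, 85)
)

# Wide to long: one row per name-metric pair
long <- wide |>
  pivot_longer(cols = c(tenure, score),
               names_to = "metric", values_to = "value")
nrow(long)                    # 4

# Long back to wide: one column per metric again
back <- long |>
  pivot_wider(names_from = metric, values_from = value)
names(back)                   # "name" "tenure" "score"
```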

Reading and Writing Data

# CSV files
df <- read_csv("data/nodes.csv")            # tidyverse (recommended)
df <- read.csv("data/nodes.csv")            # base R
write_csv(df, "data/nodes_clean.csv")
# Excel files
library(readxl)
df <- read_excel("data/data.xlsx", sheet = 1)
# SPSS, Stata, SAS
library(haven)
df <- read_spss("data/survey.sav")
df <- read_dta("data/survey.dta")
# R's native format
saveRDS(df, "data/df.rds")
df <- readRDS("data/df.rds")
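
The file paths above are placeholders. For a version that runs anywhere, the sketch below writes to temporary files and reads the data back. It also illustrates one practical difference between the formats: RDS preserves R types exactly, whereas CSV stores text and column types are guessed again on import. The object names are illustrative.

```r
library(readr)

df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Clara"))

csv_path <- tempfile(fileext = ".csv")   # temporary files, removed with the session
rds_path <- tempfile(fileext = ".rds")

# CSV round trip: values survive, but types are re-guessed on import
write_csv(df, csv_path)
from_csv <- read_csv(csv_path, show_col_types = FALSE)
class(df$id)                 # "integer"
class(from_csv$id)           # "numeric"

# RDS round trip: the object comes back unchanged
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)
identical(df, from_rds)      # TRUE
```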

Quick Reference

Task                      Function
Assign a value            x <- value
Create a vector           c(1, 2, 3)
Create a sequence         1:10, seq(0, 1, 0.1)
Create a matrix           matrix(data, nrow, ncol)
Create a data frame       data.frame() / tibble()
Check an object’s type    class(), typeof()
Check dimensions          dim(), nrow(), ncol(), length()
Inspect structure         str(), summary(), head()
Filter rows               filter()
Select columns            select()
Create columns            mutate()
Sort rows                 arrange()
Aggregate                 group_by() + summarise()
Join tables               left_join(), inner_join()
Reshape wide → long       pivot_longer()
Reshape long → wide       pivot_wider()