3 R Fundamentals

Essential programming skills for network analysis

Published

March 18, 2026

Before we build networks, we need to ensure you’re comfortable with R itself. This chapter introduces the R programming concepts you will need throughout the workshop. If you are already comfortable with R, treat it as a quick reference. If you are new to R, work through it carefully — the material here underpins everything that follows.

Running code

All code blocks in this workshop can be run interactively. Open the .qmd source file in RStudio, place your cursor inside a code chunk, and press Ctrl+Enter (Windows/Linux) or Cmd+Enter (Mac) to run the current line or selection, or Ctrl+Shift+Enter (Cmd+Shift+Enter on Mac) to run the entire chunk.

Basic Operations and Assignment

R can be used as a calculator, but its real power lies in storing and manipulating values. The assignment operator <- binds a value to a name in the current environment.

1 + 3           # arithmetic evaluation

[1] 4

2^8             # exponentiation

[1] 256

17 %% 5         # modulo (remainder)

[1] 2

17 %/% 5        # integer division

[1] 3

a <- 9          # assignment: bind 9 to the name 'a'

sqrt(a)         # apply a function to an object

[1] 3

b <- sqrt(a)

a == b          # logical comparison: are they equal?

[1] FALSE

a != b          # not equal

[1] TRUE

a > 5           # greater than

[1] TRUE

ls()            # list all objects in the current environment

[1] "a" "b"

Note

R is case-sensitive: A and a are different objects. Object names can contain letters, digits, dots (.) and underscores (_), but must start with a letter or a dot; a leading dot cannot be followed by a digit.
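
The naming rules in this note can be checked directly in the console. The following is a minimal sketch; the object names (score, Score, node_count.2026, .hidden) are made up for illustration.

```r
# Names are case-sensitive: these are two distinct objects
score <- 10
Score <- 20
score == Score                 # FALSE

# Valid names may contain letters, digits, '.' and '_'
node_count.2026 <- 5
.hidden <- 1                   # starts with a dot; ls() hides it by default

# make.names() shows how R would repair an invalid name
make.names("2nd_wave")         # "X2nd_wave"
```

Names that begin with a digit are rejected by the parser, which is why make.names() prefixes them with X.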

Data Types

Every value in R has a type. The most common scalar types are:

class(3.14)         # numeric (double)

[1] "numeric"

class(42L)          # integer (the L suffix tells R to store this as an integer)

[1] "integer"

class("hello")      # character

[1] "character"

class(TRUE)         # logical

[1] "logical"

class(2 + 3i)       # complex

[1] "complex"

# Type coercion
as.numeric("3.14")

[1] 3.14

as.integer(42)

[1] 42

as.character(100)

[1] "100"

as.logical(0)       # 0 is FALSE; anything non-zero is TRUE

[1] FALSE

is.na(NA)           # test for missing value

[1] TRUE
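
Two details of the type system are worth seeing once: class() and typeof() can disagree, and a coercion that cannot succeed produces NA rather than an error. A short sketch, using only base R (the variable name bad is illustrative):

```r
# class() reports how R treats an object; typeof() reports how it is stored
class(3.14);  typeof(3.14)     # "numeric", "double"
class(42L);   typeof(42L)      # "integer", "integer"

# Coercion that cannot succeed yields NA (R also emits a warning)
bad <- suppressWarnings(as.numeric("three"))
is.na(bad)                     # TRUE

# Logical values coerce to 0/1, which makes counting easy
sum(c(TRUE, FALSE, TRUE))      # 2
```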

Vectors

A vector is the fundamental data structure in R: an ordered collection of values of the same type. Most arithmetic, comparison, and mathematical functions in R are vectorised, meaning they apply element-wise without explicit loops.

x <- c(1, 3, 5)          # combine values into a vector
y <- c("one", "three", "five")

# Indexing (1-based)
x[1]                      # first element

[1] 1

x[c(1, 3)]                # first and third elements

[1] 1 5

x[-2]                     # all except the second

[1] 1 5

# Sequences
a <- 1:10
b <- seq(from = 0, to = 1, by = 0.25)
c_rep <- rep(c(1, 2), times = 3)
c_rep

[1] 1 2 1 2 1 2

# Vectorised operations — applied element-wise
x * 2

[1]  2  6 10

x + c(10, 20, 30)

[1] 11 23 35

# Logical operations on vectors
x > 2

[1] FALSE  TRUE  TRUE

any(x > 2)                # is any element > 2?

[1] TRUE

all(x > 2)                # are all elements > 2?

[1] FALSE

which(x > 2)              # indices where condition is TRUE

[1] 2 3

# Common summary functions
length(x)

[1] 3

sum(x)

[1] 9

mean(x)

[1] 3

sd(x)

[1] 2

min(x); max(x)

[1] 1

[1] 5
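
The "same type" rule and element-wise evaluation have two consequences that often surprise newcomers: mixed inputs are silently coerced, and shorter vectors are recycled. A brief sketch (the names mixed and recycled are illustrative):

```r
# Mixing types in c() coerces everything to the most general type
mixed <- c(1, "two", TRUE)
class(mixed)                   # "character"

# Shorter vectors are recycled to match the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)
recycled                       # 11 22 13 24

# Logical indexing combines a comparison with [ ]
x <- c(1, 3, 5)
x[x > 2]                       # 3 5
```

R warns about recycling only when the longer length is not a multiple of the shorter one, so it is worth checking lengths explicitly when combining vectors.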

Matrices

A matrix is a two-dimensional vector: all elements share the same type, and values are arranged in rows and columns. In network analysis, adjacency matrices are the canonical representation of graph structure — rows and columns correspond to nodes, and cell values indicate the presence (or weight) of an edge.

m <- matrix(1:25, nrow = 5, ncol = 5)
m

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25

# Element access
m[1, 2]                   # row 1, column 2

[1] 6

m[1, ]                    # entire first row

[1]  1  6 11 16 21

m[, 2]                    # entire second column

[1]  6  7  8  9 10

m[2:3, 3:5]               # submatrix

     [,1] [,2] [,3]
[1,]   12   17   22
[2,]   13   18   23

# Dimensions
nrow(m); ncol(m)

[1] 5

[1] 5

dim(m)

[1] 5 5

# Matrix operations — essential for network mathematics
t(m)                      # transpose

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25

m %*% t(m)                # matrix multiplication (not element-wise!)

     [,1] [,2] [,3] [,4] [,5]
[1,]  855  910  965 1020 1075
[2,]  910  970 1030 1090 1150
[3,]  965 1030 1095 1160 1225
[4,] 1020 1090 1160 1230 1300
[5,] 1075 1150 1225 1300 1375

diag(m)                   # diagonal elements

[1]  1  7 13 19 25

# Build an adjacency matrix manually
adj <- matrix(0, nrow = 4, ncol = 4,
              dimnames = list(c("A","B","C","D"), c("A","B","C","D")))
adj["A", "B"] <- 1
adj["B", "C"] <- 1
adj["C", "D"] <- 1
adj["D", "A"] <- 1
adj                       # a 4-node directed cycle
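
Once a graph is stored as an adjacency matrix, several basic network quantities reduce to ordinary matrix operations. The sketch below rebuilds the same 4-node directed cycle so that it runs on its own; the helper names (node_ids, out_degree, in_degree, walks2) are illustrative.

```r
# Rebuild the 4-node directed cycle A -> B -> C -> D -> A
node_ids <- c("A", "B", "C", "D")
adj <- matrix(0, nrow = 4, ncol = 4, dimnames = list(node_ids, node_ids))
adj["A", "B"] <- 1
adj["B", "C"] <- 1
adj["C", "D"] <- 1
adj["D", "A"] <- 1

out_degree <- rowSums(adj)     # edges leaving each node
in_degree  <- colSums(adj)     # edges entering each node

# Entry [i, j] of adj %*% adj counts walks of length 2 from i to j
walks2 <- adj %*% adj
walks2["A", "C"]               # 1, via A -> B -> C

# The matrix is symmetric only if every edge is reciprocated
isSymmetric(adj)               # FALSE
```

Rows are read as senders and columns as receivers here, matching the way the edges were assigned above.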

Lists

Lists are R’s most flexible container: each element can hold any object of any type or length. Many R functions return lists (model output, igraph objects, etc.), so you need to know how to navigate them.

person <- list(
  name    = "Alice",
  age     = 34,
  scores  = c(88, 92, 79),
  active  = TRUE
)

# Access by name
person$name

[1] "Alice"

person[["age"]]

[1] 34

# Access by position
person[[3]]               # third element (the scores vector)

[1] 88 92 79

person[[3]][2]            # second score

[1] 92

# Inspect structure
str(person)

List of 4
 $ name  : chr "Alice"
 $ age   : num 34
 $ scores: num [1:3] 88 92 79
 $ active: logi TRUE

length(person)

[1] 4

names(person)

[1] "name"   "age"    "scores" "active"
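
The same access patterns apply to the list-like objects that modelling functions return. As a sketch, lm() and the built-in cars dataset are used here only as a convenient source of model output; they are not part of the workshop data.

```r
# lm() returns a list with class "lm"; 'cars' ships with R
fit <- lm(dist ~ speed, data = cars)

is.list(fit)                   # TRUE
names(fit)[1:3]                # first few components

# Extract components the same way as from any list
fit$coefficients               # named numeric vector
fit[["coefficients"]][["speed"]]   # the slope only

# Accessor functions such as coef() wrap the same element
identical(coef(fit), fit$coefficients)   # TRUE
```

Where a package provides accessor functions, they are usually preferable to reaching into the list directly, because the internal layout may change between versions.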

Data Frames

A data frame is the standard rectangular data structure in R: a list of vectors of equal length, where each vector is a column. Think of it as a spreadsheet or database table. Node attribute tables and edge lists are both naturally represented as data frames.

df <- data.frame(
  id     = 1:5,
  name   = c("Alice", "Bob", "Clara", "David", "Eva"),
  degree = c(3, 7, 2, 5, 4),
  active = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

df

  id  name degree active
1  1 Alice      3   TRUE
2  2   Bob      7   TRUE
3  3 Clara      2  FALSE
4  4 David      5   TRUE
5  5   Eva      4  FALSE

# Column access
df$name

[1] "Alice" "Bob"   "Clara" "David" "Eva"

df[["degree"]]

[1] 3 7 2 5 4

# Row and column indexing
df[1, ]                   # first row

  id  name degree active
1  1 Alice      3   TRUE

df[, 3]                   # third column

[1] 3 7 2 5 4

df[df$active, ]           # rows where active == TRUE

  id  name degree active
1  1 Alice      3   TRUE
2  2   Bob      7   TRUE
4  4 David      5   TRUE

df[df$degree > 3, c("name", "degree")]

   name degree
2   Bob      7
4 David      5
5   Eva      4

# Useful inspection functions
nrow(df); ncol(df)

[1] 5

[1] 4

head(df, 3)

  id  name degree active
1  1 Alice      3   TRUE
2  2   Bob      7   TRUE
3  3 Clara      2  FALSE

str(df)

'data.frame':   5 obs. of  4 variables:
 $ id    : int  1 2 3 4 5
 $ name  : chr  "Alice" "Bob" "Clara" "David" ...
 $ degree: num  3 7 2 5 4
 $ active: logi  TRUE TRUE FALSE TRUE FALSE

summary(df)

       id        name               degree      active       
 Min.   :1   Length:5           Min.   :2.0   Mode :logical  
 1st Qu.:2   Class :character   1st Qu.:3.0   FALSE:2        
 Median :3   Mode  :character   Median :4.0   TRUE :3        
 Mean   :3                      Mean   :4.2                  
 3rd Qu.:4                      3rd Qu.:5.0                  
 Max.   :5                      Max.   :7.0
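
Because a data frame is a list of equal-length columns, list tools work on it column by column. A minimal sketch with a smaller, made-up table (small_df and high_degree are illustrative names):

```r
small_df <- data.frame(
  id     = 1:3,
  name   = c("Alice", "Bob", "Clara"),
  degree = c(3, 7, 2)
)

is.list(small_df)              # TRUE: each column is one list element
length(small_df)               # 3, the number of columns
sapply(small_df, class)        # class of each column

# Adding a column is assigning a new list element of matching length
small_df$high_degree <- small_df$degree > 2
names(small_df)
```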

Control Flow

Conditionals

x <- 15

if (x > 10) {
  cat("x is greater than 10\n")
} else if (x == 10) {
  cat("x is exactly 10\n")
} else {
  cat("x is less than 10\n")
}

x is greater than 10

# ifelse() — vectorised conditional
scores <- c(55, 72, 48, 91, 60)
ifelse(scores >= 60, "pass", "fail")

[1] "fail" "pass" "fail" "pass" "pass"

Loops

In R, explicit loops are often avoidable thanks to vectorisation, but they are useful for iterative tasks and are worth knowing.

# for loop
for (i in 1:5) {
  cat("Iteration:", i, "\n")
}

Iteration: 1 
Iteration: 2 
Iteration: 3 
Iteration: 4 
Iteration: 5

# while loop
count <- 0
while (count < 3) {
  count <- count + 1
  cat("count is", count, "\n")
}

count is 1 
count is 2 
count is 3

# break and next
for (i in 1:10) {
  if (i == 4) next       # skip this iteration
  if (i == 7) break      # exit the loop
  cat(i, "")
}

1 2 3 5 6
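
To make the point about vectorisation concrete, here is the same computation written as a loop and as a single vectorised call. This is a sketch with made-up values; roots_loop and roots_vec are illustrative names.

```r
values <- c(4, 9, 16, 25)

# Loop version: pre-allocate the result, then fill it element by element
roots_loop <- numeric(length(values))
for (i in seq_along(values)) {
  roots_loop[i] <- sqrt(values[i])
}

# Vectorised version: one call operates on the whole vector
roots_vec <- sqrt(values)

identical(roots_loop, roots_vec)   # TRUE
```

seq_along(values) is used instead of 1:length(values) because it yields an empty sequence when the input is empty, so the loop body is skipped rather than run with invalid indices.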

Apply Functions

The apply family offers a concise way to apply a function over a vector, list, or matrix — often replacing explicit loops.

m <- matrix(1:12, nrow = 3)

apply(m, 1, sum)          # row sums

[1] 22 26 30

apply(m, 2, mean)         # column means

[1]  2  5  8 11

nums <- list(a = 1:4, b = 5:10, c = 11:15)
sapply(nums, mean)        # returns a named vector

   a    b    c 
 2.5  7.5 13.0

lapply(nums, length)      # returns a list

$a
[1] 4

$b
[1] 6

$c
[1] 5
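
Two base R relatives of these functions are worth knowing. vapply() behaves like sapply() but checks every result against a declared template, and rowSums()/colMeans() are dedicated shortcuts for the most common apply() calls on matrices. A short sketch reusing the objects defined above:

```r
nums <- list(a = 1:4, b = 5:10, c = 11:15)

# vapply() requires each result to match the template numeric(1)
means <- vapply(nums, mean, FUN.VALUE = numeric(1))
means

# Dedicated helpers give the same answers as the apply() calls
m <- matrix(1:12, nrow = 3)
all(rowSums(m) == apply(m, 1, sum))      # TRUE
all(colMeans(m) == apply(m, 2, mean))    # TRUE
```

In functions and scripts, vapply() is often the safer choice because it fails loudly if a result has an unexpected type or length.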

Writing Functions

Functions are the building blocks of reusable code. When you find yourself repeating the same operations, write a function.

# Basic function definition
greet <- function(name, greeting = "Hello") {
  paste(greeting, name)
}

greet("Alice")

[1] "Hello Alice"

greet("Bob", greeting = "Welcome")

[1] "Welcome Bob"

# A more practical example: normalise a vector to [0, 1]
normalise <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

centralityScores <- c(3, 7, 2, 9, 4)
normalise(centralityScores)

[1] 0.1428571 0.7142857 0.0000000 1.0000000 0.2857143

# Functions can return multiple values via a list
describe <- function(x) {
  list(n = length(x), mean = mean(x), sd = sd(x), range = range(x))
}

describe(centralityScores)

$n
[1] 5

$mean
[1] 5

$sd
[1] 2.915476

$range
[1] 2 9
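
The normalise() function above divides by max(x) - min(x), which is zero when all values are equal. One possible way to guard against that is sketched below; normalise_safe is a hypothetical name, and the sketch assumes the input has no missing values.

```r
# Hypothetical variant of normalise() with basic input checks
# (assumes x contains no NA values)
normalise_safe <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  rng <- range(x)
  if (rng[1] == rng[2]) {
    # All values identical: avoid 0/0 and return zeros
    return(rep(0, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

normalise_safe(c(3, 7, 2, 9, 4))
normalise_safe(c(5, 5, 5))         # 0 0 0
```

Returning zeros for a constant vector is a design choice, not the only option; returning NA or raising an error would be equally defensible depending on the analysis.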

Data Wrangling with the Tidyverse

The tidyverse is a collection of R packages designed around a consistent philosophy of tidy data and readable code. The core package for data manipulation is dplyr, which provides a small set of verbs that cover the vast majority of data-wrangling tasks.

library(tidyverse)

The Pipe Operator

The pipe |> (base R, since 4.1) or %>% (magrittr/tidyverse) passes the result of one expression as the first argument of the next. It allows you to write a sequence of transformations in the order they happen, which is much easier to read than nested function calls.

# Without pipe — read inside out
round(mean(c(1, 2, 3, 4, 5)), digits = 2)

[1] 3

# With pipe — read left to right
c(1, 2, 3, 4, 5) |> mean() |> round(digits = 2)

[1] 3
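
It can help to see that the base pipe is purely a rewriting rule: R transforms x |> f(y) into f(x, y) when it parses the code. The sketch below assumes R 4.1 or later; f, x, and y are placeholder names that are never evaluated.

```r
# quote() shows the expression R actually builds from the piped form
quote(x |> f(y))                       # f(x, y)

# So a nested call and its piped equivalent are the same computation
nested <- round(sqrt(sum(c(9, 16))), digits = 1)
piped  <- c(9, 16) |> sum() |> sqrt() |> round(digits = 1)
identical(nested, piped)               # TRUE
```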

A Working Dataset

We’ll use a small node-attribute table representing actors in a network.

nodes <- tibble(
  id         = 1:8,
  name       = c("Alice","Bob","Clara","David","Eva","Frank","Grace","Hugo"),
  department = c("Research","Operations","Research","HR","Operations","Research","HR","Operations"),
  tenure     = c(5, 2, 8, 3, 6, 1, 4, 7),
  score      = c(72, 85, 91, 60, 78, 55, 88, 74)
)

nodes

# A tibble: 8 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     1 Alice Research        5    72
2     2 Bob   Operations      2    85
3     3 Clara Research        8    91
4     4 David HR              3    60
5     5 Eva   Operations      6    78
6     6 Frank Research        1    55
7     7 Grace HR              4    88
8     8 Hugo  Operations      7    74

filter() — Keep Rows

# Keep only Research department members
nodes |> filter(department == "Research")

# A tibble: 3 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     1 Alice Research        5    72
2     3 Clara Research        8    91
3     6 Frank Research        1    55

# Multiple conditions
nodes |> filter(department == "Operations", tenure > 4)

# A tibble: 2 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     5 Eva   Operations      6    78
2     8 Hugo  Operations      7    74

# Using %in% for multiple values
nodes |> filter(department %in% c("HR", "Operations"))

# A tibble: 5 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     2 Bob   Operations      2    85
2     4 David HR              3    60
3     5 Eva   Operations      6    78
4     7 Grace HR              4    88
5     8 Hugo  Operations      7    74

select() — Choose Columns

nodes |> select(name, department, score)

# A tibble: 8 × 3
  name  department score
  <chr> <chr>      <dbl>
1 Alice Research      72
2 Bob   Operations    85
3 Clara Research      91
4 David HR            60
5 Eva   Operations    78
6 Frank Research      55
7 Grace HR            88
8 Hugo  Operations    74

# Drop a column with -
nodes |> select(-id)

# A tibble: 8 × 4
  name  department tenure score
  <chr> <chr>       <dbl> <dbl>
1 Alice Research        5    72
2 Bob   Operations      2    85
3 Clara Research        8    91
4 David HR              3    60
5 Eva   Operations      6    78
6 Frank Research        1    55
7 Grace HR              4    88
8 Hugo  Operations      7    74

# Rename while selecting
nodes |> select(name, dept = department, performance = score)

# A tibble: 8 × 3
  name  dept       performance
  <chr> <chr>            <dbl>
1 Alice Research            72
2 Bob   Operations          85
3 Clara Research            91
4 David HR                  60
5 Eva   Operations          78
6 Frank Research            55
7 Grace HR                  88
8 Hugo  Operations          74

mutate() — Create or Modify Columns

nodes |>
  mutate(
    senior   = tenure >= 5,
    score_z  = (score - mean(score)) / sd(score),   # standardise
    label    = paste0(name, " (", department, ")")
  )

# A tibble: 8 × 8
     id name  department tenure score senior score_z label          
  <int> <chr> <chr>       <dbl> <dbl> <lgl>    <dbl> <chr>          
1     1 Alice Research        5    72 TRUE    -0.261 Alice (Researc…
2     2 Bob   Operations      2    85 FALSE    0.745 Bob (Operation…
3     3 Clara Research        8    91 TRUE     1.21  Clara (Researc…
4     4 David HR              3    60 FALSE   -1.19  David (HR)     
5     5 Eva   Operations      6    78 TRUE     0.203 Eva (Operation…
6     6 Frank Research        1    55 FALSE   -1.58  Frank (Researc…
7     7 Grace HR              4    88 FALSE    0.977 Grace (HR)     
8     8 Hugo  Operations      7    74 TRUE    -0.106 Hugo (Operatio…

arrange() — Sort Rows

nodes |> arrange(desc(score))          # highest score first

# A tibble: 8 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     3 Clara Research        8    91
2     7 Grace HR              4    88
3     2 Bob   Operations      2    85
4     5 Eva   Operations      6    78
5     8 Hugo  Operations      7    74
6     1 Alice Research        5    72
7     4 David HR              3    60
8     6 Frank Research        1    55

nodes |> arrange(department, tenure)   # sort by two columns

# A tibble: 8 × 5
     id name  department tenure score
  <int> <chr> <chr>       <dbl> <dbl>
1     4 David HR              3    60
2     7 Grace HR              4    88
3     2 Bob   Operations      2    85
4     5 Eva   Operations      6    78
5     8 Hugo  Operations      7    74
6     6 Frank Research        1    55
7     1 Alice Research        5    72
8     3 Clara Research        8    91

summarise() and group_by() — Aggregation

# Overall summary
nodes |>
  summarise(
    n         = n(),
    mean_score = mean(score),
    sd_score   = sd(score),
    max_tenure = max(tenure)
  )

# A tibble: 1 × 4
      n mean_score sd_score max_tenure
  <int>      <dbl>    <dbl>      <dbl>
1     8       75.4     12.9          8

# Summary by group — critical for network attribute analysis
nodes |>
  group_by(department) |>
  summarise(
    n          = n(),
    mean_score = mean(score),
    mean_tenure = mean(tenure)
  ) |>
  arrange(desc(mean_score))

# A tibble: 3 × 4
  department     n mean_score mean_tenure
  <chr>      <int>      <dbl>       <dbl>
1 Operations     3       79          5   
2 HR             2       74          3.5 
3 Research       3       72.7        4.67

Joining Tables

Joins combine two data frames on a shared key column — essential when merging node attributes with an edge list.

# Edge list: pairs of node IDs representing connections
edges <- tibble(
  from = c(1, 1, 2, 3, 4, 5, 6),
  to   = c(2, 3, 4, 5, 6, 7, 8),
  weight = c(1.2, 0.8, 1.5, 0.5, 2.0, 1.1, 0.9)
)

# Attach sender attributes to edge list
edges |>
  left_join(nodes |> select(id, name, department),
            by = c("from" = "id")) |>
  rename(from_name = name, from_dept = department)

# A tibble: 7 × 5
   from    to weight from_name from_dept 
  <dbl> <dbl>  <dbl> <chr>     <chr>     
1     1     2    1.2 Alice     Research  
2     1     3    0.8 Alice     Research  
3     2     4    1.5 Bob       Operations
4     3     5    0.5 Clara     Research  
5     4     6    2   David     HR        
6     5     7    1.1 Eva       Operations
7     6     8    0.9 Frank     Research
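
The same pattern extends to the receiving end of each edge, and it shows the difference between left_join() and inner_join() when a node is missing from the attribute table. The sketch below uses small stand-in tables with made-up values (nodes_small, edges_small) so that it runs on its own, and it loads only dplyr.

```r
library(dplyr)

# Small stand-in tables; the values are illustrative only
nodes_small <- data.frame(
  id         = c(1, 2, 3),
  name       = c("Alice", "Bob", "Clara"),
  department = c("Research", "Operations", "Research")
)
edges_small <- data.frame(
  from = c(1, 1, 2, 3),
  to   = c(2, 3, 3, 4)          # node 4 has no row in nodes_small
)

# Join twice: once for the sender, once for the receiver
edges_named <- edges_small |>
  left_join(nodes_small, by = c("from" = "id")) |>
  rename(from_name = name, from_dept = department) |>
  left_join(nodes_small, by = c("to" = "id")) |>
  rename(to_name = name, to_dept = department) |>
  mutate(same_dept = from_dept == to_dept)

edges_named                     # last row has NA for the unmatched receiver

# inner_join() drops edges whose receiver is missing from the node table
edges_small |> inner_join(nodes_small, by = c("to" = "id")) |> nrow()   # 3
```

left_join() keeps every edge and fills unmatched attributes with NA, which makes missing nodes visible; inner_join() silently removes those edges, so it is best used when that loss is intended.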

Reshaping Data

# Wide to long (pivot_longer) — useful for comparing multiple metrics
nodes |>
  select(name, tenure, score) |>
  pivot_longer(cols = c(tenure, score),
               names_to  = "metric",
               values_to = "value")

# A tibble: 16 × 3
   name  metric value
   <chr> <chr>  <dbl>
 1 Alice tenure     5
 2 Alice score     72
 3 Bob   tenure     2
 4 Bob   score     85
 5 Clara tenure     8
 6 Clara score     91
 7 David tenure     3
 8 David score     60
 9 Eva   tenure     6
10 Eva   score     78
11 Frank tenure     1
12 Frank score     55
13 Grace tenure     4
14 Grace score     88
15 Hugo  tenure     7
16 Hugo  score     74

# Long to wide (pivot_wider)
nodes |>
  select(name, department) |>
  mutate(present = 1) |>
  pivot_wider(names_from = department, values_from = present, values_fill = 0)

# A tibble: 8 × 4
  name  Research Operations    HR
  <chr>    <dbl>      <dbl> <dbl>
1 Alice        1          0     0
2 Bob          0          1     0
3 Clara        1          0     0
4 David        0          0     1
5 Eva          0          1     0
6 Frank        1          0     0
7 Grace        0          0     1
8 Hugo         0          1     0

Reading and Writing Data

# CSV files (two alternatives; use one)
df <- read_csv("data/nodes.csv")            # readr/tidyverse (recommended); returns a tibble
df <- read.csv("data/nodes.csv")            # base R; returns a data.frame

write_csv(df, "data/nodes_clean.csv")

# Excel files
library(readxl)
df <- read_excel("data/data.xlsx", sheet = 1)

# SPSS, Stata, SAS
library(haven)
df <- read_spss("data/survey.sav")          # SPSS
df <- read_dta("data/survey.dta")           # Stata
df <- read_sas("data/survey.sas7bdat")      # SAS (file name is illustrative)

# R's native format
saveRDS(df, "data/df.rds")
df <- readRDS("data/df.rds")
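
The paths above assume files that exist in your project. To try the read and write functions without any data on disk, a round trip through temporary files works well. This is a sketch using base R only; the object names (small, csv_path, rds_path) are illustrative, and the CSV comparison assumes R 4.0 or later, where text columns are read as character.

```r
# Write to a temporary location so nothing in the project is overwritten
small <- data.frame(id = 1:3, name = c("Alice", "Bob", "Clara"))

csv_path <- tempfile(fileext = ".csv")
write.csv(small, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

rds_path <- tempfile(fileext = ".rds")
saveRDS(small, rds_path)
from_rds <- readRDS(rds_path)

all.equal(small, from_rds)     # TRUE: RDS stores the R object itself
all.equal(small, from_csv)     # TRUE here; CSV re-infers column types on reading
```

CSV is the better choice for sharing data with other tools, while RDS is convenient for intermediate results because column types, factors, and attributes survive unchanged.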

Quick Reference

Task	Function
Assign a value	`x <- value`
Create a vector	`c(1, 2, 3)`
Create a sequence	`1:10`, `seq(0, 1, 0.1)`
Create a matrix	`matrix(data, nrow, ncol)`
Create a data frame	`data.frame()` / `tibble()`
Check an object’s type	`class()`, `typeof()`
Check dimensions	`dim()`, `nrow()`, `ncol()`, `length()`
Inspect structure	`str()`, `summary()`, `head()`
Filter rows	`filter()`
Select columns	`select()`
Create columns	`mutate()`
Sort rows	`arrange()`
Aggregate	`group_by()` + `summarise()`
Join tables	`left_join()`, `inner_join()`
Reshape wide → long	`pivot_longer()`
Reshape long → wide	`pivot_wider()`