Basic network models

class: center, middle, inverse, title-slide

# Basic network models
### Petar Jerčić, David Garcia, Jana Lasser <br><br> <em>TU Graz</em>
### Computational Modelling of Social Systems

---

layout: true

<div class="my-footer"><span>David Garcia - Computational Modelling of Social Systems</span></div>

---

## So far

- **Block 1: Fundamentals of agent-based modelling**
  - Basics of agent-based modelling: the micro-macro gap
  - Modelling segregation: Schelling's model
  - Modelling cultures

- **Block 2: Opinion dynamics**
  - Basics of spreading: Granovetter's threshold model
  - Opinion dynamics
  - Modelling hyperpolarization and cognitive balance

- **Block 3: Fundamentals of agent-based modelling**
  - **Today: Basic network models**
  - Modelling small worlds
  - Scale-free networks
  - Growth processes

---

# Overview

## 1. Random graphs

## 2. Poisson random graphs

## 3. Giant component

## 4. Null models

---

# Random graphs

## *1. Random graphs*

## 2. Poisson random graphs

## 3. Giant component

## 4. Null models

---

# Refresher: Social networks
<div style="float:right">
<img src="Figures/MorenoNetwork.png" alt="Jacob Moreno network." width=450px/>  
</br>
Jacob Moreno's sociogram
</div>

Social Networks are models to represent individuals and the relationships between them. The minimal components of a social network are:

- **Nodes** (or vertices), which represent individuals. These individuals are social actors, for example, humans, animals, fictional characters...
- **Links** (or edges) are relationships between individuals, for example, friendship, family ties, interaction, communication...

---

# Refresher: Representing social networks
<div style="float:right">
<img src="Figures/networkDirected.png" alt="Social network example." width=400px/>  
</div>
A graph or network is a tuple `$G = (n, m)$`  
  - n is a set of vertices or nodes  
  - m ⊆ n × n is a set of edges or links  
  - n × n is the Cartesian product (i.e. the set of all possible links)

Edges are denoted as ordered pairs `$(i, j)$`, which means that a link points from node `$i$` to node `$j$`.

The example of the picture can be written as:  
`$n = {a, b, c, d, e}$`  
`$m = {(b, a),(c, a),(e, a),(d, e),(c, b),(b, c)}$`

---

# Random graphs

- **Networks** that possess a particular property
  - Which are otherwise random (e.g., fixed degree distribution)

- **Analysis** performed by comparing (two) different model network structures
  - E.g., non-scale-free networks vs. scale-free networks

---

# Random graphs

- **Measurements** of network data for (two) different structures
  - Mathematically
  - Computationally
  - **Statistically**

- **Behavior** of such mathematical models of (two) different network structures
  - E.g., dynamic processes on networks - interaction between nodes in a social (connected) network
  - Consensus, opinion formation models ...

---

# Random graph ensamble

.pull-left[
<img src="Figures/random_network_graph1.png" width="900" style="display: block; margin: auto;" />
]

.pull-right[
- Consider a **simple graph** with the following fixed parameters:
  - Number of vertices (n)
  - Number of edges (m)

- For such **Random graph G(n,m)** created in the following method:
  - Place edge (m) between (uniformly) random chosen pair
  - Prevent multiedge/self-edge by: distinct pairing, only to already not connected vertices
]

---

# Random graph ensamble

.pull-left[
- There is not one but an ensamble of networks that satisfy the fixed constraints
  - E.g., for n=20 and m=22

- Probability of such networks `$P(G)$`
  - `$P(G)=\frac{1}{\omega}$` , for such networks where `$\omega$` is the number of such networks
  - `$P(G)=0$` , for all other networks
]

.pull-right[
<img src="Figures/random_network_graph2.png" width="900" style="display: block; margin: auto;" />
]

---

# Random graph ensamble

- Segregated components are also possible

---

# Simple graph

---

# Random graph ensamble

- Ensambles have *typical* behavior based on **properties (averages)**
  - Number of edges is the number of connections between all vertices
`$$\langle m \rangle$$`
  - Diameter of a graph is the largest finite distance between any two vertices
`$$\langle l \rangle = \sum_{G} P(G)l(G) = \frac{1}{\omega} \sum_{G} l(G)$$`
  - Degree of a vertex is the number of adjacent edges, **also known as `$c$`**
`$$\langle k \rangle = c = 2 \frac{m}{n}$$`

---

# Two interesting random graphs ensembles

.pull-left[
1. G(n,m) - fixed number of edges `$m$`
2. G(n,p) - fixed **probability** of edges `$p(m)$`
  - The edges `$m$` are not-fixed
  - The vertices `$n$` are fixed

- E.g., Erdos - Renyi model
]

.pull-right[
<img src="Figures/erdos_renyi.png" width="450" style="display: block; margin: auto;" />
]

---

# G(n,p) ensamble

- Probability of such G(n,p) networks `$P(G)$`:
  - `$P(G) = p^m(1-p)^{(\frac{n}{2})-m}$` , for such networks
  - `$P(G)=0$` , for non-simple graphs

.pull-left[
- p = 0.1
<img src="Figures/g_n_p_0_1.png" width="300" style="display: block; margin: auto;" />
]
.pull-right[
- p = 0.05
<img src="Figures/g_n_p_0_0_5.png" width="300" style="display: block; margin: auto;" />
]

---

## Analytical behavior analysis of G(n,p) ensemble

- Mean number of edges `$\langle m \rangle$`
  - **Distinct vertex pairs** `$\binom{n}{2}$` * probability of an edge `$p$`;

`$$\langle m \rangle = \binom{n}{2} p$$`
- Mean degree of vertex `$c$`
  - Number of other vertices `$(n-1)$` * probability of an edge `$p$`

`$$c = (n-1)p$$`

---

- Degree distribution `$p_{k}$` (for a vertex connected with `$k$` others)
  - **Distinct vertex pairs** `$\binom{n-1}{k}$` * probability of connection with `$k$` vertices, but not with others

`$$p_{k} = \binom{n-1}{k}p^{k}(1-p)^{n-1-k}$$`
.pull-left[
<img src="Figures/degree_distribution_histogram.png" width="400" style="display: block; margin: auto;" />
]
.pull-right[
Degree distribution with `$n=10^5,c =20$` is a **binomial distribution**
]

---

# Recap basics: Binomial distribution

- Binomial distribution with parameters `$n$` and `$p$`
  - The discrete probability distribution of the number of successes in a sequence of n independent experiments
  - Each experiment asks a yes-no question (boolean-valued outcome)
  - Success (with probability p)
  - Failure (with probability q = 1 − p)

---

# Example G(n,p) ensemble

.pull-left[
- `$n = 20, p = 0.1$`
- `$\langle m \rangle = 190 * 0.1 = 19$`
- `$c = (20-1) * 0.1 = 1.9$`
- `$p_{3} = 969 * 0.001 * 0.185 = 0.18$`
<img src="Figures/g_n_p_0_1.png" width="300" style="display: block; margin: auto;" />
]
.pull-right[
- `$n=20, p = 0.05$`
- `$\langle m \rangle = 190 * 0.05 = 9.5$`
- `$c = (20-1) * 0.05 = 0.95$`
- `$p_{3} = 969 * .000125 * .44 = .05$`

<img src="Figures/g_n_p_0_0_5.png" width="300" style="display: block; margin: auto;" />
]

---

# Poisson random graphs

## 1. Random graphs

## *2. Poisson random graphs*

## 3. Giant component

## 4. Null models

---

## Function asymptotically approaches some value

- Asymptotically equal-to/approaches some value
  - The function approaches a particular finite value
  - Horizontal asymptotes are defined as limits at infinity

---

# Large networks `$n\rightarrow\infty$`

- What happens to the degree distribution as `$n$` goes to infinity `$n\rightarrow\infty$`

`$$p_{k} = \binom{n-1}{k}p^{k}(1-p)^{n-1-k}$$`

- The binomial distribution approaches the **Poisson distribution**

---

# Poisson random graph

- Special case of G(n,p) ensemble for large networks `$n\rightarrow\infty$`

- Mean degree `$c$` - constant over large networks
  - Number of friends does not depend on the number of people in the world

`$$c = (n-1)p \Rightarrow p = \frac{c}{n-1} , n\rightarrow\infty$$`
  - Vanishingly small probability `$p$` of an edge

- Degree distribution (for a vertex) `$p_k$`

`$$p_{k} = e^{-c} \frac{c^{k}}{k!}, n\rightarrow\infty$$`

---

## Analytical behavior analysis of G(n,p) ensemble

- Clustering coefficient `$C$`
  - Transitive behavior (property) in a network
   - What is the probability that two vertex neighbors are also neighbors of each other?

`$$C = \frac{c}{n-1}$$`

---

- Poisson random graphs for large networks
  - Clustering coefficient `$C \simeq 0 , n\rightarrow\infty$`, if mean degree `$c$` is fixed

- Difference between the RNG and real-world networks: real-world networks have a high clustering coefficient `$C$`

.center[![:scale 35%](Figures/structuralHole.png)![:scale 35%](Figures/ModularNetwork.png)]

---

# Giant component

## 1. Random graphs

## 2. Poisson random graphs

## 3. *Giant component*

## 4. Null models

---

## Largest component for large networks `$n\rightarrow\infty$`

- If `$p = 0$`; independent of size `$n$`
  - Disconnected network
  - `$n$` separate components - vertices isolated

- If `$p = 1$`; dependent of size `$n$`
  - 1-single component
  - Each vertex is connected to each other
  - **Giant component** - connected component of a network that contains most of the entire nodes in the network

- Most networks (should) have a large component that fills the network
  - To be able to perform the intended role

---

# Internet connected component

---

### Sensitivity of parameter `$p[0,1]$` vs. largest component
- How the size to the largest component changes in respect to `$p$`?
  - Expected: The size increases gradually and it is **extensive** around `$p = 1$`
  - Reality: Largest component suddenly changes for one particular value of `$p$`
  - **Phase transition**

.pull-left[
<img src="Figures/phase_transition_graph.png" width="400" style="display: block; margin: auto;" />
]
.pull-right[
Kang, M., & Petrášek, Z. (2014). Random graphs: Theory and applications from nature to society to the brain
]

---

#Size of the giant component

.pull-left[
<img src="Figures/size_gianct_component_over_degree_distribution.png" width="600" style="display: block; margin: auto;" />
]

.pull-right[
- Size of the giant component for `$c>1$` is larger than one of the possible solutions `$S=0$`
  - This is illustrated on the figure for all values of `$c$`, with a limit at `$c=1$` for two given cases
  - Phase transition

<img src="Figures/phase_transition_network.png" width="300" style="display: block; margin: auto;" />
]

---

# Phase transition examples

.pull-left[
- Schelling
<img src="Figures/IvsF.png" width="750" style="display: block; margin: auto;" />
]

.pull-right[
- Granovetter
<img src="Figures/STD.png" width="750" style="display: block; margin: auto;" />
]

---

#### Analytical behavor analysis of G(n,p) ensamble for large networks `$n\rightarrow\infty$`

.pull-left[
<img src="Figures/size_giant_component.png" width="450" style="display: block; margin: auto;" />
]

.pull-right[
- `$S$` - size proportion of a large component
- `$c$` - average degree of a vertex
1. Size `$S=0$`
  - Small `$c$` - no giant component
2. Size `$S=0$` and `$S>0$` 
  - Large `$c$` - there is giant component
  - Transition phase happendes in between small/large `$c$`, where gradient of the diagonal matches the gradient of the curve for `$S=0$`
]

---

# Null models

## 1. Random graphs

## 2. Poisson random graphs

## 3. Giant component

## 4. *Null models*

---

# Random graphs as null models

- Most real-world networks show an average distance that is comparable to that of the corresponding random graph
  - But their average clustering coefficient is much higher

- Exploratory data analysis
  - Given the **observed** value of a structural measure in a real-world network it can be compared with the **expected** value of that measure in a graph in which the network’s structure is randomized
  - Any structure is tested to whether it deviates from the structure of some pre-defined model

---

# Social network example

.pull-left[
<img src="Figures/null_model.jpg" width="600" style="display: block; margin: auto;" />
]
- Increasing complexity, cliquishness, and gender segregation with age
  - Filled squares, girls; open circles, boys.

- It is well known that, in general, same-gender friendships are more likely than boy-girl friendships
  - Especially in primary school.

---

# Social network example

- Conlan et al. compared the number of these edges with the expected one in a model where each child maintains the number of declared and received friendship declarations concerning the gender of the other child
  - The mutuality is still statistically significant in almost all cases
  - The effect is less pronounced than if compared with a simple directed random graph in which the gender is not regarded.

- *Note that this can be modeled structurally by defining two different in-degrees and two different out-degrees, each differentiated by the gender of the other child*
  - The random graph model then maintains both types of in- and outdegrees

Conlan et al. (2011) Measuring social networks in British primary schools through scientific engagement

---

# Random graphs as null models

- In terms of statistics, the null-hypothesis assumes that graph `$\mathcal{G}$` was produced by the random graph model
  - This null hypothesis is rejected if it is very unlikely that the model is true
  - The random graph model is also often called **the null model**

- The **p-value** `$P(G|\mathcal{G})$` itself, i.e., the probability that the observed graph `$G$` was produced by some random graph model `$\mathcal{G}$`, does not directly give you the probability `$P(\mathcal{G}|G)$`, i.e., the probability that the model is true when `$G$` is observed
  - Read more: Permutation test lecture in Foundations of CSS

---

## Comparison between an observed and null model

<div class="figure" style="text-align: center">
<img src="Figures/null_model_comparison.jpg" alt="Clustering coefficient vs. age of school class within the network of mutual links" width="400" />
<p class="caption">Clustering coefficient vs. age of school class within the network of mutual links</p>
</div>

---

## Summary

- Random graphs
  - Simple graphs with fixed parameters (vertices, edges)
  - There is an ensemble of networks that satisfy the fixed constraints

- Large networks where `$n\rightarrow\infty$`
 - Degree distribution of a vertex approaches the Poisson distribution
 - Clustering coefficient `$C \simeq 0$` if mean degree `$c$` is fixed

- Giant component contains most of the entire nodes in the network
  - Largest component suddenly changes for one particular value of `$p$`
  - Phase transition happens at `$c=1$`

- The null hypothesis assumes that graph `$\mathcal{G}$` was produced by the RNG
  - Real-world networks' average clustering coefficient is high
  - This null hypothesis is rejected if the assumption is unlikely
  
---
# Quiz

- If we flip a coin 20 times, what probability is for 10 heads?

- Do all graphs in the ensemble G(n,m) have the same number of edges?

- What is the density of a network?

- What is the largest possible diameter `$l$` of a network with 20 nodes?