---
title: "Introduction to flownet"
author: "Sebastian Krantz"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Introduction to flownet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 6,
  out.width = "100%",
  #dpi = 72,
  fig.retina = 1,
  eval = TRUE
)
```

## Introduction

The `flownet` package provides efficient tools for transportation modeling, specifically network processing/preparation, route enumeration, and traffic assignment tasks. The package implements the **path-sized logit (PSL) model** for traffic assignment, which accounts for route overlap when assigning traffic flows to network edges. It can handle directed or undirected graphs, and it's network processing functions/algorithms support multimodal networks. A key design decision was to represent network graphs using simple data frames with `from` and `to` node columns. This facilitates applied work, e.g., further manual network edits such as adding multimodal connector links, and the development of generalized cost functions using network features. <!-- This makes it particularly suitable for modeling transport networks where multiple alternative routes may share common segments. -->

## Getting Started

Let's start by loading the required packages and exploring the example datasets included with `flownet`:

```{r load-packages, message=FALSE}
library(fastverse)
fastverse_extend(flownet, sf, tmap)
tmap_mode("plot")
```

The package includes four example datasets for Africa:

- **`africa_network`**: A road transport network with 2,825 LINESTRING features representing existing roads (2,344 edges) and proposed new links (481 edges). Each edge includes attributes such as distance, travel duration, border crossing costs, terrain ruggedness, and road upgrade costs.

- **`africa_cities_ports`**: 453 African cities with population > 100,000 and international ports. Includes population data, capital status, and port cargo outflows.

- **`africa_segments`**: 14,358 raw network segments representing intersected road routes. Useful for demonstrating network consolidation and simplification functions.

- **`africa_trade`**: Bilateral trade flows between 47 African countries aggregated by HS section (21 product categories). Values represent annual averages over 2012-2022.

The `africa_network`, `africa_cities_ports`, and `africa_segments` datasets are from Krantz, S. (2024). [Optimal Investments in Africa's Road Network](https://doi.org/10.1596/1813-9450-10893). Policy Research Working Paper 10893. World Bank. Replication materials are available at [github.com/SebKrantz/OptimalAfricanRoads](https://github.com/SebKrantz/OptimalAfricanRoads).

Let's examine the data:

```{r examine-data}
# View network structure (existing links only)
africa_net <- fsubset(africa_network, !add, -add)
str(africa_net, max.level = 1)

# View cities/ports
head(fselect(africa_cities_ports, city, country, population))

# View trade data structure
head(africa_trade)
```

## Basic Workflow Example

This section demonstrates a complete workflow from network preparation to traffic assignment and visualization. We'll use a gravity-based OD matrix derived from city populations.

### Step 1: Visualize the Network

First, let's visualize the network to understand its structure:

```{r visualize-network}
# Plot network colored by travel speed
tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(africa_net) +
  tm_lines(col = "speed_kmh",
           col.scale = tm_scale_continuous(values = "turbo", values.range = c(0.1, 0.9)),
           col.legend = tm_legend("Speed (km/h)", position = c("left", "bottom"),
                                  frame = FALSE, text.size = 0.8, title.size = 1, 
                                  item.height = 2.5), lwd = 1.5) +
tm_layout(frame = FALSE)
```

### Step 2: Convert Network to Graph

The `linestrings_to_graph()` function converts LINESTRING geometries to a graph data frame format required for traffic assignment. In this case, the relevant columns are already provided, thus we only need to strip the geometry column and turn it into a normal data frame.

```{r convert-to-graph}
# Convert to graph (use atomic_elem to drop sf geometry, qDF for data.frame)
graph <- atomic_elem(africa_net) |> qDF()
head(graph)
```

The resulting graph data frame contains:

- `edge`: Edge identifier (optional)
- `from`, `to`: Node IDs
- `FX`, `FY`, `TX`, `TY`: Node coordinates (optional)
- All original columns from the network (distance, duration, total_time, etc.)

### Step 3: Extract Nodes and Map Cities

Next, we need to map the city/port locations to the nearest network nodes:

```{r extract-nodes}
# Extract nodes with spatial coordinates
nodes <- nodes_from_graph(graph, sf = TRUE)

# Map cities/ports to nearest nodes
nearest_nodes <- nodes$node[st_nearest_feature(africa_cities_ports, nodes)]
```

### Step 4: Create OD Matrix

For this example, we'll create a simple gravity-based OD matrix from city populations. The `melt_od_matrix()` function converts the matrix to long format required by `run_assignment()`:

```{r process-od}
# Create gravity-based OD matrix (population product scaled down)
od_mat <- outer(africa_cities_ports$population, africa_cities_ports$population) / 1e12
dimnames(od_mat) <- list(nearest_nodes, nearest_nodes)

# Convert to long format
od_matrix_long <- melt_od_matrix(od_mat)
head(od_matrix_long)
```

The resulting data frame contains columns `from`, `to`, and `flow`, with only positive, finite flow values.

### Step 5: Run Traffic Assignment

Now we can run the traffic assignment. The `run_assignment()` function:

- Enumerates alternative routes for each OD pair
- Computes path probabilities accounting for route overlap (PSL model)
- Assigns flows to network edges

For large networks like this one, we'll use the All-or-Nothing (AoN) method which is faster:

```{r run-assignment}
# Run Traffic Assignment (All-or-Nothing method for speed)
result <- run_assignment(graph, od_matrix_long, cost.column = "duration",
                         method = "AoN", return.extra = "all")
print(result)
```

Key parameters:

- `cost.column`: The cost column to use for route computation (here "duration" in minutes)
- `method`: "PSL" (path-sized logit) or "AoN" (all-or-nothing)
- `return.extra`: Additional results to return ("all" returns everything)

### Step 6: Visualize Results

Finally, we can visualize the assigned flows on the network:

```{r visualize-results}
# Add flows to network for visualization
africa_net$final_flows_log10 <- log10(result$final_flows + 1)

tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(africa_net) +
  tm_lines(col = "final_flows_log10",
           col.scale = tm_scale_continuous(values = "brewer.yl_or_rd"),
           col.legend = tm_legend("Log10 Flows", position = c("left", "bottom"),
                                  frame = FALSE, text.size = 0.8, title.size = 1), lwd = 1.5) +
tm_shape(africa_cities_ports) + tm_dots(size = 0.15, fill = "grey30") +
tm_layout(frame = FALSE)
```

## Advanced Workflow: Trade Flow Disaggregation

For more realistic modeling, we can disaggregate country-level trade flows to city-level using population shares. This approach distributes trade between countries across their respective cities proportionally to population.

### Step 1: Compute City Population Shares

```{r city-pop-shares}
# Compute each city's share of its country's population
city_pop <- africa_cities_ports |> atomic_elem() |> qDF() |>
  fcompute(node = nearest_nodes,
           city = qF(city_country),
           pop_share = fsum(population, iso3, TRA = "/"),
           keep = "iso3")

head(city_pop)
```

### Step 2: Disaggregate Trade Flows

```{r disaggregate-trade}
# Aggregate trade to country-country level (sum across HS sections)
trade_agg <- africa_trade |> collap(quantity ~ iso3_o + iso3_d, fsum)

# Join with city population shares for origin and destination
# add_stub adds suffix to all columns, so iso3 -> iso3_o matches trade_agg$iso3_o
od_matrix_trade <- trade_agg |>
  join(city_pop |> add_stub("_o", FALSE), multiple = TRUE) |>
  join(city_pop |> add_stub("_d", FALSE), multiple = TRUE) |>
  fmutate(flow = quantity * pop_share_o * pop_share_d) |>
  frename(from = node_o, to = node_d) |>
  fsubset(flow > 0 & from != to)

head(od_matrix_trade)
```

### Step 3: Run Assignment with Trade Flows

```{r run-assignment-trade}
# Run Traffic Assignment with trade-based OD matrix
result_trade <- run_assignment(graph, od_matrix_trade, cost.column = "duration",
                               method = "AoN", return.extra = "all")
print(result_trade)
```

```{r visualize-results-trade}
# Add flows to network for visualization
africa_net$final_flows_log10 <- log10(result_trade$final_flows + 1)

tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(africa_net) +
  tm_lines(col = "final_flows_log10",
           col.scale = tm_scale_continuous(values = "brewer.yl_or_rd"),
           col.legend = tm_legend("Log10 Trade Flows", position = c("left", "bottom"),
                                  frame = FALSE, text.size = 0.8, title.size = 1), lwd = 1.5) +
tm_shape(africa_cities_ports) + tm_dots(size = 0.15, fill = "grey30") +
tm_layout(frame = FALSE)
```

## Advanced Workflow with Network Consolidation

For large or complex networks, it's often beneficial to consolidate the graph by removing intermediate nodes and merging edges. This simplifies the network topology while preserving connectivity, which can significantly improve computational performance.

### Using Raw Segments

The `africa_segments` dataset provides raw network segments that can be processed through the full consolidation workflow. It was obtained by computing fastest car routes between all of the 453 African (port-)cities that are less than 2000km apart using [osrm](https://cran.r-project.org/package=osrm), feeding them through [`stplanr::overline()`](https://docs.ropensci.org/stplanr/reference/overline.html) and [`rmapshaper::ms_simplify()`](https://andyteucher.ca/rmapshaper/reference/ms_simplify.html), and then just taking the start and end coordinate of each segment. The final network `africa_network` used above was created using these segments in a similar way to how I will demonstrate below (consolidation followed by clustering simplification and a final call to [osrm](https://cran.r-project.org/package=osrm) to get simplified edge geometries for better visual appearance).

```{r segments-workflow}
# Convert segments to sf and then to graph
graph_seg <- africa_segments |>
  linestrings_from_graph() |>
  linestrings_to_graph() |>
  create_undirected_graph()

# Get nodes and map cities
nodes_seg <- nodes_from_graph(graph_seg, sf = TRUE)
nearest_nodes_seg <- nodes_seg$node[st_nearest_feature(africa_cities_ports, nodes_seg)]

cat("Original segments:", nrow(graph_seg), "\n")
```

### Consolidate Graph

The `consolidate_graph()` function recursively removes intermediate nodes (nodes that occur exactly twice) and merges connecting edges. At each step, network attributes are aggregated using [`collap()`](https://fastverse.org/collapse/reference/collap.html), which, by default, uses the mean for numeric and the statistical mode for categorical attributes (columns), and supports weights. 

```{r consolidate-graph}
# Consolidate graph, preserving city nodes
graph_cons <- consolidate_graph(graph_seg, keep = nearest_nodes_seg, w = ~ .length)

cat("After consolidation:", nrow(graph_cons), "\n")
```

The use of `keep` here indicates node IDs to preserve (city/port closest nodes).


### Compare Original vs Consolidated Network

```{r compare-networks}
tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(linestrings_from_graph(graph_seg)) +
  tm_lines(col = "passes",
           col.scale = tm_scale_continuous(values = "turbo", values.range = c(0.1, 0.9)),
           col.legend = tm_legend("N. Passes",
                                  position = c("left", "bottom"), frame = FALSE,
                                  text.size = 0.8, title.size = 1), lwd = 1.5) +
tm_layout(frame = FALSE) + tm_title(paste("Original:", nrow(graph_seg), "edges"))
```

The number of passes through a segment along different routes is generated by [`stplanr::overline()`](https://docs.ropensci.org/stplanr/reference/overline.html). If all routes were traversed only once, it would indicate traffic flow. 

```{r compare-networks-cons}
tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(linestrings_from_graph(graph_cons)) +
  tm_lines(col = "passes",
           col.scale = tm_scale_continuous(values = "turbo", values.range = c(0.1, 0.9)),
           col.legend = tm_legend("N. Passes",
                                  position = c("left", "bottom"), frame = FALSE,
                                  text.size = 0.8, title.size = 1), lwd = 1.5) +
tm_layout(frame = FALSE) + tm_title(paste("Consolidated:", nrow(graph_cons), "edges"))
```

## Network Simplification with `simplify_network()`

While `consolidate_graph()` simplifies the network topology preserving full connectivity, the `simplify_network()` function provides two methods for further reducing network complexity while preserving connectivity between important nodes:

1. **shortest-paths**: Keeps only edges traversed by shortest paths between specified nodes
2. **cluster**: Spatially clusters nodes using the `leaderCluster` algorithm and contracts the graph

The two functions are complimentary, after calling `simplify_network()`, `consolidate_graph()` could be called again to simplify the topology of the reduced network.

### Shortest-Paths Method

This method computes shortest paths between all specified nodes and retains only the traversed edges:

```{r simplify-shortest-paths}
# Simplify network using shortest paths
graph_simple <- simplify_network(graph_cons, nearest_nodes_seg,
                                 method = "shortest-paths",
                                 cost.column = ".length")

cat("Consolidated edges:", nrow(graph_cons), "\n")
cat("Simplified edges:", nrow(graph_simple), "\n")

# Visualize
tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(linestrings_from_graph(graph_simple)) +
  tm_lines(col = "passes",
           col.scale = tm_scale_continuous(values = "turbo", values.range = c(0.1, 0.9)),
           col.legend = tm_legend("N. Passes",
                                  position = c("left", "bottom"), frame = FALSE,
                                  text.size = 0.8, title.size = 1), lwd = 1.5) +
tm_layout(frame = FALSE) + tm_title(paste("Simplified (SP):", nrow(graph_simple), "edges"))
```

### Cluster Method

The cluster method uses the `leaderCluster` algorithm to spatially group nearby nodes and contract the graph. This is useful for creating coarser network representations:

```{r simplify-cluster}
# Compute node weights for clustering (sum of gravity at each node)
node_weights <- rowbind(
  fselect(graph_cons, node = from, gravity_rd),
  fselect(graph_cons, to, gravity_rd),use.names = FALSE) |>
  collap(~ node, fsum)

# Cluster-based simplification
graph_cluster <- simplify_network(graph_cons, nearest_nodes_seg,
                                  method = "cluster",
                                  cost.column = node_weights$gravity_rd,
                                  radius_km = list(nodes = 30, cluster = 27),
                                  w = ~ .length) 

cat("Clustered edges:", nrow(graph_cluster), "\n")

tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(linestrings_from_graph(graph_cluster)) +
  tm_lines(col = "passes",
           col.scale = tm_scale_continuous(values = "turbo", values.range = c(0.1, 0.9)),
           col.legend = tm_legend("N. Passes",
                                  position = c("left", "bottom"), frame = FALSE,
                                  text.size = 0.8, title.size = 1), lwd = 1.5) +
tm_layout(frame = FALSE) + tm_title(paste("Simplified (CL):", nrow(graph_cluster), "edges"))
```

Key cluster parameters:

- `radius_km$nodes`: Radius (km) around preserved nodes to assign nearby nodes
- `radius_km$cluster`: Clustering radius (km) for remaining nodes

Let's see if we can assign to this network: 

```{r simplify-cluster-assign}
dimnames(od_mat) <- list(nearest_nodes_seg, nearest_nodes_seg)
od_matrix_long <- melt_od_matrix(od_mat)

# Run Traffic Assignment with gravity-based OD matrix
result_cl <- run_assignment(graph_cluster, od_matrix_long, cost.column = ".length",
                            method = "AoN", return.extra = "all")
print(result_cl)

# Add flows to network for visualization
graph_cluster$final_flows_log10 <- log10(result_cl$final_flows + 1)

tm_basemap("CartoDB.Positron", zoom = 4) +
tm_shape(linestrings_from_graph(graph_cluster)) +
  tm_lines(col = "final_flows_log10",
           col.scale = tm_scale_continuous(values = "brewer.yl_or_rd"),
           col.legend = tm_legend("Log10 Flows", position = c("left", "bottom"),
                                  frame = FALSE, text.size = 0.8, title.size = 1), lwd = 1.5) +
tm_shape(africa_cities_ports) + tm_dots(size = 0.15, fill = "grey30") +
tm_layout(frame = FALSE)
```


## Key Concepts and Functions

The above overview showcased essential package functions. The Path-Sized Logit (PSL) assignment model was not used as it takes a few minutes to process around 200k OD-pairs with the PSL, beholding that a selection of routes must be generated for each OD pair. PSL flow assignments are smoother/more distributed, and more realistic than the all-or-nothing method which assigns the entire flow to the shortest path without regard for alternative routes. 

The remainder of this vignette provides a theoretical overview of the PSL model, followed by a summary of the key functions of this package. 

### Path-Sized Logit Model

The **Path-Sized Logit (PSL)** model (Ben-Akiva and Bierlaire, 1999)^[See further theoretical and practical references at the end.] is a discrete choice model used for traffic assignment that overcomes the independence of irrelevant alternatives (IIA) property of the standard Multinomial Logit (MNL) model. In transport networks, alternative routes often overlap significantly, sharing common edges. The MNL model treats these overlapping routes as independent, leading to an overestimation of flow on shared segments.

The PSL model introduces a **correction term** (path-size factor) to the utility function of each route, penalizing paths that share links with other alternatives.

#### Probability Formula

The probability $P_k$ of choosing route $k$ from the set of alternative routes $K$ is given by:

$$P_k = \frac{e^{V_k}}{\sum_{j \in K} e^{V_j}}$$

where $V_k$ is the deterministic utility of route $k$, defined in `flownet` as:

$$V_k = -C_k + \beta_{PSL} \ln(PS_k)$$

Here:
- $C_k$ is the generalized cost of route $k$.
- $\beta_{PSL}$ is the path-sized logit parameter (controlled by the `beta` argument).
- $PS_k$ is the **Path-Size factor** for route $k$.

#### The Correction Term: Path-Size Factor ($PS_k$)

The Path-Size factor $PS_k$ quantifies the uniqueness of route $k$. It is defined as the ratio of the length (or cost) of the links in path $k$ to the number of alternatives sharing those links. The formula implemented in `flownet` is:

$$PS_k = \frac{1}{C_k} \sum_{a \in \Gamma_k} \frac{c_a}{\delta_a}$$

Where:
- $\Gamma_k$ is the set of edges in path $k$.
- $c_a$ is the cost of edge $a$.
- $C_k = \sum_{a \in \Gamma_k} c_a$ is the total cost of path $k$.
- $\delta_a$ is the number of alternative routes in the choice set $K$ that use edge $a$.

**Interpretation:**
- If a path is completely unique ($\delta_a = 1$ for all edges), then $PS_k = 1$ and $\ln(PS_k) = 0$. The utility reduces to the standard MNL utility ($V_k = -C_k$).
- If a path shares edges with many other routes ($\delta_a > 1$), then $PS_k < 1$ and $\ln(PS_k) < 0$.
- The term $\beta_{PSL} \ln(PS_k)$ acts as a correction. Since $\ln(PS_k)$ is negative for overlapping routes, a **positive** $\beta_{PSL}$ (default 1) is required to ensure that overlapping paths have lower utility (penalization).

This formulation ensures that the model assigns lower probabilities to routes that are largely duplicates of others, distributing flow more realistically across truly distinct alternatives. In practice, a value of $\beta_{PSL} = 1$ is a sensible default; higher values strengthen the penalization of overlapping routes and lower values make the model behave more like a standard MNL.

### Network Processing Functions

**`linestrings_to_graph()`**: Converts LINESTRING geometries to a graph data frame format. This is typically the first step when working with spatial network data.

**`linestrings_from_graph()`**: Converts a graph data frame back to LINESTRING geometries. Useful for visualization and spatial operations.

**`create_undirected_graph()`**: Converts a directed graph to an undirected graph by normalizing edge directions and aggregating duplicate connections. Useful when transport links can be traversed in both directions.

**`consolidate_graph()`**: Simplifies network topology by removing intermediate nodes and merging connecting edges. This can significantly reduce computational complexity while preserving connectivity.

**`simplify_network()`**: Simplifies the network using one of two methods:

- **shortest-paths**: Keeps only edges traversed by shortest paths between specified nodes
- **cluster**: Spatially clusters nodes using the `leaderCluster` algorithm and contracts the graph

### Graph Utilities

**`nodes_from_graph()`**: Extracts unique nodes with coordinates from a graph. Essential for mapping zone centroids to network nodes.

**`normalize_graph()`**: Normalizes node IDs to consecutive integers starting from 1. This can improve performance with some graph algorithms.

**`distances_from_graph()`**: Computes a distance matrix for all node pairs using the graph structure.

### OD Matrix Utilities

**`melt_od_matrix()`**: Converts an origin-destination matrix to long format with columns `from`, `to`, and `flow`. This format is required by `run_assignment()`.

### Traffic Assignment

**`run_assignment()`**: The core function for traffic assignment. Key parameters:

- **`method`**: "PSL" (path-sized logit) or "AoN" (all-or-nothing). AoN is faster but doesn't account for route overlap.
- **`beta`** (default: 1): Path-sized logit parameter. A positive value penalizes overlapping routes.
- **`detour.max`** (default: 1.5): Maximum detour factor for alternative routes. Higher values consider more routes but increase computation time substantially.
- **`angle.max`** (default: 90): Maximum detour angle in degrees. Routes outside this angle from the straight line between origin and destination are not considered.
- **`cost.column`**: The cost column to use for route computation. Can be a column name or numeric vector.
- **`return.extra`**: Additional results to return, such as paths, edge counts, or distance matrices.

### Parameter Tuning Tips

- **`detour.max`**: Start with the default (1.5) and increase if you need to consider more alternative routes. Be aware that higher values can dramatically increase computation time.
- **`angle.max`**: Use this to restrict route enumeration to more direct paths. Smaller values (e.g., 45°) can significantly speed up computation for large networks.
- **`beta`**: The path-sized logit parameter. Typically positive (default 1) to penalize route overlap; values larger than 1 increase the penalty on overlapping routes, and values between 0 and 1 make the model closer to a standard MNL.

## Conclusion

This vignette has demonstrated the basic and advanced workflows for using the `flownet` package:

1. **Basic workflow**: Convert network to graph, map zones to nodes, create OD matrix, and run traffic assignment
2. **Trade flow disaggregation**: Distribute country-level trade to city-level flows using population shares
3. **Network consolidation**: Process raw segments through consolidation and simplification for efficient computation

The package provides efficient tools for transport modeling with the path-sized logit model, accounting for route overlap in traffic assignment. The network processing utilities allow you to prepare and simplify networks for efficient computation.

### Next Steps

- Explore the full function documentation: `?flownet-package`
- Experiment with different parameter values (`beta`, `detour.max`, `angle.max`)
- Try the PSL method instead of AoN for smaller networks or subsets
- Use `africa_trade` to explore trade patterns by different HS sections
- Experiment with `simplify_network()` cluster method for coarser network representations

For more information, see the package documentation and examples in the help files for individual functions.

## References

Ben-Akiva, M., & Bierlaire, M. (1999). Discrete choice methods and their
applications to short term travel decisions. In R. W. Hall (Ed.), *Handbook of Transportation Science* (pp. 5–33). Springer US. https://doi.org/10.1007/978-1-4615-5203-1_2

Cascetta, E. (2001). *Transportation systems engineering: Theory and methods*. Springer.

Ben-Akiva, M., & Lerman, S. R. (1985). *Discrete choice analysis: Theory and application to travel demand*. The MIT Press.

Ramming, M. S. (2002). *Network knowledge and route choice* (Doctoral dissertation). Massachusetts Institute of Technology.

Prato, C. G. (2009). Route choice modeling: Past, present and future research directions. *Journal of Choice Modelling, 2*(1), 65–100. https://doi.org/10.1016/S1755-5345(13)70005-8

AequilibiaE Python Documentation: https://www.aequilibrae.com/develop/python/route_choice/path_size_logit.html