Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 0 additions & 3 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -26,15 +26,12 @@ jobs:
- {os: macos-latest, r: 'release'}

- {os: windows-latest, r: 'release'}
# use 4.0 or 4.1 to check with rtools40's older compiler
- {os: windows-latest, r: 'oldrel-4'}

- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
- {os: ubuntu-latest, r: 'oldrel-2'}
- {os: ubuntu-latest, r: 'oldrel-3'}
- {os: ubuntu-latest, r: 'oldrel-4'}

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
Expand Down
6 changes: 5 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: fastRG
Title: Sample Generalized Random Dot Product Graphs in Linear Time
Version: 0.3.3.9000
Version: 0.3.3.9001
Authors@R: c(
person("Alex", "Hayes", , "alexpghayes@gmail.com", role = c("aut", "cre", "cph"),
comment = c(ORCID = "0000-0002-4985-5160")),
Expand Down Expand Up @@ -38,11 +38,15 @@ Imports:
tidyr
Suggests:
covr,
irlba,
knitr,
magrittr,
purrr,
rmarkdown,
scales,
testthat (>= 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
VignetteBuilder: knitr
30 changes: 27 additions & 3 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,32 @@
# fastRG (development version)

- Added option to specify precise number of nodes in each block of a `dcsbm()` or `sbm()` via the `block_sizes` argument. This makes it easier to construct blockmodels with exactly repeated eigenvalues.
- The default behavior of `dcsbm()`, `sbm()` and `planted_partition()` has changed: when `block_sizes` or `pi` is unspecified, the new default is to balance block sizes as evenly as possible. Previously, `pi` was set to a constant vector, balancing block sizes in expectation only.
- Specifying both `k` and `B` in `dcsbm()` and `sbm()` now results in an error; only specify one of these arguments.
## Major changes

- This release fixes a long-standing inconsistency in how degrees are counted, which resulted in sampling twice as many edges as desired in `undirected_factor_model()`. See below for details (#19). This also caused population singular values for undirected factors models to be off by a factor of 2 (#31).

## Breaking changes

- Specifying both `k` and `B` in `dcsbm()` and `sbm()` now results in an error. You should only specify one of these arguments.

- It is now possible to specify the precise number of nodes in each block of a `dcsbm()` (and subclasses `sbm()` and `planted_partition()`) via the `block_sizes` argument. This makes it easier to construct blockmodels with exactly repeated eigenvalues. Additionally, the default behavior is now to use this argument and to balance block sizes as evenly as possible. Previously, the default behavior was to sample blocks memberships with equal probability.

## Non-breaking changes

- Added `vignette("consistency")` demonstrating how to check consistency of spectral estimators using `fastRG` for sampling and population spectra computations (#33, #43)

## Details about degree over-sampling bug and the fix

The `fastRG` sampling algorithm, as implemented in `sample_edgelist.matrix()`, is fundamentally a sampler for asymmetric, directed networks with conditional expectation $\mathbb E[A \mid X, S, Y] = X S Y^\top \in \mathbb R^{n_1 \times n_2}$. That is, you can think of the sampler as a very efficient procedure for iterating through $i = 1, ..., n_1$ and $j = 1, ..., n_2$ and sampling from a Poisson with rate $(X S Y^\top)_{ij}$.

However, we would also like to use this same sampler to sample from undirected networks. In an undirected networks, the conditional expectation $\mathbb E[A \mid X, S] = X S X^\top \in \mathbb R^{n \times n}$ is a square matrix with $(X S X^\top)_{ij} = (X S X^\top)_{ji}$. To sample from this matrix, it's typical to sample the upper triangle of $A$ from a Poisson with rate $(X S X^\top)_{ij}$ for all $1 \le i \le j \le n$, and then symmetrize $A$.

Since the `fastRG` algorithm samples $A_{ij}$ for all $i, j$, not just the upper triangle of $A$, we use a trick to sample from undirected networks. First, we force the conditional expectation to the symmetric by symmetrizing $S$. Then, we still sample for all $i, j$. To set $A_{ij}$ we sample once from a Poisson with rate $(X S X^\top)_{ij}$ and once from a Poisson with rate $(X S X^\top)_{ji}$ (these rates are equal by symmetry!). Then we set $A_{ij} = A_{ji}$ to the sum of these Poisson random variables. The issue is that this doubles the expected value of $A_{ij} = A_{ji}$ and so we sample twice as many edges as we should. Up until this release of `fastRG`, we've unfortunately been doing this double sampling in undirected networks (#19).

In this release, we fix this over-sampling. The key is that we divide $S$ by two at sampling time. We do not modify $S$ at all in the `undirected_factor_model()`! You can always use $X S X^\top$ to compute the expected value of $A$. This new change *only affects sampling*.

That is, instead of passing the $S$ from an `undirected_factor_model()` to the sampler `sample_edgelist.matrix()`, we pass $S / 2$ (see `sample_edgelist.undirected_factor_model()`). This fixes double sampling on the off-diagonal of $A$. The downside is that we're now undersampling by half the diagonal of $A$. I'm assuming that for most use cases this doesn't matter. We could correct for this undersampling of the diagonal of $A$, so please open an issue if self-loops are important to your project.

As a consequence of this change, $A$ and $\mathbb E[A | X, S]$ show now be on the same scale, rather than off by a factor of 2. Importantly, the spectrums should match up now, so you can now use `fastRG` to check how closely you're recovering the spectrum of the your model. See `vignette("consistency")` for a quick demonstration showing consistency of spectral estimates.

# fastRG 0.3.3

Expand Down
69 changes: 43 additions & 26 deletions R/directed_dcsbm.R
Original file line number Diff line number Diff line change
@@ -1,14 +1,17 @@
new_directed_dcsbm <- function(
X, S, Y,
theta_out,
theta_in,
z_out,
z_in,
pi_out,
pi_in,
sorted,
...,
subclass = character()) {
X,
S,
Y,
theta_out,
theta_in,
z_out,
z_in,
pi_out,
pi_in,
sorted,
...,
subclass = character()
) {
subclass <- c(subclass, "directed_dcsbm")
dcsbm <- directed_factor_model(X, S, Y, ..., subclass = subclass)
dcsbm$theta_out <- theta_out
Expand Down Expand Up @@ -334,16 +337,20 @@ validate_directed_dcsbm <- function(x) {
#' population_svd <- svds(ddcsbm)
#'
directed_dcsbm <- function(
n = NULL,
theta_out = NULL, theta_in = NULL,
k_out = NULL, k_in = NULL, B = NULL,
...,
pi_out = rep(1 / k_out, k_out),
pi_in = rep(1 / k_in, k_in),
sort_nodes = TRUE,
force_identifiability = TRUE,
poisson_edges = TRUE,
allow_self_loops = TRUE) {
n = NULL,
theta_out = NULL,
theta_in = NULL,
k_out = NULL,
k_in = NULL,
B = NULL,
...,
pi_out = rep(1 / k_out, k_out),
pi_in = rep(1 / k_in, k_in),
sort_nodes = TRUE,
force_identifiability = TRUE,
poisson_edges = TRUE,
allow_self_loops = TRUE
) {
# NOTE:
# - X corresponds to outgoing communities, outgoing edges and rows of A
# - Y corresponds to incoming communities, incoming edges and columns of A
Expand Down Expand Up @@ -462,8 +469,6 @@ directed_dcsbm <- function(
z_in <- sort(z_in)
}



if (k_out > 1) {
X <- sparse.model.matrix(~ z_out + 0)
} else {
Expand Down Expand Up @@ -525,7 +530,10 @@ directed_dcsbm <- function(
#' @export
print.directed_dcsbm <- function(x, ...) {
cat(glue("Directed Degree-Corrected Stochastic Blockmodel\n", .trim = FALSE))
cat(glue("-----------------------------------------------\n\n", .trim = FALSE))
cat(glue(
"-----------------------------------------------\n\n",
.trim = FALSE
))

sorted <- if (x$sorted) "arranged by block" else "not arranged by block"

Expand All @@ -550,7 +558,16 @@ print.directed_dcsbm <- function(x, ...) {
cat("Allow self loops:", as.character(x$allow_self_loops), "\n\n")

cat(glue("Expected edges: {round(expected_edges(x))}\n", .trim = FALSE))
cat(glue("Expected in degree: {round(expected_out_degree(x), 1)}\n", .trim = FALSE))
cat(glue("Expected out degree: {round(expected_in_degree(x), 1)}\n", .trim = FALSE))
cat(glue("Expected density: {round(expected_density(x), 5)}\n", .trim = FALSE))
cat(glue(
"Expected in degree: {round(expected_out_degree(x), 1)}\n",
.trim = FALSE
))
cat(glue(
"Expected out degree: {round(expected_in_degree(x), 1)}\n",
.trim = FALSE
))
cat(glue(
"Expected density: {round(expected_density(x), 5)}\n",
.trim = FALSE
))
}
33 changes: 25 additions & 8 deletions R/directed_erdos_renyi.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,17 @@
new_directed_erdos_renyi <- function(X, S, Y, p, poisson_edges, allow_self_loops, ...) {
new_directed_erdos_renyi <- function(
X,
S,
Y,
p,
poisson_edges,
allow_self_loops,
...
) {
er <- directed_factor_model(
X, S, Y, ...,
X,
S,
Y,
...,
subclass = "directed_erdos_renyi",
poisson_edges = poisson_edges,
allow_self_loops = allow_self_loops
Expand Down Expand Up @@ -57,14 +68,18 @@ validate_directed_erdos_renyi <- function(x) {
#' A
#'
directed_erdos_renyi <- function(
n, ..., p = NULL,
poisson_edges = TRUE,
allow_self_loops = TRUE) {
n,
...,
p = NULL,
poisson_edges = TRUE,
allow_self_loops = TRUE
) {
X <- Matrix(1, nrow = n, ncol = 1)
Y <- Matrix(1, nrow = n, ncol = 1)

if (is.null(p) && is.null(expected_in_degree) &&
is.null(expected_out_degree)) {
if (
is.null(p) && is.null(expected_in_degree) && is.null(expected_out_degree)
) {
stop(
"Must specify either `p`, `expected_in_degree` or ",
" `expected_out_degree`.",
Expand All @@ -79,7 +94,9 @@ directed_erdos_renyi <- function(
S <- matrix(p, nrow = 1, ncol = 1)

er <- new_directed_erdos_renyi(
X, S, Y,
X,
S,
Y,
p = p,
poisson_edges = poisson_edges,
allow_self_loops = allow_self_loops,
Expand Down
34 changes: 21 additions & 13 deletions R/directed_factor_model.R
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
new_directed_factor_model <- function(
X, S, Y,
poisson_edges,
allow_self_loops,
...,
subclass = character()) {
X,
S,
Y,
poisson_edges,
allow_self_loops,
...,
subclass = character()
) {
rlang::check_dots_unnamed()

n <- nrow(X)
Expand Down Expand Up @@ -160,13 +163,16 @@ validate_directed_factor_model <- function(x) {
#' fm2
#'
directed_factor_model <- function(
X, S, Y,
...,
expected_in_degree = NULL,
expected_out_degree = NULL,
expected_density = NULL,
poisson_edges = TRUE,
allow_self_loops = TRUE) {
X,
S,
Y,
...,
expected_in_degree = NULL,
expected_out_degree = NULL,
expected_density = NULL,
poisson_edges = TRUE,
allow_self_loops = TRUE
) {
X <- Matrix(X)
S <- Matrix(S)
Y <- Matrix(Y)
Expand All @@ -186,7 +192,9 @@ directed_factor_model <- function(
}

fm <- new_directed_factor_model(
X, S, Y,
X,
S,
Y,
poisson_edges = poisson_edges,
allow_self_loops = allow_self_loops,
...
Expand Down
35 changes: 17 additions & 18 deletions R/expected-degrees.R
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,23 @@ expected_edges.undirected_factor_model <- function(factor_model, ...) {
sum(Cx %*% S %*% Cx)
}

#' @rdname expected_edges
#' @export
expected_degrees <- function(factor_model, ...) {
UseMethod("expected_degrees")
}

#' @export
expected_degrees.undirected_factor_model <- function(factor_model, ...) {
# rowSums of E[A|X, S] = XSX' are XSX'1 for 1 a column vector of ones
# want to avoid memory cost of instantiating all of E[A|X, S], which is
# typically large and dense
X <- factor_model$X
S <- factor_model$S
ones <- matrix(1, nrow = nrow(X))
as.numeric(X %*% (tcrossprod(S, X) %*% ones))
}

#' @rdname expected_edges
#' @export
expected_degree <- function(factor_model, ...) {
Expand Down Expand Up @@ -123,24 +140,6 @@ expected_degree.undirected_factor_model <- function(factor_model, ...) {
expected_edges(factor_model) / as.numeric(factor_model$n)
}


#' @rdname expected_edges
#' @export
expected_degrees <- function(factor_model, ...) {
UseMethod("expected_degrees")
}

#' @export
expected_degrees.undirected_factor_model <- function(factor_model, ...) {
# rowSums of E[A|X, S] = XSX' are XSX'1 for 1 a column vector of ones
# want to avoid memory cost of instantiating all of E[A|X, S], which is
# typically large and dense
X <- factor_model$X
S <- factor_model$S
ones <- matrix(1, nrow = nrow(X))
as.numeric(X %*% (tcrossprod(S, X) %*% ones))
}

#' @export
expected_density.undirected_factor_model <- function(factor_model, ...) {
expected_edges(factor_model) / as.numeric(choose(factor_model$n, 2))
Expand Down
37 changes: 21 additions & 16 deletions R/expected-spectra.R
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,13 @@ RSpectra::eigs_sym
#' @method eigs_sym undirected_factor_model
#' @export
eigs_sym.undirected_factor_model <- function(
A, k = A$k,
which = "LM", sigma = NULL,
opts = list(),
...) {
A,
k = A$k,
which = "LM",
sigma = NULL,
opts = list(),
...
) {
rlang::check_installed("RSpectra")

Ax <- function(x, args) as.numeric(args$X %*% (args$SXt %*% x))
Expand All @@ -37,12 +40,13 @@ eigs_sym.undirected_factor_model <- function(
#' @method svds undirected_factor_model
#' @export
svds.undirected_factor_model <- function(
A,
k = A$k,
nu = k,
nv = k,
opts = list(),
...) {
A,
k = A$k,
nu = k,
nv = k,
opts = list(),
...
) {
rlang::check_installed("RSpectra")

Ax <- function(x, args) {
Expand Down Expand Up @@ -76,12 +80,13 @@ svds.undirected_factor_model <- function(
#' @method svds directed_factor_model
#' @export
svds.directed_factor_model <- function(
A,
k = min(A$k1, A$k2),
nu = k,
nv = k,
opts = list(),
...) {
A,
k = min(A$k1, A$k2),
nu = k,
nv = k,
opts = list(),
...
) {
rlang::check_installed("RSpectra")

Ax <- function(x, args) {
Expand Down
Loading