In a graph \(G\), edges are either present (i.e. \(G_{ij}=1\)) or absent (i.e. \(G_{ij}=0\)). However in a weighted or valued graph, edges can take a range of values that may capture such properties as the strength or capacity of the edge. Although weighted graphs contain a large amount of information, there are some cases (e.g. visualization, application of statistical models not developed for weighted graphs) where it is useful to reduce this information by focusing on an unweighted subgraph that contains only the most important edges. We call this subgraph the backbone of \(G\), which we denote as \(G’\). Extracting \(G’\) from \(G\) requires deciding which edges to preserve. This usually involves selecting a threshold \(T_{ij}\) such that edges are preserved if they are above the threshold (i.e. \(G_{ij}’=1\) if \(G_{ij} > T_{ij}\)), and omitted if they are below the threshold (i.e. \(G_{ij}’=0\) if \(G_{ij} < T_{ij}\)). It is also possible to extract a signed backbone by selecting upper \(T^+_{ij}\) and lower \(T^-_{ij}\) thresholds such that \(G_{ij}’=1\) if \(G_{ij} > T^+_{ij}\), \(G_{ij}’=-1\) if \(G_{ij} < T^-_{ij}\), and \(G_{ij}’=0\) if \(G_{ij} > T^-_{ij}\) and \(G_{ij} < T^+_{ij}\). The key to all backbone extraction methods lies in the selection of \(T\). The backbone package provides several different methods for selecting \(T\) and thus extracting \(G’\) from \(G\).
We outline the use of the backbone package with Davis, Gardner, and Gardner’s Southern Women Dataset (Davis, Gardner, and Gardner 1941), which can be accessed via (Repository 2006). This data takes the form of a bipartite graph \(B\) containing 18 women (rows) and 14 social events (columns) taking place over a nine month period. In \(B\), \(B_{ij} = 1\) if women \(i\) attended event \(j\), and otherwise is 0. Let’s take a look at the Davis dataset included in this package to see that it is bipartite.
data(davis) #load the dataset
op <- options(width = 100)
davis #view the dataset
#> 6/27 3/2 4/12 9/26 2/25 5/19 3/15 9/16 4/8 6/10 2/23 4/7 11/21 8/3
#> EVELYN 1 1 1 1 1 1 0 1 1 0 0 0 0 0
#> LAURA 1 1 1 0 1 1 1 1 0 0 0 0 0 0
#> THERESA 0 1 1 1 1 1 1 1 1 0 0 0 0 0
#> BRENDA 1 0 1 1 1 1 1 1 0 0 0 0 0 0
#> CHARLOTTE 0 0 1 1 1 0 1 0 0 0 0 0 0 0
#> FRANCES 0 0 1 0 1 1 0 1 0 0 0 0 0 0
#> ELEANOR 0 0 0 0 1 1 1 1 0 0 0 0 0 0
#> PEARL 0 0 0 0 0 1 0 1 1 0 0 0 0 0
#> RUTH 0 0 0 0 1 0 1 1 1 0 0 0 0 0
#> VERNE 0 0 0 0 0 0 1 1 1 0 0 1 0 0
#> MYRNA 0 0 0 0 0 0 0 1 1 1 0 1 0 0
#> KATHERINE 0 0 0 0 0 0 0 1 1 1 0 1 1 1
#> SYLVIA 0 0 0 0 0 0 1 1 1 1 0 1 1 1
#> NORA 0 0 0 0 0 1 1 0 1 1 1 1 1 1
#> HELEN 0 0 0 0 0 0 1 1 0 1 1 1 0 0
#> DOROTHY 0 0 0 0 0 0 0 1 1 0 0 0 0 0
#> OLIVIA 0 0 0 0 0 0 0 0 1 0 1 0 0 0
#> FLORA 0 0 0 0 0 0 0 0 1 0 1 0 0 0
options(op)
We see that our two sets of vertices are women and events attended.
A weighted graph \(G\) can be constructed from \(B\) via bipartite projection, where \(G = BB^T\) and \(G_{ij}\) contains the number of events that both woman \(i\) and woman \(j\) attended. Looking at the matrix of southern women and events attended above, we see that Evelyn and Charlotte have attended three of the same events. This means that \(G_{15} = 3\) in the projection, shown below.
davis%*%t(davis) #The projected davis dataset
#> EVELYN LAURA THERESA BRENDA CHARLOTTE FRANCES ELEANOR PEARL RUTH
#> EVELYN 8 6 7 6 3 4 3 3 3
#> LAURA 6 7 6 6 3 4 4 2 3
#> THERESA 7 6 8 6 4 4 4 3 4
#> BRENDA 6 6 6 7 4 4 4 2 3
#> CHARLOTTE 3 3 4 4 4 2 2 0 2
#> FRANCES 4 4 4 4 2 4 3 2 2
#> ELEANOR 3 4 4 4 2 3 4 2 3
#> PEARL 3 2 3 2 0 2 2 3 2
#> RUTH 3 3 4 3 2 2 3 2 4
#> VERNE 2 2 3 2 1 1 2 2 3
#> MYRNA 2 1 2 1 0 1 1 2 2
#> KATHERINE 2 1 2 1 0 1 1 2 2
#> SYLVIA 2 2 3 2 1 1 2 2 3
#> NORA 2 2 3 2 1 1 2 2 2
#> HELEN 1 2 2 2 1 1 2 1 2
#> DOROTHY 2 1 2 1 0 1 1 2 2
#> OLIVIA 1 0 1 0 0 0 0 1 1
#> FLORA 1 0 1 0 0 0 0 1 1
#> VERNE MYRNA KATHERINE SYLVIA NORA HELEN DOROTHY OLIVIA FLORA
#> EVELYN 2 2 2 2 2 1 2 1 1
#> LAURA 2 1 1 2 2 2 1 0 0
#> THERESA 3 2 2 3 3 2 2 1 1
#> BRENDA 2 1 1 2 2 2 1 0 0
#> CHARLOTTE 1 0 0 1 1 1 0 0 0
#> FRANCES 1 1 1 1 1 1 1 0 0
#> ELEANOR 2 1 1 2 2 2 1 0 0
#> PEARL 2 2 2 2 2 1 2 1 1
#> RUTH 3 2 2 3 2 2 2 1 1
#> VERNE 4 3 3 4 3 3 2 1 1
#> MYRNA 3 4 4 4 3 3 2 1 1
#> KATHERINE 3 4 6 6 5 3 2 1 1
#> SYLVIA 4 4 6 7 6 4 2 1 1
#> NORA 3 3 5 6 8 4 1 2 2
#> HELEN 3 3 3 4 4 5 1 1 1
#> DOROTHY 2 2 2 2 1 1 2 1 1
#> OLIVIA 1 1 1 1 2 1 1 2 2
#> FLORA 1 1 1 1 2 1 1 2 2
In this vignette, we demonstrate using the backbone package to extract the backbone of \(G\), which involves deciding whether to preserve an edge between Evelyn and Charlotte in \(G’\), and similarly for all other edges in \(G\).
In this section, we will describe backbone methods that can be applied to any weighted graph, whether the weights are present in a natively unipartite graph, or are the product of a bipartite projection (as is the case in our example data).
The simplest approach to backbone extraction applies a single threshold \(T\) to all edges, and is achieved using the universal()
function. The universal()
function allows the user to extract a binary backbone by selecting a single threshold \(T\), or extract a signed backbone by selecting upper and lower thresholds \(T^+\) and \(T^-\).
The universal( )
function has four parameters,
The function universal()
returns a list containing the backbone matrix, a signed (or binary) adjacency matrix of a graph, and a data frame $summary
, containing the model name (universal threshold), number of rows in M, skew of row sums of M, number of columns of M, skew of column sums of M, and running time. The universal()
function can be used in a variety of different ways, demonstrated in the following examples. Using the davis
dataset, if we input the projected matrix G <- davis%*%t(davis)
, we can use the universal threshold on the weighted matrix G
. If we set an upper threshold of 0, then if two women have attended any event together (co-attendance > 0), there will be an edge between the two. We can plot this graph with the igraph
package.
G <- davis%*%t(davis) #projected davis dataset, a weighted graph
universal_bb <- universal(G, upper = 0)
universal_bb$backbone
#> EVELYN LAURA THERESA BRENDA CHARLOTTE FRANCES ELEANOR PEARL RUTH
#> EVELYN 0 1 1 1 1 1 1 1 1
#> LAURA 1 0 1 1 1 1 1 1 1
#> THERESA 1 1 0 1 1 1 1 1 1
#> BRENDA 1 1 1 0 1 1 1 1 1
#> CHARLOTTE 1 1 1 1 0 1 1 0 1
#> FRANCES 1 1 1 1 1 0 1 1 1
#> ELEANOR 1 1 1 1 1 1 0 1 1
#> PEARL 1 1 1 1 0 1 1 0 1
#> RUTH 1 1 1 1 1 1 1 1 0
#> VERNE 1 1 1 1 1 1 1 1 1
#> MYRNA 1 1 1 1 0 1 1 1 1
#> KATHERINE 1 1 1 1 0 1 1 1 1
#> SYLVIA 1 1 1 1 1 1 1 1 1
#> NORA 1 1 1 1 1 1 1 1 1
#> HELEN 1 1 1 1 1 1 1 1 1
#> DOROTHY 1 1 1 1 0 1 1 1 1
#> OLIVIA 1 0 1 0 0 0 0 1 1
#> FLORA 1 0 1 0 0 0 0 1 1
#> VERNE MYRNA KATHERINE SYLVIA NORA HELEN DOROTHY OLIVIA FLORA
#> EVELYN 1 1 1 1 1 1 1 1 1
#> LAURA 1 1 1 1 1 1 1 0 0
#> THERESA 1 1 1 1 1 1 1 1 1
#> BRENDA 1 1 1 1 1 1 1 0 0
#> CHARLOTTE 1 0 0 1 1 1 0 0 0
#> FRANCES 1 1 1 1 1 1 1 0 0
#> ELEANOR 1 1 1 1 1 1 1 0 0
#> PEARL 1 1 1 1 1 1 1 1 1
#> RUTH 1 1 1 1 1 1 1 1 1
#> VERNE 0 1 1 1 1 1 1 1 1
#> MYRNA 1 0 1 1 1 1 1 1 1
#> KATHERINE 1 1 0 1 1 1 1 1 1
#> SYLVIA 1 1 1 0 1 1 1 1 1
#> NORA 1 1 1 1 0 1 1 1 1
#> HELEN 1 1 1 1 1 0 1 1 1
#> DOROTHY 1 1 1 1 1 1 0 1 1
#> OLIVIA 1 1 1 1 1 1 1 0 1
#> FLORA 1 1 1 1 1 1 1 1 0
universal_bb$summary
#> Model Summary
#> Model Universal Threshold
#> Number of Rows 18
#> Skew of Row Sums -0.22725
#> Number of Columns 18
#> Skew of Column Sums -0.22725
#> Running Time 0
graph <- igraph::graph_from_adjacency_matrix(universal_bb$backbone, mode = "undirected")
op <- par(mar=c(0,0,0,0))
lo <- igraph::layout_(graph, igraph::with_fr())
plot(graph, vertex.label = 1:18, layout = lo)
We can also use the universal()
function on the original bipartite data. When inputting bipartite data, we set parameter bipartite = TRUE
. The bipartite matrix will be multiplied by its transpose before the threshold is applied. Below, we input the bipartite matrix davis
with the same threshold values as before, returning the same backbone matrix.
universal_bb <- universal(davis, upper = 0, bipartite = TRUE)
universal_bb$summary
#> Model Summary
#> Model Universal Threshold
#> Number of Rows 18
#> Skew of Row Sums 0.13747
#> Number of Columns 14
#> Skew of Column Sums 0.77915
#> Running Time 0
graph <- igraph::graph_from_adjacency_matrix(universal_bb$backbone, mode = "undirected")
op <- par(mar=c(0,0,0,0))
plot(graph, vertex.label = 1:18, layout = lo)
To create a signed backbone, we can apply both an upper and lower threshold value. For instance, we could choose to retain a positive edge if the women attended more than 4 events together, and a negative edge if they attended less than 2 events together (co-attendance of 0 or 1 events). We can do this with the following code. Note that the returned backbone matrix now has both \(+1\) and \(-1\) values.
universal_bb <- universal(davis, upper = 4, lower = 2, bipartite = TRUE)
universal_bb$backbone
#> EVELYN LAURA THERESA BRENDA CHARLOTTE FRANCES ELEANOR PEARL RUTH
#> EVELYN 0 1 1 1 0 0 0 0 0
#> LAURA 1 0 1 1 0 0 0 0 0
#> THERESA 1 1 0 1 0 0 0 0 0
#> BRENDA 1 1 1 0 0 0 0 0 0
#> CHARLOTTE 0 0 0 0 0 0 0 -1 0
#> FRANCES 0 0 0 0 0 0 0 0 0
#> ELEANOR 0 0 0 0 0 0 0 0 0
#> PEARL 0 0 0 0 -1 0 0 0 0
#> RUTH 0 0 0 0 0 0 0 0 0
#> VERNE 0 0 0 0 -1 -1 0 0 0
#> MYRNA 0 -1 0 -1 -1 -1 -1 0 0
#> KATHERINE 0 -1 0 -1 -1 -1 -1 0 0
#> SYLVIA 0 0 0 0 -1 -1 0 0 0
#> NORA 0 0 0 0 -1 -1 0 0 0
#> HELEN -1 0 0 0 -1 -1 0 -1 0
#> DOROTHY 0 -1 0 -1 -1 -1 -1 0 0
#> OLIVIA -1 -1 -1 -1 -1 -1 -1 -1 -1
#> FLORA -1 -1 -1 -1 -1 -1 -1 -1 -1
#> VERNE MYRNA KATHERINE SYLVIA NORA HELEN DOROTHY OLIVIA FLORA
#> EVELYN 0 0 0 0 0 -1 0 -1 -1
#> LAURA 0 -1 -1 0 0 0 -1 -1 -1
#> THERESA 0 0 0 0 0 0 0 -1 -1
#> BRENDA 0 -1 -1 0 0 0 -1 -1 -1
#> CHARLOTTE -1 -1 -1 -1 -1 -1 -1 -1 -1
#> FRANCES -1 -1 -1 -1 -1 -1 -1 -1 -1
#> ELEANOR 0 -1 -1 0 0 0 -1 -1 -1
#> PEARL 0 0 0 0 0 -1 0 -1 -1
#> RUTH 0 0 0 0 0 0 0 -1 -1
#> VERNE 0 0 0 0 0 0 0 -1 -1
#> MYRNA 0 0 0 0 0 0 0 -1 -1
#> KATHERINE 0 0 0 1 1 0 0 -1 -1
#> SYLVIA 0 0 1 0 1 0 0 -1 -1
#> NORA 0 0 1 1 0 0 -1 0 0
#> HELEN 0 0 0 0 0 0 -1 -1 -1
#> DOROTHY 0 0 0 0 -1 -1 0 -1 -1
#> OLIVIA -1 -1 -1 -1 0 -1 -1 0 0
#> FLORA -1 -1 -1 -1 0 -1 -1 0 0
We can also choose a threshold that is a multiple of some function, such as mean, max, or min. The function is applied to the edge weights, and then multiplied by the upper and lower thresholds. Any \(G_{ij}\) values above the upper threshold are counted as a positive \(+1\) value in the backbone, and any below the lower threshold are counted as a negative \(-1\) value in the backbone. The following code will return a backbone where the positive edges indicate two women attended more than 1 standard deviation above the mean number of events and negative edges indicate two women attended less than 1 standard deviation below the mean number of events.
universal_bb <- universal(davis,
upper = function(x)mean(x)+sd(x),
lower=function(x)mean(x)-sd(x),
bipartite = TRUE)
Here, the davis
matrix has first been projected. Then, the standard deviation of the \(G_{ij}\) entries is calculated and added to (or subtracted from) to the mean of the \(G_{ij}\) values. This value is then used to threshold the projected matrix for the positive (or negative) entries.
The methods described above can be applied to any weighted graph \(G\). In this section we describe methods that are designed for weighted graphs that are the result of bipartite projections. They differ from other methods because they take into account the information contained in the original bipartite graph \(B\). Specifically, these methods are conditioned on the bipartite graph’s two degree sequences: the row vertex degrees (i.e. row marginals) and column vertex degrees (i.e. column marginals). Each method follows the same basic algorithm:
The backbone package implements three ways to perform steps 1-3, counting the proportion of times \(G^*_{ij}\) was larger or smaller than \(G_{ij}\): the hypergeometric distribution using hyperg()
, the fixed degree sequence model using fdsm()
, and the stochastic degree sequence model using sdsm()
. From these counts, the backbone can then be extracted for a given \(\alpha\) level using the backbone.extract()
function. In this section, we first describe backbone.extract()
, then illustrate its use in the context of hyperg(), fdsm(),
and sdsm()
.
The hyperg(), fdsm(),
and sdsm()
functions return two matrices: a positive
matrix containing the number of times \(G^*_{ij}\) was larger than \(G_{ij}\), and a negative
matrix containing the number of times \(G^*_{ij}\) was smaller than \(G_{ij}\). The backbone.extract()
function allows the user to take these positive and negative matrices and return a binary or signed backbone.
The backbone.extract()
function has three parameters: two matrices, positive
and negative
, and a significant test value alpha
. The matrices should be the number of times the projected matrix values \(P_{ij}\) were above (in the positive matrix) or below (in the negative matrix) the corresponding entry in one of the matrices generated by hyperg(), fdsm()
, or sdsm()
, divided by the total number of matrices generated.
One can adjust the precision of the significance test, alpha
, to refine their backbone results. The value of alpha
should be between 0
and 1
. The default is alpha=0.05
. If only the positive
matrix is supplied to the function (i.e. negative
= NULL, as is the default), then the alpha
value is equal to the user’s input, and the statistical test is one-tailed yielding a binary backbone. If the negative
matrix is also supplied to the function, the alpha
value is equal to the user’s input divided by two, and the statistical test is two-tailed yielding a signed backbone.
If an entry in the positive
matrix is greater than the alpha
value, it is considered a +1
edge in the backbone. If an entry in the negative
matrix is greater than the alpha
value, it is considered a -1
edge in the backbone. All other values are 0
in the backbone matrix.
We demonstrate this function’s use in the following sections.
The hypergeometric distribution compares an edge’s observed weight, \(G_{ij}\) to the distribution of weights expected in a projection obtained from a random bipartite network where the row vertex degrees are fixed, but the column vertex degrees are allowed to vary. This method of backbone extraction was developed in (Neal 2013), which showed that the distribution of \(G^*_{ij}\) when only vertex degrees are fixed is given by the hypergeometric distribution and does not require simulation using steps 1-3 shown above. For documentation on the hypergeometric distribution, see stats::phyper
.
The hyperg()
function has one parameter,
The hyperg()
function returns a list of the following:
Following the hyperg()
function, the user must use the backbone.extract()
function to find the backbone at a given significance value alpha
.
The fixed degree sequence model compares an edge’s observed weight, \(G_{ij}\), to the distribution of weights expected in a projection obtained from a random bipartite network where both the row vertex degrees and column vertex degrees are fixed. This method of backbone extraction was developed in (Zweig and Kaufmann 2011), however the challenge lies in randomly sampling from the space of \(B^*\) with fixed degree sequences. The fdsm()
function uses the curveball algorithm (Strona et al. 2014), which is proven to do so (Carstens 2015).
The fdsm( )
function has five parameters,
utils::txtProgressBar
should be used to measure progress. Default is FALSE.The fdsm()
function returns a list of the following:
We can find the backbone using the fixed degree sequence model as follows:
fdsm_props <- fdsm(davis, trials = 100, sparse = TRUE, dyad=c(1,5))
#> Finding the Backbone using Curveball FDSM
#> Estimated time to complete is 3.2 secs
fdsm_props$dyad_values
#> [1] 4 3 3 2 3 2 3 3 3 3 2 2 3 4 4 2 3 2 2 2 2 3 3 3 3 2 4 3 2 2 4 3 2 4 4
#> [36] 2 3 3 4 3 2 1 4 2 2 2 4 3 3 3 3 3 3 3 3 3 3 1 3 2 3 2 3 3 2 2 2 2 1 4
#> [71] 1 3 3 3 2 3 2 4 3 3 3 2 3 2 3 4 4 3 4 1 1 4 2 2 3 3 3 2 2 2
fdsm_bb <- backbone.extract(fdsm_props$positive, fdsm_props$negative, alpha = 0.1)
fdsm_bb
#> EVELYN LAURA THERESA BRENDA CHARLOTTE FRANCES ELEANOR PEARL RUTH
#> EVELYN 0 0 1 0 0 0 0 0 0
#> LAURA 0 0 0 1 0 0 0 0 0
#> THERESA 1 0 0 0 0 0 0 0 0
#> BRENDA 0 1 0 0 0 0 0 0 0
#> CHARLOTTE 0 0 0 0 0 0 0 0 0
#> FRANCES 0 0 0 0 0 0 0 0 0
#> ELEANOR 0 0 0 0 0 0 0 0 0
#> PEARL 0 0 0 0 0 0 0 0 0
#> RUTH 0 0 0 0 0 0 0 0 0
#> VERNE 0 0 0 0 0 0 0 0 0
#> MYRNA 0 0 0 0 0 0 0 0 0
#> KATHERINE -1 -1 -1 -1 -1 0 0 0 0
#> SYLVIA -1 -1 0 -1 0 0 0 0 0
#> NORA -1 -1 0 -1 0 0 0 0 0
#> HELEN -1 0 0 0 0 0 0 0 0
#> DOROTHY 0 0 0 0 0 0 0 0 0
#> OLIVIA 0 0 0 0 0 0 0 0 0
#> FLORA 0 0 0 0 0 0 0 0 0
#> VERNE MYRNA KATHERINE SYLVIA NORA HELEN DOROTHY OLIVIA FLORA
#> EVELYN 0 0 -1 -1 -1 -1 0 0 0
#> LAURA 0 0 -1 -1 -1 0 0 0 0
#> THERESA 0 0 -1 0 0 0 0 0 0
#> BRENDA 0 0 -1 -1 -1 0 0 0 0
#> CHARLOTTE 0 0 -1 0 0 0 0 0 0
#> FRANCES 0 0 0 0 0 0 0 0 0
#> ELEANOR 0 0 0 0 0 0 0 0 0
#> PEARL 0 0 0 0 0 0 0 0 0
#> RUTH 0 0 0 0 0 0 0 0 0
#> VERNE 0 0 0 0 0 0 0 0 0
#> MYRNA 0 0 0 0 0 0 0 0 0
#> KATHERINE 0 0 0 1 0 0 0 0 0
#> SYLVIA 0 0 1 0 0 0 0 0 0
#> NORA 0 0 0 0 0 0 0 0 0
#> HELEN 0 0 0 0 0 0 0 0 0
#> DOROTHY 0 0 0 0 0 0 0 0 0
#> OLIVIA 0 0 0 0 0 0 0 0 0
#> FLORA 0 0 0 0 0 0 0 0 0
The fdsm_props$dyad_values
output is a list of the \(G_{1,5}^*\) values for each of the 100 trials, which in these data corresponds to the number of parties Evelyn and Charlotte would be expected to simultaneously attend if: (a) the number of parties attended by Evelyn was fixed, (b) the number of parties attended by Charlotte was fixed, and (c) the number of attendees at each party was fixed. Because we have provided both a positive
and negative
matrix, backbone.extract()
returns a signed backbone matrix by conducting a two-tailed significance test in which alpha
is \(0.05\) on each end of the distribution.
The stochastic degree sequence model compares an edge’s observed weight, \(G_{ij}\) to the distribution of weights expected in a projection obtained from a random bipartite network where both the row vertex degrees and column vertex degrees are approximately fixed. This method of backbone extraction was developed in (Neal 2014). The construction of \(B^*\) involves a series of steps:
The sdsm( )
function has nine parameters,
model
parameter can take in a link
function, as described by the stats
package under stats::glm
and stats::family
. This can be one of c('logit', 'probit', 'cauchit', 'log', 'cloglog')
. Default is ‘logit’.trials
> 0.utils::txtProgressBar
should be used to measure progress. Default is FALSE.If trials
> 0, the function uses repeat Bernoulli trials to compute the proportions, using the following steps: During each iteration, sdsm computes a new B* matrix using probabilities computed in the glm
. This is a random bipartite matrix with about the same row and column sums as the original matrix B. If the dyad_parameter is indicated to be used in the parameters, when the B* matrix is projected, the projected value for the corresponding row and column will be saved. This allows the user to see the distribution of the edge weights for desired row and column.
If trials
= 0, the proportion of edges above or below the observed values are computed using the Poisson Binomial distribution. These values are approximated using either a Discrete Fourier Transform (DFT method) or a Refined Normal Approximation (RNA method). These functions are described by . The RNA method is used by default, unless the computed value is within the margin of ‘alpha’-‘tolerance’ and ‘alpha’+‘tolerance’, the DFT method is used.
The sdsm()
function returns a list of the following:
Below are three different ways to compute the backbone using the stochastic degree sequence model. The first computes each proportion using the RNA Poisson Binomial method. The second computes each proportion using the DFT Poisson Binomial method. The third computes each proportion using repeat Bernoulli trials.
sdsm_rna <- sdsm(davis, trials = 0, tolerance = 0)
#> Finding the Backbone using Poisson Binomial SDSM with alpha = 0.05 and tolerance = 0
sdsm_dft <- sdsm(davis, trials = 0, tolerance = 1)
#> Finding the Backbone using Poisson Binomial SDSM with alpha = 0.05 and tolerance = 1
sdsm_bt <- sdsm(davis, trials = 100, dyad = c("EVELYN", "CHARLOTTE"))
#> Finding the Backbone using logit SDSM
#> Estimated time to complete is 0.7 secs
sdsm_bt$dyad_values
#> [1] 3 3 1 4 2 2 3 3 3 1 1 1 5 3 4 1 1 2 3 4 2 2 3 3 4 2 3 5 3 0 4 3 2 3 3
#> [36] 6 3 1 3 2 3 2 3 3 4 2 1 1 3 3 2 3 3 4 2 3 5 2 1 2 1 2 1 2 2 5 3 3 3 2
#> [71] 1 3 1 5 6 2 4 2 3 2 2 4 4 1 2 1 2 2 1 4 2 3 2 1 3 2 3 2 1 3
sdsm_bb <- backbone.extract(sdsm_bt$positive, alpha = 0.1)
sdsm_bb
#> EVELYN LAURA THERESA BRENDA CHARLOTTE FRANCES ELEANOR PEARL RUTH
#> EVELYN 0 0 0 0 0 0 0 0 0
#> LAURA 0 0 0 0 0 0 0 0 0
#> THERESA 0 0 0 0 0 0 0 0 0
#> BRENDA 0 0 0 0 0 0 0 0 0
#> CHARLOTTE 0 0 0 0 0 0 0 0 0
#> FRANCES 0 0 0 0 0 0 0 0 0
#> ELEANOR 0 0 0 0 0 0 0 0 0
#> PEARL 0 0 0 0 0 0 0 0 0
#> RUTH 0 0 0 0 0 0 0 0 0
#> VERNE 0 0 0 0 0 0 0 0 0
#> MYRNA 0 0 0 0 0 0 0 0 0
#> KATHERINE 0 0 0 0 0 0 0 0 0
#> SYLVIA 0 0 0 0 0 0 0 0 0
#> NORA 0 0 0 0 0 0 0 0 0
#> HELEN 0 0 0 0 0 0 0 0 0
#> DOROTHY 0 0 0 0 0 0 0 0 0
#> OLIVIA 0 0 0 0 0 0 0 0 0
#> FLORA 0 0 0 0 0 0 0 0 0
#> VERNE MYRNA KATHERINE SYLVIA NORA HELEN DOROTHY OLIVIA FLORA
#> EVELYN 0 0 0 0 0 0 0 0 0
#> LAURA 0 0 0 0 0 0 0 0 0
#> THERESA 0 0 0 0 0 0 0 0 0
#> BRENDA 0 0 0 0 0 0 0 0 0
#> CHARLOTTE 0 0 0 0 0 0 0 0 0
#> FRANCES 0 0 0 0 0 0 0 0 0
#> ELEANOR 0 0 0 0 0 0 0 0 0
#> PEARL 0 0 0 0 0 0 0 0 0
#> RUTH 0 0 0 0 0 0 0 0 0
#> VERNE 0 0 0 0 0 0 0 0 0
#> MYRNA 0 0 0 0 0 0 0 0 0
#> KATHERINE 0 0 0 1 0 0 0 0 0
#> SYLVIA 0 0 1 0 0 0 0 0 0
#> NORA 0 0 0 0 0 0 0 0 0
#> HELEN 0 0 0 0 0 0 0 0 0
#> DOROTHY 0 0 0 0 0 0 0 0 0
#> OLIVIA 0 0 0 0 0 0 0 0 0
#> FLORA 0 0 0 0 0 0 0 0 0
The sdsm_bt$dyad_values
output is a list of the \(G_{Evelyn,Charlotte}^*\) values for each of the 100 trials, which in these data corresponds to the number of parties Evelyn and Charlotte would be expected to simultaneously attend if: (a) the number of parties attended by Evelyn was approximately fixed, (b) the number of parties attended by Charlotte was approximately fixed, and (c) the number of attendees at each party was approximately fixed. Because we have provided only a positive
matrix, backbone.extract() returns a binary backbone matrix by conducting a one-tailed significance test in which alpha
is \(0.1\).
The backbone
package will be updated to contain additional backbone extraction methods that are used in the current literature.
Carstens, C. J. 2015. “Proof of Uniform Sampling of Binary Matrices with Fixed Row Sums and Column Sums for the Fast Curveball Algorithm.” Physical Review E 91 (4). https://doi.org/10.1103/PhysRevE.91.042812.
Davis, Allison, Burleigh B Gardner, and Mary R Gardner. 1941. Deep South: A Social Anthropological Study of Caste and Class. University of Chicago Press. https://doi.org/10.1177/0002716242220001105.
Neal, Zachary. 2013. “Identifying Statistically Significant Edges in One-Mode Projections.” Social Network Analysis and Mining 3 (4): 915–24. https://doi.org/10.1007/s13278-013-0107-y.
———. 2014. “The Backbone of Bipartite Projections: Inferring Relationships from Co-Authorship, Co-Sponsorship, Co-Attendance and Other Co-Behaviors.” Social Networks 39 (October): 84–97. https://doi.org/10.1016/j.socnet.2014.06.001.
Repository, UCI Network Data. 2006. “Southern Women Data Set.” https://networkdata.ics.uci.edu/netdata/html/davis.html.
Strona, Giovanni, Domenico Nappo, Francesco Boccacci, Simone Fattorini, and Jesus San-Miguel-Ayanz. 2014. “A Fast and Unbiased Procedure to Randomize Ecological Binary Matrices with Fixed Row and Column Totals.” Nature Communications 5 (June): 4114. https://doi.org/10.1038/ncomms5114.
Strona, Giovanni, Werner Ulrich, and Nicholas J. Gotelli. 2018. “Bi-Dimensional Null Model Analysis of Presence-Absence Binary Matrices.” Ecology 99 (1): 103–15. https://doi.org/10.1002/ecy.2043.
Zweig, Katharina Anna, and Michael Kaufmann. 2011. “A Systematic Approach to the One-Mode Projection of Bipartite Graphs.” Social Network Analysis and Mining 1 (3): 187–218. https://doi.org/10.1007/s13278-011-0021-0.