Help for package sand

Version:

2.0.0

Title:

Statistical Analysis of Network Data with R, 2nd Edition

Author:

Eric Kolaczyk [aut, cre], Gábor Csárdi [aut], Carolyn Kolaczyk [ctb]

Maintainer:

Eric Kolaczyk <eric.kolaczyk@gmail.com>

Depends:

R (≥ 3.5.0), igraph, igraphdata

Imports:

utils

Suggests:

GO.db, GOstats, ROCR, ape, blockmodels, car, eigenmodel, ergm, fdrtool, ggplot2, huge, kernlab, lattice, network, networkDynamic, networkTomography, ngspatial, org.Sc.sgd.db, sna, vioplot

Description:

Data sets and code blocks for the book 'Statistical Analysis of Network Data with R, 2nd Edition'.

License:

GPL-3

URL:

https://github.com/kolaczyk/sand

BugReports:

https://github.com/kolaczyk/sand/issues

LazyData:

true

Encoding:

UTF-8

NeedsCompilation:

Packaged:

2020-07-01 14:31:46 UTC; erick

Repository:

CRAN

Date/Publication:

2020-07-02 07:20:06 UTC

E. coli gene expression levels

Description

Gene expression levels in the bacteria Escherichia coli (E. coli), measured for 153 genes under each of 40 different experimental conditions.

Usage

data(Ecoli.data)
Ecoli.expr
regDB.adj

Format

Ecoli.expr is a 40 by 153 matrix of (log) gene expression levels in the bacteria Escherichia coli (E. coli), measured for 153 transcription factors under each of 40 different experimental conditions, averaged over three replicates of each experiment. The data are a subset of those published in the reference below. The experiments were genetic perturbation experiments,in which a given gene was ‘turned off’, for each of 40 different genes.

regDB.adj is an adjacency matrix of regulatory relationships in E. coli, extracted from the RegulonDB (http://regulondb.ccg.unam.mx/) database at the same time the experimental data were collected.

Source

See the reference below. Please cite it if you use this dataset in your work.

References

J. Faith, B. Hayete, J. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel, S. Kasif, J. Collins, T. Gardner: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5(1), e8 (2007).

AIDS blog citation network

Description

A snapshot of the pattern of citation among 146 unique blogs related to AIDS, patients, and their support networks, collected by Suchi Gopal (see reference below) over a randomly selected three-day period in August 2005. A directed edge from one blog to another indicates that the former has a link to the latter in their web page (more specifically, the former refers to the latter in their so-called ‘blogroll’).

Usage

aidsblog

Format

A directed igraph graph object with 146 vertices and 187 edges.

Source

This dataset was provided to us by Suchi Gopal. Please cite the reference below if you use this dataset in your work.

References

S. Gopal, The evolving social geography of blogs. In Societies and Cities in the Age of Instant Access, ed. by H. Miller (Springer, Berlin, 2007), 139 pp. 275-294.

Austrian phone call network data

Description

A set of data for phone traffic 60 between 32 telecommunication districts in Austria throughout a period during the 61 year 1991.

Usage

calldata

Format

A data frame with 32 x 31 flow measurements, 992 rows, and seven columns:

Orig: factor, the origin district.
Dest: factor, the destination district.
DistEuc: numeric, Euclidean distance between the districts.
DistRd: numeric, road distance between districts.
O.GRP: numeric, gross regional product of the origin district, in Austrian schillings.
D.GRP: numeric, gross regional product of the destination district, in Austrian schillings.
Flow: the “amount” of phone calls from the origin district to the destination district, in erlang units (number of phone calls, including faxes, times the average length of the call divided by the duration of the measurement period).

Source

This dataset was provided to us by Suchi Gopal. Please cite the reference below if you use this dataset in your work.

References

M. Fischer, S. Gopal: Artificial neural networks: a new approach to modeling interregional telecommunication flows. J. Reg. Sci. 34(4), 503-527 (1994).

Network of French political blogs

Description

Subnetwork of French political blogs, extracted from a snapshot of over 1,100 such blogs on a single day in October of 2006 and classified by the “Observatoire Presidentielle” project as to political affiliation.

Usage

fblog

Format

An undirected igraph graph with 192 vertices and 1431 edges. Note that the graph is undirected. The graph has two vertex attributes, ‘name’ is the URL of the blog, and ‘PolParty’ is the assigned political affiliation, a political party.

Source

The mixer R package.

A toy bipartite network

Description

A toy bipartite network.

Usage

g.bip

Format

An undirected bipartite igraph graph object, with vertex attributes ‘name’ and ‘type’.

Hospital encounter network data

Description

Records of contacts among patients and various types of health care workers in the geriatric unit of a hospital in Lyon, France, in 2010, from 1pm on Monday, December 6 to 2pm on Friday, December 10. Each of the 75 people in this study consented to wear RFID sensors on small identification badges during this period, which made it possible to record when any two of them were in face-to-face contact with each other (i.e., within 1-1.5 m of each other) during a 20-second interval of time.

Usage

hc

Format

A data frame, where each row is an interaction. It has five columns:

Time: integer, time in seconds when the 20 second encounter terminated.
ID1: integer, numeric ID of the first person.
ID2: integer, numeric ID of the second person.
S1: factor, the status of the first person, see below.
S2: factor, the status of the second person, see below.

Status codes: administrative staff (ADM), medical doctor (MED), paramedical staff, such as nurses or nurses' aides (NUR), and patients (PAT).

Source

See the reference below. Please cite the it if you use this dataset in your work.

References

P. Vanhems, A. Barrat, C. Cattuto, J.-F. Pinton, N. Khanafer, C. Regis, B.-a. Kim, B. Comte, N. Voirin: Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PloS One 8(9), e73970 306 (2013).

Install all packages used in the book

Description

This function makes it easy to download and install all R packages that are used in the book ‘Statistical Analysis of Network Data with R, 2nd Edition’.

Usage

install_sand_packages()

Details

The function uses the BioConductor installer, as this can install both all required BioConductor and CRAN packages.

Value

Returns the names of the installed packages, invisibly.

Author(s)

Gabor Csardi csardi.gabor@gmail.com

Lazega lawyers network data

Description

This data set comes from a network study of corporate law partnership that was carried out in a Northeastern US corporate law firm, referred to as SG&R, 1988-1991 in New England. It includes (among others) measurements of networks among the 71 attorneys (partners and associates) of this firm, i.e. their strong-coworker network, advice network, friendship network, and indirect control networks. Various members' attributes are also part of the dataset, including seniority, formal status, office in which they work, gender, lawschool attended, individual performance measurements (hours worked, fees brought in), attitudes concerning various management policy options, etc.

Note that this is only a subset of the originally collected data, including the 36 partners of the firm.

Usage

lazega
elist.lazega
v.attr.lazega

Format

lazega is an igraph graph object, undirected. It has the following vertex attributes: ‘name’, ‘Seniority’, ‘Status’ (all 1, meaning partner), ‘Gender’ (1 is man, 2 is woman), ‘Office’ (1 is Boston, 2 is Hartford, 3 is Providence), ‘Years’ (years with the firm), ‘Age’, ‘Practice’ (1 is litigation, 2 is corporate), and ‘School’ (1 is Harvard or Yale, 2 is University of Connecticut, 3 is other). See the reference below for more.

elist.lazega is a data frame containing an edge list of the network.

v.attr.lazega is a data frame containing the vertex attributes only.

Source

Provided to us by Emmanuel Lazega. Please cite the reference below if you use this dataset in your work.

References

E. Lazega, The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. Oxford University Press, Oxford (2001).

Yeast protein interaction network

Description

A network of 241 interactions among 134 proteins. They were assembled by Jiang et al. (see below), from various sources, and pertain to only those proteins annotated, as of January 2007, with the term “cell communication” in the gene ontology (GO) database.

Usage

ppi.CC

Format

An undirected igraph graph object, with vertex attributes:

‘name’: the name of the protein.
‘ICSC’: whether the protein is annotated with the “intracellular signaling cascade” GO term, zero or one.
‘IPR000198’: whether the protein contains the ‘rho GTPase-activating protein domain’ (IPR000198) motif.
‘IPR000403’: whether the protein contains the ‘phosphatidylinositol 3-/4-kinase, catalytic domain’ (IPR000403) motif.
‘IPR001806’: whether the protein contains the ‘small GTPase superfamily’ (IPR001806) motif.
‘IPR001849’: whether the protein contains the ‘pleckstrin homology domain’ (IPR001849) motif.
‘IPR002041’: whether the protein contains the ‘ran GTPase’ (IPR002041) motif.
‘IPR003527’: whether the protein contains the ‘mitogen-activated protein (MAP) kinase, conserved site’ (IPR003527) motif.

Source

See the reference below. Please cite it if you use this dataset in your work.

References

X. Jiang, N. Nariai, M. Steffen, S. Kasif, E. Kolaczyk: Integration of relational and hierarchical network information for protein function prediction. BMC Bioinform. 9, 350 (2008).

The sand package

Description

This R package accompanies the book ‘Statistical Analysis of Network Data with R, 2nd Edition’. It contains some of the data sets used in the book (the others are in the igraphdata package). It also contains the code from the book, and some simple functions to run the code without the need for typing it in.

In brief

Type in N<enter> to run the next chunk of code, and C<x> to jump to Chapter x, where x is between 2 and 11. E.g. C6<enter> resets R and “loads” Chapter 6. P<enter> prints the next code chunk to be run (without actually running it).

The data sets

The various data sets are loaded from the code chunks in the book. The sand package contains the following data sets, each is documented in its on manual page: Ecoli, aidsblog, calldata, elist.lazega, fblog, g.bip, hc, lazega, ppi.CC, sandwichprobe, strike, v.attr.lazega.

Code chunks

Code chunks of the book are numbered by chapter and each chunk is identified the chapter number and the chunk number connected by a dot.

The reader is supposed to run the code chapter by chapter, ideally, starting from a clean, new R session. This might not be critical, but it is not always possible to unload packages in R, so it is the only way to make sure that the code works correctly.

To make it easy to step through the code, the sand package define some “commands”. Note that these are are not functions, and also q that they are meant to be used interactively, and not programatically.

The cursor

The cursor marks the point the reader is at in the book, and commands discussed below move the cursor and run the code the cursor is at.

The ‘C’ commands clear R, i.e. unload all loaded packages except for sand and its dependencies, and delete all objects from the global workspace. They also set the cursor to the first chunk of the given chapter: there are nine ‘C’ commands, from ‘C2’ to ‘C11’, one for each Chapter of the book. (Chapter 1 has no code to run.)

The command ‘N’ runs the chunk at the cursor, and steps the cursor to the next chunk. It is possible to run multiple chunks at once, with the form ‘N + x’ (with or without the spaces), where ‘x’ is the number of additional chunks to run. (I.e. ‘N + 2’ runs three chunks.)

The command ‘P’ prints the chunk at the cursor, without running it. It is possible to print other chunks as well: ‘P - 1’ prints the previous chunk, ‘P - 2’ the one before that, etc., ‘P + 1’ prints the next chunk, etc.

The reader is welcome to inspect R objects, or run arbitrary R code between the ‘N’ and ‘P’ commands.

Author(s)

Gabor Csardi <csardi.gabor@gmail.com>

Examples

## Start with Chapter 2
C2

## Run first code chunk
N

## Run next code chunk
N

## Jump to Chapter 5
C5

## Run first code chunk in Chapter 5
## It will create a plot
N

Internet packet probes data

Description

These data correspond to an experiment conducted by Coates et al. to measure the difference in delay experienced by packet probes sent over the Internet during a short period in 2001, from a desktop computer in the ECE department at Rice University to similar machines at ten other university locations. The data were intended for use with a newly proposed method of Internet topology inference.

Usage

delaydata
host.locs

Format

The data is provided in two files. delaydata is a three-column data frame. The first columnf is the difference in delay of the small packets (in milliseconds). The second column is the numeric code of the destination of small packets. The third column is the numeric code of the destination of large packet.

host.locs contains the character code of the destinations:

‘IST’ Instituto Superior Tecnico (Portugal)
‘IT’ Instituto de Telecomunicacoes (Portugal)
‘Bkly’ University of California, Berkeley
‘MSU1’ Michigan State University (Host 1)
‘MSU2’ Michigan State University (Host 2)
‘UIUC’ University of Illinois, Urbana-Champaign
‘UWisc1’ University of Wisconsin, Madison (Host 1)
‘UWisc2’ University of Wisconsin, Madison (Host 2)
‘RiceU1’ Rice University (Host 1)
‘RiceU2’ Rice University (Host 2)

Source

Provided by Mark Coates, see reference below. Please cite the reference below if you use this dataset in your work.

References

M. Coates, R. Castro, R. Nowak, M. Gadhiok, R. King, Y. Tsang, Maximum likelihood network topology identification from edge-based unicast measurements. Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2002, pp. 11-20.

Michael's strike network

Description

A representation of who consults with whom in the context of a strike at a forest products manufacturing facility, following proposed changes to the workers' compensation package. An edge between two workers indicates that at least one of them said they consult with the other with moderate frequency.

Usage

strike

Format

An undirected igraph graph object with 24 vertices and 38 edges. The graph has two vertex attributes, ‘name’ is the first name of the individual and ‘race’ is a compilation of age (young or old) and language spoken (English or Spanish).

Source

This dataset was constructed from the corresponding version in Chapter 7 of W. De Nooy, A. Mrvar, and V. Batagelj, Exploratory Social Network Analysis with Pajek. Cambridge University Press, 2011, vol. 27.

Please cite the original reference below if you use this dataset in your work.

References

J.H. Michael, "Labor dispute reconciliation in a forest productsmanufacturing facility," Forest Products Journal, vol. 47, no. 11/12, p. 41, 1997.

E. coli gene expression levels

Description

Usage

Format

Source

References

AIDS blog citation network

Description

Usage

Format

Source

References

Austrian phone call network data

Description

Usage

Format

Source

References

Network of French political blogs

Description

Usage

Format

Source

A toy bipartite network

Description

Usage

Format

Hospital encounter network data

Description

Usage

Format

Source

References

Install all packages used in the book

Description

Usage

Details

Value

Author(s)

Lazega lawyers network data

Description

Usage

Format

Source

References

Yeast protein interaction network

Description

Usage

Format

Source

References

The sand package

Description

In brief

The data sets

Code chunks

The cursor

Author(s)

See Also

Examples

Internet packet probes data

Description

Usage

Format

Source

References

Michael's strike network

Description

Usage

Format

Source

References