Introduction
Most network analysts want to use network analysis to better
understand empirical networks. {manynet}
offers various
functions for importing or coercing networks into formats that you can
use. This tutorial is going to cover two main ways to get data into the
package:
- using data from
{manynet}
,{migraph}
, or other packages - importing and using data from outside of R
One might also create or stochastically generate networks, and
{manynet}
includes functions for this too, but we cover
these functions in later tutorials on topology and diffusion.
This tutorial also covers ways to work with this data until it is in the class or format you require for further analysis.
Packaged data
As many R packages do, {manynet}
includes a number of
datasets used for teaching and testing the functions contained in the
package. These are sometimes classical network datasets, such as the Southern
Women dataset or Zachary’s Karateka
dataset, and sometimes new data with neat themes, features, or
attributes that make them exemplar teaching or testing data.
Finding the data
To see what data is in the package, you can explore the documentation available on the website (see here) or use a function in R to list the data available in the package.
One function, available in base R (with no added packages), is
data(package = "manynet")
. Type this in to the box below to
see what datasets are available in the package. There are buttons to
start over, receive any hints/solutions available, as well as to run the
code you have entered to discover its effects. Try it out now!
data(package = "_____")
data(package = "manynet")
On the left of the output are the names of the objects (here all
starting with ison_*
). On the right is a brief description
of the networks and their canonical source. Do you recognise any of the
datasets?
{manynet}
also includes its own way of identifying
network data in a package. table_data()
returns a table of
the network datasets in a package, along with information about the
number of nodes, ties, and various other features.
table_data()
Let’s say that we are only interested in two-mode network data. How can we filter this table so that only those networks that are two-mode are retained?
table_data() %>% dplyr::filter(twomode)
Calling the data
Ok, so we can see that there are a number of very interesting datasets available in this package. How do we access and use this data?
The easiest way to call the data is just to make sure that the
package is loaded using the command library(manynet)
, and
then use the selected dataset as named above. 1 Let’s try
calling ison_adolescents
by first loading the
{manynet}
‘library’ and then just typing
ison_adolescents
to see what happens.
library(______)
ison______
library(manynet)
ison_adolescents
Alternatively, the data can be called directly out of the package like this:
example_name <- manynet::ison_adolescents
, but since we think you will probably want all of the other functions available in{manynet}
at your disposal, you may as well just load the package entirely.↩︎
Describing networks
Reading prints
All of the network data available in {manynet}
(and
{migraph}
) are in a special tbl_graph
format,
from the {tidygraph}
package, that makes it compatible,
flexible, and transparent. When you call one of these data objects, some
information about the type of network it is, how many nodes and ties it
has, and the first few examples of nodes and ties is given. 2 Let’s see whether we can make sense
of the main features of this network. Run the following line and then
answer the questions.
ison_adolescents
You can now describe the main dimensions and type of network. In another tutorial, we will see how we can describe such networks visually.
You may have noticed when the package was first loaded that it mentioned that the
print_tbl_graph
method from that package was overwritten. That’s so that we can make some different choices about what and how networks are described.↩︎
Grabbing details
We can ask other questions of this data too. {manynet}
uses a simple function naming convention so that you always know to what
it relates.
net_*()
functions usually return one value for the network or graph, whether that be a string likeEvelyn
or some number like3
or-0.003
3node_*()
functions always return a vector of values for the network as long as the number of nodes or vertices in the network (of any mode)tie_*()
functions always return a vector of values for the network as long as the number of ties or edges in the network (of any sign or type)
To find out how many nodes are in the network, use
net_nodes()
. To find out how many nodes are in each mode,
use net_dims()
. To find out the names of those nodes, use
node_names()
. Use such functions to find out:
- how many nodes are in the
ison_southern_women
network - how many nodes are in each mode
- how many ties are in the network
- what nodal attributes there are in the network
- what the names of the nodes are
net_nodes(ison_southern_women)
net_dims(ison_southern_women)
net_ties(ison_southern_women)
net_node_attributes(ison_southern_women)
node_names(ison_southern_women)
There are a bunch of logical checks for many common properties or
features of networks. For example, one can check whether a network
is_twomode()
, is_directed()
, or
is_labelled()
. Remember, all is_*()
functions
work on any compatible class.
There are a few exceptions to this.↩︎
From class to class
Network class objects
We can describe and work with networks from other R packages too, not
just those in the tbl_graph
format. For example, another
commonly used package in network analysis is {network}
,
which includes a few example datasets of its own. Can you remember how
to find out which data are available in this package, and call the last
one in the list?
# You may need to call the data out of this package directly,
# such as:
data(flo, package = "network")
data(package = "_____")
library(_____)
data(_____)
_____
data(package = "network")
data(flo)
flo
This data uses quite a different class to what we encountered above.
It prints out the full adjacency matrix of the Florentine network, but
as a network
-class object (i.e. from the
{network}
package). This is no problem for
{manynet}
(or {migraph}
), since every included
function works the same on any of the compatible classes, but in case
you would like to work with a network in a particular class, or it needs
to be in a particular format for further work (e.g. for use with
{ergm}
), then {manynet}
has you covered for
that too.
Class coercion
Coercing networks between different classes of objects uses the
as_*()
functions. These functions will do their best to
coerce data from the current class of the object to the class named in
the function. Some classes have ‘slots’ or recognition for some kinds of
information that others don’t. For example, coercing a
tbl_graph
into an edgelist will sacrifice all the
information about nodal attributes. Still, we aim for these functions to
be as lossless as possible and welcome feedback that highlights how
these translations can be improved. Let’s see whether we can coerce our
‘flo’ network into a tbl_graph
(‘tidygraph’) class
object.
_____ <- as_t_____(_____)
flo <- as_tidygraph(_____)
flo <- as_tidygraph(flo)
flo
Other packages that include network data include David Schoch’s
descriptively named {networkdata}
package. The data in this package are {igraph}
-class
objects. Can you coerce one of the datasets in this package into a
tidygraph format? Into a network format? Into a matrix? Into an
edgelist?
External data
Finding data
Researchers will regularly find themselves needing to import and work with network data from outside of R. There are a great number of networks datasets and data resources available online. 4 See for example:
- UCINET data
- Pajek data
- GML datasets
- UCIrvine Network Data Repository
- KONECT project
- SNAP Stanford Large Network Dataset Collection
Yet these resources contain data in a range of different formats,
some that are specifically made to work with certain software, others
that rely on open standards, and yet others that keep data in a very
standard edgelist (and perhaps nodelist) format in .csv files or
similar. Fortunately, {manynet}
has functions to help with
importing data from such formats too.
Here we keep a necessarily partial list, but we are happy to update it with additional suggestions.↩︎
Importing edgelists
One format most users are long familiar with is Excel. In Excel,
users are typically collecting network data as edgelists, nodelists, or
both. Recall that edgelists tabulate senders/from and receivers/to of
each tie in the first two columns and any other edge- or tie-related
attributes as additional columns. There may optionally also be a
nodelist that tabulates Edgelists are typically the main object to be
imported, and we can import them from an Excel file or a
.csv
file.5 For the sake of this exercise,
we’ll import some data, adols.csv
, that I’ve pre-saved
within the package in the data/
folder of this tutorial.
Try the following code chunk.
adolties <- read_edgelist("data/adols.csv")
flonodes <- read_edgelist("data/flonode.csv")
If you do not specify a particular file name, a helpful popup will open that assists you with locating and importing a file from your operating system. Importing a nodelist of nodal attributes operates very similarly.
Note that if you import from a .csv file, please specify whether the separation value should be commas (
sv = "comma"
) or semi-colons (sv = "semi-colon"
). The function expects comma separated values by default.↩︎
Exporting edgelists
In some cases, users will be faced with having to collect data
themselves, or wish to first manipulate the data in Excel before
importing it, but may be uncertain about the expected format of an
edgelist. Here it may be useful to try exporting one of the built-in
datasets in {manynet}
to see how complete network data
looks. If this is potentially complex, calling
write_edgelist()
without any arguments will export a test
file with a barebones structure that you can overwrite with your own
data.
write______(ison_marvel_relationships, "_____/marvedges.xlsx")
write______(ison_marvel_relationships, "_____/marvnodes.xlsx")
write_edgelist(ison_marvel_relationships, "data/marvedges.xlsx")
write_nodelist(ison_marvel_relationships, "data/marvnodes.xlsx")
Importing other formats
Since network data can be complex, edgelists (and nodelists) may not be sufficient to structure all the information necessary to represent the network. For this reason, a variety of other external formats have been proposed and used. As such, you may find network data of interest that is in another format. Here are some examples:
read_pajek()
andwrite_pajek()
for importing and exporting .net or .paj filesread_ucinet()
andwrite_ucinet()
for importing and exporting .##h files (.##d files are automatically imported alongside them)read_graphml()
andwrite_graphml()
for importing and exporting .graphml filesread_dynetml()
for importing .dynetml files
For more information on any of these functions, you can ask for help
by typing ?read_pajek
in the console. Whereas
read_edgelist()
and read_nodelist()
will
import into a tibble/data frame class, read_pajek()
and
read_ucinet()
will import the network into a tidygraph
format (see above). Of course, any network data that is imported can be
easily coerced into any other compatible class. Let’s say we want to
import the adolescents edgelist back in, but we want it in an igraph
format. There are three ways you might do this:
# 1. Separate steps
adols <- read_edgelist("data/adols.csv")
adolsigraph1 <- as_igraph(adols)
adolsigraph1
# 2. Nested steps
adolsigraph2 <- as_igraph(read_edgelist("data/adols.csv"))
adolsigraph2
# 3. Chained steps
adolsigraph3 <- read_edgelist("data/adols.csv") %>% as_igraph()
adolsigraph3
How does it compare to the original?
But how can we change the network to be unnamed (or anything else)?
Reformatting networks
As mentioned above, {manynet}
attempts to retain as much
information as possible when converting objects between different
classes. The presumption is that users should explicitly decide to
reduce or simplify their data. {manynet}
includes functions
for reformatting, transforming (or removing) certain properties of
network objects. Here will introduce a few functions used for
‘reformatting’ networks. We call functions ‘reformatting functions’ if
they change the type but not the order (number of nodes) in the
network.
A good example is to_undirected()
. The astute among you
may have noticed that when we imported the adolescents network, it
returned a directed network instead of the original
undirected network. This was a consequence of a heuristic used
during the import, but gives us a good occasion to try out
to_undirected()
. Reimport the data/adols.csv
file, make it an igraph-class object, and then make it undirected.
read_edgelist("data/adols.csv") %>% as_igraph() %>% to_undirected()
Try this out with other compatible classes of objects, and reformatting other aspects of the network. For example:
to_unnamed()
removes/anonymises all vertex/node labelsto_named()
adds some random (U.S.) childrens’ names, which can be useful for identifying particular nodesto_undirected()
replaces directed ties with an undirected tie (if an arc in either direction is present)to_redirected()
replaces undirected ties with directed ties (arcs) or, if already directed, swaps arcs’ directionto_unweighted()
binarises or dichotomises a network around a particular threshold (by default1
)to_unsigned()
returns just the “positive” or “negative” ties from a signed network, respectivelyto_uniplex()
reduces a multigraph or multiplex network to one with a single set of edges or tiesto_simplex()
removes all loops or self-ties from a complex network
Transforming networks
These functions are similar to the reformatting functions, and are
also named to_*()
, but their operation always changes the
network’s ‘order’ (number of nodes).
Projections
Good examples of this are to_mode1()
and
to_mode2()
for transforming a two-mode network into one of
its one-mode projections. to_mode1()
will transform
(project) the network to a one-mode network of shared ties among its
first set of nodes, while to_mode2()
will project the
original network to a network of shared ties among its second set of
nodes. For more information on projection, see for example Knoke et
al. (2021).
Let’s try this out on a classic two-mode network,
ison_southern_women
. Assign and name the transformed
networks something sensible using e.g.
women <- to_mode...
so that we can continue working with
this data afterwards. To assign and immediately print the result, wrap
the line in parentheses.
ison_southern_women
(s_women <- to_mode1(ison_southern_women))
(s_events <- to_mode2(ison_southern_women))
Now use functions introduced above on the two projections you have created to find out:
- how many nodes there are in each of these networks,
- what the names of the nodes are, and
- what tie attributes there are in the networks.
net_nodes(s_women)
net_nodes(s_events)
node_names(s_women)
node_names(s_events)
net_tie_attributes(s_women)
net_tie_attributes(s_events)
So we can see that the to_mode*()
functions have created
a network of only one of the modes in the network. The ties in these
projected networks, representing shared connections to nodes of the
other mode, are weighted. This shows up when listing the network’s tie
attributes, or could be retrieved using tie_weights()
, but
can also be checked with the simple logical check
is_weighted()
.
Retrieve the tie weights from your women projection. Find the average (mean) of this vector of tie weights, and the average (mean) tie weight overall.
tie_weights(_____)
mean(tie_weights(_____))
mean(as_matrix(_____))
tie_weights(s_women)
mean(tie_weights(s_women))
mean(as_matrix(s_women))
Ok, so now we know that projection transforms an (unweighted)
two-mode network into a weighted one-mode network and what these weights
represent. Note though that counting the frequency of shared ties to
nodes in the other mode is just one (albeit the default) option for how
ties in the projection are weighted. Other options included in
{manynet}
include the Jaccard index, Rand simple matching
coefficient, Pearson coefficient, and Yule’s Q. These may be of interest
if, for example, overlap should be weighted by participation.
Other transforming functions include:
to_giant()
identifies and returns only the main component of a network.to_no_isolates()
identifies and returns a network including only nodes with at least one tie.to_subgraph()
returns only a subgraph of the network based on e.g. some nodal attribute.to_ties()
returns a network where the ties in the original network become the nodes, and the ties are shared adjacencies to nodes.to_matching()
returns a network in which each node is only tied to one of its previously existing ties such that the network’s cardinality is maximised.
Remember, all these to_*()
functions work on any
compatible class; the to_*()
functions will also attempt to
return that same class of object, making it even easier to manipulate
networks into shape for analysis.
Modifying data
After choosing and/or importing some network data, and alongside other modifications, you may need to add or delete particular nodes or ties, or add or delete specific nodal or tie attributes.
Adding/deleting nodes and ties
Sometimes you may wish to add particular nodes or ties. This can be achieved easily enough with the following functions. Note that one (1) additional node is added via the first command, and one (1) additional tie (between the nodes indexed 1 and 3) is added using the second.
add_nodes(ison_adolescents, 1)
add_ties(ison_adolescents, list(1,3))
Other times you may wish to delete particular nodes or ties, and
there the syntax is similar. Note here though that when a single value
is used in the second argument, this is interpreted as an index, whereas
one can also use the node’s name for delete_nodes()
, or
name the tie(s) to be deleted using |
as a separator.
delete_nodes(ison_adolescents, 1)
delete_nodes(ison_adolescents, "Sue")
delete_ties(ison_adolescents, 1)
delete_ties(ison_adolescents, "Carol|Tina")
Adding/deleting attributes
Adding nodal attributes to a given network is relatively
straightforward. {manynet}
offers a more
{igraph}
-like syntax,
e.g. add_node_attribute()
, as well as a more
{dplyr}
-like syntax, e.g. mutate()
, for those
already familiar with these tools in R:
ison_adolescents %>%
mutate_nodes(color = "red",
degree = 1:8) %>%
mutate_ties(weight = 1:10)
One can also delete nodal attributes in a similar way, by assigning
NULL
to a particular attribute.
ison_southern_women
ison_southern_women %>%
mutate_nodes(Surname = NULL,
Title = NULL)
Note that to use {dplyr}
-like functions like
mutate()
, rename()
, filter()
,
select()
, or join()
on network ties, you will
need to append the function name with _ties
.