R recipes

1. Converting the Gene Ontology graph into igraph

Rajarshi Guha wrote a nice Python script that converts the Gene Ontology graph into an igraph graph. Here is an R version that is much shorter, because we take advantage of the Bioconductor project, which has the data bundled into an R package.

Note that you only need the first two lines if you don't yet have the GO.db package installed. This package is updated twice a year, just like Bioconductor itself.

source("http://bioconductor.org/biocLite.R")
biocLite("GO.db")
library(GO.db)
library(igraph)
BP <- toTable(GOBPPARENTS)
CC <- toTable(GOCCPARENTS)
MF <- toTable(GOMFPARENTS)
g <- graph.data.frame( rbind(BP,CC,MF) )

Here is another version that keeps (some of) the metadata of the vertices (=terms) as well: the name of the term, the ontology and the textual definition. These will be added as vertex attributes.

source("http://bioconductor.org/biocLite.R")
biocLite("GO.db")
library(GO.db)
library(igraph)
BP <- toTable(GOBPPARENTS)
CC <- toTable(GOCCPARENTS)
MF <- toTable(GOMFPARENTS)
terms <- toTable(GOTERM)[,2:5]
terms <- terms[ !duplicated(terms[,1]), ]
g <- graph.data.frame( rbind(BP,CC,MF), vertices=terms )

For the terms, the table in the GO.db package is a one-to-many mapping, because it contains the (potentially many) synonyms and secondary IDS of the category. But we throw away these columns anyway, so we throw away the rows that appear many time in the table as well.


2. Centrality(-zation) function

This is quite basic, but may help cut corners. All the three 'classic' centrality measures are normalized.

centrality.norm<-function(graph,type=c("degree","closeness","betweenness"),centralization=FALSE)
{    
    result<-NA
    g<-graph
    cent<-centralization
    if (!is.igraph(g)) {stop("Not a graph object")}
    if (type[[1]] == "degree") {
        if (!cent) result <- degree(g)/(vcount(g)-1)
        else result <- (sum(max(degree(g))-degree(g)))/((vcount(g)-1)*(vcount(g)-2))}
    else if (type[[1]] == "betweenness") {
        temp <- 2*betweenness(g)/((vcount(g)-1)*(vcount(g)-2))
        if (!cent) result <- temp
        else  result <- sum(max(temp)-temp)/(vcount(g)-1)}
    else if (type[[1]] == "closeness") {
        if (!cent) result <- closeness(g)
        else result <- (2*vcount(g)-3)*(sum(max(closeness(g))-closeness(g)))/((vcount(g)-1)*(vcount(g)-2))}
    else {stop("this type is unavailable or mispelled")}
    return(result)
}

For example, if you want to calculate the closeness centrality of the vertices of a graph:

g<-graph.atlas(1000);
centrality.norm(g,type="closeness",centralization=FALSE)

To obtain the closeness centralization:

centrality.norm(g,type="closeness",centralization=TRUE)

3. Exporting graphs to LaTeX, using igraph and TikZ

Jérémie Knüsel wrote a very useful function to export graphs in LaTeX in order to use the powerful TikZ package, allowing then to use its options. You'll be able to create very large graphs, which was hard and time-consuming to do in LaTeX before.

3.1 The function

  • The two "igraph" parameters are 1. the graph (igraph format) 2. the layout, which can be either a matrix (dim: vcount(g) x vcount(g)) or a function (e.g. layout.random, …)
  • The "TikZ" parameters, that are situated inside the "igraph.to.tikz" function may by modified according to the pgf/TikZ manual.

node is related to the nodes' aspects
nodirectional is used in the case of undirected graph
unidirectional and bidirectional are used with directed graphs. tbidirectional is used in the case of bidectional (!) edges and draws them one next to the other (and not superimposed like with the dafault R output).

3.2 The code

tikz <- function (graph, layout) {
  ## Here we get the matrix layout
  if (class(layout) == "function")
    layout <- layout(graph)

  layout <- layout / max(abs(layout))
  clos <- closeness(graph)

  ##TikZ initialisation and default options (see pgf/TikZ manual)
  cat("\\tikzset{\n")
  cat("\tnode/.style={circle,inner sep=1mm,minimum size=0.8cm,draw,very thick,black,fill=red!20,text=black},\n")
  cat("\tnondirectional/.style={very thick,black},\n")
  cat("\tunidirectional/.style={nondirectional,shorten >=2pt,-stealth},\n")
  cat("\tbidirectional/.style={unidirectional,bend right=10}\n")
  cat("}\n")
  cat("\n")

  ##Size
  cat("\\begin{tikzpicture}[scale=5]\n")

  for (vertex in V(graph)) {
    label <- V(graph)[vertex]$label
    if (is.null(label))
      label <- ""

  ##drawing vertices
    cat (sprintf ("\t\\node [node] (v%d) at (%f, %f)\t{%s};\n", vertex, layout[vertex+1,1], layout[vertex+1,2], label))
  }
  cat("\n")

  adj = get.adjacency(graph)
  bidirectional = adj & t(adj)

  if (!is.directed(graph)) ##undirected case
      for (line in 1:nrow(adj)) {
      for (col in line:ncol(adj)) {
           if (adj[line,col]&col>line) {
            cat (sprintf ("\t\\path [nondirectional] (v%d) edge (v%d);\n", line-1, col-1)) ##edges drawing
          }
        }
      }
  else ##directed case
      for (line in 1:nrow(adj)) {
      for (col in 1:ncol(adj)) {
          if (bidirectional[line,col]&line > col)
            cat (sprintf ("\t\\path [bidirectional] (v%d) edge (v%d);\n", line-1, col-1),
            sprintf ("\t\\path [bidirectional] (v%d) edge (v%d);\n", col-1, line-1)) ##edges drawing
        else if (!bidirectional[line,col]&adj[line,col]) 
            cat (sprintf ("\t\\path [unidirectional] (v%d) edge (v%d);\n", line-1, col-1)) ##edges drawing
      }
    }

  cat("\\end{tikzpicture}\n")
}

3.3 An example

After the installation of the pgf/TikZ package, create a LaTeX document with this template:

\documentclass{article}

\usepackage{tikz} 

\usetikzlibrary{arrows,decorations.pathmorphing,backgrounds,positioning,fit} 

\begin{document} 

\end{document}

Then evaluate the function "igraph.to.tikz" in R. At that point, create a graph and translate it:

g<-as.directed(graph.atlas(400))
igraph.to.tikz(g,layout.fruchterman.reingold)

Copy the result in the LaTeX document and compose it!

4. Bonacich's power centrality for big graphs, using sparse matrices

bonpow.sparse <- function(graph, nodes=V(graph), loops=FALSE,
                          exponent=1, rescale=FALSE, tol=1e-07) {

  ## remove loops if requested
  if (!loops) {
    graph <- simplify(graph, remove.multiple=FALSE, remove.loops=TRUE)
  }

  vg <- vcount(graph)

  ## sparse adjacency matrix
  d <- get.adjacency(graph, sparse=TRUE)

  ## sparse identity matrix
  id <- Diagonal(vg)

  ## solve it
  ev <- solve(id - exponent * d, degree(graph, mode="out"), tol=tol)

  if (rescale) {
    ev <- ev/sum(ev)
  } else {
    ev <- ev * sqrt(vcount(graph)/sum((ev)^2))
  }

  ev[as.numeric(nodes) + 1]
}

## test graph
test.g <- simplify(ba.game(1000,m=2))

## test run
system.time(bp1 <- bonpow(test.g))
system.time(bp2 <- bonpow.sparse(test.g))

## check that they are the same
max(abs(bp1-bp2))

5. Assortativity coefficient in R

Code translated from the python recipes. References? Same coefficient as Newman's ?

assortativity <- function(graph){
    deg <- degree(graph)
    deg.sq <- deg^2
    m <- ecount(graph)
    num1 <- 0; num2 <- 0; den <- 0
    edges <- get.edgelist(graph, names=FALSE)+1

    num1 <- sum(deg[edges[,1]] * deg[edges[,2]]) / m
    num2 <- (sum(deg[edges[,1]] + deg[edges[,2]]) / (2 * m))^2
    den <- sum(deg.sq[edges[,1]] + deg.sq[edges[,2]]) / (2 * m)

    return((num1-num2)/(den-num2))
    }

6. Bipartite Networks measures

Reference: Matthieu Latapy, Clemence Magnien, Nathalie Del Vecchio, Basic notions for the analysis of large two-mode networks, Social Networks 30 (2008) 31–48

# Number of top and bottom nodes
top<-length(V(g)[type==FALSE])
bottom<-length(V(g)[type==TRUE])
# Number of edges
m<-ecount(g)
# Mean degree for top and bottom nodes
ktop<-m/top
kbottom<-m/bottom
# Density for bipartite network
bidens<-m/(top*bottom)
# Largest connected component for top and bottom nodes:
gclust<-clusters(g, mode='weak')
lcc<-induced.subgraph(g, V(g)[gclust$membership==1])
lcctop<-length(V(lcc)[type==FALSE])
lccbottom<-length(V(lcc)[type==TRUE])
# Mean distance for top and bottom nodes
distop<-mean(shortest.paths(lcc, v=V(lcc)[type==FALSE], to=V(lcc)[type==FALSE], mode = 'all'))
disbottom<-mean(shortest.paths(lcc, v=V(lcc)[type==TRUE], to=V(lcc)[type==TRUE], mode = 'all'))
#######################################################
# Network transitivity
trtarget<-graph.motifs(target, 4)
ccglobal<-(2*trtarget[20])/trtarget[14]
# Clustering coefficients (cc, cclowdot, cctopdot) for top and bottom nodes
ccPointTarg<-ccBip(g)
cctop<-mean(ccPointTarg$proj1, na.rm=TRUE)
ccbottom<-mean(ccPointTarg$proj2, na.rm=TRUE)
ccLowTarg<-ccLowDot(g)
cclowdottop<-mean(ccLowTarg$proj1, na.rm=TRUE)
cclowdotbottom<-mean(ccLowTarg$proj2, na.rm=TRUE)
ccTopTarg<-ccTopDot(g)
cctopdottop<-mean(ccTopTarg$proj1, na.rm=TRUE)
cctopdobottom<-mean(ccTopTarg$proj2, na.rm=TRUE)
# Those last clustering measures relies on the functions ccBip wrote by Gabor Csardi, while ccLowDot and ccTopDot are 
# essentially the same function with only a minor change
ccBip <- function(g) {
if (! "name" %in% list.vertex.attributes(g)) {
  V(g)$name <- seq_len(vcount(g))
}
neib <- get.adjlist(g)
names(neib) <- V(g)$name
proj <- bipartite.projection(g)
lapply(proj, function(x) {
  el <- get.edgelist(x)
  sapply(V(x)$name, function(v) {
    subs <- el[,1]==v | el[,2]==v
    f <- function(un, vn) length(union(un, vn))
    vals <- E(x)[subs]$weight / unlist(mapply(f, neib[el[subs,1]], neib[el[subs,2]]))
    mean(vals)
  })
})
}

ccLowDot <- function(g) {
if (! "name" %in% list.vertex.attributes(g)) {
  V(g)$name <- seq_len(vcount(g))
}
neib <- get.adjlist(g)
names(neib) <- V(g)$name
proj <- bipartite.projection(g)
lapply(proj, function(x) {
  el <- get.edgelist(x)
  sapply(V(x)$name, function(v) {
    subs <- el[,1]==v | el[,2]==v
    f <- function(un, vn) min(length(un), length(vn))
    vals <- E(x)[subs]$weight /
      unlist(mapply(f, neib[el[subs,1]], neib[el[subs,2]]))
    mean(vals)
  })
})
}

ccTopDot <- function(g) {
if (! "name" %in% list.vertex.attributes(g)) {
  V(g)$name <- seq_len(vcount(g))
}
neib <- get.adjlist(g)
names(neib) <- V(g)$name
proj <- bipartite.projection(g)
lapply(proj, function(x) {
  el <- get.edgelist(x)
  sapply(V(x)$name, function(v) {
    subs <- el[,1]==v | el[,2]==v
    f <- function(un, vn) max(length(un), length(vn))
    vals <- E(x)[subs]$weight /
      unlist(mapply(f, neib[el[subs,1]], neib[el[subs,2]]))
    mean(vals)
  })
})
}

7. Weighted degree distribution

The below code simply takes the code for degree.distribution and modifies it to use the weighted degree in the calculation (obtained via graph.strength)

graph.strength.distribution <- function (graph, cumulative = FALSE, ...)
{
  if (!is.igraph(graph)) {
    stop("Not a graph object")
  }
  # graph.strength() instead of degree()
  cs <- graph.strength(graph, ...)
  hi <- hist(cs, -1:max(cs), plot = FALSE)$intensities
  if (!cumulative) {
    res <- hi
  }
  else {
    res <- rev(cumsum(rev(hi)))
  }
  res
}

8. Leverage centrality

The following function calculates the leverage centrality of each vertex in a graph according to the definition found in Joyce et al: A New Measure of Centrality for Brain Networks (PLoS ONE 5(8):e12200, 2010). Thanks to Alex Upton for the implementation.

lvcent <- function(graph){
    k <- degree(graph)
    n <- vcount(graph)
    sapply(1:n, function(v) { mean((k[v]-k[neighbors(graph,v)]) / (k[v]+k[neighbors(graph,v)])) })
}

9. Animating temporal networks

Esteban Moro Egido has created some nice visualizations of temporal networks (see this one and this one) using igraph and R. He has also written a short tutorial about how to make such visualizations using igraph and only 20 lines of R code.

Unless otherwise stated, the content of this page is licensed under GNU Free Documentation License.