Summary
Guide on how to create igraph objects from pathways imported from Pathway Commons.
Package
graphsim 1.0.2
1 Importing Pathways
1.1 Motivations
Here we demonstrate how to create igraph objects (Csardi and Nepusz 2006) for pathways compatible with graphsim. We provide example objects with the package, and these examples contain additional details showing how they are imported into R. This uses the paxtoolsr package (Luna et al. 2016) from Bioconductor.
Graph objects have edge properties (Barabási and Oltvai 2004). Here we show how to define the “state” parameter, which can be used to differentiate inhibitions from other relationships. We use different arrowheads to show these, as per convention in molecular biology.
1.2 Getting started
The Bioconductor package used to import data can be installed as follows.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("paxtoolsr")
To perform these steps, the following packages must be loaded.
library("igraph")
library("graphsim")
library("paxtoolsr")
1.3 Importing data
We will demonstrate downloading a Reactome (https://reactome.org/) pathway (Croft et al. 2014) from Pathway Commons (http://www.pathwaycommons.org/). Reactome pathways are also available in the reactome.db package (https://bioconductor.org/packages/release/data/annotation/html/reactome.db.html) (Fabregat et al. 2017; Ligtenberg 2019), but this only contains the gene set information. We use Pathway Commons as it contains directed graph structure information for the edges.
We download the Pathway Commons release into Extended Simple Interaction Format (SIF) network format. We use a legacy version (version 7) to reproduce the results.
# Importing data
results <- downloadPc2(version = 7, selectedFileName = "Pathway%20Commons.7.Reactome.BIOPAX.owl.gz")
However, we recommend using the latest version, which is:
print(latest_version)
#> [1] 12
Run the following code to download the new version.
The results will be cached here:
Sys.getenv("PAXTOOLSR_CACHE")
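Before downloading, it can help to confirm where the cache points. The following is a minimal base R sketch, assuming only that the `PAXTOOLSR_CACHE` environment variable named above may or may not be set in the current session. The helper name `describe_cache` is hypothetical and is not part of paxtoolsr.

```r
# Hypothetical helper (not part of paxtoolsr): report whether an
# environment variable names an existing directory.
describe_cache <- function(var = "PAXTOOLSR_CACHE") {
  path <- Sys.getenv(var, unset = "")
  if (!nzchar(path)) {
    return(paste(var, "is not set"))
  }
  if (dir.exists(path)) {
    paste("cache directory exists:", path)
  } else {
    paste("cache directory not found:", path)
  }
}

describe_cache()
```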
1.3.1 Searching results
We then query the results to find the pathways of interest. See the paxtoolsr vignette for details.
## Search Pathway Commons for 'PI3K'-related pathways
searchResults <- searchPc(q = "PI3K", type = "pathway", verbose = TRUE)
#> URL: http://www.pathwaycommons.org/pc2/search.xml?q=PI3K&page=0&type=pathway
pathways <- xpathSApply(searchResults, "/searchResponse/searchHit/name", xmlValue)
length(pathways)
#> [1] 100
head(pathways)
#> [1] "PI3K events in ERBB2 signaling"
#> [2] "Class IB PI3K non-lipid kinase events"
#> [3] "Erythropoietin activates Phosphoinositide-3-kinase (PI3K)"
#> [4] "Activated NTRK3 signals through PI3K"
#> [5] "Activated NTRK2 signals through PI3K"
#> [6] "Trk receptor signaling mediated by PI3K and PLC-gamma"
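With 100 hits, it may be easier to narrow the names before choosing one. As a small illustration in base R, the sketch below filters the six names printed above by a keyword; the keyword `NTRK` is chosen purely as an example.

```r
# Pathway names copied from the head(pathways) output above.
pathways <- c(
  "PI3K events in ERBB2 signaling",
  "Class IB PI3K non-lipid kinase events",
  "Erythropoietin activates Phosphoinositide-3-kinase (PI3K)",
  "Activated NTRK3 signals through PI3K",
  "Activated NTRK2 signals through PI3K",
  "Trk receptor signaling mediated by PI3K and PLC-gamma"
)

# Keep the names that mention the keyword, ignoring case.
ntrk_pathways <- grep("NTRK", pathways, ignore.case = TRUE, value = TRUE)
ntrk_pathways
```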
1.3.2 Downloading a Pathway
We can create a local OWL file in this manner.
## Search Pathway Commons for 'PI3K'-related pathways
searchResults <- searchPc(q = "PI3K Cascade", type = "pathway", verbose = TRUE)
#> URL: http://www.pathwaycommons.org/pc2/search.xml?q=PI3K%20Cascade&page=0&type=pathway
pathways <- xpathSApply(searchResults, "/searchResponse/searchHit/name", xmlValue)
length(pathways)
#> [1] 100
head(pathways)
#> [1] "PI3K Cascade"
#> [2] "IGF1R signaling cascade"
#> [3] "IRS-mediated signalling"
#> [4] "LPA receptor mediated events"
#> [5] "Insulin receptor signalling cascade"
#> [6] "EPHA2 forward signaling"
We then select the first pathway.
pathway <- pathways[1]
pathway
#> [1] "PI3K Cascade"
We save it to a temporary OWL file. First we extract the required columns.
library("plyr")
#>
#> Attaching package: 'plyr'
#> The following object is masked from 'package:paxtoolsr':
#>
#> summarize
#convert to data frame
searchResultsDf <- ldply(xmlToList(searchResults), data.frame)
dim(searchResultsDf)
#> [1] 105 22
# Simplified results
simplifiedSearchResultsDf <- searchResultsDf[, c("name", "uri", "biopaxClass")]
head(simplifiedSearchResultsDf)
#> name
#> 1 PI3K Cascade
#> 2 IGF1R signaling cascade
#> 3 IRS-mediated signalling
#> 4 LPA receptor mediated events
#> 5 Insulin receptor signalling cascade
#> 6 EPHA2 forward signaling
#> uri
#> 1 https://identifiers.org/reactome/R-HSA-109704
#> 2 https://identifiers.org/reactome/R-HSA-2428924
#> 3 https://identifiers.org/reactome/R-HSA-112399
#> 4 http://pathwaycommons.org/pc12/Pathway_ebbd43e6d7ede5ba46b0b03c4566c06f
#> 5 https://identifiers.org/reactome/R-HSA-74751
#> 6 http://pathwaycommons.org/pc12/Pathway_41112300a6e2adfd271ada175fc3f63d
#> biopaxClass
#> 1 Pathway
#> 2 Pathway
#> 3 Pathway
#> 4 Pathway
#> 5 Pathway
#> 6 Pathway
Then we write to a temp file.
## Use an XPath expression to extract the results of interest. In this case, the
## URIs (IDs) for the pathways from the results
tmpSearchResults <- xpathSApply(searchResults, "/searchResponse/searchHit/uri", xmlValue)
## Generate temporary file to save content into
biopaxFile <- "bioxpax-reactome-pi3k-cascade.owl"
## Extract a URI for a pathway in the search results and save into a file
idx <- which(grepl("reactome", simplifiedSearchResultsDf$uri) & grepl("PI3K Cascade",
    simplifiedSearchResultsDf$name, ignore.case = TRUE))
uri <- simplifiedSearchResultsDf$uri[idx]
saveXML(getPc(uri, format = "BIOPAX"), file = biopaxFile)
#> [1] "bioxpax-reactome-pi3k-cascade.owl"
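The selection step above combines two conditions: the URI must come from Reactome and the name must match the pathway of interest. This can be tried out without network access on a toy stand-in for `simplifiedSearchResultsDf`. The sketch below uses three rows copied from the output shown earlier; the data frame name `hits` and the single-match guard are additions for illustration only.

```r
# Toy stand-in for simplifiedSearchResultsDf, using rows from the output above.
hits <- data.frame(
  name = c("PI3K Cascade", "IGF1R signaling cascade", "LPA receptor mediated events"),
  uri = c(
    "https://identifiers.org/reactome/R-HSA-109704",
    "https://identifiers.org/reactome/R-HSA-2428924",
    "http://pathwaycommons.org/pc12/Pathway_ebbd43e6d7ede5ba46b0b03c4566c06f"
  ),
  stringsAsFactors = FALSE
)

# Both conditions must hold: a Reactome URI and a matching pathway name.
idx <- which(grepl("reactome", hits$uri) &
  grepl("PI3K Cascade", hits$name, ignore.case = TRUE))

# Guard (an addition, not in the original): stop unless exactly one row matches.
stopifnot(length(idx) == 1)
uri <- hits$uri[idx]
uri
```

A guard like this may be worth adding when a search returns many hits, since a vector of several URIs would otherwise be passed on to the download step.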
1.3.3 Create SIF object
We convert to the Extended Simple Interaction Format (SIF) network format. This gives a set of nodes for genes and a set of edges for the relationships between them.
resultsSIF <- toSifnx(inputFile = biopaxFile)
print(paste(c("nodes:", nrow(resultsSIF$nodes))))
#> [1] "nodes:" "44"
print(paste(c("edges:", nrow(resultsSIF$edges))))
#> [1] "edges:" "1219"
With node properties:
results.nodesDF <- as.data.frame(resultsSIF$nodes)
head(results.nodesDF)
#> PARTICIPANT PARTICIPANT_TYPE PARTICIPANT_NAME
#> 1 Q8NEB9 ProteinReference PK3C3_HUMAN
#> 2 Q06124 ProteinReference PTN11_HUMAN
#> 3 Q9UEF7 ProteinReference KLOT_HUMAN
#> 4 Q99570 ProteinReference PI3R4_HUMAN
#> 5 O95750 ProteinReference FGF19_HUMAN
#> 6 P12034 ProteinReference FGF5_HUMAN
#> UNIFICATION_XREF RELATIONSHIP_XREF
#> 1 uniprot knowledgebase:Q8NEB9 hgnc symbol:PIK3C3
#> 2 uniprot knowledgebase:Q06124 hgnc symbol:PTPN11
#> 3 uniprot knowledgebase:Q9UEF7 hgnc symbol:KL
#> 4 uniprot knowledgebase:Q99570 hgnc symbol:PIK3R4
#> 5 uniprot knowledgebase:O95750 hgnc symbol:FGF19
#> 6 uniprot knowledgebase:P12034 hgnc symbol:FGF5
With edge properties:
results.edgesDF <- as.data.frame(resultsSIF$edges)
head(results.edgesDF)
#> PARTICIPANT_A
#> 1 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 2 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 3 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 4 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 5 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 6 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> INTERACTION_TYPE
#> 1 used-to-produce
#> 2 used-to-produce
#> 3 reacts-with
#> 4 consumption-controlled-by
#> 5 consumption-controlled-by
#> 6 consumption-controlled-by
#> PARTICIPANT_B
#> 1 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> 2 ADP
#> 3 ATP
#> 4 O00459
#> 5 O15520
#> 6 O43320
#> INTERACTION_DATA_SOURCE
#> 1 Reactome
#> 2 Reactome
#> 3 Reactome
#> 4 Reactome
#> 5 Reactome
#> 6 Reactome
#> INTERACTION_PUBMED_ID PATHWAY_NAMES
#> 1 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 2 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 3 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 4 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 5 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 6 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
We see that all genes and edges belong to the same Reactome pathway.
table(results.edgesDF$PATHWAY_NAMES)
#>
#> PI3K Cascade
#> 902
table(results.edgesDF$INTERACTION_DATA_SOURCE)
#>
#> Reactome
#> 1218
Edges are defined in several ways (some directional).
table(results.edgesDF$INTERACTION_TYPE)
#>
#> chemical-affects consumption-controlled-by
#> 30 78
#> controls-production-of in-complex-with
#> 78 286
#> neighbor-of reacts-with
#> 741 1
#> used-to-produce
#> 4
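To get a feel for how many edges carry a direction, the counts above can be grouped by interaction type. The sketch below copies the counts from the table and assumes that `in-complex-with`, `neighbor-of`, and `reacts-with` are the undirected types and that the rest are directed. That split is an assumption here and should be checked against the Pathway Commons SIF documentation.

```r
# Counts copied from the table(results.edgesDF$INTERACTION_TYPE) output above.
type_counts <- c(
  "chemical-affects" = 30, "consumption-controlled-by" = 78,
  "controls-production-of" = 78, "in-complex-with" = 286,
  "neighbor-of" = 741, "reacts-with" = 1, "used-to-produce" = 4
)

# Assumption: these types are undirected; all other types are treated as directed.
undirected_types <- c("in-complex-with", "neighbor-of", "reacts-with")

is_directed <- !(names(type_counts) %in% undirected_types)
directed_total <- sum(type_counts[is_directed])
undirected_total <- sum(type_counts[!is_directed])
c(directed = directed_total, undirected = undirected_total)
```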
1.3.4 Filtering genes and metabolites
We can then optionally filter out edges that are not related to genes or proteins, either by removing the interaction types that involve metabolites:
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "chemical-affects",]
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "reacts-with",]
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "used-to-produce",]
table(results.edgesDF$INTERACTION_TYPE)
#>
#> consumption-controlled-by controls-production-of
#> 78 78
#> in-complex-with neighbor-of
#> 286 741
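The same filter can be written in one pass with `%in%`, which may be easier to maintain if the list of excluded types changes. This is a sketch on a toy edge table: the column names follow the SIF output, but the rows and the participant labels `P1` to `P3` are made up for illustration.

```r
# Toy edge table; column names follow the SIF output, values are illustrative.
edges <- data.frame(
  PARTICIPANT_A = c("ATP", "P1", "P2", "P3"),
  INTERACTION_TYPE = c("reacts-with", "in-complex-with", "chemical-affects", "neighbor-of"),
  PARTICIPANT_B = c("ADP", "P2", "P3", "P1"),
  stringsAsFactors = FALSE
)

# Interaction types removed in the text above.
metabolite_types <- c("chemical-affects", "reacts-with", "used-to-produce")

# Keep only the rows whose type is not in the excluded set.
kept <- edges[!(edges$INTERACTION_TYPE %in% metabolite_types), ]
table(kept$INTERACTION_TYPE)
```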
Ions can be removed as follows while retaining other metabolites.
results.edgesDF <- results.edgesDF[results.edgesDF[,1] != "2+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,3] != "2+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,1] != "3+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,3] != "3+",1:3]
Alternatively, we can screen the nodes for proteins and keep only the edges between them.
table(results.nodesDF$PARTICIPANT_TYPE)
#>
#> ProteinReference SmallMoleculeReference
#> 39 5
#extract protein nodes
results.nodesDF <- results.nodesDF[results.nodesDF$PARTICIPANT_TYPE == "ProteinReference",]
#match to edges
results.edgesDF <- results.edgesDF[results.edgesDF$PARTICIPANT_A %in% results.nodesDF$PARTICIPANT,]
results.edgesDF <- results.edgesDF[results.edgesDF$PARTICIPANT_B %in% results.nodesDF$PARTICIPANT,]
print(paste(c("nodes:", nrow(results.nodesDF))))
#> [1] "nodes:" "39"
print(paste(c("edges:", nrow(results.edgesDF))))
#> [1] "edges:" "1027"
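The node-based screen keeps an edge only when both of its endpoints are proteins. A minimal sketch on toy tables shows the logic in a single expression; the identifiers are taken from the node output above, while the edges themselves are invented for illustration.

```r
# Toy node table: two proteins and one small molecule.
nodes <- data.frame(
  PARTICIPANT = c("Q8NEB9", "Q06124", "ATP"),
  PARTICIPANT_TYPE = c("ProteinReference", "ProteinReference", "SmallMoleculeReference"),
  stringsAsFactors = FALSE
)

# Toy edges (illustrative only): one protein-protein, two involving ATP.
edges <- data.frame(
  PARTICIPANT_A = c("Q8NEB9", "ATP", "Q06124"),
  PARTICIPANT_B = c("Q06124", "Q8NEB9", "ATP"),
  stringsAsFactors = FALSE
)

proteins <- nodes$PARTICIPANT[nodes$PARTICIPANT_TYPE == "ProteinReference"]

# Keep an edge only if both endpoints are in the protein set.
protein_edges <- edges[edges$PARTICIPANT_A %in% proteins &
  edges$PARTICIPANT_B %in% proteins, ]
nrow(protein_edges)
```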
1.3.5 Creating an igraph object
Then we create an edge list from the SIF object. First we match the names of the edge participants to gene symbols.
#extract names
gene_names <- resultsSIF$nodes$PARTICIPANT_NAME
#replace with gene symbol (if defined)
gene_names[grep("hgnc symbol:", resultsSIF$nodes$RELATIONSHIP_XREF)] <- sapply(strsplit(grep("hgnc symbol:", resultsSIF$nodes$RELATIONSHIP_XREF, value = TRUE), ":"), function(x) x[2])
gene_names
#> [1] "PIK3C3"
#> [2] "PTPN11"
#> [3] "KL"
#> [4] "PIK3R4"
#> [5] "FGF19"
#> [6] "FGF5"
#> [7] "ADP"
#> [8] "FGF3"
#> [9] "FLT3"
#> [10] "FGFR2"
#> [11] "GRB2"
#> [12] "FRS2"
#> [13] "FGF8"
#> [14] "FGF7"
#> [15] "FGF1"
#> [16] "GAB1"
#> [17] "PIK3R1"
#> [18] "FGF2"
#> [19] "FGF4"
#> [20] "FGFR4"
#> [21] "FGF16"
#> [22] "PIK3R2"
#> [23] "FGF22"
#> [24] "FGF23"
#> [25] "FGF6"
#> [26] "FGF20"
#> [27] "FLT3LG"
#> [28] "IRS2"
#> [29] "TLR9"
#> [30] "KLB"
#> [31] "IRS1"
#> [32] "GAB2"
#> [33] "1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate"
#> [34] "FGFR1"
#> [35] "FGF17"
#> [36] "FGF18"
#> [37] "heparan sulfate"
#> [38] "1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate"
#> [39] "FGF9"
#> [40] "FGFR3"
#> [41] "FGF10"
#> [42] "PIK3CB"
#> [43] "ATP"
#> [44] "PIK3CA"
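The replacement step can be checked on a few values without the full node table. The sketch below assumes that each cross-reference holds at most one `hgnc symbol:` entry and that a node without a symbol has a missing (`NA`) cross-reference. Both are assumptions about the data rather than facts taken from the output above.

```r
# Cross-references in the style of the node table above.
# Assumption: NA marks a node without a gene symbol.
xref <- c("hgnc symbol:PIK3C3", "hgnc symbol:KL", NA)
participant_name <- c("PK3C3_HUMAN", "KLOT_HUMAN", "ATP")

# Start from the participant names, then overwrite where a symbol is defined.
gene_names <- participant_name
has_symbol <- grepl("hgnc symbol:", xref, fixed = TRUE)
gene_names[has_symbol] <- sub("^.*hgnc symbol:", "", xref[has_symbol])
gene_names
```

Nodes without a symbol, such as small molecules, keep their participant name, which matches the mixed list printed above.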
Match gene symbols to edge participants.
results.edgesDF$PARTICIPANT_A <- gene_names[match(results.edgesDF$PARTICIPANT_A, results.nodesDF$PARTICIPANT)]
results.edgesDF$PARTICIPANT_B <- gene_names[match(results.edgesDF$PARTICIPANT_B, results.nodesDF$PARTICIPANT)]
head(results.edgesDF[,c(1, 3)])
#> PARTICIPANT_A PARTICIPANT_B
#> 86 FGF16 heparan sulfate
#> 87 FGF16 heparan sulfate
#> 88 FGF16 FGFR4
#> 89 FGF16 FGFR4
#> 90 FGF16 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> 91 FGF16 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
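A lookup with `match()` relies on the vector of names being in the same row order as the table passed to `match()`. One way to make that alignment explicit is a named vector keyed by identifier. The sketch below uses toy identifiers and edges; `symbol_of` is a hypothetical name introduced for illustration.

```r
# Toy node identifiers and the display names aligned with them, row by row.
node_ids <- c("Q8NEB9", "Q06124", "ATP", "Q9UEF7")
gene_names <- c("PIK3C3", "PTPN11", "ATP", "KL")

# Name the vector by identifier so lookups do not depend on row position.
symbol_of <- setNames(gene_names, node_ids)

# Toy edge endpoints (illustrative only).
edge_a <- c("Q9UEF7", "Q8NEB9")
edge_b <- c("Q06124", "Q9UEF7")

renamed <- data.frame(
  PARTICIPANT_A = unname(symbol_of[edge_a]),
  PARTICIPANT_B = unname(symbol_of[edge_b]),
  stringsAsFactors = FALSE
)
renamed
```

If the node table is filtered after the name vector is built, row positions no longer line up, whereas a name-based lookup is unaffected.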
Create edge list with SIF edges.
library("igraph")
g <- graph_from_edgelist(as.matrix(results.edgesDF[,c(1, 3)]))
g
#> IGRAPH d711b49 DN-- 39 1027 --
#> + attr: name (v/c)
#> + edges from d711b49 (vertex names):
#> [1] FGF16->heparan sulfate
#> [2] FGF16->heparan sulfate
#> [3] FGF16->FGFR4
#> [4] FGF16->FGFR4
#> [5] FGF16->1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> [6] FGF16->1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> [7] FGF16->FGFR1
#> [8] FGF16->FGFR1
#> + ... omitted several edges
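The printed graph lists some edges more than once, possibly because the same pair of participants is linked by more than one interaction type. If repeated edges are not wanted, the edge matrix can be deduplicated before the graph is built. This is a minimal base R sketch on four rows taken from the output above.

```r
# Edge rows taken from the printed graph above, including the repeats.
edge_matrix <- rbind(
  c("FGF16", "FGFR4"),
  c("FGF16", "FGFR4"),
  c("FGF16", "FGFR1"),
  c("FGF16", "FGFR1")
)

# unique() on a matrix drops duplicated rows and keeps the first occurrence.
unique_edges <- unique(edge_matrix)
nrow(unique_edges)
```

The deduplicated matrix can be passed to `graph_from_edgelist()` in the same way as the full edge list. An alternative is to call igraph's `simplify()` on the finished graph, which also removes self-loops by default.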
library("graphsim")
plot_directed(g, arrow_clip = 0.25, col.arrow = "grey75", cex.arrow = 0.5, fill.node = "lightblue", cex.node = 1.25)
2 Session info
Here is the output of sessionInfo() on the system on which this document was compiled, running pandoc 2.1:
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods
#> [7] base
#>
#> other attached packages:
#> [1] plyr_1.8.6 paxtoolsr_1.22.0 XML_3.99-0.5
#> [4] rJava_0.9-13 graphsim_1.0.2 igraph_1.2.6.9001
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.5 pillar_1.4.7 compiler_4.0.2
#> [4] BiocManager_1.30.10 R.methodsS3_1.8.1 bitops_1.0-6
#> [7] prettydoc_0.4.0 R.utils_2.10.1 tools_4.0.2
#> [10] digest_0.6.27 tibble_3.0.4 lifecycle_0.2.0
#> [13] jsonlite_1.7.1 evaluate_0.14 lattice_0.20-41
#> [16] pkgconfig_2.0.3 rlang_0.4.8 Matrix_1.2-18
#> [19] curl_4.3 yaml_2.2.1 mvtnorm_1.1-1
#> [22] xfun_0.19 stringr_1.4.0 httr_1.4.2
#> [25] knitr_1.30 vctrs_0.3.5 hms_0.5.3
#> [28] gtools_3.8.2 caTools_1.18.0 grid_4.0.2
#> [31] R6_2.5.0 rmarkdown_2.5 readr_1.4.0
#> [34] magrittr_2.0.1 ellipsis_0.3.1 gplots_3.1.0
#> [37] htmltools_0.5.0 matrixcalc_1.0-3 KernSmooth_2.23-18
#> [40] stringi_1.5.3 rjson_0.2.20 crayon_1.3.4
#> [43] R.oo_1.24.0
3 References
Barabási, A. L., and Oltvai, Z. N. 2004. “Network Biology: Understanding the Cell’s Functional Organization.” Nat Rev Genet 5 (2): 101–13.
Croft, D., Mundo, A. F., Haw, R., Milacic, M., Weiser, J., Wu, G., Caudy, M., et al. 2014. “The Reactome pathway knowledgebase.” Journal Article. Nucleic Acids Res 42 (database issue): D472–D477. https://doi.org/10.1093/nar/gkt1102.
Csardi, G., and Nepusz, T. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. https://igraph.org/.
Fabregat, A., Sidiropoulos, K., Viteri, G., et al. 2017. “Reactome pathway analysis: a high-performance in-memory approach.” BMC Bioinformatics 18: 142. https://doi.org/10.1186/s12859-017-1559-2.
Ligtenberg, W. 2019. “reactome.db: A set of annotation maps for reactome.” R package version 1.68.0. https://bioconductor.org/packages/release/data/annotation/html/reactome.db.html.
Luna, A., Babur, Ö., Aksoy, A. B., Demir, E., Sander, C. 2016. “PaxtoolsR: Pathway Analysis in R Using Pathway Commons.” Bioinformatics 32 (8): 1262–4. https://doi.org/10.1093/bioinformatics/btv733.