Summary

Guide on how to create igraph objects from pathways imported from Pathway Commons.

Package

graphsim 1.0.2

1 Importing Pathways

1.1 Motivations

Here we demonstrate how to create igraph objects (Csardi and Nepusz 2006) for pathways compatible with graphsim. We provide example objects with the package, and the examples here give additional details on how these are imported into R. This uses the paxtoolsr package (Luna et al. 2016) from Bioconductor.

Graph objects have edge properties (Barabási and Oltvai 2004). Here we show how to define the “state” parameter, which can be used to differentiate inhibiting relationships from activating ones. We use different arrowheads to show these, as per convention in molecular biology.
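As a minimal sketch of what an edge property looks like in practice, the example below builds a hypothetical three-gene graph with igraph and stores a "state" value on each edge. The gene names and the encoding (1 for activating, -1 for inhibiting) are assumptions made for illustration.

```r
library("igraph")

# Hypothetical toy pathway: A activates B, B activates C, C inhibits A.
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "A"))
toy_graph <- graph_from_edgelist(edges, directed = TRUE)

# Store the relationship type as an edge attribute named "state".
# Assumed encoding: 1 for activating, -1 for inhibiting.
E(toy_graph)$state <- c(1, 1, -1)

# Inspect the edges together with their state.
edge_table <- data.frame(as_edgelist(toy_graph), state = E(toy_graph)$state)
edge_table
```

Keeping the state on the edges themselves means it travels with the graph object when edges are filtered or the graph is saved.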

1.2 Getting started

The Bioconductor package to import data can be installed as follows.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("paxtoolsr")

To perform these steps, the following packages must be loaded.

library("igraph")
library("graphsim")
library("paxtoolsr")

1.3 Importing data

We will demonstrate downloading a Reactome (https://reactome.org/) pathway (Croft et al. 2014) from Pathway Commons (http://www.pathwaycommons.org/). Reactome pathways are also available in the reactome.db package (https://bioconductor.org/packages/release/data/annotation/html/reactome.db.html; Fabregat et al. 2017; Ligtenberg 2019), but this only contains the gene set information. We use Pathway Commons as it contains directed graph structure information for the edges.

We download the Pathway Commons release (here the Reactome BioPAX file) for conversion into the Extended Simple Interaction Format (SIF) network format. We use a legacy version (7) to reproduce the results.

# Importing data
results <- downloadPc2(version = 7, selectedFileName = "Pathway%20Commons.7.Reactome.BIOPAX.owl.gz")

However, we recommend using the latest version, which is:

print(latest_version)
#> [1] 12

The new version can be downloaded with downloadPc2 in the same way by changing the version argument (the file name will differ between releases).

The results will be cached here:

Sys.getenv("PAXTOOLSR_CACHE")
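To check what has been cached so far, something like the following base R sketch may help. It only assumes that the PAXTOOLSR_CACHE environment variable, when set, points to a directory; the variable names are illustrative.

```r
# Read the cache location; an empty string means the variable is not set.
cache_dir <- Sys.getenv("PAXTOOLSR_CACHE", unset = "")

if (nzchar(cache_dir) && dir.exists(cache_dir)) {
  # List whatever files have been cached so far.
  cached_files <- list.files(cache_dir)
} else {
  # Nothing to list when the variable is unset or the directory is missing.
  cached_files <- character(0)
}
cached_files
```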

1.3.1 Searching results

We then query the results to find the pathways of interest. See the paxtoolsr vignette for details.

## Search Pathway Commons for 'PI3K'-related pathways
searchResults <- searchPc(q = "PI3K", type = "pathway", verbose = TRUE)
#> URL:  http://www.pathwaycommons.org/pc2/search.xml?q=PI3K&page=0&type=pathway
pathways <- xpathSApply(searchResults, "/searchResponse/searchHit/name", xmlValue)
length(pathways)
#> [1] 100
head(pathways)
#> [1] "PI3K events in ERBB2 signaling"                           
#> [2] "Class IB PI3K non-lipid kinase events"                    
#> [3] "Erythropoietin activates Phosphoinositide-3-kinase (PI3K)"
#> [4] "Activated NTRK3 signals through PI3K"                     
#> [5] "Activated NTRK2 signals through PI3K"                     
#> [6] "Trk receptor signaling mediated by PI3K and PLC-gamma"
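The XPath extraction used above can be tried offline on a small document. The sketch below uses a hypothetical, heavily simplified stand-in for a search response (the pathway names and example.org URIs are invented) and assumes the XML package is installed.

```r
library("XML")

# Hypothetical, heavily simplified stand-in for a search response.
toy_xml <- '<searchResponse>
  <searchHit><name>Pathway one</name><uri>http://example.org/p1</uri></searchHit>
  <searchHit><name>Pathway two</name><uri>http://example.org/p2</uri></searchHit>
</searchResponse>'

toy_doc <- xmlParse(toy_xml, asText = TRUE)

# Same XPath pattern as above: one value is returned per <searchHit>.
toy_names <- xpathSApply(toy_doc, "/searchResponse/searchHit/name", xmlValue)
toy_uris <- xpathSApply(toy_doc, "/searchResponse/searchHit/uri", xmlValue)
toy_names
```

Because names and URIs are extracted with the same path prefix, the two vectors are in the same order and can be paired by position.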

1.3.2 Downloading a Pathway

We can create a local OWL file in this manner.

## Search Pathway Commons for the 'PI3K Cascade' pathway
searchResults <- searchPc(q = "PI3K Cascade", type = "pathway", verbose = TRUE)
#> URL:  http://www.pathwaycommons.org/pc2/search.xml?q=PI3K%20Cascade&page=0&type=pathway
pathways <- xpathSApply(searchResults, "/searchResponse/searchHit/name", xmlValue)
length(pathways)
#> [1] 100
head(pathways)
#> [1] "PI3K Cascade"                       
#> [2] "IGF1R signaling cascade"            
#> [3] "IRS-mediated signalling"            
#> [4] "LPA receptor mediated events"       
#> [5] "Insulin receptor signalling cascade"
#> [6] "EPHA2 forward signaling"

We then select the first pathway.

pathway <- pathways[1]
pathway
#> [1] "PI3K Cascade"

We save it to a local OWL file. First we extract the required columns.

library("plyr")
#> 
#> Attaching package: 'plyr'
#> The following object is masked from 'package:paxtoolsr':
#> 
#>     summarize

#convert to data frame
searchResultsDf <- ldply(xmlToList(searchResults), data.frame)
dim(searchResultsDf)
#> [1] 105  22
# Simplified results
simplifiedSearchResultsDf <- searchResultsDf[, c("name", "uri", "biopaxClass")]
head(simplifiedSearchResultsDf)
#>                                  name
#> 1                        PI3K Cascade
#> 2             IGF1R signaling cascade
#> 3             IRS-mediated signalling
#> 4        LPA receptor mediated events
#> 5 Insulin receptor signalling cascade
#> 6             EPHA2 forward signaling
#>                                                                       uri
#> 1                            https://identifiers.org/reactome/R-HSA-109704
#> 2                           https://identifiers.org/reactome/R-HSA-2428924
#> 3                            https://identifiers.org/reactome/R-HSA-112399
#> 4 http://pathwaycommons.org/pc12/Pathway_ebbd43e6d7ede5ba46b0b03c4566c06f
#> 5                             https://identifiers.org/reactome/R-HSA-74751
#> 6 http://pathwaycommons.org/pc12/Pathway_41112300a6e2adfd271ada175fc3f63d
#>   biopaxClass
#> 1     Pathway
#> 2     Pathway
#> 3     Pathway
#> 4     Pathway
#> 5     Pathway
#> 6     Pathway

Then we write to a local file.

## Use an XPath expression to extract the results of interest. In this case, the
## URIs (IDs) for the pathways from the results
tmpSearchResults <- xpathSApply(searchResults, "/searchResponse/searchHit/uri", xmlValue)

## Name of the local file to save content into
biopaxFile <- "bioxpax-reactome-pi3k-cascade.owl"

## Extract a URI for a pathway in the search results and save into a file
idx <- which(grepl("reactome", simplifiedSearchResultsDf$uri) & grepl("PI3K Cascade", 
    simplifiedSearchResultsDf$name, ignore.case = TRUE))
uri <- simplifiedSearchResultsDf$uri[idx]
saveXML(getPc(uri, format = "BIOPAX"), file = biopaxFile)
#> [1] "bioxpax-reactome-pi3k-cascade.owl"
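The selection step above relies on the name and URI filters together matching a single row. A minimal sketch of how this could be checked is shown below, using a hypothetical results table (the second URI is invented) and an exact, case-insensitive name comparison rather than a pattern match.

```r
# Hypothetical search results with name and uri columns as above.
toy_hits <- data.frame(
  name = c("PI3K Cascade", "PI3K cascade", "IRS-mediated signalling"),
  uri = c("https://identifiers.org/reactome/R-HSA-109704",
          "http://example.org/other/Pathway_1",
          "https://identifiers.org/reactome/R-HSA-112399"),
  stringsAsFactors = FALSE
)

# Keep hits whose URI mentions Reactome and whose name matches exactly
# (ignoring case).
is_reactome <- grepl("reactome", toy_hits$uri, fixed = TRUE)
is_named <- tolower(toy_hits$name) == tolower("PI3K Cascade")
idx <- which(is_reactome & is_named)

# Fail early unless exactly one pathway was selected.
stopifnot(length(idx) == 1)
toy_uri <- toy_hits$uri[idx]
toy_uri
```

Checking the length of idx before downloading avoids passing several URIs, or none, to the next step.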

1.3.3 Create SIF object

We convert to the Extended Simple Interaction Format (SIF) network format. This gives a table of nodes for genes and a table of edges for the relationships between them.

resultsSIF <- toSifnx(inputFile = biopaxFile)
print(paste(c("nodes:", nrow(resultsSIF$nodes))))
#> [1] "nodes:" "44"
print(paste(c("edges:", nrow(resultsSIF$edges))))
#> [1] "edges:" "1219"
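As a rough illustration of this structure, the sketch below builds a toy list with a nodes table and an edges table, checks that every edge endpoint is a known node, and formats the counts as a single string. The identifiers P1, P2 and M1 are hypothetical.

```r
# Toy stand-in for a SIF-like result: one table of nodes, one of edges.
toy_sif <- list(
  nodes = data.frame(
    PARTICIPANT = c("P1", "P2", "M1"),
    PARTICIPANT_TYPE = c("ProteinReference", "ProteinReference",
                         "SmallMoleculeReference"),
    stringsAsFactors = FALSE
  ),
  edges = data.frame(
    PARTICIPANT_A = c("P1", "M1"),
    INTERACTION_TYPE = c("in-complex-with", "chemical-affects"),
    PARTICIPANT_B = c("P2", "P1"),
    stringsAsFactors = FALSE
  )
)

# Every edge endpoint should appear in the node table.
endpoints <- c(toy_sif$edges$PARTICIPANT_A, toy_sif$edges$PARTICIPANT_B)
all_known <- all(endpoints %in% toy_sif$nodes$PARTICIPANT)

# Format both counts as one string.
summary_line <- sprintf("nodes: %d, edges: %d",
                        nrow(toy_sif$nodes), nrow(toy_sif$edges))
summary_line
```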

With node properties:

results.nodesDF <- as.data.frame(resultsSIF$nodes) 
head(results.nodesDF)
#>   PARTICIPANT PARTICIPANT_TYPE PARTICIPANT_NAME
#> 1      Q8NEB9 ProteinReference      PK3C3_HUMAN
#> 2      Q06124 ProteinReference      PTN11_HUMAN
#> 3      Q9UEF7 ProteinReference       KLOT_HUMAN
#> 4      Q99570 ProteinReference      PI3R4_HUMAN
#> 5      O95750 ProteinReference      FGF19_HUMAN
#> 6      P12034 ProteinReference       FGF5_HUMAN
#>               UNIFICATION_XREF  RELATIONSHIP_XREF
#> 1 uniprot knowledgebase:Q8NEB9 hgnc symbol:PIK3C3
#> 2 uniprot knowledgebase:Q06124 hgnc symbol:PTPN11
#> 3 uniprot knowledgebase:Q9UEF7     hgnc symbol:KL
#> 4 uniprot knowledgebase:Q99570 hgnc symbol:PIK3R4
#> 5 uniprot knowledgebase:O95750  hgnc symbol:FGF19
#> 6 uniprot knowledgebase:P12034   hgnc symbol:FGF5

With edge properties:

results.edgesDF <- as.data.frame(resultsSIF$edges) 
head(results.edgesDF)
#>                                     PARTICIPANT_A
#> 1 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 2 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 3 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 4 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 5 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 6 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#>            INTERACTION_TYPE
#> 1           used-to-produce
#> 2           used-to-produce
#> 3               reacts-with
#> 4 consumption-controlled-by
#> 5 consumption-controlled-by
#> 6 consumption-controlled-by
#>                                        PARTICIPANT_B
#> 1 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> 2                                                ADP
#> 3                                                ATP
#> 4                                             O00459
#> 5                                             O15520
#> 6                                             O43320
#>   INTERACTION_DATA_SOURCE
#> 1                Reactome
#> 2                Reactome
#> 3                Reactome
#> 4                Reactome
#> 5                Reactome
#> 6                Reactome
#>                                 INTERACTION_PUBMED_ID PATHWAY_NAMES
#> 1 12660731;1381348;16847462;19805105;21827948;7543144  PI3K Cascade
#> 2 12660731;1381348;16847462;19805105;21827948;7543144  PI3K Cascade
#> 3 12660731;1381348;16847462;19805105;21827948;7543144  PI3K Cascade
#> 4 12660731;1381348;16847462;19805105;21827948;7543144  PI3K Cascade
#> 5 12660731;1381348;16847462;19805105;21827948;7543144  PI3K Cascade
#> 6 12660731;1381348;16847462;19805105;21827948;7543144  PI3K Cascade

We see that all genes and edges belong to the same Reactome pathway.

table(results.edgesDF$PATHWAY_NAMES)
#> 
#> PI3K Cascade 
#>          902
table(results.edgesDF$INTERACTION_DATA_SOURCE)
#> 
#> Reactome 
#>     1218

Edges are defined in several ways (some directional).

table(results.edgesDF$INTERACTION_TYPE)
#> 
#>          chemical-affects consumption-controlled-by 
#>                        30                        78 
#>    controls-production-of           in-complex-with 
#>                        78                       286 
#>               neighbor-of               reacts-with 
#>                       741                         1 
#>           used-to-produce 
#>                         4
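To make the directional distinction explicit, the interaction types can be flagged as directed or undirected. The sketch below treats in-complex-with, neighbor-of and reacts-with as undirected; this grouping is an assumption based on the Pathway Commons SIF description and should be checked against the release in use.

```r
# Interaction types reported in the table above.
toy_types <- c("chemical-affects", "consumption-controlled-by",
               "controls-production-of", "in-complex-with",
               "neighbor-of", "reacts-with", "used-to-produce")

# Assumption: these types carry no direction; all others are directed.
undirected_types <- c("in-complex-with", "neighbor-of", "reacts-with")

is_directed <- !(toy_types %in% undirected_types)
type_table <- data.frame(INTERACTION_TYPE = toy_types, directed = is_directed)
type_table
```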

1.3.4 Filtering genes and metabolites

We can then optionally filter out edges that are not related to genes or proteins.

One option is to remove the edge types that involve metabolites.

results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "chemical-affects",]
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "reacts-with",]
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "used-to-produce",]
table(results.edgesDF$INTERACTION_TYPE)
#> 
#> consumption-controlled-by    controls-production-of 
#>                        78                        78 
#>           in-complex-with               neighbor-of 
#>                       286                       741
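The same kind of filter can be written as a single step with %in%. The sketch below applies it to a hypothetical edge table; the identifiers are invented and only the interaction type names are taken from the output above.

```r
# Hypothetical edge table in the three-column SIF layout.
toy_edges <- data.frame(
  PARTICIPANT_A = c("P1", "M1", "P2", "M1"),
  INTERACTION_TYPE = c("in-complex-with", "chemical-affects",
                       "neighbor-of", "used-to-produce"),
  PARTICIPANT_B = c("P2", "P1", "P3", "M2"),
  stringsAsFactors = FALSE
)

# Interaction types that involve metabolites, removed in one pass.
metabolite_types <- c("chemical-affects", "reacts-with", "used-to-produce")
keep <- !(toy_edges$INTERACTION_TYPE %in% metabolite_types)
toy_filtered <- toy_edges[keep, ]

table(toy_filtered$INTERACTION_TYPE)
```

Listing the excluded types in one vector keeps the filter easy to extend if other interaction types need to be dropped.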

Ions can be removed as follows while retaining other metabolites.

results.edgesDF <- results.edgesDF[results.edgesDF[,1] != "2+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,3] != "2+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,1] != "3+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,3] != "3+",1:3]

Alternatively, we can screen the nodes for proteins and keep only the edges between them.

table(results.nodesDF$PARTICIPANT_TYPE)
#> 
#>       ProteinReference SmallMoleculeReference 
#>                     39                      5
#extract protein nodes
results.nodesDF <- results.nodesDF[results.nodesDF$PARTICIPANT_TYPE == "ProteinReference",]
#match to edges
results.edgesDF <- results.edgesDF[results.edgesDF$PARTICIPANT_A %in% results.nodesDF$PARTICIPANT,]
results.edgesDF <- results.edgesDF[results.edgesDF$PARTICIPANT_B %in% results.nodesDF$PARTICIPANT,]
print(paste(c("nodes:", nrow(results.nodesDF))))
#> [1] "nodes:" "39"
print(paste(c("edges:", nrow(results.edgesDF))))
#> [1] "edges:" "1027"
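The node-based screen amounts to keeping edges whose two endpoints are both proteins. A minimal sketch on hypothetical tables (identifiers P1, P2 and M1 are invented) is:

```r
# Hypothetical node and edge tables.
toy_nodes <- data.frame(
  PARTICIPANT = c("P1", "P2", "M1"),
  PARTICIPANT_TYPE = c("ProteinReference", "ProteinReference",
                       "SmallMoleculeReference"),
  stringsAsFactors = FALSE
)
toy_edges <- data.frame(
  PARTICIPANT_A = c("P1", "M1", "P2"),
  PARTICIPANT_B = c("P2", "P1", "M1"),
  stringsAsFactors = FALSE
)

# Identifiers of the protein nodes.
protein_ids <- toy_nodes$PARTICIPANT[toy_nodes$PARTICIPANT_TYPE == "ProteinReference"]

# Keep an edge only if both of its endpoints are proteins.
both_proteins <- toy_edges$PARTICIPANT_A %in% protein_ids &
  toy_edges$PARTICIPANT_B %in% protein_ids
toy_protein_edges <- toy_edges[both_proteins, , drop = FALSE]
toy_protein_edges
```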

1.3.5 Creating an igraph object

Then we create an edge list from the SIF object. First we match names between edges and participants to gene symbols.

#extract names
gene_names <- resultsSIF$nodes$PARTICIPANT_NAME
#replace with gene symbol (if defined)
gene_names[grep("hgnc symbol:", resultsSIF$nodes$RELATIONSHIP_XREF)] <- sapply(strsplit(grep("hgnc symbol:", resultsSIF$nodes$RELATIONSHIP_XREF, value = TRUE), ":"), function(x) x[2])
gene_names
#>  [1] "PIK3C3"                                            
#>  [2] "PTPN11"                                            
#>  [3] "KL"                                                
#>  [4] "PIK3R4"                                            
#>  [5] "FGF19"                                             
#>  [6] "FGF5"                                              
#>  [7] "ADP"                                               
#>  [8] "FGF3"                                              
#>  [9] "FLT3"                                              
#> [10] "FGFR2"                                             
#> [11] "GRB2"                                              
#> [12] "FRS2"                                              
#> [13] "FGF8"                                              
#> [14] "FGF7"                                              
#> [15] "FGF1"                                              
#> [16] "GAB1"                                              
#> [17] "PIK3R1"                                            
#> [18] "FGF2"                                              
#> [19] "FGF4"                                              
#> [20] "FGFR4"                                             
#> [21] "FGF16"                                             
#> [22] "PIK3R2"                                            
#> [23] "FGF22"                                             
#> [24] "FGF23"                                             
#> [25] "FGF6"                                              
#> [26] "FGF20"                                             
#> [27] "FLT3LG"                                            
#> [28] "IRS2"                                              
#> [29] "TLR9"                                              
#> [30] "KLB"                                               
#> [31] "IRS1"                                              
#> [32] "GAB2"                                              
#> [33] "1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate"
#> [34] "FGFR1"                                             
#> [35] "FGF17"                                             
#> [36] "FGF18"                                             
#> [37] "heparan sulfate"                                   
#> [38] "1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate"   
#> [39] "FGF9"                                              
#> [40] "FGFR3"                                             
#> [41] "FGF10"                                             
#> [42] "PIK3CB"                                            
#> [43] "ATP"                                               
#> [44] "PIK3CA"
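The symbol extraction can be illustrated on a few values. In the sketch below the first and third entries are taken from the node table shown earlier, while the missing cross-reference for ADP is an assumption made to show the fallback to the participant name.

```r
# Participant names and cross-references; NA marks a node without a symbol.
toy_names <- c("PK3C3_HUMAN", "ADP", "FGF5_HUMAN")
toy_xref <- c("hgnc symbol:PIK3C3", NA, "hgnc symbol:FGF5")

# Entries that carry an HGNC symbol (grepl returns FALSE for NA).
has_symbol <- grepl("^hgnc symbol:", toy_xref)

# Start from the participant names and overwrite where a symbol exists.
toy_symbols <- toy_names
toy_symbols[has_symbol] <- sub("^hgnc symbol:", "", toy_xref[has_symbol])
toy_symbols
```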

Match gene symbols to edge participants. The lookup uses the full node table so that positions line up with gene_names.

results.edgesDF$PARTICIPANT_A <- gene_names[match(results.edgesDF$PARTICIPANT_A, resultsSIF$nodes$PARTICIPANT)]
results.edgesDF$PARTICIPANT_B <- gene_names[match(results.edgesDF$PARTICIPANT_B, resultsSIF$nodes$PARTICIPANT)]
head(results.edgesDF[,c(1, 3)])
#>    PARTICIPANT_A PARTICIPANT_B
#> 86        PIK3R2         FGF10
#> 87        PIK3R2         FGF10
#> 88        PIK3R2         FGF16
#> 89        PIK3R2         FGF16
#> 90        PIK3R2         FGF17
#> 91        PIK3R2         FGF17

Create edge list with SIF edges.

library("igraph")
g <- graph_from_edgelist(as.matrix(results.edgesDF[,c(1, 3)]))
g
#> IGRAPH d711b49 DN-- 39 1027 -- 
#> + attr: name (v/c)
#> + edges from d711b49 (vertex names):
#> [1] PIK3R2->FGF10
#> [2] PIK3R2->FGF10
#> [3] PIK3R2->FGF16
#> [4] PIK3R2->FGF16
#> [5] PIK3R2->FGF17
#> [6] PIK3R2->FGF17
#> [7] PIK3R2->FGF18
#> [8] PIK3R2->FGF18
#> + ... omitted several edges
library("graphsim")
plot_directed(g, arrow_clip = 0.25, col.arrow = "grey75", cex.arrow = 0.5, fill.node = "lightblue", cex.node = 1.25)
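The relabelling and graph construction steps can be sketched end to end on toy data. The identifiers and symbols below are hypothetical. The sketch uses a named lookup vector so that each identifier is tied to its own symbol, and optionally collapses repeated edges with igraph's simplify, which is not part of the workflow above.

```r
library("igraph")

# Hypothetical identifiers and symbols in one named lookup vector.
symbol_lookup <- c(ID1 = "GENEA", ID2 = "GENEB", ID3 = "GENEC")

# Toy edge table; the first edge appears twice, as happens when two
# interaction types link the same pair.
toy_edges <- data.frame(
  PARTICIPANT_A = c("ID1", "ID1", "ID2"),
  PARTICIPANT_B = c("ID2", "ID2", "ID3"),
  stringsAsFactors = FALSE
)

# Relabel by name rather than by position.
edge_matrix <- cbind(
  unname(symbol_lookup[toy_edges$PARTICIPANT_A]),
  unname(symbol_lookup[toy_edges$PARTICIPANT_B])
)
# Any identifier missing from the lookup would show up as NA here.
stopifnot(!anyNA(edge_matrix))

toy_graph <- graph_from_edgelist(edge_matrix, directed = TRUE)

# Optionally collapse duplicated edges while keeping any self-loops.
toy_simple <- igraph::simplify(toy_graph, remove.multiple = TRUE,
                               remove.loops = FALSE)
edge_counts <- c(before = ecount(toy_graph), after = ecount(toy_simple))
edge_counts
```

Looking symbols up by name avoids depending on the row order of any particular node table, which matters once the nodes have been filtered.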


2 Session info

Here is the output of sessionInfo() on the system on which this document was compiled running pandoc 2.1:

#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
#> 
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods  
#> [7] base     
#> 
#> other attached packages:
#> [1] plyr_1.8.6        paxtoolsr_1.22.0  XML_3.99-0.5     
#> [4] rJava_0.9-13      graphsim_1.0.2    igraph_1.2.6.9001
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5          pillar_1.4.7        compiler_4.0.2     
#>  [4] BiocManager_1.30.10 R.methodsS3_1.8.1   bitops_1.0-6       
#>  [7] prettydoc_0.4.0     R.utils_2.10.1      tools_4.0.2        
#> [10] digest_0.6.27       tibble_3.0.4        lifecycle_0.2.0    
#> [13] jsonlite_1.7.1      evaluate_0.14       lattice_0.20-41    
#> [16] pkgconfig_2.0.3     rlang_0.4.8         Matrix_1.2-18      
#> [19] curl_4.3            yaml_2.2.1          mvtnorm_1.1-1      
#> [22] xfun_0.19           stringr_1.4.0       httr_1.4.2         
#> [25] knitr_1.30          vctrs_0.3.5         hms_0.5.3          
#> [28] gtools_3.8.2        caTools_1.18.0      grid_4.0.2         
#> [31] R6_2.5.0            rmarkdown_2.5       readr_1.4.0        
#> [34] magrittr_2.0.1      ellipsis_0.3.1      gplots_3.1.0       
#> [37] htmltools_0.5.0     matrixcalc_1.0-3    KernSmooth_2.23-18 
#> [40] stringi_1.5.3       rjson_0.2.20        crayon_1.3.4       
#> [43] R.oo_1.24.0

3 References

Barabási, A. L., and Oltvai, Z. N. 2004. “Network Biology: Understanding the Cell’s Functional Organization.” Nat Rev Genet 5 (2): 101–13.

Croft, D., Mundo, A. F., Haw, R., Milacic, M., Weiser, J., Wu, G., Caudy, M., et al. 2014. “The Reactome pathway knowledgebase.” Nucleic Acids Res 42 (Database issue): D472–D477. https://doi.org/10.1093/nar/gkt1102.

Csardi, G., and Nepusz, T. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. https://igraph.org/.

Fabregat, A., Sidiropoulos, K., Viteri, G., et al. 2017. “Reactome pathway analysis: a high-performance in-memory approach.” BMC Bioinformatics 18: 142. https://doi.org/10.1186/s12859-017-1559-2.

Ligtenberg, W. 2019. “reactome.db: A set of annotation maps for reactome.” R package version 1.68.0. https://bioconductor.org/packages/release/data/annotation/html/reactome.db.html.

Luna, A., Babur, Ö., Aksoy, A. B., Demir, E., and Sander, C. 2016. “PaxtoolsR: Pathway Analysis in R Using Pathway Commons.” Bioinformatics 32 (8): 1262–4. https://doi.org/10.1093/bioinformatics/btv733.