Skip to contents

Transcriptomic Knowledge-Graph Omics Integration for Human Pathway Analysis

The tkoi package provides an integrative framework that combines transcriptomic data with a human-specific biological knowledge graph. This enables network-aware enrichment, functional interpretation, and gene prioritization via personalized PageRank and ontology-aware annotation.

Web Application

For non-power users, please use the web application of tKOI at tkoi.org.

Installation

To install the development version from GitHub:

# Install devtools if necessary
install.packages("devtools")

# Install tkoi
devtools::install_github("Broccolito/tkoi")

Core Features

  • Personalized PageRank propagation using transcriptomic weights
  • Permutation-based enrichment scoring for network nodes
  • Functional annotation using Gene Ontology, Disease Ontology, Cell Ontology, Reactome, and more
  • Modular and extensible S4 object design (tKOIList)
  • Export and visualization tools for enriched subnetworks
  • Seamless compatibility with clusterProfiler, enrichplot, and ggplot2 ## Getting Started

Example Workflow

This section walks you through a complete example using the tkoi package—from reading expression data, running the core network analysis, to visualizing enrichment results.

Step 1: Load Example Gene Expression Data

The tkoi package includes a small example CSV file containing simulated gene expression results. We’ll read it using data.table for performance.

library(tkoi)
library(data.table)

# Get the file path of the example expression data
file_path = system.file("extdata", "example_data.csv", package = "tkoi")

# Read the CSV file
expression_data = fread(file_path)
head(expression_data)

The file includes columns:

  • gene_name: Ensembl gene identifiers
  • logfc: log2 fold-change values
  • pvalue: associated p-values for differential expression

Step 2: Run tKOI Network Enrichment Analysis

tKOI integrates transcriptomic changes with a biological knowledge graph using a personalized PageRank algorithm. It also performs permutations to assess statistical enrichment.

tkoi_result = run_tkoi(
  expression_data = expression_data,
  subnetwork = tkoi::tkoi_net,    # Predefined igraph network included with the package
  pvalue_threshold = 0.05,        # p-value filter for differential expression
  logfc_threshold = 0.25,         # Minimum log fold change
  indirect_link_threshold = 3,    # Required indirect connectivity for downstream inclusion
  topology_similarity = 0.9,      # Similarity for selecting matched genes in permutations
  n_permutation = 100,            # Number of random permutations
  damping_factor = 0.85,          # PageRank damping factor
  maximum_iteration = 500         # Max iterations for convergence
)

The result is an S4 object (tKOIList) that stores PageRank scores, permutation statistics, and network annotations.

Step 3: Perform Gene Ontology (GO) Enrichment

You can extend the analysis by integrating GO term enrichment using clusterProfiler. This allows for side-by-side comparisons of ontology-based and graph-based enrichment.

tkoi_result = run_gene_enrichment(tkoi_result)

This adds a gene_enrichment_comparison slot containing GO enrichment tables and visual summaries.

Step 4: Visualize GO vs Graph Enrichment

Two visualizations are automatically generated:

Scatter Plot (All Terms)

tkoi_result@gene_enrichment_comparison$comparison_scatter1

Scatter Plot (Faceted by GO Namespace)

tkoi_result@gene_enrichment_comparison$comparison_scatter2

These plots compare tKOI network enrichment (beta) with gene ontology q-values.

Step 5: Visualize Differential Genes in the Network

The make_gene_exploration_plot() function highlights upregulated and downregulated genes in a scatter plot based on both experimental and network evidence.

plt1 = make_gene_exploration_plot(
  tkoi_list = tkoi_result,
  sig_color = "#F39B7FB2",
  non_sig_color = "gray"
)
plt1

Step 6: Export Gene-Level Prioritization Table

This returns a data frame containing logFC, p-values, PageRank scores, and FDRs for each gene.

gene_data = export_gene_exploration_data(tkoi_result)
head(gene_data)

Step 7: Visualize Top N Enriched Nodes

Use visualize_topn() to highlight the most significantly enriched genes, pathways, or biological concepts based on network-level statistics.

plt2 = visualize_topn(
  tkoi_list = tkoi_result,
  category = "Gene",       # Can also be "Pathway", "BiologicalProcess", etc.
  top_n = 25,
  high_color = "#FF5733",  # Strong enrichment
  low_color = "#154360"    # Moderate enrichment
)
plt2

Step 8: (Optional) Save the Analysis Result

Save your full analysis object for future use:

save(tkoi_result, file = "tkoi_result.rda")

S4 Object Structure

tKOIList is an S4 object returned by run_tkoi() with the following slots:

  • expression_data: Input transcriptomic measurements
  • pagerank_data: Personalized PageRank vectors
  • network_summary_statistics: Node-level enrichment results
  • gene_enrichment_comparison: GO enrichment overlay and plots

Annotation Resources

Built-in annotation tables support functional interpretation of the knowledge graph:

  • go_annotation, disease_annotation, celltype_annotation, anatomy_annotation
  • compound_annotation, protein_annotation, complex_annotation
  • reaction_annotation, pathway_annotation, pwgroup_annotation, etc.

Inspect them like so:

data(go_annotation)
head(go_annotation)

License

MIT + file LICENSE

Citation

Gu, W., Bellucci, G., Peetoom, B., McDonagh, M., & Baranzini, S. (in preparation). Integrating Large-Scale Knowledge Graphs to Enhance Transcriptomics Analysis.

Contact

Wanjun Gu ORCID: 0000-0002-7342-7000