Performs a tKOI (Transcriptomic Knowledge-graph-driven Omics Integration) analysis on a gene expression dataset. The analysis integrates transcriptomic measurements with a biological knowledge graph to identify and interpret biologically enriched subnetworks. The individual pipeline steps are listed under Details.

Usage

run_tkoi(
  expression_data,
  subnetwork = tkoi::tkoi_net,
  pvalue_threshold = 0.05,
  logfc_threshold = 0.25,
  indirect_link_threshold = 3,
  topology_similarity = 0.9,
  n_permutation = 100,
  damping_factor = 0.85,
  maximum_iteration = 500
)

Arguments

expression_data

A data frame containing the gene expression data with columns for gene identifiers, log fold changes, and p-values.

subnetwork

An igraph object representing the subnetwork to be analyzed. Default is tkoi::tkoi_net.

pvalue_threshold

A numeric value specifying the threshold for filtering genes based on p-values. Default is 0.05.

logfc_threshold

A numeric value specifying the threshold for filtering genes based on log fold changes. Default is 0.25.

indirect_link_threshold

A numeric value specifying the minimum number of genes that must be 2-hop-connected to the propagated node. Default is 3.

topology_similarity

A numeric value (between 0 and 1) defining the degree of similarity in topology for selecting substitute genes during permutation. Default is 0.9.

n_permutation

An integer specifying the number of permutations for the network enrichment test. Default is 100.

damping_factor

A numeric value for the damping factor used in the personalized PageRank calculation. Default is 0.85.

maximum_iteration

An integer specifying the maximum number of iterations for the PageRank algorithm. Default is 500.
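
As a rough illustration of how damping_factor and maximum_iteration enter a personalized PageRank computation, the following base-R sketch runs a power iteration on a toy four-node path graph. The helper personalized_pagerank, the toy graph, and the convergence tolerance tol are assumptions made for illustration. They are not taken from the tkoi package, which may compute the scores differently.

```r
# Hypothetical helper (not part of tkoi): personalized PageRank by power iteration.
# adj[i, j] = 1 means there is an edge from node i to node j.
personalized_pagerank <- function(adj, seed_weights, damping = 0.85,
                                  max_iter = 500, tol = 1e-10) {
  # Restart distribution: seed weights normalized to sum to 1
  restart <- seed_weights / sum(seed_weights)
  out_degree <- rowSums(adj)
  dangling <- out_degree == 0
  # Column-stochastic transition matrix (rows of adj divided by out-degree, then transposed)
  transition <- t(adj / ifelse(dangling, 1, out_degree))
  rank <- restart
  for (i in seq_len(max_iter)) {
    # Mass sitting on nodes without outgoing edges is sent back to the seeds
    dangling_mass <- sum(rank[dangling])
    new_rank <- damping * (as.vector(transition %*% rank) + dangling_mass * restart) +
      (1 - damping) * restart
    converged <- sum(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  names(rank) <- rownames(adj)
  rank
}

# Toy undirected path graph A - B - C - D, with all restart mass on node A
nodes <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj["A", "B"] <- adj["B", "A"] <- 1
adj["B", "C"] <- adj["C", "B"] <- 1
adj["C", "D"] <- adj["D", "C"] <- 1

scores <- personalized_pagerank(adj, seed_weights = c(A = 1, B = 0, C = 0, D = 0))
print(round(scores, 3))
```

A lower damping value returns more probability mass to the seed nodes at every step, so scores concentrate closer to the seeds. The iteration cap only matters when the scores have not converged before it is reached.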

Value

An S4 object of class tKOIList, containing the following slots:

  • expression_data: The input gene expression data.

  • pagerank_data: A data frame of personalized PageRank scores for nodes in the subnetwork.

  • network_summary_statistics: A list of data frames, one for each node type, containing network enrichment statistics and functional annotations.

Details

  1. Gene filtering: Significant genes are selected based on user-defined thresholds for p-value and log fold change.

  2. Network mapping: These genes are mapped onto a predefined biological subnetwork using unique gene identifiers.

  3. Personalized PageRank: A PageRank-based propagation is performed using log fold change-weighted probabilities, quantifying the influence of each node in the network.

  4. Permutation testing: To assess statistical significance, the same propagation is repeated multiple times using randomly substituted genes with similar topological properties.

  5. Network enrichment scoring: Each node's empirical enrichment score (beta) and p-value are computed by comparing observed PageRank to the permutation null distribution.

  6. Node annotation: Nodes are annotated with domain-specific metadata (e.g., GO terms, diseases, cell types, compounds), drawn from curated knowledge resources linked to the graph.

  7. Prioritization: Annotated results are organized by node type, with adjusted FDR values and connectivity statistics (e.g., direct/indirect neighbors) used to prioritize biologically relevant findings.

The result is a comprehensive, annotated list of network nodes enriched in the context of the transcriptomic changes observed in the input dataset.
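
The filtering and seeding steps can be sketched in base R as follows. The column names follow the example data on this page. The exact comparison rules (p-value strictly below the threshold, absolute log fold change at or above the threshold) and the normalization of absolute log fold changes into seed probabilities are assumptions for illustration, not a description of the package internals.

```r
# Toy expression table using the column names from the Examples section
expression_data <- data.frame(
  gene_name = c("gene1", "gene2", "gene3", "gene4"),
  logfc = c(1.5, -0.8, 2.3, 0.1),
  pvalue = c(0.01, 0.20, 0.02, 0.001),
  stringsAsFactors = FALSE
)
pvalue_threshold <- 0.05
logfc_threshold <- 0.25

# Assumption: keep genes with p-value below the threshold and
# absolute log fold change at or above the threshold
keep <- expression_data$pvalue < pvalue_threshold &
  abs(expression_data$logfc) >= logfc_threshold
significant <- expression_data[keep, ]

# Assumption: seed probabilities proportional to |logFC|, summing to 1
seed_prob <- abs(significant$logfc) / sum(abs(significant$logfc))
names(seed_prob) <- significant$gene_name

print(significant)
print(seed_prob)
```

Here gene2 is dropped for its p-value and gene4 for its small log fold change, leaving gene1 and gene3 as seeds.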

The function takes gene expression data and integrates it with a predefined knowledge graph subnetwork. Genes are filtered based on their p-values and log fold changes, and personalized PageRank scores are computed to measure the importance of each node within the network. To assess statistical significance, a permutation test is performed in which the selected genes are replaced by substitutes with similar network topology. The results are annotated with network node information and functional descriptions.
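
One common way to turn a permutation null distribution into per-node p-values is the add-one empirical estimate followed by a Benjamini-Hochberg adjustment. The sketch below uses simulated null scores and hypothetical node names. It illustrates the general idea only and is not necessarily the formula used by run_tkoi (the enrichment score beta, in particular, is not reproduced here).

```r
set.seed(1)
n_permutation <- 100

# Hypothetical observed PageRank scores for two nodes
observed <- c(nodeA = 0.30, nodeB = 0.05)

# Simulated null scores: one row per permutation, one column per node
null_scores <- matrix(
  runif(n_permutation * length(observed), min = 0, max = 0.1),
  nrow = n_permutation,
  dimnames = list(NULL, names(observed))
)

# Count permutations whose score is at least as large as the observed score
exceed <- colSums(sweep(null_scores, 2, observed, FUN = ">="))

# Add-one empirical p-value, so the estimate is never exactly zero
empirical_p <- (exceed + 1) / (n_permutation + 1)

# Benjamini-Hochberg adjustment across nodes
fdr <- p.adjust(empirical_p, method = "BH")

print(empirical_p)
print(fdr)
```

Under this convention the smallest attainable p-value with 100 permutations is 1/101, so a larger n_permutation is needed to resolve smaller p-values.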

The output includes:

  • Personalized PageRank scores for nodes in the subnetwork.

  • Permutation-based network enrichment statistics.

  • Annotated summary statistics for network nodes.

See also

page_rank, dplyr, compute_stat

Examples

if (FALSE) { # \dontrun{
# Example gene expression data
expression_data <- data.frame(
  gene_name = c("gene1", "gene2", "gene3"),
  logfc = c(1.5, -0.8, 2.3),
  pvalue = c(0.01, 0.05, 0.02)
)

# Run tKOI analysis
result <- run_tkoi(
  expression_data = expression_data,
  subnetwork = tkoi::tkoi_net,
  pvalue_threshold = 0.05,
  logfc_threshold = 0.5,
  n_permutation = 100
)

# Access the results
result@pagerank_data
result@network_summary_statistics
} # }