
Run tKOI Analysis
run_tkoi.Rd
This function performs a tKOI (Transcriptomic Knowledge-graph-driven Omics Integration) analysis on a gene expression dataset. The analysis integrates transcriptomic measurements with a biological knowledge graph to identify and interpret biologically enriched subnetworks. The steps of the pipeline are described under Details.
Usage
run_tkoi(
expression_data,
subnetwork = tkoi::tkoi_net,
pvalue_threshold = 0.05,
logfc_threshold = 0.25,
indirect_link_threshold = 3,
topology_similarity = 0.9,
n_permutation = 100,
damping_factor = 0.85,
maximum_iteration = 500
)
Arguments
- expression_data
A data frame containing the gene expression data with columns for gene identifiers, log fold changes, and p-values.
- subnetwork
An igraph object representing the subnetwork to be analyzed. Default is tkoi::tkoi_net.
- pvalue_threshold
A numeric value specifying the threshold for filtering genes based on p-values. Default is 0.05.
- logfc_threshold
A numeric value specifying the threshold for filtering genes based on log fold changes. Default is 0.25.
- indirect_link_threshold
A numeric value specifying the minimum number of genes that must be 2-hop-connected to a propagated node. Default is 3.
- topology_similarity
A numeric value (between 0 and 1) defining the degree of similarity in topology for selecting substitute genes during permutation. Default is 0.9.
- n_permutation
An integer specifying the number of permutations for the network enrichment test. Default is 100.
- damping_factor
A numeric value for the damping factor used in the personalized PageRank calculation. Default is 0.85.
- maximum_iteration
An integer specifying the maximum number of iterations for the PageRank algorithm. Default is 500.
Value
An S4 object of class tKOIList, containing the following slots:
- expression_data: The input gene expression data.
- pagerank_data: A data frame of personalized PageRank scores for nodes in the subnetwork.
- network_summary_statistics: A list of data frames, one for each node type, containing network enrichment statistics and functional annotations.
Details
Gene filtering: Significant genes are selected based on user-defined thresholds for p-value and log fold change.
Network mapping: These genes are mapped onto a predefined biological subnetwork using unique gene identifiers.
Personalized PageRank: A PageRank-based propagation is performed using log fold change-weighted probabilities, quantifying the influence of each node in the network.
Permutation testing: To assess statistical significance, the same propagation is repeated multiple times using randomly substituted genes with similar topological properties.
Network enrichment scoring: Each node's empirical enrichment score (beta) and p-value are computed by comparing observed PageRank to the permutation null distribution.
Node annotation: Nodes are annotated with domain-specific metadata (e.g., GO terms, diseases, cell types, compounds), drawn from curated knowledge resources linked to the graph.
Prioritization: Annotated results are organized by node type, with adjusted FDR values and connectivity statistics (e.g., direct/indirect neighbors) used to prioritize biologically relevant findings.
The result is a comprehensive, annotated list of network nodes enriched in the context of the transcriptomic changes observed in the input dataset.
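The gene filtering step can be illustrated with a minimal base R sketch. The comparison rules are assumptions (strict inequalities, and the log fold change threshold applied to the absolute value); the package's internal filter may differ. The column names follow the Examples section.

```r
# Toy differential-expression table (values are made up for illustration).
expression_data <- data.frame(
  gene_name = c("gene1", "gene2", "gene3", "gene4"),
  logfc     = c(1.5, -0.8, 0.1, 2.3),
  pvalue    = c(0.01, 0.04, 0.001, 0.20)
)

pvalue_threshold <- 0.05
logfc_threshold  <- 0.25

# Assumption: a gene passes when p < threshold AND |logFC| > threshold.
# run_tkoi() may use different operators or a signed cutoff.
keep <- expression_data$pvalue < pvalue_threshold &
  abs(expression_data$logfc) > logfc_threshold

significant_genes <- expression_data[keep, ]
print(significant_genes)
```

In this toy table, gene3 is dropped for its small effect size and gene4 for its p-value, so only gene1 and gene2 would be carried forward to network mapping.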
The function takes gene expression data and integrates it with a predefined knowledge graph subnetwork. Genes are filtered based on their p-values and log fold changes, and personalized PageRank scores are computed to measure the importance of each node within the network. To assess statistical significance, a permutation test is performed in which the genes are replaced by substitutes with similar network topology. The results are annotated with network node information and functional descriptions.
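To make the roles of damping_factor and maximum_iteration concrete, here is a minimal base R sketch of personalized PageRank by power iteration on a toy graph. The graph, the seed values, the convergence tolerance, and the choice of a restart vector proportional to the absolute log fold change are all assumptions for illustration; this is not the package's internal implementation.

```r
# Toy undirected graph on four nodes, stored as an adjacency matrix (hypothetical).
nodes <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("A", "C"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Column-stochastic transition matrix: column j holds the probabilities
# of stepping from node j to each of its neighbours.
transition <- sweep(adj, 2, colSums(adj), "/")

# Assumption: restart probabilities proportional to |logFC| of the seed genes.
seed_logfc <- c(A = 1.5, B = 0, C = 0, D = -0.8)
restart <- abs(seed_logfc) / sum(abs(seed_logfc))

damping_factor    <- 0.85
maximum_iteration <- 500
tolerance         <- 1e-10  # hypothetical stopping tolerance

scores <- restart
for (iter in seq_len(maximum_iteration)) {
  # Follow an edge with probability damping_factor, otherwise jump back to a seed.
  updated <- damping_factor * as.vector(transition %*% scores) +
    (1 - damping_factor) * restart
  converged <- sum(abs(updated - scores)) < tolerance
  scores <- updated
  if (converged) break
}
names(scores) <- nodes
print(round(scores, 4))
```

A higher damping factor lets the signal spread further from the seed genes before restarting, while maximum_iteration only caps the loop if convergence is slow.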
The output includes:
Personalized PageRank scores for genes.
Permutation-based network enrichment statistics.
Annotated summary statistics for network nodes.
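The permutation-based statistics can be sketched as follows. This assumes a one-sided empirical p-value with a +1 correction and Benjamini-Hochberg adjustment via stats::p.adjust; the observed scores and the null matrix are simulated placeholders, and the package's exact definitions of its enrichment score and FDR may differ.

```r
set.seed(1)
n_permutation <- 100

# Hypothetical observed PageRank scores for three nodes.
observed <- c(node1 = 0.30, node2 = 0.12, node3 = 0.05)

# Hypothetical null scores: one row per permutation, one column per node.
null_scores <- matrix(
  runif(n_permutation * length(observed), min = 0, max = 0.2),
  nrow = n_permutation,
  dimnames = list(NULL, names(observed))
)

# Number of permutations whose score is at least as large as the observed one.
exceed <- colSums(sweep(null_scores, 2, observed, ">="))

# Assumption: one-sided empirical p-value with a +1 correction,
# so the p-value can never be exactly zero.
empirical_p <- (exceed + 1) / (n_permutation + 1)

# Benjamini-Hochberg adjustment across nodes.
fdr <- p.adjust(empirical_p, method = "BH")

print(data.frame(observed, empirical_p, fdr))
```

Under this definition, the smallest attainable p-value with the default n_permutation = 100 is 1/101, so more permutations are needed to resolve smaller p-values.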
Examples
if (FALSE) { # \dontrun{
  # Example gene expression data
  expression_data <- data.frame(
    gene_name = c("gene1", "gene2", "gene3"),
    logfc = c(1.5, -0.8, 2.3),
    pvalue = c(0.01, 0.05, 0.02)
  )

  # Run tKOI analysis
  result <- run_tkoi(
    expression_data = expression_data,
    subnetwork = tkoi::tkoi_net,
    pvalue_threshold = 0.05,
    logfc_threshold = 0.5,
    n_permutation = 100
  )

  # Access the results
  result@pagerank_data
  result@network_summary_statistics
} # }