
Run pKOI Network Enrichment Analysis
This function performs network enrichment analysis using a personalized PageRank framework over a knowledge-graph-based protein interaction network. It takes differential proteomics data as input and returns node-wise enrichment statistics for several biological node types (e.g., anatomy, disease, GO terms), annotated according to their proximity and relevance to the significant protein hits in the network.
Usage
run_pkoi(
  proteomics_data,
  pvalue_threshold = 0.01,
  logfc_threshold = 0,
  topology_by = "degree",
  topology_similarity = 0.9,
  n_permutation = 10,
  damping_factor = 0.85,
  maximum_iteration = 500,
  subnetwork = pkoi::pkoi_net,
  include_subnetwork = FALSE
)
Arguments
- proteomics_data
A `data.frame` containing differential proteomics results. Must include columns:
- uniprot_id
Unique UniProt identifier for each protein.
- logfc
Log fold-change values.
- p_value
P-values for differential expression.
- pvalue_threshold
A numeric threshold for filtering proteins based on statistical significance (default = 0.01).
- logfc_threshold
A numeric threshold for filtering proteins based on the magnitude of log fold-change (default = 0).
- topology_by
A character string indicating which network topology metric to use when selecting topologically matched sham proteins. Valid options include:
- `"degree"`
Number of direct neighbors (node connectivity).
- `"coreness"`
k-core number indicating how deeply embedded a node is in the network core.
- `"betweenness"`
Number of shortest paths passing through the node (bridge centrality).
- `"closeness"`
Inverse of the sum of shortest path lengths from the node to all others (centrality via proximity).
- `"constraint"`
Measure of how structurally constrained a node is by its neighbors (network redundancy).
- `"eccentricity"`
Maximum shortest path distance from the node to any other node (network reach).
- `"eigen_centrality"`
Centrality measure accounting for influence of neighbors (similar to Google's PageRank).
- `"transitivity"`
Local clustering coefficient indicating the tendency to form triangles (closed triplets).
These metrics are precomputed for all nodes in the graph, stored as vertex attributes, and read with `igraph::vertex_attr()`.
- topology_similarity
A numeric value between 0 and 1 specifying how closely sham proteins must match the significant proteins on the chosen topology metric (default = 0.9). A value closer to 1 enforces tighter matching.
- n_permutation
Integer specifying the number of null simulations used to generate the empirical background distributions (default = 10).
- damping_factor
A numeric damping factor for the personalized PageRank algorithm (default = 0.85).
- maximum_iteration
Maximum number of iterations allowed for PageRank convergence (default = 500).
- subnetwork
An `igraph` object representing the subnetwork to analyze. Default is `pkoi::pkoi_net`.
- include_subnetwork
Logical; whether to include the full subnetwork as part of the returned S4 object (default = FALSE).
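The input format and the two threshold arguments can be illustrated with a small base-R sketch. The statistics below are made up, and the exact comparison rule (strict `<` on the p-value, strict `>` on the absolute log fold-change) is an assumption for illustration, not taken from the package source.

```r
# Toy differential proteomics table with the three required columns.
# The logfc and p_value entries are invented for illustration.
proteomics_data <- data.frame(
  uniprot_id = c("P04637", "P38398", "Q9Y6K9", "P00533"),
  logfc      = c(1.8, -2.1, 0.3, -0.05),
  p_value    = c(0.001, 0.004, 0.20, 0.008),
  stringsAsFactors = FALSE
)

# Documented defaults of run_pkoi().
pvalue_threshold <- 0.01
logfc_threshold  <- 0

# Assumed rule: keep rows whose p-value is below the threshold and whose
# absolute log fold-change exceeds the threshold.
is_hit <- proteomics_data$p_value < pvalue_threshold &
  abs(proteomics_data$logfc) > logfc_threshold
significant <- proteomics_data[is_hit, ]

print(significant$uniprot_id)
```

With the default `logfc_threshold = 0`, any protein with a nonzero fold change passes the second condition, so the selection is driven almost entirely by `pvalue_threshold`.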
Value
A `pKOIList` S4 object containing:
- proteomics_data
The input proteomics data.
- network_summary_statistics
A named list of data.frames containing enrichment statistics per node type (e.g., Anatomy, Disease).
- subnetwork
(Optional) The input network graph, if `include_subnetwork = TRUE`.
- parameters
Parameters used in the analysis.
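A minimal calling sketch follows. It assumes that the components listed above are S4 slots with the same names, which this page does not state explicitly, and the input values are invented. The call is guarded so the snippet does nothing when the pkoi package is not installed.

```r
# Toy input; the logfc and p_value entries are illustrative only.
proteomics_data <- data.frame(
  uniprot_id = c("P04637", "P38398", "P00533"),
  logfc      = c(1.8, -2.1, 0.9),
  p_value    = c(0.001, 0.004, 0.008),
  stringsAsFactors = FALSE
)

result <- NULL
if (requireNamespace("pkoi", quietly = TRUE)) {
  # All other arguments keep their documented defaults.
  result <- pkoi::run_pkoi(proteomics_data)

  # Assumption: the documented components are slots of the pKOIList object.
  stats_by_type <- methods::slot(result, "network_summary_statistics")
  print(names(stats_by_type))
}
```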
Details
The function proceeds in several stages:
1. Filters proteomics data for significance.
2. Maps significant proteins to the graph and calculates a weighted personalized vector.
3. Performs personalized PageRank.
4. Generates null distributions using topologically matched sham proteins.
5. Computes z-scores and empirical p-values for all graph nodes.
6. Annotates enriched nodes using built-in ontologies and returns the annotated result.
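The personalized PageRank, null generation, and scoring stages can be sketched in base R on a toy graph. This is not the package's implementation. Several choices are assumptions made for illustration: the six-node network and its node names, the use of absolute log fold-change as the personalization weight, a degree-ratio rule for deciding which nodes count as topologically similar, the `+ 1` correction in the empirical p-value, and 200 permutations instead of the default 10 so that the p-values are less coarse. The helper names `personalized_pagerank()` and `draw_shams()` are hypothetical.

```r
set.seed(1)

# Toy undirected network (hypothetical nodes), stored as an adjacency matrix.
nodes <- c("A", "B", "C", "D", "E", "F")
edges <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"),
               c("D", "E"), c("D", "F"), c("E", "F"))
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1
degree <- rowSums(adj)

# Power iteration for personalized PageRank. Assumes every node has at
# least one neighbor, so each column of the transition matrix sums to 1.
personalized_pagerank <- function(adj, personalization,
                                  damping = 0.85, max_iter = 500, tol = 1e-10) {
  transition <- sweep(adj, 2, colSums(adj), "/")
  restart <- personalization / sum(personalization)
  scores <- restart
  for (i in seq_len(max_iter)) {
    updated <- damping * as.vector(transition %*% scores) +
      (1 - damping) * restart
    converged <- sum(abs(updated - scores)) < tol
    scores <- updated
    if (converged) break
  }
  stats::setNames(scores, rownames(adj))
}

# Assumed personalization weights: |logFC| of the significant proteins.
hits <- c(A = 1.8, B = 2.1)
seed_vector <- stats::setNames(numeric(length(nodes)), nodes)
seed_vector[names(hits)] <- hits
observed <- personalized_pagerank(adj, seed_vector)

# Assumed matching rule: a node may stand in for a hit when the ratio of
# the smaller to the larger degree is at least `similarity`.
draw_shams <- function(hit_names, degree, similarity) {
  chosen <- character(0)
  for (h in hit_names) {
    ratio <- pmin(degree, degree[h]) / pmax(degree, degree[h])
    pool <- setdiff(names(degree)[ratio >= similarity], chosen)
    chosen <- c(chosen, pool[sample.int(length(pool), 1)])
  }
  chosen
}

# Null distribution: rerun PageRank from sham seeds carrying the same weights.
n_permutation <- 200
null_scores <- replicate(n_permutation, {
  shams <- draw_shams(names(hits), degree, similarity = 0.9)
  sham_vector <- stats::setNames(numeric(length(nodes)), nodes)
  sham_vector[shams] <- unname(hits)
  personalized_pagerank(adj, sham_vector)
})

# Node-wise z-scores and empirical p-values against the null (nodes x runs).
z_score <- (observed - rowMeans(null_scores)) / apply(null_scores, 1, stats::sd)
empirical_p <- (rowSums(null_scores >= observed) + 1) / (n_permutation + 1)

print(round(cbind(observed, z_score, empirical_p), 3))
```

Matching sham proteins on topology keeps the null seeds comparable to the real hits, so a high z-score reflects where the hits sit in the network and not simply how well connected they are.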