This function performs network enrichment analysis by running personalized PageRank over a knowledge-graph-based protein interaction network. It takes differential proteomics data as input and returns node-wise enrichment statistics for several biological node types (e.g., anatomy, disease, GO terms), scored by their proximity and relevance to the significant protein hits in the network.

Usage

run_pkoi(
  proteomics_data,
  pvalue_threshold = 0.01,
  logfc_threshold = 0,
  topology_by = "degree",
  topology_similarity = 0.9,
  n_permutation = 10,
  damping_factor = 0.85,
  maximum_iteration = 500,
  subnetwork = pkoi::pkoi_net,
  include_subnetwork = FALSE
)

Arguments

proteomics_data

A `data.frame` containing differential proteomics results. Must include columns:

uniprot_id

Unique UniProt identifier for each protein.

logfc

Log fold-change values.

p_value

P-values for differential expression.
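As an illustration of the expected input shape, the sketch below builds a toy table with the three required columns and checks it before analysis. The values are made up for demonstration, and `check_proteomics_columns()` is a hypothetical helper, not a function exported by the package.

```r
# Toy differential proteomics table with the three required columns.
# The fold-changes and p-values are illustrative only.
proteomics_data <- data.frame(
  uniprot_id = c("P04637", "P38398", "P00533", "Q9Y6K9"),
  logfc      = c(1.8, -0.4, 2.3, -1.1),
  p_value    = c(0.002, 0.300, 0.0004, 0.008),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): fail early on malformed input.
check_proteomics_columns <- function(df) {
  required <- c("uniprot_id", "logfc", "p_value")
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(df$uniprot_id) > 0) {
    stop("uniprot_id values must be unique")
  }
  invisible(TRUE)
}

check_proteomics_columns(proteomics_data)
```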

pvalue_threshold

A numeric threshold for filtering proteins based on statistical significance (default = 0.01).

logfc_threshold

A numeric threshold for filtering proteins based on the magnitude of log fold-change (default = 0).
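The two thresholds can be pictured as a row filter on the input table. The sketch below is one plausible reading, assuming a strict `p_value < pvalue_threshold` test combined with `abs(logfc) >= logfc_threshold`; the exact comparison operators used internally are not stated here, and `filter_significant()` is a hypothetical name.

```r
# Illustrative input; values are made up.
proteomics_data <- data.frame(
  uniprot_id = c("P04637", "P38398", "P00533", "Q9Y6K9"),
  logfc      = c(1.8, -0.4, 2.3, -1.1),
  p_value    = c(0.002, 0.300, 0.0004, 0.008),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows passing both thresholds.
# Assumption: strict "<" on p-values, ">=" on absolute log fold-change.
filter_significant <- function(df, pvalue_threshold = 0.01, logfc_threshold = 0) {
  keep <- df$p_value < pvalue_threshold & abs(df$logfc) >= logfc_threshold
  df[keep, , drop = FALSE]
}

significant_default <- filter_significant(proteomics_data)
significant_strict  <- filter_significant(proteomics_data, logfc_threshold = 1.5)
```

With the default `logfc_threshold = 0`, the fold-change condition is always satisfied, so only the p-value filter has any effect.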

topology_by

A character string indicating which network topology metric to use when selecting topologically matched sham proteins. Valid options include:

`"degree"`

Number of direct neighbors (node connectivity).

`"coreness"`

k-core number indicating how deeply embedded a node is in the network core.

`"betweenness"`

Number of shortest paths passing through the node (bridge centrality).

`"closeness"`

Inverse of the sum of shortest-path lengths from the node to all other nodes (centrality via proximity).

`"constraint"`

Measure of how structurally constrained a node is by its neighbors (network redundancy).

`"eccentricity"`

Maximum shortest path distance from the node to any other node (network reach).

`"eigen_centrality"`

Centrality measure that accounts for the influence of a node's neighbors (similar to Google's PageRank).

`"transitivity"`

Local clustering coefficient indicating the tendency to form triangles (closed triplets).

These metrics are precomputed for all nodes in the graph and stored as vertex attributes, which can be read with `igraph::vertex_attr()`.
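For readers who supply their own graph, the following sketch shows one way such metrics could be computed and attached as vertex attributes with igraph. It uses igraph's built-in Zachary karate club graph as a stand-in for a protein network. The attribute names mirror the `topology_by` options, but whether the package computes them with exactly these calls and arguments is an assumption.

```r
library(igraph)

# Small built-in example graph standing in for the protein network.
g <- igraph::make_graph("Zachary")

# One function per `topology_by` option; each returns one value per vertex.
metric_functions <- list(
  degree           = function(x) igraph::degree(x),
  coreness         = function(x) igraph::coreness(x),
  betweenness      = function(x) igraph::betweenness(x),
  closeness        = function(x) igraph::closeness(x),
  constraint       = function(x) igraph::constraint(x),
  eccentricity     = function(x) igraph::eccentricity(x),
  eigen_centrality = function(x) igraph::eigen_centrality(x)$vector,
  transitivity     = function(x) igraph::transitivity(x, type = "local")
)

# Store each metric as a vertex attribute so it can be looked up later.
for (metric in names(metric_functions)) {
  g <- igraph::set_vertex_attr(g, metric, value = metric_functions[[metric]](g))
}

degree_values <- igraph::vertex_attr(g, "degree")
```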

topology_similarity

A numeric value between 0 and 1 specifying the tolerance for selecting topologically similar sham proteins (default = 0.9). A value closer to 1 enforces tighter matching.
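To make the tolerance concrete, here is a minimal sketch of one possible matching rule: a candidate qualifies as a sham for a hit if the ratio of the smaller to the larger metric value is at least `topology_similarity`. This ratio rule is an assumption chosen for illustration, the actual criterion may differ, and `find_sham_candidates()` is a hypothetical helper. It assumes non-negative metric values.

```r
# Hypothetical helper: nodes whose metric value is within the tolerance
# of the target node's value, measured as min/max ratio (assumed rule).
find_sham_candidates <- function(metric, target, topology_similarity = 0.9,
                                 exclude = target) {
  target_value <- metric[[target]]
  ratio <- pmin(metric, target_value) / pmax(metric, target_value)
  ratio[is.nan(ratio)] <- 1  # 0/0: both values are zero, treat as identical
  candidates <- names(metric)[ratio >= topology_similarity]
  setdiff(candidates, exclude)
}

# Toy degree values for six nodes; "A" is the significant hit.
degree_by_node <- c(A = 20, B = 19, C = 21, D = 40, E = 20, F = 3)

loose_matches <- find_sham_candidates(degree_by_node, "A", topology_similarity = 0.9)
exact_matches <- find_sham_candidates(degree_by_node, "A", topology_similarity = 1)
```

Under this rule, raising the value toward 1 shrinks the candidate pool: at 0.9 nodes B, C, and E qualify, while at 1 only E (identical degree) remains.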

n_permutation

Integer specifying the number of null simulations to generate empirical background distributions (default = 10).

damping_factor

A numeric damping factor for the personalized PageRank algorithm (default = 0.85).

maximum_iteration

Maximum number of iterations allowed for PageRank convergence (default = 500).
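The roles of `damping_factor` and `maximum_iteration` can be seen in a plain power-iteration version of personalized PageRank. The base R sketch below is not the package's implementation; the convergence `tolerance`, the treatment of dangling nodes (their rank is returned to the restart distribution), and the function name are all assumptions made for illustration.

```r
# Minimal personalized PageRank by power iteration (illustrative sketch).
personalized_pagerank <- function(adjacency, personalization,
                                  damping_factor = 0.85,
                                  maximum_iteration = 500,
                                  tolerance = 1e-10) {
  out_degree <- rowSums(adjacency)
  # Restart distribution: seed weights normalised to sum to 1.
  restart <- personalization / sum(personalization)
  rank <- restart
  for (iteration in seq_len(maximum_iteration)) {
    # Rank each node passes along each of its outgoing edges.
    share <- ifelse(out_degree > 0, rank / out_degree, 0)
    # Assumption: rank held by nodes without outgoing edges goes back to the seeds.
    dangling_mass <- sum(rank[out_degree == 0])
    walk <- as.vector(t(adjacency) %*% share) + dangling_mass * restart
    new_rank <- damping_factor * walk + (1 - damping_factor) * restart
    converged <- sum(abs(new_rank - rank)) < tolerance
    rank <- new_rank
    if (converged) break
  }
  names(rank) <- rownames(adjacency)
  rank
}

# Toy undirected path graph A - B - C - D, with all seed weight on A.
nodes <- c("A", "B", "C", "D")
adjacency <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"))
adjacency[edges] <- 1
adjacency[edges[, 2:1]] <- 1

scores <- personalized_pagerank(adjacency, c(A = 1, B = 0, C = 0, D = 0))
```

A higher damping factor lets the walk travel further from the seed proteins before restarting, which spreads scores more widely across the network; a lower one keeps them concentrated near the seeds.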

subnetwork

An `igraph` object representing the subnetwork to analyze. Default is `pkoi::pkoi_net`.

include_subnetwork

Logical; whether to include the full subnetwork as part of the returned S4 object (default = FALSE).

Value

A `pKOIList` S4 object containing:

proteomics_data

The input proteomics data.

network_summary_statistics

A named list of data.frames containing enrichment statistics per node type (e.g., Anatomy, Disease).

subnetwork

(Optional) The input network graph, if `include_subnetwork = TRUE`.

parameters

Parameters used in the analysis.

Details

The function proceeds in several stages:

  1. Filters the proteomics data for significance using `pvalue_threshold` and `logfc_threshold`.

  2. Maps significant proteins to the graph and calculates a weighted personalization vector.

  3. Performs personalized PageRank.

  4. Generates null distributions using topologically matched sham proteins.

  5. Computes z-scores and empirical p-values for all graph nodes.

  6. Annotates enriched nodes using built-in ontologies and returns the annotated result.
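Stage 5 can be sketched as a comparison of each node's observed PageRank score against its scores from the null simulations. The example below assumes a one-sided (upper-tail) empirical p-value with an add-one correction, which is a common convention but not necessarily the one used here. The function name, node labels, and numbers are illustrative.

```r
# Hypothetical helper: per-node z-score and empirical p-value.
# `null_scores` has one row per node and one column per permutation.
enrichment_statistics <- function(observed, null_scores) {
  null_mean <- rowMeans(null_scores)
  null_sd <- apply(null_scores, 1, sd)
  z_score <- (observed - null_mean) / null_sd
  # Assumption: upper-tail test with add-one correction, so p is never exactly 0.
  n_perm <- ncol(null_scores)
  exceed <- rowSums(null_scores >= observed)
  p_empirical <- (exceed + 1) / (n_perm + 1)
  data.frame(
    node = names(observed),
    z_score = unname(z_score),
    p_empirical = unname(p_empirical),
    stringsAsFactors = FALSE
  )
}

# Observed scores for two nodes and 10 null simulations each (made-up values).
observed <- c(node_1 = 0.080, node_2 = 0.021)
null_scores <- matrix(
  c(0.020, 0.025, 0.030, 0.022, 0.028, 0.026, 0.024, 0.027, 0.023, 0.025,
    0.020, 0.022, 0.019, 0.023, 0.021, 0.020, 0.024, 0.018, 0.022, 0.021),
  nrow = 2, byrow = TRUE, dimnames = list(names(observed), NULL)
)

stats <- enrichment_statistics(observed, null_scores)
```

Under this convention the smallest attainable p-value is 1 / (n_permutation + 1), so with the default of 10 permutations it cannot fall below roughly 0.09; a larger `n_permutation` gives finer resolution at the cost of runtime.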