
Introduction

pKOI (Proteomic Knowledge-Graph Omics Integration) is a network-based enrichment analysis tool that integrates differential proteomics data with a heterogeneous biological knowledge graph. It uses a personalized PageRank algorithm to prioritize and annotate biologically relevant nodes (e.g., genes, diseases, pathways, cell types) connected to significant proteins.

This vignette demonstrates a typical use case of pKOI from input preparation to result interpretation.

Input Data

The input is a data.frame of differential proteomics results, which must include:

  • uniprot_id: Unique UniProt identifier for each protein
  • logfc: Log fold-change value
  • p_value: P-value of differential expression

To preview the example dataset:

head(example_data1)
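
As a minimal sketch, a table with the three required columns could be assembled and checked as shown here. The identifiers and values are invented for illustration, and check_pkoi_input() is a hypothetical helper, not a pKOI function. The duplicate check assumes that "unique" means one row per protein.

```r
# Illustrative input only: identifiers and values are invented for this sketch.
my_data <- data.frame(
  uniprot_id = c("PROT_A", "PROT_B", "PROT_C"),
  logfc      = c(1.2, -0.8, 0.1),
  p_value    = c(0.001, 0.02, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of pKOI): verify the required columns.
check_pkoi_input <- function(df) {
  required <- c("uniprot_id", "logfc", "p_value")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    is.numeric(df$logfc),
    is.numeric(df$p_value),
    all(df$p_value >= 0 & df$p_value <= 1),
    !anyDuplicated(df$uniprot_id)   # assumption: one row per protein
  )
  invisible(TRUE)
}

check_pkoi_input(my_data)
```

Running a check like this before the analysis surfaces a misnamed column early rather than inside the analysis call.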

Running the Analysis

Use run_pkoi() to analyze proteomics data with the default pKOI knowledge graph:

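result <- run_pkoi(
  proteomics_data = example_data1,
  pvalue_threshold = 0.01,      # significance cutoff applied to p_value
  logfc_threshold = 0,          # log fold-change cutoff applied to logfc
  topology_by = "degree",       # topological measure used to match sham proteins
  topology_similarity = 0.9,
  n_permutation = 10,           # number of permutations for the null distribution
  damping_factor = 0.85,        # PageRank damping factor
  maximum_iteration = 500,      # maximum number of iterations
  subnetwork = pkoi_net,
  include_subnetwork = FALSE
)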

This call performs the following steps:

  1. Filters the proteomics dataset using pvalue_threshold and logfc_threshold
  2. Computes personalized PageRank for significant proteins
  3. Simulates a null distribution using topological matching
  4. Calculates empirical statistics
  5. Annotates network nodes with ontology-based information
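
The first four steps can be illustrated on a toy network in base R. This is a sketch under stated assumptions, not pKOI's implementation: the graph, the ppr() helper, the absolute-value log fold-change filter, and the empirical p-value formula are all invented for illustration, and sham seeds are drawn at random rather than matched on topology. Step 5 (annotation) is omitted.

```r
# Toy sketch of steps 1-4 on a small undirected graph (base R only).
# NOT pKOI's implementation; graph, names, and sizes are invented.
set.seed(1)

# Adjacency matrix of a 6-node toy network.
nodes <- c("P1", "P2", "P3", "P4", "Path1", "Dis1")
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
edges <- rbind(c("P1", "Path1"), c("P2", "Path1"), c("P3", "Dis1"),
               c("P4", "Dis1"), c("P1", "P2"), c("P3", "P4"), c("P2", "P3"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Hypothetical helper: personalized PageRank by power iteration.
# Random-walk restarts go to the seed nodes only.
ppr <- function(adj, seeds, damping = 0.85, max_iter = 500, tol = 1e-10) {
  trans <- adj / rowSums(adj)               # row-stochastic transition matrix
  restart <- as.numeric(rownames(adj) %in% seeds)
  restart <- restart / sum(restart)
  rank <- restart
  for (iter in seq_len(max_iter)) {
    new_rank <- as.vector(damping * (rank %*% trans) + (1 - damping) * restart)
    if (sum(abs(new_rank - rank)) < tol) break
    rank <- new_rank
  }
  setNames(new_rank, rownames(adj))
}

# Step 1: filter (the use of abs(logfc) is an assumption).
dat <- data.frame(uniprot_id = c("P1", "P2", "P3", "P4"),
                  logfc = c(1.5, -1.1, 0.2, 0.1),
                  p_value = c(0.001, 0.004, 0.40, 0.70))
seeds <- dat$uniprot_id[dat$p_value < 0.01 & abs(dat$logfc) > 0]

# Step 2: observed personalized PageRank.
observed <- ppr(adj, seeds)

# Step 3: null distribution from sham seed sets of the same size.
# (Plain random sampling here; pKOI matches shams on topology.)
null_pr <- replicate(200, ppr(adj, sample(dat$uniprot_id, length(seeds))))

# Step 4: z-score and empirical p-value per node.
sim_mean <- rowMeans(null_pr)
sim_sd   <- apply(null_pr, 1, sd)
beta     <- (observed - sim_mean) / sim_sd
p_emp    <- (rowSums(null_pr >= observed) + 1) / (ncol(null_pr) + 1)

round(cbind(pagerank = observed, beta = beta, p_value = p_emp), 3)
```

In this toy graph, the node adjacent to both seed proteins (Path1) receives more PageRank mass than the more distant one (Dis1), which is the intuition behind ranking annotations by proximity to significant proteins.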

Exploring the Results

The output is an S4 object of class pKOIList:

result

To access the annotated results:

names(result@network_summary_statistics)

Each element in the list corresponds to a node type (e.g., “Anatomy”, “Pathway”, “Protein”). Here’s an example of inspecting enriched pathways:

head(result@network_summary_statistics$Pathway)

Each table contains:

Column            Description
identifier        Node identifier (e.g., GO, DOID, UniProt)
pagerank          Personalized PageRank value
simulation_mean   Mean PageRank under null distribution
simulation_std    Standard deviation of null PageRank
beta              Z-score of observed vs. null PageRank
p_value           Empirical p-value for enrichment
fdr               False Discovery Rate (adjusted p-value)
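
To show how these columns relate, the sketch below rebuilds beta and fdr on a mock table and then keeps the significant rows. All values and identifiers are invented. The z-score formula follows from the column description; the Benjamini-Hochberg adjustment is an assumption, since the adjustment method is not stated here.

```r
# Mock table with the documented columns; all values are invented.
pathway_tbl <- data.frame(
  identifier      = c("PATHWAY_A", "PATHWAY_B", "PATHWAY_C"),
  pagerank        = c(0.030, 0.012, 0.020),
  simulation_mean = c(0.010, 0.011, 0.015),
  simulation_std  = c(0.004, 0.005, 0.005),
  p_value         = c(0.001, 0.450, 0.160)
)

# beta as a z-score of the observed PageRank against the null distribution.
pathway_tbl$beta <- with(pathway_tbl,
                         (pagerank - simulation_mean) / simulation_std)

# Benjamini-Hochberg is assumed here; pKOI's adjustment method may differ.
pathway_tbl$fdr <- p.adjust(pathway_tbl$p_value, method = "BH")

# Keep significant nodes, strongest enrichment first.
hits <- subset(pathway_tbl, fdr < 0.05)
hits[order(-hits$beta), ]
```

The same filter-and-sort pattern applies to any node-type table, since they share these columns.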

Customizing the Topology Matching

The topology_by parameter controls how the algorithm matches sham proteins for null simulation. Available options include:

  • "degree": number of direct neighbors
  • "coreness": node’s k-core value
  • "betweenness": number of shortest paths passing through the node
  • "closeness": inverse of the total shortest-path distance to all other nodes
  • "constraint": Burt's constraint, a measure of structural redundancy in the node's neighborhood
  • "eccentricity": shortest-path distance to the farthest node
  • "eigen_centrality": importance based on neighbors’ influence
  • "transitivity": local clustering coefficient

Example using "betweenness":

run_pkoi(
  proteomics_data = example_data1,
  topology_by = "betweenness"
)
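
The measures listed above share their names with functions in the igraph package, so their behavior can be explored on a toy graph. This sketch assumes igraph is installed; whether pKOI computes them through igraph is an assumption, and the graph here is not the pKOI knowledge graph. The last lines show a simplified notion of degree matching (exact equality), which is an illustration only.

```r
library(igraph)

# Small toy graph; not the pKOI knowledge graph.
g <- graph_from_literal(A-B, A-C, B-C, C-D, D-E)

# One column per topology measure, one row per node.
topology <- data.frame(
  degree           = degree(g),
  coreness         = coreness(g),
  betweenness      = betweenness(g),
  closeness        = closeness(g),
  constraint       = constraint(g),
  eccentricity     = eccentricity(g),
  eigen_centrality = eigen_centrality(g)$vector,
  transitivity     = transitivity(g, type = "local")
)
topology

# Simplified degree matching: candidate shams for node "A" are the
# other nodes with exactly the same degree.
same_degree <- setdiff(names(which(degree(g) == degree(g)[["A"]])), "A")
same_degree
```

Measures such as betweenness and closeness depend on global shortest paths, so on a large graph they are considerably more expensive to compute than degree.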

Visualization and Export

Any enriched node table can be plotted or exported. For example, to write the disease results to a CSV file:

write.csv(result@network_summary_statistics$Disease, "disease_enrichment.csv", row.names = FALSE)
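
The sketch below extends this to every node type and adds a simple bar plot. It uses a mock named list in place of result@network_summary_statistics so that it runs on its own; the contents, the 0.05 cutoff, and the output file names are invented for illustration.

```r
# Stand-in for result@network_summary_statistics: a named list of tables.
# Contents are invented; a real pKOIList would supply these.
tables <- list(
  Disease = data.frame(identifier = c("D1", "D2"), fdr = c(0.01, 0.30)),
  Pathway = data.frame(identifier = c("W1", "W2"), fdr = c(0.04, 0.02))
)

out_dir <- file.path(tempdir(), "pkoi_export")
dir.create(out_dir, showWarnings = FALSE)

# One CSV per node type, keeping only rows below an FDR cutoff.
for (node_type in names(tables)) {
  sig <- subset(tables[[node_type]], fdr < 0.05)
  write.csv(sig, file.path(out_dir, paste0(node_type, "_enrichment.csv")),
            row.names = FALSE)
}
list.files(out_dir)

# Bar plot of -log10(FDR) for one node type.
pathways <- tables$Pathway
barplot(-log10(pathways$fdr), names.arg = pathways$identifier,
        ylab = "-log10(FDR)", main = "Pathway enrichment")
```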

Conclusion

pKOI provides a flexible and interpretable framework for identifying biologically relevant features from proteomic data using a graph-based approach. By integrating ontology-based annotations with network topology and permutation-based null models, it offers both robustness and biological context.

For questions or contributions, contact Wanjun Gu.