Getting Started with pKOI

Introduction

pKOI (Proteomic Knowledge-Graph Omics Integration) is a network-based enrichment analysis tool that integrates differential proteomics data with a heterogeneous biological knowledge graph. It uses a personalized PageRank algorithm to prioritize and annotate biologically relevant nodes (e.g., genes, diseases, pathways, cell types) connected to significant proteins.

This vignette demonstrates a typical use case of pKOI from input preparation to result interpretation.

Input Data

The input is a data.frame of differential proteomics results, which must include:

uniprot_id: Unique UniProt identifier for each protein
logfc: Log fold-change value
p_value: P-value of differential expression

head(example_data1)
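
If you are preparing your own table rather than using example_data1, a minimal sketch of the expected shape might look like the following. The fold-change and p-values here are made up for illustration, and the column check is not part of pKOI; it only mirrors the column list above.

```r
# Hypothetical input table; the numeric values are illustrative, not real results.
my_data <- data.frame(
  uniprot_id = c("P04637", "P38398", "Q9Y6K9"),
  logfc      = c(1.8, -0.6, 0.2),
  p_value    = c(0.0004, 0.03, 0.41),
  stringsAsFactors = FALSE
)

# Check that the three required columns are present and of a sensible type.
required <- c("uniprot_id", "logfc", "p_value")
missing_cols <- setdiff(required, names(my_data))
stopifnot(
  length(missing_cols) == 0,
  is.numeric(my_data$logfc),
  is.numeric(my_data$p_value),
  all(my_data$p_value >= 0 & my_data$p_value <= 1)
)
```

Running a check like this before calling run_pkoi() can surface misnamed columns early.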

Running the Analysis

Use run_pkoi() to analyze proteomics data with the default pKOI knowledge graph:

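result <- run_pkoi(
  proteomics_data = example_data1,
  pvalue_threshold = 0.01,      # p-value cutoff for significant proteins
  logfc_threshold = 0,
  topology_by = "degree",       # metric used for topological matching
  topology_similarity = 0.9,
  n_permutation = 10,           # number of null simulations
  damping_factor = 0.85,        # PageRank damping factor
  maximum_iteration = 500,      # iteration cap for PageRank
  subnetwork = pkoi_net,
  include_subnetwork = FALSE
)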

This process:

Filters the proteomics dataset
Computes personalized PageRank for significant proteins
Simulates a null distribution using topological matching
Calculates empirical statistics
Annotates network nodes with ontology-based information
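
To make the personalized PageRank step more concrete, here is a minimal base R sketch on a toy graph. It is a generic power iteration, not pKOI's internal implementation; the node names, the personalized_pagerank() helper, and the convergence tolerance are all assumptions made for illustration. The damping and max_iter arguments play the role of damping_factor and maximum_iteration above.

```r
# Toy undirected graph as an adjacency matrix (5 nodes, none isolated).
nodes <- c("P1", "P2", "Pathway_A", "Disease_B", "P3")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- rbind(c("P1", "Pathway_A"), c("P2", "Pathway_A"),
               c("P2", "Disease_B"), c("P3", "Disease_B"),
               c("P1", "P2"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Hypothetical helper: personalized PageRank by power iteration.
personalized_pagerank <- function(adj, seeds, damping = 0.85,
                                  max_iter = 500, tol = 1e-10) {
  # Column-stochastic transition matrix: column j spreads node j's score
  # evenly over its neighbours.
  transition <- sweep(adj, 2, colSums(adj), "/")
  # Restart vector: uniform over the seed nodes, zero elsewhere.
  restart <- as.numeric(rownames(adj) %in% seeds)
  restart <- restart / sum(restart)
  score <- restart
  for (iter in seq_len(max_iter)) {
    updated <- damping * as.vector(transition %*% score) +
      (1 - damping) * restart
    if (sum(abs(updated - score)) < tol) break
    score <- updated
  }
  setNames(updated, rownames(adj))
}

# Seed the walk at the "significant" proteins P1 and P2.
ppr <- personalized_pagerank(adj, seeds = c("P1", "P2"))
```

Because the walk restarts only at the seed proteins, nodes close to them in the graph accumulate more score than distant ones, which is what lets the ranking highlight pathways or diseases near the significant proteins.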

Exploring the Results

The output is an S4 object of class pKOIList:

result

To access the annotated results:

names(result@network_summary_statistics)

Each element in the list corresponds to a node type (e.g., “Anatomy”, “Pathway”, “Protein”). Here’s an example of inspecting enriched pathways:

head(result@network_summary_statistics$Pathway)

Each table contains:

Column	Description
`identifier`	Node identifier (e.g., GO, DOID, UniProt)
`pagerank`	Personalized PageRank value
`simulation_mean`	Mean PageRank under null distribution
`simulation_std`	Standard deviation of null PageRank
`beta`	Z-score of observed vs. null PageRank
`p_value`	Empirical p-value for enrichment
`fdr`	False Discovery Rate (adjusted p-value)
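
As a rough illustration of how columns like these relate to one another, the sketch below derives a z-score, an empirical p-value, and an FDR from simulated null scores. The numbers are synthetic, and the specific choices (a one-sided test, a +1 correction, Benjamini-Hochberg adjustment) are assumptions; pKOI's exact formulas may differ.

```r
set.seed(1)

# Hypothetical observed score for one node and scores from 1000 null runs.
observed <- 0.012
null_scores <- rnorm(1000, mean = 0.008, sd = 0.002)

simulation_mean <- mean(null_scores)
simulation_std  <- sd(null_scores)

# Z-score of the observed value against the null distribution.
beta <- (observed - simulation_mean) / simulation_std

# One-sided empirical p-value with a +1 correction so it is never exactly 0.
p_value <- (1 + sum(null_scores >= observed)) / (1 + length(null_scores))

# FDR is computed across many nodes; here, three hypothetical p-values.
fdr <- p.adjust(c(p_value, 0.20, 0.65), method = "BH")
```

Under this particular formula, the smallest attainable p-value with n null runs is 1 / (n + 1), so a run with n_permutation = 10 could not go below roughly 0.09. Larger values give finer resolution at the cost of run time.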

Customizing the Topology Matching

The topology_by parameter controls which node-level topological metric is used to match sham proteins to the significant proteins when simulating the null distribution. Available options include:

"degree": number of direct neighbors
"coreness": the node’s k-core value
"betweenness": number of shortest paths that pass through the node
"closeness": inverse of the summed shortest-path distance to other nodes
"constraint": Burt’s constraint, a measure of structural redundancy in the node’s neighborhood
"eccentricity": greatest shortest-path distance to any other node
"eigen_centrality": importance based on neighbors’ influence
"transitivity": local clustering coefficient

Example using "betweenness":

run_pkoi(
  proteomics_data = example_data1,
  topology_by = "betweenness"
)
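
The option names above coincide with function names in the igraph package, so the metrics can be explored directly on any graph. The sketch below computes them on a small graph bundled with igraph and then picks topologically similar nodes for one target. The similar_nodes() helper and its ratio-based rule are hypothetical and only illustrate the idea of matching; they are not taken from pKOI, and topology_similarity may be defined differently there.

```r
library(igraph)

# Zachary's karate club: a small undirected graph bundled with igraph.
g <- make_graph("Zachary")

# One column per metric listed above.
topology <- data.frame(
  node             = as.integer(V(g)),
  degree           = degree(g),
  coreness         = coreness(g),
  betweenness      = betweenness(g),
  closeness        = closeness(g),
  constraint       = constraint(g),
  eccentricity     = eccentricity(g),
  eigen_centrality = eigen_centrality(g)$vector,
  transitivity     = transitivity(g, type = "local")
)

# Hypothetical matching rule: a candidate is "similar" to the target when the
# ratio of the smaller to the larger metric value is at least `similarity`.
similar_nodes <- function(values, target, similarity = 0.9) {
  ratio <- pmin(values, values[target]) / pmax(values, values[target])
  setdiff(which(ratio >= similarity), target)
}

# Nodes whose degree is close to that of node 2.
sham_candidates <- similar_nodes(topology$degree, target = 2)
```

Comparing the columns of topology side by side can help when deciding which metric to pass as topology_by, since metrics such as degree and betweenness can rank the same nodes quite differently.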

Visualization and Export

Any node table can be exported for downstream visualization or reporting, for example with write.csv():

write.csv(result@network_summary_statistics$Disease, "disease_enrichment.csv", row.names = FALSE)
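
Before exporting, it is often useful to keep only the strongest hits. The sketch below does this on a toy table that carries a subset of the documented columns; the identifiers, values, the 0.05 cutoff, and the output path are all illustrative assumptions.

```r
# Toy table with some of the documented columns; values are illustrative only.
pathway_table <- data.frame(
  identifier = c("path_1", "path_2", "path_3"),
  beta       = c(3.1, 0.4, 2.2),
  p_value    = c(0.001, 0.40, 0.02),
  fdr        = c(0.003, 0.40, 0.03)
)

# Keep nodes below an FDR cutoff and sort by decreasing z-score.
enriched <- subset(pathway_table, fdr < 0.05)
enriched <- enriched[order(-enriched$beta), ]

# Write the filtered table to a temporary location.
out_file <- file.path(tempdir(), "pathway_enrichment_filtered.csv")
write.csv(enriched, out_file, row.names = FALSE)
```

The same pattern should apply to any element of result@network_summary_statistics, provided it exposes the fdr and beta columns described earlier.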

Conclusion

pKOI provides a flexible and interpretable framework for identifying biologically relevant features from proteomic data using a graph-based approach. By integrating ontology-based annotations with network topology and permutation-based null models, it offers both robustness and biological context.

For questions or contributions, email Wanjun Gu