pKOI (Proteomic Knowledge-Graph Omics Integration) is an R package that integrates differential proteomics data with a heterogeneous biological knowledge graph. It identifies biologically enriched nodes and pathways using personalized PageRank propagation, topological null simulations, and ontology annotations.

Installation

You can install the development version of pKOI from GitHub using:

# install.packages("devtools")
devtools::install_github("Broccolito/pkoi")

Make sure the system dependencies required by igraph, dplyr, data.table, purrr, and knitr are installed before installing pKOI.

Overview

The core function in this package is run_pkoi(), which:

  • Filters differential proteomics data for significance.
  • Maps proteins to a biological network.
  • Computes personalized PageRank using effect sizes.
  • Performs permutation testing to simulate network background.
  • Annotates significantly enriched nodes with ontology information.
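The propagation step can be illustrated with a minimal base R sketch of personalized PageRank by power iteration. The toy network, the node names, and the helper personalized_pagerank() are hypothetical and are not part of pKOI; the package itself builds on igraph, and its internals may differ. The restart vector is assumed here to be proportional to the absolute logFC of the seed proteins.

```r
# Toy undirected network (hypothetical): two proteins linked to annotation nodes
nodes <- c("P1", "P2", "GO:A", "GO:B", "DOID:X")
edges <- rbind(
  c("P1", "GO:A"),
  c("P1", "GO:B"),
  c("P2", "GO:B"),
  c("P2", "DOID:X"),
  c("GO:A", "GO:B")
)
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[edges] <- 1            # character matrix indexing against dimnames
adj[edges[, 2:1]] <- 1     # mirror the edges so the graph is undirected

# Restart vector: absolute effect sizes on the seed proteins, zero elsewhere
seed_logfc <- c(P1 = 2.27, P2 = -1.07)   # toy effect sizes
restart <- stats::setNames(numeric(length(nodes)), nodes)
restart[names(seed_logfc)] <- abs(seed_logfc)
restart <- restart / sum(restart)

# Hypothetical helper: power iteration for personalized PageRank
personalized_pagerank <- function(adj, restart, damping = 0.85,
                                  max_iter = 500, tol = 1e-10) {
  # Column-stochastic transition matrix: each column sums to 1
  transition <- sweep(adj, 2, colSums(adj), "/")
  rank <- restart
  for (iter in seq_len(max_iter)) {
    previous <- rank
    rank <- damping * as.vector(transition %*% previous) +
      (1 - damping) * restart
    if (sum(abs(rank - previous)) < tol) break
  }
  stats::setNames(as.vector(rank), rownames(adj))
}

scores <- personalized_pagerank(adj, restart)
sort(scores, decreasing = TRUE)
```

The damping and max_iter arguments play the same role as damping_factor and maximum_iteration in run_pkoi(): the first controls how often the walk returns to the seed proteins, the second caps the number of iterations.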

Example

library(pkoi)

# Run pKOI on example proteomics data
result <- run_pkoi(
  proteomics_data = pkoi::example_data1,
  pvalue_threshold = 0.01,
  logfc_threshold = 0,
  topology_by = "degree",
  topology_similarity = 0.9,
  n_permutation = 10,
  damping_factor = 0.85,
  maximum_iteration = 500,
  subnetwork = pkoi::pkoi_net,
  include_subnetwork = FALSE
)

The output is a pKOIList S4 object with the following slots:

  • proteomics_data: your input data with UniProt IDs, logFC, and p-values.
  • network_summary_statistics: a list of data frames annotated by node type (e.g., Disease, Anatomy, GO terms).
  • subnetwork (optional): the full igraph object if include_subnetwork = TRUE.

Input Requirements

Your input proteomics_data should be a data.frame with the following columns:

  • uniprot_id: character vector of UniProt IDs
  • logfc: numeric log fold-change values
  • p_value: numeric significance values

Example:

head(pkoi::example_data1)
uniprot_id logfc p_value
Q8WU39 2.2675323 0.0000350
P09326 1.0652402 0.0000757
P01624 1.3610007 0.0009597
P06312 1.4822463 0.0010210
A0A0A0MRZ8 1.6225022 0.0013218
P49863 1.0998201 0.0023031
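Before calling run_pkoi() on your own data, it can help to confirm that the data frame has the expected shape. The following is a minimal sketch; check_proteomics_data() is a hypothetical helper, not a pKOI function, and the range check on p_value is an added assumption rather than a documented requirement.

```r
# Build an input table with the three required columns
# (values copied from the first rows of the example above)
my_data <- data.frame(
  uniprot_id = c("Q8WU39", "P09326", "P01624"),
  logfc = c(2.2675323, 1.0652402, 1.3610007),
  p_value = c(0.0000350, 0.0000757, 0.0009597),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop with a clear message if the input is malformed
check_proteomics_data <- function(data) {
  if (!is.data.frame(data)) {
    stop("proteomics_data must be a data.frame")
  }
  required <- c("uniprot_id", "logfc", "p_value")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.character(data$uniprot_id)) {
    stop("uniprot_id must be a character vector")
  }
  if (!is.numeric(data$logfc)) {
    stop("logfc must be numeric")
  }
  if (!is.numeric(data$p_value) ||
      any(data$p_value < 0 | data$p_value > 1, na.rm = TRUE)) {
    stop("p_value must be numeric and lie in [0, 1]")
  }
  invisible(TRUE)
}

check_proteomics_data(my_data)
```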

Output Format

The output object includes annotated tables for various biological node types, such as:

  • Anatomy
  • Disease
  • Biological Process
  • Molecular Function
  • Cell Type
  • Compound
  • Clinical Lab
  • Pathway
  • Protein Domain / Family

Each table contains:

| Column | Description |
| --- | --- |
| identifier | Node identifier (e.g., UBERON, GO, DOID) |
| pagerank | Personalized PageRank value |
| simulation_mean | Mean of the null PageRank distribution |
| simulation_std | Standard deviation of the null PageRank distribution |
| beta | Z-score of observed vs. null PageRank |
| p_value | Empirical p-value |
| fdr | FDR-corrected p-value |
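To make the relationship between these columns concrete, here is a minimal base R sketch that derives them from an observed score vector and a matrix of null scores. It is an illustration under stated assumptions, not pKOI's implementation: summarise_against_null() is a hypothetical helper, the numbers are toy values, the empirical p-value uses the common one-sided (exceedances + 1) / (permutations + 1) convention, and the FDR uses Benjamini-Hochberg via p.adjust().

```r
# Toy observed PageRank values for three annotation nodes (hypothetical)
observed <- c("GO:A" = 0.30, "GO:B" = 0.125, "DOID:X" = 0.055)

# Toy null scores: one row per permutation, one column per node
null_scores <- cbind(
  "GO:A"   = seq(0.05, 0.14, length.out = 10),
  "GO:B"   = seq(0.08, 0.17, length.out = 10),
  "DOID:X" = seq(0.02, 0.11, length.out = 10)
)

# Hypothetical helper: compare observed scores with the null distribution
summarise_against_null <- function(observed, null_scores) {
  # Align null columns with the observed nodes
  null_scores <- null_scores[, names(observed), drop = FALSE]
  simulation_mean <- colMeans(null_scores)
  simulation_std <- apply(null_scores, 2, sd)
  # Z-score of the observed value relative to the null
  beta <- (observed - simulation_mean) / simulation_std
  # Assumed convention: one-sided empirical p-value with a +1 correction
  exceed <- colSums(sweep(null_scores, 2, observed, ">="))
  p_value <- (exceed + 1) / (nrow(null_scores) + 1)
  data.frame(
    identifier = names(observed),
    pagerank = unname(observed),
    simulation_mean = unname(simulation_mean),
    simulation_std = unname(simulation_std),
    beta = unname(beta),
    p_value = unname(p_value),
    fdr = p.adjust(unname(p_value), method = "BH"),
    row.names = NULL
  )
}

node_table <- summarise_against_null(observed, null_scores)
node_table
```

Under this convention the smallest attainable p-value is 1 / (permutations + 1), so with only 10 permutations no node can fall below 1/11. Increasing n_permutation gives the empirical p-values finer resolution.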

Parameters

| Argument | Description |
| --- | --- |
| pvalue_threshold | Maximum p-value for a protein to be included (default: 0.01) |
| logfc_threshold | Minimum absolute logFC for a protein to be included (default: 0) |
| topology_by | Topology attribute for null matching: degree, coreness, closeness, etc. |
| topology_similarity | Tolerance when matching null proteins (default: 0.9) |
| n_permutation | Number of permutations for null simulation (default: 10) |
| damping_factor | Damping factor for PageRank (default: 0.85) |
| maximum_iteration | Maximum number of iterations for PageRank (default: 500) |
| include_subnetwork | Whether to include the subnetwork in the output (default: FALSE) |
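The effect of the two filtering thresholds can be previewed on your own table before running the full analysis. The sketch below is an approximation: filter_significant() is a hypothetical helper, the TOY_ rows are made-up values added for illustration, and whether pKOI uses strict or non-strict comparisons at the boundaries is an assumption.

```r
# First three rows come from the example data; TOY_ rows are made up
proteomics_data <- data.frame(
  uniprot_id = c("Q8WU39", "P09326", "P01624", "TOY_A", "TOY_B"),
  logfc = c(2.2675323, 1.0652402, 1.3610007, 0.4, -1.8),
  p_value = c(0.0000350, 0.0000757, 0.0009597, 0.20, 0.004),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring pvalue_threshold and logfc_threshold
filter_significant <- function(data, pvalue_threshold = 0.01,
                               logfc_threshold = 0) {
  # Assumed comparisons: p-value strictly below, |logFC| at or above
  keep <- data$p_value < pvalue_threshold &
    abs(data$logfc) >= logfc_threshold
  data[keep, , drop = FALSE]
}

# Default thresholds keep every protein with p < 0.01
default_set <- filter_significant(proteomics_data)

# A stricter effect-size cut-off keeps only large changes in either direction
strict_set <- filter_significant(proteomics_data, logfc_threshold = 1.5)

default_set$uniprot_id
strict_set$uniprot_id
```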

Citation

If you use pKOI in your research, please cite:

Wanjun Gu. pKOI: Proteomic Knowledge-Graph Omics Integration, UCSF, 2025. DOI: Coming soon

License

This package is distributed under the MIT license. See LICENSE file for details.

Contact

Created and maintained by Wanjun Gu.