pKOI (Proteomic Knowledge-Graph Omics Integration) is an R package that integrates differential proteomics data with a heterogeneous biological knowledge graph. It identifies biologically enriched nodes and pathways using personalized PageRank propagation, topological null simulations, and ontology annotations.
Installation
You can install the development version of pKOI from GitHub using:
# install.packages("devtools")
devtools::install_github("Broccolito/pkoi")
Make sure you have all system dependencies installed for igraph, dplyr, data.table, purrr, and knitr.
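As a quick check before installing, the following sketch reports which of the packages listed above are not yet available in your R library. It uses only base R and is not part of pKOI itself; the variable names are illustrative.

```r
# Packages named in the installation note
deps <- c("igraph", "dplyr", "data.table", "purrr", "knitr")

# requireNamespace() returns TRUE when a package can be loaded, without attaching it
is_installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!is_installed]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # uncomment to install from CRAN
} else {
  message("All listed dependencies are installed.")
}
```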
Overview
The core function in this package is run_pkoi(), which:
- Filters differential proteomics data for significance.
- Maps proteins to a biological network.
- Computes personalized PageRank using effect sizes.
- Performs permutation testing to simulate network background.
- Annotates significantly enriched nodes with ontology information.
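To make the propagation step concrete, here is a minimal sketch of personalized PageRank on a toy graph using igraph's page_rank(). It is not the internal code of run_pkoi(): the six-node ring, the node names, the effect sizes, and the choice of absolute effect size as restart weight are all assumptions made for illustration.

```r
library(igraph)

# Toy undirected network: a ring of six nodes standing in for the knowledge graph
g <- make_ring(6)
V(g)$name <- c("P1", "P2", "P3", "D1", "D2", "G1")

# Hypothetical effect sizes for two significant proteins
effect <- c(P1 = 2.3, P2 = 1.1)

# Restart weights: absolute effect size on the seed proteins, zero elsewhere,
# normalised to sum to one
restart <- setNames(rep(0, vcount(g)), V(g)$name)
restart[names(effect)] <- abs(effect)
restart <- restart / sum(restart)

# Personalized PageRank: the random walk restarts at the seeds, not uniformly
pr <- page_rank(g, directed = FALSE, damping = 0.85, personalized = restart)$vector
sort(pr, decreasing = TRUE)
```

Nodes close to the seed proteins receive higher scores than distant ones, which is what lets non-protein nodes (diseases, GO terms, and so on) be ranked by their proximity to the differential signal.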
Example
library(pkoi)

# Run pKOI on example proteomics data
result <- run_pkoi(
  proteomics_data = pkoi::example_data1,
  pvalue_threshold = 0.01,
  logfc_threshold = 0,
  topology_by = "degree",
  topology_similarity = 0.9,
  n_permutation = 10,
  damping_factor = 0.85,
  maximum_iteration = 500,
  subnetwork = pkoi::pkoi_net,
  include_subnetwork = FALSE
)
The output is a pKOIList S4 object with the following slots:
- proteomics_data: your input data with UniProt IDs, logFC, and p-values.
- network_summary_statistics: a list of data frames annotated by node type (e.g., Disease, Anatomy, GO terms).
- subnetwork (optional): the full igraph object if include_subnetwork = TRUE.
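Slots of an S4 object are read with the @ operator or with slot(). The sketch below shows the pattern on a stand-in class so that it runs without pKOI installed. The class name DemoList and its contents are hypothetical and only mirror two of the slot names listed above.

```r
library(methods)

# "DemoList" is a hypothetical stand-in, not the real pKOIList definition
setClass("DemoList", representation(
  proteomics_data = "data.frame",
  network_summary_statistics = "list"
))

demo <- new("DemoList",
  proteomics_data = data.frame(uniprot_id = "Q8WU39", logfc = 2.27, p_value = 3.5e-5),
  network_summary_statistics = list(Disease = data.frame(identifier = character(0)))
)

slotNames(demo)                                  # list the available slots
demo@proteomics_data                             # direct slot access with @
names(slot(demo, "network_summary_statistics"))  # equivalent access with slot()
```

With a real result, the same pattern would presumably be result@proteomics_data and result@network_summary_statistics; the exact names of the list elements are not documented here, so inspect them with names() first.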
Input Requirements
Your input proteomics_data should be a data.frame with the following columns:
- uniprot_id: character vector of UniProt IDs
- logfc: numeric log fold-change values
- p_value: numeric significance values
Example:
head(pkoi::example_data1)
uniprot_id | logfc | p_value |
---|---|---|
Q8WU39 | 2.2675323 | 0.0000350 |
P09326 | 1.0652402 | 0.0000757 |
P01624 | 1.3610007 | 0.0009597 |
P06312 | 1.4822463 | 0.0010210 |
A0A0A0MRZ8 | 1.6225022 | 0.0013218 |
P49863 | 1.0998201 | 0.0023031 |
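Before calling run_pkoi(), it can help to confirm that your own table has the expected column names and types. The helper below is a hypothetical convenience function, not part of pKOI, and the range check on p_value is an extra assumption beyond the stated requirements.

```r
# Hypothetical helper, not part of pkoi: checks column names and types
check_proteomics_input <- function(df) {
  required <- c("uniprot_id", "logfc", "p_value")
  stopifnot(
    is.data.frame(df),
    all(required %in% names(df)),
    is.character(df$uniprot_id),
    is.numeric(df$logfc),
    is.numeric(df$p_value),
    all(df$p_value >= 0 & df$p_value <= 1, na.rm = TRUE)  # assumed sanity check
  )
  invisible(TRUE)
}

# Three rows copied from the example table
toy <- data.frame(
  uniprot_id = c("Q8WU39", "P09326", "P49863"),
  logfc = c(2.2675323, 1.0652402, 1.0998201),
  p_value = c(0.0000350, 0.0000757, 0.0023031),
  stringsAsFactors = FALSE
)
check_proteomics_input(toy)  # stops with an error if any check fails
```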
Output Format
The output object includes annotated tables for various biological node types, such as:
- Anatomy
- Disease
- Biological Process
- Molecular Function
- Cell Type
- Compound
- Clinical Lab
- Pathway
- Protein Domain / Family
Each table contains:
Column | Description |
---|---|
identifier | Node identifier (e.g., UBERON, GO, DOID) |
pagerank | Personalized PageRank value |
simulation_mean | Mean of the null distribution of PageRank |
simulation_std | Standard deviation of the null distribution of PageRank |
beta | Z-score of observed vs. null PageRank |
p_value | Empirical p-value |
fdr | FDR-corrected p-value |
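The columns above can be related to each other with a small base R sketch. This is an illustration under stated assumptions, not pKOI's actual computation: the observed scores and null draws are made up, the empirical p-value uses a one-sided +1 convention, and the FDR column uses Benjamini-Hochberg, neither of which is specified by the package documentation.

```r
set.seed(1)

# Made-up observed scores for three nodes and 1000 simulated null scores per node
observed <- c(nodeA = 0.030, nodeB = 0.011, nodeC = 0.010)
null_scores <- matrix(rnorm(3 * 1000, mean = 0.010, sd = 0.003),
                      nrow = 3, dimnames = list(names(observed), NULL))

simulation_mean <- rowMeans(null_scores)
simulation_std <- apply(null_scores, 1, sd)

# Z-score of the observed score against the null distribution
beta <- (observed - simulation_mean) / simulation_std

# Empirical one-sided p-value with a +1 correction (assumed convention)
p_value <- (1 + rowSums(null_scores >= observed)) / (1 + ncol(null_scores))

# Benjamini-Hochberg adjustment (assumed; the method is not named in the source)
fdr <- p.adjust(p_value, method = "BH")

data.frame(identifier = names(observed), pagerank = observed,
           simulation_mean, simulation_std, beta, p_value, fdr)
```

Under this convention the smallest attainable empirical p-value is 1 / (number of permutations + 1), so a run with only 10 permutations could not go below 1/11. A larger n_permutation is likely worth the extra runtime for anything beyond a smoke test.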
Parameters
Argument | Description |
---|---|
pvalue_threshold | P-value cutoff; proteins with p-values below it are included (default: 0.01) |
logfc_threshold | Minimum absolute logFC for proteins to include (default: 0) |
topology_by | Topology attribute for null matching: degree, coreness, closeness, etc. |
topology_similarity | Tolerance when matching null proteins (default: 0.9) |
n_permutation | Number of permutations for null simulation (default: 10) |
damping_factor | Damping factor for PageRank (default: 0.85) |
maximum_iteration | Maximum number of iterations for PageRank (default: 500) |
include_subnetwork | Whether to include the subnetwork in the output (default: FALSE) |
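To see how the two filtering thresholds interact, the sketch below applies them to a toy table in base R. The exact comparison operators used inside run_pkoi() are not documented here, so the rule (p-value strictly below the cutoff, absolute logFC at or above the cutoff) is an assumption, and the third UniProt ID is invented.

```r
toy <- data.frame(
  uniprot_id = c("Q8WU39", "P09326", "X00001"),  # last ID is made up
  logfc = c(2.27, 1.07, -0.20),
  p_value = c(3.5e-5, 7.57e-5, 0.20),
  stringsAsFactors = FALSE
)

pvalue_threshold <- 0.01
logfc_threshold <- 0

# Assumed rule: p below the cutoff AND |logFC| at or above the cutoff
keep <- toy$p_value < pvalue_threshold & abs(toy$logfc) >= logfc_threshold
seeds <- toy[keep, ]
seeds$uniprot_id
```

With logfc_threshold = 0 the effect-size filter keeps every protein, so selection is driven by the p-value alone. Raising it restricts the seeds to proteins with larger changes in either direction.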
Citation
If you use pKOI in your research, please cite:
Wanjun Gu. pKOI: Proteomic Knowledge-Graph Omics Integration, UCSF, 2025. DOI: Coming soon
Contact
Created and maintained by Wanjun Gu.