
Getting Started with pKOI
Introduction
pKOI (Proteomic Knowledge-Graph Omics Integration) is a
network-based enrichment analysis tool that integrates differential
proteomics data with a heterogeneous biological knowledge graph. It uses
a personalized PageRank algorithm to prioritize and annotate
biologically relevant nodes (e.g., genes, diseases, pathways, cell
types) connected to significant proteins.
This vignette demonstrates a typical use case of pKOI
from input preparation to result interpretation.
Input Data
The input is a data.frame of differential proteomics
results, which must include:
- uniprot_id: Unique UniProt identifier for each protein
- logfc: Log fold-change value
- p_value: P-value of differential expression

head(example_data1)

Running the Analysis
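Before calling run_pkoi(), it can help to confirm that the input data.frame has the three required columns listed under Input Data. The sketch below uses only base R. check_pkoi_input() is a hypothetical helper, not part of pKOI, the toy values are made up for illustration, and the range check on p_value is an assumption about what counts as valid input.

```r
# Hypothetical helper (not part of pKOI): verify the required input columns.
check_pkoi_input <- function(df) {
  required <- c("uniprot_id", "logfc", "p_value")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumed sanity checks: numeric statistics and p-values within [0, 1].
  stopifnot(
    is.numeric(df$logfc),
    is.numeric(df$p_value),
    all(df$p_value >= 0 & df$p_value <= 1, na.rm = TRUE)
  )
  invisible(df)
}

# Toy input with made-up statistics for three UniProt accessions.
toy <- data.frame(
  uniprot_id = c("P04637", "P38398", "P00533"),
  logfc = c(1.2, -0.8, 0.1),
  p_value = c(0.001, 0.02, 0.6),
  stringsAsFactors = FALSE
)

check_pkoi_input(toy)  # returns the data.frame invisibly when all checks pass
```

A data.frame that lacks any of the three columns makes the helper stop with a message naming the missing columns.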
Use run_pkoi() to analyze proteomics data with the
default pKOI knowledge graph:
result <- run_pkoi(
  proteomics_data = example_data1,
  pvalue_threshold = 0.01,
  logfc_threshold = 0,
  topology_by = "degree",
  topology_similarity = 0.9,
  n_permutation = 10,
  damping_factor = 0.85,
  maximum_iteration = 500,
  subnetwork = pkoi_net,
  include_subnetwork = FALSE
)

This process:
- Filters the proteomics dataset
- Computes personalized PageRank for significant proteins
- Simulates a null distribution using topological matching
- Calculates empirical statistics
- Annotates network nodes with ontology-based information
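The personalized PageRank step can be illustrated on a toy graph. This is a minimal base-R sketch of the standard power-iteration formulation, not pKOI's implementation: personalized_pagerank(), the five-node path graph, and the convergence tolerance are assumptions made for illustration. The damping and max_iter defaults mirror damping_factor = 0.85 and maximum_iteration = 500 from the call above.

```r
# Toy personalized PageRank by power iteration (illustrative, not pKOI's code).
personalized_pagerank <- function(adj, seeds, damping = 0.85,
                                  max_iter = 500, tol = 1e-10) {
  stopifnot(all(colSums(adj) > 0))  # this sketch does not handle isolated nodes
  # Column-normalize so M[i, j] is the probability of stepping from j to i.
  M <- sweep(adj, 2, colSums(adj), "/")
  # Restart vector: uniform over the seed nodes, zero elsewhere.
  v <- setNames(numeric(nrow(adj)), rownames(adj))
  v[seeds] <- 1 / length(seeds)
  p <- v
  for (i in seq_len(max_iter)) {
    p_new <- damping * drop(M %*% p) + (1 - damping) * v
    converged <- sum(abs(p_new - p)) < tol
    p <- p_new
    if (converged) break
  }
  p
}

# Hypothetical five-node path graph: A - B - C - D - E.
nodes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- cbind(c("A", "B", "C", "D"), c("B", "C", "D", "E"))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # add the reverse direction to make the graph undirected

# Seed the walk at node A, standing in for a significant protein.
scores <- personalized_pagerank(adj, seeds = "A")
round(scores, 3)
```

The scores sum to 1, and nodes far from the seed (such as E) receive much less weight than the seed's immediate neighborhood.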
Exploring the Results
The output is an S4 object of class pKOIList:
result

To access the annotated results:

names(result@network_summary_statistics)

Each element in the list corresponds to a node type (e.g., “Anatomy”, “Pathway”, “Protein”). Here’s an example of inspecting enriched pathways:

head(result@network_summary_statistics$Pathway)

Each table contains:
| Column | Description |
|---|---|
| identifier | Node identifier (e.g., GO, DOID, UniProt) |
| pagerank | Personalized PageRank value |
| simulation_mean | Mean PageRank under null distribution |
| simulation_std | Standard deviation of null PageRank |
| beta | Z-score of observed vs. null PageRank |
| p_value | Empirical p-value for enrichment |
| fdr | False Discovery Rate (adjusted p-value) |
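To make these columns concrete, the sketch below derives them from an observed score vector and a matrix of null scores. It is a base-R illustration under stated assumptions: null_summary() is a hypothetical helper, the numbers are made up, and the specific formulas (a +1-corrected empirical p-value and Benjamini-Hochberg adjustment) are common choices that pKOI may or may not use.

```r
# Hypothetical helper: summarize observed scores against permutation nulls.
# null_scores has one row per node and one column per permutation.
null_summary <- function(observed, null_scores) {
  sim_mean <- rowMeans(null_scores)
  sim_std <- apply(null_scores, 1, sd)
  # Assumed empirical p-value: share of null scores at least as large as the
  # observed score, with +1 in numerator and denominator to avoid p = 0.
  p <- (1 + rowSums(null_scores >= observed)) / (1 + ncol(null_scores))
  data.frame(
    identifier = names(observed),
    pagerank = unname(observed),
    simulation_mean = unname(sim_mean),
    simulation_std = unname(sim_std),
    beta = unname((observed - sim_mean) / sim_std),  # z-score vs. the null
    p_value = unname(p),
    fdr = unname(p.adjust(p, method = "BH")),        # assumed BH adjustment
    row.names = NULL
  )
}

# Made-up example: two nodes, ten null permutations each.
null_draws <- c(0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.10, 0.12, 0.07, 0.08)
null_scores <- rbind(node_a = null_draws, node_b = null_draws)
observed <- c(node_a = 0.30, node_b = 0.10)

res <- null_summary(observed, null_scores)
res
```

With only ten permutations the smallest attainable empirical p-value under this formula is 1/11, which is one reason to raise n_permutation for a real analysis.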
Customizing the Topology Matching
The topology_by parameter controls how the algorithm
matches sham proteins for null simulation. Available options
include:
- "degree": number of direct neighbors
- "coreness": node’s k-core value
- "betweenness": number of shortest paths through node
- "closeness": inverse distance to other nodes
- "constraint": structural redundancy in neighborhood
- "eccentricity": longest shortest path to any other node
- "eigen_centrality": importance based on neighbors’ influence
- "transitivity": local clustering coefficient
Example using "betweenness":
run_pkoi(
  proteomics_data = example_data1,
  topology_by = "betweenness"
)

Visualization and Export
You can visualize or export any enriched node table:
write.csv(
  result@network_summary_statistics$Disease,
  "disease_enrichment.csv",
  row.names = FALSE
)

Conclusion
pKOI provides a flexible and interpretable framework for
identifying biologically relevant features from proteomic data using a
graph-based approach. By integrating ontology-based annotations with
network topology and permutation-based null models, it offers both
robustness and biological context.
For questions or contributions, email Wanjun Gu.