
Protein Annotations (UniProt-Based)
A reference annotation table mapping protein identifiers to gene symbols and functional descriptions. The dataset is derived primarily from UniProt and related sources, and is used within the pKOI framework to annotate nodes of type "Protein" in biological knowledge graphs.
Format
A data frame with 194076 rows and 3 columns:
- identifier
A character string representing the UniProt protein accession (e.g., "A0A023HJ61").
- name
The gene symbol or short name for the protein (e.g., "RAB4A"). This may be `NA` for uncharacterized or unnamed proteins.
- description
A textual description of the protein’s function or classification, including enzyme commission (EC) numbers, if available (e.g., "ATP synthase subunit a").
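Because `name` can be `NA`, code that displays protein nodes may want a fallback label. The following is a minimal sketch, not part of the package: it builds a two-row toy table with the same three columns (the second identifier and its description are hypothetical placeholders, not real UniProt entries) and falls back to the accession when no gene symbol is available. The `label` column is an assumption introduced only for illustration.

```r
# Toy stand-in for the packaged table; in real use, call data(protein_annotation).
# "HYPOTHETICAL1" and its description are invented placeholders.
annotation <- data.frame(
  identifier  = c("A0A023I7H5", "HYPOTHETICAL1"),
  name        = c("ATP6", NA),
  description = c("ATP synthase subunit a", "Uncharacterized protein"),
  stringsAsFactors = FALSE
)

# Use the gene symbol when present, otherwise fall back to the accession.
annotation$label <- ifelse(is.na(annotation$name),
                           annotation$identifier,
                           annotation$name)

# Count how many entries lack a gene symbol.
n_unnamed <- sum(is.na(annotation$name))
```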
Source
UniProt https://www.uniprot.org/
Details
This dataset supports the interpretation of graph-based analyses by providing biological context for protein nodes. Descriptions are derived from UniProt functional annotations, which may be automatically generated (for example, by RuleBase) or manually curated, with provenance indicated by ECO evidence codes.
The `identifier` column is intended to be used as a join key for protein-related nodes in the pKOI network output, while the `description` field adds depth to functional interpretation.
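The join described above can be sketched with base R's `merge()`. This is an illustrative example under stated assumptions, not the pKOI implementation: the `nodes` data frame, its `id` and `type` columns, and the `"NOT_IN_TABLE"` accession are hypothetical, and the small `annotation` table stands in for `protein_annotation`.

```r
# Toy stand-in for protein_annotation (rows taken from the Examples output).
annotation <- data.frame(
  identifier  = c("A0A023I7H5", "A0A023I7L8"),
  name        = c("ATP6", "ATP6"),
  description = c("ATP synthase subunit a", "ATP synthase subunit a"),
  stringsAsFactors = FALSE
)

# Hypothetical node table; the real pKOI output may use different column names.
nodes <- data.frame(
  id   = c("A0A023I7L8", "NOT_IN_TABLE", "A0A023I7H5"),
  type = "Protein",
  stringsAsFactors = FALSE
)

# Left join: keep every node, filling name/description with NA when
# the accession has no match in the annotation table.
annotated <- merge(nodes, annotation,
                   by.x = "id", by.y = "identifier",
                   all.x = TRUE)

# Nodes that could not be annotated.
unmatched <- annotated$id[is.na(annotated$description)]
```

Using `all.x = TRUE` keeps unannotated nodes in the result, so missing annotations show up as `NA` rather than silently dropping nodes from the graph.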
Examples
data(protein_annotation)
subset(protein_annotation, grepl("ATP", description, ignore.case = TRUE)) |> head()
#>    identifier name            description
#> 6  A0A023I7H5 ATP6 ATP synthase subunit a
#> 8  A0A023I7L8 ATP6 ATP synthase subunit a
#> 10 A0A023I7N7 ATP6 ATP synthase subunit a
#> 14 A0A023I7V4 ATP6 ATP synthase subunit a
#> 22 A0A023I889 ATP6 ATP synthase subunit a
#> 27 A0A023I8F1 ATP6 ATP synthase subunit a