
A reference annotation table mapping protein identifiers to gene symbols and functional descriptions. The dataset is derived primarily from UniProt and related sources, and is used within the pKOI framework to annotate nodes of type "Protein" in biological knowledge graphs.

Usage

protein_annotation

Format

A data frame with 194076 rows and 3 columns:

identifier

A character string representing the UniProt protein accession (e.g., "A0A023HJ61").

name

The gene symbol or short name for the protein (e.g., "RAB4A"). This may be `NA` for uncharacterized or unnamed proteins.

description

A textual description of the protein’s function or classification (e.g., "ATP synthase subunit a"), including Enzyme Commission (EC) numbers where available.
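Because `name` may be `NA`, code that labels protein nodes may need a fallback. The following is a minimal sketch, assuming the accession is an acceptable substitute label. It uses a small stand-in data frame rather than the packaged dataset; the identifier `"EXAMPLE01"`, its `NA` values, and the `label` column are hypothetical and only illustrate the pattern.

```r
# Stand-in with the same three columns as protein_annotation.
# "EXAMPLE01" is a made-up placeholder, not a real UniProt accession.
annotation <- data.frame(
  identifier  = c("A0A023I7H5", "EXAMPLE01"),
  name        = c("ATP6", NA),
  description = c("ATP synthase subunit a", NA),
  stringsAsFactors = FALSE
)

# Use the gene symbol when present, otherwise fall back to the accession.
# `label` is a hypothetical helper column, not part of the dataset.
annotation$label <- ifelse(
  is.na(annotation$name),
  annotation$identifier,
  annotation$name
)

annotation[, c("identifier", "label")]
```

With the real data, the same expression can be applied to `protein_annotation` directly.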

Source

UniProt https://www.uniprot.org/

Details

This dataset supports the interpretation of graph-based analyses by providing biological context for protein nodes. Descriptions are derived from automated or curated UniProt functional annotations, whose provenance is recorded through sources such as RuleBase and ECO evidence codes.

The `identifier` column is intended as the join key for protein-related nodes in the pKOI network output, while the `description` field supplies the functional context for interpreting those nodes.
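A join on `identifier` might look like the sketch below. The layout of the pKOI network output is not described here, so the `nodes` data frame and its `id` and `type` columns are assumptions, and `"EXAMPLE01"` is a made-up placeholder for a node without an annotation entry. A stand-in annotation table keeps the example self-contained.

```r
# Stand-in for protein_annotation, using rows from the Examples output.
annotation <- data.frame(
  identifier  = c("A0A023I7H5", "A0A023I7L8", "A0A023I7N7"),
  name        = c("ATP6", "ATP6", "ATP6"),
  description = rep("ATP synthase subunit a", 3),
  stringsAsFactors = FALSE
)

# Hypothetical node table; the column names `id` and `type` are assumed.
nodes <- data.frame(
  id   = c("A0A023I7H5", "A0A023I7N7", "EXAMPLE01"),
  type = c("Protein", "Protein", "Protein"),
  stringsAsFactors = FALSE
)

# Left join: keep every node, attach name and description where available.
annotated <- merge(
  nodes, annotation,
  by.x = "id", by.y = "identifier",
  all.x = TRUE
)

annotated
```

Setting `all.x = TRUE` keeps nodes that have no match in the annotation table; their `name` and `description` come back as `NA` instead of the rows being dropped.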

Examples

data(protein_annotation)
subset(protein_annotation, grepl("ATP", description, ignore.case = TRUE)) |> head()
#>    identifier name            description
#> 6  A0A023I7H5 ATP6 ATP synthase subunit a
#> 8  A0A023I7L8 ATP6 ATP synthase subunit a
#> 10 A0A023I7N7 ATP6 ATP synthase subunit a
#> 14 A0A023I7V4 ATP6 ATP synthase subunit a
#> 22 A0A023I889 ATP6 ATP synthase subunit a
#> 27 A0A023I8F1 ATP6 ATP synthase subunit a