
Protein Domain Annotations (Pfam)
proteindomain_annotation.Rd
A curated annotation table of protein domains based on Pfam identifiers. This dataset provides structured mappings from Pfam domain IDs to domain names, supporting the annotation and interpretation of domain-level biological features in knowledge graphs used by the pKOI framework.
Format
A data frame with 14193 rows and 2 columns:
- identifier
A character string representing the Pfam domain identifier (e.g., "PF19543").
- name
The name or description of the domain, often including structural or functional attributes (e.g., "Glycoside hydrolase 123, N-terminal domain").
Source
Pfam Database https://pfam.xfam.org/
Details
Protein domains represent conserved functional or structural units within proteins. This dataset enables enrichment and network-based inference at the domain level. The `identifier` column can be joined to graph nodes representing domains in pKOI output, while the `name` column adds interpretability for visualization and reporting.
Domains include a variety of structural motifs, enzyme catalytic regions, and uncharacterized conserved segments (e.g., domains of unknown function, or DUFs).
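The join described above might look roughly like the following sketch. The `nodes` table is a hypothetical stand-in for pKOI output, and its column names `node_id` and `score` are assumptions, not part of the package. A two-row `annotation` table is used in place of `proteindomain_annotation` so the snippet runs without the package installed; the identifier "PF99999" is a made-up value that illustrates an unmatched node.

```r
# Hypothetical node table standing in for pKOI output.
# `node_id` and `score` are assumed column names, not the package's API.
nodes <- data.frame(
  node_id = c("PF19543", "PF00657", "PF99999"),
  score   = c(0.91, 0.47, 0.12),
  stringsAsFactors = FALSE
)

# Small stand-in for proteindomain_annotation (same two columns).
annotation <- data.frame(
  identifier = c("PF19543", "PF00657"),
  name       = c("Glycoside hydrolase 123, N-terminal domain",
                 "GDSL-like Lipase/Acylhydrolase"),
  stringsAsFactors = FALSE
)

# Left join: keep every node and attach the domain name where the
# identifier matches; unmatched nodes get NA in `name`.
annotated <- merge(nodes, annotation,
                   by.x = "node_id", by.y = "identifier",
                   all.x = TRUE)
annotated
```

Using `all.x = TRUE` keeps nodes that have no annotation entry, which makes missing or retired identifiers visible as `NA` rather than silently dropping them. With the real dataset, `annotation` would be replaced by `proteindomain_annotation`.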
Examples
data(proteindomain_annotation)
subset(proteindomain_annotation, grepl("hydrolase", name, ignore.case = TRUE)) |> head()
#>     identifier                                                         name
#> 1      PF19543                   Glycoside hydrolase 123, N-terminal domain
#> 46     PF00657                               GDSL-like Lipase/Acylhydrolase
#> 144    PF00933                Glycosyl hydrolase family 3 N terminal domain
#> 170    PF08307               Glycosyl hydrolase family 98 C-terminal domain
#> 224    PF02964                 Methane monooxygenase, hydrolase gamma chain
#> 231    PF02275 Linear amide C-N hydrolases, choloylglycine hydrolase family