Protein Domain Annotations (Pfam) — proteindomain

A curated annotation table that maps Pfam domain identifiers to domain names. It supports the annotation and interpretation of domain-level biological features in the knowledge graphs used by the pKOI framework.

Usage

proteindomain_annotation

Format

A data frame with 14193 rows and 2 columns:

identifier: A character string representing the Pfam domain identifier (e.g., "PF19543").
name: The name or description of the domain, often including structural or functional attributes (e.g., "Glycoside hydrolase 123, N-terminal domain").
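The format above can be checked with a small base-R helper. This is a minimal sketch, not part of pKOI. The function name `check_pfam_table()` is hypothetical. The toy rows are copied from the Examples output below. The helper assumes every identifier is a Pfam family accession of the form "PF" followed by five digits.

```r
# Hypothetical helper (not a pKOI function): checks that a data frame has the
# documented layout, i.e. two character columns named `identifier` and `name`,
# and reports whether every identifier looks like a Pfam accession.
check_pfam_table <- function(df) {
  stopifnot(
    is.data.frame(df),
    identical(names(df), c("identifier", "name")),
    is.character(df$identifier),
    is.character(df$name)
  )
  # Assumption: Pfam family accessions are "PF" followed by exactly five digits
  all(grepl("^PF[0-9]{5}$", df$identifier))
}

# Toy subset built from rows shown in the Examples output
toy <- data.frame(
  identifier = c("PF19543", "PF00657", "PF00933"),
  name = c("Glycoside hydrolase 123, N-terminal domain",
           "GDSL-like Lipase/Acylhydrolase",
           "Glycosyl hydrolase family 3 N terminal domain"),
  stringsAsFactors = FALSE
)
check_pfam_table(toy)
#> [1] TRUE
```

With the package loaded, the same call could be applied to `proteindomain_annotation`. A result of `FALSE` would mean that at least one identifier does not follow the assumed accession pattern.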

Source

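Pfam database, https://pfam.xfam.org/ (the standalone Pfam website has since been retired; Pfam is now served through InterPro at https://www.ebi.ac.uk/interpro/)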

Details

Protein domains are conserved functional or structural units within proteins. This table supports enrichment analysis and network-based inference at the domain level. The `identifier` column can be joined to the graph nodes that represent domains in pKOI output, and the `name` column supplies human-readable labels for visualization and reporting.
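A minimal base-R sketch of joining domain nodes to their names, assuming the nodes arrive as a data frame with a `node_id` column holding Pfam accessions. The `nodes` table, its `node_id` column, and the unannotated ID `PF99999` are hypothetical and were chosen only for illustration. They are not taken from pKOI's actual output format. The annotation rows are copied from the Examples output below.

```r
# Toy annotation subset (rows from the Examples output)
annotation <- data.frame(
  identifier = c("PF19543", "PF00657", "PF02275"),
  name = c("Glycoside hydrolase 123, N-terminal domain",
           "GDSL-like Lipase/Acylhydrolase",
           "Linear amide C-N hydrolases, choloylglycine hydrolase family"),
  stringsAsFactors = FALSE
)

# Hypothetical domain nodes; PF99999 stands in for a node with no annotation
nodes <- data.frame(
  node_id = c("PF00657", "PF02275", "PF99999"),
  stringsAsFactors = FALSE
)

# Left join: keep every node and attach a readable name where one exists
labelled <- merge(nodes, annotation,
                  by.x = "node_id", by.y = "identifier",
                  all.x = TRUE, sort = FALSE)

# Fall back to the raw accession when no name is available
labelled$label <- ifelse(is.na(labelled$name), labelled$node_id, labelled$name)
labelled[order(labelled$node_id), c("node_id", "label")]
```

With the package loaded, `annotation` would be replaced by `proteindomain_annotation`. A left join (`all.x = TRUE`) keeps domains that lack a curated name in the graph instead of silently dropping them. `dplyr::left_join()` would be an equivalent choice.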

The domains span structural motifs, catalytic regions of enzymes, and uncharacterized conserved segments such as DUFs (domains of unknown function).

Examples

data(proteindomain_annotation)
subset(proteindomain_annotation, grepl("hydrolase", name, ignore.case = TRUE)) |> head()
#>     identifier                                                         name
#> 1      PF19543                   Glycoside hydrolase 123, N-terminal domain
#> 46     PF00657                               GDSL-like Lipase/Acylhydrolase
#> 144    PF00933                Glycosyl hydrolase family 3 N terminal domain
#> 170    PF08307               Glycosyl hydrolase family 98 C-terminal domain
#> 224    PF02964                 Methane monooxygenase, hydrolase gamma chain
#> 231    PF02275 Linear amide C-N hydrolases, choloylglycine hydrolase family