
A curated annotation table of protein domains based on Pfam identifiers. This dataset provides structured mappings from Pfam domain IDs to domain names, supporting the annotation and interpretation of domain-level biological features in knowledge graphs used by the tKOI framework.

Usage

proteindomain_annotation

Format

A data frame with 14193 rows and 2 columns:

identifier

A character string representing the Pfam domain identifier (e.g., "PF19543").

name

The name or description of the domain, often including structural or functional attributes (e.g., "Glycoside hydrolase 123, N-terminal domain").

Source

Pfam Database https://pfam.xfam.org/

Details

Protein domains are conserved functional or structural units within proteins. This dataset supports domain-level enrichment and network-based inference by giving each Pfam identifier a human-readable name. The identifier column can be joined to graph nodes representing domains in tKOI output, while the name column adds interpretability for visualization and reporting.
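
The join described above can be sketched in base R. This is a minimal illustration, not the tKOI workflow itself: the `nodes` table and its `node_id` and `score` columns are hypothetical, `PF99999` is a made-up identifier used to show an unmatched node, and `annotation` is a three-row stand-in for `proteindomain_annotation` built from rows shown in the Examples output.

```r
# Toy stand-in for proteindomain_annotation (same two columns).
annotation <- data.frame(
  identifier = c("PF19543", "PF00657", "PF00933"),
  name = c(
    "Glycoside hydrolase 123, N-terminal domain",
    "GDSL-like Lipase/Acylhydrolase",
    "Glycosyl hydrolase family 3 N terminal domain"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical domain nodes; column names are assumptions, not tKOI output.
nodes <- data.frame(
  node_id = c("PF00933", "PF19543", "PF99999"),  # PF99999 is invented
  score = c(0.8, 0.5, 0.1),
  stringsAsFactors = FALSE
)

# Left join: keep every node and attach a name where the identifier matches.
annotated <- merge(
  nodes, annotation,
  by.x = "node_id", by.y = "identifier",
  all.x = TRUE
)

# Nodes without an annotation keep NA in the name column.
annotated[is.na(annotated$name), ]
```

Using `all.x = TRUE` keeps unannotated nodes visible instead of silently dropping them, which makes gaps in the mapping easier to spot before plotting or reporting.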

Domains include a variety of structural motifs, enzyme catalytic regions, and uncharacterized conserved segments (e.g., DUFs).
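
As a rough sketch of how uncharacterized domains might be picked out, the snippet below flags names that contain a DUF label and extracts that label. It assumes the label appears in the name as "DUF" followed by digits, as in the two DUF rows visible in the Examples output; domains of unknown function that are named differently would be missed. The `annotation` table is a small stand-in, and `duf_id` is a column introduced here for illustration.

```r
# Small stand-in for proteindomain_annotation, using rows from the Examples output.
annotation <- data.frame(
  identifier = c("PF06342", "PF07176", "PF00657"),
  name = c(
    "Alpha/beta hydrolase of unknown function (DUF1057)",
    "Alpha/beta hydrolase of unknown function (DUF1400)",
    "GDSL-like Lipase/Acylhydrolase"
  ),
  stringsAsFactors = FALSE
)

# Assumed pattern: "DUF" followed by one or more digits.
duf_pattern <- "DUF[0-9]+"

# Keep only rows whose name carries a DUF label.
duf_rows <- annotation[grepl(duf_pattern, annotation$name), ]

# Extract the label itself into a new, illustrative column.
duf_rows$duf_id <- regmatches(duf_rows$name, regexpr(duf_pattern, duf_rows$name))

duf_rows[, c("identifier", "duf_id")]
```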

Examples

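# Load the annotation table shipped with the package.
data(proteindomain_annotation)
# List every domain whose name mentions "hydrolase" (case-insensitive).
subset(proteindomain_annotation, grepl("hydrolase", name, ignore.case = TRUE))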
#>       identifier
#> 1        PF19543
#> 46       PF00657
#> 144      PF00933
#> 170      PF08307
#> 224      PF02964
#> 231      PF02275
#> 356      PF13318
#> 362      PF01670
#> 411      PF03512
#> 470      PF03636
#> 610      PF14508
#> 648      PF01915
#> 667      PF14323
#> 788      PF02638
#> 793      PF02550
#> 827      PF09370
#> 897      PF19420
#> 930      PF16923
#> 978      PF06342
#> 980      PF00443
#> 1101     PF16317
#> 1226     PF18271
#> 1278     PF00670
#> 1303     PF01227
#> 1311     PF13647
#> 1322     PF12471
#> 1343     PF02156
#> 1352     PF10605
#> 1417     PF16255
#> 1519     PF14196
#> 1583     PF13869
#> 1584     PF03747
#> 1667     PF07745
#> 1689     PF04273
#> 1700     PF18089
#> 1928     PF00723
#> 1931     PF05908
#> 1993     PF00722
#> 2006     PF17433
#> 2047     PF02011
#> 2114     PF16875
#> 2131     PF12215
#> 2190     PF02057
#> 2247     PF14587
#> 2340     PF03662
#> 2378     PF01183
#> 2491     PF00332
#> 2532     PF04996
#> 2552     PF06439
#> 2582     PF09752
#> 2618     PF20091
#> 2619     PF14509
#> 2647     PF00704
#> 2673     PF13423
#> 2868     PF03663
#> 3098     PF17167
#> 3153     PF02633
#> 3171     PF05838
#> 3185     PF01074
#> 3480     PF12697
#> 3509     PF12535
#> 3515     PF11308
#> 3652     PF08282
#> 3658     PF18565
#> 3832     PF18034
#> 3922     PF02015
#> 4000     PF08472
#> 4046     PF10566
#> 4078     PF10340
#> 4143     PF15979
#> 4146     PF07858
#> 4157     PF16862
#> 4172     PF18564
#> 4178     PF03537
#> 4193     PF02882
#> 4302     PF11790
#> 4314     PF01156
#> 4320     PF07461
#> 4326     PF00925
#> 4327     PF05165
#> 4429     PF07488
#> 4459     PF03403
#> 4524     PF02148
#> 4547     PF01503
#> 4610     PF01981
#> 4658     PF12891
#> 4660     PF07826
#> 4781     PF07859
#> 4865     PF14885
#> 4917     PF02837
#> 4963     PF17189
#> 4994     PF08306
#> 5028     PF14871
#> 5126     PF19718
#> 5328     PF02838
#> 5366     PF00150
#> 5390     PF01738
#> 5490     PF08244
#> 5628     PF17451
#> 5705     PF14883
#> 5801     PF03648
#> 6030     PF13419
#> 6102     PF13199
#> 6130     PF06259
#> 6132     PF09081
#> 6141     PF00728
#> 6154     PF03959
#> 6164     PF02836
#> 6204     PF06399
#> 6233     PF12695
#> 6237     PF00702
#> 6754     PF16123
#> 6755     PF17387
#> 6763     PF03664
#> 6802     PF04685
#> 6880     PF01301
#> 6967     PF01270
#> 6981     PF12715
#> 7028     PF00232
#> 7165     PF01374
#> 7255     PF03659
#> 7278     PF18031
#> 7435     PF12888
#> 7500     PF14736
#> 7576     PF04909
#> 7620     PF00457
#> 7686     PF17678
#> 7694     PF03632
#> 7697     PF13336
#> 7756     PF01088
#> 7948     PF02649
#> 8047     PF07521
#> 8106     PF02055
#> 8178     PF04616
#> 8189     PF17890
#> 8397     PF07477
#> 8484     PF01195
#> 8561     PF02324
#> 8636     PF08840
#> 8652     PF17652
#> 8694     PF12710
#> 8978     PF01979
#> 9013     PF09127
#> 9050     PF17677
#> 9124     PF03065
#> 9176     PF07486
#> 9192     PF05116
#> 9217     PF10230
#> 9506     PF03633
#> 9716     PF14872
#> 9724     PF03200
#> 9729     PF18230
#> 9732     PF03644
#> 9766     PF00295
#> 9805     PF18088
#> 9980     PF11975
#> 9991     PF03639
#> 10035    PF16874
#> 10057    PF16674
#> 10089    PF02289
#> 10090    PF03718
#> 10203    PF13472
#> 10273    PF07969
#> 10336    PF01557
#> 10408    PF01229
#> 10445    PF00703
#> 10481    PF05221
#> 10559    PF05013
#> 10619    PF00251
#> 10626    PF07176
#> 10659    PF20408
#> 10842    PF07335
#> 10878    PF10081
#> 11021    PF16822
#> 11038    PF09663
#> 11073    PF00795
#> 11088    PF00331
#> 11215    PF00561
#> 11263    PF09994
#> 11385    PF13872
#> 11420    PF03819
#> 11586    PF01341
#> 11683    PF00840
#> 11693    PF01532
#> 11700    PF01373
#> 11838    PF18438
#> 11913    PF00759
#> 11916    PF02056
#> 11935    PF13286
#> 12016    PF18290
#> 12044    PF14498
#> 12103    PF04447
#> 12158    PF13344
#> 12201    PF07748
#> 12302    PF04083
#> 12343    PF20309
#> 12352    PF04307
#> 12441    PF06441
#> 12459    PF17829
#> 12516    PF06821
#> 12597    PF07698
#> 12653    PF00763
#> 12724    PF01502
#> 12727    PF13200
#> 12854    PF04775
#> 13003    PF05990
#> 13019    PF07470
#> 13065    PF06028
#> 13075    PF14606
#> 13085    PF05382
#> 13161    PF07971
#> 13247    PF21365
#> 13258    PF07632
#> 13270    PF21246
#> 13290    PF08924
#> 13388    PF13320
#> 13426    PF16141
#> 13471    PF13802
#> 13519    PF21252
#> 13526    PF20811
#> 13529    PF21570
#> 13651    PF21087
#> 13658    PF05028
#> 13875    PF01055
#> 13941    PF21104
#> 14018    PF16347
#>                                                                       name
#> 1                               Glycoside hydrolase 123, N-terminal domain
#> 46                                          GDSL-like Lipase/Acylhydrolase
#> 144                          Glycosyl hydrolase family 3 N terminal domain
#> 170                         Glycosyl hydrolase family 98 C-terminal domain
#> 224                           Methane monooxygenase, hydrolase gamma chain
#> 231           Linear amide C-N hydrolases, choloylglycine hydrolase family
#> 356                            1-carboxybiuret hydrolase subunit AtzG-like
#> 362                                           Glycosyl hydrolase family 12
#> 411                                           Glycosyl hydrolase family 52
#> 470                       Glycosyl hydrolase family 65, N-terminal domain 
#> 610                                       Glycosyl-hydrolase 97 N-terminal
#> 648                          Glycosyl hydrolase family 3 C-terminal domain
#> 667                 GxGYxYP putative glycoside hydrolase C-terminal domain
#> 788                                             Glycosyl hydrolase-like 10
#> 793                     Acetyl-CoA hydrolase/transferase N-terminal domain
#> 827                                     Phosphoenolpyruvate hydrolase-like
#> 897                     N,N dimethylarginine dimethylhydrolase, eukaryotic
#> 930                         Glycosyl hydrolase family 63 N-terminal domain
#> 978                     Alpha/beta hydrolase of unknown function (DUF1057)
#> 980                                  Ubiquitin carboxyl-terminal hydrolase
#> 1101                                          Glycosyl hydrolase family 99
#> 1226                   Glycoside hydrolase 131 catalytic N-terminal domain
#> 1278               S-adenosyl-L-homocysteine hydrolase, NAD binding domain
#> 1303                                                  GTP cyclohydrolase I
#> 1311                         Glycosyl hydrolase family 80 of chitosanase A
#> 1322                                        GTP cyclohydrolase N terminal 
#> 1343                                          Glycosyl hydrolase family 26
#> 1352                                       3HB-oligomer hydrolase (3HBOH) 
#> 1417                                        GDSL-like Lipase/Acylhydrolase
#> 1519                      L-2-amino-thiazoline-4-carboxylic acid hydrolase
#> 1583                                                  Nucleotide hydrolase
#> 1584                                             ADP-ribosylglycohydrolase
#> 1667                                          Glycosyl hydrolase family 53
#> 1689        Beta-lactamase hydrolase-like protein, phosphatase-like domain
#> 1700                                            DAPG hydrolase PhiG domain
#> 1928                                         Glycosyl hydrolases family 15
#> 1931                                        Poly-gamma-glutamate hydrolase
#> 1993                                         Glycosyl hydrolases family 16
#> 2006                Glycosyl hydrolase family 49 N-terminal Ig-like domain
#> 2047                                          Glycosyl hydrolase family 48
#> 2114                        Glycosyl hydrolase family 36 N-terminal domain
#> 2131              beta-glucosidase 2, glycosyl-hydrolase family 116 N-term
#> 2190                                          Glycosyl hydrolase family 59
#> 2247                                        O-Glycosyl hydrolase family 30
#> 2340                      Glycosyl hydrolase family 79, N-terminal domain 
#> 2378                                         Glycosyl hydrolases family 25
#> 2491                                         Glycosyl hydrolases family 17
#> 2532                                          Succinylarginine dihydrolase
#> 2552                                         3-keto-disaccharide hydrolase
#> 2582                            Alpha/beta hydrolase domain containing 18 
#> 2618                                           Alpha/beta hydrolase domain
#> 2619                     Glycosyl-hydrolase 97 C-terminal, oligomerisation
#> 2647                                         Glycosyl hydrolases family 18
#> 2673                                 Ubiquitin carboxyl-terminal hydrolase
#> 2868                                         Glycosyl hydrolase family 76 
#> 3098                   Glycosyl hydrolase 36 superfamily, catalytic domain
#> 3153                                             Creatinine amidohydrolase
#> 3171                                                Glycosyl hydrolase 108
#> 3185                       Glycosyl hydrolases family 38 N-terminal domain
#> 3480                                           Alpha/beta hydrolase family
#> 3509              Hydrolase of X-linked nucleoside diphosphate N terminal 
#> 3515                    Glycosyl hydrolases related to GH101 family, GH129
#> 3652                                  haloacid dehalogenase-like hydrolase
#> 3658                      Glycoside hydrolase family 2 C-terminal domain 5
#> 3832               Bacterial Glycosyl hydrolase family 3 C-terminal domain
#> 3922                                          Glycosyl hydrolase family 45
#> 4000                       Sucrose-6-phosphate phosphohydrolase C-terminal
#> 4046                                              Glycoside hydrolase 97  
#> 4078                                               Steryl acetyl hydrolase
#> 4143                                         Glycosyl hydrolase family 115
#> 4146                       Limonene-1,2-epoxide hydrolase catalytic domain
#> 4157                   Glycosyl hydrolase family 79 C-terminal beta domain
#> 4172                        Glycoside hydrolase family 5 C-terminal domain
#> 4178                                      Glycoside-hydrolase family GH114
#> 4193  Tetrahydrofolate dehydrogenase/cyclohydrolase, NAD(P)-binding domain
#> 4302                                     Glycosyl hydrolase catalytic core
#> 4314                       Inosine-uridine preferring nucleoside hydrolase
#> 4320                 Nicotine adenine dinucleotide glycohydrolase (NADase)
#> 4326                                                 GTP cyclohydrolase II
#> 4327                                                GTP cyclohydrolase III
#> 4429                            Glycosyl hydrolase family 67 middle domain
#> 4459                Platelet-activating factor acetylhydrolase, isoform II
#> 4524                   Zn-finger in ubiquitin-hydrolases and other protein
#> 4547                               Phosphoribosyl-ATP pyrophosphohydrolase
#> 4610                                          Peptidyl-tRNA hydrolase PTH2
#> 4658                                         Glycoside hydrolase family 44
#> 4660                                       IMP cyclohydrolase-like protein
#> 4781                                             alpha/beta hydrolase fold
#> 4865                             Hypothetical glycosyl hydrolase family 15
#> 4917                    Glycosyl hydrolases family 2, sugar binding domain
#> 4963                     Glycosyl hydrolase family 30 beta sandwich domain
#> 4994                                          Glycosyl hydrolase family 98
#> 5028                                     Hypothetical glycosyl hydrolase 6
#> 5126                   Ubiquitin carboxyl-terminal hydrolase 47 C-terminal
#> 5328                                Glycosyl hydrolase family 20, domain 2
#> 5366                               Cellulase (glycosyl hydrolase family 5)
#> 5390                                         Dienelactone hydrolase family
#> 5490                              Glycosyl hydrolases family 32 C terminal
#> 5628                           Glycosyl hydrolase 101 beta sandwich domain
#> 5705                             Hypothetical glycosyl hydrolase family 13
#> 5801                               Glycosyl hydrolase family 67 N-terminus
#> 6030                                  Haloacid dehalogenase-like hydrolase
#> 6102                                          Glycosyl hydrolase family 66
#> 6130                                                  Alpha/beta hydrolase
#> 6132                       Glucan 1,4-alpha-maltotetraohydrolase, domain C
#> 6141                        Glycosyl hydrolase family 20, catalytic domain
#> 6154                                               Serine hydrolase (FSH1)
#> 6164                       Glycosyl hydrolases family 2, TIM barrel domain
#> 6204               GTP cyclohydrolase I feedback regulatory protein (GFRP)
#> 6233                                           Alpha/beta hydrolase family
#> 6237                                  haloacid dehalogenase-like hydrolase
#> 6754                           Hydroxyacylglutathione hydrolase C-terminus
#> 6755                           Glycosyl hydrolase family 59 central domain
#> 6763                                         Glycosyl hydrolase family 62 
#> 6802                       Glycosyl-hydrolase family 116, catalytic region
#> 6880                                         Glycosyl hydrolases family 35
#> 6967                                          Glycosyl hydrolases family 8
#> 6981                                                    Abhydrolase family
#> 7028                                           Glycosyl hydrolase family 1
#> 7165                                          Glycosyl hydrolase family 46
#> 7255                                          Glycosyl hydrolase family 71
#> 7278                                Ubiquitin carboxyl-terminal hydrolases
#> 7435                                      Lipid-binding putative hydrolase
#> 7500                          Protein N-terminal asparagine amidohydrolase
#> 7576                                                        Amidohydrolase
#> 7620                                         Glycosyl hydrolases family 11
#> 7686                        Glycosyl hydrolase family 92 N-terminal domain
#> 7694                 Glycosyl hydrolase family 65 central catalytic domain
#> 7697                    Acetyl-CoA hydrolase/transferase C-terminal domain
#> 7756                       Ubiquitin carboxyl-terminal hydrolase, family 1
#> 7948                                       Type I GTP cyclohydrolase folE2
#> 8047                 Zn-dependent metallo-hydrolase RNA specificity domain
#> 8106                        Glycosyl hydrolase family 30 TIM-barrel domain
#> 8178                                         Glycosyl hydrolases family 43
#> 8189                          Peptidoglycan hydrolase LytB WW-like domain 
#> 8397                               Glycosyl hydrolase family 67 C-terminus
#> 8484                                               Peptidyl-tRNA hydrolase
#> 8561                                          Glycosyl hydrolase family 70
#> 8636                        BAAT / Acyl-CoA thioester hydrolase C terminal
#> 8652                        Glycosyl hydrolase family 81 C-terminal domain
#> 8694                                  haloacid dehalogenase-like hydrolase
#> 8978                                                 Amidohydrolase family
#> 9013                                  Leukotriene A4 hydrolase, C-terminal
#> 9050         Glycosyl hydrolases family 38 C-terminal beta sandwich domain
#> 9124                                          Glycosyl hydrolase family 57
#> 9176                                                   Cell Wall Hydrolase
#> 9192                                 Sucrose-6F-phosphate phosphohydrolase
#> 9217                                    Lipid-droplet associated hydrolase
#> 9506                      Glycosyl hydrolase family 65, C-terminal domain 
#> 9716                                    Hypothetical glycoside hydrolase 5
#> 9724                        Glycosyl hydrolase family 63 C-terminal domain
#> 9729                   Glycosyl hydrolases family 38 C-terminal sub-domain
#> 9732                                         Glycosyl hydrolase family 85 
#> 9766                                         Glycosyl hydrolases family 28
#> 9805                             Glycoside Hydrolase 20C C-terminal domain
#> 9980                         Family 4 glycosyl hydrolase C-terminal domain
#> 9991                        Glycosyl hydrolase family 81 N-terminal domain
#> 10035                       Glycosyl hydrolase family 36 C-terminal domain
#> 10057               N-terminal of ubiquitin carboxyl-terminal hydrolase 37
#> 10089                                                 Cyclohydrolase (MCH)
#> 10090                                         Glycosyl hydrolase family 49
#> 10203                                GDSL-like Lipase/Acylhydrolase family
#> 10273                                                Amidohydrolase family
#> 10336                           Fumarylacetoacetate (FAA) hydrolase family
#> 10408                                        Glycosyl hydrolases family 39
#> 10445                                         Glycosyl hydrolases family 2
#> 10481                                  S-adenosyl-L-homocysteine hydrolase
#> 10559                                     N-formylglutamate amidohydrolase
#> 10619                      Glycosyl hydrolases family 32 N-terminal domain
#> 10626                   Alpha/beta hydrolase of unknown function (DUF1400)
#> 10659                                          Alpha/beta hydrolase domain
#> 10842                    Fungal chitosanase of glycosyl hydrolase group 75
#> 10878                                          Alpha/beta-hydrolase family
#> 11021                   SGNH hydrolase-like domain, acetyltransferase AlgX
#> 11038                Amidohydrolase ring-opening protein (Amido_AtzD_TrzD)
#> 11073                                            Carbon-nitrogen hydrolase
#> 11088                                         Glycosyl hydrolase family 10
#> 11215                                            alpha/beta hydrolase fold
#> 11263                Uncharacterized alpha/beta hydrolase domain (DUF2235)
#> 11385                               P-loop containing NTP hydrolase pore-1
#> 11420                          MazG nucleotide pyrophosphohydrolase domain
#> 11586                                         Glycosyl hydrolases family 6
#> 11683                                          Glycosyl hydrolase family 7
#> 11693                                         Glycosyl hydrolase family 47
#> 11700                                         Glycosyl hydrolase family 14
#> 11838                    Glycosyl hydrolases family 38 C-terminal domain 1
#> 11913                                          Glycosyl hydrolase family 9
#> 11916                                          Family 4 glycosyl hydrolase
#> 11935                                   Phosphohydrolase-associated domain
#> 12016                                               Nudix hydrolase domain
#> 12044                      Glycosyl hydrolase family 65, N-terminal domain
#> 12103                                       dATP/dGTP pyrophosphohydrolase
#> 12158                                 Haloacid dehalogenase-like hydrolase
#> 12201                      Glycosyl hydrolases family 38 C-terminal domain
#> 12302                           Partial alpha/beta-hydrolase lipase region
#> 12343             Deoxyribohydrolase (DRHyd) domain of the ASK signalosome
#> 12352           LexA-binding, inner membrane-associated putative hydrolase
#> 12441                                         Epoxide hydrolase N terminus
#> 12459                      Gylcosyl hydrolase family 115 C-terminal domain
#> 12516                                                     Serine hydrolase
#> 12597                         7TM receptor with intracellular HD hydrolase
#> 12653      Tetrahydrofolate dehydrogenase/cyclohydrolase, catalytic domain
#> 12724                                    Phosphoribosyl-AMP cyclohydrolase
#> 12727                                   Putative glycosyl hydrolase domain
#> 12854                  Acyl-CoA thioester hydrolase/BAAT N-terminal region
#> 13003                    Alpha/beta hydrolase of unknown function (DUF900)
#> 13019                                         Glycosyl Hydrolase Family 88
#> 13065                    Alpha/beta hydrolase of unknown function (DUF915)
#> 13075                                GDSL-like Lipase/Acylhydrolase family
#> 13085                               Bacteriophage peptidoglycan hydrolase 
#> 13161                        Glycosyl hydrolase family 92 catalytic domain
#> 13247                       Glycosyl hydrolase family 31 C-terminal domain
#> 13258           Cellulose-binding Sde182, nucleoside hydrolase-like domain
#> 13270     Ubiquitin carboxyl-terminal hydrolase 38-like, N-terminal domain
#> 13290                        Rv2525c-like, glycoside hydrolase-like domain
#> 13388                            Glycoside hydrolase 123, catalytic domain
#> 13426                           Glycoside hydrolase family 18, BT1044-like
#> 13471    Glycosyl hydrolase 31 N-terminal galactose mutarotase-like domain
#> 13519                            Glycosyl hydrolase 109, C-terminal domain
#> 13526              Poly (ADP-ribose) glycohydrolase (PARG), helical domain
#> 13529            Arginine dihydrolase ArgZ-like, C-terminal, Rossmann fold
#> 13651                                        Glycosyl hydrolase family 134
#> 13658           Poly (ADP-ribose) glycohydrolase (PARG), Macro domain fold
#> 13875                      Glycosyl hydrolases family 31 TIM-barrel domain
#> 13941    Glycosyl hydrolase family 78 alpha-rhamnosidase N-terminal domain
#> 14018                      N-sulphoglucosamine sulphohydrolase, C-terminal