The goal of the GO Reference Genome Project, described in PMID 19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
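The inheritance step can be pictured as a walk over the tree. The sketch below is a minimal illustration, not PAINT's implementation. The dictionary-based tree, the node and gene names, and the `infer_by_descent` helper are all hypothetical. A term placed on an ancestral node is inherited by every descendant leaf that does not already carry it from experimental evidence.

```python
# Minimal sketch of inheritance by descent on a toy tree.
# Assumptions (hypothetical, not PAINT's data model): the tree is a dict mapping
# each internal node to its children; leaves are nodes with no entry.
TREE = {
    "AN0": ["AN1", "AN2"],
    "AN1": ["geneA", "geneB"],
    "AN2": ["geneC"],
}


def descendants(tree, node):
    """Yield every node below `node`, depth-first."""
    for child in tree.get(node, []):
        yield child
        yield from descendants(tree, child)


def infer_by_descent(tree, ancestor, term, experimental):
    """Return the sorted leaves under `ancestor` that inherit `term`.

    `experimental` maps a leaf to the set of GO IDs it already carries from
    experimental evidence; those leaves need no inference for `term`.
    """
    inherited = []
    for node in descendants(tree, ancestor):
        is_leaf = node not in tree
        if is_leaf and term not in experimental.get(node, set()):
            inherited.append(node)
    return sorted(inherited)


# geneA has an experimental annotation, so only geneB and geneC inherit the term.
print(infer_by_descent(TREE, "AN0", "GO:0003755", {"geneA": {"GO:0003755"}}))
# ['geneB', 'geneC']
```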
=Fast-pass curation of PTHR10516 (FK506 BINDING PROTEIN) family=
This family has 424 members, 136 of which are RefGenome proteins. The root is a 4-way polytomy with a major clade descended from AN1 and three smaller clades descended from AN731 (17 proteins, including a few from Dictyostelium and Arabidopsis; there are several annotations, mostly CC annotations to plant proteins), AN761 (7 unannotated proteins ranging from bacteria to sea urchin, including 1 from Dictyostelium), and AN773 (6 bacterial proteins, plus 3 divergent proteins from Chlamydomonas, Dictyostelium, and Xenopus). Branch lengths to all 4 clades are similar and short.
AN1 has a bacterial outgroup (AN703) containing a couple of E. coli proteins but no low-throughput (LTP) annotations. Branch length increases significantly leading to the polytomy at AN3. Each of the 5 clades under AN3 (AN4, AN130, AN165, AN462, and AN530) appears relatively independent. AN462 is highly divergent and unannotated; all annotations to this clade should be blocked, so I "pruned" AN462. The remaining clades all appear well conserved and span plants to humans. AN4, AN165, and AN530 have multiple duplications and sub-clades.
-There are multiple annotations to GO:0003755 "peptidyl-prolyl cis-trans isomerase activity" in all three annotated clades off the root. Except for the AN462 clade, almost every protein is inferred to be a peptidyl-prolyl cis-trans isomerase. Propagate GO:0003755 to AN0 and block at AN462.
-Annotations to GO:0005528 "FK506 binding" are found throughout AN3. UniProt P0A9K9 (E. coli slyD, descended from AN773) carries the note "Folding activity is inhibited by FK506," which suggests that FK506 binding extends beyond AN3. Propagate GO:0005528 to AN0.
The following terms are all binding terms and clade-specific:
GO:0030551 "cyclic nucleotide binding"
GO:0008270 "zinc ion binding"
GO:0005509 "calcium ion binding"
GO:0016151 "nickel ion binding"
GO:0051082 "unfolded protein binding"
GO:0031072 "heat shock protein binding"
GO:0005160 "transforming growth factor beta receptor binding"
GO:0030544 "Hsp70 protein binding"
GO:0035259 "glucocorticoid receptor binding"
GO:0034713 "type I transforming growth factor beta receptor binding"
Curate these in more detail during complete curation of the family.
-Propagate GO:0034704 "calcium channel complex" to AN598.
-There are widespread annotations to children of GO:0016020 "membrane." Propagate GO:0016020 to AN0 and block at AN462. Look for more specific terms.
-Propagate GO:0005740 "mitochondrial envelope" and GO:0030176 "integral to endoplasmic reticulum membrane" to AN169.
-Propagate GO:0033017 "sarcoplasmic reticulum membrane" to AN571 based on annotation to both mouse paralogs. Expand to AN568 to cover all vertebrates.
-Propagate GO:0005783 "endoplasmic reticulum" to AN10 based on 2 annotations (1 HTP). Also propagate to AN231. Also propagate to AN567 based on annotations to both mouse paralogs, expanding coverage to include all chordates.
-Propagate GO:0030424 "axon" to AN572 based on annotations in rat and chicken.
-Propagate GO:0005730 "nucleolus" to AN130 based on numerous annotations to this term and to its parent "nucleus."
-Propagate GO:0005634 "nucleus" to AN364 based on 2 annotations. Do not choose "nucleolus" because (1) there are no positive annotations to it and (2) there is a NOT annotation to it. Expand to AN347 to include chicken but not the FKBP5 clade. Allow the annotation to propagate into the unannotated FKBP-like clade (AN351) despite its divergence, because there are no annotations there.
-Propagate GO:0005634 "nucleus" to AN536 based on mouse.
-Propagate GO:0005829 "cytosol" to AN572 based on 2 annotations.
-There is only 1 annotation to GO:0018208 "peptidyl-proline modification," but GO:0003755 "peptidyl-prolyl cis-trans isomerase activity" was propagated to AN0 in MF. Propagate GO:0018208 to AN0.
-Propagate GO:0000412 "histone peptidyl-prolyl isomerization" to AN130.
-Propagate GO:0032513 "negative regulation of protein phosphatase type 2B activity" to AN571 based on annotation to both human paralogs.
-Propagate GO:0060314 "regulation of ryanodine-sensitive calcium-release channel activity" to AN571 based on annotations to both human and both mouse proteins.
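Several notes above follow the same pattern: propagate a term to an ancestral node, then block it at a divergent clade (for example, GO:0003755 propagated to AN0 and blocked at AN462). The sketch below models a block as stopping inheritance for the blocked node's entire subtree. The tree is a simplified, hypothetical slice of PTHR10516 with invented leaf names, and `propagate` is an illustrative helper rather than part of PAINT.

```python
# Sketch of propagating one annotation from an ancestral node while blocking
# a divergent clade. The tree is a simplified, hypothetical slice of PTHR10516;
# leaf names are invented for illustration.
TREE = {
    "AN0": ["AN1", "AN731"],
    "AN1": ["AN3"],
    "AN3": ["AN4", "AN462"],
    "AN4": ["leaf_a"],
    "AN462": ["leaf_b"],
    "AN731": ["leaf_c"],
}


def propagate(tree, start, blocked):
    """Return the sorted leaves under `start` that inherit the annotation.

    Any node in `blocked` stops inheritance for its whole subtree.
    """
    leaves = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in blocked:
            continue  # skip the blocked clade and everything below it
        children = tree.get(node)
        if children:
            stack.extend(children)
        else:
            leaves.append(node)
    return sorted(leaves)


# Propagate to AN0 and block at AN462:
print(propagate(TREE, "AN0", blocked={"AN462"}))  # ['leaf_a', 'leaf_c']
```

Using an explicit stack rather than recursion keeps the traversal safe on deep trees; a set for `blocked` makes each membership check constant time.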
=Questions for ontology curators=
*'''Question: '''Should GO:0000412 "histone peptidyl-prolyl isomerization" and its parent GO:0000413 "protein peptidyl-prolyl isomerization" be children of GO:0018208 "peptidyl-proline modification"? (SF # 3220769)
**'''Curator answer: '''seems reasonable to me... I've created the relationship:
peptidyl-proline modification ; GO:0018208
--[isa] protein peptidyl-prolyl isomerization ; GO:0000413
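With this is_a link in place, GO's true path rule means that a gene annotated to GO:0000412 is implicitly annotated to GO:0000413 and GO:0018208 as well. The sketch below walks is_a edges upward to list the implied terms. It is a hypothetical toy encoding rather than an OBO parser: only the three terms from the question are included, and the real parents of GO:0018208 are omitted.

```python
# Sketch of GO's true path rule over is_a edges. Only the three terms discussed
# in the question are included; real parents of GO:0018208 are omitted.
IS_A = {
    "GO:0000412": ["GO:0000413"],  # histone peptidyl-prolyl isomerization
    "GO:0000413": ["GO:0018208"],  # protein peptidyl-prolyl isomerization (new edge)
    "GO:0018208": [],              # peptidyl-proline modification (parents omitted)
}


def ancestors(is_a, term):
    """Return every term reachable from `term` by following is_a edges upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


# An annotation to GO:0000412 now implies both ancestor terms.
print(sorted(ancestors(IS_A, "GO:0000412")))  # ['GO:0000413', 'GO:0018208']
```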
MSL 11 Feb 2011: Submitted
MSL 26 Aug 2011: Added question and answer from wiki