Expert Curation

Systematic Name Gene Name Motif ID Expert Confidence Dubious? Notes
V YDR174W HMO1 2249 Low   This motif is uncharacteristic for a Sox protein and HMG proteins typically do not bind DNA in a sequence specific manner. Since it is from ChIP data it could be a cofactor motif. Low confidence.
V YOR162C YRR1 2245 High   Classic monomeric GAL4-class motif. PBM-derived motifs agree with each other and score significantly on Harbison data. No other motifs have a defined spacing/orientation except the one from PMID 11909958, but even the authors of that study note that "Only half a dyad seems to be conserved in this consensus sequence". 2245 scores highest in Harbison data.
V YLL054C   2242 Medium   Three motifs available, from PBMs; two dimeric GAL4-like motifs but with different spacings and one monomeric. No backup data but looks tidy. Keep all three.
V YDR207C UME6 2239 High   All motifs are similar to each other. BEEML-PBM motif 2239 scores highest across the board.
V YPR009W SUT2 2236 High   Highest-scoring motif (PBM) is a classical GAL4-type monomeric motif and is very significant in ChIP-chip
V YDR169C STB3 2233 High   STB3 binds the RRPE element (AAAAATTT) both in vivo and in vitro (PMID 17616518). PBM motifs 810 and 2233 strongly resemble the RRPE element, score significantly in deletion expression data, and nail the GO categories "nucleolus" and "ribosome biogenesis". 2233 gets slightly higher scores.
V YCR018C SRD1 2232 Medium   PBM studies yield nearly identical motifs. 2232 closely resembles motif from related GATA factors and scores highest overall. This is an unusual motif for the GATA class; hence medium confidence level.
V YKL038W RGT1 2227 High   PBM motif 2227 is very similar to "traditional" motif and to monomeric GAL4 motifs, and scores highest on ChIP-chip data. All PBM motifs are similar.
V YPL133C RDS2 2226 Medium   All motifs contain CGG. PBM motif 2226 appears to be a monomeric version of literature motif 757. However, the paper that produced motif 757 did not demonstrate that this is an optimal binding site. Retain both motifs and give them a "medium" confidence.
V YKL015W PUT3 2223 Medium   Motifs vary considerably. ChIP motif 2065 is a dimeric/(trimeric?) GAL4-like site, and has the highest correspondence to ChIP-chip data (from which it is derived) and some correspondence to expression data (although it is not strong). PBM motif 2223 is a monomeric GAL4-like motif and has higher correspondence to expression data, albeit weaker (but still good) correspondence to ChIP-chip data. It is possible that the actual sequence preference is some other arrangement of monomeric sites that were not picked up in either assay - score as medium confidence.
V YFR034C PHO4 2222 High   Almost all motifs match classic HLH E-box. PBM motif 2222 has highest match to both ChIP-chip and expression data, without being circular.
V YER088C DOT6 2221 High   PBM motif 812 most closely resembles that of homolog TOD6, which is well-supported; has highest correlation to both ChIP and expression data.
V YPL248C GAL4 2206 High   ChIP-chip motif 1510 resembles the literature motif and PBM motif 875, and scores highly on ChIP and expression data across the board. Note, however, that the high ChIP-chip scores stem from an experiment with high negative correlation. PBM motif 2206 appears to be a monomeric version and scores even higher on ChIP-chip and expression.
V YPR104C FHL1 2203 High   ChIP-chip motifs are all Rap1. PBMs identify a different motif which also corresponds to ChIP-chip data. Selected 2203 as it scores highest on ChIP-chip and expression data.
V YML081W   2194 High   The PBM motifs are classical C2H2 motifs that match each other and have some correspondence to ChIP-chip data. 2194 has the highest correspondence to ChIP-chip.
V YKL222C   2192 High   Two motifs from PBMs resemble monomeric GAL4-like motif. 2192 agrees best with ChIP-chip data and expression data.
V YGR067C   2191 High   PBM motif is a classical C2H2 motif that has good correspondence to ChIP-chip data. 2191 corresponds best and has fewer empty columns in the PWM.
V YBR239C ERT1 2188 Medium   Three PBM motifs are all classic monomeric GAL4 motifs. Chose 2188 because it has fewer noninformative flanking positions, and higher significance on expression data. Also, 826 has the CCGG core that I suspect may be an artefact of PBMs or the DBD clones used in these studies. The highest-scoring ChIP motif is circular and does not resemble a GAL4 class binding site.
V YML007W YAP1 2186 High   PBM motif 2186 looks like a monomeric bZIP site but it has the highest scores on both ChIP and expression
V YBR150C TBS1 2179 High   Two motifs from PBMs are nearly identical GAL4-class motifs with defined spacing and orientation. Motif 552 has slightly higher scores. Two motifs from BEEML analysis of PBM data give a monomeric motif; also give this high confidence.
V YPL128C TBF1 2178 High   All motifs, obtained by three different means, are very similar, although there is no ChIP or expression support for any of them. Went with 2178, which is the BEEML output.
V YHR006W STP2 2174 High   STP1 and 2 have very similar DNA-binding domains. However, they are not similar to those of STP3 and 4. PBM motif for STP2 (2174) correlates highest with ChIP-chip and expression data. ChIP-chip motif for STP1 (660) most strongly resembles motif 800, and scores highly on ChIP-chip data. In addition, these motifs resemble halfmers of literature-derived binding sites.
V YHR056C RSC30 2164 Medium   Arbitrary choice - all PBM motifs look similar (and resemble motif from homolog Rsc3). I have downgraded this one from high to medium because the best scoring motif actually looks the least like the Rsc3 motif.
V YOR380W RDR1 2158 High   All motifs are related except 1851. PBM motif 2158 is monomeric and has highest correspondence to ChIP-chip data. The literature motif 756 consists of two back-to-back and slightly overlapping versions of the monomeric PBM motif. There is no evidence for direct binding in this specific spacing and orientation; however, the results of mutations in reporters indicate that both copies are necessary for induction in the mutant. Retain both motifs.
V YDL106C PHO2 2154 High   Motifs are largely all different from each other. PBM motif 2154 scores highly on ChIP data and resembles classic TAAT homeobox core. Note that PBM motif 794 even more strongly resembles homeobox (TAATTA) but scores slightly less highly.
V YKL043W PHD1 2153 High   High-scoring motifs are all similar and palindromic, with the characteristic APSES GC core. PBM motifs score highest on ChIP-seq data, while ChIP-chip motif 393 (which contains flanking G/C residues) scores highest on expression data. Retain both; possibly the rest of the protein contributes to binding the flanking residues. This is the higher-scoring PBM motif (2153).
V YDR043C NRG1 2148 High   PBM, ChIP-chip, and literature motifs all appear very similar, and resemble motif for the related protein NRG2. Choose top PBM motif (2148). There is also a recurring ChIP-chip motif (TGTGCCT) which I believe is actually the MOT3 binding site.
V YER028C MIG3 2144 High   PBM motif 2144 has highest correspondence to ChIP-chip data
V YGL209W MIG2 2143 High   PBM motif 2143 has highest correspondence to ChIP-chip data
V YGL035C MIG1 2142 High   PBM motif 2142 has highest correspondence to ChIP-chip AND AUC for GO category "generation of precursor metabolites and energy". The adjacent A/T stretch, which is also noted in the literature, is found in ChIP-chip motif 654 and others; however, that motif does not sort as well for GO category "generation of precursor metabolites and energy" and also scores lower for both ChIP and expression, so it seems unlikely to represent a key intrinsic activity of the protein itself.
V YGR249W MGA1 2141 Medium   PBM motif 2141 is similar to Hsf1 motif 476 (TTCCA). Has TTC "core" which is shared by most Hsf1 motifs. Scores reasonably on ChIP data but no other supporting information; hence "medium".
V YDR253C MET32 2140 High   Most motifs look similar. PBM motif 2140 has highest correspondence to both ChIP and expression.
V YDL056W MBP1 2138 High   Almost all motifs look similar to the literature binding site. PBM motif 2138 scores at the top on ChIP-chip and expression, and is non-circular.
V YLR451W LEU3 2135 High   Most motifs look similar - dimeric GAL4 motif. Literature motif (781) has high correspondence to ChIP-chip and expression data and is not circular. But, PBM motif 2135, which is a monomeric GAL4 motif, scores highest on both ChIP-chip and expression data.
V YOL089C HAL9 2134 High   PBM motifs 799 and 2134 score highest on ChIP-chip data; classic dimeric and monomeric GAL4 sites, respectively.
V YJL110C GZF3 2133 High   Classic GATA motif 2133 from PBM scores highest on ChIP-chip and expression data
V YLR013W GAT3 2128 High   All PBM motifs look similar, also similar to a subset of other GATAs. 2128 scores quite highly on ChIP-chip (albeit with negative correlation!), and also higher on expression and OE data.
V YLR228C ECM22 2122 High   PBM motif 2122 is a monomeric GAL4-class motif and scores highest on both ChIP and expression data. 849 is a classic dimeric GAL4 motif with lower but still reasonable scores and is moderately predictive across the board.
V YPL177C CUP9 2121 High   MITOMI and PBM motifs are similar. PBM motif 2121 has slightly lower correspondence to ChIP data, but more significant correspondence to expression data.
V YLR098C CHA4 2120 High   Two PBM motifs agree, and PBM motif 2120 has the highest correspondence to ChIP-chip data, even higher than the best ChIP-chip motif. Has a GAL4-like appearance, albeit a variant. Monomeric. (The highest-scoring motif, 1607, is actually a Rap1 motif.)
V YIL130W ASG1 2116 Medium   Two PBM motifs appear to represent monomeric and dimeric versions of the same motif. This is the monomeric version. No other supporting data; hence medium confidence. Picked 2116 because it has a higher GO score and expression score.
V YDR421W ARO80 2115 High   PBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three.
V YLR278C   2112 High   Only 2112 (from PBMs) stands out; dimeric GAL4 motif with high score on ChIP-chip.
V YMR019W STB4 2107 High   PBM motif 2107 is clearly a dimeric GAL4-class motif, and it blows all the other motifs out of the water.
V YCL067C HMLALPHA2 2102 Medium   Protein is similar to PBX/MEIS/TGIF; both PBM motifs have some similarity (central ACA/TGT), as do sites observed in the crystal structure and in vivo (e.g. PMID: 1682054), but there is no clear winner between the two. Keep both PBM motifs in the curated set (2102 and 2079) but give medium confidence; there is no supporting ChIP or expression data.
V YDR423C CAD1 2098 High   Classic YAP motif in most cases. Include examples of both overlapping and adjacent monomeric sites - there are examples of both in PBM data and they both score highly on ChIP data. This one is adjacent.
V YJL056C ZAP1 2097 High   Most motifs are similar but do not exceed confidence thresholds on any data type. PBM motif 2097 has highest score for ChIP and expression, and is not circular
V YER184C   2095 Medium   One motif from PBMs is a monomeric GAL4-like motif and the other is dimeric. Medium confidence because there is little independent support, and both contain the CCGG core that I believe may be an artifact. However, both score significantly on ChIP-chip data. Only 512 is significant on expression data.
V YER064C   2094 Medium   PBM motif has a high GO score because it looks a lot like Gcn4.
V YBR033W EDS1 2093 High   PBM and ChIP-chip motifs are very similar. PBM motif 2093 scores most significantly on ChIP data. Classic GAL4 class motif.
V YIL056W VHR1 2091 Medium   PBM motif has a high GO score because it looks a lot like Gcn4.
V YMR070W MOT3 2080 Medium   PBM motif 2080 is very similar to the literature motif and scores highest on expression data. Moreover, this motif explains high-scoring ChIP-chip motifs for many other TFs, e.g. Nrg1, Yap6, Sok2
V YCL067C HMLALPHA2 2079 Medium   Protein is similar to PBX/MEIS/TGIF; both PBM motifs have some similarity (central ACA/TGT), as do sites observed in the crystal structure and in vivo (e.g. PMID: 1682054), but there is no clear winner between the two. Keep both PBM motifs in the curated set (2102 and 2079) but give medium confidence; there is no supporting ChIP or expression data.
V YLR256W HAP1 2078 High   Literature binding site is direct CGG repeats with a 6bp spacer (PMID: 7958882). PBM motif 2078 gets this; it scores highest overall, including significant scores on both ChIP-chip and expression.
V YDR423C CAD1 2073 High   Classic YAP motif in most cases. Include examples of both overlapping and adjacent monomeric sites - there are examples of both in PBM data and they both score highly on ChIP data. This one is overlapping.
V YPL075W GCR1 2071 High   Gcr2 is not a DNA-binding protein. SGD: "Gcr1p is a DNA-binding protein interacting with the consensus sequence CTTCC, whereas Gcr2p interacts with Gcr1p". But, ChIP-chip motif 606 is probably the best Gcr1 motif available (even though it came from Gcr2 ChIP).
V YHR178W STB5 2068 Medium   All motifs have CGG core and most have CGGnG. Most ChIP-derived motifs have no relationship to expression data. Motif 2068 scores highest overall; looks a bit unusual for a Gal4 class motif but also does well on expression data. Retain as potential dimer motif, although it may also incorporate extrinsic information.
V YJL089W SIP4 2067 Medium   PBM motif 573 is a monomeric GAL4-type motif (others appear dimeric) but it has good correspondence to ChIP-chip data. Only a few of the dimeric sites are more significant - the motif from in vivo analysis (PMID: 14685767) does not score as highly as 2067 from ChIP-chip data, but they look very similar. This is 2067, the presumed dimeric site.
V YKL015W PUT3 2065 Medium   Motifs vary considerably. ChIP motif 2065 is a dimeric/(trimeric?) GAL4-like site, and has the highest correspondence to ChIP-chip data (from which it is derived) and some correspondence to expression data (although it is not strong). PBM motif 2223 is a monomeric GAL4-like motif and has higher correspondence to expression data, albeit weaker (but still good) correspondence to ChIP-chip data. It is possible that the actual sequence preference is some other arrangement of monomeric sites that were not picked up in either assay - score as medium confidence.
V YLR014C PPR1 2064 Low   ChIP-chip motif 2064 almost matches the literature site, which has been confirmed by directed experimentation, and scores highest on most measures. But, give it low confidence - it is not at all clear that this is an optimal binding site, and none of the scores for any of the motifs are all that high.
V YBL005W PDR3 2062 Medium   MITOMI yields a simple GAL4 monomeric site that scores well in ChIP-chip data. ChIP-chip yields a dimeric site that resembles the literature site. In vivo, PDR1 and PDR3 may form heterodimers. Retain both. This is the dimeric ChIP-chip motif.
V YAL051W OAF1 2060 Medium   Motif 2060 has a strong resemblance to the literature motifs for the Oaf1-Pip2 dimer, and scores highly on both ChIP and expression data. No in vitro support and it's kind of weak looking so Medium confidence.
V YIL101C XBP1 2039 High   PBM and in vitro selection-derived motifs have highest scores across the board. 842 is higher on GO, but only slightly in AUC, and it has a very large number of empty flanking bases. 2039 (in vitro selection) seems a reasonable compromise - it's highest on ChIP and almost the highest on expression.
V YIL131C FKH1 2002 High   Classic Forkhead motif for most of them. 2002 strongly resembles PBM motif but scores higher on both ChIP (which is circular) and expression (which is not).
V YKL112W ABF1 1993 High   Most motifs are similar, and five have pegged the ChIP P-value. Choose 791: it is the highest-scoring overall, and is from PBMs.
V YJL127C SPT10 1880 Low   This is the protein that binds histone promoters. The sequence specificity is derived from the histone promoters only so the literature motif may be inaccurate. Motif 1880 has higher scores overall but does not resemble the literature motif. Uncertain what to do here - use 1880, but give low confidence. Motif learned in vivo could contain extrinsic information.
V YPR054W SMK1 1875 Low Dubious I could not find any evidence that this protein binds directly to DNA. There is only one motif derived from ChIP-chip but it bears little relationship to the data from which it was derived.
V YFL031W HAC1 1788 High   1788 is the overall winner. But, literature motif 94 also scores well in ChIP-chip, despite being somewhat different. Possible difference in heterodimerization partners, or proteolytic fragment? Retain both, score 94 as medium.
V YOL028C YAP7 1737 High   7-base bZIP core. Obtained in ChIP-chip studies and has higher correspondence to stressed ChIP-chip data. Possible heterodimer? Little literature on this protein. 1737 chosen because it is largely symmetric, has the highest score for both stressed and unstressed Harbison data, and has a higher GO score.
V YLR403W SFP1 1710 Incorrect   Likely represents Rap1 binding site.
V YDL020C RPN4 1700 High   In vitro motifs do not contain the TTT sequence on the end. But they were derived from the DBD only. The rest of the protein may contribute to binding the TTT segment. Motif 1700 has the highest correspondence to ChIP-chip and expression and GO.
V YDL106C PHO2 1680 Incorrect   Likely represents Abf1 binding site.
V YPR104C FHL1 1618 Incorrect   Likely represents Rap1 binding site.
V YLR098C CHA4 1607 Incorrect   Likely represents Rap1 binding site.
V YML065W ORC1 1549 High   Looks like the ORC1 motif. ORC1 is not really a TF, but it is a sequence-specific DNA-binding protein.
V YMR021C MAC1 1540 High   Literature motif 1540 most closely corresponds to ChIP-chip data (albeit barely significant). Nothing else to gauge by, but no reason to doubt the literature motif.
V YPL248C GAL4 1510 High   ChIP-chip motif 1510 resembles the literature motif and PBM motif 875, and scores highly on ChIP and expression data across the board. Note, however, that the high ChIP-chip scores stem from an experiment with high negative correlation. PBM motif 2206 appears to be a monomeric version and scores even higher on ChIP-chip and expression.
V YDR421W ARO80 1509 High   PBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three.
V YML099C ARG81 1507 Incorrect   Likely represents Mcm1 binding site.
V YML099C ARG81 1506 High   ChIP motif 1506 correlates well with ChIP and also with expression data. Resembles dimeric GAL4 class motif.
V YPR104C FHL1 1504 Incorrect   Likely represents Rap1 binding site.
V YOR032C HMS1 1498 Medium   Motif 1498 scores reasonably on ChIP. Other corroborating data are not that convincing - medium confidence.
V YOL067C RTG1 1494 Low   1493 and 1494 are a toss-up and could conceivably represent different dimerization partners. Similar to RTG3 motifs 1445 and 1446. Retain both but give low confidence.
V YOL067C RTG1 1493 Low   1493 and 1494 are a toss-up and could conceivably represent different dimerization partners. Similar to RTG3 motifs 1445 and 1446. Retain both but give low confidence.
V YMR042W ARG80 1483 Medium   Motif 1482 is an Arg81 site. 1483, however, is similar to Mcm1. Choose this, give Medium confidence.
V YLR176C RFX1 1478 Medium   Curious case - virtually all motifs are similar in appearance, with a common TGGCAAC core. They range from what appear to be monomers to full dimers, with multiple partial forms. However, none of them scores highly on both ChIP-chip and expression data. Select two representatives: one that scores well on ChIP-chip, and one that scores well on expression. This is the one that scores most highly on ChIP-chip. It is a dimer motif. Give medium confidence, since it has little relationship to expression data.
V YHR124W NDT80 1464 High   Motif 1464 matches literature motifs and PBM motif, and nails sporulation on GO. It also has the highest correspondence to ChIP-chip data.
V YGL073W HSF1 1461 Medium   Four types of motifs contain TTC monomeric core and all score highly on both ChIP and expression. Appear to represent different monomeric/multimeric binding configurations. This is the dimeric head-to-tail site. From ChIP and prior.
V YBR240C THI2 1449 High   This is a GAL4-class protein. All motifs are ChIP-chip derived, and none resembles the others. 1449 is the only one with respectable scores on ChIP and expression, and it also has the appearance of a GAL4-class motif, although the structural prior presumably forces it to have this property.
V YBL103C RTG3 1446 Low   Only the PBM motif is a classic HLH motif. Three different ChIP-chip-derived motifs are all diverse, but all score highly on ChIP-chip data! Are they motifs of other TFs? Check. 602: GCN4; 1095: TEC1; 1096: resembles 602, but is a closer match to CUP9/TOS8, and also hits GCN4. According to the literature (PMID: 9032238), the core binding site for the Rtg1p-Rtg3p heterodimer is 5'-GGTCAC-3'; the only motif that resembles this is 1446, which bears a vague resemblance to 602 and 1096. I am going to retain 1446, which represents the literature site; PBM motif 870, which resembles an E-box; and ChIP-chip motif 1445, which scores highest on ChIP-chip data. But give all low confidence.
V YBL103C RTG3 1445 Low   Only the PBM motif is a classic HLH motif. Three different ChIP-chip-derived motifs are all diverse, but all score highly on ChIP-chip data! Are they motifs of other TFs? Check. 602: GCN4; 1095: TEC1; 1096: resembles 602, but is a closer match to CUP9/TOS8, and also hits GCN4. According to the literature (PMID: 9032238), the core binding site for the Rtg1p-Rtg3p heterodimer is 5'-GGTCAC-3'; the only motif that resembles this is 1446, which bears a vague resemblance to 602 and 1096. I am going to retain 1446, which represents the literature site; PBM motif 870, which resembles an E-box; and ChIP-chip motif 1445, which scores highest on ChIP-chip data. But give all low confidence.
V MATALPHA1-MCM1-dimer alpha1-MCM1-dimer 1442 Medium   Not clear that motif is optimal.
V MATA1-MATALPHA2-dimer a1-alpha2-dimer 1436 Medium   Not clear that motif is optimal.
V YER069W ARG5,6 1426 Medium   Not clear that motif is optimal.
V YPR008W HAA1 1425 Medium   Literature motif is not completely determined, but scores highly on ChIP-chip data. Regardless, medium confidence.
V YCR040W MATALPHA1 1418 Low   According to PMID: 15118075, binds the "Q site" which has "consensus" ACAATGACAG. Seems all that is in common is the CAAT. I believe further study is required.
V YCL058C FYV5 1417 Low   Literature motif is derived from a single promoter and while the protein seems to have some DNA-binding activity, perhaps in conjunction with other TFs, I find the evidence supporting this precise binding site incomplete, since it is derived from a single site. Hence, low confidence in the motif.
V YML113W DAT1 1416 Medium   The literature (e.g. PMID: 8532535) suggests that the sequence specificity may be more promiscuous than the name suggests. To my knowledge there has not been any SELEX or PBM demonstrating that any motif is correct. But, it does bear some relationship to ChIP-chip and expression data.
V YOL028C YAP7 1414 High   8-base bZIP core. Obtained by MITOMI, so this is a homodimer. Higher correspondence to unstressed ChIP-chip data. Little literature on this protein. 1414 chosen for higher overall ChIP-chip scores; plus, it is a palindrome, as expected for a bZIP protein.
V YHL009C YAP3 1411 High   MITOMI yields a nearly palindromic 8-mer motif with strong similarity to that of Yap6. PBM motif is similar but appears to be partial.
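Several notes distinguish monomeric GAL4-class motifs (a single CGG half-site) from dimeric ones whose CGG triplets have a fixed spacing and orientation, such as the HAP1 literature site of direct CGG repeats with a 6 bp spacer. They also describe bZIP sites as palindromic. The sketch below is an illustration only, not the curators' procedure. It lists pairs of CGG/CCG triplets in a site along with their relative orientation and spacer length, and checks whether a site equals its own reverse complement. The function names (`revcomp`, `is_palindrome`, `cgg_pairs`) and the example sequences are hypothetical. The Gal4-style example assumes the commonly cited CGG-N11-CCG arrangement.

```python
from itertools import combinations

# Translation table for DNA complementation (hypothetical helper, not from the source)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if the site equals its own reverse complement (symmetric site)."""
    s = seq.upper()
    return s == revcomp(s)


def cgg_pairs(seq: str, max_spacer: int = 12):
    """List (start1, start2, arrangement, spacer) for pairs of CGG/CCG triplets.

    CGG on the given strand is labelled '+'; CCG (CGG on the other strand) is '-'.
    Same orientation -> 'direct'; '+' then '-' -> 'inverted'; '-' then '+' -> 'everted'.
    Overlapping triplets (negative spacer) and spacers > max_spacer are skipped.
    """
    s = seq.upper()
    hits = []
    for i in range(len(s) - 2):
        tri = s[i:i + 3]
        if tri == "CGG":
            hits.append((i, "+"))
        elif tri == "CCG":
            hits.append((i, "-"))
    pairs = []
    # hits are in positional order, so i < j for every combination
    for (i, o1), (j, o2) in combinations(hits, 2):
        spacer = j - (i + 3)
        if 0 <= spacer <= max_spacer:
            if o1 == o2:
                kind = "direct"
            elif o1 == "+":
                kind = "inverted"
            else:
                kind = "everted"
            pairs.append((i, j, kind, spacer))
    return pairs


if __name__ == "__main__":
    print(cgg_pairs("CGGAGGACTGTCCTCCG"))  # Gal4-style CGG-N11-CCG example
    print(cgg_pairs("CGGAAATTACGG"))       # HAP1-style direct repeat, 6 bp spacer
    print(is_palindrome("TTACGTAA"))       # arbitrary palindromic 8-mer
```

Treating CCG as CGG on the opposite strand is a simplification. Real half-sites extend beyond the triplet, and flanking bases often matter.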

Page 1 of 4 (354 records)
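The notes compare motifs by how well they "score" on ChIP-chip, expression, and GO data, sometimes as an AUC. This page does not describe the scoring pipeline. One common approach is to score each promoter by its best PWM match on either strand and then compute the ROC AUC of those scores for bound versus unbound promoters. The sketch below illustrates that approach under stated assumptions. The count matrix, uniform background, pseudocount, and toy sequences are all hypothetical. The matrix simply encodes the RRPE element AAAAATTT mentioned for STB3.

```python
import math

BASES = "ACGT"
COMP = str.maketrans("ACGT", "TGCA")


def log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts [{'A': n, ...}, ...] to log2-odds scores.

    Assumes a uniform background; both defaults are illustrative choices.
    """
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(pwm, seq):
    """Highest PWM score over all windows on both strands (uppercase ACGT only)."""
    w = len(pwm)
    best = float("-inf")
    for s in (seq, seq.translate(COMP)[::-1]):
        for i in range(len(s) - w + 1):
            best = max(best, sum(pwm[k][s[i + k]] for k in range(w)))
    return best


def auc(pos_scores, neg_scores):
    """ROC AUC as P(positive outscores negative); ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts: 10 sites per column, all matching AAAAATTT
    counts = [{b: (10 if b == c else 0) for b in BASES} for c in "AAAAATTT"]
    pwm = log_odds(counts)
    bound = ["GCGCAAAAATTTGCGC", "CCCCCCAAAAATTTCC"]      # toy "bound" promoters
    unbound = ["GCGCGCGCGCGCGCGC", "CCCCGGGGCCCCGGGG"]    # toy "unbound" promoters
    print(auc([best_hit(pwm, s) for s in bound],
              [best_hit(pwm, s) for s in unbound]))
```

An AUC near 0.5 means the motif does not separate the two sets. The pairwise AUC loop is quadratic, which is fine for a sketch but slow for genome-scale sets.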