Systematic Name	Gene Name	Motif ID	Expert Confidence	Dubious?	Notes
YPR199CARR1603MediumOnly motif 603 has significant scores with ChIP-chip and expression data; looks somewhat like a YAP motif
YPR196W861HighMotifs from PBMs are very similar and are a variant monomeric GAL4-like motif. Chose 861 as it passes the significance threshold against ChIP-chip data.
YPR186CPZF11321LowThis is a single literature site. The protein almost certainly binds the site but it has not been demonstrated that this is an optimal binding site.
YPR104CFHL11196IncorrectLikely represents Rap1 binding site.
YPR104CFHL12203HighChIP-chip motifs are all Rap1. PBMs identify a different motif which also corresponds to ChIP-chip data. Selected 2203 as it scores highest on ChIP-chip and expression data.
YPR104CFHL1893IncorrectLikely represents Rap1 binding site.
YPR104CFHL11618IncorrectLikely represents Rap1 binding site.
YPR104CFHL1629IncorrectLikely represents Rap1 binding site.
YPR104CFHL1406IncorrectLikely represents Rap1 binding site.
YPR104CFHL11504IncorrectLikely represents Rap1 binding site.
YPR086WSUA71327LowDubiousThis protein is not expected to bind DNA; it is supposed to bind DNA-bound TBP. The TIRF-PBM data used to generate the motif included only 96 sequences.
YPR065WROX11396HighAbout half the motifs have a typical ACAAT Sox core. MITOMI motif 1396 has highest correspondence to both ChIP-chip and deletion expression data.
YPR054WSMK11875LowDubiousI could not find any evidence that this protein binds directly to DNA. There is only one motif derived from ChIP-chip but it bears little relationship to the data from which it was derived.
YPR052CNHP6A879MediumNHP6A and NHP6B are similar to the HMGB family, which is thought to lack sequence specificity. However, the proteins do bend the DNA when they bind, and so may have some level of sequence specificity. Essentially similar motifs were obtained for the two different proteins (in the same study) and the PBM motif for Nhp6A has a good correspondence to ChIP-chip data. Give both Medium confidence.
YPR022C588HighOnly one motif available, from PBMs; classical yeast C2H2 motif, and has some relationship to ChIP-chip data.
YPR015C871HighOnly one motif available, from PBMs; it resembles the motif from CMR3, which is a paralogous gene (and nearly adjacent on the chromosome), and it scores significantly against expression data.
YPR013CCMR3859HighPBM motifs are very similar. No other supporting data, but it's a clean motif. Chose 859 because it most closely resembles motif from paralog YPR015c.
YPR009WSUT22236HighHighest-scoring motif (PBM) is a classical GAL4-type monomeric motif and is very significant in ChIP-chip
YPR008WHAA11425MediumLiterature motif is not completely determined, but scores highly on ChIP-chip data. Regardless, medium confidence.
YPL248CGAL42206HighChIP-chip motif 1510 resembles literature motif, and PBM motif 875, but scores highly on ChIP and expression data, across the board. Note, however, that the high ChIP-chip scores stem from an experiment with high negative correlation. PBM motif 2206 appears to be a monomeric version and scores even higher on ChIP-chip and expression.
YPL248CGAL41510HighChIP-chip motif 1510 resembles literature motif, and PBM motif 875, but scores highly on ChIP and expression data, across the board. Note, however, that the high ChIP-chip scores stem from an experiment with high negative correlation. PBM motif 2206 appears to be a monomeric version and scores even higher on ChIP-chip and expression.
YPL230WUSV1509HighTwo PBM studies essentially agree on classical C2H2 GGGG-containing motif. Chose 509 because it scores much higher on both ChIP and expression data.
YPL216W0DubiousUnlikely to be true TF.
YPL202CAFT2389HighAll motifs look similar. ChIP-chip motif 389 scores high on ChIP-chip data and also best on expression data.
YPL177CCUP92121HighMITOMI and PBM motifs are similar. PBM motif 2121 has slightly lower correspondence to ChIP data, but more significant correspondence to expression data.
YPL139CUME11143LowDubiousIt is not clear that this is a sequence-specific DNA-binding protein; it contains no DNA-binding domain and has no known in vitro sequence specificity. The motif comes only from ChIP-chip. But, it has a high P-value, and the motif has low similarity to other motifs, with the possible exception of Yox1. But the function of the protein is very different from that of Yox1. Tough call - leave as Dubious, but give Low confidence to motif 1143.
YPL133CRDS2757MediumAll motifs contain CGG. PBM motif 2226 appears to be a monomeric version of literature motif 757. However, the paper that produced motif 757 did not demonstrate that this is an optimal binding site. Retain both motifs and give them a "medium" confidence.
YPL133CRDS22226MediumAll motifs contain CGG. PBM motif 2226 appears to be a monomeric version of literature motif 757. However, the paper that produced motif 757 did not demonstrate that this is an optimal binding site. Retain both motifs and give them a "medium" confidence.
YPL128CTBF12178HighAll motifs, obtained by three different means, are very similar, although there is no ChIP or expression support for any of them. Went with 2178, which is the BEEML output.
YPL089CRLM11079IncorrectLikely represents Mcm1 binding site.
YPL089CRLM1419MediumMotif 419 has a MADS-like appearance, and scores very highly in ChIP-chip data, despite being derived from the literature. Not much correspondence to expression however, hence Medium confidence. ChIP-chip motif 910 does slightly better on expression but to me is not a credible MADS box binding site.
YPL075WGCR12071HighGcr2 is not a DNA-binding protein. SGD: "Gcr1p is a DNA-binding protein interacting with the consensus sequence CTTCC, whereas Gcr2p interacts with Gcr1p". But, ChIP-chip motif 606 is probably the best Gcr1 motif available (even though it came from Gcr2 ChIP).
YPL049CDIG10DubiousNot a TF - it binds Ste12; all the motifs are Ste12 motifs.
YPL038WMET311370HighMost motifs look similar. MITOMI motif 1370 has highest overall correlation to ChIP-chip, OE, and deletion data.
YPL021WECM23578HighPBM motif 578 strongly resembles that from other yeast GATA-class TFs
YPL016WSWI10DubiousUnlikely to be true TF.
YOR380WRDR12158HighAll motifs are related except 1851. PBM motif 2158 is monomeric and has highest correspondence to ChIP-chip data. The literature motif 756 consists of two back-to-back and slightly overlapping versions of the monomeric PBM motif. There is no evidence for direct binding in this specific spacing and orientation; however, the results of mutations in reporters indicate that both copies are necessary for induction in the mutant. Retain both motifs.
YOR380WRDR1756MediumAll motifs are related except 1851. PBM motif 2158 is monomeric and has highest correspondence to ChIP-chip data. The literature motif 756 consists of two back-to-back and slightly overlapping versions of the monomeric PBM motif. There is no evidence for direct binding in this specific spacing and orientation; however, the results of mutations in reporters indicate that both copies are necessary for induction in the mutant. Retain both motifs.
YOR372CNDD1366IncorrectLikely represents Mcm1 binding site.
YOR372CNDD10DubiousThere is no evidence that this is a sequence-specific DNA-binding protein, either in vitro or in vivo or in its sequence. The ChIP-chip motif that scores most highly is actually an MCM1 motif, which is consistent with the role of NDD1 as a "transcriptional activator essential for nuclear division".
YOR363CPIP20See Oaf1-Pip2-dimer
YOR358WHAP5695HighSubunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data.
YOR344CTYE7397HighAll studies except one get canonical HLH motif. 795 (PBM) is nearly tied for best ChIP-chip score with the best ChIP-chip motif. Still, ChIP motif 397 scores higher, and looks identical, but with fewer flanking empty positions.
YOR337WTEA1817MediumThree motifs, all from PBMs. Choose 817 because it has a more robust GAL4 "CGG" core. But there is no convincing corroborating data for either motif and they do not match each other.
YOR308CSNU660DubiousUnlikely to be true TF.
YOR304WISW20DubiousUnlikely to be true TF.
YOR298C-AMBF10DubiousThis is a coactivator. I found no evidence that it is a sequence-specific TF.
YOR290CSNF20DubiousUnlikely to be true TF.
YOR230WWTM11148LowDubiousIt is not clear that this is a sequence-specific DNA-binding protein; it contains no DNA-binding domain and has no known in vitro sequence specificity. The motifs come only from ChIP-chip so scoring on ChIP-chip is circular.
YOR229WWTM20DubiousIt is not clear that this is a sequence-specific DNA-binding protein; it contains no DNA-binding domain and has no known in vitro sequence specificity. The motif comes only from ChIP-chip so scoring on ChIP-chip is circular.
YOR172WYRM1813HighTwo PBM studies largely agree on classic GAL4-class monomeric motif. Motif 813 has indications of spacing and orientation of dimeric protein.
YOR162CYRR12245HighClassic monomeric GAL4-class motif. PBM studies agree and score significantly on Harbison data. No other motifs have spacing/orientation except 11909958, but even the authors of this study note that "Only half a dyad seems to be conserved in this consensus sequence". 2245 scores highest in Harbison data.
YOR156CNFI10DubiousUnlikely to be true TF.
YOR140WSFL1605MediumNone of the motifs are highly related to each other. But, most share a GAAG core and are otherwise A-rich. The PBM motif 839 in particular is compatible with the putative binding sites that are mutated in PMID 17594096, and it scores well on ChIP-chip. Other motifs may represent different multimerization configurations. ChIP-chip motif 605 also scores well on ChIP-chip data, which is circular, but I will retain it for completeness.
YOR140WSFL1839MediumNone of the motifs are highly related to each other. But, most share a GAAG core and are otherwise A-rich. The PBM motif 839 in particular is compatible with the putative binding sites that are mutated in PMID 17594096, and it scores well on ChIP-chip. Other motifs may represent different multimerization configurations. ChIP-chip motif 605 also scores well on ChIP-chip data, which is circular, but I will retain it for completeness.
YOR113WAZF1499HighPBM motif 499 scores as well as the ChIP-chip motifs, but without the circularity. No significant data except ChIP-chip, however.
YOR077WRTS20DubiousHomolog of Kin17; not a typical C2H2 zinc finger. Believed to be "chromatin-associated proteins involved in UV response and DNA replication". No evidence for sequence-specific DNA-binding. Single ChIP-chip motif does not have strong correspondence to the data from which it is derived.
YOR038CHIR20DubiousHir1,2,3 are a nucleosome assembly complex, not TFs
YOR032CHMS11498MediumMotif 1498 scores reasonably on ChIP. Other corroborating data are not that convincing - medium confidence.
YOR028CCIN51349MediumMost motifs match the classic YAP motif. This is the best in vitro motif (highest match to ChIP-chip) and it is different from the ChIP-based motifs - might reflect homo vs. heterodimer?
YOR028CCIN5409HighMost motifs match the classic YAP motif. This is the best in vivo motif (highest match to ChIP-chip).
YOL116WMSN11378HighMITOMI motif 1376 has the highest correspondence to ChIP-chip. MITOMI motif 1378 is very close, however, and seems to be a circular permutation. Retain both motifs.
YOL116WMSN11376HighMITOMI motif 1376 has the highest correspondence to ChIP-chip. MITOMI motif 1378 is very close, however, and seems to be a circular permutation. Retain both motifs.
YOL108CINO4713HighIno2/4 binds as a heterodimer, so there should just be one motif for the two proteins. All motifs appear similar but none of them is derived from in vitro data. Nonetheless most motifs match a classic E-box with some preference for flanking bases. Motif 713 is derived from ChIP-chip; it is not the highest-scoring ChIP-chip motif but it is highest for OE and deletion expression.
YOL089CHAL9799HighPBM motifs 799 and 2134 score highest on ChIP-chip data; classic dimeric and monomeric GAL4 sites, respectively.
YOL089CHAL92134HighPBM motifs 799 and 2134 score highest on ChIP-chip data; classic dimeric and monomeric GAL4 sites, respectively.
YOL067CRTG11494Low1493 and 1494 are a toss-up and could represent different dimerization partners, conceivably. Similar to 1445 and 1446 above. Retain both but give low confidence.
YOL067CRTG11493Low1493 and 1494 are a toss-up and could represent different dimerization partners, conceivably. Similar to 1445 and 1446 above. Retain both but give low confidence.
YOL028CYAP71737High7-base bZIP core. Obtained in ChIP-chip studies and higher correspondence to stressed ChIP-chip data. Possible heterodimer? Little literature on this protein. 1737 chosen because it is largely symmetric and has highest score for both stressed and unstressed Harbison data, also, higher GO score
YOL028CYAP71414High8-base bZIP core. Obtained by MITOMI, so this is a homodimer. Higher correspondence to unstressed ChIP-chip data. Little literature on this protein. 1414 chosen for higher ChIP-chip overall scores; plus, it is a palindrome as expected for a bZIP protein.
YNR063W804HighMotifs from PBMs are virtually identical. This is a monomeric GAL4-like motif. 804 agrees more with ChIP-chip data.
YNR054CESF20DubiousThis is supposed to be a ribosome biogenesis factor. I found no evidence that it is a sequence-specific DNA-binding protein.
YNL314WDAL82690HighPBM and ChIP-chip motifs agree; select ChIP-chip as it scores higher on ChIP-chip although the extra A's on the side could be either due to the FL protein or some other in vivo factor.
YNL309WSTB10DubiousNo direct evidence that this is a DNA-binding protein. It binds Swi6 and the ChIP motifs all resemble Swi4 binding sites.
YNL257CSIP30DubiousSip3 is a protein that "[activates] transcription through interaction with DNA-bound Snf1p" (SGD); no DNA-binding domain and no evidence for direct interaction with DNA or intrinsic sequence specificity.
YNL227CJJJ10DubiousUnlikely to be true TF.
YNL216WRAP1254HighMost motifs look similar. ChIP-chip motif 254 has highest correspondence to expression data.
YNL199CGCR20DubiousGcr2 is not a DNA-binding protein. SGD: "Gcr1p is a DNA-binding protein interacting with the consensus sequence CTTCC, whereas Gcr2p interacts with Gcr1p". But, ChIP-chip motif 606 is probably the best Gcr1 motif available (even though it came from Gcr2 ChIP).
YNL167CSKO11401HighThe MITOMI motif 1401 is an offset and asymmetric version of the traditional consensus (TGACGTCA) but has a higher ChIP-chip and expression correspondence than the motifs that are more symmetric.
YNL139CTHO2786LowDubiousIt is not clear that this is a sequence-specific DNA-binding protein; it contains no DNA-binding domain and has no known in vitro sequence specificity. The motif comes only from ChIP-chip.
YNL132WKRE330DubiousThis is supposed to be a ribosome biogenesis factor. I found no evidence that it is a sequence-specific DNA-binding protein.
YNL103WMET40My understanding is that Met4 is a modifier of the specificity of other proteins. SGD states that it "requires different combinations of the auxiliary factors Cbf1p, Met28p, Met31p and Met32p". ChIP-chip motifs 1023 and 1024 I believe are cofactor motifs; they are E-boxes. ChIP-chip motif 689 is different and matches Met28 and Met32 motifs. (CTGTGG core). Met28 is a bZIP protein, and Met32 is a C2H2. MITOMI motif for Met32 is TGTGG. So this is the Met32 motif. I do not believe that any of the Met4 motifs is correct. Need to obtain motifs for complexes.
YNL079CTPM10DubiousUnlikely to be true TF.
YNL068CFKH2830HighMost motifs are classic Forkhead. PBM motif 830 is one of the highest scoring and is not circular.
YNL039WBDP10DubiousUnlikely to be true TF.
YNL027WCRZ1516HighPBM motif 516 scores highest on ChIP and expression; resembles classic literature motifs
YNL023CFAP10DubiousUnlikely to be true TF.
YMR280CCAT833MediumNear-classic dimeric GAL4 motif. Literature-based. Not clear this is an optimal site but it does bind. Seems to hit the right GO category.
YMR213WCEF10DubiousUnlikely to be true TF.
YMR182CRGM1531HighPBM motif 531 looks like a C2H2 motif (row of G's), and scores well on both ChIP-chip and deletion expression data.
YMR176WECM50DubiousUnlikely to be true TF.
YMR172WHOT10DubiousUnlikely to be true TF.
YMR168CCEP3524HighTwo PBM motifs agree. Went with 524 because it appears neater. No other supporting data for any of them.
YMR164CMSS11204LowDubiousThere is no evidence that this is a sequence-specific DNA-binding protein, rather than a cofactor. The motif has a limited relationship to ChIP-chip data. The literature motif scores better than the motif derived from the ChIP-chip study. Also, the motif is identical to that for FLO8.
YMR075WRCO11066LowDubiousThere is no evidence that this is a sequence-specific DNA-binding protein rather than a chromatin factor. The higher-scoring ChIP-chip motif appears to have low information content and does not display strong correspondence to the data it was generated from or to expression data.
YMR072WABF2541MediumDubiousProtein is not expected to be sequence specific. But motif is obtained in vitro. May need further investigation. Give medium confidence, but label as dubious.
YMR070WMOT32080MediumPBM motif 2080 is very similar to the literature motif and scores highest on expression data. Moreover, this motif explains high-scoring ChIP-chip motifs for many other TFs, e.g. Nrg1, Yap6, Sok2
YMR053CSTB2710LowDubiousNo direct evidence that this is a DNA-binding protein. Three ChIP-derived motifs but none scores highly by any measure. Motif 710 is an arbitrary choice - looks tidy.
YMR053CSTB2710IncorrectLikely represents Reb1 binding site.
YMR043WMCM1831HighMost motifs resemble a classic SRF site. PBM motif 831 scores highly across the board, except for expression data where none does well, and its scores are non-circular.
YMR042WARG801483MediumMotif 1482 is an Arg81 site. 1483, however, is similar to Mcm1. Choose this, give Medium confidence.
YMR037CMSN21380HighMITOMI motif 1380 has the highest overall correspondence to ChIP-chip, overexpression, and deletion data. Resembles classic Msn2/4 motif.
YMR021CMAC11540HighLiterature motif 1540 most closely corresponds to ChIP-chip data (albeit barely significant). Nothing else to gauge by, but no reason to doubt literature motif.
YMR019WSTB42107HighPBM motif 2107 is clearly a dimeric GAL4-class motif, and it blows all the other motifs out of the water.
YMR016CSOK2404HighChIP-chip motif 404 has highest correspondence to both ChIP-chip and expression data - and strongly resembles PBM motif
YML113WDAT11416MediumThe literature (e.g. PMID: 8532535) suggests that the sequence specificity may be more promiscuous than the name suggests. To my knowledge there has not been any SELEX or PBM demonstrating that any motif is correct. But, it does bear some relationship to ChIP-chip and expression data.
YML099CARG811507IncorrectLikely represents Mcm1 binding site.
YML099CARG811506HighChIP motif 1506 correlates well with ChIP and also with expression data. Resembles dimeric GAL4 class motif.
YML081W2194HighPBM motifs are classical C2H2 motifs that match each other and have some correspondence to ChIP-chip data. 2194 has highest correspondence to ChIP-chip.
YML076CWAR1325LowNone of the motifs are convincing, but at least sequences with the literature motif have been experimentally confirmed to bind the protein (even if it is not shown that this is the optimal binding site)
YML065WORC11549HighLooks like ORC1 motif. Which is not really a TF, but it is a sequence-specific DNA-binding protein.
YML051WGAL800DubiousGal80 is not a sequence-specific DNA-binding protein
YML027WYOX1498HighTwo PBM studies and Pramila et al. (PMID 12464633) agree on classic homeodomain TAATTA motif. All three correlate with expression change and OE. Motif 453 is not a direct measurement so choose PBM motif that is the same length as the typical homeodomain footprint - 498 also correlates best with OE data; expression scores are skewed low by the large number of cell-cycle measurements.
YML007WYAP12186HighPBM motif 2186 looks like a monomeric bZIP site but it has the highest scores on both ChIP and expression
YLR451WLEU3781HighMost motifs look similar - dimeric GAL4 motif. Literature motif (781) has high correspondence to ChIP-chip and expression data and is not circular. But, PBM motif 2135, which is a monomeric GAL4 motif, scores highest on both ChIP-chip and expression data.
YLR451WLEU32135HighMost motifs look similar - dimeric GAL4 motif. Literature motif (781) has high correspondence to ChIP-chip and expression data and is not circular. But, PBM motif 2135, which is a monomeric GAL4 motif, scores highest on both ChIP-chip and expression data.
YLR442CSIR30DubiousThere is no evidence that the SIR proteins are sequence-specific DNA-binding proteins. Most of the motifs for them are Rap1 sites.
YLR403WSFP1797HighMost ChIP-seq studies identified the Rap1 motif. PBM motif 797 is less significant by ChIP-seq (although still highly significant) but is the winner across the board for all types of expression data.
YLR403WSFP1621IncorrectLikely represents Rap1 binding site.
YLR403WSFP11710IncorrectLikely represents Rap1 binding site.
YLR403WSFP1357IncorrectLikely represents Rap1 binding site.
YLR403WSFP11100IncorrectLikely represents Rap1 binding site.
YLR375WSTP3568MediumSTP3 and 4 have very similar DNA-binding domains. However, they are not similar to those of STP1 and 2; the next most closely related are SWI5 and ACE2, with major differences in the recognition alpha helices. All of the STP4 motifs are different from each other and none have any supporting data. There is only one motif for STP3 (568) from PBM and it matches the STP4 motif from the same study (559) which is the basis for choosing these two motifs.
YLR278C2112HighOnly 2112 (from PBMs) stands out; dimeric GAL4 motif with high score on ChIP-chip.
YLR266CPDR8528MediumBoth motifs are equally credible but have very limited support. Literature motif is related to that of YRR1 literature motif. PBM motif, however, is a classic GAL4 monomer. This is the PBM motif.
YLR266CPDR8244MediumBoth motifs are equally credible but have very limited support. Literature motif is related to that of YRR1 literature motif. PBM motif, however, is a classic GAL4 monomer. This is the literature motif.
YLR256WHAP12078HighLiterature binding site is direct CGG repeats with a 6bp spacer (PMID: 7958882). PBM motif 2078 gets this; it scores highest overall, including significant scores on both ChIP-chip and expression.
YLR254CNDL10DubiousUnlikely to be true TF.
YLR228CECM22849HighPBM motif 2122 is a monomeric GAL4 class motif, and scores highest on both ChIP and expression data. 849 is a classic dimeric GAL4 motif with lower but still reasonable scores and is moderately predictive across the board.
YLR228CECM222122HighPBM motif 2122 is a monomeric GAL4 class motif, and scores highest on both ChIP and expression data. 849 is a classic dimeric GAL4 motif with lower but still reasonable scores and is moderately predictive across the board.
YLR223CIFH10DubiousCofactor of Fhl1p. No evidence for sequence-specific DNA-binding.
YLR211C0DubiousUnlikely to be true TF.
YLR182WSWI60DubiousSwi6 is a cofactor, not a DNA-binding protein. These motifs are for Mbp1 or Swi4.
YLR176CRFX1496MediumCurious case - virtually all motifs are similar in appearance, with a common TGGCAAC core. They range from what appear to be monomers to full dimers, with multiple partial forms. However, none of them scores highly on both ChIP-chip and expression data. Select two representatives: one that scores well on ChIP-chip, and one that scores well on expression. This is the one that scores most highly on expression (close 2nd on deletion and 1st on overexpression). It is the only purely monomeric motif. Give medium confidence, since according to the literature this protein should bind as a dimer.
YLR176CRFX11478MediumCurious case - virtually all motifs are similar in appearance, with a common TGGCAAC core. They range from what appear to be monomers to full dimers, with multiple partial forms. However, none of them scores highly on both ChIP-chip and expression data. Select two representatives: one that scores well on ChIP-chip, and one that scores well on expression. This is the one that scores most highly on ChIP-chip. It is a dimer motif. Give medium confidence, since it has little relationship to expression data.
YLR131CACE21332HighHighest-scoring ChIP-chip motif is Rap1 site. MITOMI motif 1332 is next, and resembles the classic Swi5/Ace2 motif.
YLR131CACE2918IncorrectLikely represents Rap1 binding site.
YLR113WHOG10DubiousThis is a signalling molecule that associates with many TFs (see SGD)
YLR098CCHA41607IncorrectLikely represents Rap1 binding site.
YLR098CCHA42120HighTwo PBM motifs agree, and PBM motif 2120 has highest correspondence to ChIP-chip data, even higher than the best ChIP-chip motif. Has a GAL4-like appearance, albeit a variant. Monomeric. (Highest scoring motif - 1607 - is actually a Rap1 motif).
YLR014CPPR12064LowChIP-chip motif 2064 almost matches the literature site, which has been confirmed by directed experimentation, and scores highest on most measures. But, give it low confidence - it is not at all clear that this is an optimal binding site, and none of the scores for any of the motifs are all that high.
YLR013WGAT32128HighAll PBM motifs look similar, also similar to a subset of other GATAs. 2128 scores quite highly on ChIP-chip (albeit with negative correlation!), and also higher on expression and OE data.
YLL054C816MediumThree motifs available, from PBMs; two dimeric GAL4-like motifs but with different spacings and one monomeric. No backup data but looks tidy. Keep all three.
YLL054C2242MediumThree motifs available, from PBMs; two dimeric GAL4-like motifs but with different spacings and one monomeric. No backup data but looks tidy. Keep all three.
YLL054C526MediumThree motifs available, from PBMs; two dimeric GAL4-like motifs but with different spacings and one monomeric. No backup data but looks tidy. Keep all three.
YKR101WSIR10DubiousThere is no evidence that the SIR proteins are sequence-specific DNA-binding proteins. Most of the motifs for them are Rap1 sites.
YKR099WBAS1402HighVirtually all motifs are similar, with GAGTCA core. ChIP motif 402 has highest correspondence to both ChIP-chip and expression data.
YKR064WOAF30I do not see how either of these motifs could possibly be a Gal4-class binding motif. And, there is no correspondence to any of the data, even the ChIP-chip data from which it is derived.
YKR034WDAL801355MediumMITOMI motif 1355 has highest correspondence to ChIP-chip. But it's not striking; none of them are, despite the fact that this is a classic GATA site (GATAAG).
YKL222C2192HighTwo motifs from PBMs resemble monomeric GAL4-like motif. 2192 agrees best with ChIP-chip data and expression data.
YKL185WASH1648IncorrectLikely represents Mcm1 binding site.
YKL185WASH128MediumThe literature motif may not represent the full binding activity of the protein. Also, it is not supported by ChIP-chip. ChIP-chip identifies Mcm1-like motifs. But, it does score highly in both ChIP-chip and expression. The only higher-scoring motif has almost no information content.
YKL185WASH1932IncorrectLikely represents Mcm1 binding site.
YKL112WABF11993HighMost motifs are similar, and five have pegged the ChIP P-value. Choose 791- it's the highest scoring overall, and is from PBMs
YKL109WHAP4695HighSubunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data.
YKL072WSTB60DubiousIt is not clear that this is a sequence-specific DNA-binding protein; it contains no DNA-binding domain and has no known in vitro sequence specificity. The motif comes only from ChIP-chip so scoring on ChIP-chip is circular. The ChIP-chip motif looks a little like a Rap1 motif.
YKL062WMSN4518HighPBM motif 518 resembles both the classical MSN motif and the PBM motif, and scores highest on both expression and ChIP-chip.
YKL043WPHD12153HighHigh-scoring motifs are all similar, with characteristic APSES GC core and palindromic. PBM motifs score highest on ChIP-seq data, while ChIP-chip motif 393 (which contains flanking G/C residues) scores highest on expression data. Retain both - possibly, the rest of the protein contributes to binding flanking residues. This is the higher-scoring PBM motif (2153).
YKL043WPHD1393HighHigh-scoring motifs are all similar, with characteristic APSES GC core and palindromic. PBM motifs score highest on ChIP-seq data, while ChIP-chip motif 393 (which contains flanking G/C residues) scores highest on expression data. Retain both - possibly, the rest of the protein contributes to binding flanking residues. This is the ChIP motif that scores highest on expression data.
YKL038WRGT12227HighPBM motif 2227 is very similar to "traditional" motif and to monomeric GAL4 motifs, and scores highest on ChIP-chip data. All PBM motifs are similar.
YKL032CIXR10DubiousBinds cisplatin-modified DNA. HMG domains. ChIP-chip motifs not significant. Dubious and no credible motif.
YKL020CSPT23670LowDubiousI could not find any evidence that this protein binds directly to DNA. It has an IPT domain but no REL domain. None of the ChIP-derived motifs scores highly on ChIP data or anything else. Motif 670 bears some relationship to expression data.
YKL015WPUT32065MediumMotifs vary considerably. ChIP motif 2065 is a dimeric/(trimeric?) GAL4-like site, and has the highest correspondence to ChIP-chip data (from which it is derived) and some correspondence to expression data (although it is not strong). PBM motif 2223 is a monomeric GAL4-like motif and has higher correspondence to expression data, albeit weaker (but still good) correspondence to ChIP-chip data. It is possible that the actual sequence preference is some other arrangement of monomeric sites that were not picked up in either assay - score as medium confidence.
YKL015WPUT32223MediumMotifs vary considerably. ChIP motif 2065 is a dimeric/(trimeric?) GAL4-like site, and has the highest correspondence to ChIP-chip data (from which it is derived) and some correspondence to expression data (although it is not strong). PBM motif 2223 is a monomeric GAL4-like motif and has higher correspondence to expression data, albeit weaker (but still good) correspondence to ChIP-chip data. It is possible that the actual sequence preference is some other arrangement of monomeric sites that were not picked up in either assay - score as medium confidence.
YKL005CBYE10DubiousSGD: "Negative regulator of transcription elongation, contains a TFIIS-like domain and a PHD finger, multicopy suppressor of temperature-sensitive ess1 mutations, probably binds RNA polymerase II large subunit". No evidence this is a sequence-specific TF.
YJR147WHMS2992LowThe one ChIP-chip motif bears little relationship to the ChIP data. It kind of looks like an HNF-like site, but still, low confidence.
YJR140CHIR30DubiousHir1,2,3 are a nucleosome assembly complex, not TFs
YJR127CRSF2575HighNo supporting data, but the PBM motif 575 looks like a typical yeast C2H2 motif (Adr1, which has similar zinc fingers, Mig1, etc).
YJR094CIME10DubiousInteracts with UME6. The only significant motif shares 5/6 bases with the UME6 motif core (GCCGCC)
YJR060WCBF11346HighClassic E-box. MITOMI motif 1346 nearly has highest correspondence to ChIP-chip data and is non-circular; no other supporting data
YJL206C0Seven motifs from ChIP-chip, but none of them corresponds well to ChIP-chip data, and none of them resembles a GAL4 motif. 1169 has a CGG in the middle, but too much flanking information to be credible without further independent support.
YJL176CSWI30DubiousUnlikely to be true TF.
YJL127CSPT101880LowThis is the protein that binds histone promoters. The sequence specificity is derived from the histone promoters only so the literature motif may be inaccurate. Motif 1880 has higher scores overall but does not resemble the literature motif. Uncertain what to do here - use 1880, but give low confidence. Motif learned in vivo could contain extrinsic information.
YJL110CGZF32133HighClassic GATA motif 2133 from PBM scores highest on ChIP-chip and expression data
YJL103CGSM1856MediumOnly PBM motif 856 reaches significance, on expression. Classic GAL4-type monomeric site. No other data, relation to expression not strong. Medium confidence.
YJL089WSIP42067MediumPBM motif 573 is a monomeric GAL4-type motif (others appear dimeric) but it has good correspondence to ChIP-chip data. Only a few of the dimeric sites are more significant - the motif from in vivo analysis (PMID: 14685767) does not score as highly as 2067 from ChIP-chip data, but they look very similar. This is 2067, the presumed dimeric site.
YJL089WSIP4573MediumPBM motif 573 is a monomeric GAL4-type motif (others appear dimeric) but it has good correspondence to ChIP-chip data. Only a few of the dimeric sites are more significant - the motif from in vivo analysis (PMID: 14685767) does not score as highly as 2067 from ChIP-chip data, but they look very similar. This is 573, the presumed monomeric site
YJL056CZAP12097HighMost motifs are similar but do not exceed confidence thresholds on any data type. PBM motif 2097 has highest score for ChIP and expression, and is not circular
YIR023WDAL8153LowNone of the motifs agree with each other. The literature motif characterization was indirect; hence low confidence that this is the true motif. The ChIP-chip motifs score higher on ChIP data but that's circular.
YIR018WYAP5896IncorrectLikely represents Rap1 binding site.
YIR018WYAP5777HighChIP-chip yields a classic 7-mer Yap motif that scores well on ChIP and significantly on expression.
YIR017CMET280Like MET4, component of a complex. SGD: "Basic leucine zipper (bZIP) transcriptional activator in the Cbf1p-Met4p-Met28p complex".."Both Met4p and Met28p bind to DNA only in the presence of Cbf1p, and the presence of Cbf1p and Met4p stimulates the binding of Met28p to DNA (1, 2).". ChIP-chip motif 703 (CTGTGG) is clearly the Met31/32 motif. The other ChIP-chip motif is essentially poly-A, and scores poorly. Hence, neither of these motifs represents the intrinsic sequence specificity of MET28. Need in vitro data for complexes.
YIR013CGAT4565HighTwo PBM motifs look similar, also similar to a subset of other GATAs. 565 scores higher on expression and OE data.
YIL131CFKH12002HighClassic Forkhead motif for most of them. 2002 strongly resembles PBM motif but scores higher on both ChIP (which is circular) and expression (which is not).
YIL130WASG1807MediumTwo PBM motifs appear to represent monomeric and dimeric versions of the same motif. This is the dimeric version. No other supporting data; hence medium confidence.
YIL130WASG12116MediumTwo PBM motifs appear to represent monomeric and dimeric versions of the same motif. This is the monomeric version. No other supporting data; hence medium confidence. Picked 2116 because it has a higher GO score and expression score.
YIL128WMET180DubiousI found no evidence that this is a sequence-specific DNA-binding protein. ChIP-chip motif does not correlate with ChIP-chip data, or anything else.
YIL122WPOG10DubiousUnlikely to be true TF.
YIL119CRPI10DubiousIt is not clear that this is a sequence-specific DNA-binding protein; it contains no DNA-binding domain and has no known in vitro sequence specificity. The motif comes only from ChIP-chip so scoring on ChIP-chip is circular.
YIL101CXBP12039HighPBM and in vitro selection-derived motifs have highest scores across the board. 842 is higher on GO, but only slightly in AUC, and it has a very large number of empty flanking bases. 2039 (in vitro selection) seems a reasonable compromise - it's highest on ChIP and almost the highest on expression.
YIL056WVHR12091MediumPBM motif has high score on GO because it looks a lot like Gcn4
YIL036WCST6585HighPBM motif 585 correlates with expression data (deletion and overexpression). ChIP motif 1466 has higher ChIP score but is lower on expression.
YHR206WSKN7583HighMotifs are remarkably discordant considering that they all resemble each other in being G+C rich and containing a GGCC core. Possibly reflecting different modes of multimerization? Include the two that score highest on independent data: PBM motif 583, which represents a monomer, and ChIP-chip motif 380, which appears to represent a dimer.
YHR206WSKN7380HighMotifs are remarkably discordant considering that they all resemble each other in being G+C rich and containing a GGCC core. Possibly reflecting different modes of multimerization? Include the two that score highest on independent data: PBM motif 583, which represents a monomer, and ChIP-chip motif 380, which appears to represent a dimer.
YHR178WSTB52068MediumAll motifs have CGG core and most have CGGnG. Most ChIP-derived motifs have no relationship to expression data. Motif 2068 scores highest overall; looks a bit unusual for a Gal4 class motif but also does well on expression data. Retain as potential dimer motif, although it may also incorporate extrinsic information.
YHR178WSTB51405HighAll motifs have CGG core and most have CGGnG. Most ChIP-derived motifs have no relationship to expression data. MITOMI motif 1405 and PBM motif 514 score decently on both ChIP-chip and expression data, and seem to nail the GO category (oxidative stress response), and look like classic Gal4 halfmers. MITOMI motif scores slightly higher overall. This is presumably the monomeric motif
YHR124WNDT801464HighMotif 1464 matches literature motifs and PBM motif, and nails sporulation on GO. It also has the highest correspondence to ChIP-chip data.
YHR084WSTE12400HighAll motifs but one resemble the canonical literature site. Motif 400 is derived from ChIP-chip data (on which it scores highest) but also scores highest on expression data.
YHR056CRSC302164MediumArbitrary choice - all PBM motifs look similar (and resemble motif from homolog Rsc3). I have downgraded this one from high to medium because the best scoring motif actually looks the least like the Rsc3 motif.
YHR006WSTP22174HighSTP1 and 2 have very similar DNA-binding domains. However, they are not similar to those of STP3 and 4. PBM motif for STP2 (2174) correlates highest with ChIP-chip and expression data. ChIP-chip motif for STP1 (660) most strongly resembles motif 800, and scores highly on ChIP-chip data. In addition, these motifs resemble halfmers of literature-derived binding sites.
YHL027WRIM101600HighChIP-chip motif 600 is almost identical to PBM motif 513, but scores slightly higher on expression data. Three of six motifs are very similar.
YHL020COPI10DubiousMotifs do not match and do not explain the ChIP-chip data from which they are derived. Motif 1049 resembles the expected UAS-INO (Ino2/4) binding site (CATGTGAAAT) - Opi1 acts as a repressor by binding Ino2. I believe this protein is a corepressor, and Ino2/4 are the DNA-binding factors. Dubious as sequence-specific TF.
YHL009CYAP3672HighChIP-chip yields a classic 7-mer Yap motif that scores well on ChIP and significantly on expression. Could be a heterodimer. Chose 672 over 1463 because it has a higher score on expression data, which is independent.
YHL009CYAP31411HighMITOMI yields a nearly palindromic 8-mer motif with strong similarity to that of Yap6. PBM motif is similar but appears to be partial.
YGR288WMAL130None of the ChIP-chip motifs correspond well to the data they come from and/or resemble a GAL4 motif.
YGR249WMGA12141MediumPBM motif 2141 is similar to Hsf1 motif 476 (TTCCA). Has TTC "core" which is shared by most Hsf1 motifs. Scores reasonably on ChIP data but no other supporting information; hence "medium".
YGR140WCBF20DubiousUnlikely to be true TF.
YGR097WASK100DubiousI did not find any evidence that this is a sequence-specific DNA-binding protein.
YGR089WNNF20DubiousI did not find any evidence that this is a sequence-specific DNA-binding protein.
YGR071C0DubiousUnlikely to be true TF.
YGR067C2191HighPBM motif is a classical C2H2 motif that has good correspondence to ChIP-chip data. 2191 corresponds best and has fewer empty columns in the PWM.
YGR044CRME1273LowMotif 273 shows similarity to RME response elements (RREs), GTACC(T/A)ACAAAA (in fact it is derived from them). The fact that RME has three C2H2 zinc fingers and also requires an additional C-terminal region for binding in vitro, together with its relatively large footprint, are consistent with such a large binding site. However, I gave this motif a "low" score as there is no systematic analysis in vivo or in vitro indicating that these are really the most preferred sites. It would be valuable to redo the in vitro and in vivo experiments under appropriate conditions.
YGR040WKSS10DubiousThere is no evidence that Kss1 is a sequence-specific TF.
YGR002CSWC40DubiousUnlikely to be true TF.
YGL254WFZF169LowLiterature motif is the only one that appears credible. PBM motif I believe is a known artifact. Literature motif gets low confidence however as it is based on a single known binding sequence.
YGL237CHAP2695HighSubunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data.
YGL209WMIG22143HighPBM motif 2143 has highest correspondence to ChIP-chip data
YGL197WMDS30DubiousI found no evidence that this is a sequence-specific DNA-binding protein. ChIP-chip motif does not correlate with ChIP-chip data, or anything else.
YGL192WIME41000LowDubiousI could not find evidence that IME4 is a sequence-specific DNA-binding protein. C3H1 is more typically an RNA-binding domain or something besides nucleic acid binding. There is one significant ChIP-chip motif but perhaps it binds through a cofactor. No other supporting data.
YGL181WGTS1694LowNone of the three motifs resembles an AT-hook binding site. Only one correlates with the ChIP-chip data, but that's circular. Low confidence.
YGL166WCUP248MediumThree motifs account for three possible spacings in the literature motif; it is not clear that this is the optimal site, however
YGL162WSUT1673MediumFour motifs, all derived from ChIP-chip, contain CGG, but are unusual, with degeneracy and a core of CGGGG. Correlate somewhat with both OE and deletion data, however.
YGL133WITC10DubiousUnlikely to be true TF.
YGL131CSNT2612LowDubiousAll three motifs are derived from the same ChIP-chip data. However, there is no corroborating data, and not all SANT domains are DNA-binding - or are non-specific, in chromatin proteins. So it could be a cofactor motif; in fact it is similar to motifs of Stp3 and Stp4. The protein has other chromatin-related domains (BAH, PHD/RING). Hence the "Low" assessment.
YGL096WTOS8494HighNo corroborating data on this TF, and only one PBM motif known and one ChIP motif. But, it resembles TGTCA, which was also obtained for paralog Cup9 by multiple approaches (GTGNCA), as well as PBM results for the Meis/Mrg/Pknox/Tgif family, which are the closest mammalian homologs. The ChIP motif (1902) does not resemble a homeodomain binding sequence, and scores lower on expression data.
YGL073WHSF11461MediumFour types of motifs contain TTC monomeric core and all score highly on both ChIP and expression. Appear to represent different monomeric/multimeric binding configurations. This is the dimeric head-to-tail site. From ChIP and prior.
YGL073WHSF1615MediumFour types of motifs contain TTC monomeric core and all score highly on both ChIP and expression. Appear to represent different monomeric/multimeric binding configurations. This is the trimeric site. From ChIP.
YGL073WHSF1476MediumFour types of motifs contain TTC monomeric core and all score highly on both ChIP and expression. Appear to represent different monomeric/multimeric binding configurations. This is the monomeric site. From PBM.
YGL073WHSF1411MediumFour types of motifs contain TTC monomeric core and all score highly on both ChIP and expression. Appear to represent different monomeric/multimeric binding configurations. This is the spaced direct repeat dimeric site. From ChIP.
YGL071WAFT1658HighMost motifs are similar. Also very similar to AFT2 motifs. ChIP-chip motif 658 scores highest on both ChIP-chip and expression data.
YGL035CMIG12142HighPBM motif 2142 has highest correspondence to ChIP-chip AND AUC for GO category "generation of precursor metabolites and energy". The adjacent A/T stretch, which is also noted in the literature, is found in ChIP-chip motif 654 and others; however, that motif does not sort as well for GO category "generation of precursor metabolites and energy" and also scores lower for both ChIP and expression, so it seems unlikely to represent a key intrinsic activity of the protein itself.
YGL013CPDR1899IncorrectLikely represents Rap1 binding site.
YGL013CPDR1485HighPBM motif 485 looks like a traditional literature motif and has highest correspondence to ChIP and expression data. Dimeric GAL4 motif.
YFR037CRSC80DubiousUnlikely to be true TF.
YFR034CPHO42222HighAlmost all motifs match classic HLH E-box. PBM motif 2222 has highest match to both ChIP-chip and expression data, without being circular.
YFL052W0DubiousPutative zinc-cluster protein.
YFL044COTU11166LowChIP-chip motif 1166 has a good relationship to ChIP-chip data, but it is unusual for C2H2 motifs to be A/T rich, and there is no other support for this motif, so it could be a cofactor, nucleosome-excluding, TATA element, etc. In addition, it only has a single C2H2 domain, and is known to function as a deubiquitylation enzyme. Low confidence.
YFL031WHAC194Medium1788 is the overall winner. But, literature motif 94 also scores well in ChIP-chip, despite being somewhat different. Possible difference in heterodimerization partners, or proteolytic fragment? Retain both.
YFL031WHAC11788High1788 is the overall winner. But, literature motif 94 also scores well in ChIP-chip, despite being somewhat different. Possible difference in heterodimerization partners, or proteolytic fragment? Retain both, score 94 as medium.
YFL021WGAT1962HighChIP-chip motif 962 scores higher on both ChIP-chip and expression data
YER184C512MediumOne motif from PBMs is a monomeric GAL4-like motif and the other is dimeric. Medium confidence because there is little independent support, and both contain the CCGG core that I believe may be an artifact. However, both score significantly on ChIP-chip data. Only 512 is significant on expression data.
YER184C2095MediumOne motif from PBMs is a monomeric GAL4-like motif and the other is dimeric. Medium confidence because there is little independent support, and both contain the CCGG core that I believe may be an artifact. However, both score significantly on ChIP-chip data. Only 512 is significant on expression data.
YER169WRPH1547HighAbout half of the motifs look similar to each other, with GGGG core typical of many yeast C2H2 proteins. PBM motif 547 has meaningful scores on both ChIP-chip and mutant expression data. I'm somewhat concerned that motif 279 lacks two A residues captured by both PBM experiments.
YER164WCHD10DubiousUnlikely to be true TF.
YER161CSPT21114LowDubiousI could not find any evidence that this protein binds directly to DNA. None of the motifs is significant. All are from ChIP-chip. Motif 1114 chosen simply because it has the highest numbers overall.
YER159CBUR60DubiousUnlikely to be true TF.
YER148WSPT15798HighThis is TATA-binding protein. PBM motif 798 chosen because 1326 was derived from the 96-sequence TIRF-PBM array instead of a full 40K PBM
YER130CCOM2534HighPBM motif 534 has the highest correspondence to expression data. Not much else supporting any of the motifs, although the two PBM motifs look about the same. Also look like typical yeast C2H2 motifs.
YER111CSWI4584HighMotif is well-characterized and most published motifs match the expected one. PBM motif (584) scores highly (although not highest) in ChIP-chip data. It is, however, non-circular, and specifically captures "DNA metabolic process" in GO analysis.
YER109CFLO867LowDubiousI found no evidence that this is a sequence-specific DNA-binding protein, i.e. that it binds directly to DNA in vitro. The motif has a limited relationship to ChIP-chip data. The literature motif scores better than the motif derived from the ChIP-chip study. Also, the motif is identical to that for MSS11.
YER088CDOT62221HighPBM motif 812 most closely resembles that of homolog TOD6, which is well-supported; has highest correlation to both ChIP and expression data.
YER069WARG5,61426MediumNot clear that motif is optimal.
YER068WMOT2556MediumPBM motif 556 has high correspondence to ChIP-chip data. However, also resembles TATA element, and could also be a structural motif. RRMs normally bind single-stranded RNA or DNA. Give medium confidence.
YER064C2094MediumPBM motif has high score on GO because it looks a lot like Gcn4
YER063WTHO10DubiousUnlikely to be true TF.
YER051WJHD1662LowDubiousThis is a histone demethylase. No evidence for direct DNA binding. Motif 662 is significant. Include, but give low confidence - could be a cofactor.
YER045CACA18MediumLiterature motif 8 is supported by experimental investigation, and resembles a bZIP site, but has no other support; motif was not obtained objectively. Can bind as heterodimer. The highest-scoring motif (from ChIP, 1457) has low information content - I'm concerned it is learning other features of bound promoters.
YER040WGLN3539HighMost motifs are classic GATA or GATAAG. PBM motif 539 scores highest on ChIP.
YER028CMIG32144HighPBM motif 2144 has highest correspondence to ChIP-chip data
YEL009CGCN41363HighVirtually all motifs look the same. MITOMI motif 1363 is as good as any of the ChIP-chip motifs but not circular; scores high across the board.
YDR520CURC2553HighThis is a monomeric GAL4-class motif. Two PBM studies essentially agree, and have some relationship to ChIP-chip data. No other informative data.
YDR485CVPS720DubiousUnlikely to be true TF.
YDR477WSNF11110MediumDubiousMotif 1110 has a quite strong correspondence to ChIP-chip data (from which it is derived). However, there seems to be no evidence that this is a sequence-specific DNA-binding protein. Aside from a weak relationship to expression data there is no corroborating evidence here (and no DNA-binding domain).
YDR463WSTP1660HighSTP1 and 2 have very similar DNA-binding domains. However, they are not similar to those of STP3 and 4. PBM motif for STP2 (800) correlates with ChIP-chip and expression data. ChIP-chip motif for STP1 (660) most strongly resembles motif 800, and scores highly on ChIP-chip data. In addition, these motifs resemble halfmers of literature-derived binding sites.
YDR451CYHP1716HighChIP-chip, EMSA, and one-hybrid all arrive at a classic homeodomain TAATTG motif. Microarray enrichment motif (716) scores higher on OE data from another study than ChIP motifs do, and does nearly as well on ChIP data.
YDR448WADA20DubiousUnlikely to be true TF.
YDR423CCAD12073HighClassic YAP motif in most cases. Include examples of both overlapping and adjacent monomeric sites - there are examples of both in PBM data and they both score highly on ChIP data. This one is overlapping.
YDR423CCAD12098HighClassic YAP motif in most cases. Include examples of both overlapping and adjacent monomeric sites - there are examples of both in PBM data and they both score highly on ChIP data. This one is adjacent.
YDR421WARO802115HighPBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three.
YDR421WARO80725HighPBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three.
YDR421WARO801509HighPBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three.
YDR409WSIZ10DubiousUnlikely to be true TF.
YDR362CTFC60DubiousUnlikely to be true TF.
YDR323CPEP70DubiousUnlikely to be true TF.
YDR310CSUM1383HighThis is the motif for the FL SUM1; scores highest on ChIP-chip and resembles the canonical literature motif; also has some relationship to deletion expression data
YDR310CSUM1478HighThis is the motif for the SUM1 AT_hook; scores highest in deletion expression data
YDR303CRSC3580HighPBM motif 580 has best correspondence to expression data - the only significant independent criterion - considering that the correlations are all in the same orientation (they are not for 2165). All motifs look similar. Propose that longer motifs could be due to multiple binding sites in the same sequence.
YDR277CMTH10DubiousSGD: "interacts with Rgt1p and the Snf3p and Rgt2p glucose sensors". There is no evidence that this is a sequence-specific transcription factor.
YDR266C1161LowMotifs from ChIP-chip do not correspond to ChIP-chip, and there is no other supporting data. Chose 1161 only because it looks more reasonable. Low confidence.
YDR259CYAP6599HighPBM and ChIP-chip can derive basically the same motif, which is a classical YAP motif. They score similarly on all criteria. The ChIP-chip motif (599) has fewer low-information flanking bases.
YDR253CMET322140HighMost motifs look similar. PBM motif 2140 has highest correspondence to both ChIP and expression.
YDR227WSIR40DubiousThere is no evidence that the SIR proteins are sequence-specific DNA-binding proteins. Most of the motifs for them are Rap1 sites.
YDR225WHTA10DubiousUnlikely to be true TF.
YDR216WADR1576HighPBM motif 576 has significant correspondence to both ChIP-chip and highest to expression data. And has a classic yeast C2H2 look.
YDR213WUPC2544HighThe SRE is bound by UPC2 and the "canonical" sequence is TCGTATA. However, the more degenerate version obtained by PBM (motif 544) scores better in both expression analysis and OE experiments. Newer motif 2109 scores better on ChIP-chip, but lower on expression, and the SRE is well-characterized. I think this one deserves further experimental analysis.
YDR207CUME62239HighAll motifs are similar to each other. BEEML-PBM motif 2239 scores highest across the board.
YDR174WHMO12249LowThis motif is uncharacteristic for a Sox protein and HMG proteins typically do not bind DNA in a sequence specific manner. Since it is from ChIP data it could be a cofactor motif. Low confidence.
YDR169CSTB32233HighSTB3 binds RRPE element (AAAAATTT) both in vivo and in vitro (PMID 17616518). PBM motifs 810 and 2233 strongly resemble the RRPE element, score significantly in deletion expression data, and nail the GO categories "nucleolus" and "ribosome biogenesis". 2233 gets slightly higher scores.
YDR146CSWI5569HighPBM, ChIP-chip, and conservation all yield similar motifs. ChIP-chip scores highest in ChIP-chip but that is circular. Choose PBM motif 569 which is nearly identical.
YDR123CINO2713HighIno2/4 binds as a heterodimer, so there should just be one motif for the two proteins. All motifs appear similar but none of them is derived from in vitro data. Nonetheless most motifs match a classic E-box with some preference for flanking bases. Motif 713 is derived from ChIP-chip; it is not the highest-scoring ChIP-chip motif but it is highest for OE and deletion expression.
YDR096WGIS1562HighAll motifs similar; PBM motif 562 has highest correspondence to deletion expression data and overexpression data
YDR081CPDC21050LowMotifs do not correlate with the ChIP-chip data from which they were derived. I found no other experimental evidence that this is a sequence-specific DNA-binding protein. However, it does have HTH and transposase motifs. Retain motif 1050 but give low confidence.
YDR049W0DubiousNo evidence this is a TF, aside from a poorly-scoring C2H2 zinc finger
YDR043CNRG12148HighPBM, ChIP-chip, and literature motifs all appear very similar, and resemble motif for the related protein NRG2. Choose top PBM motif (2148). There is also a recurring ChIP-chip motif (TGTGCCT) which I believe is actually the MOT3 binding site.
YDR034CLYS14133HighPBM motifs are virtually identical and appear monomeric; literature motif is dimeric. Include both. Choose PBM motif 865 as it appears to have more robust CGG.
YDR034CLYS14865HighPBM motifs are virtually identical and appear monomeric; literature motif is dimeric. Include both. Choose PBM motif 865 as it appears to have more robust CGG.
YDR026C696HighThree ChIP-chip motifs are virtually identical in appearance; resemble Reb1 motifs; high correspondence to ChIP-chip data
YDR009WGAL30DubiousGal3 is not a sequence-specific DNA-binding protein
YDL170WUGA3651HighAppears to be a dimeric GAL4-class motif. Scores highest in ChIP-chip data, but is derived from the same data. GO seems to match known function!
YDL170WUGA3486MediumAppears to be a monomeric GAL4-class motif. Derived from PBM data, scores highly in ChIP-chip data, but not as high as the dimeric site derived from the ChIP-chip data.
YDL166CFAP70DubiousThis is supposed to be a ribosome biogenesis factor. I found no evidence that it is a sequence-specific DNA-binding protein.
YDL106CPHO21680IncorrectLikely represents Abf1 binding site.
YDL106CPHO22154HighMotifs are largely all different from each other. PBM motif 2154 scores highly on ChIP data and resembles classic TAAT homeobox core. Note that PBM motif 794 even more strongly resembles homeobox (TAATTA) but scores slightly less highly.
YDL074CBRE10DubiousUnlikely to be true TF.
YDL056WMBP12138HighAlmost all motifs look similar to literature binding site. PBM motif 2138 scores at the top on ChIP-chip and expression. And is non-circular.
YDL048CSTP4559MediumSTP3 and 4 have very similar DNA-binding domains. However, they are not similar to those of STP1 and 2; the next most closely related are SWI5 and ACE2, with major differences in the recognition alpha helices. All of the STP4 motifs are different from each other and none have any supporting data. There is only one motif for STP3 (568) from PBM and it matches the STP4 motif from the same study (559) which is the basis for choosing these two motifs.
YDL042CSIR20DubiousThere is no evidence that the SIR proteins are sequence-specific DNA-binding proteins. Most of the motifs for them are Rap1 sites.
YDL020CRPN41700HighIn vitro motifs do not contain the TTT sequence on the end. But they were derived from the DBD only. The rest of the protein may contribute to binding the TTT segment. Motif 1700 has the highest correspondence to ChIP-chip and expression and GO.
YDL020CRPN41090IncorrectLikely represents Reb1 binding site.
YDL002CNHP10502LowDubiousNHP10 is an HMGB-type protein. Known to prefer DNA ends. There is no independent support for the single PBM motif.
YCR106WRDS1506HighAll motifs look similar. PBM motif 506 has a higher score on ChIP-chip than any of the ChIP-chip derived motifs.
YCR096CHMRA2558MediumShould be similar to MATALPHA2. The one PBM motif is indeed related to the MITOMI motif for MATALPHA2.
YCR066WRAD180DubiousUnlikely to be true TF.
YCR065WHCM1570HighPBM and SAAB/EMSA motifs both look similar to standard FH motif. PBM motif 570 has stronger correspondence to expression data.
YCR040WMATALPHA11418LowAccording to PMID: 15118075, binds the "Q site" which has "consensus" ACAATGACAG. Seems all that is in common is the CAAT. I believe further study is required.
YCR039CMATALPHA21364HighAccording to PMID: 9858582, "A comparison of the 2 binding sites in both asg and hsg operators yields the same consensus sequence, 5'-CATGTA-3'"; results in Figure 2 of the same paper support a consensus of CATGTAA. MITOMI yields ACATG, which is the reverse complement of most of the literature consensus. Motif 1364 has highest information content; use this.
YCR033WSNT10DubiousUnlikely to be true TF.
YCR018CSRD12232MediumPBM studies yield nearly identical motifs. 2232 closely resembles motif from related GATA factors and scores highest overall. This is an unusual motif for the GATA class; hence medium confidence level.
YCL067CHMLALPHA22079MediumProtein is similar to PBX/MEIS/TGIF; both PBM motifs have some similarity (central ACA/TGT), so do sites in crystal and in vivo (e.g. PMID: 1682054) but no clear winner between the two. Keep both PBM motifs in curated set (2102 and 2079) but give medium confidence - no supporting ChIP or expression data.
YCL067CHMLALPHA22102MediumProtein is similar to PBX/MEIS/TGIF; both PBM motifs have some similarity (central ACA/TGT), so do sites in crystal and in vivo (e.g. PMID: 1682054) but no clear winner between the two. Keep both PBM motifs in curated set (2102 and 2079) but give medium confidence - no supporting ChIP or expression data.
YCL058CFYV51417LowLiterature motif is derived from a single promoter and while the protein seems to have some DNA-binding activity, perhaps in conjunction with other TFs, I find the evidence supporting this precise binding site incomplete, since it is derived from a single site. Hence, low confidence in the motif.
YCL055WKAR4127LowDubiousEvidence for sequence specific DNA binding seems weak, hence low confidence
YBR297WMAL330None of the ChIP-chip motifs correspond well to the data they come from and/or resemble a GAL4 motif.
YBR267WREI1489HighPBM motif looks like a yeast C2H2 motif (row of C's); highly significant relationship to ChIP-chip data
YBR240CTHI21449HighThis is a GAL4-class protein. All motifs are ChIP-chip derived, none resembles each other. 1449 is the only one with respectable scores on ChIP and expression, and it also has the appearance of a GAL4 class motif, although the structural prior presumably forces it to have this property.
YBR239CERT12188MediumThree PBM motifs are all classic monomeric GAL4 motifs. Chose 2188 because it has fewer noninformative flanking positions, and higher significance on expression data. Also, 826 has the CCGG core that I suspect may be an artefact of PBMs or the DBD clones used in these studies. The highest-scoring ChIP motif is circular and does not resemble a GAL4 class binding site.
YBR182CSMP1864MediumPBM motif 864 scores highest on ChIP-chip and expression data. I gave it a medium, however, because it has low information content at most positions, does not closely match the literature motif (although the literature motif does not match ChIP-chip or expression data), and also does not resemble that of RLM1, which according to the literature should be related.
YBR150CTBS12179HighTwo motifs from PBMs are nearly identical GAL4-class motifs with defined spacing and orientation. Motif 552 has slightly higher scores. Two motifs from BEEML analysis of PBM data give monomeric motif - also give this high confidence.
YBR150CTBS1552HighTwo motifs from PBMs are nearly identical GAL4-class motifs with defined spacing and orientation. Motif 552 has slightly higher scores. Two motifs from BEEML analysis of PBM data give monomeric motif - also give this high confidence.
YBR089C-ANHP6B792MediumNHP6A and NHP6B are similar to the HMGB family, which is thought to lack sequence specificity. However, the proteins do bend the DNA when they bind, and so may have some level of sequence specificity. Essentially similar motifs were obtained for the two different proteins (in the same study) and the PBM motif for Nhp6A has a good correspondence to ChIP-chip data. Give both Medium confidence.
YBR083WTEC1815HighAll motifs agree, and are significant by several criteria. PBM motif 815 has the second-highest scores overall, and it is non-circular for in vivo binding. Also has highest GO score.
YBR066CNRG21383HighMITOMI motif 1383 looks like a classic yeast C2H2 binding site (row of G's). Also resembles motifs obtained by both ChIP and PBMs for related protein Nrg1.
YBR060CORC20DubiousUnlikely to be true TF.
YBR049CREB1907HighAll motifs are similar. ChIP-chip motif 907 has highest correspondence to both ChIP-chip and expression data, and strongly resembles MITOMI and PBM motifs.
YBR033WEDS12093HighPBM and ChIP-chip motifs are very similar. PBM motif 2093 scores most significantly on ChIP data. Classic GAL4 class motif.
YBL103CRTG31446LowOnly the PBM motif is a classic HLH motif. Three different ChIP-chip-derived motifs are all diverse, but all score highly on ChIP-chip data! Are they motifs of other TFs? Check. 602: GCN4; 1095: TEC1; 1096: resembles 602, but is a closer match to CUP9/TOS8. Also hits GCN4. According to the literature (PMID: 9032238) the core binding site for the Rtg1p-Rtg3p heterodimer is 5'-GGTCAC-3'; the only motif that resembles this is 1446. Vague resemblance to 602 and 1096. I am going to retain 1446, which represents the literature site; PBM motif 870, which resembles an E-box, and ChIP-chip motif 1445, which scores highest on ChIP-chip data. But give all low confidence.
YBL103CRTG31445LowOnly the PBM motif is a classic HLH motif. Three different ChIP-chip-derived motifs are all diverse, but all score highly on ChIP-chip data! Are they motifs of other TFs? Check. 602: GCN4; 1095: TEC1; 1096: resembles 602, but is a closer match to CUP9/TOS8. Also hits GCN4. According to the literature (PMID: 9032238) the core binding site for the Rtg1p-Rtg3p heterodimer is 5'-GGTCAC-3'; the only motif that resembles this is 1446. Vague resemblance to 602 and 1096. I am going to retain 1446, which represents the literature site; PBM motif 870, which resembles an E-box, and ChIP-chip motif 1445, which scores highest on ChIP-chip data. But give all low confidence.
YBL103CRTG3870LowOnly the PBM motif is a classic HLH motif. Three different ChIP-chip-derived motifs are all diverse, but all score highly on ChIP-chip data! Are they motifs of other TFs? Check. 602: GCN4; 1095: TEC1; 1096: resembles 602, but is a closer match to CUP9/TOS8. Also hits GCN4. According to the literature (PMID: 9032238) the core binding site for the Rtg1p-Rtg3p heterodimer is 5'-GGTCAC-3'; the only motif that resembles this is 1446. Vague resemblance to 602 and 1096. I am going to retain 1446, which represents the literature site; PBM motif 870, which resembles an E-box, and ChIP-chip motif 1445, which scores highest on ChIP-chip data. But give all low confidence.
YBL054WTOD6852HighTwo PBM motifs largely agree; 852 has higher correspondence to expression data while 495 has higher correspondence to ChIP-chip. Use 852; its score is much higher, including for GO.
YBL052CSAS30DubiousUnlikely to be true TF.
YBL021CHAP3695HighSubunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data.
YBL008WHIR10DubiousHir1, 2, and 3 form a nucleosome assembly complex and are not TFs.
YBL005WPDR31387MediumMITOMI yields a simple GAL4 monomeric site that scores well in ChIP-chip data. ChIP-chip yields a dimeric site that resembles the literature site. In vivo, PDR1 and PDR3 may form heterodimers. Retain both. This is the monomeric motif.
YBL005WPDR32062MediumMITOMI yields a simple GAL4 monomeric site that scores well in ChIP-chip data. ChIP-chip yields a dimeric site that resembles the literature site. In vivo, PDR1 and PDR3 may form heterodimers. Retain both. This is the dimeric ChIP-chip motif.
YBL003CHTA20DubiousUnlikely to be true TF.
YAL051WOAF12060MediumMotif 2060 has a strong resemblance to the literature motifs for the Oaf1-Pip2 dimer, and scores highly on both ChIP and expression data. There is no in vitro support and the motif looks somewhat weak, so Medium confidence.
TBP-TFIIBTBP-TFIIB1329MediumThe TIRF-PBM data used to generate the motif included only 96 sequences; hence, medium confidence.
TBP-TFIIA-TFIIBTBP-TFIIA-TFIIB1330MediumThe TIRF-PBM data used to generate the motif included only 96 sequences; hence, medium confidence.
TBP-TFIIATBP-TFIIA1328LowThe TIRF-PBM data used to generate the motif included only 96 sequences. Also it is curious that there is no TATA sequence in the logo.
MBP1-SWI6-dimerMBP1-SWI6-dimer0Redundant with MBP1
MATALPHA1-MCM1-dimeralpha1-MCM1-dimer1442MediumNot clear that motif is optimal.
MATA1-MATALPHA2-dimera1-alpha2-dimer1436MediumNot clear that motif is optimal.
MATA10Need to study the literature more carefully and consult experts, but at first glance none of these motifs seems right.
MAL63136MediumThis is an unconventional dimeric GAL4-class motif.