|
|
|
|
|
|
|
V |
YEL009C |
GCN4 |
1363 |
High |
|
Virtually all motifs look the same. MITOMI motif 1363 is as good as any of the ChIP-chip motifs but not circular; scores high across the board. |
V |
YKR099W |
BAS1 |
402 |
High |
|
Virtually all motifs are similar, with GAGTCA core. ChIP motif 402 has highest correspondence to both ChIP-chip and expression data. |
V |
YOR172W |
YRM1 |
813 |
High |
|
Two PBM studies largely agree on classic GAL4-class monomeric motif. Motif 813 has indications of spacing and orientation of dimeric protein. |
V |
YPL230W |
USV1 |
509 |
High |
|
Two PBM studies essentially agree on classical C2H2 GGGG-containing motif. Chose 509 because it scores much higher on both ChIP and expression data. |
V |
YML027W |
YOX1 |
498 |
High |
|
Two PBM studies and Pramila et al. (PMID 12464633) agree on classic homeodomain TAATTA motif. All three correlate with expression change and OE. Motif 453 is not a direct measurement so choose PBM motif that is the same length as the typical homeodomain footprint - 498 also correlates best with OE data; expression scores are skewed low by the large number of cell-cycle measurements. |
V |
YIR013C |
GAT4 |
565 |
High |
|
Two PBM motifs look similar, also similar to a subset of other GATAs. 565 scores higher on expression and OE data. |
V |
YBL054W |
TOD6 |
852 |
High |
|
Two PBM motifs largely agree; 852 has higher correspondence to expression data while 495 has higher correspondence to ChIP-chip. Use 852; score is way higher. Also for GO. |
V |
YIL130W |
ASG1 |
2116 |
Medium |
|
Two PBM motifs appear to represent monomeric and dimeric versions of the same motif. This is the monomeric version. No other supporting data; hence medium confidence. Picked 2116 because it has a higher GO score and expression score. |
V |
YIL130W |
ASG1 |
807 |
Medium |
|
Two PBM motifs appear to represent monomeric and dimeric versions of the same motif. This is the dimeric version. No other supporting data; hence medium confidence. |
V |
YMR168C |
CEP3 |
524 |
High |
|
Two PBM motifs agree. Went with 524 because it appears neater. No other supporting data for any of them. |
V |
YLR098C |
CHA4 |
2120 |
High |
|
Two PBM motifs agree, and PBM motif 2120 has highest correspondence to ChIP-chip data, even higher than the best ChIP-chip motif. Has a GAL4-like appearance, albeit a variant. Monomeric. (Highest scoring motif - 1607 - is actually a Rap1 motif). |
V |
YKL222C |
|
2192 |
High |
|
Two motifs from PBMs resemble monomeric GAL4-like motif. 2192 agrees best with ChIP-chip data and expression data. |
V |
YBR150C |
TBS1 |
552 |
High |
|
Two motifs from PBMs are nearly identical GAL4-class motifs with defined spacing and orientation. Motif 552 has slightly higher scores. Two motifs from BEEML analysis of PBM data give monomeric motif - also give this high confidence. |
V |
YBR150C |
TBS1 |
2179 |
High |
|
Two motifs from PBMs are nearly identical GAL4-class motifs with defined spacing and orientation. Motif 552 has slightly higher scores. Two motifs from BEEML analysis of PBM data give monomeric motif - also give this high confidence. |
V |
YBR239C |
ERT1 |
2188 |
Medium |
|
Three PBM motifs are all classic monomeric GAL4 motifs. Chose 2188 because it has fewer noninformative flanking positions, and higher significance on expression data. Also, 826 has the CCGG core that I suspect may be an artefact of PBMs or the DBD clones used in these studies. The highest-scoring ChIP motif is circular and does not resemble a GAL4 class binding site. |
V |
YOR337W |
TEA1 |
817 |
Medium |
|
Three motifs, all from PBMs. Choose 817 because it has a more robust GAL4 "CGG" core. But there is no convincing corroborating data for either motif and they do not match each other. |
V |
YLL054C |
|
526 |
Medium |
|
Three motifs available, from PBMs; two dimeric GAL4-like motifs but with different spacings and one monomeric. No backup data but looks tidy. Keep all three. |
V |
YLL054C |
|
816 |
Medium |
|
Three motifs available, from PBMs; two dimeric GAL4-like motifs but with different spacings and one monomeric. No backup data but looks tidy. Keep all three. |
V |
YLL054C |
|
2242 |
Medium |
|
Three motifs available, from PBMs; two dimeric GAL4-like motifs but with different spacings and one monomeric. No backup data but looks tidy. Keep all three. |
V |
YGL166W |
CUP2 |
48 |
Medium |
|
Three motifs account for three possible spacings in the literature motif; it is not clear that this is the optimal site, however |
V |
YDR026C |
|
696 |
High |
|
Three ChIP-chip motifs are virtually identical in appearance; resemble Reb1 motifs; high correspondence to ChIP-chip data |
V |
YDR174W |
HMO1 |
2249 |
Low |
|
This motif is uncharacteristic for a Sox protein and HMG proteins typically do not bind DNA in a sequence specific manner. Since it is from ChIP data it could be a cofactor motif. Low confidence. |
V |
YJL127C |
SPT10 |
1880 |
Low |
|
This is the protein that binds histone promoters. The sequence specificity is derived from the histone promoters only so the literature motif may be inaccurate. Motif 1880 has higher scores overall but does not resemble the literature motif. Uncertain what to do here - use 1880, but give low confidence. Motif learned in vivo could contain extrinsic information. |
V |
YDR310C |
SUM1 |
478 |
High |
|
This is the motif for the SUM1 AT_hook; scores highest in deletion expression data |
V |
YDR310C |
SUM1 |
383 |
High |
|
This is the motif for the FL SUM1; scores highest on ChIP-chip and resembles the canonical literature motif; also has some relationship to deletion expression data |
V |
YER148W |
SPT15 |
798 |
High |
|
This is TATA-binding protein. PBM motif 798 chosen because 1326 was derived from the 96-sequence TIRF-PBM array instead of a full 40K PBM |
V |
MAL63 |
|
136 |
Medium |
|
This is an unconventional dimeric GAL4-class motif |
V |
YPR186C |
PZF1 |
1321 |
Low |
|
This is a single literature site. The protein almost certainly binds the site but it has not been demonstrated that this is an optimal binding site. |
V |
YDR520C |
URC2 |
553 |
High |
|
This is a monomeric GAL4-class motif. Two PBM studies essentially agree, and have some relationship to ChIP-chip data. No other informative data. |
V |
YBR240C |
THI2 |
1449 |
High |
|
This is a GAL4-class protein. All motifs are ChIP-chip derived, and no two resemble each other. 1449 is the only one with respectable scores on ChIP and expression, and it also has the appearance of a GAL4-class motif, although the structural prior presumably forces it to have this property. |
V |
TBP-TFIIB |
TBP-TFIIB |
1329 |
Medium |
|
The TIRF-PBM data used to generate the motif included only 96 sequences; hence, medium confidence. |
V |
TBP-TFIIA-TFIIB |
TBP-TFIIA-TFIIB |
1330 |
Medium |
|
The TIRF-PBM data used to generate the motif included only 96 sequences; hence, medium confidence. |
V |
TBP-TFIIA |
TBP-TFIIA |
1328 |
Low |
|
The TIRF-PBM data used to generate the motif included only 96 sequences. Also it is curious that there is no TATA sequence in the logo. |
V |
YDR213W |
UPC2 |
544 |
High |
|
The SRE is bound by UPC2 and the "canonical" sequence is TCGTATA. However, the more degenerate version obtained by PBM (motif 544) scores better in both expression analysis and OE experiments. Newer motif 2109 scores better on ChIP-chip, but lower on expression, and the SRE is well-characterized. I think this one deserves further experimental analysis. |
V |
YJR147W |
HMS2 |
992 |
Low |
|
The one ChIP-chip motif bears little relationship to the ChIP data. It looks somewhat like an HNF-like site, but still, low confidence. |
V |
YNL167C |
SKO1 |
1401 |
High |
|
The MITOMI motif 1401 is an offset and asymmetric version of the traditional consensus (TGACGTCA) but has a higher ChIP-chip and expression correspondence than the motifs that are more symmetric. |
V |
YKL185W |
ASH1 |
28 |
Medium |
|
The literature motif may not represent the full binding activity of the protein. Also, it is not supported by ChIP-chip. ChIP-chip identifies Mcm1-like motifs. But, it does score highly in both ChIP-chip and expression. The only higher-scoring motif has almost no information content. |
V |
YML113W |
DAT1 |
1416 |
Medium |
|
The literature (e.g. PMID: 8532535) suggests that the sequence specificity may be more promiscuous than the name suggests. To my knowledge there has not been any SELEX or PBM demonstrating that any motif is correct. But, it does bear some relationship to ChIP-chip and expression data. |
V |
YGL237C |
HAP2 |
695 |
High |
|
Subunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data. |
V |
YBL021C |
HAP3 |
695 |
High |
|
Subunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data. |
V |
YKL109W |
HAP4 |
695 |
High |
|
Subunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data. |
V |
YOR358W |
HAP5 |
695 |
High |
|
Subunit of the heme-activated, glucose-repressed Hap2/3/4/5 CCAAT-binding complex - there should be a single motif for all four proteins, containing CCAAT. ChIP-chip motif 695 resembles CCAATCA, and scores highly on ChIP-chip, OE, and deletion expression data. |
V |
YDL048C |
STP4 |
559 |
Medium |
|
STP3 and 4 have very similar DNA-binding domains. However, they are not similar to those of STP1 and 2; the next most closely related are SWI5 and ACE2, with major differences in the recognition alpha helices. All of the STP4 motifs are different from each other and none have any supporting data. There is only one motif for STP3 (568) from PBM and it matches the STP4 motif from the same study (559) which is the basis for choosing these two motifs. |
V |
YLR375W |
STP3 |
568 |
Medium |
|
STP3 and 4 have very similar DNA-binding domains. However, they are not similar to those of STP1 and 2; the next most closely related are SWI5 and ACE2, with major differences in the recognition alpha helices. All of the STP4 motifs are different from each other and none have any supporting data. There is only one motif for STP3 (568) from PBM and it matches the STP4 motif from the same study (559) which is the basis for choosing these two motifs. |
V |
YHR006W |
STP2 |
2174 |
High |
|
STP1 and 2 have very similar DNA-binding domains. However, they are not similar to those of STP3 and 4. PBM motif for STP2 (2174) correlates highest with ChIP-chip and expression data. ChIP-chip motif for STP1 (660) most strongly resembles motif 800, and scores highly on ChIP-chip data. In addition, these motifs resemble halfmers of literature-derived binding sites. |
V |
YDR463W |
STP1 |
660 |
High |
|
STP1 and 2 have very similar DNA-binding domains. However, they are not similar to those of STP3 and 4. PBM motif for STP2 (800) correlates with ChIP-chip and expression data. ChIP-chip motif for STP1 (660) most strongly resembles motif 800, and scores highly on ChIP-chip data. In addition, these motifs resemble halfmers of literature-derived binding sites. |
V |
YDR169C |
STB3 |
2233 |
High |
|
STB3 binds the RRPE element (AAAAATTT) both in vivo and in vitro (PMID 17616518). PBM motifs 810 and 2233 strongly resemble the RRPE element, score significantly in deletion expression data, and nail the GO categories "nucleolus" and "ribosome biogenesis". 2233 gets slightly higher scores. |
V |
YCR096C |
HMRA2 |
558 |
Medium |
|
Should be similar to MATALPHA2. The one PBM motif is indeed related to the MITOMI motif for MATALPHA2. |
V |
YJL206C |
|
0 |
|
|
Seven motifs from ChIP-chip, but none of them corresponds well to ChIP-chip data, and none of them resembles a GAL4 motif. 1169 has a CGG in the middle, but too much flanking information to be credible without further independent support. |
V |
YOR363C |
PIP2 |
0 |
|
|
See Oaf1-Pip2-dimer |
V |
MBP1-SWI6-dimer |
MBP1-SWI6-dimer |
0 |
|
|
Redundant with MBP1 |
V |
YCL067C |
HMLALPHA2 |
2079 |
Medium |
|
Protein is similar to PBX/MEIS/TGIF; both PBM motifs have some similarity (central ACA/TGT), so do sites in crystal and in vivo (e.g. PMID: 1682054) but no clear winner between the two. Keep both PBM motifs in curated set (2102 and 2079) but give medium confidence - no supporting ChIP or expression data. |
V |
YCL067C |
HMLALPHA2 |
2102 |
Medium |
|
Protein is similar to PBX/MEIS/TGIF; both PBM motifs have some similarity (central ACA/TGT), so do sites in crystal and in vivo (e.g. PMID: 1682054) but no clear winner between the two. Keep both PBM motifs in curated set (2102 and 2079) but give medium confidence - no supporting ChIP or expression data. |
V |
YDR043C |
NRG1 |
2148 |
High |
|
PBM, ChIP-chip, and literature motifs all appear very similar, and resemble motif for the related protein NRG2. Choose top PBM motif (2148). There is also a recurring ChIP-chip motif (TGTGCCT) which I believe is actually the MOT3 binding site. |
V |
YDR146C |
SWI5 |
569 |
High |
|
PBM, ChIP-chip, and conservation all yield similar motifs. The ChIP-chip motif scores highest on ChIP-chip data, but that is circular. Choose PBM motif 569, which is nearly identical. |
V |
YCR018C |
SRD1 |
2232 |
Medium |
|
PBM studies yield nearly identical motifs. 2232 closely resembles motif from related GATA factors and scores highest overall. This is an unusual motif for the GATA class; hence medium confidence level. |
V |
YDR034C |
LYS14 |
133 |
High |
|
PBM motifs are virtually identical and appear monomeric; literature motif is dimeric. Include both. Choose PBM motif 865 as it appears to have more robust CGG. |
V |
YDR034C |
LYS14 |
865 |
High |
|
PBM motifs are virtually identical and appear monomeric; literature motif is dimeric. Include both. Choose PBM motif 865 as it appears to have more robust CGG. |
V |
YPR013C |
CMR3 |
859 |
High |
|
PBM motifs are very similar. No other supporting data, but it's a clean motif. Chose 859 because it most closely resembles motif from paralog YPR015c. |
V |
YML081W |
|
2194 |
High |
|
PBM motifs are classical C2H2 motifs that match each other and have some correspondence to ChIP-chip data. 2194 has highest correspondence to ChIP-chip. |
V |
YOL089C |
HAL9 |
799 |
High |
|
PBM motifs 799 and 2134 score highest on ChIP-chip data; classic dimeric and monomeric GAL4 sites, respectively. |
V |
YOL089C |
HAL9 |
2134 |
High |
|
PBM motifs 799 and 2134 score highest on ChIP-chip data; classic dimeric and monomeric GAL4 sites, respectively. |
V |
YBR267W |
REI1 |
489 |
High |
|
PBM motif looks like a yeast C2H2 motif (row of C's); highly significant relationship to ChIP-chip data |
V |
YGR067C |
|
2191 |
High |
|
PBM motif is a classical C2H2 motif that has good correspondence to ChIP-chip data. 2191 corresponds best and has fewer empty columns in the PWM. |
V |
YIL056W |
VHR1 |
2091 |
Medium |
|
PBM motif has high score on GO because it looks a lot like Gcn4 |
V |
YER064C |
|
2094 |
Medium |
|
PBM motif has high score on GO because it looks a lot like Gcn4 |
V |
YBR182C |
SMP1 |
864 |
Medium |
|
PBM motif 864 scores highest on ChIP-chip and expression data. I gave it a medium, however, because it has low information content at most positions, does not closely match the literature motif (although the literature motif does not match ChIP-chip or expression data), and also does not resemble that of RLM1, which according to the literature should be related. |
V |
YER088C |
DOT6 |
2221 |
High |
|
PBM motif 812 most closely resembles that of homolog TOD6, which is well-supported; has highest correlation to both ChIP and expression data. |
V |
YIL036W |
CST6 |
585 |
High |
|
PBM motif 585 correlates with expression data (deletion and overexpression). ChIP motif 1466 has higher ChIP score but is lower on expression. |
V |
YDR303C |
RSC3 |
580 |
High |
|
PBM motif 580 has best correspondence to expression data - the only significant independent criterion - considering that the correlations are all in the same orientation (they are not for 2165). All motifs look similar. Propose that longer motifs could be due to multiple binding sites in the same sequence. |
V |
YPL021W |
ECM23 |
578 |
High |
|
PBM motif 578 strongly resembles that from other yeast GATA-class TFs |
V |
YDR216W |
ADR1 |
576 |
High |
|
PBM motif 576 has significant correspondence to both ChIP-chip and highest to expression data. And has a classic yeast C2H2 look. |
V |
YJL089W |
SIP4 |
573 |
Medium |
|
PBM motif 573 is a monomeric GAL4-type motif (others appear dimeric) but it has good correspondence to ChIP-chip data. Only a few of the dimeric sites are more significant - the motif from in vivo analysis (PMID: 14685767) does not score as highly as 2067 from ChIP-chip data, but they look very similar. This is 573, the presumed monomeric site |
V |
YJL089W |
SIP4 |
2067 |
Medium |
|
PBM motif 573 is a monomeric GAL4-type motif (others appear dimeric) but it has good correspondence to ChIP-chip data. Only a few of the dimeric sites are more significant - the motif from in vivo analysis (PMID: 14685767) does not score as highly as 2067 from ChIP-chip data, but they look very similar. This is 2067, the presumed dimeric site. |
V |
YER068W |
MOT2 |
556 |
Medium |
|
PBM motif 556 has high correspondence to ChIP-chip data. However, also resembles TATA element, and could also be a structural motif. RRMs normally bind single-stranded RNA or DNA. Give medium confidence. |
V |
YER130C |
COM2 |
534 |
High |
|
PBM motif 534 has the highest correspondence to expression data. Not much else supporting any of the motifs, although the two PBM motifs look about the same. Also look like typical yeast C2H2 motifs. |
V |
YMR182C |
RGM1 |
531 |
High |
|
PBM motif 531 looks like a C2H2 motif (row of G's), and scores well on both ChIP-chip and deletion expression data. |
V |
YKL062W |
MSN4 |
518 |
High |
|
PBM motif 518 resembles both the classical MSN motif and the PBM motif, and scores highest on both expression and ChIP-chip. |
V |
YNL027W |
CRZ1 |
516 |
High |
|
PBM motif 516 scores highest on ChIP and expression; resembles classic literature motifs |
V |
YOR113W |
AZF1 |
499 |
High |
|
PBM motif 499 scores as well as the ChIP-chip motifs, but without the circularity. No significant data except ChIP-chip, however. |
V |
YGL013C |
PDR1 |
485 |
High |
|
PBM motif 485 looks like a traditional literature motif and has highest correspondence to ChIP and expression data. Dimeric GAL4 motif. |
V |
YKL038W |
RGT1 |
2227 |
High |
|
PBM motif 2227 is very similar to "traditional" motif and to monomeric GAL4 motifs, and scores highest on ChIP-chip data. All PBM motifs are similar. |
V |
YML007W |
YAP1 |
2186 |
High |
|
PBM motif 2186 looks like a monomeric bZIP site but it has the highest scores on both ChIP and expression |
V |
YER028C |
MIG3 |
2144 |
High |
|
PBM motif 2144 has highest correspondence to ChIP-chip data |
V |
YGL209W |
MIG2 |
2143 |
High |
|
PBM motif 2143 has highest correspondence to ChIP-chip data |
V |
YGL035C |
MIG1 |
2142 |
High |
|
PBM motif 2142 has highest correspondence to ChIP-chip AND AUC for GO category "generation of precursor metabolites and energy". The adjacent A/T stretch, which is also noted in the literature, is found in ChIP-chip motif 654 and others; however, that motif does not sort as well for GO category "generation of precursor metabolites and energy" and also scores lower for both ChIP and expression, so it seems unlikely to represent a key intrinsic activity of the protein itself. |
V |
YGR249W |
MGA1 |
2141 |
Medium |
|
PBM motif 2141 is similar to Hsf1 motif 476 (TTCCA). Has TTC "core" which is shared by most Hsf1 motifs. Scores reasonably on ChIP data but no other supporting information; hence "medium". |
V |
YLR228C |
ECM22 |
849 |
High |
|
PBM motif 2122 is a monomeric GAL4 class motif, and scores highest on both ChIP and expression data. 849 is a classic dimeric GAL4 motif with lower but still reasonable scores and is moderately predictive across the board. |
V |
YLR228C |
ECM22 |
2122 |
High |
|
PBM motif 2122 is a monomeric GAL4 class motif, and scores highest on both ChIP and expression data. 849 is a classic dimeric GAL4 motif with lower but still reasonable scores and is moderately predictive across the board. |
V |
YDR421W |
ARO80 |
725 |
High |
|
PBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three. |
V |
YDR421W |
ARO80 |
1509 |
High |
|
PBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three. |
V |
YDR421W |
ARO80 |
2115 |
High |
|
PBM motif 2115 appears monomeric and has highest correspondence to ChIP-chip data. ChIP motif 1509 appears dimeric and correlates with ChIP data. Literature motif 725 appears trimeric and has experimental support. Retain all three. |
V |
YMR019W |
STB4 |
2107 |
High |
|
PBM motif 2107 is clearly a dimeric GAL4-class motif, and it blows all the other motifs out of the water. |
V |
YMR070W |
MOT3 |
2080 |
Medium |
|
PBM motif 2080 is very similar to the literature motif and scores highest on expression data. Moreover, this motif explains high-scoring ChIP-chip motifs for many other TFs, e.g. Nrg1, Yap6, Sok2 |
V |
YCR065W |
HCM1 |
570 |
High |
|
PBM and SAAB/EMSA motifs both look similar to standard FH motif. PBM motif 570 has stronger correspondence to expression data. |
V |
YIL101C |
XBP1 |
2039 |
High |
|
PBM and in vitro selection-derived motifs have highest scores across the board. 842 is higher on GO, but only slightly in AUC, and it has a very large number of empty flanking bases. 2039 (in vitro selection) seems a reasonable compromise - it's highest on ChIP and almost the highest on expression. |
V |
YBR033W |
EDS1 |
2093 |
High |
|
PBM and ChIP-chip motifs are very similar. PBM motif 2093 scores most significantly on ChIP data. Classic GAL4 class motif. |
V |
YNL314W |
DAL82 |
690 |
High |
|
PBM and ChIP-chip motifs agree; select ChIP-chip as it scores higher on ChIP-chip although the extra A's on the side could be either due to the FL protein or some other in vivo factor. |
V |
YDR259C |
YAP6 |
599 |
High |
|
PBM and ChIP-chip can derive basically the same motif, which is a classical YAP motif. They score similarly on all criteria. The ChIP-chip motif (599) has fewer low-information flanking bases. |
V |
YBL103C |
RTG3 |
870 |
Low |
|
Only the PBM motif is a classic HLH motif. Three different ChIP-chip-derived motifs are all diverse, but all score highly on ChIP-chip data! Are they motifs of other TFs? Check. 602: GCN4; 1095: TEC1; 1096: resembles 602, but is a closer match to CUP9/TOS8. Also hits GCN4. According to the literature (PMID: 9032238) the core binding site for the Rtg1p-Rtg3p heterodimer is 5'-GGTCAC-3'; the only motif that resembles this is 1446, which has a vague resemblance to 602 and 1096. I am going to retain 1446, which represents the literature site; PBM motif 870, which resembles an E-box; and ChIP-chip motif 1445, which scores highest on ChIP-chip data. But give all low confidence. |