We resemble prokaryotes much more than we would like to admit
or
90% of the Functional Binding Sites of known human transcription
factors appear within the first ~200bp from the Transcription Start Site
Yuval Tabach
Weizmann Institute
One of the major challenges in Systems Biology is to predict a
gene's regulation by identifying transcription factor binding motifs
in its promoter. Taking positional bias into account, and using a novel
scoring method based on groups of putatively co-regulated genes and on
ortholog conservation, we created a database of motif overrepresentation
for all Gene Ontology groups and for 414 known transcription factors.
Surprisingly, almost all motifs were found to be overrepresented
almost exclusively within ~200bp of the Transcription Start Site.
These bioinformatics results were validated for cell cycle motifs and for
NFkB using expression data, and for Myocardin in a direct in vitro
experiment. Our findings are further supported by studies from a
variety of biological fields, including promoter SNPs,
medium-throughput wet-lab experiments, histone distribution, evolutionary
conservation, purifying selection of repetitive elements, and the
heterogeneity of GC content and CpG islands in promoters.
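
The idea of position-dependent motif overrepresentation can be illustrated with a minimal sketch. This is not the scoring method described above, which also uses ortholog conservation and is not specified in this abstract; it swaps in a plain hypergeometric test. All names here (`hypergeom_sf`, `window_enrichment`, the `hits_by_gene` layout, and positions given as bp upstream of the TSS) are assumptions made for illustration only.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes with a motif hit,
    n: genes in the group, k: group genes with a motif hit.
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def has_hit_in_window(positions, window):
    """True if any motif hit falls inside the half-open window [start, end)."""
    start, end = window
    return any(start <= p < end for p in positions)


def window_enrichment(hits_by_gene, group, window):
    """P-value for overrepresentation of a motif in `group` within `window`.

    hits_by_gene: dict mapping gene name -> list of hit positions
                  (hypothetical layout; positions in bp upstream of the TSS).
    group:        set of gene names, e.g. a putatively co-regulated gene set.
    window:       (start, end) positional window in the same coordinates.
    """
    background = set(hits_by_gene)
    with_hit = {g for g in background if has_hit_in_window(hits_by_gene[g], window)}
    members = set(group) & background
    return hypergeom_sf(len(members & with_hit), len(background), len(with_hit), len(members))
```

Calling `window_enrichment` once for a proximal window such as `(0, 200)` and once for a distal window such as `(200, 1000)` on the same gene group gives two p-values that can be compared; a motif with strong positional bias would score well only in the proximal window.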