We resemble prokaryotes much more than we would like to admit

or

90% of the Functional Binding Sites of known human transcription factors appear in the first ~200 bp from the Transcription Start Site

Yuval Tabach
Weizmann Institute

One of the major challenges in Systems Biology is to predict a gene's regulation by identifying transcription factor binding motifs in its promoter. Taking positional bias into account, and using a novel scoring method based on groups of putatively co-regulated genes and on ortholog conservation, we created a database of motif over-representation for all Gene Ontology groups and for 414 known transcription factors. Surprisingly, almost all motifs were over-represented nearly exclusively within ~200 bp of the Transcription Start Site. These bioinformatics results were validated using expression data for cell cycle motifs and for NFkB, and in a direct in vitro experiment for Myocardin. Our findings are supported by evidence from studies in a variety of biological fields: promoter SNPs, medium-throughput wet biology, histone distribution, evolutionary conservation, purifying selection of repetitive elements, and the heterogeneity of GC content and CpG islands in promoters.
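
The abstract does not spell out the scoring method, so the following is only a rough sketch of the general idea of position-dependent motif over-representation, not the speaker's actual algorithm. It assumes a plain hypergeometric test in place of the novel score, ignores ortholog conservation, and matches motifs as exact strings. The names `promoters`, `group`, `window_pvalues` and the requirement that every promoter sequence has the same length and ends at the Transcription Start Site are all assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for a hypergeometric draw.

    M: number of genes in the background
    n: background genes carrying the motif in the window
    N: number of genes in the group of interest
    k: group genes carrying the motif in the window
    """
    upper = min(n, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / comb(M, N)


def window_pvalues(promoters, group, motif, window=200):
    """Over-representation p-value of `motif` per distance window.

    promoters: dict mapping gene name -> upstream sequence; all sequences
               are assumed to have equal length and to end at the TSS.
    group:     set of gene names (e.g. one putatively co-regulated group).
    Returns a dict mapping the window's distance from the TSS (0, 200, ...)
    to its p-value. Occurrences that straddle a window boundary are missed.
    """
    length = len(next(iter(promoters.values())))
    results = {}
    for dist in range(0, length - window + 1, window):
        lo, hi = length - dist - window, length - dist
        hits = {g for g, seq in promoters.items() if motif in seq[lo:hi]}
        k = len(hits & group)
        results[dist] = hypergeom_sf(k, len(promoters), len(hits), len(group))
    return results
```

Under these assumptions, a motif whose p-value is small only in the window starting at distance 0 would be over-represented only within the first 200 bp of the Transcription Start Site, which is the kind of positional pattern the abstract reports.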