Optimizing the core conserved parts of orthologous modules.
We wished to ensure that the differences between the cis-elements enriched in S. cerevisiae and in S. pombe are not an artifact of the way we identified orthologous transcription modules, and that these differences remain identifiable even when cis-elements are sought only in the promoters of conserved genes shared by the two modules. We therefore repeated the cis-regulatory analysis using perfectly orthologous transcription modules, in which each gene in one module is matched by at least one ortholog in the other module.

To generate perfect orthologous modules from a pair of mutual (non-perfect) orthologous ones, we enhanced the existing SAMBA algorithm. The modified algorithm is initialized with a pair of modules and starts by removing from both modules all genes that have no ortholog in the other module. It then iteratively adds and removes pairs of orthologous genes to improve the total score of the module pair. When a gene has more than one ortholog, the algorithm can add either a single gene pair or a larger orthologous group of genes, depending on which alternative scores higher. The algorithm outputs a pair of transcription modules such that each gene in one module has at least one ortholog in the other module, and such that no gene pair can be added or removed without decreasing the total score of the module pair.

Importantly, the enriched cis-elements obtained for these perfect orthologous module pairs are consistent with those we reported for the (non-perfect) orthologous modules. They thus confirm that our findings on the evolutionary dynamics of cis-regulation were not biased by the imperfect orthology.
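The refinement procedure described above can be illustrated with a minimal Python sketch. It is not the modified SAMBA implementation: the function names (`genes_of`, `ortholog_groups`, `refine_module_pair`, `toy_score`) are hypothetical, the scoring function is a placeholder supplied by the caller rather than the SAMBA score, and the choice to apply the single best-scoring move at each step is an assumption, since the text does not say how moves are ordered.

```python
def genes_of(pairs):
    """Split a set of (gene_a, gene_b) ortholog pairs into the two gene sets."""
    return {a for a, _ in pairs}, {b for _, b in pairs}


def ortholog_groups(orthologs):
    """Group ortholog pairs that share a gene (one-to-many orthology)."""
    groups = []
    for a, b in sorted(orthologs):
        merged, rest = {(a, b)}, []
        for group in groups:
            if any(a == x or b == y for x, y in group):
                merged |= group  # shares a gene with the new pair
            else:
                rest.append(group)
        groups = rest + [merged]
    return groups


def refine_module_pair(module_a, module_b, orthologs, score):
    """Greedy local search over ortholog pairs (hypothetical sketch).

    `score(genes_a, genes_b)` is a caller-supplied placeholder,
    not the SAMBA scoring function.
    """
    # Step 1: keep only genes that have an ortholog in the other module.
    current = {(a, b) for a, b in orthologs
               if a in module_a and b in module_b}
    best = score(*genes_of(current))
    groups = ortholog_groups(orthologs)
    while True:
        # Candidate moves: add or remove one pair, or add a whole group.
        candidates = [current ^ {p} for p in sorted(orthologs)]
        candidates += [current | g for g in groups]
        if not candidates:
            break
        scored = [(score(*genes_of(c)), c) for c in candidates]
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break  # no single move improves the score
        current, best = top, top_score
    genes_a, genes_b = genes_of(current)
    return genes_a, genes_b, best


# Toy data (invented for illustration only).
weights = {"a1": 1, "b1": 1, "a2": -3, "b2": 1, "a4": 2, "b4": 1}


def toy_score(genes_a, genes_b):
    """Placeholder score: sum of per-gene weights."""
    return sum(weights.get(g, 0) for g in genes_a | genes_b)


result_a, result_b, result_score = refine_module_pair(
    {"a1", "a2", "a3"}, {"b1", "b2"},
    {("a1", "b1"), ("a2", "b2"), ("a4", "b4")},
    toy_score,
)
print(sorted(result_a), sorted(result_b), result_score)
```

Because the state is kept as a set of ortholog pairs, every gene in the output necessarily has an ortholog in the other module, and the search stops at a local optimum where no single addition or removal raises the score.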
Here are browsable examples of the derived core-conserved modules: