ShortCAKE - Shortest Sequence to Cover All K-mers

ShortCAKE (short for Shortest sequence to Cover All K-mErs) is software for generating a shortest sequence that includes, for each DNA k-mer, either the k-mer or its reverse complement. In other words, it generates a shortest possible double-stranded sequence covering all k-mers. ShortCAKE can also generate the longest k-unique DNA sequence, that is, a sequence that for each k-mer may include the k-mer or its reverse complement, but not both.
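
To make the covering requirement concrete, the following minimal sketch groups each k-mer with its reverse complement and counts the resulting classes by brute force. It is an illustration only, not ShortCAKE's algorithm; the function names are hypothetical, and it assumes an uppercase A, C, G, T alphabet.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_classes(k: int) -> int:
    """Count k-mers up to reverse complementation by enumerating all 4**k k-mers."""
    return len({canonical("".join(p)) for p in product("ACGT", repeat=k)})
```

For k = 2, for example, the 16 k-mers fall into 10 classes, because AT, TA, CG and GC are their own reverse complements. A covering sequence has to include at least one member of every class.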

Input and Output

ShortCAKE takes as input four parameters:

1. a/d - a: shortest sequence that covers all k-mers / d: longest k-unique sequence.
2. k - the k-mer length (the order of the sequence).
3. The output file name.
4. 0/1 - 0: a suboptimal linear-time algorithm / 1: an optimal polynomial-time algorithm.

It writes the sequence to a text file.

ShortCAKE was developed by Yaron Orenstein in Ron Shamir's Computational Genomics group at Tel Aviv University.

Get the software

Java executable distribution and example files

This distribution is our officially supported executable for ShortCAKE. The binary is self-contained and should work out of the box. The package includes a README file and example output files containing optimal sequences for k = 2 to 10.

The software is freely available under the GNU Lesser General Public License, version 3, or any later version at your choice.

ShortCAKE is research software, still in the development stage. Hence, it is not presented as error-free, accurate, complete, useful, suitable for any specific application or free from any infringement of any rights. The Software is licensed AS IS, entirely at the user's own risk.

How to use it

java -jar ShortCAKE.jar <a/d> <k> <output_filename> <0/1>

Example runs:

java -jar ShortCAKE.jar a 8 sequence8.txt 0

java -jar ShortCAKE.jar d 7 sequence7.txt 1

java -jar ShortCAKE.jar d 10 sequence10.txt 1
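
When several runs are scripted, the command line above can be assembled programmatically. The sketch below is a hypothetical Python wrapper, not part of ShortCAKE; it assumes that `java` is on the PATH and that ShortCAKE.jar is in the working directory, and it only uses the four positional arguments documented above.

```python
import subprocess


def build_command(mode, k, output_file, algorithm, jar="ShortCAKE.jar"):
    """Build the argument list: java -jar <jar> <a/d> <k> <output_filename> <0/1>."""
    if mode not in ("a", "d"):
        raise ValueError("mode must be 'a' (cover all k-mers) or 'd' (k-unique)")
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    if algorithm not in (0, 1):
        raise ValueError("algorithm must be 0 (suboptimal) or 1 (optimal)")
    return ["java", "-jar", jar, mode, str(k), output_file, str(algorithm)]


def run_shortcake(mode, k, output_file, algorithm, jar="ShortCAKE.jar"):
    """Run ShortCAKE and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(mode, k, output_file, algorithm, jar), check=True)
```

For instance, `run_shortcake("a", 8, "sequence8.txt", 0)` would issue the same command as the first example run.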

Interpreting the output

The output file is a text file containing the sequence.

* Note that the sequence is cyclic: the last character is followed by the first. Read as linear text, the file therefore lacks the k-1 k-mers that span the wrap-around point. To recover them, append the first k-1 characters to the end of the sequence.
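
The wrap-around correction can be sketched in a few lines of Python, together with a brute-force check that a linear sequence covers every k-mer or its reverse complement. The function names are hypothetical, and the test sequence in the usage note is a hand-built cyclic sequence for k = 2, not ShortCAKE output.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def linearize(cyclic_seq: str, k: int) -> str:
    """Append the first k-1 characters so the wrap-around k-mers appear in linear text."""
    return cyclic_seq + cyclic_seq[:k - 1]


def covers_all_kmers(linear_seq: str, k: int) -> bool:
    """Check that every k-mer, or its reverse complement, occurs in linear_seq."""
    seen = {linear_seq[i:i + k] for i in range(len(linear_seq) - k + 1)}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if kmer not in seen and reverse_complement(kmer) not in seen:
            return False
    return True
```

With the cyclic sequence AACAGATCCGCTGGTT and k = 2, the 2-mer TA occurs only across the wrap-around point, so `covers_all_kmers` fails on the raw text and succeeds on `linearize("AACAGATCCGCTGGTT", 2)`. The enumeration visits all 4^k k-mers, so this check is practical only for small k.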

Citing ShortCAKE

ShortCAKE can be cited as follows:

Design of Shortest Double-Stranded DNA Sequences Covering All K-mers with Applications to Protein Binding Microarrays and Synthetic Enhancers.
Yaron Orenstein, Ron Shamir.
Bioinformatics, Vol. 29 (13), Pages i71-i79 (2013).
doi: 10.1093/bioinformatics/btt230

Optimal Design of k-Unique DNA Sequence.
Yaron Orenstein, Ron Shamir.
In preparation.

Get in touch

If you have any questions or suggestions, please feel free to contact Yaron Orenstein.