## DOCKS - Design of Compact K-mer Set for hash functionsDOCKS (short for Design of Compact K-mer Sets for hash functions) is a software for finding a compact set of k-mers that hits every L-long sequence. DOCKS takes as input a list of parameters and outputs a list of k-mers. Further details on the functionality of DOCKS are available in the paper listed below.DOCKS was developed by Yaron Orenstein, David Pellow, Guillaume Marcais, Ron Shamir and Cal Kingsford. |

decycling.jar

DOCKS.jar

code.zip

This distribution is our officially supported executable for DOCKS. These binary are mostly self-contained and should work out of the box without any issues. For Integer Linear Programming solver (ILP) DOCKS uses Gurobi solver.

The software is freely available under the GNU Lesser General Public License, version 3, or any later version at your choice.

DOCKS is a research tool, still in the development stage. Hence, it is not presented as error-free, accurate, complete, useful, suitable for any specific application or free from any infringement of any rights. The Software is licensed AS IS, entirely at the user's own risk.

java -jar decycling.jar <output file> <k> <Alphabet>

Example run:

java -jar decycling.jar decycling_5_ACGT.txt 5 ACGT

This command will genearte a decycling set for k=5 over ACGT alphabet. The output will be saved to decycling_5_ACGT.txt.

Example DOCKS run:

java -jar DOCKS.jar res_5_20_ACGT_0_1.txt decycling_5_ACGT.txt 5 20 ACGT 0 1

Example DOCKSAny run:

java -jar DOCKS.jar res_5_20_ACGT_0_2.txt decycling_5_ACGT.txt 5 20 ACGT 0 2

Example DOCKSAnyX (X=125) run:

java -jar DOCKS.jar res_5_20_ACGT_0_2_125.txt decycling_5_ACGT.txt 5 20 ACGT 0 2 125

In all of the runs, the use needs to provide the decycling set computed by decycling.jar.

Example ILP run (limited to 1000s), with no DOCKS solution:

java -jar DOCKS.jar res_5_20_ACGT_1000_3.txt decycling_5_ACGT.txt 5 20 ACGT 1000 3

You may need to use more memory for higher values of k, i.e. k>=10, by adding -Xmx4096m option for example.

You may need to increase heapspace size for random mode for DFS, by adding -Xss515m option for example.

AAAACAAA AAAAGAAA AAAATAAA AAACCAAA ... ATAACGAA TCACCGAA GCCTACTA TCCTCCTAEach line is a k-mer in the set.

Universal hitting k-mer sets for k=11 20 <= L <= 200

Universal hitting k-mer sets for k=12 20 <= L <= 200 computed by DOCKSany

Universal hitting k-mer sets for k=13 20 <= L <= 200 computed by DOCKSany1000

Compact universal k-mer sets

Yaron Orenstein, David Pellow, Guillaume Marcais, Ron Shamir and Carl Kingsford.

WABI 2016.

Journal version submitted.