Fast Lossless Compression via Cascading Bloom Filters
BARCODE achieves highly efficient compression by using a reference genome while completely circumventing the need for alignment, which greatly reduces compression time. To encode, we hash all reads into Bloom filters; to decode, we query the same Bloom filters with read-length subsequences of the reference. Further compression is achieved by using a cascade of such filters. Our method runs an order of magnitude faster than reference-based methods while compressing an order of magnitude better than reference-free methods, over a broad range of sequencing coverage depths.
BARCODE is freely available under the GNU Lesser General Public License, version 3, or any later version.
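
To make the encode/decode idea above concrete, the following Python sketch illustrates one way a cascade of Bloom filters could be arranged. It is not the BARCODE implementation, and every name and parameter in it (`BloomFilter`, `encode`, `decode`, `num_bits`, `num_hashes`, `depth`) is hypothetical. The sketch assumes fixed-length reads, treats the reads as a set (duplicates and order are not preserved), stores reads that do not occur verbatim in the reference as plain strings, and assumes each cascade level stores the false positives of the level before it, with the last level's false positives kept as an explicit list.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def windows(reference, read_len):
    """All read-length subsequences of the reference."""
    return {reference[i:i + read_len] for i in range(len(reference) - read_len + 1)}


def encode(reads, reference, read_len, num_bits=1024, num_hashes=3, depth=3):
    """Return (filters, leftover, off_ref) describing the read set."""
    candidates = windows(reference, read_len)
    unique_reads = set(reads)
    off_ref = unique_reads - candidates  # assumption: stored verbatim
    # Level 0 holds the reads; each later level holds the previous level's
    # false positives, so the two sides alternate between reads and non-reads.
    members, others = unique_reads & candidates, candidates - unique_reads
    filters = []
    for _ in range(depth):
        bf = BloomFilter(num_bits, num_hashes)
        for s in members:
            bf.add(s)
        filters.append(bf)
        members, others = {s for s in others if s in bf}, members
        if not members:  # no false positives left, so the cascade can stop
            break
    return filters, members, off_ref  # members = last level's false positives


def decode(encoded, reference, read_len):
    """Recover the read set by querying every reference window."""
    filters, leftover, off_ref = encoded
    recovered = set(off_ref)
    for s in windows(reference, read_len):
        level = 0
        while level < len(filters) and s in filters[level]:
            level += 1
        # Accepted by an odd number of consecutive levels means "read".
        is_read = level % 2 == 1
        # The explicit list corrects the last filter's false positives.
        if level == len(filters) and s in leftover:
            is_read = not is_read
        if is_read:
            recovered.add(s)
    return recovered


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGCTAGGCTTAACGGATCCGATCGATTACG"
    read_len = 8
    reads = [reference[i:i + read_len] for i in (0, 5, 12, 20, 31)] + ["TTTTTTTT"]
    # A deliberately tiny filter, so false positives are likely and the
    # later cascade levels actually get exercised.
    encoded = encode(reads, reference, read_len, num_bits=64)
    print(decode(encoded, reference, read_len) == set(reads))
```

In this arrangement, decoding is exact for any filter size: an undersized filter only pushes more strings into the later levels and the explicit list. Filter sizing therefore trades compressed size against nothing else, while correctness is unaffected. No alignment step appears anywhere, since the decoder only slides a read-length window along the reference and queries the filters.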