HYDEN
A Software for Designing Degenerate Primers

Chaim Linhart and Ron Shamir
Tel-Aviv University
Original version: August 2002
Last update: September 2008

Overview

HYDEN (HighlY DEgeNerate primers) is a program for designing pairs of degenerate primers for a given set of DNA sequences. HYDEN works well for large input sets of genomic sequences (e.g., hundreds of sequences of length 1Kbp). HYDEN is a batch (i.e., command-line, as opposed to graphical interface) program, available for Windows XP (downloadable version) and Linux (upon request).

Download

The HYDEN software is freely available for academic use - download HYDEN here.
It is also available for non-academic use under appropriate licensing. Please contact Ron Shamir (rshamir AT post.tau.ac.il) or Chaim Linhart (chaiml AT post.tau.ac.il) for further information.

Degenerate Primers

A primer is degenerate if some of its positions admit more than one nucleotide. It is in fact a mixture of unique primers. For example, AAC{G,T}G{A,C,G}G is a 7-long degenerate primer, in which the fourth and sixth positions are degenerate. It corresponds to the primers AACGGAG, AACGGCG, AACGGGG, AACTGAG, AACTGCG, AACTGGG.

The degeneracy of a primer is the number of unique sequences it corresponds to (6 in the example above). Degenerate primers can be used in PCR to amplify many related sequences from genomic DNA or from cDNA libraries. They can be used when some of the related genomic sequences are unknown, or known only in a related species.
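
The definitions above can be checked with a few lines of Python. This is only an illustrative sketch and not part of HYDEN; the function names (parse_degenerate, degeneracy, expand) are hypothetical, and the brace notation is assumed to follow the example AAC{G,T}G{A,C,G}G.

```python
from itertools import product

def parse_degenerate(primer):
    """Split a primer such as 'AAC{G,T}G{A,C,G}G' into one tuple of
    allowed nucleotides per position."""
    positions = []
    i = 0
    while i < len(primer):
        if primer[i] == "{":
            end = primer.index("}", i)
            positions.append(tuple(primer[i + 1:end].split(",")))
            i = end + 1
        else:
            positions.append((primer[i],))
            i += 1
    return positions

def degeneracy(positions):
    """Number of unique primers: the product of the choices per position."""
    result = 1
    for options in positions:
        result *= len(options)
    return result

def expand(positions):
    """List every unique primer in the mixture."""
    return ["".join(combo) for combo in product(*positions)]

positions = parse_degenerate("AAC{G,T}G{A,C,G}G")
print(len(positions))         # 7 positions
print(degeneracy(positions))  # 6
print(expand(positions))
```

Because the degeneracy is a product over positions, it grows quickly with the number of degenerate positions, which is why listing all unique primers is practical only for small examples.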

In experiments, pairs of primers with combined degeneracies of up to 10¹⁰ were successfully used to amplify sequences from genomic background, with specificity over 99.5% [1].

In the example below, a pair of primers of length 6 cover a set of 5 sequences.
Notice that there is a mismatch between the 3' primer and the fourth sequence.

Algorithm

The input to HYDEN is a list of DNA sequences and a set of parameters that specify the length of the primers, their maximum degeneracy, and the number of mismatches they are allowed to have with every sequence they cover (i.e., theoretically amplify). HYDEN constructs primers with the specified length and degeneracy that cover many of the given sequences. It does so by running a 3-phase algorithm:

Phase 1: Locate conserved regions in the DNA sequences by finding ungapped local alignments with a low entropy score.

Phase 2: Design primers using variants of simple approximation algorithms, called CONTRACTION and EXPANSION (these algorithms approximate the number of sequences the primer does not match, provided that the sequences are over a binary alphabet).

Phase 3: Run a greedy hill-climbing procedure to improve the primers, and select the one with the largest coverage as the output.
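
To give a feel for the entropy score in Phase 1, the sketch below scores ungapped windows by the sum of their column entropies and picks the lowest-scoring one. It is a simplification under stated assumptions, not HYDEN's actual procedure: the window is assumed to start at the same offset in every sequence, whereas a local alignment may use a different offset in each sequence, and the exact scoring used by HYDEN is described in [2,3]. All function names are hypothetical.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of the nucleotides in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def window_entropy(sequences, start, length):
    """Sum of column entropies of the ungapped window [start, start + length).
    Assumption: the window starts at the same offset in every sequence."""
    return sum(
        column_entropy([s[start + k] for s in sequences])
        for k in range(length)
    )

def most_conserved_window(sequences, length):
    """Start offset of the window with the lowest entropy score."""
    shortest = min(len(s) for s in sequences)
    return min(
        range(shortest - length + 1),
        key=lambda start: window_entropy(sequences, start, length),
    )

toy_sequences = ["ACGTA", "TCGTC", "GCGTG"]   # made-up data
print(most_conserved_window(toy_sequences, 3))
```

A fully conserved column has entropy 0, and a column split evenly between two nucleotides has entropy 1 bit, so low scores mark regions where few degenerate positions are needed.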

HYDEN can design several pairs of degenerate primers that together cover many of the given sequences. After designing the first pair, HYDEN designs a second pair of primers on the set of sequences that are not covered by the first pair, and repeats this process on the remaining uncovered sequences for each additional pair.
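
The notion of coverage with mismatches, and the step of restricting the input to the uncovered sequences, can be sketched as follows. This is a simplified illustration and not HYDEN's code: it tests a single primer rather than a pair, ignores strand orientation, and uses hypothetical function names. A primer is represented as a list with one string of allowed nucleotides per position.

```python
def mismatches(primer_positions, site):
    """Count positions where the site's base is not allowed by the primer."""
    return sum(
        1 for allowed, base in zip(primer_positions, site)
        if base not in allowed
    )

def covers(primer_positions, sequence, max_mismatches):
    """True if some site in the sequence matches the primer with at most
    max_mismatches mismatches."""
    k = len(primer_positions)
    return any(
        mismatches(primer_positions, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )

def uncovered(sequences, primer_positions, max_mismatches):
    """Sequences left for the next round of primer design."""
    return [s for s in sequences
            if not covers(primer_positions, s, max_mismatches)]

# The primer AAC{G,T}G{A,C,G}G, one string of allowed bases per position.
primer = ["A", "A", "C", "GT", "G", "ACG", "G"]
toy_sequences = ["TTAACTGCGTT", "TTAACAGCGTT", "CCCCCCCCCCC"]  # made-up data
print(uncovered(toy_sequences, primer, 0))
print(uncovered(toy_sequences, primer, 1))
```

Allowing one mismatch covers the second toy sequence as well, so only the third would be passed on to the design of the next primer.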

The full details of the algorithm and the relevant computational analysis are described in [2,3].
HYDEN is written in C++, and runs under Windows and Linux.

Usage

HYDEN receives an input file that contains a set of DNA sequences in FASTA format, and a list of parameters that specify the length of the primers to design, their maximal degeneracy, and more. The sample input file HGP_50genes.fasta.txt contains 50 human olfactory receptor genes of length ~1Kbp.
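
Before running HYDEN it can be useful to inspect the input file, for example to count the sequences and check their lengths. The reader below is a minimal sketch written for this purpose; it is not part of HYDEN, the function name read_fasta is hypothetical, and it assumes the common FASTA layout in which a header line starting with ">" is followed by one or more lines of sequence.

```python
def read_fasta(lines):
    """Parse FASTA text given as an iterable of lines.
    Returns a dict mapping each sequence name to its full sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header as the name; fall back to a
            # running number if the header is empty.
            name = header[0] if header else "seq%d" % (len(records) + 1)
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}

# Made-up example data; with a real file one could instead write:
#   with open("HGP_50genes.fasta.txt") as handle:
#       genes = read_fasta(handle)
example = [">gene1 toy record", "ACGT", "acgt", "", ">gene2", "TTGA"]
genes = read_fasta(example)
for gene_name, sequence in genes.items():
    print(gene_name, len(sequence))
```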

The file output.txt contains a sample output of HYDEN for this input file. The output shows two pairs of primers of length 25 and degeneracy ~5,000 (5' end), ~30,000 (3'). The primers cover 40 out of the 50 input genes, with up to 3 mismatches (in both ends combined) per covered gene.

Application

HYDEN was studied and implemented as part of DEFOG - an experimental scheme for DEciphering Families Of Genes [1]. DEFOG provides a powerful means for analyzing the composition of a large family of genes with conserved regions, and is thus especially useful in species for which little genomic data is available. In addition, DEFOG can be applied to analyze cDNA libraries of gene families.

Given a subset of known gene sequences, HYDEN is used to design degenerate primer pairs. The primers are then used in PCR procedures to amplify fragments of genes, known as well as unknown, of the same family. The fragments are cloned, and an oligofingerprinting (OFP) process [4] characterizes the clones by their patterns of hybridization with a series of very short (8-mer) oligonucleotides. The hybridization pattern of a clone is called its fingerprint. Another novel algorithm, called CLICK [5], clusters the clones according to their fingerprints, so that each cluster corresponds to a single gene. Finally, representatives from each cluster are sequenced and compared to the existing olfactory receptor (OR) database. The newly revealed sequences can be used to design primer pairs for another cycle of DEFOG.

The DEFOG scheme is illustrated below. The numbers beneath the boxes summarize the actual parameters in our DEFOG experiment on the human OR subgenome.

References

[1]	T. Fuchs, B. Malecova, C. Linhart, R. Sharan, M. Khen, R. Herwig, D. Shmulevich, R. Elkon, M. Steinfath, J.K. O'Brien, U. Radelof, H. Lehrach, D. Lancet, and R. Shamir, "DEFOG: A Practical Scheme for Deciphering Families of Genes", Genomics, Vol. 80, No. 3, pp. 295-302, 2002.
[2]	C. Linhart and R. Shamir, "The Degenerate Primer Design Problem", Bioinformatics, Vol. 18, Suppl. 1, pp. S172-S180, 2002.
[3]	C. Linhart and R. Shamir, "The Degenerate Primer Design Problem: Theory and Applications", Journal of Computational Biology, Vol. 12, No. 4, pp. 431-456, 2005.
[4]	U. Radelof, S. Hennig, P. Seranski, M. Steinfath, J. Ramser, R. Reinhardt, A. Poustka, F. Francis, and H. Lehrach, "Preselection of Shotgun Clones by Oligonucleotide Fingerprinting: An Efficient and High Throughput Strategy to Reduce Redundancy in Large-Scale Sequencing Projects", Nucleic Acids Research, Vol. 26, pp. 5358-5364, 1998.
[5]	R. Sharan and R. Shamir, "CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis", Proc. 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), pp. 307-316, 2000.
