About HiDe

HiDe (short for Highway Detection) is a software package for inferring highways of horizontal gene transfer in the evolutionary history of a set of species. HiDe takes as input a collection of unrooted gene trees along with a rooted species tree. Further details on the functionality of HiDe are available in the paper listed below.

HiDe was developed by Guy Banay with Mukul Bansal in Ron Shamir's Computational Genomics group at Tel Aviv University.

Get the software

Windows binaries
Linux binaries
Source code

The Windows version above is our officially supported executable for HiDe. These Windows binaries are completely self-contained and should work out of the box. For those who prefer not to use Windows, we have also included binaries for Linux. To use the Linux binaries, users must ensure that the GNU Scientific Library (GSL) is installed on their Linux system. Both the Windows and Linux packages include a small test dataset.

The software is freely available under the GNU General Public License, version 3, or any later version at your choice.

HiDe is a research tool, still in the development stage. Hence, it is not presented as error-free, accurate, complete, useful, suitable for any specific application or free from any infringement of any rights. The Software is licensed AS IS, entirely at the user's own risk.

How to use it

Invoke the script like this: lua score.lua <dir> [cutoff] [ignore]

Make sure to run the script from the same directory in which you unpacked it, so that Lua can find all the required DLLs.

<dir> should contain a file named species.newick with the species tree in Newick format. All other *.newick files in that directory are taken to be gene trees. Please note that all trees (the gene trees as well as the species tree) must be supplied as rooted trees, even though the program treats the gene trees as unrooted. Also note that the current version of HiDe cannot correctly parse input trees with branch lengths, so please remove branch lengths from your input trees before running HiDe. If there are multiple versions of each gene tree (e.g. bootstrap replicates), put all versions of that gene tree in one file and specify a cutoff percentage. The cutoff must be higher than 50.
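Since branch lengths must be removed by hand, a small helper can do it before the trees are placed in <dir>. The following is a minimal Python sketch, not part of HiDe: the function name strip_branch_lengths is hypothetical, and it assumes that labels are unquoted and contain no colons. Internal node labels such as bootstrap support values are left untouched; whether HiDe accepts those is not covered here.

```python
import re

# A Newick branch length is a colon followed by a number
# (plain decimal or scientific notation), e.g. ":0.05" or ":1e-3".
_BRANCH_LENGTH = re.compile(r":\s*[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def strip_branch_lengths(newick: str) -> str:
    """Return the Newick string with every branch length removed.

    Assumption: labels are unquoted and contain no colons.
    """
    return _BRANCH_LENGTH.sub("", newick)


print(strip_branch_lengths("((A:0.1,B:0.2):0.05,C:1e-3);"))  # ((A,B),C);
```

Applying the function to the contents of each *.newick file and writing the result back yields input without branch lengths.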

If you want to ignore quartets that are explained by some (directed) edges, specify [ignore] as follows: "{{u1, v1}, {u2, v2}, ...}" (do not omit the quotes!), where ui, vi are the indexes of each edge's endpoints. For undirected edges, specify each edge twice, once in each direction.
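For longer edge lists, the [ignore] string can be generated rather than typed. The Python sketch below is illustrative only: the helper format_ignore and its undirected flag are hypothetical names, not part of HiDe. It produces the brace syntax described above and, for undirected edges, emits each edge once in each direction.

```python
def format_ignore(edges, undirected=False):
    """Build the [ignore] argument from (u, v) pairs of node indexes.

    With undirected=True, each edge is also listed in the reverse direction.
    """
    pairs = []
    for u, v in edges:
        pairs.append((u, v))
        if undirected and u != v:
            pairs.append((v, u))
    return "{" + ", ".join("{%d, %d}" % pair for pair in pairs) + "}"


print(format_ignore([(11, 13)], undirected=True))  # {{11, 13}, {13, 11}}
```

The returned string does not include the surrounding quotes; they still have to be added when the string is pasted onto the command line.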

The output, a sorted list of horizontal edges along with their scores, is printed to stdout, so make sure to redirect it if you want to save it to a file.

Example: lua score.lua test 80 "{{11, 13}, {13, 11}}" > scores.txt

Interpreting the output

On the provided test dataset HiDe's output should look like this:
0.985625        2-Ivysaur --> 16-Charizard      (50%/49%)                       
0.985625        4-Venosaur --> 14-Charmeleon    (50%/49%)                       
0.909734        3-LCA(Ivysaur,Venosaur) --> 12-Charmander       (67%/32%)       
0.909734        0-Bulbasaur --> 15-LCA(Charmeleon,Charizard)    (32%/67%)       
0.892734        2-Ivysaur --> 12-Charmander     (53%/46%)
.
.
.
Each line is of the form: score [tab] node --> node [tab] directionality
score
This is the HiDe score of the horizontal edge that this line refers to. Edges with higher scores are more likely to be highways of horizontal gene transfer.

In a sense, an edge's score is an estimate of the number of genes that were transferred along that edge. Each gene tree in the input contributes a number between 0 and 1 to the score. This contribution measures how strongly the gene tree supports the hypothesis that a horizontal transfer occurred along that edge: a gene tree that exactly matches what we would expect assuming a transfer occurred will contribute 1 to the final score, while gene trees that match only partially will contribute less. Thus the maximum possible score for a horizontal edge is equal to the number of input gene trees.

The output of HiDe is sorted in order of decreasing score, so that the most interesting edges appear at the beginning of the list.

node
Each node is specified in two ways: a number and a name. The number is the (zero-based) index of the node in an in-order traversal of the rooted species tree. It can be safely ignored; it is preferable to refer to tree nodes by name. Leaf nodes are named after the species that they correspond to, and internal nodes are named LCA(u, v), where u and v are leaves whose latest common ancestor is the named node.
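The numbering scheme can be illustrated with a short Python sketch. This is not HiDe's code. It assumes a binary species tree written as nested tuples, and it assumes that an internal node is named after the leftmost and rightmost leaves below it; the text above only says that u and v are leaves whose latest common ancestor is the node. The three-species tree is a made-up fragment that borrows names from the test output.

```python
def leaves(tree):
    """Return the leaf names of a nested-tuple binary tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def in_order_names(tree):
    """List node names in in-order: left subtree, node itself, right subtree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    below = leaves(tree)
    # Assumption: name the internal node after its outermost leaves.
    label = "LCA(%s,%s)" % (below[0], below[-1])
    return in_order_names(left) + [label] + in_order_names(right)


# Hypothetical tree fragment, not the actual test dataset.
tree = ("Bulbasaur", ("Ivysaur", "Venosaur"))
for index, name in enumerate(in_order_names(tree)):
    print("%d-%s" % (index, name))
```

Under these assumptions Ivysaur, LCA(Ivysaur,Venosaur) and Venosaur receive the indexes 2, 3 and 4, which agrees with the labels in the sample output.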

directionality
This field specifies how much evidence there is for the directed highway to have the same direction as the arrow in the previous field. The first number reflects how much of an edge's score comes from the edge as displayed in the output, while the second number gives the opposite edge's contribution to the score. It should be noted that, in the current version of HiDe, these percentages are rather inaccurate and should not be used for direction inference.
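When the output has been redirected to a file, the three fields can be pulled apart programmatically. The Python sketch below is one possible parser, written against the sample lines shown above; the names Edge and parse_line are hypothetical, and the pattern accepts any run of whitespace between fields rather than requiring a single tab. Lines that do not fit the pattern (for example the "." continuation lines in the listing above) yield None.

```python
import re
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    score: float
    src_index: int
    src_name: str
    dst_index: int
    dst_name: str
    forward_pct: int   # first percentage: the edge as displayed
    reverse_pct: int   # second percentage: the opposite edge


# score, "index-name --> index-name", then "(NN%/NN%)".
_LINE = re.compile(
    r"^\s*(\S+)\s+(\d+)-(.+?)\s+-->\s+(\d+)-(.+?)\s+\((\d+)%/(\d+)%\)\s*$"
)


def parse_line(line: str) -> Optional[Edge]:
    """Parse one output line, or return None if it does not match."""
    match = _LINE.match(line)
    if match is None:
        return None
    score, si, sn, di, dn, fwd, rev = match.groups()
    return Edge(float(score), int(si), sn, int(di), dn, int(fwd), int(rev))


edge = parse_line("0.909734\t3-LCA(Ivysaur,Venosaur) --> 12-Charmander\t(67%/32%)")
print(edge.src_name, "->", edge.dst_name, edge.score)
```

Given the caveat above about the directionality percentages, a script would normally rank edges on the score field alone.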

Citing HiDe

HiDe can be cited as follows:
Systematic Inference of Highways of Horizontal Gene Transfer in Prokaryotes,
Mukul S. Bansal, Guy Banay, Timothy J. Harlow, J. Peter Gogarten, Ron Shamir.
To appear in Bioinformatics.

Get in touch

In case of any questions or suggestions please feel free to contact Guy Banay or Mukul Bansal.