Following the scientific literature: A personal practical guide for young computational biologists
by Igor Ulitsky & Ron Shamir
The goal of this guide is to describe why, when, where and how you can follow the most up-to-date science of interest, and which papers/journals you should follow. The guide is biased towards the fields of genomics/systems biology, but all the practical advice and the guidelines for the top biology and computational biology journals also apply if you’re into SNPs/proteins/phylogeny/etc. This guide also assumes that you’re passionate about biology and that your fields of scientific interest are not very narrow.
This is a personal perspective, and as such it is subjective. While many will agree with some of our guidelines and literature selection, probably none will agree with everything, and many will disagree with a lot of what we say. The guide is aimed primarily at students and young researchers making their first steps in following the literature. We hope that it will ease those first steps, and that after following the literature for a while you will have formed your own different views and preferences (and will eventually write your own better guide ;-).
Why should I constantly follow scientific literature?
There are simple pros and cons to devoting time to regularly getting updated on what’s going on in other labs worldwide:
· Cons
o It takes time. How much time? That depends on how wide (how many fields/journals you follow) and how deep (how many papers in each field) you want to go.
o Most of the papers you read will eventually prove useless, either because they are very poor or because you will never use the information in them.
The good news about these problems is that the amount of time it takes you to (a) decide if a paper is good or (b) read a paper properly drops dramatically over time, particularly if the paper is in the field you’re working on (probably about 5- to 10-fold over the 5 years of a PhD).
· Pros
o The more you read, the more knowledgeable you become about (a) the problem you’re working on in your current project; (b) the frontiers of science that you should work on, and that may form your next project; (c) the current status of science in general, approaches used, common difficulties etc. Thus, being up-to-date will save you a lot of time that you might otherwise waste “reinventing the wheel”:
§ working on problems already solved
§ thinking about problems that are solved or will be solved very soon by others
§ writing your own code for problems already solved by tools available off the shelf.
o Having a good grasp of your field is essential for writing fellowship requests, paper introductions, grant proposals etc. (as referring to up-to-date studies is very important).
o The more you read, the better the flow of your projects becomes, and the easier it is going to be to write your own paper(s). The ability to initiate and carry out a proper compbio project (choosing a problem, solving it, doing controls, finding new biology) will improve dramatically if you read papers about projects with a similar structure (even if the problem they solve is entirely different).
o Reading many scientific papers improves your critical judgment: you will become much more critical about both the work of others and your own work (which is always a good thing).
o If you follow Science and Nature closely and read things outside your field, you will read about plenty of interesting and sometimes exotic stuff about neuroscience/education/behavior that does not make it to regular newspapers (monkeys count better than graduate students, people holding hot coffee cups are more likely to be nice to others, etc.).
When should I get updated / read?
The short answer is all the time. Doing a routine update, e.g., once you finish a project, or once a month, sounds appealing but is impossible in practice. You will probably never get to it (there is always something that seems more urgent than spending two weeks just reading papers). If you do find the time, you will encounter too many papers at once and miss most of the really useful material; nothing will “sink in”, and you will lose most of the Pros described above. Therefore, it is best to screen for interesting stuff and read what passes your ‘interesting’ filter all the time (i.e., at least once a week). If you’re really flooded with work, still mark/print the paper if it looks interesting, and get back to it when you have time.
Where to get updated?
There are several main places you can get updates from:
· PubMed – where you can get updates based on specific keywords, by journal or by a specific author (e.g., the head of a lab that you know is doing stuff similar to yours).
· Journal publishers’ websites – usually these get updated both when a new journal issue is out and when new “advance online” papers become available (it sometimes takes months before these find their way into an issue/print).
· ISI Web of Science – a commercial site that TAU has a license to use - http://www.isiknowledge.com/ .
· Science news websites, such as GenomeWeb and news@nature.
· Podcasts/Webcasts on science – available from several journals and other sources (Cell, Science, Nature and Science Times), and can be an interesting thing to listen to while commuting (not driving!), jogging etc.
I will focus below on how to get updates from the first two sources, but the methods (i.e., RSS) apply to the others as well.
How to get updated?
There are three simple methods for regularly receiving updates about papers being published. The best is probably to use a mixture of these methods, as each has different strengths, and there are some journal-specific aspects (see table below). The three technical ways of getting updates are:
1. RSS – the best (in my opinion) option to get updates from most journals and from PubMed. The general idea is that you need to use an RSS reader, which can be either standalone software (most e-mail clients also support RSS reading) or web-based, such as Google Reader (http://www.google.com/reader). You then subscribe to feeds, which are usually marked on web pages by an RSS icon. A feed is basically an XML file that lists the latest items published by a site. The RSS reader regularly checks this XML file and, if something changes there, updates the info in your reader. Thus, by browsing through your unread items hourly/daily/weekly you can efficiently get all the updates from a large number of websites. You can also mark items that interest you, share them with other people (if you both use Google Reader), easily e-mail an item etc. Marking items is particularly useful if you want to flag an interesting paper to read later (e.g., because the printer doesn’t work, because the university forgot to renew a journal or because a formatted PDF of the paper is not yet available). Importantly, as the usage of RSS is very widespread on the web, you can use RSS to follow other interesting stuff: news (e.g., the latest culture updates from CNN or Ynet), blogs, Facebook, the stock market, billboards etc.
2. E-mail updates – can also be used to get updates about specific keywords (from PubMed) or from specific journals (Table of Contents, or TOC alerts). The e-mail updates have some cons compared to RSS:
a. Your Inbox gets cluttered.
b. Each e-mail will usually contain tens of papers, out of which only one is really of interest to you. If you want to mark it (e.g., to read later), share it or e-mail it to someone else, it is more difficult to do so.
c. In most journals, it is possible to get e-mail updates only about the regular issues, and not about “advance online” papers.
3. Visiting the publisher website once a week/2 weeks/month – this option is difficult to stick to and is not recommended.
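The RSS mechanism described in option 1 can be illustrated with a minimal sketch: a reader parses the feed's XML, lists its items, and reports only those it has not seen before. The feed content, paper titles and example.org links below are made up for illustration, and the helper names (parse_items, unread_items) are hypothetical; a real reader would download the XML from the journal's feed URL and store the "seen" set between runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed content (made-up titles and links).
# A real reader would fetch this text from the journal's feed URL.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal: new articles</title>
    <item>
      <title>A made-up paper on microRNA targets</title>
      <link>http://example.org/papers/1</link>
    </item>
    <item>
      <title>Another made-up paper on ChIP-seq peaks</title>
      <link>http://example.org/papers/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


def unread_items(feed_xml, seen_links):
    """Return items whose link is not in seen_links, then record them as seen."""
    fresh = [(t, l) for t, l in parse_items(feed_xml) if l not in seen_links]
    seen_links.update(l for _, l in fresh)
    return fresh


seen = set()
first_check = unread_items(SAMPLE_FEED, seen)   # both items are new
second_check = unread_items(SAMPLE_FEED, seen)  # feed unchanged, nothing new
for title, link in first_check:
    print(title, "->", link)
```

Tracking items by link is only one possible choice; feeds often also carry a guid element that could serve the same purpose.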
Using these three options, there are three main kinds of updates you can get:
1. Get updates about every new paper coming out in journal X: This is the best option for the top journals or journals specifically in your main topic of interest. Even if a paper is not directly what you’re looking for, it may be relevant or ignite your imagination. This can be done either by adding an RSS feed from the journal (listed below, or from the web) or by signing up for e-mail alerts on the journal’s homepage (all leading journals have this option these days).
2. Get updates about every paper about subject Y: This is a good option if you want to be sure you read everything about a specific topic (microRNA function, motif finding, protein interaction networks, ChIP-seq, metagenomics etc.). This is recommended, as some papers relevant to you may appear in good biomedical journals that are too tedious to follow because they very rarely publish compbio-relevant stuff (Cancer Cell, Cancer Research, Blood, Immunity, NEJM, JAMA, Genetics etc.). The way to do this is to perform a search for your keyword in PubMed and then select “Send to” -> “RSS Feed” or “E-mail”. This will result in a daily digest that you will get as soon as some papers with this keyword are added to the PubMed index.

Put effort into your query! Otherwise you will get a lot of irrelevant papers. Take into account that no matter how hard you try, if you search by a common keyword (e.g., microRNA function), >50% of the papers will come from very (very) small journals and will probably be irrelevant. On the other hand, getting such updates will usually make sure you’re not missing any publication relevant to you. Sample queries are “Ideker T[au]”, “microRNA AND (function OR evolution OR expression)” (also selecting “Limits: English language”) or “ChIP-seq”.
3. Get updates about every paper that cites paper Z: There are a number of ways to do this. One is to find the paper in the ISI Web of Knowledge, and then create a ‘Citation alert’, which can be directed either to your e-mail or RSS.
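The advice on keyword queries in option 2 can be sketched in code. The guide itself uses the PubMed website's “Send to” menu; the sketch below instead assembles the same kind of boolean query as a string and, as a swapped-in alternative not mentioned in the guide, builds a URL for NCBI's public E-utilities ESearch endpoint restricted to recent entries. The helper names (any_of, build_query, esearch_url) and the default of 7 days are assumptions made for illustration; the code only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an alternative to the website's
# "Send to" menu; not part of the original guide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def any_of(*terms):
    """Combine alternative terms into a parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"


def build_query(topic, aspects=(), language=None):
    """Build a PubMed query: topic AND (aspect1 OR ...) AND language[la]."""
    parts = [topic]
    if aspects:
        parts.append(any_of(*aspects))
    if language:
        parts.append(f"{language}[la]")  # [la] is PubMed's language field tag
    return " AND ".join(parts)


def esearch_url(query, days=7, max_results=20):
    """Return an ESearch URL for entries added to PubMed in the last `days` days."""
    params = {
        "db": "pubmed",
        "term": query,
        "reldate": days,      # look back this many days
        "datetype": "edat",   # measured by the date the entry was added
        "retmax": max_results,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


query = build_query("microRNA", ["function", "evolution", "expression"],
                    language="english")
url = esearch_url(query)
print(query)
print(url)
```

Narrowing the topic with an OR group of aspects, as in the guide's sample query, is what keeps the weekly volume manageable; an author query such as “Ideker T[au]” can be passed to esearch_url directly as a string.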
What journals should I follow?
This is the most complex part to answer. For starters you should follow Science/Nature/PLoS Biology & PLoS Computational Biology (see table for details). Once you have handled that for 1-2 months, you’re ready to expand. On the one hand you should follow journals that publish things that interest you, but on the other hand, reading some other journals will expand your fields of interest. The answer is probably that you should follow each of Cell/Science/Nature/PLoS Biology, as these are the top biology journals, which publish almost entirely excellent science. You should try to read 1-2 papers from each of these every month or so (most will not be in your direct field; just pick ones that you think could be interesting). If you’re into genomics/systems biology, you should also follow the “top layer of genetics/genomics” listed below. Once you see you can handle this volume, add to your RSS feeds/e-mail alerts the other genomics/genetics journals listed below. In addition, you should follow the three leading compbio journals (listed below), and, once your screening ability and critical judgment improve, you can also scan through BMC Bioinformatics and PLoS One.
| Section | Journal (hyperlinked) | Current issue feed | New articles feed | Volume | Notes |
| Science / Biology in general | | | | Bi-weekly. Few papers but all describe landmark studies. | 'Resources' articles are particularly compbio-useful, as they usually describe huge datasets |
| | | | | Weekly. Only 3-4 papers per week are molecular biology papers, only about 0-1 per month are compbio. | E-mail table of contents alerts are recommended for Nature & Science, since the News and Correspondence sections are also quite interesting sometimes |
| | | | | Weekly. 5-6 papers on molecular biology | |
| | | | | A bi-weekly digest of papers is sent. 5-6 weekly papers on molecular biology. 0-1 per month are compbio. | |
| Top layer of Genetics / Genomics | | | | Monthly, about 20-30% of the papers are relevant, unless you're interested in association studies | |
| | | | | Monthly, only 2-3 papers are compbio relevant | |
| | | | | Bi-weekly | 'Resources' papers are particularly useful |
| | | | | Monthly, most is compbio relevant | |
| | | | | Weekly, 4-5 papers per week are compbio relevant | PNAS contains different articles from very different fields of science, many of which are usually irrelevant to us (e.g., Microbiology etc.). Therefore, it is better to subscribe to the e-mail alerts, in which the table of contents is broken into sections (and then just read the parts of the table of contents that interest you). |
| | | | | Bi-weekly, 5-6 articles, mostly compbio-relevant | |
| Reviews | | | | Monthly | |
| | | | | Monthly | |
| Genetics & Genomics - optional | | | | A weekly digest of papers, few compbio-relevant | Mostly papers on genetics, but some are of broader interest |
| | | | | Monthly | Some papers are 'review-like', while others describe small 'peculiarities' found, which are usually difficult to explain mechanistically, and sometimes are very interesting |
| | | | | Very rare | The top layer of the BMC journals. Very few papers, but frequently very interesting ones |
| | | | | Not very systematic, you get updates occasionally. Many compbio papers, which are usually much better than the ones in BMC Bioinformatics | BMC journals first publish papers in an unformatted way: 50+ page PDFs that are quite unreadable and not environmentally friendly. It is usually better to wait 2-3 weeks before printing/reading the paper |
| | | | | Monthly | |
| | | | | Monthly | Publishes a very large number of papers (40-50). Usually ~5 are compbio-relevant, particularly if you're into sequence motifs or RNA |
| Core of CompBio journals | | | | A weekly digest of papers | |
| | | | | Monthly | The e-mail table of contents alerts are broken into sections, making them easier to read |
| | | | | Monthly | More computationally-oriented than the others |
| Additional optional CompBio journals | | | | Not very systematic, you get updates almost daily | Very large volume, frequently very low quality, but occasional gems |
| | | | | A weekly digest | Similar to BMC Bioinformatics, but even fewer gems |
| | IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) | | | 4 issues a year | Computational orientation |
| | | | | About 3 issues a year | Computational orientation |
| | | | | 1-3 per month | Reviews |
| Misc. journals | | | | Not very systematic, you get updates occasionally | More focused on the biophysical side of systems biology |
| | | | | | Top resource on stem cell biology |
Lots of RSS feeds are available here: http://barf.jcowboy.org/
The RSS feeds on all Nature journals: http://www.nature.com/webfeeds/index.html
What should I avoid?
A very common scenario is that one decides to start following journals, subscribes to too many RSS feeds/e-mail alerts, gets flooded with too many updates to follow, and abandons following the literature altogether. On the other hand, if you subscribe to just 1-2 journals and use RSS, you will rarely get any updates, which will probably cause you to forget about, or stop opening, the RSS reader. The best strategy is therefore to subscribe to one or more PubMed feeds with specific keywords relevant to your research (resulting in about 10-20 updates per week) + 5-10 top journals (e.g., Cell/Science/Nature/Nature Genetics/Genome Research/PLoS Biology/PLoS Computational Biology/Bioinformatics). If you see that you can handle the inflow of updates and read some papers for about 2 months, and still have an appetite for more science, gradually increase the number of keywords/journals you follow.
Questions? ulitskyi@tau.ac.il