The publication of the results of the Human Genome Project provides a good preview of what will likely be the future of the literature of Big Science. Two groups raced to be the first to create a draft: a public project titled the International Human Genome Sequencing Consortium (IHGSC) and a private company called Celera Genomics. While the results were published in mid-February 2001 in special issues of Science and Nature, the most traditional of formats, the sequence itself is nearly unusable in print format. It was made available at the Human Genome Project Working Draft Web site hosted by the University of California at Santa Cruz, and individual gene sequences continue to be submitted to public databases such as GenBank, DDBJ, and EMBL (see the Molecular and Cellular Biology page for further discussion of these databases). The journal issues themselves were made freely available on the Web by both Nature and Science. The tensions between the public and private ventures were widely reported and are almost certainly a sign of a continuing trend as commercial interests and basic research converge.
The vast proliferation of the biological literature has made the computer an indispensable part of any biologist's toolkit. Abstracts and indexes have been computerized since the early 1970s and were originally searched by trained intermediaries such as librarians and information specialists. Beginning in the mid-1980s, CD-ROMs (Compact Disc-Read Only Memory) widened access to electronic databases beyond the expert searcher and allowed the end-user (the person actually using the information) to perform his or her own searches. In addition, many libraries offer locally-mounted or Web-accessible databases for their patrons. These databases are available with many different search engines, some easier to use than others but all intended for untrained users.
The early wave of computerization made secondary tools such as abstracts and indexes more widely available and more easily used. The next wave, which is still in progress, is to improve access to the primary literature, particularly journal articles. Electronic journals have become commonplace today; only a few years ago they were a novelty, eliciting a great deal of discussion concerning utility, availability, cost, archival storage, ownership, intellectual property rights, peer review, and copyright compliance. These controversial issues are still relevant, but publishers are moving rapidly into the electronic age. All of the major commercial publishers and most of the major association publishers currently make their journals available electronically, and there are several initiatives in the biological sciences to help smaller society publishers move to full text. Stanford University Library has been offering electronic publishing assistance since 1995 through its HighWire Press, which currently provides access to nearly 200 journals published by societies and university presses. A more recent initiative from the Scholarly Publishing and Academic Resources Coalition (SPARC), called BioOne, will create a single database containing articles from journals published by member societies of the American Institute of Biological Sciences (AIBS). It will be launched in April 2001 with about 30 journals participating, with more to come.
Almost all of the e-journals presently available are electronic versions of existing print journals. In the heady early days of electronic publishing a number of new paradigms were envisioned, including abolishing or drastically modifying the present system of peer review. Preprint archives such as Paul Ginsparg's Los Alamos physics service were seen as a way of providing speedy access to research. In fact, the physics preprint server has become one of the primary communications channels among high-energy physicists since its inception in 1991. Its very success has led to a need for quality control, so arrangements were made with several major physics journals to provide a peer-reviewed "stamp of approval" for some of the submissions. A proposal for a similar preprint archive for the life sciences, made as part of NIH's PubMed Central initiative, was scrapped after protests from life sciences researchers who were concerned about the lack of quality control.
In some electronic publishing models, readers would post comments about the articles so that an article would be rated according to consensus among the community of like-minded researchers rather than by just a few gatekeepers. In less radical models, new journals would be published in electronic format only, without a print equivalent but after undergoing the usual peer review process. The acceptance of even these more traditional new journals has been slow, however. The first well-funded online journal in the life sciences was the Online Journal of Current Clinical Trials (OJCCT), which was founded in July 1992. Despite extensive efforts, it was initially difficult to find authors willing to publish in the journal, even after 1994, when it gained an official stamp of approval by becoming the first online journal indexed in Index Medicus.
One problem that a number of Web-savvy users have identified is the lack of links from citations in indexing databases such as Biological Abstracts to the actual full-text articles. This should be a simple matter in the Web environment, and in fact standards exist to allow this cross-linking. The difficulty lies more in persuading publishers to cooperate with existing indexes rather than create their own proprietary search engines. However, many databases have begun to link citations and full-text journals, though libraries may not have implemented these links due to licensing or other issues. PubMed, the free search engine for the Medline database, has offered direct links to full-text articles since 1999. Users can access full-text articles from journals for which their institution has subscriptions.
Another series of initiatives will create comprehensive full-text databases such as the National Institutes of Health's PubMed Central. This initiative is controversial and shows quite clearly how difficult it can be to make major changes in the realm of scientific publishing and dissemination. The original PubMed Central proposal, first publicized in March 1999, was for a single all-inclusive database containing biomedical research papers from traditional journals as well as preprints. The articles and preprints would be accessible at no cost to users. Both parts of the proposal proved to be controversial, and when the PubMed Central project went online in February of 2000, it was with a far more limited scope than originally planned. As of February 2001, only about 10 journals were available on PubMed Central. Traditional publishers who objected to aspects of the original proposal got together in November 1999 and came up with a plan of their own to create the Publisher's Reference Service, which cross-links references in full-text articles to articles from different publishers.
Another controversial plan, developed and advanced by scientists, was proposed in the fall of 2000. The Public Library of Science proposal built upon the PubMed Central initiative. Scientists were asked to sign an open letter that urged publishers to allow the content of their journals to be freely available in central archives, or what the letter called "an international online public library". The signers of the letter also pledged that they would not edit, publish in, or subscribe to journals that did not follow this policy. Both parts of the proposal were controversial. A number of scientific societies as well as commercial science publishers came out against the proposal, but it succeeded in generating a great deal of discussion and may result in more open access to older journal literature. Whatever happens with the Public Library of Science proposal, this will certainly not be the last attempt to change scholarly communication as we know it.
Publishers, librarians, and scientists are uncertain about what to expect in the new world of electronic publishing. There are a number of models for electronic subscriptions, many of them not favoring libraries or users. A few years ago, the most common subscription format was to bundle the print and electronic subscriptions together at a single fee, with no choice about whether to subscribe to the electronic or print formats alone (the "free with print" model). Some publishers offer electronic subscriptions for an additional 20-30% of the print subscription fee, as long as the print subscription is continued. Others permit electronic-only subscriptions to their journals, usually at about 90% of the equivalent print subscription. Several publishers offer package deals, forcing libraries to subscribe to all of the publisher's electronic journals. Another trend is to allow only site licensing, where the cost of the subscription depends on the number of potential users. A growing number of journals offer free access to older articles, usually after one year. HighWire Press's site currently offers the largest number of free back issues. Any of these subscription models may or may not include a choice of HTML or PDF formats. In some cases, institutions cannot access PDF files (which are scanned images of journal pages), though individual subscribers can. In other cases, the PDF files are available only for older articles, usually after one year. The variety of subscription models is nearly endless.
When computer networks burst upon the scene in the early 1990s, they profoundly changed the way in which researchers communicated. In many ways, the Internet is used as an expansion of the eternal invisible college. It makes networking and brainstorming with colleagues from distant areas a daily occurrence, rather than something that only happens at conferences or symposia. In addition to using email, many researchers subscribe to one or more discussion groups such as USENET groups, listservs, or bulletin boards. These discussion groups make it convenient and easy to communicate with other experts and to tap into vast amounts of data, whether it is Arabidopsis genetics or molecular sequences.
While there is still a great deal of uncontrolled or unofficially-sanctioned data available through the World Wide Web, the Web also houses an enormous data treasure trove of use to biologists. There are encyclopedias, dictionaries, nomenclatures, herbaria listings, museum holdings, historically important materials, and the roster goes on and on. Many of the authoritative resources developed by traditional publishers are only available by subscription, though limited sections of the resource may be freely accessible as a draw to the site. Of the free information, some of what is available is non-copyrighted, unreviewed information posted by volunteers or is rather narrow in scope. However, much of what is left is extremely valuable. Scientific associations and governmental agencies often have excellent Web sites that are full of data, for instance.
Researchers in the biological sciences create massive amounts of data that must be accessible to be useful. The data may include ecological data from long-term studies, the holdings of museum collections, or molecular or genetic sequences. Formerly, the data were published in articles or books and rarely updated, but with the development of the Internet, this material is far more accessible and easier to manipulate. Molecular biology is a good example of a discipline that uses electronic publishing to share new data with a multi-disciplinary research community through electronic productions like GenBank, PDB (the Protein Data Bank), the Human Genome Project, and the like. What is unique about these databases is that data are accepted before being published in the journal literature, and in fact most journals require that sequences be added to GenBank prior to their publication in print.
The sudden appearance and exponential growth of the huge and complex Web has made it a victim of its own success in some ways. Information on the Web is linked in complex and arbitrary ways, and its accessibility is often severely compromised by erratic indexing. According to Lawrence and Giles, writing in Nature, there is no single search engine that indexes more than sixteen percent of the Web, and only six percent of Web servers have scientific or educational content. All search engines are significantly biased toward indexing the popular and well-linked pages, while new unlinked pages can take up to six months to appear in search engine listings. Obviously, this situation can delay or prevent widespread use of new high-quality information. The difficulty in keeping up with the ever-changing electronic world has also created a level of anxiety among users. A survey done by the Higher Education Research Institute at the University of California, Los Angeles found that keeping up with information technology beat out teaching loads and publication pressures as a source of stress. Keeping up with changing URLs is a problem that even search engines have not been able to completely solve. In recognition of the problem, all of the URLs for sites mentioned in this book will be collected together in a Web site and kept up-to-date for an extended period of time.
While the basic strategies and tactics of scientific research remain the same, electronic resources and the World Wide Web are changing the medium in which biological information is exchanged and distributed. Internet-accessible sources are certainly enhancing and expanding, if not completely replacing, the authoritative print resources annotated in this book. As cyberspace becomes more transparent to the casual user and as authoritative, older materials are electronically converted to the Internet, its dependability and usefulness will increase. For now the Internet acts as a complement and/or a supplement: most people agree that printed materials have a higher comfort level and are handier to consult than the computer screen for most purposes. On the other hand, who's to say what the future will hold?