Chapter 1: Introduction

Electronic biological literature

The vast proliferation of the biological literature has made the computer an indispensable part of any biologist's toolkit. Abstracts and indexes have been computerized since the early 1970s; at first they were searched by trained intermediaries such as librarians and information specialists (see the introduction to Chapter 4 for more information). Since the mid-1980s these tools have been available to end users, and their availability has only expanded since then. The next wave of computerization improved access to journals. Electronic books trailed their periodical siblings by a couple of decades but are now beginning to come into their own.

Electronic journals are commonplace today, but only 15 years ago they were a novelty, eliciting a great deal of discussion about utility, availability, cost, archival storage, ownership, intellectual property rights, peer review, and copyright compliance. These controversial issues are still relevant, but the electronic age is proceeding apace. All of the major commercial and society publishers now make their journals available electronically, and several initiatives in the biological sciences have helped smaller society publishers move to full-text electronic publishing. Stanford University Library has offered electronic publishing assistance since 1995 through its HighWire Press (http://www.highwire.org), which currently provides access to over 1,500 journals published by societies and university presses. A more recent initiative of the Scholarly Publishing and Academic Resources Coalition (SPARC), called BioOne (http://www.bioone.org), created a single database of articles from journals published by member societies of the American Institute of Biological Sciences (AIBS). BioOne launched in April 2001 with about 30 participating journals and now includes more than 175 titles in three collections.

Most of the e-journals presently available are electronic versions of existing print journals. In the heady early days of electronic publishing, a number of new paradigms were envisioned, including abolishing or drastically modifying the present system of peer review. Preprint archives such as the physics service arXiv (http://arxiv.org/) were seen as a way of providing speedy access to research. Partly because of the concerns about peer review mentioned earlier, the electronic journal age in the life sciences got off to a much more sedate start than in physics. The earliest e-journals, in the mid-to-late 1990s, were simply electronic versions of standard print journals, usually sold as bundled subscriptions that included both print and online versions for one price. New electronic-only journals, with no print equivalent but with the usual peer review process, were also launched but faced numerous hurdles. The first well-funded online journal in the life sciences was the Online Journal of Current Clinical Trials (OJCCT), founded in July 1992. Despite extensive efforts, it was initially difficult to find authors willing to publish in the journal, even after it gained an official stamp of approval in 1994 by becoming the first online journal indexed in Index Medicus. It ceased publication in 1996, but it was only the first of many online-only journals. Now, in the mid-2010s, many established journals have ceased print publication and are available only online, as are many newly created journals.

The issue of how to archive electronic journal backfiles is one that publishers and librarians have wrestled with extensively. We know how to preserve print books and journals: publishers print them on acid-free paper, and librarians place copies in climate-controlled facilities in multiple locations around the world. The situation is profoundly different for electronic journals. Initially, publishers kept control of the electronic backfiles for their journals, leaving librarians and users concerned about what would happen if keeping the files stopped being economically advantageous to the publishers, or if the publishers went out of business. Also, given all the changes in electronic media (from magnetic tapes to thumb drives) and standard software programs (remember WordPerfect?), there are concerns about migrating masses of data from one platform to another. Librarians and publishers have worked on this issue, and robust, redundant standards and systems are now in place that should allow seamless updates and transfers in the future. These systems have yet to be significantly tested in real life, but their existence is a relief to everyone involved. Portico (http://www.portico.org/), LOCKSS (Lots Of Copies Keep Stuff Safe, http://www.lockss.org/), and CLOCKSS (Controlled LOCKSS, http://www.clockss.org) are among the most important initiatives.

Another series of initiatives that have revolutionized the biological literature focuses on Open Access (OA). The genesis of the OA movement goes back to the beginning of the Internet age, with its mantra that “Information wants to be free.” In addition, the subscription price crisis of the 1980s and 1990s encouraged authors and librarians to explore alternatives to the traditional modes of publishing. The final outcome of the tension between traditional publishing and OA has yet to be determined, but government policies such as the NIH Public Access Policy discussed below guarantee that OA will have a place in the future of scientific communication.

There are many “flavors” of Open Access, but the basic definition provided by advocate Peter Suber (2013) is that “Open-access (OA) literature is digital, online, free of charge, and free of most copyright and licensing restrictions.” The two main types of OA found in the life sciences are Author Pays (also known as “gold” OA) and Open Access repositories (“green” OA). The Author Pays model has attracted the most attention, both positive and negative. In this model, authors pay the journal publisher a fee to make their articles available for free immediately upon publication. Some journals, such as the PLoS journals, are completely OA, but there are many mixed journals in which some authors choose to pay for Open Access and some do not. These OA articles receive the same peer review as other articles, and many grants provide funds for OA publication fees. This model is rather similar to the system found in many society publications, in which authors are expected to pay page charges that help keep subscription prices low. Two major resources for OA information are the Directory of Open Access Journals (DOAJ, http://www.doaj.org/), which lists OA journals, and the SHERPA/RoMEO Web site (http://www.sherpa.ac.uk/romeo/), which lists publisher OA policies.

The biological and medical sciences have been in the vanguard of the Open Access movement. More than half of the substantial journals listed in the DOAJ are biomedical; biomedical OA journals also publish more articles than those in other fields and charge higher author fees. The three largest OA publishers, PLoS, BioMed Central, and Oxford University Press, all publish in the biomedical field (Walters and Linvill, 2011). More recently, in August 2013, a series of reports commissioned by the European Commission’s Directorate-General for Research and Innovation showed that 40% of articles published worldwide from 2004 to 2011 were available as Open Access (Archambault et al., 2013). The figure was 57% for biology and 61% for biomedical research articles. The NIH and NSF Public Access policies discussed below have also had a significant effect on the OA cause in the biomedical fields.

Some of the early concern with the Author Pays model focused on the possibility of abuse. While the major OA publishers such as PLoS and BioMed Central have strict peer review processes equal to those of the best standard journals, the fear has always been that unscrupulous publishers would see this model as a cash cow, publishing any kind of dreck as long as authors were willing to cough up the money. The vast majority of OA publishers are focused more on the ideal of making information freely available and breaking even than on making money, but some unscrupulous publishers have been identified. Authors looking for an OA journal should ask many of the same questions they would ask about any journal. Who publishes the journal? Are the author fees in line with those of other OA journals in your field? Have you or your mentors ever heard of it? Is the journal indexed in any of the major indexes? Have authors you know and respect published in it or served on its editorial board? A good resource to check is Jeffrey Beall’s List of Predatory Publishers (http://scholarlyoa.com/publishers/).

The OA Repository, or Green OA, model is different from the Author Pays model. In this model, after publishing in a journal, authors make their articles available for free in some kind of repository, such as their personal Web site or an institutional or disciplinary repository. In some cases publishers allow only the author's final word-processor version, rather than a PDF of the published article, to be posted in a repository. Many journals now follow this Green model by making all of their content available for free after an embargo period, usually between six months and a year but sometimes longer. The assumption in this model is that most of the use of articles comes in a fairly short period following publication, so publishers do not risk losing subscriptions by making older content available for free. This is the model followed by the NIH and NSF policies discussed below.

Like other initiatives that seek to change the publication model for scientists, OA has been controversial from the start, as the history of PubMed Central illustrates. The original PubMed Central proposal, first publicized in March 1999, was for a single all-inclusive database containing all biomedical research papers from traditional journals as well as preprints, all available for free. Both parts of the proposal proved controversial, and when PubMed Central went online in February 2000, its scope was far more limited than originally planned: it included only a few journals, and the preprint server idea had been dropped completely. As of February 2001, only about 10 journals were available on PubMed Central, but by 2013 over a thousand journals were included. Despite the early setbacks, OA supporters soldiered on, and in April 2008 the NIH Public Access Policy took effect. It required that peer-reviewed articles resulting from NIH-funded research be made publicly available within 12 months of publication. In February 2013, a similar policy was issued covering the NSF and all other Federal agencies that spend over $100 million per year supporting research. Authors must either publish in journals following the green OA system or deposit their articles in PubMed Central or another repository. The US is not the only country interested in OA repositories: at the same time PubMed Central was proposed, the European E-BioSci OA portal was implemented.

One potential benefit of the OA model that OA advocates have extensively touted is that greater availability of free articles would lead to increased visibility and use of those articles. Studies of citation rates for OA and non-OA articles have produced mixed results, with most recent studies showing only a modest citation advantage for OA articles (Davis and Walters, 2011; Archambault et al., 2013). This includes comparisons of OA and non-OA articles within the same mixed-model journals (Davis, 2009).

Non-European and small European countries have been quick to see value in the OA model as a way to promote the research performed in their own countries. One good example is Brazil’s SciELO (http://www.scielo.org), a platform that publishes over 1,000 OA journals from several South and Central American countries. Walters and Linvill (2011) found that 27% of the OA journals they studied published articles in languages other than English, and that the percentage of OA journals published outside Europe and North America had increased from 10% in 2005 to 31% in 2009. Although the comparison is not exact, this contrasts with the 19% of journals indexed in BIOSIS Previews that are published outside Europe and North America, mentioned above.

Researchers in the biological sciences create massive amounts of data that must be accessible to be useful. The data may include ecological data from long-term studies, the holdings of museum collections, neuroscience images, or molecular or genetic sequences. Formerly, such data were published in articles or books and rarely updated, but with the development of electronic journals and databases this material has become far more accessible and easier to manipulate. Molecular biology is a good example of a discipline that uses electronic publishing to share new data with a multidisciplinary research community through resources such as GenBank, PDB (the Protein Data Bank), and the Human Genome Project. What is unique about these databases is that they accept data before it is published in the journal literature; in fact, most journals require that sequences be deposited in GenBank before the article describing them is published.

As a result of the availability of all this data, techniques for finding and interconnecting data have become one of the fastest-growing areas in biology and information science. Bioinformatics, the use of computer and information science to analyze biological data, has exploded in use. While the term often refers just to the analysis of genomic or molecular biology information, every area of biology that creates large amounts of data has its own bioinformatics needs and practitioners. The Open Access movement has made some areas of bioinformatics, such as text mining, possible: articles locked away behind a paywall are not available for text-mining sweeps, but abstracts in PubMed and full text from OA journals are. The next step is to combine the journal literature with the huge molecular biology databases in new and interesting ways.

All of the above discussion of the electronic biological literature focuses on electronic journals and databases, a measure of their importance to the biological sciences. Electronic books have been slower to arrive. The earliest e-books included encyclopedias, dictionaries, and textbooks. Although some implementations of electronic textbooks have not been popular with students, their promise is obvious. More recently, monographs that resemble journals, in that they consist of individual chapters acting as separate articles rather than a cohesive whole, have been successful online, and most publishers now produce electronic versions of their books. One complication is the multiplicity of incompatible e-book readers; one way around this problem is to publish scientific books as PDF files of individual chapters. Libraries can subscribe to individual titles or to large or small book packages, much like the infamous Big Deals that journal publishers offer.

There are relatively few OA books, but out-of-copyright books could be seen as the e-book equivalent of OA, although the two issues are only tangentially related. At the time of writing, under US copyright law all books published before 1923 are out of copyright (that is, in the public domain) and can be used and republished freely; books published between 1923 and 1989 may or may not be out of copyright; and materials published after 1989 are almost certainly still in copyright. Other countries’ copyright laws vary, so the issue is extremely complicated and can slow scientific advancement. Probably the most famous digitization project is Google Books (http://books.google.com), which aims to digitize all the world’s literature; it has run into many copyright-related issues but has made public domain books much more accessible. The Internet Archive (http://archive.org), also home to the Wayback Machine, which archives Web pages, is another digitization project, one that focuses on material in the public domain.

Of even greater interest to biologists, especially taxonomists, is the Biodiversity Heritage Library (BHL), which aims to digitize all the biodiversity literature in the world. The project began in 2005 and was created by a coalition of major botanical gardens, natural history museums, and universities in the US and the UK. One major benefit of the project is that it makes the historical taxonomic literature, which can go back to Linnaeus’s publications, more widely available to local taxonomists who lack easy access to the major American and European institutional libraries, which may hold the only copies of rare taxonomic works. BHL records feed into the Encyclopedia of Life, a project aimed at producing a Web page for each of the approximately 1.8 million known species of organisms (see Chapter 3).

While no one can predict the future of the biological literature, it is safe to say that it will continue to grow apace and that, while peer review will continue, new formats merging the best of the print world and the electronic world will emerge. Publishers, authors, and librarians will continue to wrestle with issues related to Open Access and journal prices.

 

Bibliography

Archambault, Eric, et al. (2013). Proportion of Open Access Peer-Reviewed Papers at the European and World Levels—2004-2011. http://www.science-metrix.com/pdf/SM_EC_OA_Availability_2004-2011.pdf.

Davis, P. M. (2009). Author-choice open access publishing in the biological and medical literature: A citation analysis. Journal of the American Society for Information Science and Technology 60(1): 3-8.

Davis, P. M. and W. H. Walters. (2011). The impact of free access to the scientific literature: a review of recent research. Journal of the Medical Library Association 99(3): 208-217.

Suber, P. (2013). Open Access Overview. http://legacy.earlham.edu/~peters/fos/overview.htm.

Walters, W. H. and A. C. Linvill. (2011). Characteristics of Open Access journals in six subject areas. College and Research Libraries 72(4): 372-392.



Contact the Biosciences Librarian:
Kelli Trei
(217) 244-2503
ktrei2@illinois.edu