University of Illinois Library at Urbana Champaign Library Gateway


August 1, 2007

BioText: Search for Text within the Captions of Journal Articles

Below is a posting from the BioMed Central blog announcing that the BioText search engine is available! Many of you will recall that several months ago its developer, Marti Hearst, gave a presentation about BioText to the Bioinformatics Group at UI.

BioText, in its current rendition, lets you run text searches within the captions of figures (as well as the abstracts) in ~150 journals housed in BioMed Central.

Here's a link to the search engine:

Here's a listing of the journals you'll be searching (includes BMC Bioinformatics!) (click on the "Collection" tab): "The current collection consists of more than 150 journals, 20,000 articles, and 80,000 figures."

Research report (Bioinformatics):

Some searching tips...
If you search for several words, it does an OR search. That is, it's not like Google, which does an AND search!
To force it to search on several words in a Google-like (AND) mode, put a "+" in front of each word, or enclose a phrase in quotation marks. To search for word stems, put an asterisk after the word stem.

Examples of legitimate searches would be....
(Searching over "captions (list view)", with number of "hits"...)
bee 30
"honey bee" 5
bee bees 44
"Apis mellifera" 35
microarray genom* 4973
+microarray +genom* 107
+microarra* +genom* 128
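
For anyone who finds the OR-versus-AND behavior easier to read as code, here is a minimal Python sketch of the matching rules described in the tips above. It is only an illustration of those rules, not BioText's actual implementation; the function names (parse_query, term_matches, matches, search) and the simple word tokenizer are assumptions made up for this example.

```python
import re

def parse_query(query):
    """Split a query into (required, text, kind) terms (hypothetical helper)."""
    terms = []
    # Each term: an optional "+", then a quoted phrase or a bare word.
    for plus, phrase, word in re.findall(r'(\+?)(?:"([^"]+)"|(\S+))', query):
        required = plus == "+"
        if phrase:
            terms.append((required, phrase.lower(), "phrase"))
        elif word.endswith("*"):
            # Trailing asterisk: treat the rest as a word stem.
            terms.append((required, word[:-1].lower(), "stem"))
        else:
            terms.append((required, word.lower(), "word"))
    return terms

def term_matches(term, caption):
    """Check one parsed term against one caption."""
    _, text, kind = term
    # Assumed tokenizer: lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", caption.lower())
    if kind == "phrase":
        phrase_words = text.split()
        n = len(phrase_words)
        return any(words[i:i + n] == phrase_words
                   for i in range(len(words) - n + 1))
    if kind == "stem":
        return any(w.startswith(text) for w in words)
    return text in words

def matches(query, caption):
    """Every "+" term must match; otherwise any single term is enough (OR)."""
    terms = parse_query(query)
    required = [t for t in terms if t[0]]
    optional = [t for t in terms if not t[0]]
    if not all(term_matches(t, caption) for t in required):
        return False
    return bool(required) or any(term_matches(t, caption) for t in optional)

def search(query, captions):
    """Return the captions that match the query."""
    return [c for c in captions if matches(query, c)]
```

Calling search('microarray genom*', captions) on a list of caption strings keeps any caption containing either term, while search('+microarray +genom*', captions) keeps only captions containing both, which is why the "+" searches above return far fewer hits.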


As seen on the Open Access blog, an excerpt from the BioMed Central Blog:
Matt Hodgkinson, BioText - a search engine for open access figures, BioMed Central blog, July 31, 2007. Excerpt:

At the ISMB conference we met Anna Divoli, a postdoc at the University of California, Berkeley, who showed us the BioText Search Engine, which she was presenting as a poster, and has recently published....

I came across it briefly earlier this month thanks to the blog of medical librarian David Rothman, who described it as "A supercool way to search PubMed Central", which is a pretty good description!

It is part of the text mining BioText project and goes beyond the abstract searching in MEDLINE seen previously to extend searching to the figure legends of Open Access journals in PubMed Central.

As the homepage of PubMed Central notes, "All the articles in PMC are free (sometimes on a delayed basis). Some journals go beyond free, to Open Access". Because Open Access explicitly allows the reuse of the content of the articles in these journals (which include all 170+ BioMed Central journals) this has allowed the BioText people to create a search engine that allows keyword searching of abstracts, figure legends, titles and authors, returning results sorted by date and relevance, and in two formats: abstracts with figure thumbnails and legends, or figure legends with thumbnails....

Anna hinted at upcoming functions such as returning snippets that match the search terms from the full text of the article (much as Google Scholar does). We look forward to these further developments, and we'd like to thank Anna, Marti Hearst and the others on the BioText team for developing such a useful and user friendly tool. This is a great example of how Open Access allows others to make further use of published work, in ways that the authors or publishers had not anticipated.

Posted by Katie Newman at August 1, 2007 1:49 PM