Hahn Innovation Fund Report May 2015

Development of a Topic Keyword Generator


Innovation Fund Report

May 2015

Jim Hahn and Susan Avery

Recognizing the large and growing international student population on this campus, we sought to create a tool that would help these students better understand aspects of the research process. Thinking of synonyms and alternatives to keywords is a difficulty that has been noted by those teaching ESL students and by those assisting them at service points in the library. Specifically, we wanted to create a tool that would help non-native English speakers brainstorm additional terminology to use when completing Concept Maps before searching library databases for articles to support their research. The Concept Map worksheet instructs students to divide their research question into keywords to include in their searches. The interface design follows this same structure in order to parallel the instructional approach used in Library instruction for international students.


Our project work investigated several data sets with web-based APIs. These data sources included:


  • Freebase (https://developers.google.com/freebase/): an excellent source for defining terms, but its results did not match up well when those terms were incorporated as synonyms for student search topics. A good amount of the fall semester development work went into extracting definition text from Freebase and using it as the basis for generating more relevant words. In the course of evaluating Freebase, we identified other promising APIs to test that could achieve the project goals.


  • Thesaurus API: offered better synonyms than Freebase, but did not consistently provide relevant word matches. Although its source material was an improvement over Freebase concepts, we did not see it as viable for the assignment because of its inconsistent keyword generation.


  • WikiSynonyms (http://wikisynonyms.ipeirotis.com/page/api): was by far one of the most consistently relevant synonym matchers. Our work this summer to complete the UIUC Topic Keyword Generator includes data cleanup, since the matches are relevant but, because of the Wikipedia data source, are sometimes misspelled or contain extraneous characters.
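The kind of cleanup described for the WikiSynonyms matches could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the function name cleanSynonyms is hypothetical, and the noise patterns it handles (underscores, parenthetical qualifiers, stray symbols, duplicates) are guesses at what "extraneous characters" might mean. It does not attempt spelling correction.

```javascript
// Hypothetical cleanup helper. The noise patterns handled here are
// assumptions, not an inventory of what WikiSynonyms actually returns.
function cleanSynonyms(rawTerms) {
  const seen = new Set();
  const cleaned = [];
  for (const raw of rawTerms) {
    const term = String(raw)
      .replace(/_/g, " ")                    // underscores become spaces
      .replace(/\s*\([^)]*\)/g, "")          // drop parenthetical qualifiers
      .replace(/[^\p{L}\p{N}\s'\-]/gu, "")   // strip remaining symbols
      .replace(/\s+/g, " ")                  // collapse runs of whitespace
      .trim();
    const key = term.toLowerCase();          // de-duplicate case-insensitively
    if (term !== "" && !seen.has(key)) {
      seen.add(key);
      cleaned.push(term);
    }
  }
  return cleaned;
}

// Example with made-up input:
console.log(cleanSynonyms(["Global_warming", "global warming ", "Climate change (general concept)", "##"]));
// prints [ 'Global warming', 'Climate change' ]
```

Keeping the first spelling seen for each term is an arbitrary choice; a production version might prefer the most common capitalization instead.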




Since none of the data source APIs were hosted on the same domain as the Library servers, programmers extended a middleware solution that could pull in the remote suggestions and style them for our search interface. Developers also wrote the JavaScript code that loads the API response data into the required three-column layout view. The HTML view of the data (http://minrva-dev.library.illinois.edu/freebase/) reuses the Library CSS and header for stylistic consistency with the rest of the Library.
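As a rough illustration of the browser-side step, the sketch below turns suggestion data into three-column markup. Everything specific in it is an assumption made for the example: the input shape (an object mapping each keyword to an array of suggestions), the idea that each column corresponds to one keyword from the Concept Map, and the names renderColumns, escapeHtml, and keyword-column. The project's actual code and response format are not shown in this report.

```javascript
// Escape text before placing it in HTML, since the suggestions come from a remote source.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one column per keyword (at most three), each listing that keyword's suggestions.
// `suggestionsByKeyword` is a hypothetical shape: { keyword: [suggestion, ...] }.
function renderColumns(suggestionsByKeyword) {
  return Object.entries(suggestionsByKeyword)
    .slice(0, 3) // the layout has three columns, so extra keywords are ignored
    .map(([keyword, suggestions]) => {
      const items = suggestions.map((s) => `<li>${escapeHtml(s)}</li>`).join("");
      return `<div class="keyword-column"><h3>${escapeHtml(keyword)}</h3><ul>${items}</ul></div>`;
    })
    .join("");
}
```

In a page, the returned string could be assigned to a container element's innerHTML once the middleware response arrives; returning a string rather than touching the DOM directly keeps the function easy to test.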


Because identifying an API that provided appropriate results for ESL student research projects proved more difficult than expected, it was not possible to test the tool in a course context this spring. Testing with ESL students will be done in summer 2015, and we anticipate that the tool will be shared on the ESL Undergraduate Students LibGuide and introduced in classes in the fall.