Exploration of Data Analytics Tools for Library Data

Jim Hahn, Jen-chien Yu, Megean Osuchowski

 

  • The project idea

 

The demand for data analytics in the Library has increased to support data-driven decision-making within our organization. To meet this demand, Library staff and faculty require a versatile, compelling, and user-friendly tool. Reporting tasks are increasingly complex due to the volume of data, the disparity of data sources, and the high bar that exists for data-driven trend analysis and visualization software.

 

In consultation with Library IT leadership, we have developed a phased plan to inform the development of a Library data analytics dashboard. We seek funding to undertake Phase 1, which will review the library literature on data warehousing and decision support systems in libraries, investigate what peer institutions are doing in this area, gather requirements, and develop use cases for the visualization tools that Library decision-makers need.

 

Voyager generates millions of records about Library collections and usage each year. Interpreting and analyzing Voyager data is not an easy task, so colleagues in Library IT developed Bean Counter [http://www.Library.illinois.edu/it/helpdesk/service/bean.html] several years ago. However, past Library IT leadership decided to “freeze” the development of Bean Counter due to resource constraints, even though Bean Counter continues to be one of the most heavily used home-grown applications. Bean Counter currently supports Voyager data analysis in the Library, but many additional queries are needed that are not being developed or supported.

 

Usage data about virtual Library services is also in high demand. For example, the virtual reference transactions recorded by iWonder hold a wealth of information about users’ information seeking behavior, and many librarians have requested access to the iWonder data for research or service planning. Real-time web analytics data (top keywords, referring search engines, top page URLs, etc.) is another type of usage data that many units require; however, there is no Library-wide support for these analytics.

 

The Prototyping Service has developed a proof of concept for a prototype dashboard of collection trends. A development database for caching data has been developed to power trend visualization and has increased the speed of the experimental dashboard prototype.  This screenshot of the collection trend dashboard is an example of the enhanced presentation and analysis that can be provided through visualization tools to better understand and analyze Library data.

 

 

 

Image 1 – New analytics data provides trends by Library unit of subjects that are popular (by circulation) each month.
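As an illustration of the kind of aggregation that could sit behind a view like Image 1, the following is a minimal sketch and not the Prototyping Service's actual implementation. It assumes hypothetical circulation records of the form (unit, subject, checkout date), uses an in-memory SQLite database as a stand-in for the development cache, and uses names (`trend_cache`, `top_subjects`) invented for this example. Real Voyager data would have a different shape.

```python
import sqlite3
from collections import Counter

# Hypothetical sample circulation records: (Library unit, subject, checkout date).
# These values are invented for illustration only.
SAMPLE_CHECKOUTS = [
    ("Engineering", "Robotics", "2015-01-12"),
    ("Engineering", "Robotics", "2015-01-20"),
    ("Engineering", "Materials", "2015-01-22"),
    ("Engineering", "Robotics", "2015-02-03"),
    ("Music", "Opera", "2015-01-05"),
    ("Music", "Jazz", "2015-02-14"),
    ("Music", "Jazz", "2015-02-18"),
]


def monthly_counts(checkouts):
    """Count checkouts per (unit, month, subject); month is 'YYYY-MM'."""
    counts = Counter()
    for unit, subject, date in checkouts:
        counts[(unit, date[:7], subject)] += 1
    return counts


def build_cache(conn, checkouts):
    """Store pre-aggregated counts so the dashboard never scans raw records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trend_cache ("
        "unit TEXT, month TEXT, subject TEXT, checkouts INTEGER, "
        "PRIMARY KEY (unit, month, subject))"
    )
    conn.execute("DELETE FROM trend_cache")  # rebuild the cache from scratch
    rows = [(u, m, s, n) for (u, m, s), n in monthly_counts(checkouts).items()]
    conn.executemany("INSERT INTO trend_cache VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def top_subjects(conn, unit, month, limit=3):
    """Return the most-circulated subjects for one unit in one month."""
    cursor = conn.execute(
        "SELECT subject, checkouts FROM trend_cache "
        "WHERE unit = ? AND month = ? "
        "ORDER BY checkouts DESC, subject ASC LIMIT ?",
        (unit, month, limit),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
build_cache(conn, SAMPLE_CHECKOUTS)
print(top_subjects(conn, "Engineering", "2015-01"))
# prints [('Robotics', 2), ('Materials', 1)]
```

Pre-aggregating into a small cache table is what lets each dashboard view run a cheap lookup instead of a scan over raw circulation records.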

 

 

  • Overall Objectives: what problem(s) will it solve

 

The Library has several needs for data visualization, data analysis and reporting. These include:

 

  • Understanding users’ information seeking behavior and information discovery patterns
  • Assessing the usage of the Library Gateway, unit web pages and other web-based services
  • Streamlining reporting and data intelligence for the Library
  • Reducing the staff time and resources spent on reporting so assessment and IT staff can apply their expertise in other areas
  • Understanding collection use and circulation locations
    • Providing Library-wide capability to react rapidly for acquiring collections
    • Providing a basis for Library-wide adaptation into newly emerging topics
    • Providing Library-wide and unit specific understanding of topical and collection trends
    • With budget cuts looming and possibly continuing for the next few years, reliable reporting features and options for visualization will help the Library assert its importance in university activity, and better understand how to allocate limited resources based on need
    • New reports in Bean Counter are no longer supported, and it offers no visualization support

 

  • Phase 1 Objectives:
    • Review literature on data warehousing and decision support systems in Libraries
    • Investigate what peer institutions are doing in this area
    • Gather requirements for data visualization
    • Create use cases of data dashboards
    • Demo and test data warehousing, analytics, and visualization tools with sample data sets
    • Survey potential users in the library
    • Using the information gathered, create a roadmap for continuation of the project and future steps

 

  • How it fits with existing activities in the Library

 

After discussing our proposal with the Interim Director for Library IT, we learned that Library IT has been looking at reporting, data warehousing, and analytics options for the Library. With several disparate points of data coming from multiple vended solutions, an enterprise-level tool is required. To that end, it may be possible to incorporate library data into the AITS data warehouse and Web intelligence systems. However, there is currently no ongoing project dedicated to data analytics within Library IT.

 

The Coordinator of Library Assessment (with a 33% FTE graduate assistant) has been supporting most of the data analytics work in the Library. This project, if funded, will significantly improve the turn-around time and the quality of the data that Library Assessment provides. It will also guide Library assessment work into a new stage, where we not only supply the data that people request but also create data analytics that help people think beyond the numbers. Within the Technology Prototyping Service, there have been several experiments in how best to analyze and display data from Library systems. Given the team’s familiarity with the Voyager reporting server, its use of caching to generate quick views of data, and its ability to architect extensible systems Library-wide, we are in a strong position to support connecting library data to sample tools such as Tableau.

 

CARLI provides a Reports server against which I-Share libraries can run queries through an Oracle/ODBC connection. The reporting database is not visual and has no features for data trend analysis; it is simply a reporting tool. Most library users consume CARLI reports as Excel spreadsheets generated by queries to this database. The queries are neither mined nor visualized in current practice. The version of Oracle that CARLI currently uses is not compatible with Oracle Dataminer (http://www.oracle.com/technetwork/database/options/odm/dataminerworkflow-168677.html), so commercially derived text mining and web analytics are not possible with the currently available infrastructure.

 

  • Resources needed

 

Technologies

 

Software. 1 Tableau Desktop license ($1,475 per license) for the project team to prototype analytics and data visualization based on identified user needs.

 

Hourly Employees

 

We are requesting $8,938 in Academic Hourly funding for user research and mockup development over the next year.

 

Total cost: $10,413.00

 

  • Sustainability

 

By exploring what peer institutions have done in this area we can avoid potential pitfalls early in the design phase and focus on best practices.

 

One option for data reporting and visualization is to warehouse library reports data in the AITS data warehouse. Such an approach would free the Library from having to develop the underlying infrastructure to power data visualizations and analytics. However, this is only one of several options that will be explored; others may include recommending development of a custom solution in the Library, implementing an open source tool, using a commercial solution over the AITS data warehouse, or combining these approaches as we meet identified user needs and use cases.

 

  • Timeline (All Phases)

 

Phase 1

 

October 2015 – March 2016

 

Gather requirements, use cases and results from testing.

–       1 month: Literature reviews and peer institution investigations

–       1 month: Gather user use cases and requirements

–       3 months: Demo and test tools with sample data sets

–       1 month: Survey potential users and create roadmap for continuation of project

 

 

  • How to measure benefits of the project

 

Analytics and visualizations assist with data-driven decision making that contributes to effective strategy and optimization in organizations. The Dashboard allows professionals to make informed decisions rather than gut decisions. For example:

1) The Library can utilize Google Analytics data to find out which features to keep and which features to abandon on the main Library website.

2) When making a big purchase, analytics are used to choose how to get maximum impact for expenditures. Trending data might help the Library when making purchases, both in what to purchase and in justification of these purchases.

 

3) Analytics are used to monitor behavior. In industry, many companies collect statistics to see what makes a customer stick with their product versus leave (aka “churn analysis”). This can help shape the services that would be most useful for students in the Library.

 

4) Getting better mileage out of the data the Library generates. Often a significant amount of data accumulates and goes unused. Providing an easy way to access that data opens up many new possibilities and ideas (e.g. recommendations, personalization, training, collection development, resource guide development, system design).

 

 

The specific metrics that can be used to measure benefits for the project are:

 

–       Number of people who access the dashboard (weblogs)

–       Number of mentions in print or media (in Library statistical reports, Library web pages)

–       User feedback results (can be collected at open houses, faculty meetings or professional events)

–       Number of requests for datasets to be included in the dashboard
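The first metric above could be approximated from web server logs. The following is a minimal sketch under stated assumptions: the logs are in Common Log Format, the dashboard is served under a hypothetical `/dashboard` path, and a distinct client address is treated as a rough proxy for a person (shared or changing addresses will skew the count). The sample log lines are invented for illustration.

```python
import re

# Matches the start of a Common Log Format line:
# host ident user [timestamp] "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

# Invented sample log lines; the /dashboard path is a hypothetical location.
SAMPLE_LOG = [
    '10.0.0.1 - - [05/Oct/2015:09:12:01 -0500] "GET /dashboard/ HTTP/1.1" 200 5120',
    '10.0.0.2 - - [05/Oct/2015:09:15:44 -0500] "GET /dashboard/trends HTTP/1.1" 200 7312',
    '10.0.0.1 - - [06/Oct/2015:14:02:10 -0500] "GET /dashboard/trends HTTP/1.1" 200 7312',
    '10.0.0.3 - - [06/Oct/2015:14:05:33 -0500] "GET /about HTTP/1.1" 200 1024',
    'this line is malformed and should be skipped',
]


def dashboard_visitors(lines, prefix="/dashboard"):
    """Return the set of client addresses that requested a dashboard page."""
    visitors = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(prefix):
            visitors.add(match.group(1))
    return visitors


print(len(dashboard_visitors(SAMPLE_LOG)))
# prints 2
```

Repeat visits from the same address are counted once, so the result reflects reach rather than total page views.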

 

  • How to determine whether the project has succeeded or failed

 

Success in gathering use cases, conducting reviews, and testing tools will result in a roadmap detailing the anticipated timeline, costs, and future steps for selecting and implementing a library tool for data analysis and visualization. Delivery of this detailed roadmap will determine the success of the project.