Competency E

Design, query, and evaluate information retrieval systems.

Introduction

In a broad sense, information retrieval is the finding of documents of an unstructured or semi-structured nature in order to resolve an information need (Manning et al., 2008). Information retrieval systems are ubiquitous, most visibly in the form of web search. In the field of library and information science, information retrieval usually refers to the design, querying, and evaluation of integrated library systems, online public access catalogs (OPACs), and library databases that store and organize collections of information resources.

Design

Information retrieval system design aims to maximize the capacity to distinguish between pertinent and irrelevant documents (Weedman, 2018). A well-designed retrieval system needs to store, represent, and retrieve information effectively so that materials are findable (Tucker, 2024). For materials to be findable, they must be stored efficiently, and the front end of the retrieval system must be designed with end users in mind. This includes intuitive interfaces, clear navigation, and robust search functionality that caters to diverse user needs.

In an information retrieval system, every document is processed to produce an index that can be searched to find information (Chowdhury, 2010). Yet the intricate relationships among documents, rather than language itself, are the main source of uncertainty in the design of information retrieval systems. Designers must work out how to define and organize these relationships in order to satisfy user needs while working within a number of constraints (Weedman, 2017). Language is also ambiguous, with words and phrases taking on multiple or context-specific meanings. Disambiguation, a fundamental concept of database design, therefore uses contextual information or unique identifiers to make terms with multiple meanings more precise (Tucker, 2024). By implementing disambiguation techniques such as leveraging user behavior data and contextual cues, designers can improve the relevance of search results.
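The indexing step described above can be sketched in a few lines. This is a minimal illustration and not a production indexer: the function name `build_inverted_index` and the three sample records are assumptions made up for the example. It also shows why disambiguation matters, since the single term "mercury" matches records about two unrelated subjects.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            # Strip surrounding punctuation so "sun." and "sun" are the same term.
            term = token.strip(".,;:!?\"'()")
            if term:
                index[term].add(doc_id)
    return index

# Hypothetical three-record collection, for illustration only.
documents = {
    1: "Mercury is the closest planet to the sun.",
    2: "Mercury is a toxic metal.",
    3: "Library catalogs index books by subject.",
}

index = build_inverted_index(documents)
print(sorted(index["mercury"]))  # [1, 2] (one term, two different meanings)
print(sorted(index["library"]))  # [3]
```

A controlled vocabulary or a unique identifier attached to each record would let the system separate the planet from the metal, which plain term matching cannot do.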

One way the design of information retrieval systems in Library and Information Science (LIS) can facilitate disambiguation and bring structured data to retrieval is through the incorporation of controlled vocabularies such as the Library of Congress Subject Headings (Library of Congress, 2007). A well-established controlled vocabulary is essential for effective faceting, the ability to filter and refine search results based on specific attributes or categories. Controlled vocabularies can consist of both post- and pre-coordinated subjects. Post-coordination enables users to combine terms dynamically during the search process, allowing for more precise queries, whereas pre-coordination combines multiple concepts into a single heading, simplifying the search process by providing a more unified descriptor for complex ideas.
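The difference between the two approaches, and the facet counts a controlled vocabulary makes possible, can be sketched as follows. The record IDs, subject strings, and function names are all hypothetical and chosen only for illustration; the double-hyphen strings merely imitate the look of a pre-coordinated heading.

```python
from collections import Counter

# Hypothetical records indexed with single-concept terms (post-coordination).
post_coordinated = {
    "rec1": {"Libraries", "Automation"},
    "rec2": {"Libraries", "History"},
    "rec3": {"Museums", "Automation"},
}

# The same hypothetical records indexed with combined headings (pre-coordination).
pre_coordinated = {
    "rec1": {"Libraries--Automation"},
    "rec2": {"Libraries--History"},
    "rec3": {"Museums--Automation"},
}

def search_post(records, *terms):
    """Return IDs of records carrying every term; the user combines terms at search time."""
    wanted = set(terms)
    return sorted(rid for rid, subjects in records.items() if wanted <= subjects)

def search_pre(records, heading):
    """Return IDs of records carrying one heading that the indexer combined in advance."""
    return sorted(rid for rid, subjects in records.items() if heading in subjects)

def facet_counts(records):
    """Count how many records carry each subject term, as a facet sidebar would."""
    return Counter(term for subjects in records.values() for term in subjects)

print(search_post(post_coordinated, "Libraries", "Automation"))  # ['rec1']
print(search_pre(pre_coordinated, "Libraries--Automation"))      # ['rec1']
print(facet_counts(post_coordinated)["Automation"])              # 2
```

Both searches find the same record, but only the post-coordinated index lets a user ask for "Automation" across libraries and museums without knowing the exact heading in advance.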

Querying

Querying information retrieval systems can be categorized into two broad types: searches for specific items that are already known and searches focused on broader subjects through the processes of aggregation, discrimination, and disambiguation (Tucker, 2024). To enhance the effectiveness of these searches, users often employ Boolean or logical operators such as OR, AND, and NOT, which help refine the results by including or excluding certain terms. Additionally, wildcards and truncation symbols, commonly represented by characters like "!", "?", or "*", allow for variations in word endings or spellings, broadening the search scope. For example, placing a truncation symbol at the end of a stem, as in "librar*", can retrieve results for "library," "libraries," "librarian," and "librarians," thus capturing a wider range of relevant documents. Proximity operators can also be used to find terms that are located close to each other within the text, for instance by specifying the maximum number of words between two keywords. Lastly, field restrictions enable users to limit their searches to specific sections of a database record, such as titles, authors, or publication dates. In web search engines, a field restriction might be "site:" to limit the search to a specific domain or "inurl:" to limit it to pages whose URL contains a given term.
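The query techniques in this section can be illustrated with a small sketch. It is a toy model under stated assumptions, not the syntax of any real database: the three records, the helper names `match_term` and `near`, and the use of "*" as the truncation symbol are all choices made for the example.

```python
import re

# Hypothetical records with title and author fields.
records = {
    1: {"title": "The public library in American life", "author": "Smith"},
    2: {"title": "Librarians and the digital archive", "author": "Jones"},
    3: {"title": "Archive management for museums", "author": "Smith"},
}

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_term(term, field="title"):
    """Return IDs whose field contains the term; a trailing * means truncation."""
    hits = set()
    for rid, rec in records.items():
        words = tokens(rec[field])
        if term.endswith("*"):
            stem = term[:-1].lower()
            if any(w.startswith(stem) for w in words):
                hits.add(rid)
        elif term.lower() in words:
            hits.add(rid)
    return hits

def near(term_a, term_b, max_gap, field="title"):
    """Return IDs where the terms occur with at most max_gap words between them."""
    hits = set()
    for rid, rec in records.items():
        words = tokens(rec[field])
        pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
        pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
        if any(abs(a - b) - 1 <= max_gap for a in pos_a for b in pos_b):
            hits.add(rid)
    return hits

# Boolean operators map onto set operations.
print(sorted(match_term("librar*") | match_term("archive")))  # OR  -> [1, 2, 3]
print(sorted(match_term("librar*") & match_term("archive")))  # AND -> [2]
print(sorted(match_term("archive") - match_term("museums")))  # NOT -> [2]
print(sorted(match_term("smith", field="author")))            # field restriction -> [1, 3]
print(sorted(near("public", "american", max_gap=3)))          # proximity -> [1]
```

OR widens the result set, AND and NOT narrow it, and truncation widens what a single term can match, which is why the operators are usually combined rather than used alone.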

Evaluation

Building information retrieval systems is an iterative process, so the more measures used to evaluate a system, the more it can be refined. The effectiveness of an information retrieval system can be judged by the usefulness of the controlled vocabulary used to index its documents and by the relevance of the documents it contains (Weedman, 2018). Relevance can be highly subjective, so user feedback and usability testing are among the most vital measures for evaluating information retrieval systems. A steady stream of user feedback is a boon to development, as it allows developers to identify and address specific user needs and preferences.

Technically speaking, recall and precision are the two most common and fundamental metrics for assessing the effectiveness of information retrieval (Manning et al., 2008). Precision is the number of relevant items retrieved divided by the total number of items retrieved. Recall is the number of relevant items retrieved divided by the total number of relevant items in the collection. Together, precision and recall provide a quantitative basis for understanding how well a system is meeting user needs and allow designers to tune the balance between retrieving relevant information and minimizing irrelevant results.
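As a worked example, both metrics can be computed from two sets: the items a search retrieved and the items that are actually relevant. Precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The document IDs and the function name below are hypothetical, and the sketch assumes that relevance judgments are already available.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 items retrieved, 5 items relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5", "d6", "d7"}

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.4
```

Here half of what the search returned was useful (precision 0.5), but the search found only two of the five relevant items (recall 0.4). Broadening the query would typically raise recall at the cost of precision.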

Competency Development

My journey through various roles in academic and public libraries has provided me with a diverse set of skills and experiences. During my time as a student working in the digital services department of an academic library, one of my projects was to check the links in records that connected to documents hosted at other institutions. As a research services assistant at an academic library, I also took part in a large weeding project that required searching various government document databases for availability. In my current position at a public library as an Information Services Assistant, I instruct patrons on how to use the various subscription databases, both in one-on-one reference consultations and within library programs.

The SJSU coursework that developed my skills in information retrieval design, querying, and evaluation began with INFO 202 Information Retrieval System Design, which covered all aspects of Competency E. INFO 210 Reference and Information Services helped to solidify both my querying skills, through weekly search exercises, and my skills in evaluating information retrieval systems for use in reference. INFO 246 MySQL in Depth gave me hands-on experience designing a database that underlies a portion of an integrated library system. Assignments in INFO 220 Data Services Librarianship required evaluating databases, repositories, and data portals for the accessibility of datasets.

Evidence

Evidence 1

INFO 202 - Chip Alternatives Database Prototype - Group Project 1

This database of snacks that are alternatives to chips consists of seven fields, with list-string, multiselect dropdown, text with autocomplete, and checkbox data types. The final product included a search page, an index page for entering new objects, and a user-friendly interface for easy navigation. My role in the project was that of techie. My main contribution was building, editing, and managing the database in Caspio, including the datatable and the submission and search forms. In addition to tech duties, I wrote the rules for the type of oil fields as well as the gluten-free field, and I contributed to drafting the statement of purpose, ensuring clarity and consistency in the project's objectives. This project highlights the principles of effective database design, focusing on usability and functionality, making it a strong representation of the competency.

Evidence 2

INFO 246 - MySQL Database

This relational database written in MySQL is a proof of concept for record sharing among a consortium of two libraries, one museum, and one archive. It features fictional data related to H.P. Lovecraft’s Cthulhu mythos with actual historic books referenced in his works. The system enables the four locations in Essex County to share records, allowing stakeholders to track the current location of items, the number of copies available, works by specific authors, and the availability of items for viewing or reservation. The design of the database prioritizes efficiency and scalability. The schema consists of three tables: item, catalog, and location. Also included is a document of example SQL queries to run against the database. The link leads to a GitHub repository that I used to store all aspects of the project in one place.

This relational database project qualifies as evidence for Competency E by showcasing the design and querying of an information retrieval system through a MySQL relational database. The database features a well-structured schema demonstrating effective database design principles such as efficiency in decomposition and scalability. It facilitates record sharing among a consortium of libraries, a feature of many information retrieval systems. The inclusion of example SQL queries illustrates querying techniques such as Boolean operators.
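For readers unfamiliar with this kind of design, the sketch below shows how a three-table layout with item, catalog, and location tables might support a Boolean query. It is not the project's actual schema or data: every column name and row is a hypothetical placeholder, and it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns; only the three table names come from the project description.
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL            -- library, museum, or archive
);
CREATE TABLE catalog (
    catalog_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT
);
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    catalog_id INTEGER NOT NULL REFERENCES catalog(catalog_id),
    location_id INTEGER NOT NULL REFERENCES location(location_id),
    available INTEGER NOT NULL    -- 1 = available, 0 = not available
);
-- Placeholder rows, for illustration only.
INSERT INTO location VALUES (1, 'Library One', 'library'), (2, 'Museum One', 'museum');
INSERT INTO catalog VALUES (1, 'Title A', 'Author X'), (2, 'Title B', 'Author Y');
INSERT INTO item VALUES (1, 1, 1, 1), (2, 1, 2, 1), (3, 2, 1, 0);
""")

# Boolean operators in SQL: available copies by one author, excluding museums.
rows = conn.execute("""
    SELECT c.title, l.name
    FROM item AS i
    JOIN catalog AS c ON c.catalog_id = i.catalog_id
    JOIN location AS l ON l.location_id = i.location_id
    WHERE c.author = 'Author X' AND i.available = 1 AND NOT l.kind = 'museum'
""").fetchall()
print(rows)  # [('Title A', 'Library One')]

# Number of copies of one catalog record across all locations.
copies = conn.execute("SELECT COUNT(*) FROM item WHERE catalog_id = 1").fetchone()[0]
print(copies)  # 2
```

Separating the bibliographic record (catalog) from the physical copy (item) is what lets several institutions share one description while tracking their own holdings.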

Evidence 3

INFO 210 - Sharing Post - Search Bangs

In this discussion post, I shared a web searching technique called "bangs," which I’ve been exploring on DuckDuckGo. Bangs allow users to redirect their search queries directly to specific websites by placing a bang before the search term, such as using !w Canada Goose to go straight to Wikipedia. I highlighted that DuckDuckGo offers over 13,563 bangs and provided a link for others to explore them. I also noted that while many bangs work on Brave and You.com, they do not function on Google or Bing. To encourage further exploration, I mentioned that I would share a list of library and reference-related bangs for the community's benefit.

This post satisfies the querying portion of Competency E by demonstrating a practical application of advanced search techniques that enhance information retrieval efficiency. By introducing the concept of bangs, I illustrated how users can design their search queries to directly access relevant information from specific websites, thereby optimizing the retrieval process. This aligns with the competency's focus on effective querying, as it emphasizes the importance of crafting precise search strategies to improve the relevance and speed of information access.

Evidence 4

INFO 220 - Data Seeking & Access Assignment \ INFO 220 - Data Reference Worksheet

For the Data Seeking & Access Assignment and the Data Reference Worksheet that went with it, I had to look for datasets on the research topic of "grey literature" in different databases, repositories, and portals and rate them. The information retrieval systems searched and assessed include data.gov, re3data, and EDGAR, as well as the Data Archiving and Networking Service (DANS). The systems were evaluated first on relevance to the research topic, then on data availability. After discovering two useful datasets, I listed the attributes of each and detailed my search process.

The Data Seeking & Access Assignment demonstrates Competency E through the skills involved in systematically searching and assessing several databases. The evaluation process reflected a critical approach to assessing information retrieval systems against relevant criteria. By identifying and detailing the attributes of two useful datasets, the project illustrates the principles of database evaluation and the importance of thorough search processes, as well as the identification of object attributes.

Conclusion

The knowledge and skills I have gained through my exploration of information retrieval systems will shape my future as an information professional. Understanding the intricacies of system design, querying techniques, and evaluation metrics has equipped me with the tools necessary to create and refine effective information retrieval solutions. As I move forward in my career, I am committed to applying these principles to enhance user experiences and ensure that information is not only accessible but also relevant and precise. Database design, querying, and evaluation are of significant interest to me professionally. I hope to contribute meaningfully to the development of systems that meet the diverse needs of users in an increasingly complex information landscape. I will empower library patrons to navigate and use information retrieval systems effectively in order to cultivate a more informed and literate society.

References

Chowdhury, G. G. (2010). Introduction to Modern Information Retrieval (3rd ed.). ALA Neal-Schuman.

Library of Congress. (2007). Library of Congress: Pre- vs. post-coordination and related issues. https://www.loc.gov/catdir/cpso/pre_vs_post.pdf

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.

Tucker, V. M. (2024, August). Design concepts in information retrieval: Creating user-centered systems, search engines, and sites. Free PDF download from the INFO 202 course blog: https://ischoolblogs.sjsu.edu/202/textbook (open access).

Weedman, J. (2017). Design science in the information sciences. In J. D. McDonald & M. Levine-Clark (Eds.), Encyclopedia of library and information sciences (4th ed., pp. 1242-1255).

Weedman, J. (2018). Information retrieval: Designing, querying, and evaluating information systems. In K. Haycock & M.-J. Romaniuk (Eds.), The Portable MLIS: Insights from the Experts (2nd ed., pp. 171-185). Libraries Unlimited.