1.1 Introduction: A thesaurus is a powerful tool for information organization and retrieval: a structured representation of the keywords used in a particular subject area. It guides both indexers and end users toward the appropriate terms, which improves the precision and relevance of search results. Thesauri are designed for indexing and retrieving documents within specific subject domains, and they exist for areas as diverse as education resources, economic resources, social science, and law. By arranging terms and their relationships in a structured way, a thesaurus helps users retrieve information efficiently. As Aitchison et al. (2000) emphasize, the primary role of a thesaurus is information retrieval, a function of central importance to librarianship because it serves the information needs of library patrons.
Composition of Thesaurus:
A thesaurus provides several kinds of information to indexers and end users. It contains the terms used to describe the concepts of a subject area, together with the semantic relationships among those terms. Its key components are preferred terms, or descriptors, which are selected for indexing; non-preferred terms, or non-descriptors, which direct users to the preferred terms; related terms (RT), which share a conceptual association; narrower terms (NT), which represent more specific concepts; and broader terms (BT), which represent more general ones. Thesauri also use “USE” and “Used For” (UF) entries to express the equivalence relationship between preferred and non-preferred terms. Using a thesaurus effectively to index documents and assign keywords requires a clear understanding of these relationships, so each of them is defined below.
1. Preferred Terms/Descriptor:
Preferred terms, also known as descriptors, are fundamental to the construction of a thesaurus. They are the standardized terms that indexers select when assigning keywords to bibliographic records, and they form the core of the controlled vocabulary. An indexer begins with a preferred term and uses it as the reference point for locating its related terms and narrower terms. Anchoring every indexing decision to preferred terms keeps the resulting index consistent and precise.
2. Non-preferred Terms/Non-Descriptor:
Non-preferred terms, also referred to as non-descriptors, are an essential component of a thesaurus. They are synonyms or quasi-synonyms of a descriptor that are not assigned as index terms to a bibliographic record. Instead, they serve as cross-references that direct users to the preferred term, or descriptor, to be used in their place. An indexer or searcher who starts from a non-preferred term is thus led to the correct descriptor in the controlled vocabulary. In this way the thesaurus keeps indexing consistent and helps users find relevant information by following the link to the preferred term.
3. Related Terms:
Related terms (RTs) are, alongside preferred and non-preferred terms, a standard element of a thesaurus. For a given descriptor, they list other terms that are semantically associated with it. These related terms serve as additional access points, allowing users to explore associated concepts and find relevant information beyond the specific preferred term. The resulting network of interconnected terms improves the search experience and supports comprehensive and accurate retrieval of information.
4. Semantic Relations:
Semantic relations are fundamental elements of a thesaurus structure, providing a systematic and rigorous framework for organizing and connecting terms within a controlled vocabulary. These relationships are designed to facilitate effective information retrieval and aid users in identifying the most suitable search terms. Thesauri typically encompass three main types of semantic relations:
i. Equivalence: Equivalence relationships connect preferred terms (descriptors) with their corresponding non-preferred terms (non-descriptors). Non-preferred terms are synonyms or quasi-synonyms of the preferred term, but they are not used as index terms. Instead, they serve as pointers that guide users to the preferred term for indexing and retrieval.
ii. Hierarchy: Hierarchy relationships represent the relationships of superordination (Broader Term – BT) and subordination (Narrower Term – NT) between terms. BT refers to broader terms that represent higher-level concepts or categories, while NT refers to narrower terms that represent specific subcategories or instances of the broader term. Hierarchy relationships help users navigate through a hierarchical structure of terms, providing a broader or narrower context for a specific concept.
iii. Association: Association relationships link terms that are conceptually related but not hierarchically connected. Unlike synonyms, these related terms (RTs) are not interchangeable: each remains a separate descriptor, but a user interested in one may also want to consider the other. Association relationships let users explore concepts that are semantically connected to the preferred term, widening their search to relevant information beyond the immediate hierarchy.
By incorporating these semantic relationships, a thesaurus facilitates precise and efficient information retrieval by guiding users to the most appropriate search terms and presenting them in a coherent and interconnected manner.
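The three relation types can be modelled as a small data structure. The following is a minimal sketch, not a standard schema: the class and method names (`Descriptor`, `Thesaurus`, `add_equivalence`, `add_hierarchy`, `add_association`) are hypothetical, and the sample terms are illustrative. Equivalence is stored as a USE lookup plus a UF set, hierarchy as reciprocal BT/NT sets, and association as RT links that are symmetrical by default but can be made one-way, following the distinction drawn under Establishing Relations.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """A preferred term together with its relations (illustrative model)."""
    label: str
    scope_note: str = ""
    used_for: set = field(default_factory=set)   # equivalence: non-preferred terms (UF)
    broader: set = field(default_factory=set)    # hierarchy: broader terms (BT)
    narrower: set = field(default_factory=set)   # hierarchy: narrower terms (NT)
    related: set = field(default_factory=set)    # association: related terms (RT)


class Thesaurus:
    def __init__(self):
        self.descriptors = {}  # label -> Descriptor
        self.use = {}          # non-preferred term -> preferred term (USE)

    def add(self, label, scope_note=""):
        """Return the descriptor for label, creating it if needed."""
        d = self.descriptors.setdefault(label, Descriptor(label))
        if scope_note:
            d.scope_note = scope_note
        return d

    def add_equivalence(self, preferred, non_preferred):
        # USE on the non-preferred side, UF on the preferred side
        self.add(preferred).used_for.add(non_preferred)
        self.use[non_preferred] = preferred

    def add_hierarchy(self, broader, narrower):
        # every NT link is mirrored by a BT link
        self.add(broader).narrower.add(narrower)
        self.add(narrower).broader.add(broader)

    def add_association(self, a, b, symmetric=True):
        # RT from a to b; mirrored from b to a unless the association is one-way
        self.add(a).related.add(b)
        if symmetric:
            self.add(b).related.add(a)
        else:
            self.add(b)


t = Thesaurus()
t.add_equivalence("Gender Discrimination", "Sex Discrimination")
t.add_hierarchy("Discrimination", "Gender Discrimination")      # illustrative BT
t.add_association("Gold", "Money")                               # symmetrical
t.add_association("Population Control", "Family Planning", symmetric=False)  # one-way
```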
5. Meaning of USE and Used For (UF) in a Thesaurus:
In a thesaurus, “USE” and “Used For (UF)” are reference pointers used to link non-preferred terms (non-descriptors) to their corresponding preferred terms (descriptors). The purpose of these references is to guide users to the appropriate and most preferred term when searching for information.
When a non-preferred term is linked to its preferred term by a “USE” reference, it means that the non-preferred term should not be used for indexing or searching. Instead, users should “USE” the preferred term when referring to the concept or subject. The “USE” reference indicates that the non-preferred term is synonymous with the preferred term but is not the preferred term for indexing purposes.
Conversely, a “Used For (UF)” reference under a preferred term means that the preferred term is the one to use for indexing and searching. The UF reference lists the non-preferred term(s) that the descriptor replaces, indicating that they are synonymous with the preferred term but should not be used as index terms.
For example:
Preferred Term: Gender Discrimination
Non-Preferred Term: Sex Discrimination
In this example, “Gender Discrimination” is the preferred term, and “Sex Discrimination” is the non-preferred term. The “USE” reference will appear under “Sex Discrimination,” guiding users to “USE” the term “Gender Discrimination” instead when referring to the concept. Conversely, the “Used For (UF)” reference will appear under “Gender Discrimination,” pointing users to the synonymous non-preferred term “Sex Discrimination” but indicating that “Gender Discrimination” is the appropriate index term to be used for retrieval purposes.
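Because every USE reference has a reciprocal UF reference, both can be generated from a single list of equivalences. The sketch below is a minimal illustration with a hypothetical function name, `build_references`. It also rejects input in which a term is both preferred and non-preferred, or in which one non-preferred term points to two different descriptors; the second check is an assumption of this sketch, since some thesauri allow a non-preferred term to point to a combination of descriptors.

```python
def build_references(equivalences):
    """equivalences maps each preferred term to a list of its non-preferred terms.

    Returns a dict from each term to its reference lines:
    UF lines under descriptors, a USE line under each non-preferred term.
    """
    entries = {}
    for preferred, non_preferred_terms in equivalences.items():
        entries[preferred] = [f"UF {term}" for term in sorted(non_preferred_terms)]
        for term in non_preferred_terms:
            if term in equivalences:
                raise ValueError(f"{term!r} cannot be both preferred and non-preferred")
            if term in entries and entries[term] != [f"USE {preferred}"]:
                raise ValueError(f"{term!r} points to more than one descriptor")
            entries[term] = [f"USE {preferred}"]
    return entries


refs = build_references({"Gender Discrimination": ["Sex Discrimination"]})
for term in sorted(refs):
    print(term, "|", "; ".join(refs[term]))
# Gender Discrimination | UF Sex Discrimination
# Sex Discrimination | USE Gender Discrimination
```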
6. Scope Notes:
Scope notes in a thesaurus play a crucial role in providing additional context and clarification about the meaning and usage of a descriptor (preferred term). They serve as guidance for indexers who may not be subject experts, helping them to understand the specific scope and context of a term. A well-written scope note enhances the quality of the thesaurus and subsequently improves the indexing process.
The purpose of scope notes is to avoid ambiguity and ensure that indexers select the most appropriate and accurate descriptors while indexing documents. They provide valuable information about the scope, limitations, and specific contexts in which a term should be used. This helps indexers avoid potential confusion and make informed decisions when assigning index terms to bibliographic records.
For indexers who may not have in-depth knowledge of a subject area, scope notes act as a valuable aid in understanding the nuances of various descriptors. They prevent misinterpretation and misrepresentation of concepts by offering clear explanations and context. As a result, scope notes contribute to the consistency and precision of the indexing process.
Scope notes can also improve the indexing skills of professionals by giving them insight into domain-specific terminology and a better understanding of the subject matter. This leads to more accurate and comprehensive indexing, which improves the overall quality of the index and makes it more useful to end users searching for relevant information. In summary, scope notes are an indispensable element of a thesaurus: they help indexers make well-informed decisions and make the indexing process more efficient and effective.
How to Build a Thesaurus
A thesaurus that only lists all the preferred and non-preferred terms is known as an enumerative thesaurus. Building a thesaurus involves collecting terms, modifying them, deciding which terms become descriptors and which become non-descriptors, establishing semantic relations, and writing scope notes that define concepts. The steps involved are described below.
1. Collecting Terms:
The initial step in constructing a thesaurus is the process of collecting a comprehensive set of terms relevant to the subject domain. These terms will form the foundation of the controlled vocabulary within the thesaurus. The sources from which these terms are identified should be carefully determined. These sources may include existing thesauri, indexes, dictionaries, glossaries, and other specialized resources. Alternatively, terms can be extracted from textual metadata such as titles, abstracts, full-text documents, and other relevant content. Additionally, engaging in discussions with subject experts can be beneficial in identifying specific terms related to the domain.
During the term collection phase, some of the gathered terms will be categorized as preferred terms, while others will be assigned as non-preferred terms. Preferred terms serve as the primary entry points for indexing documents, guiding users to the most relevant subject headings. Non-preferred terms, on the other hand, act as synonymous terms that direct users to the corresponding preferred terms, ensuring a smooth and efficient search process.
It is essential to consider the nature of the terms to be included in the thesaurus. Typically, terms should be nouns or noun phrases, as they best describe the subjects or concepts of interest. Proper nouns are generally excluded from the thesaurus, as they often refer to specific entities or individuals rather than general concepts. By carefully selecting and collecting terms from appropriate sources, a comprehensive and user-friendly thesaurus can be developed, facilitating effective information retrieval within the subject domain.
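Extracting candidate terms from titles can be partly automated. The sketch below is a crude heuristic, not the method used in this study: it counts word sequences of one to three words in a list of titles, skips sequences that begin or end with a stop word, and keeps those that occur at least `min_freq` times. It does not identify nouns or proper nouns (that would need a part-of-speech tagger), so the output is only a candidate list for the thesaurus builder and subject experts to review. The function name, stop-word list, and sample titles are hypothetical.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "by"}


def candidate_terms(titles, max_len=3, min_freq=2):
    """Count 1- to max_len-word sequences in titles that do not start or
    end with a stop word; return (phrase, count) pairs seen >= min_freq times."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_freq]


titles = [
    "Reservation policy and social justice in India",
    "Social justice and the reservation policy",
    "Women and social justice",
]
for phrase, count in candidate_terms(titles):
    print(count, phrase)
```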
2. Modification of Terms as Per the Local Requirements:
During the process of building a thesaurus, the collected terms may include nouns, noun phrases, and adjectives that are relevant to the subject domain. However, it is crucial to consider the specific requirements and preferences of the end users in the intended region or country where the thesaurus will be utilized. Local variations in language, terminology, and spelling can significantly impact the effectiveness of the thesaurus in facilitating information retrieval.
One essential aspect of modification is identifying the terms that end users most commonly search for. This requires user studies or consultation with subject experts to learn which terminology the target audience prefers. For instance, “Reservation Policy” is the widely accepted term in India, whereas in the United States of America (USA) the same concept is more commonly called “Affirmative Action.” Choosing the term that is familiar to the local audience as the descriptor makes search results more relevant and improves the user’s experience.
Furthermore, attention should be given to spelling variations that exist across different countries, particularly in regions where English is the primary language. For instance, the word “Labour” is commonly spelled as “Labor” in the USA. Considering these variations and regional preferences is crucial to ensure that the thesaurus is optimized for users in different geographical locations.
By carefully adapting and modifying terms to align with local requirements, the thesaurus becomes more user-friendly and relevant, enabling users to access information using familiar and preferred terminology. This localization process enhances the usability and effectiveness of the thesaurus in specific contexts and contributes to more efficient and accurate information retrieval.
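Regional variants such as these can be recorded once and turned into equivalence relationships for whichever locale the thesaurus serves: the locally familiar form becomes the descriptor and the other forms become non-preferred terms. The sketch below assumes hypothetical locale codes (`"IN"`, `"US"`) and a hypothetical `localize` function, and uses the two examples from the text.

```python
# Each entry records the form of one concept preferred in each locale.
VARIANTS = [
    {"IN": "Reservation Policy", "US": "Affirmative Action"},
    {"IN": "Labour", "US": "Labor"},
]


def localize(variant_sets, locale):
    """Return {descriptor: [non-preferred terms]} for the given locale."""
    equivalences = {}
    for variants in variant_sets:
        preferred = variants[locale]
        # every other regional form becomes a non-preferred (USE) term
        equivalences[preferred] = sorted(set(variants.values()) - {preferred})
    return equivalences


print(localize(VARIANTS, "IN"))
# {'Reservation Policy': ['Affirmative Action'], 'Labour': ['Labor']}
```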
3. Establishing Relations:
Establishing relationships between terms is a critical aspect of thesaurus construction, and it constitutes the third step in the process. Thesauri typically include three types of semantic relationships: equivalence, hierarchy, and association.
The equivalence relationship is established between a preferred term (descriptor) and its corresponding non-preferred term (non-descriptor). This relationship provides a connection between synonymous terms, where one term is selected as the preferred term for indexing purposes, and the non-preferred term serves as an alternative or synonym. This ensures that users can access relevant information using different synonymous terms.
The hierarchical relationship organizes terms according to their level of generality. A superior (superordinate) term is linked to its subordinate terms, or hyponyms, through the Broader Term (BT) and Narrower Term (NT) relationships. BT points to the broader, more general concept, while NT points to the narrower, more specific concept. This hierarchical arrangement helps users navigate the subject domain and retrieve information at varying levels of specificity.
The associative relationship, as described by Weinberg, represents the overlap or similarity between terms in meaning. This relationship does not involve a hierarchical arrangement but rather identifies concepts that are conceptually related or associated with each other. Associative relationships can be symmetrical, where both terms are related to each other in a similar manner, or asymmetrical, where one term is related to another, but not vice versa. For instance, “gold” and “money” have a symmetrical associative relationship, while “population control” and “family planning” have an asymmetrical association, as someone searching for family planning may not necessarily be interested in population control.
By incorporating these semantic relationships into the thesaurus, users can benefit from a richer and more nuanced information retrieval experience. The relationships between terms enable users to explore related concepts, find synonymous terms, and navigate the subject domain effectively, making the thesaurus a valuable tool for information organization and retrieval.
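Because every NT link implies a reciprocal BT link, and a term can never be broader than itself, hierarchical relationships lend themselves to a mechanical check. The following minimal sketch (hypothetical function name `check_hierarchy`, illustrative terms) derives the BT references from a table of NT references and uses a depth-first search to reject cycles such as A > B > A. A term with more than one broader term is allowed.

```python
def check_hierarchy(narrower):
    """narrower maps each term to its narrower terms (NT).

    Returns the reciprocal broader-term (BT) map and raises ValueError
    if a term is, directly or indirectly, narrower than itself.
    """
    broader = {}
    for parent, children in narrower.items():
        for child in children:
            broader.setdefault(child, []).append(parent)

    IN_PROGRESS, DONE = 1, 2
    state = {}

    def visit(term, path):
        state[term] = IN_PROGRESS
        for child in narrower.get(term, []):
            if state.get(child) == IN_PROGRESS:  # back edge: a cycle
                raise ValueError("hierarchy cycle: " + " > ".join(path + [child]))
            if child not in state:
                visit(child, path + [child])
        state[term] = DONE

    for term in narrower:
        if term not in state:
            visit(term, [term])
    return broader


bt = check_hierarchy({
    "Social Sciences": ["Economics", "Sociology"],
    "Economics": ["Labour Economics"],
})
print(bt)
# {'Economics': ['Social Sciences'], 'Sociology': ['Social Sciences'], 'Labour Economics': ['Economics']}
```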
4. Thesaurus Display Format:
The final step in the construction of a thesaurus involves determining the display format, which is essential for effective utilization and user-friendliness. There are two primary display formats commonly used in thesauri: the alphabetic sequence and the classified sequence.
In an alphabetic thesaurus, terms are organized in a single alphabetical order. Each preferred term (descriptor) is listed in alphabetical order, and its associated terms, such as non-preferred terms (non-descriptors) and related terms (RTs), are displayed beneath it. This format allows users to easily locate specific terms and their corresponding relationships, making it convenient for both indexers and end users.
On the other hand, the classified thesaurus arranges all terms related to the same concept together, forming a facet. A facet represents a specific aspect of the subject domain and contains various terms related to that aspect. This arrangement provides a comprehensive and holistic view of the subject area, as all the relationships and hierarchies are displayed together. Classified thesauri are particularly useful for exploring interconnected concepts and gaining a deeper understanding of the subject matter.
According to Soergel (1974), the relationships within an alphabetic thesaurus should follow a fixed sequence for consistency and ease of use. The entry starts with the descriptor, followed by the Scope Note (SN), which gives a brief definition or explanation of the term. Next come the Broader Term (BT) and Narrower Term (NT) relationships, indicating broader and narrower concepts respectively. Finally, the Related Term (RT) relationships list the terms that are conceptually associated with the descriptor.
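The entry sequence described above can be sketched as a small formatter. This is a minimal illustration, not Soergel’s own specification: the function name `alphabetic_display` and the record layout are hypothetical, the scope note and related term are invented for the example, and placing UF lines immediately after the scope note is an assumption, since the sequence cited covers only SN, BT, NT and RT. Non-preferred terms are interfiled as their own entries with a USE reference, as in the Gender Discrimination example earlier.

```python
def alphabetic_display(entries):
    """entries maps each descriptor to a dict with optional keys
    "SN" (str) and "UF", "BT", "NT", "RT" (lists of terms)."""
    blocks = []
    for descriptor, record in entries.items():
        block = [descriptor]
        if record.get("SN"):
            block.append(f"  SN {record['SN']}")
        for tag in ("UF", "BT", "NT", "RT"):  # UF placement is this sketch's assumption
            for term in sorted(record.get(tag, []), key=str.lower):
                block.append(f"  {tag} {term}")
        blocks.append((descriptor, block))
        # each non-preferred term also gets its own entry pointing to the descriptor
        for term in record.get("UF", []):
            blocks.append((term, [term, f"  USE {descriptor}"]))
    blocks.sort(key=lambda b: b[0].lower())  # single alphabetical sequence
    return "\n".join(line for _, block in blocks for line in block)


entries = {
    "Gender Discrimination": {
        "SN": "Unequal treatment of persons on the basis of gender.",  # illustrative
        "UF": ["Sex Discrimination"],
        "BT": ["Discrimination"],
        "RT": ["Equal Opportunity"],  # illustrative
    },
}
print(alphabetic_display(entries))
```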
The choice between the alphabetic and classified display formats depends on the specific needs of the users and the subject domain covered by the thesaurus. Both formats have their advantages and cater to different information retrieval requirements. Ultimately, the display format plays a crucial role in presenting the organized information in a user-friendly manner, enhancing the thesaurus’s effectiveness as a valuable tool for information organization and retrieval.
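A classified display can be sketched in a similar way: terms are grouped under facets, and each narrower term is indented beneath its broader term. In this minimal sketch the function name `classified_display`, the facet names, and the terms are hypothetical, and the hierarchy is assumed to be free of cycles.

```python
def classified_display(facets, narrower):
    """facets maps a facet name to its top terms; narrower maps a term to its NTs.
    Each NT is indented one level beneath its broader term. Assumes no cycles."""
    lines = []

    def walk(term, depth):
        lines.append("  " * depth + term)
        for child in sorted(narrower.get(term, [])):
            walk(child, depth + 1)

    for facet in sorted(facets):
        lines.append(f"[{facet}]")
        for top in sorted(facets[facet]):
            walk(top, 1)
    return "\n".join(lines)


FACETS = {"Disciplines": ["Economics"], "Social Issues": ["Discrimination"]}
NARROWER = {
    "Economics": ["Labour Economics"],
    "Discrimination": ["Gender Discrimination", "Age Discrimination"],
}
print(classified_display(FACETS, NARROWER))
```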
Role of Thesaurus in Indexing
The role of a thesaurus in indexing is paramount to ensure effective and accurate information retrieval. In the indexing process, the primary goal is to represent the concepts contained within a document using appropriate and standardized terms. Thesauri, along with classification systems and subject headings, are widely recognized and accepted as essential indexing tools.
One of the key functions of a thesaurus is to help the indexer understand the subject area as a whole. By providing a structured, controlled vocabulary, the thesaurus guides indexers in selecting the most relevant preferred terms to describe the content of a document. This keeps the indexing process consistent and coherent.
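In practice, this guidance amounts to mapping the words an indexer starts from onto the controlled vocabulary. The minimal sketch below, with a hypothetical function name `assign_descriptors` and sample data drawn from the earlier examples, accepts a keyword if it is a descriptor, follows the USE reference if it is a non-preferred term, and sets aside anything not in the thesaurus for the indexer to review or to propose as a new term. Case-insensitive matching is an assumption of the sketch.

```python
def assign_descriptors(keywords, descriptors, use):
    """Map free-text keywords onto the controlled vocabulary.

    keywords: terms the indexer proposes; descriptors: set of preferred terms;
    use: non-preferred term -> preferred term. Returns (assigned, unmatched).
    """
    by_key = {d.lower(): d for d in descriptors}
    use_by_key = {k.lower(): v for k, v in use.items()}
    assigned, unmatched = [], []
    for keyword in keywords:
        key = keyword.strip().lower()
        term = by_key.get(key) or use_by_key.get(key)  # descriptor, else follow USE
        if term is None:
            unmatched.append(keyword)  # not in the thesaurus: review or propose as new term
        elif term not in assigned:
            assigned.append(term)      # each descriptor is assigned once
    return assigned, unmatched


assigned, unmatched = assign_descriptors(
    ["sex discrimination", "Labor", "Gender Discrimination", "migration"],
    descriptors={"Gender Discrimination", "Labour"},
    use={"Sex Discrimination": "Gender Discrimination", "Labor": "Labour"},
)
print(assigned)   # ['Gender Discrimination', 'Labour']
print(unmatched)  # ['migration']
```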
Moreover, a thesaurus enables the indexer to outline the inter-relationships between various concepts within the subject domain. The hierarchical relationships, such as broader terms (BT) and narrower terms (NT), and associative relationships (RT) allow indexers to establish connections between related terms, enhancing the user’s ability to navigate and explore related topics during information retrieval.
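One way a retrieval system can exploit these relationships, shown here as an illustrative sketch rather than a feature of any particular system, is query expansion: a search on a descriptor is widened to include all of its narrower terms and, optionally, its related terms. The function name `expand_query` and the sample hierarchy are hypothetical.

```python
def expand_query(term, narrower, related=None):
    """Return term plus all its narrower terms (followed transitively)
    and, if a related-term map is given, its direct related terms."""
    expanded, seen, stack = [term], {term}, [term]
    while stack:
        current = stack.pop()
        for child in narrower.get(current, []):
            if child not in seen:  # 'seen' also guards against cycles
                seen.add(child)
                expanded.append(child)
                stack.append(child)
    for rt in (related or {}).get(term, []):
        if rt not in seen:
            seen.add(rt)
            expanded.append(rt)
    return expanded


NARROWER = {
    "Social Sciences": ["Economics", "Sociology"],
    "Economics": ["Labour Economics"],
}
print(expand_query("Economics", NARROWER))
# ['Economics', 'Labour Economics']
print(expand_query("Gold", {}, related={"Gold": ["Money"]}))
# ['Gold', 'Money']
```

Following NT links widens a search without leaving the concept, while RT links are optional here because association is a looser relationship and may lower precision.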
Additionally, the thesaurus serves as a valuable resource for providing clear and precise definitions of terms. Scope notes associated with descriptors offer concise explanations of the term’s meaning and usage, aiding the indexer in making informed decisions while assigning appropriate terms to the document.
Indexing a specific collection or database with a thesaurus can significantly improve the quality of information retrieval within its subject domain. Consistent, standardized use of preferred terms makes search results more accurate and ensures that relevant documents are retrieved, so the information needs of users are met more effectively.
In summary, the role of a thesaurus in indexing is vital in supporting the indexer’s comprehension of the subject, establishing meaningful inter-relationships between concepts, and providing clear definitions of terms. By using a thesaurus as a controlled vocabulary tool, information retrieval systems can enhance their efficiency and effectiveness, ultimately benefiting both indexers and end users in accessing relevant and valuable information within their chosen subject areas.
Conclusion: A thesaurus is indispensable for the efficient and accurate retrieval of indexed documents in bibliographic databases, whether print or electronic. Because these databases hold vast numbers of records, effective surrogates are needed to make retrieval comprehensive. Descriptors serve as surrogates for subjects: they encapsulate the thought content of each document and so facilitate the indexing process.
Using a standard thesaurus in the indexing process lets indexers select appropriate and consistent terms, which leads to highly relevant results during searches. Because the terms are consistent and standardized, the retrieved documents are more precise and relevant.
The construction of a thesaurus involves a series of logically sequenced tasks, including collecting terms, modifying them to suit local requirements, establishing semantic relationships, and providing scope notes to define concepts accurately. These steps are crucial in creating a comprehensive and effective thesaurus.
In the context of the present research study on Indian social science literature, all the essential steps and considerations have been diligently followed in constructing the thesaurus. As a result, the thesaurus is well-equipped to assist indexers in accurately representing the subject content of documents, ultimately improving the efficiency and effectiveness of information retrieval within the domain of Indian social science literature.
For citing this article use:
- Pandya, M. Y. (2016). Thesaurus development for Indian social science literature on relational database management system and its integration with OII. Retrieved from: http://hdl.handle.net/10603/123508