1. Introduction: All library work is a matter of storage and retrieval of information, and cataloguing and indexing are specially performed to achieve that. Subject cataloguing is intended to embrace only that activity which provides a verbal subject approach to materials added to library collections. Subject indexing is used in information retrieval especially to create index records to retrieve documents on a particular subject. Descriptive cataloguing makes it possible to retrieve the materials in a library by title, author, etc. – in short, all the searchable elements of a cataloguing record except the subjects.
Until the second half of the nineteenth century, descriptive cataloguing was the basic library cataloguing practice that was found necessary. Libraries were much smaller than they are today, and scholarly librarians then were able, with the aid of printed bibliographies, to be familiar with everything available on a given subject and guide the users to it. With the rapid growth of knowledge in many fields during the nineteenth century and the resulting increase in the volume of books and periodicals, it became desirable to do a preliminary subject analysis of such works and then represent them in the catalogue or in printed indexes in such a way that they could be retrieved by subject. Subject cataloguing deals with what a book or other library item is about, and its purpose is to list, under one uniform word or phrase, all the materials on a given topic that a library has in its collection. A subject heading is a uniform word or phrase used in the library catalogue to express a topic. The use of authorized words or phrases only, with cross-references from unauthorized synonyms, is the essence of bibliographic control in subject cataloguing. In the literature of LIS, the phrases subject cataloguing and subject indexing are used more or less interchangeably. In this context, it should be noted that it was Charles Ammi Cutter who first gave a generalized set of rules for subject indexing in his Rules for a Dictionary Catalogue (RDC) published in 1876. But he never used the term ‘indexing’; rather he used the term ‘cataloguing’. In this course material, the phrase subject indexing includes subject cataloguing also. Where the literature does differentiate the two, subject cataloguing is taken to embrace only that cataloguing activity which provides a verbal subject approach to library collections, especially macro documents (i.e., books). It refers to the determination and assignment of suitable entries for use in the subject component of a library’s catalogue.
The primary purpose of the subject catalogue is to show which books on a specific subject are possessed by the library. Subject indexing refers to that indexing activity which provides a verbal subject approach to micro documents (e.g., journal articles, research reports, patent literature, etc.). Subject indexing provides a subject entry for every topic associated with the content of a micro document.
The representation of documents and the knowledge expressed by them is one of the central and unique areas of study within Library and Information Science (LIS) and is commonly referred to as indexing. Subject approach to information has been a long and extensive concern of librarianship and is assumed to be the major approach (access method) of users for a very long period. Indexing has traditionally been one of the most important research topics in information science. Indexes facilitate retrieval of information in both traditional manual systems and newer computerized systems. Without proper indexing and indexes, search and retrieval are virtually impossible.
2. Subject Indexing: Origin and Development:
The origin and development of subject indexing are intimately related to the historical development of libraries through ancient and medieval periods to modern days. The libraries of the ancient world used to arrange documents under some subjects. The catalogue, which worked as an index to this store, was predominantly a systematic subject listing according to a scheme of subject headings. The arrangement more or less conformed to the arrangement of documents in the store.
The specific usage of the term index goes back to ancient Rome. There, when used in relation to literary works, the term index was used for the little slip attached to papyrus scrolls on which the title of the work (and sometimes also the name of the author) was written so that each scroll on the shelves could be easily identified without having to pull it out for inspection. From this developed the usage of index for the title of a book. In the first century A.D., the meaning of the word was extended from “title” to a table of contents or a list of chapters (sometimes with a brief abstract of their contents) and hence to a bibliographical list or catalogue. Only the invention of printing around 1450 made it possible to produce identical copies of books in large numbers, so that soon afterward the first indexes began to be compiled, especially those to books of reference. By the end of the 13th century, alphabetization by names of authors under the systematic subject arrangement was well known. The index to the store, or the shelf-list, used to be supplemented with an author index to satisfy the author approach of the users of the store. Index entries were not always alphabetized by considering every letter in a word from beginning to end. Most early indexes were arranged only by the first letter of the first word, the rest being left in no particular order at all. Gradually, alphabetization advanced to an arrangement by the first syllable, that is, the first two or three letters, the rest of the entry still being left unordered.
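The three stages of alphabetization described above can be compared in a small sketch. The sample headings are hypothetical, and treating the "first syllable" as a fixed two-letter prefix is a simplifying assumption made only for illustration.

```python
# Hypothetical headings, not taken from any historical index.
headings = ["canon", "abbot", "cable", "altar", "cabal", "abbey"]


def order_by_prefix(entries, n):
    """Sort on the first n letters only.

    Python's sort is stable, so entries sharing a prefix keep their
    original (unordered) sequence, as in the early indexes.
    """
    return sorted(entries, key=lambda entry: entry[:n])


first_letter = order_by_prefix(headings, 1)    # earliest practice
first_syllable = order_by_prefix(headings, 2)  # "first syllable", approximated as 2 letters
full = sorted(headings)                        # full letter-by-letter alphabetization

print(first_letter)
print(first_syllable)
print(full)
```

Only the last arrangement lets a reader find a heading without scanning every entry under the same initial letters.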
The 15th century saw the entry of the university libraries, which brought about a qualitative change. Efforts were made to rank subjects and to devise indexing or cataloguing methods for better utilization of documents. Towards the end of the 15th century, the practice of supplementing systematic listing with an alphabetical subject index was introduced. Only very few indexes compiled in the 16th and early 17th centuries had fully alphabetized entries, but by the 18th century, full alphabetization became the rule. Alphabetical indexing gained new momentum as intellectual debates among the scholars required ready reference to scholarly works with the rise of universities. The pressmarks, which were mainly used for storage of documents, started being used in catalogues as a retrieval tool. But the pressmarks could not ensure a flexible hierarchical order of subjects, and hence they were discarded in favor of notation. In the 19th century, subject access to books was provided by means of classification. Books were arranged by subject and their surrogates were correspondingly arranged in a classified catalogue. Only in the late 19th century did alphabetical subject indexing become widespread and more systematic. The classification system was primitive in nature. It could not go deep enough to individualize the subjects of documents. The separate existence of the classed catalogues and indexes stirred up the imagination for the compilation of a catalogue which was very much akin to a dictionary in form. Thus were born the forerunners of our dictionary and classified catalogues.
Preparation of the back-of-the-book index may, historically, be regarded as the father of all indexing techniques. Indexing techniques actually originated from this index. It was of two types: the Specific index, which shows a broad topic in the form of one-idea-one-entry, i.e. the specific context of a specific idea; and the Relative index, which shows various aspects of an idea and its relationship with other ideas. The specific index cannot show such relationships.
The dictionary catalogue brought some relief into the sharp conflict between subjects of documents and the practice of naming them. Charles Ammi Cutter, who was both a classificationist and a theoretician of the library catalogue, observed that the name of the subject assigned to a document did not indicate its specific subject. Rather, it indicated the class to which the subject of the document belonged, for example, assigning the subject ‘plant’ to a document discussing the plant ‘cactus’. The practice was deficient in helping a user who came for information on a specific subject. The root of the conflict lay deep in the classification system as well, since the classification was not coextensive with the subjects of the documents. Hence, whatever was left out in classification became conspicuous by its absence when class names were given to individual entries as subject headings. Cutter, who was an advocate of the dictionary catalogue, wanted to solve the conflict at the cataloguing level. The year 1876 is particularly important for the library profession for the publication of two outstanding books:
(1) A Classification and Subject Index for Cataloguing and Arranging the Books and Pamphlets of a Library, by Melvil Dewey; and
(2) Rules for a Dictionary Catalogue, by Charles Ammi Cutter.
The first sought to solve the problems by organizing the document store and simultaneously providing an alphabetical subject index for easy access to it, while the second, expressing doubts about the efficacy of class headings used as specific subject headings, took a different route by proposing specific methods for naming subjects. While Dewey offered a ready-made list of names (class names in this case), Cutter suggested some methods for building them up in order to name subjects more specifically. Cutter’s rules for specific subject headings for use in a dictionary catalogue seem to have appealed to library professionals. Subsequently, there was a demand for some ‘standard list of subject headings’ which could be used in carrying out the specifications in Cutter’s rules. This paved the way for the publication of a list of subject headings by the American Library Association (ALA), to be used in a dictionary catalogue. The list was later revised and published in two more editions, which ultimately established a pattern for subsequent subject heading lists such as the subject headings used in the dictionary catalogue of the Library of Congress and the Sears List of Subject Headings.
Use of the above standard lists of subject headings raised important questions relating to the use of terminology (whether common or popular terms or scientific and technical terms were to be used), and sequencing of terms in the subject heading (what should be the sequence of terms in case of compound subject headings). But Cutter, as well as compilers of several standard lists of subject headings, failed to provide satisfactory answers to the above-noted questions.
The first quest for a logical approach towards solving the above-noted problems is evident in J. Kaiser’s Systematic Indexing (1911). Kaiser was the first person to propose categorizing terms under two fundamental categories: concretes and processes. He also recommended a citation order for these categories in the index string. Kaiser suggested that many composite subjects could be analyzed into a combination of concepts indicating a ‘concrete’ object and a ‘process’. In such cases, the ‘concrete’ should be given precedence over the ‘process’ in the order of citation of index terms in a compound subject heading. Kaiser did not analyze deeply the various intricacies involved in the naming of subjects. Nevertheless, his work remains unique to date, as he was the first person to suggest certain logical processes for naming subjects in terms of fundamental categories and a citation order of index strings.
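Kaiser's citation order can be illustrated with a minimal sketch. The `Concept` type, the " - " separator and the sample terms are assumptions introduced here for illustration; they are not Kaiser's own notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    term: str
    category: str  # either "concrete" or "process"


def kaiser_heading(concepts):
    """Cite concretes before processes, keeping input order within each category."""
    concretes = [c.term for c in concepts if c.category == "concrete"]
    processes = [c.term for c in concepts if c.category == "process"]
    return " - ".join(concretes + processes)


# The indexer may meet the concepts in any order in the document;
# the heading always leads with the concrete.
heading = kaiser_heading([Concept("Welding", "process"), Concept("Steel", "concrete")])
print(heading)
```

Whatever order the concepts appear in the document, the resulting heading files under the concrete term.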
Dr. S. R. Ranganathan was the first to analyze the universe of subjects in depth and to suggest a complete theory of naming subjects using a subject indexing language. He realized the fallacy of trying to symbolize the extremely flexible and dynamic multidimensional universe of subjects in a linear, rigid notational model. Just as ready-made class numbers cannot be given according to his scheme of classification to all subjects of the past, present, and future, so also subject headings cannot be made available ready-made. He, therefore, enunciated certain rules on the basis of which subject names could be framed. Ranganathan developed a mechanical procedure for doing this and called it the chain procedure. The basic contention of chain procedure is that a multidimensional universe of subjects cannot be fitted into a rigid one-dimensional model and hence, a chain of terms is required to name a subject, where the term indicating the specific subject is stated in a particular context. Chain procedure demonstrated that it is not necessary to depend on the flair of some authorities for the supply of names of subjects. One can very well build up one’s own authority file and use subject names consistently. The names used will be uniform for all libraries following the same scheme of classification. The chain, which is a string of terms, gets organized or arranged following the classification scheme used. The qualities of the classification scheme therefore very much determine the qualities of the subject headings drawn according to chain procedure.
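The idea of naming a subject by a chain of terms, with the specific term stated in its context, can be sketched as follows. This is a much-simplified illustration and not Ranganathan's full procedure: it assumes the chain of links has already been read off the class number, it treats every link as worth an entry, and the sample chain and comma separator are hypothetical.

```python
def chain_entries(chain):
    """Derive headings from a chain ordered from broadest to most specific link.

    Each heading leads with one link and cites the broader links after it
    in reverse order, so the lead term is always shown in its context.
    """
    entries = []
    for length in range(len(chain), 0, -1):
        links = chain[:length]
        entries.append(", ".join(reversed(links)))
    return entries


# Hypothetical chain, assumed to come from a classification scheme.
sample = chain_entries(["Medicine", "Heart", "Disease"])
for entry in sample:
    print(entry)
```

Because the chain is taken from the classification scheme, two libraries using the same scheme would arrive at the same headings, which is the consistency the text attributes to chain procedure.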
J. E. L. Farradane devised a pre-coordinate indexing system known as Relational Indexing in the early 1950s. The basic proposition of Farradane’s Relational Indexing was to identify the relationship between concepts by following the learning process through which we develop our power of discrimination in time and space. Farradane’s Relational Indexing has been the subject of scholarly research but was never implemented. Still, we can say that Farradane’s contributions to the area of subject indexing were: analysis of the relationship between each pair of terms, use of relational operators, and representation of the relationship among terms by relational operators leading to the creation of ‘Analets’. An ‘Analet’ is a pair of terms linked by any of the relational operators developed by Farradane. Each relational operator is denoted by a slash and a special symbol having a unique meaning. For example,
The contribution of E. J. Coates to subject indexing was not original in nature. Coates merely synthesized the ideas of Cutter, Kaiser, Ranganathan, and Farradane. Coates applied his ideas to the British Technology Index (now Current Technology Index), of which he was editor from its inception in 1963 until his retirement in 1976.
Preserved Context Index System (PRECIS), developed by Derek Austin and applied to BNB in 1971 as an alternative to the chain procedure for deriving subject index entries, sought to rectify the problem of co-extensiveness by generating entries with a lead term and the full context of the document. Depending heavily on the computer to generate mechanically all index entries from input strings, PRECIS developed its own code for preparation of input strings by the human indexer and its subsequent processing by computer. Its emphasis has been on generating a printed index for BNB. Though PRECIS was fairly successful in its original mission it does not have the simplicity of chain procedure and considerable skill is required to use it effectively.
Postulate-based Permuted Subject Indexing (POPSI) sought to overcome the shortcomings of chain procedure from an entirely different perspective. It recommended postulates and principles for analyzing the subjects into elementary categories and their subsequent ordering. The postulates are not rigid and hence give flexibility to indexers. As it is essentially distilled out of chain procedure, it has managed to retain most of the helpful features of chain procedure, such as simplicity. Over the years, Bhattacharya, Neelameghan, Devadasan, Gopinath, and others have given a sound theoretical foundation to POPSI in terms of the ‘General Theory of Subject Indexing Languages’ (GT-SIL). The GT-SIL seeks to analyze the deep structure of Subject Indexing Languages in terms of semantic structure, elementary structure and syntactic structure of subject propositions. In essence, GT-SIL is a logical abstraction of the structures of outstanding subject indexing languages such as those of Cutter, Dewey, Kaiser, and Ranganathan.
It is evident from the above discussion that the research on the development and use of various subject indexing systems was devoted to techniques of constructing pre-coordinate subject headings. A greater part of the work on pre-coordinate subject indexing systems was devoted to syntactical rules of indexing. A rigid significance order may not meet the approaches of all users of the index file, though this problem is partly addressed by rotating terms or by a multiple entry system. Even a multiple entry system, however, covers only a fraction of the total number of possible permutations. Thus, a large portion of probable approaches or access points is left uncovered. This gap widens rapidly with every increase in the number of terms in a subject heading due to the demand for more specific subject headings. The index file may fail to provide a particular combination which the user is looking for. It may also provide a combination which proves too broad for a particular search. The above considerations and difficulties stemming from the pre-coordination of terms led to the development of post-coordinate indexing, or simply coordinate indexing, systems like Uniterm, Optical Coincidence Card / Peek-a-boo, Edge-Notched Card, etc. during the 1960s.
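The widening gap between the entries a rotated index provides and the orderings a user might try can be made concrete with a little arithmetic. The sketch below assumes simple cyclic rotation, which yields one entry per term, against the n! possible orderings of n terms; other multiple entry schemes would give different counts.

```python
from math import factorial


def rotations(terms):
    """Return every cyclic rotation of a heading: one entry per lead term."""
    return [terms[i:] + terms[:i] for i in range(len(terms))]


def coverage(n):
    """Fraction of all n! term orderings that n rotated entries provide."""
    return n / factorial(n)


print(rotations(["steel", "welding", "testing"]))
for n in range(2, 7):
    print(f"{n} terms: {n} rotated entries of {factorial(n)} permutations ({coverage(n):.1%})")
```

Post-coordinate systems avoid the problem by storing the terms separately and leaving their combination to the moment of search.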
Computers began to be used to aid information retrieval in the 1950s. The Central Intelligence Agency (CIA) of the USA is said to have been the first organization to use a machine-produced keywords-from-title index, from 1952. H. P. Luhn and his associates produced and distributed copies of machine-produced permuted title indexes at the International Conference on Scientific Information held in Washington in 1958; Luhn named this the Keyword-In-Context (KWIC) index and reported the method of its generation in a paper. The American Chemical Society established the value of KWIC by adopting it in 1961 for its publication ‘Chemical Titles’. A number of varieties of keyword index are evident in the literature. They differ only in their formats; the indexing techniques remain more or less the same.
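The principle of a permuted title index can be shown in a few lines. This is a minimal sketch and not Luhn's original program: the stop-word list, the sample titles, the column width and the function names are all assumptions chosen for illustration.

```python
# Assumed stop-word list; real KWIC systems used much longer lists.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_entries(titles):
    """Return (keyword, left context, keyword plus right context) sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue  # insignificant words never become access points
            entries.append((word.lower(), " ".join(words[:i]), " ".join(words[i:])))
    return sorted(entries, key=lambda entry: entry[0])


def format_kwic(entries, width=25):
    """Align every keyword in the same column, truncating context to `width`."""
    return [f"{left[-width:]:>{width}}  {right[:width]}" for _, left, right in entries]


titles = ["Indexing of Chemical Titles", "Subject Indexing in Libraries"]
for line in format_kwic(kwic_entries(titles)):
    print(line)
```

Every significant word of a title becomes an access point while the rest of the title stays visible as context, which is why the variants of keyword indexing differ mainly in layout.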
The publication of the Science Citation Index (SCI) by Eugene Garfield of the Institute for Scientific Information (ISI), Philadelphia, in 1963 provided a new approach to bibliographic file organization. The online version of the SCI, known as SCISEARCH, was launched in 1974. ISI also brought out the Social Science Citation Index (SSCI) and the Arts and Humanities Citation Index (A&HCI) in 1973 and 1978 respectively. The publication of the citation classics, with the first issue of Current Contents in 1977, forms an important and interesting venture of the ISI.
It has already been mentioned above that the traditional subject indexing systems and techniques took a new turn with the application of computers in the 1950s. In fact, all attempts at computerised indexing were based on two basic methods: statistical analysis; and syntactic and semantic analysis. In the arena of computerised indexing, there has been considerable research on user-interface design and on indexing systems using Artificial Intelligence techniques such as Natural Language Processing (NLP), Knowledge Representation Models and Expert System-based subject indexing systems. Because the phenomenal growth of content on the web poses an indexing problem of its own, there has been continued interest in developing tools and techniques to index web resources. Different search tools and technologies have been developed for finding resources on the web and for making computers understand the semantics underlying their contents.
3. Meaning and Purpose of Index:
The term index came from the Latin word indicare which means ‘to point out, to guide, to direct, to locate’. An index indicates or refers to the location of an object or idea. It is a methodically arranged list of items or concepts along with their addresses. The process of preparing an index is known as indexing. According to the British Standards (BS 3700: 1964), an index is “a systematic guide to the text of any reading matter or to the contents of other collected documentary material, comprising a series of entries, with headings arranged in alphabetical or other chosen order and with references to show where each item indexed is located”. An index is, thus, a working tool designed to help the user to find his way out of a mass of documented information in a given subject field, or document store. It gives subject access to documents irrespective of their physical forms, such as books, periodical articles, newspapers, audio-visual documents, and computer-readable records including web resources.
It appears from the foregoing discussion that an index indicates or refers to the location of an object/idea/concept. A concept is a unit of thought. The semantic content of a concept can be re-expressed by a combination of other and different concepts, which may vary from one language or culture to another. What a particular body of information in a document is about constitutes its subject. A subject can be defined as any concept or combination of concepts representing a theme in a document. An indexing term is defined as the representation of a concept in the form of either a term derived from natural language or a classification symbol.
A subject is then any concept or combination of concepts which is expressed in the document. The readers’ task is to interpret the words and sentences in the document in order to understand the concepts. Whether a reader understands a document depends on how precisely the author expresses the concepts he refers to and whether the reader is aware of the concepts the author expresses. The basic idea is that the concepts exist before the author writes the document and the reader reads the document.
Similarly, the indexer’s task is to identify concepts in the document and re-express these in indexing terms. This is done first by establishing the subject content, or in other words the content of concepts in the document. Thereafter the principal concept presented in the subject content is identified, and finally, the concepts are expressed in the indexing language. The indexing is successful when the document and the indexing term express the same concepts.
Modern subject indexing practice has its roots in Charles Ammi Cutter’s Rules for a Dictionary Catalog published in 1876. Cutter’s statement of the basic objectives of a catalogue is:
(i) To enable a person to find a book of which the subject is known, and
(ii) To show what the library has on a given subject (and related subjects).
This implies that the main purpose of subject indexing is to satisfy the subject query of the users by enabling an enquirer to identify documents on a given subject and providing information on the presence of material on allied or related subjects.
The first objective refers to the need to locate individual items, and the second refers to the need to collocate materials on the same subject as well as related subjects. A subject is a set of interrelated component ideas in which each component idea is related directly or indirectly to other component ideas. A subject of a document is amenable for structuring into subject heading. It is a kind of linear structuring of subject surrogates, and some criteria for formatting or modeling it into an accessible procedure. The purpose of subject indexing is to:
a) satisfy the subject approach to information;
b) identify pertinent materials on a given subject or topic;
c) enable the enquirer to find materials on related subjects;
d) link related subjects by a network of references;
e) prescribe a standard methodology to subject cataloguers/indexers for constructing uniform subject headings;
f) bring consistency in the choice and rendering of subject entries, using standard vocabulary and according to the given rules and procedures;
g) be helpful to users in accessing any desired document from the catalogue or index through different means of such approach;
h) decide on the optimum number of subject entries, and thus economize the bulk and cost of indexing; and
i) provide user-oriented approach in naming the subjects through any vocabulary common to a considerable group of users, specialists or laymen.
4. Indexing Principles and Process:
4.1 Need and Purpose of Indexing Principles:
Before we discuss the principles of indexing, it is important to know why we need to have principles of indexing. We need principles of indexing:
1) To set out the general directions for the consistent application of subject indexing techniques;
2) To serve as a useful guide for developing new indexing techniques and for improving those that already exist;
3) To facilitate the evaluation of indexing systems;
4) To provide theoretical rationale for particular standards or guidelines for designing subject indexing system and its application;
5) To promote understanding of different subject indexing systems by identifying commonalities underlying them and providing a structure for their comparison; and
6) To determine how the subject headings are established and applied.
4.2 Indexing Principles:
Indexing principles may be stated as:
a) The user as focus: The wording and structure of the subject heading should match what the user will seek in the index;
b) Unity: A subject index must bring together under one heading all the documents which deal principally or exclusively with the subject, whatever the terms applied to it by the authors and whatever the varying terms applied to it at different times. It must use a term which is unambiguous and does not overlap in meaning with other headings in the index.
c) Common Usage: The subject heading chosen must represent common usage or, at any rate, the usage of the class of users for whom the documents on the subject within which the heading falls are intended. Whether a popular term or a scientific one is to be chosen should depend on the approaches of the users.
d) Specificity: The heading should be as specific as the topic it is intended to cover. As a corollary, the heading should not be broader than the topic. Rather than using a broader heading, the cataloguer should use two specific headings which will approximately cover it.
4.3 Indexing Policy:
Indexers must take policy decisions about how many terms should be included in an index entry, how specific the terms should be and how many entries an index should incorporate. Together this gives a depth of indexing. The depth of indexing describes the thoroughness of the indexing process with reference to exhaustivity and specificity. While taking such a policy decision, indexers should strive for a balance between specificity and exhaustivity and should consider the requirement of the users of the index along with the cost and time factors.
Exhaustivity in Indexing:
Exhaustivity in indexing is the detail with which the topics or features of a document are analyzed and described. In other words, an exhaustive index is one which lists all possible index terms associated with the thought content of a document. Greater exhaustivity gives higher recall, retrieving all the relevant documents, but along with them a large number of irrelevant documents or documents which deal with the subject only in little depth. Higher specificity, by contrast, increases precision at the cost of impaired recall.
Specificity in Indexing:
The specificity describes how closely the index terms match the topics they represent in a document. It is the extent to which the indexing system permits us to be precise when specifying the subject of a document we are processing. Higher specificity leads to high precision, whereas lower specificity will lead to low precision, but high recall. Specific indexing provides specific terms for all or most topics and features and results in a larger indexing vocabulary than more generic indexing. Specificity tends to increase with exhaustivity in indexing vocabulary as the more terms we include, the narrower those terms will be. A high level of specificity increases precision.
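The trade-off between recall and precision discussed under exhaustivity and specificity can be checked with the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved. The document identifiers and the two retrieval results below are invented purely to illustrate the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection in which four documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# Exhaustive indexing: everything relevant is found, plus marginal documents.
exhaustive = precision_recall({"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant)

# Specific indexing: only closely matching documents are found.
specific = precision_recall({"d1", "d2"}, relevant)

print("exhaustive:", exhaustive)
print("specific:  ", specific)
```

An indexing policy chooses a point between these two extremes according to the needs of the users and the cost and time available.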
4.4 Indexing Process:
The representation of documents and the knowledge expressed by them is one of the central and unique areas of study within library and information science (LIS) and is commonly referred to as indexing. A common demand in the LIS field is for a set of rules or a prescription for how to index. When this demand is raised it is usually based on the assumption that it is possible to explain the intellectual operations in the subject indexing process. The indexing process basically consists of two intellectual steps: conceptual analysis and translation.
Conceptual analysis, the first step, refers to the identification of different component ideas associated with the thought content of the document and the establishment of the interrelationship between those component ideas. According to Ranganathan, it involves the work in the idea plane, which is carried out in two stages, although these tend to overlap in practice:
a) examining the document and establishing its subject content;
b) identifying the principal concepts present in the subject;
a) Examining the document and establishing its subject content: In the first stage of the conceptual analysis of the thought content of the document, it is examined for the establishment of its subject content. A complete reading of the document often is impracticable, but the indexer should ensure that no useful information has been overlooked. While examining the document, the indexer should give particular attention to a number of places in the document: the title; the abstract, if provided; the list of contents; the introduction, the opening chapters and paragraphs, and the conclusion; illustrations, diagrams, tables and their captions; words or groups of words which are underlined or printed in an unusual typeface.
b) Identifying the principal concepts present in the subject: In this stage of the indexing process the indexer identifies the principal concepts in the subject. The second stage is laid over the first stage in the sense that the indexer should not go back to the document to look for concepts. Rather, the indexer should look for concepts within the findings of the first stage, that is, the natural language representations of the subject content. The indexer does not necessarily need to retain, as indexing elements, all the concepts identified during the examination of the document. After examining the document, the indexer needs to follow a logical approach in selecting those concepts that best express its subject. While selecting the principal concepts of the document, the indexer should take into consideration the purpose for which the indexing data will be used. Indexing data may be used for purposes such as the preparation of subject headings for the subject catalogue, the production of printed alphabetical indexes to different types of information products, and the computerized storage of indexing data elements for subsequent retrieval of the documents.
During the first two stages, the indexer has established the subject content of the document and identified the principal concepts in the subject. The indexer is now ready to translate the concepts into the indexing language. Translation, the second step of the indexing process, refers to the expression of the principal concepts identified while analyzing the thought content of the document in the language of the indexing system. According to Ranganathan, it involves the work in the verbal plane, which calls for familiarity with the different components of the given indexing language: controlled vocabulary, syntax and semantics, including their working roles in displaying the indexing data in a subject index.
If the concepts that the indexer has identified during the second stage are present in the indexing language, the indexer should translate each concept into its preferred term. At this point in the indexing process, the indexer should be aware that indexing languages may impose certain constraints on translating the concepts. A controlled indexing language may not permit the exact representation of a concept encountered in a document, because a concept identified during the second stage might not be present in the indexing language. The indexer is then forced either to choose a term that does not express exactly the same concept or to add a new term to the vocabulary to represent the concept. Here, the indexer is required to be familiar with the particular indexing language and with its specific rules and mechanisms.
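The translation step described above can be sketched in Python. This is only an illustration under stated assumptions: the small vocabulary, its entry terms and the function name `translate_concepts` are hypothetical and do not come from any real indexing system. It shows concepts being mapped to preferred terms, with concepts missing from the vocabulary set aside for the indexer's decision.

```python
# Hypothetical controlled vocabulary: each entry term (preferred or not)
# maps to the preferred term authorized by the indexing language.
VOCABULARY = {
    "libraries": "Libraries",
    "cataloguing": "Cataloguing",
    "cataloging": "Cataloguing",            # variant spelling -> preferred term
    "subject indexing": "Subject indexing",
    "subject cataloguing": "Subject indexing",  # synonym -> preferred term
}


def translate_concepts(concepts, vocabulary):
    """Map natural-language concepts to preferred terms.

    Returns (preferred_terms, unmatched). Unmatched concepts are left for
    the indexer, who must either choose a near term or propose a new
    term for the vocabulary.
    """
    preferred, unmatched = [], []
    for concept in concepts:
        key = concept.strip().lower()       # normalize before lookup
        term = vocabulary.get(key)
        if term is None:
            unmatched.append(concept)
        elif term not in preferred:         # avoid assigning a term twice
            preferred.append(term)
    return preferred, unmatched


terms, missing = translate_concepts(
    ["Cataloging", "Subject cataloguing", "Folksonomies"], VOCABULARY
)
print(terms)    # ['Cataloguing', 'Subject indexing']
print(missing)  # ['Folksonomies']
```

The unmatched list makes the constraint visible: "Folksonomies" has no entry in this assumed vocabulary, so the choice between a near term and a new term stays with the indexer.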
4.5 Indexing Language:
An indexing language is a set of terms and devices used to establish the relationship between terms for representing the content of the documents as well as queries of the users. It consists of three basic elements: controlled vocabulary, syntax and semantics. Controlled vocabulary has been defined as a limited set of terms showing their relationships and indicating ways in which they may usefully be combined to provide a subject index to the documents and to search for these documents, in a particular system. Syntax comprises a grammatical structure or a set of rules that govern the sequence of occurrence of terms/words in representing the content of the document. Semantics refers to the systematic study of how meaning is structured, expressed and understood in the use of an indexing language. More discussion on indexing languages can be seen in another post.
4.6 Problems in Indexing:
An indexer analyses a text and strives to ascertain meaning. Ideally, this analysis anticipates a searcher at some future time, looking for text with the same meaning. But meaning is not fixed at either end of this process. And even if the meaning is relatively unambiguous or stable, the terms used to represent it are not. Thus, most indexing processes encounter problems at two levels:
- Interpreting meaning as intended by the author and as construed by the potential user;
- Choosing the terms to represent that meaning so that this communication is as clear and as true as it can be (bearing in mind that such fidelity is a relative thing to begin with).
Fidelity in the context of IR denotes the accuracy with which the term(s) used to name a subject represent its meaning. A number of problems and issues associated with indexing are:
a) Subjects of documents are complex—usually multi-worded terms;
b) Users’ requests for information tend to be multidimensional;
c) Choice of terms—among different categories, viz. entities, activities, abstracts, properties and heterogeneous concepts (synonymous to semantic factoring);
d) Choice of word forms—among different forms, viz. noun vs. adjective, singular vs. plural;
e) Homographs, if neglected, will give rise to reduced relevance. The seriousness of the problem will depend on the coverage of the system;
f) Choice of the kind of vocabulary that should be used, and syntactical and other rules necessary for representing complex subjects;
g) Identification of term relationship—semantic vs. syntactic;
h) Decision about the exhaustivity level (i.e. the depth to which indexing should be done);
i) Decision about the specificity level (i.e. the levels of generality and specificity at which concepts should be represented);
j) Ensuring inter-indexer consistency (i.e. consistency in indexing between several indexers), and intra-indexer consistency (i.e. consistency in indexing by the same indexer at different times); and
k) Ensuring that indexing is done not merely on the basis of a document’s intrinsic subject content but also according to the type of users who may be expected to benefit from it and the types of requests for which the document is likely to be regarded as useful.
4.7 Quality in Indexing:
The quality of an index is defined in terms of its retrieval effectiveness—the ability to retrieve what is wanted and to avoid what is not. Quality in indexing leads to a better performance in retrieving documents. The governing idea is that indexing should be neutral, objective, and independent of the particular indexer’s subjective judgment. An indexing failure on the part of the indexer may take place at the following stages of the indexing process:
➢ Failure in establishing concepts during the conceptual analysis of the content of a document;
➢ Failure to identify a topic that is of potential interest to the target user group;
➢ Misinterpretation of the content of the document, leading to the selection of inappropriate term(s);
➢ Failure in translating the result of conceptual analysis into the indexing language;
➢ Failure to use the most specific term(s) to represent the subject of the document;
➢ Use of inappropriate term(s) for the subject of a document because of the lack of subject knowledge or due to lack of seriousness on the part of the indexer; and
➢ Omission of important term(s).
The quality of indexing depends on two factors: (i) the qualification and expertise of the indexer; and (ii) the quality of the indexing tools. In order to achieve quality in indexing, the indexer should have adequate knowledge of the field covered by the documents s/he is indexing. S/he should understand the terminology of the documents as well as the rules and procedures of the specific indexing system. Quality control would be achieved more effectively if the indexers have contact with users. An indexer who has contact with the users might be better able to represent the documents in accordance with how the users think. The idea is that the indexer should attempt to determine the subject of the document taking into account the users’ questions and information needs. This might help the indexer when a document contains multiple concepts. In such a situation, the indexer can select, to represent the content of the document, only those concepts that are regarded as most relevant by a given community of users. Indexing quality can be tested by analysis of retrieval results, e.g. by calculating recall and precision ratios.
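As a rough illustration of the recall and precision ratios mentioned above, the following Python sketch uses the standard definitions: recall is the share of relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. The document identifiers and the function name `recall_precision` are made up for the example.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search result judged against a known set of relevant documents.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5", "d6", "d7"]

recall, precision = recall_precision(retrieved_docs, relevant_docs)
print(recall, precision)  # 0.4 0.5
```

Here two of the five relevant documents were found (recall 0.4), and two of the four retrieved documents were relevant (precision 0.5). Omitted index terms tend to lower recall, while inappropriate terms tend to lower precision.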
It is assumed that there is a relationship between indexing consistency and indexing quality. That is to say, an increase in consistency can be expected to cause an improvement in indexing quality. Consistency in indexing has long been considered an acceptable indicator of indexing quality, and it is essential for effective retrieval. Indexing consistency refers to “the extent to which agreement exists on the terms to be used to index some document” (Lancaster, 2003). Consistency is a measure that relates to the work of two or more indexers. It should remain relatively stable throughout the life of a particular indexing system. Consistency is particularly important if the information is to be exchanged between agencies in a documentary network. An important factor in reaching this level of consistency is complete impartiality on the part of the indexers. The goal of consistency is to promote standard practice in indexing.
It has long been observed that different indexers tend to assign different index terms to the same document, as they differ considerably in their judgment as to which terms reflect the contents of the document most adequately. Essentially, indexing consistency is seen as a measure of the similarity of the reactions of different human beings processing the same information. Indexing consistency in a group of indexers is defined as the degree of agreement in the representation of the essential information content of the document by certain sets of indexing terms selected individually and independently by each of the indexers in the group.
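The definition above does not fix a formula, so the following Python sketch assumes one simple choice: for a pair of indexers, the number of terms both assigned divided by the number of distinct terms either assigned, averaged over all pairs in the group. The term sets and function names are hypothetical, and other consistency measures exist.

```python
from itertools import combinations


def pair_consistency(terms_a, terms_b):
    """Overlap ratio for two indexers: shared terms / all distinct terms."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 1.0  # neither indexer assigned any term: trivially in agreement
    return len(a & b) / len(union)


def group_consistency(term_sets):
    """Mean pairwise overlap across a group of two or more indexers."""
    pairs = list(combinations(term_sets, 2))
    if not pairs:
        raise ValueError("at least two indexers are required")
    return sum(pair_consistency(a, b) for a, b in pairs) / len(pairs)


# Hypothetical term sets assigned independently to the same document.
indexer_1 = {"Subject indexing", "Controlled vocabulary", "Retrieval"}
indexer_2 = {"Subject indexing", "Controlled vocabulary", "Thesauri"}
indexer_3 = {"Subject indexing", "Retrieval"}

print(pair_consistency(indexer_1, indexer_2))                          # 0.5
print(round(group_consistency([indexer_1, indexer_2, indexer_3]), 3))  # 0.472
```

A value of 1.0 means the indexers chose identical term sets, and 0.0 means they share no terms. The same function applied to one indexer's work at two different times gives a comparable figure for intra-indexer consistency.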
In the process of indexing, indexers choose what topics to represent and what to call those topics. The goal is to select and name topics consistently so that all of the material about any given topic will be found together. Ideally, if two indexers use the same thesaurus or classification system to index the same document, they are supposed to assign the same index terms or class numbers. In practice, indexers are not always consistent with each other, because subject indexing is essentially a subjective process. Indexers may miss important points of the document and add irrelevant terms, which may stem from insufficient knowledge of the subject. Decades of research on consistency between indexers and by the same indexer at different times have documented medium to high levels of inconsistency.
Article Collected From:
- Sarkhel, J. (2017). Unit-9 Basics of Subject Indexing. Retrieved from http://egyankosh.ac.in/handle/123456789/35769
- Juran Sarkhel (2017). (Professor of Library & Information Science, University of Kalyani, India)