By Gerard Salton
Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.
Extra resources for A Theory of Indexing
1 subtraction. The total is 2K' + 1 additions or subtractions, and K' + 2 multiplications or divisions. For t terms, this produces (2K' + 1)t additions and (K' + 2)t multiplications. The last term represents the increment over and above the simple frequency counts of expressions (4) and (5). The signal-noise calculations are more expensive to perform than the EK values. Consider first the noise Nk (formula (6)); the requirements are K' additions for Fk, 2K' divisions, K' logarithms, K' multiplications, and K' additions to compute the final sum.
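The operation counts quoted for the noise can be checked against a small sketch. Formula (6) is not reproduced in this excerpt, so the code assumes the usual signal-noise form, Nk = sum over documents of (f/Fk) * log(Fk/f), where f is the frequency of the term in one document and Fk is its total frequency. The base-2 logarithm and the function names `noise` and `noise_operation_counts` are illustrative assumptions, not taken from the book.

```python
import math

def noise(freqs):
    """Noise of one term from its per-document frequencies (assumed form of formula (6))."""
    nonzero = [f for f in freqs if f > 0]   # the K' documents that contain the term
    total = sum(nonzero)                    # Fk: K' additions
    # Per document: two divisions (f/Fk and Fk/f), one logarithm, one multiplication;
    # the outer sum contributes K' further additions.
    return sum((f / total) * math.log2(total / f) for f in nonzero)

def noise_operation_counts(k_prime):
    """Operation tally for one noise value, as listed in the text."""
    return {
        "additions": 2 * k_prime,        # K' for Fk plus K' for the final sum
        "divisions": 2 * k_prime,
        "logarithms": k_prime,
        "multiplications": k_prime,
    }
```

Under these assumptions, a term spread evenly over four documents, `noise([1, 1, 1, 1])`, has a noise of 2 bits, while a term concentrated in a single document, `noise([4, 0, 0])`, has a noise of 0.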
[Table residue: pairwise run comparisons (A vs. B), including a standard run vs. PT phrases from nondiscriminators, a standard run vs. SPT phrases from discriminators, a standard run vs. combined PT + SPT phrases, and a run with IDF weights; the significance values are not recoverable.]
[...] a thesaurus class, the class will exhibit a much higher document frequency, and most likely a better discrimination value, than any of the original terms. There exist well-known procedures for constructing thesauruses either manually or automatically. In the latter case, automatic term classification methods may be used to generate the appropriate term groups.
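The effect of a thesaurus class on document frequency can be illustrated with a minimal sketch. It assumes that a document counts as containing the class when it contains at least one member term; the function names and the toy collection are hypothetical, not drawn from the book.

```python
def document_frequency(term, docs):
    """Number of documents that contain the given term."""
    return sum(1 for doc in docs if term in doc)

def class_document_frequency(terms, docs):
    """Number of documents that contain at least one term of the class."""
    members = set(terms)
    return sum(1 for doc in docs if members & set(doc))

# Toy collection: each document is a set of index terms (hypothetical data).
docs = [{"car"}, {"automobile"}, {"vehicle", "car"}, {"tree"}]
single = [document_frequency(t, docs) for t in ("car", "automobile", "vehicle")]
grouped = class_document_frequency(["car", "automobile", "vehicle"], docs)
```

Because the class matches the union of the documents matched by its members, its document frequency can never fall below that of any single member term.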
A simple characterization of a useful retrieval term is thus difficult to generate directly from the IDF distributions of Table 3. The situation is apparently less complicated when the terms are considered in order by discrimination value as represented in the lower half of Table 5. Obviously, the best terms have interesting frequency distributions, whereas the average and poor DV terms have either very low or very high occurrence frequencies. Furthermore, a direct correlation exists between discrimination value order and document frequency Bk.
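The excerpt does not restate how the discrimination value is computed. The sketch below assumes the centroid-based space-density formulation: the density is the average cosine similarity between each document and the collection centroid, and the discrimination value of a term is the density with that term removed minus the density with it present. All function names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density(docs):
    """Average similarity between each document vector and the centroid."""
    n = len(docs)
    centroid = [sum(col) / n for col in zip(*docs)]
    return sum(cosine(d, centroid) for d in docs) / n

def discrimination_values(docs):
    """One value per term: density without the term minus density with it."""
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [d[:k] + d[k + 1:] for d in docs]  # drop term k everywhere
        values.append(density(reduced) - base)
    return values

# Term 0 occurs in every document; terms 1 and 2 each occur in one (toy data).
dv = discrimination_values([[1, 1, 0], [1, 0, 1]])
```

In this toy collection the term that occurs in every document receives a negative value, since removing it spreads the documents apart, while the two terms that separate the documents receive positive values. This matches the remark that terms with very high occurrence frequencies rank poorly.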