Extracting topics in texts: Towards a fuzzy logic approach

Mohand Boughanem, Henri Prade, Ourdia Bouidghaghen.

The paper presents a preliminary investigation of methods for extracting semantic views of text contents that go beyond standard statistical indexing. The aim is to build fuzzily weighted, structured images of semantic contents. A first step consists in identifying the different types of relations (is-a, part-of, related-to, synonymy, domain, glossary relations) that hold between the words of a text, using a general ontology such as WordNet. Taking advantage of these relations, different types of fuzzy clusters of words can then be built. Moreover, apart from its frequency of occurrence, the importance of a word may also be evaluated through an estimate of its specificity. The size of the clusters and the frequency and specificity of their words are indications that enable us to build a fuzzy set of sets of words that progressively "emerge" from a text as being representative of its contents. The ideas advocated in the paper and their potential usefulness are illustrated on a running example. A better representation of the semantic contents of texts is expected to help retrieve the texts that are relevant to a given query, and to give a potential reader some indication of what a text is about.
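
The pipeline outlined in the abstract (relations between words, then clusters, then a graded set of word sets) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the small RELATIONS table stands in for an ontology lookup such as WordNet, the RELATION_WEIGHT and SPECIFICITY values are invented, and the aggregation used in cluster_score (the minimum of link strength, relative frequency and mean specificity) is only one plausible choice.

```python
from collections import Counter

# Hypothetical strengths per relation type (assumed, not from the paper).
RELATION_WEIGHT = {"synonymy": 1.0, "is-a": 0.8, "part-of": 0.6, "related-to": 0.4}

# Toy stand-in for an ontology lookup such as WordNet: (word, word, relation).
RELATIONS = [
    ("car", "automobile", "synonymy"),
    ("car", "vehicle", "is-a"),
    ("wheel", "car", "part-of"),
    ("bank", "money", "related-to"),
]

# Hypothetical specificity estimates in [0, 1], e.g. derived from taxonomy depth.
SPECIFICITY = {"car": 0.7, "automobile": 0.7, "vehicle": 0.4,
               "wheel": 0.8, "bank": 0.6, "money": 0.5}


def clusters_at_level(words, alpha):
    """Group words connected by relations of weight >= alpha (union-find)."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b, rel in RELATIONS:
        if a in parent and b in parent and RELATION_WEIGHT[rel] >= alpha:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    # Singletons carry no relational evidence, so they are dropped here.
    return [frozenset(g) for g in groups.values() if len(g) > 1]


def cluster_score(cluster, freq, alpha):
    """Assumed aggregation: min of link strength, relative frequency, mean specificity."""
    rel_freq = sum(freq[w] for w in cluster) / sum(freq.values())
    spec = sum(SPECIFICITY.get(w, 0.0) for w in cluster) / len(cluster)
    return min(alpha, rel_freq, spec)


def fuzzy_topics(tokens):
    """Return a fuzzy set of word sets: {cluster: membership degree}."""
    freq = Counter(t for t in tokens if t in SPECIFICITY)
    scores = {}
    # Lowering alpha lets larger, more loosely related clusters emerge.
    for alpha in sorted(set(RELATION_WEIGHT.values()), reverse=True):
        for cluster in clusters_at_level(list(freq), alpha):
            degree = cluster_score(cluster, freq, alpha)
            scores[cluster] = max(scores.get(cluster, 0.0), degree)
    return scores


if __name__ == "__main__":
    text = "car automobile car vehicle wheel bank money"
    for cluster, degree in sorted(fuzzy_topics(text.split()).items(),
                                  key=lambda item: -item[1]):
        print(sorted(cluster), round(degree, 3))
```

In this sketch, tight clusters held together by strong relations appear at high thresholds, and broader clusters appear as the threshold is lowered, which is one way to read the idea of word sets that progressively emerge from a text. A real implementation would replace the toy tables with actual ontology queries and a principled specificity estimate.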
