PROCEEDINGS IPMU '08
Extracting topics in texts: Towards a fuzzy logic approach
Mohand Boughanem, Henri Prade, Ourdia Bouidghaghen.
The paper presents a preliminary
investigation of potential methods for
extracting semantic views of text
contents, which go beyond standard
statistical indexing. The aim is to
build fuzzily weighted, structured
images of the semantic
contents. A preliminary step consists
in identifying the different types of
relations (is-a, part-of, related-to,
synonymy, domain, glossary relations)
that exist between the words of a text,
using some general ontology such as
WordNet. Then taking advantage of
these relations, different types of
fuzzy clusters of words can be built.
Moreover, apart from its frequency of
occurrence, the importance of a word
may also be evaluated through some
estimate of its specificity. The size of
the clusters, the frequency and the
specificity of their words are
indications that enable us to build a
fuzzy set of sets of words that
progressively "emerge" from a text, as
being representative of its contents.
The ideas advocated in the paper and
their potential usefulness are
illustrated on a running example. It is
expected that obtaining a better
representation of the semantic
contents of texts may help to better
retrieve the texts that are relevant to a
given query, and to give
some indication of what the text is
about to a potential reader.
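
The pipeline outlined in the abstract (relations between words, fuzzy clusters, then weighting by frequency and specificity) might be sketched roughly as follows. This is only an illustrative sketch and not the authors' method: the relation table is a hand-written stand-in for WordNet lookups, the specificity values are invented, and the way the three indications are combined in `cluster_score` (a sum of products) is an assumption made here for the sake of a concrete example. All function and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical relation table standing in for WordNet lookups.
# Each entry maps an unordered word pair to an assumed relation strength in [0, 1].
RELATION_STRENGTH = {
    frozenset(("car", "vehicle")): 1.0,  # is-a
    frozenset(("car", "wheel")): 0.8,    # part-of
    frozenset(("car", "road")): 0.5,     # related-to
}

# Hypothetical specificity estimates in [0, 1]; more general words score lower.
SPECIFICITY = {"car": 0.6, "wheel": 0.8, "road": 0.5, "vehicle": 0.2, "bank": 0.7}


def relation_degree(a, b):
    """Degree to which two words are related (1.0 for identical words)."""
    if a == b:
        return 1.0
    return RELATION_STRENGTH.get(frozenset((a, b)), 0.0)


def fuzzy_cluster(seed, vocabulary):
    """Fuzzy set of words around a seed: word -> membership degree (> 0 only)."""
    degrees = {w: relation_degree(seed, w) for w in vocabulary}
    return {w: d for w, d in degrees.items() if d > 0.0}


def cluster_score(cluster, freq):
    """Assumed aggregation: sum of membership * relative frequency * specificity.

    Larger clusters contribute more terms, so cluster size is reflected too.
    """
    total = sum(freq.values())
    return sum(
        degree * (freq[w] / total) * SPECIFICITY.get(w, 0.5)
        for w, degree in cluster.items()
    )


# Toy text: one fuzzy cluster per distinct word, ranked by the assumed score.
tokens = "car wheel road car vehicle bank".split()
freq = Counter(tokens)
clusters = {seed: fuzzy_cluster(seed, freq) for seed in freq}
ranking = sorted(clusters, key=lambda s: cluster_score(clusters[s], freq), reverse=True)
```

In this toy text the cluster built around "car" gathers the most related, frequent words and so ranks first, while the isolated word "bank" ranks last. This loosely mirrors the idea of sets of words that progressively emerge as representative of the contents.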