User:Alvations/word sense induction and disambiguation

From Wikipedia, the free encyclopedia

The word sense induction and disambiguation task consisted of three separate phases:

  1. In the training phase, evaluation task participants were asked to use a training dataset to induce the sense inventories for a set of polysemous words. The training dataset consisted of a set of polysemous nouns and verbs and the sentence instances in which they occurred. No resources were allowed other than morphological and syntactic Natural Language Processing components, such as morphological analyzers, part-of-speech taggers and syntactic parsers.
  2. In the testing phase, participants were given a test set for the disambiguation subtask, which they had to label using the sense inventory induced in the training phase.
  3. In the evaluation phase, answers to the testing phase were evaluated in a supervised and an unsupervised framework.

The unsupervised evaluation for WSI used two measures: V-Measure (Rosenberg and Hirschberg, 2007) and paired F-Score (Artiles et al., 2009). This evaluation follows the supervised evaluation of the SemEval-2007 WSI task (Agirre and Soroa, 2007).
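
The two unsupervised measures can be illustrated with a minimal sketch. It assumes the commonly used definitions: V-Measure as the harmonic mean of homogeneity and completeness, computed from conditional entropies, and paired F-Score as the F1 over pairs of instances that share a label in the gold standard and in the induced clustering. The function names and the list-of-labels input format are illustrative choices, not part of the task's official scorer.

```python
from collections import Counter
from itertools import combinations
from math import log


def _entropy(counts, total):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def v_measure(gold, induced):
    """Harmonic mean of homogeneity and completeness.

    gold and induced are equal-length lists of labels, one per instance.
    """
    n = len(gold)
    joint = Counter(zip(gold, induced))
    gold_counts = Counter(gold)
    induced_counts = Counter(induced)

    h_gold = _entropy(gold_counts.values(), n)
    h_induced = _entropy(induced_counts.values(), n)
    # Conditional entropies H(gold | induced) and H(induced | gold).
    h_gold_given_induced = -sum(
        (c / n) * log(c / induced_counts[k]) for (g, k), c in joint.items()
    )
    h_induced_given_gold = -sum(
        (c / n) * log(c / gold_counts[g]) for (g, k), c in joint.items()
    )

    homogeneity = 1.0 if h_gold == 0 else 1.0 - h_gold_given_induced / h_gold
    completeness = 1.0 if h_induced == 0 else 1.0 - h_induced_given_gold / h_induced
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def _same_label_pairs(labels):
    """All index pairs (i, j), i < j, whose instances share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def paired_f_score(gold, induced):
    """F1 over instance pairs grouped together by gold and by induced labels."""
    gold_pairs = _same_label_pairs(gold)
    induced_pairs = _same_label_pairs(induced)
    common = len(gold_pairs & induced_pairs)
    if common == 0:
        return 0.0
    precision = common / len(induced_pairs)
    recall = common / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy data: four instances of one target word, two gold senses.
gold_labels = ["s1", "s1", "s2", "s2"]
perfect_clusters = [0, 0, 1, 1]
single_cluster = [0, 0, 0, 0]
```

Calling `v_measure(gold_labels, single_cluster)` and `paired_f_score(gold_labels, single_cluster)` on the toy data shows how the two measures treat a system that puts every instance in one cluster: V-Measure penalises the lack of homogeneity, while paired F-Score still rewards the perfect pair recall.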

Word Sense Induction and Disambiguation Example

In the induction process, stop words are often considered semantically irrelevant and are therefore excluded when building the sense inventory. The induction process outputs clusters of candidate senses, each related to a latent semantic variable or sense cluster. Note that these sets of candidate senses should not be regarded as lexicographic meaning distinctions (like synsets in WordNet or BabelNet). Rather, they should be regarded as more coarse-grained, topic-related entities[1].

Target word: chip
Occurs in the contexts[2]:
"An N.V. Philips unit has created a computer system that processes video images 
3,000 times faster than conventional systems."
"Using reduced instruction - set computing, or RISC, chips made by Intergraph of 
Huntsville, Ala., the system splits the image it ‘sees’ into 20 digital 
representations, each processed by one chip."
Induced senses {Centroid:: Candidate senses}: {computer:: cache, CPU, memory, microprocessor, processor, RAM, register}

Disambiguation of the target word in context (a.k.a. coarse-grained sense):
{computer}
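
The disambiguation step in the example above can be sketched as a toy word-overlap heuristic. This is not the latent semantic method of the cited paper; it only illustrates the idea of filtering stop words and then picking the sense cluster whose centroid and candidate senses share the most words with the context. The stop-word list, the second sense cluster `food`, and all function names are hypothetical and exist only for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the task).
STOP_WORDS = {
    "a", "an", "the", "has", "that", "than", "by", "of", "or",
    "it", "into", "each", "one", "made", "using", "with", "he",
}

# Sense inventory: centroid -> candidate senses.
# "computer" comes from the example; "food" is a hypothetical second cluster.
INVENTORY = {
    "computer": {"cache", "CPU", "memory", "microprocessor",
                 "processor", "RAM", "register"},
    "food": {"potato", "snack", "salt", "crisp", "fried"},
}

CONTEXT = (
    "An N.V. Philips unit has created a computer system that processes "
    "video images 3,000 times faster than conventional systems. "
    "Using reduced instruction - set computing, or RISC, chips made by "
    "Intergraph of Huntsville, Ala., the system splits the image it 'sees' "
    "into 20 digital representations, each processed by one chip."
)


def content_words(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def disambiguate(context, inventory):
    """Return the centroid with the largest word overlap, or None if no overlap."""
    words = set(content_words(context))
    scores = {
        centroid: len(words & ({centroid.lower()} | {c.lower() for c in candidates}))
        for centroid, candidates in inventory.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


sense = disambiguate(CONTEXT, INVENTORY)
```

In this sketch the only overlap between the example context and the `computer` cluster is the centroid word itself, which is why the centroid is included in the comparison. A real system would compare the context and the clusters in a shared latent space rather than by literal word matches.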

References

  1. ^ Tim Van de Cruys and Marianna Apidianaki. 2011. Latent semantic word sense induction and disambiguation. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT). pp. 1476–1485. Portland, Oregon, USA.
  2. ^ Note: strikethrough words in the contexts are not considered in the induction process; they are treated as stop words.

Category:Computational linguistics Category:Natural language processing Category:Semantics