English all words [64 teams]
As we did for Senseval-2, we will tag approximately 5000 words
of coherent Penn Treebank text with WordNet 1.7 tags. We will tag
all of the predicating words and the head words of their arguments,
and as many adjectives and adverbs as we can. We will do
double-blind tagging with adjudication.
Coordinator: Martha Palmer mpalmer@cis.upenn.edu
Italian all words [7 teams]
In addition to the lexical sample task, we propose an "all words" task for
Italian.
Each participant will be provided with a relatively small set of sentences extracted from
the Italian Treebank, consisting of about 5000 words. The sentences can be
provided with POS tagging and syntactic dependency-based tagging (functional
annotation).
The content words (nouns, verbs, and adjectives) will be semantically tagged
according to the sense repository of ItalWordNet.
Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)
Basque lexical sample [8 teams]
We propose a "Lexical-Sample" task for Basque in order to evaluate
supervised and semi-supervised learning systems for WSD. Each participant
will be provided with a relatively small set of labelled examples (two thirds
of 75+15*senses+7*multiwords) and a comparatively very large set of unlabelled
examples (ten times more, when possible) for around 40 words. The test set
will consist of one third of 75+15*senses+7*multiwords. We target
two types of participants: supervised systems (not using unlabelled data)
and semi-supervised systems (those exploiting the unlabelled data),
but unsupervised systems can also participate, of course. The sense inventory
will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7
will also be provided). This task will be coordinated with other
lexical-sample tasks (Catalan, English, Italian, Romanian, Spanish) in order
to share around 10 of the target words.
Coordinator: Eneko Agirre eneko@si.ehu.es
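As a rough illustration of the sizing formula 75+15*senses+7*multiwords, the sketch below computes the number of examples for one target word and splits it into the labelled training portion (two thirds) and the test portion (one third). The function names and the rounding rule are assumptions made for this example; the task description does not say how fractional counts are handled.

```python
def examples_per_word(senses: int, multiwords: int = 0) -> int:
    """Total examples for one target word: 75 + 15*senses + 7*multiwords."""
    return 75 + 15 * senses + 7 * multiwords


def split_examples(total: int) -> tuple[int, int]:
    """Split a total into (labelled training, test) at roughly 2/3 and 1/3.

    Assumption: the test set takes the floor of one third and the
    training set takes the remainder.
    """
    test = total // 3
    return total - test, test


# Hypothetical word with 4 senses and 2 multiword expressions.
total = examples_per_word(senses=4, multiwords=2)
train, test = split_examples(total)
print(total, train, test)  # 149 100 49
```

For the tasks whose formula has no multiword term (75+15*#senses), the same function can be called with multiwords left at 0.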
Catalan lexical sample [8 teams]
We propose a "Lexical-Sample" task for Catalan in order to evaluate
supervised and semi-supervised learning systems for WSD. Each participant
will be provided with a relatively small set of labelled examples (two thirds
of 75+15*#senses) and a comparatively very large set of unlabelled
examples (ten times more, when possible) for around 45 words. The test set
will consist of one third of 75+15*#senses. We target
two types of participants: supervised systems (not using unlabelled data)
and semi-supervised systems (those exploiting the unlabelled data),
but unsupervised systems can also participate, of course. The sense inventory,
which is specially developed for the task, will be manually linked to
WordNet 1.6 (automatic links to WordNet 1.7 will also be provided).
This task will be coordinated with other lexical-sample tasks (Basque,
English, Italian, Romanian, Spanish) in order to share around 10 of the
target words.
Coordinator: Lluís Màrquez (lluism@lsi.upc.es)
Chinese lexical sample [16 teams]
The mainland Chinese lexical sample task will consist of three sets of data: dictionary, training data, and test data. The dictionary will contain entries for 20 different Chinese words. For each word, several senses will be defined based on the HowNet knowledge base. For each sense, the dictionary entry will list an id for the sense, a part of speech tag, a definition, and an English translation, as well as some additional information regarding the sense distinctions. Training data will consist of 20-100 examples per word, with more examples for words with a larger number of senses. Two sets of training data will be provided: one with part of speech tagging information included, and one without. A part of speech tagging system will also be provided. Evaluation data will consist of about half the number of examples in the training data.
Coordinators:
PengYuan Liu, pyliu@mtlab.hit.edu.cn
English lexical sample [65 teams]
The goal of this task is to create a framework for the evaluation of systems that perform Word Sense Disambiguation. The data will be collected via the Open Mind Word Expert (OMWE) interface. To ensure reliability, we collect at least two tags per item, and conduct inter-tagger agreement and replicability tests. Previous evaluations have demonstrated the high quality and usefulness of the OMWE data. By the time Senseval-3 takes place, we expect to have enough data for at least 150 ambiguous nouns, adjectives, verbs, and adverbs. Part of the test data will be created by lexicographers from the Department of Linguistics at UNT. Another part of the test data will be extracted from the sense-tagged corpus collected over the Web.
We will also provide sense maps to enable both fine-grained and coarse-grained evaluations.
It is anticipated that the English lexical sample task will also include a set of test items drawn from current Web pages.
Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Adam Kilgarriff, Adam.Kilgarriff@itri.brighton.ac.uk
Tim Chklovski, timc@mit.edu
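The description mentions collecting at least two tags per item and running inter-tagger agreement tests, without naming a specific measure. As a minimal sketch, assuming plain observed agreement between two taggers (the fraction of items on which both chose the same sense), the computation could look like this. The function name and the sense labels are hypothetical.

```python
def observed_agreement(tags_a: list[str], tags_b: list[str]) -> float:
    """Fraction of items on which two taggers assigned the same sense tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggers must tag the same items")
    if not tags_a:
        return 0.0
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical sense labels for four items tagged by two contributors.
tagger_a = ["s1", "s2", "s1", "s3"]
tagger_b = ["s1", "s2", "s2", "s3"]
print(observed_agreement(tagger_a, tagger_b))  # 0.75
```

Observed agreement does not correct for chance; a chance-corrected statistic may be preferable when the sense distribution is skewed.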
Italian lexical sample [11 teams]
We propose a "Lexical-Sample" task for Italian in order to evaluate
supervised and semi-supervised learning systems for WSD. Each
participant will be provided with a relatively small set of labelled
examples (two thirds of 75+15*#senses) and a comparatively very large
set of unlabelled examples (ten times more, when possible) for around
45 words. The test set will consist of one third of
75+15*#senses. We target two types of participants: supervised
systems (not using unlabelled data) and semi-supervised systems (those
exploiting the unlabelled data), but unsupervised systems can
also participate, of course. The sense inventory, which is specially
developed for the task, will be manually linked to WordNet 1.6
(automatic links to WordNet 1.7 will also be provided). This task will
be coordinated with other lexical-sample tasks (Basque, English,
Catalan, Romanian, Spanish) in order to share around 10 of the target
words.
Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)
Romanian lexical sample [8 teams]
We propose a lexical sample task for Senseval-3 that addresses the Romanian language. We will select about 50 words, covering all open class parts of speech, with various degrees of ambiguity, and for each such word collect a set of examples from a large Romanian corpus. The number of examples per word will be determined using the 15n+10m+75 formula used during Senseval-1 and Senseval-2 (n = number of senses, m = number of multi-word expressions). The senses and multi-word expressions for each ambiguous word will be taken from the new Romanian WordNet, or DEX (a widely recognized dictionary of the Romanian language). The data will be collected via the Open Mind Word Expert (Romanian edition). A comparatively very large set of unlabelled examples (ten times more, when possible) will also be provided. This task will be coordinated with other lexical-sample tasks (Basque, Catalan, English, Italian, Spanish) in order to share around 10 of the target words.
Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Vivi Nastase, vnastase@csi.uottawa.edu
Dan Tufis, tufis@racai.ro
Tim Chklovski, timc@mit.edu
Spanish lexical sample [18 teams]
We propose a "Lexical-Sample" task for Spanish in order to evaluate
supervised and semi-supervised learning systems for WSD. Each participant
will be provided with a relatively small set of labelled examples (two thirds
of 75+15*#senses) and a comparatively very large set of unlabelled
examples (ten times more, when possible) for around 45 words. The test set
will consist of one third of 75+15*#senses. We target
two types of participants: supervised systems (not using unlabelled data)
and semi-supervised systems (those exploiting the unlabelled data),
but unsupervised systems can also participate, of course. The sense inventory,
which is specially developed for the task, will be manually linked to
WordNet 1.6 (automatic links to WordNet 1.7 will also be provided).
This task will be coordinated with other lexical-sample tasks (Basque,
Catalan, English, Italian, Romanian) in order to share around 10 of
the target words.
Coordinator: Lluís Màrquez (lluism@lsi.upc.es)
Swedish lexical sample [4 teams]
A lexical sample task for Swedish, similar in spirit to the Swedish task organized for Senseval-2.
Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se
Automatic subcategorization acquisition [35 teams]
This task involves evaluating word sense disambiguation (WSD) systems in
the context of automatic subcategorization acquisition. The task will
be restricted to a set of 30 verbs. These are "hard" verbs: high in frequency
and with multiple senses. The participants will be given the list of verbs
in advance to allow a training phase (no training data will be made
available). We will provide the test corpus. This will contain around 1000
instances of each verb, which the participants will be expected to
annotate with WordNet 1.7.1 senses. After receiving the sense-annotated
data, we will map the detected WordNet senses to our senses, which are
based on broad Levin-style verb classes. We will feed the sense-annotated
data from each system to Anna Korhonen's subcategorization acquisition
software. The acquired frames will be evaluated against manually obtained
gold standard frames, which will yield a ranking of the WSD systems.
Coordinators:
Judita Preiss (Judita.Preiss@cl.cam.ac.uk)
Anna Korhonen (Anna.Korhonen@cl.cam.ac.uk)
http://www.cl.cam.ac.uk/users/jp233/senseval/index.html
Multilingual lexical sample [23 teams]
The goal of this task is to create a framework for the evaluation of systems that perform Machine Translation, with a focus on the translation of ambiguous words. The task will be very similar to the lexical sample task, except that rather than using the sense inventory from a dictionary we will follow the suggestion of Resnik and Yarowsky and use the translations of the target words into a second language as the "inventory". The contexts will be in English, and the tags for the target words will be their translations in a second language.
We plan to select words with various degrees of "interlingual ambiguity", to create a complete picture of the various problems that may appear in this task. At the moment, we plan on two language pairs, English-French and English-Hindi, with an estimated 50 ambiguous words per language pair. The data will be collected via the Open Mind Word Expert (bilingual edition).
Coordinators:
Ted Pedersen, tpederse@d.umn.edu
Amruta Purandare, pura0010@d.umn.edu
Rada Mihalcea, rada@cs.unt.edu
Tim Chklovski, timc@mit.edu
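To make the idea of using translations as the sense "inventory" concrete, here is a minimal sketch in which each English context is tagged with the French translation of the target word, and a system is scored by exact match against that tag. The record layout, the example sentences, and the exact-match scoring are assumptions for illustration only; the task description does not specify a data format or a scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One test item: an English context tagged with a translation."""
    target: str       # ambiguous English word
    context: str      # English sentence containing the target
    translation: str  # gold tag: translation of the target in this context


def accuracy(instances: list[Instance], predictions: list[str]) -> float:
    """Share of instances whose predicted translation equals the gold tag."""
    if len(instances) != len(predictions):
        raise ValueError("one prediction is required per instance")
    if not instances:
        return 0.0
    correct = sum(inst.translation == pred
                  for inst, pred in zip(instances, predictions))
    return correct / len(instances)


# Hypothetical English-French items for the word "bank".
gold = [
    Instance("bank", "She deposited the check at the bank.", "banque"),
    Instance("bank", "They had a picnic on the bank of the river.", "rive"),
]
print(accuracy(gold, ["banque", "banque"]))  # 0.5
```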
Word-Sense Disambiguation of WordNet Glosses [36 teams]
In preparation for WordNet 2.0 (George Miller et al.) and eXtended
WordNet (XWN, Dan Moldovan et al.), a large number of the WordNet
glosses are being hand-tagged. Each content word (noun, verb,
adjective, and adverb) is being labelled with its WordNet sense.
This manual effort is time-consuming and labor-intensive. The
Senseval-3 task is to perform this tagging automatically using a
selected set of the hand-tagged glosses as the test set, with the
hand-tagging also serving as the gold standard for evaluation. The task
will be performed as an "all-words" task, except that no context will be
provided. However, it is expected that participants will make use of
additional WordNet information (synset, the WordNet hierarchy, and other
WordNet relations) in their disambiguation.
Coordinator: Ken Litkowski (ken@clres.com)
Automatic Labeling of Semantic Roles [36 teams]
Word-sense disambiguation has frequently been criticized as a task in
search of a reason. Heretofore, the focus of disambiguation has been on
the sense inventory and has not examined the major reason why we would
have lexical knowledge bases: how meanings would be represented and
thus made available for use in natural language processing applications. An
important baseline study for automatic labelling of semantic roles
(following the FrameNet paradigm) has recently appeared in the
literature ("Automatic Labeling of Semantic Roles" by Daniel Gildea and
Daniel Jurafsky). The FrameNet project has put together a body of
hand-labeled data and this study has put together a set of suitable
metrics for evaluating the performance of an automatic system. The
proposed Senseval-3 task would call for the development of systems to
meet the same objectives as the Gildea and Jurafsky study. The data for
this task would be a sample of the FrameNet hand-annotated data.
Evaluation of systems would follow the metrics of the Gildea and
Jurafsky study.
Coordinator: Ken Litkowski (ken@clres.com)
Identification of Semantic Roles in Swedish [2 teams]
We propose to organize a task based on "semantic roles", using
labels such as "Agent", "Recipient", "Material", "Phenomenon",
"Location", etc. This type of semantic role annotation
requires syntactically tagged texts, which we are
willing to provide from our treebank for the task (thus potential
participants will use a uniform syntactic annotation).
Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se
Identification of Logic Forms in English [26 teams]
Automated reasoning is one major goal of humankind, but lately
little attention has been paid to the task of automatically creating
reliable logic forms. Natural language based representations are more
powerful when predicates are disambiguated. This task is complementary to
the mainstream task in Senseval. The goal is to transform English
sentences into a first-order logic notation. A predicate corresponds to
each content word, conjunction, and preposition, and arguments have
syntactic values. Guidelines and examples of logic forms will be provided
to participants. The performance of the systems will be evaluated at
sentence and predicate level, using precision and recall measures
determined against the gold standard, which will consist of logic forms
created by human annotators.
Coordinator: Vasile Rus, vasile@cs.iusb.edu
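The evaluation at predicate and sentence level can be sketched as follows. This is an illustration under stated assumptions, not the official scorer: each logic form is treated as a list of predicate strings, a predicted predicate counts as correct only if it matches a gold predicate of the same sentence exactly, and a sentence counts as correct only if its whole set of predicates matches. The predicate notation in the example is hypothetical and is not taken from the task guidelines.

```python
from collections import Counter

LogicForm = list[str]  # one sentence: a list of predicate strings


def predicate_precision_recall(gold: list[LogicForm],
                               predicted: list[LogicForm]) -> tuple[float, float]:
    """Predicate-level precision and recall, summed over all sentences."""
    correct = n_pred = n_gold = 0
    for g, p in zip(gold, predicted):
        # Multiset intersection: each gold predicate can be matched once.
        correct += sum((Counter(g) & Counter(p)).values())
        n_pred += len(p)
        n_gold += len(g)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


def sentence_accuracy(gold: list[LogicForm],
                      predicted: list[LogicForm]) -> float:
    """Share of sentences whose predicted predicates all match the gold ones."""
    if not gold:
        return 0.0
    exact = sum(Counter(g) == Counter(p) for g, p in zip(gold, predicted))
    return exact / len(gold)


# Hypothetical gold and system output for two sentences.
gold = [["dog(x1)", "bark(e1, x1)"], ["cat(x1)", "sleep(e1, x1)"]]
system = [["dog(x1)", "bark(e1, x1)"],
          ["cat(x1)", "sleep(e1, x2)", "extra(x3)"]]
print(predicate_precision_recall(gold, system))  # (0.6, 0.75)
print(sentence_accuracy(gold, system))           # 0.5
```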