SENSEVAL 3
Call for Interest in Participation

The call for interest in participation for Senseval-3 is now closed. The call was active for one month, and the response we received was quite impressive! A total of 90 different sites (teams) have expressed interest in Senseval-3. Most of the teams have signed up for more than one task, for a total of 367 team-tasks, with one "team-task" being defined as the interest in participation expressed by one team in a certain task. Check below for the number of teams who signed up for each individual task. (Note that this represents "expression of interest", and it is likely that some teams will not be able to participate in all the tasks they expressed interest in.)

Senseval-3 is scheduled to take place in March 2004, and the workshop is planned for July, hopefully in conjunction with ACL-04 in Barcelona. As with previous Senseval evaluations, participants will be provided with training data, and will have the chance to test their systems on common data sets. Similar tasks will use the same data format.

Note: Senseval-3 is still open to all. The call for participation will come out in February 2004. The purpose of the call for interest was to estimate the level of participation in Senseval-3.

Senseval-3 Tasks

English all words [64 teams]

As we did for Senseval-2, we will tag approximately 5000 words of coherent Penn Treebank text with WordNet 1.7 tags. We will tag all of the predicating words and the head words of their arguments, and as many adjectives and adverbs as we can. We will do double-blind tagging with adjudication.

Coordinator: Martha Palmer mpalmer@cis.upenn.edu


Italian all words [7 teams]

In addition to the lexical sample task, we propose an "all words" task for Italian. Each participant will be provided with a relatively small text set extracted from the Italian Treebank, consisting of about 5000 words. The sentences can be provided with POS tagging and syntactic dependency-based tagging (functional annotation). The content words (nouns, verbs, and adjectives) will be semantically tagged according to the sense repository of ItalWordNet.

Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)


Basque lexical sample [8 teams]

We propose a "Lexical-Sample" task for Basque in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75 + 15*#senses + 7*#multiwords) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 40 words. The test set will comprise the remaining third. We target two types of participants: supervised systems (not using the unlabelled data) and semi-supervised systems (taking advantage of the unlabelled data), but unsupervised systems can also participate, of course. The sense inventory will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7 will also be provided). This task will be coordinated with the other lexical-sample tasks (Catalan, English, Italian, Romanian, Spanish) in order to share around 10 of the target words.
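
A minimal sketch (in Python) of the example counts implied by the formula above; the exact rounding of the two-thirds split is an assumption, not something specified in the call. The coordinated Catalan, Italian, and Spanish tasks use the same formula without the multiword term.

    # Hypothetical sketch of the example-count formula shared by the
    # coordinated lexical-sample tasks; the rounding is an assumption.
    def labelled_counts(n_senses, n_multiwords=0):
        """Labelled examples for one target word, split 2/3 train, 1/3 test."""
        total = 75 + 15 * n_senses + 7 * n_multiwords
        train = round(total * 2 / 3)
        return train, total - train

    # e.g. a word with 4 senses and 1 multiword: 75 + 60 + 7 = 142 examples
    print(labelled_counts(4, 1))  # -> (95, 47)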

Coordinator: Eneko Agirre eneko@si.ehu.es


Catalan lexical sample [8 teams]

We propose a "Lexical-Sample" task for Catalan in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75 + 15*#senses) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 45 words. The test set will comprise the remaining third. We target two types of participants: supervised systems (not using the unlabelled data) and semi-supervised systems (taking advantage of the unlabelled data), but unsupervised systems can also participate, of course. The sense inventory, which is specially developed for the task, will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7 will also be provided). This task will be coordinated with the other lexical-sample tasks (Basque, English, Italian, Romanian, Spanish) in order to share around 10 of the target words.

Coordinator: Lluís Màrquez (lluism@lsi.upc.es)


Chinese lexical sample [16 teams]

The mainland Chinese lexical sample task will consist of three sets of data: a dictionary, training data, and test data. The dictionary will contain entries for 20 different Chinese words. For each word, several senses will be defined based on the HowNet knowledge base. For each sense, the dictionary entry will list an id for the sense, a part-of-speech tag, a definition, and an English translation, as well as some additional information regarding the sense distinctions. Training data will consist of 20-100 examples per word, with more examples for words with a larger number of senses. Two sets of training data will be provided: one with part-of-speech tagging information included, and one without. A part-of-speech tagging system will also be provided. Evaluation data will consist of about half the number of examples in the training data.
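
For illustration only, a sketch of how one dictionary entry could be laid out; the field names below are assumptions, not the released format:

    # Purely illustrative layout for one dictionary entry; field names
    # are assumptions. Senses are defined based on the HowNet knowledge base.
    entry = {
        "word": "...",                # one of the 20 target Chinese words
        "senses": [
            {
                "id": "w01_s1",       # sense identifier
                "pos": "N",           # part-of-speech tag
                "definition": "...",  # HowNet-based definition
                "translation": "...", # English translation
                "notes": "...",       # additional sense-distinction info
            },
        ],
    }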

Coordinator: PengYuan Liu, pyliu@mtlab.hit.edu.cn


English lexical sample [65 teams]

The goal of this task is to create a framework for the evaluation of systems that perform Word Sense Disambiguation. The data will be collected via the Open Mind Word Expert (OMWE) interface. To ensure reliability, we collect at least two tags per item and conduct inter-tagger agreement and replicability tests. Previous evaluations have demonstrated the high quality and usefulness of the OMWE data. By the time Senseval-3 takes place, we expect to have enough data for at least 150 ambiguous nouns, adjectives, verbs, and adverbs. Part of the test data will be created by lexicographers from the Department of Linguistics at UNT. Another part of the test data will be extracted from the sense-tagged corpus collected over the Web. We will also provide sense maps to enable both fine-grained and coarse-grained evaluations. It is anticipated that the English lexical sample task will also include a set of test items drawn from current Web pages.
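
As a minimal sketch of the inter-tagger agreement check mentioned above (observed agreement only; the actual OMWE reliability procedure may be more involved):

    # Minimal sketch: observed agreement over doubly-tagged items; the
    # actual OMWE reliability tests may use additional statistics.
    def agreement(tags_a, tags_b):
        """Fraction of items on which two taggers chose the same sense."""
        assert len(tags_a) == len(tags_b)
        return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)

    print(agreement(["bank_1", "bank_2", "bank_1"],
                    ["bank_1", "bank_2", "bank_2"]))  # -> 0.666...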

Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Adam Kilgarriff, Adam.Kilgarriff@itri.brighton.ac.uk
Tim Chklovski, timc@mit.edu


Italian lexical sample [11 teams]

We propose a "Lexical-Sample" task for Italian in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75 + 15*#senses) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 45 words. The test set will comprise the remaining third. We target two types of participants: supervised systems (not using the unlabelled data) and semi-supervised systems (taking advantage of the unlabelled data), but unsupervised systems can also participate, of course. The sense inventory, which is specially developed for the task, will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7 will also be provided). This task will be coordinated with the other lexical-sample tasks (Basque, English, Catalan, Romanian, Spanish) in order to share around 10 of the target words.

Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)


Romanian lexical sample [8 teams]

This is a lexical sample task for the Romanian language. We will select about 50 words, covering all open-class parts of speech and various degrees of ambiguity, and for each such word collect a set of examples from a large Romanian corpus. The number of examples per word will be determined using the 15n+10m+75 formula used during Senseval-1 and Senseval-2 (n = number of senses, m = number of multi-word expressions); for example, a word with four senses and two multi-word expressions would receive 15*4 + 10*2 + 75 = 155 examples. The senses and multi-word expressions for each ambiguous word will be taken from the new Romanian WordNet or from DEX (a widely recognized dictionary of the Romanian language). The data will be collected via the Open Mind Word Expert (Romanian edition). A comparatively very large set of unlabelled examples (ten times more, when possible) will also be provided. This task will be coordinated with the other lexical-sample tasks (Basque, Catalan, English, Italian, Spanish) in order to share around 10 of the target words.

Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Vivi Nastase, vnastase@csi.uottawa.edu
Dan Tufis, tufis@racai.ro
Tim Chklovski, timc@mit.edu


Spanish lexical sample [18 teams]

We propose a "Lexical-Sample" task for Spanish in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75 + 15*#senses) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 45 words. The test set will comprise the remaining third. We target two types of participants: supervised systems (not using the unlabelled data) and semi-supervised systems (taking advantage of the unlabelled data), but unsupervised systems can also participate, of course. The sense inventory, which is specially developed for the task, will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7 will also be provided). This task will be coordinated with the other lexical-sample tasks (Basque, Catalan, English, Italian, Romanian) in order to share around 10 of the target words.

Coordinator: Lluís Màrquez (lluism@lsi.upc.es)


Swedish lexical sample [4 teams]

A lexical sample task for Swedish, similar in spirit to the Swedish task organized for Senseval-2.

Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se


Automatic subcategorization acquisition [35 teams]

This task involves evaluating word sense disambiguation (WSD) systems in the context of automatic subcategorization acquisition. The task will be restricted to a set of 30 verbs. These are "hard" verbs: high in frequency and with multiple senses. The participants will be given the list of verbs in advance to allow a training phase (no training data will be made available). We will provide the test corpus, which will contain around 1000 instances of each verb; the participants will be expected to annotate these with WordNet 1.7.1 senses. After receiving the sense-annotated data, we will map the detected WordNet senses to our senses, which are based on broad Levin-style verb classes. We will then feed the sense-annotated data from each system to Anna Korhonen's subcategorization acquisition software. The acquired frames will be evaluated against manually obtained gold-standard frames, which will yield a ranking of the WSD systems.
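
A minimal sketch of the final evaluation step, assuming acquired and gold-standard subcategorization frames can be represented as (verb, frame) pairs; the actual scoring is done by the acquisition software's own evaluation tools:

    # Minimal sketch of precision/recall against gold-standard frames;
    # representing frames as (verb, frame) pairs is an assumption here.
    def prf(acquired, gold):
        tp = len(acquired & gold)
        p = tp / len(acquired) if acquired else 0.0
        r = tp / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    acquired = {("serve", "NP"), ("serve", "NP_NP"), ("serve", "PP")}
    gold = {("serve", "NP"), ("serve", "NP_PP"), ("serve", "PP")}
    print(prf(acquired, gold))  # -> (0.666..., 0.666..., 0.666...)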

Coordinators:
Judita Preiss (Judita.Preiss@cl.cam.ac.uk)
Anna Korhonen (Anna.Korhonen@cl.cam.ac.uk)

http://www.cl.cam.ac.uk/users/jp233/senseval/index.html


Multilingual lexical sample [23 teams]

The goal of this task is to create a framework for the evaluation of systems that perform Machine Translation, with a focus on the translation of ambiguous words. The task will be very similar to the lexical sample task, except that rather than using the sense inventory from a dictionary, we will follow the suggestion of Resnik and Yarowsky and use the translations of the target words into a second language as the "inventory". The contexts will be in English, and the tags for the target words will be their translations in a second language. We plan to select words with various degrees of "interlingual ambiguity", to create a complete picture of the various problems that may appear in this task. At the moment, we plan on two language pairs, English-French and English-Hindi, with about 50 ambiguous words per language pair. The data will be collected via the Open Mind Word Expert (bilingual edition).
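
To illustrate the idea with a standard textbook example (not taken from the task data): the French translation itself serves as the sense tag for an English target word.

    # Illustrative translation "sense inventory" (standard example, not
    # from the task data): the French translation acts as the sense tag.
    inventory = {
        "bank": {
            "banque": "financial institution",
            "rive": "sloping land beside a river",
        },
    }
    # Each English context containing "bank" would be tagged with
    # either "banque" or "rive".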

Coordinators:
Ted Pedersen, tpederse@d.umn.edu
Amruta Purandare, pura0010@d.umn.edu
Rada Mihalcea, rada@cs.unt.edu
Tim Chklovski, timc@mit.edu


Word-Sense Disambiguation of WordNet Glosses [36 teams]

In preparation for WordNet 2.0 (George Miller et al.) and eXtended WordNet (XWN, Dan Moldovan et al.), a large number of the WordNet glosses are being hand-tagged. Each content word (noun, verb, adjective, or adverb) is being labelled with its WordNet sense. This manual effort is time-consuming and labour-intensive. The Senseval-3 task is to perform this tagging automatically, using a selected set of the hand-tagged glosses as the test set, with the hand-tagging also serving as the gold standard for evaluation. The task will be performed as an "all-words" task, except that no surrounding context will be provided. However, it is expected that participants will make use of additional WordNet information (the synset, the WordNet hierarchy, and other WordNet relations) in their disambiguation.
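
A minimal Lesk-style baseline sketch for this setting (gloss-overlap disambiguation of a word occurring inside a host gloss); the use of NLTK's WordNet interface is an assumption for illustration, and any WordNet API would do:

    # Minimal Lesk-style baseline: pick the sense of `word` whose own
    # gloss overlaps most with the host gloss being disambiguated.
    # NLTK's WordNet interface is an assumption, not part of the task.
    from nltk.corpus import wordnet as wn

    def lesk_in_gloss(word, host_gloss_words):
        best, best_overlap = None, -1
        for sense in wn.synsets(word):
            overlap = len(set(sense.definition().split())
                          & set(host_gloss_words))
            if overlap > best_overlap:
                best, best_overlap = sense, overlap
        return best

    # Disambiguate "bank" occurring inside some other synset's gloss:
    print(lesk_in_gloss("bank", "a container for keeping money".split()))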

Coordinator: Ken Litkowski (ken@clres.com)


Automatic Labeling of Semantic Roles [36 teams]

Word-sense disambiguation has frequently been criticized as a task in search of a reason. Until now, the focus of disambiguation has been on the sense inventory, rather than on the major reason for having lexical knowledge bases: how meanings are represented and thus made available for use in natural language processing applications. An important baseline study for the automatic labelling of semantic roles (following the FrameNet paradigm) has recently appeared in the literature ("Automatic Labeling of Semantic Roles" by Daniel Gildea and Daniel Jurafsky). The FrameNet project has put together a body of hand-labeled data, and the Gildea and Jurafsky study has established a set of suitable metrics for evaluating the performance of an automatic system. The proposed Senseval-3 task calls for the development of systems to meet the same objectives as that study. The data for this task will be a sample of the FrameNet hand-annotated data, and evaluation of systems will follow the metrics of the Gildea and Jurafsky study.
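
For illustration, a constructed example in the FrameNet style (not drawn from the actual sample): systems would be scored on recovering the role label and span for each constituent of the frame-evoking word.

    # Constructed FrameNet-style annotation (illustrative only).
    sentence = "She handed the letter to her lawyer"
    annotation = {
        "target": "handed",  # frame-evoking word (Giving frame)
        "roles": [
            ("Donor", "She"),
            ("Theme", "the letter"),
            ("Recipient", "to her lawyer"),
        ],
    }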

Coordinator: Ken Litkowski (ken@clres.com)


Identification of Semantic Roles in Swedish [2 teams]

This task is organized around "semantic roles", using labels such as "Agent", "Recipient", "Material", "Phenomenon", "Location", etc. Semantic role annotation of this type requires syntactically tagged text, which we will provide from our treebank, so that all participants work from a uniform syntactic annotation.

Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se


Identification of Logic Forms in English [26 teams]

Automated reasoning is a long-standing goal, but little attention has so far been paid to the task of automatically creating reliable logic forms. Natural language based representations are more powerful when their predicates are disambiguated. This task is complementary to the mainstream tasks in Senseval. The goal is to transform English sentences into a first-order logic notation: a predicate corresponds to each content word, conjunction, and preposition, and its arguments take values determined by the syntax. Guidelines and examples of logic forms will be provided to participants. The performance of the systems will be evaluated at the sentence and predicate levels, using precision and recall measures computed against the gold standard, which will consist of logic forms created by human annotators.
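
For illustration, a constructed sentence and its target logic form in the usual notation (the official guidelines may differ in detail):

    # Constructed example of the target notation (details may differ
    # from the official guidelines): one predicate per content word,
    # with shared arguments expressing the syntactic relations.
    sentence = "The boy ate an apple"
    logic_form = "boy(x1) & eat(e1, x1, x2) & apple(x2)"
    # x1, x2 denote entities; e1 is the event introduced by the verb.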

Coordinator: Vasile Rus, vasile@cs.iusb.edu

