Current projects and research interests
Grammar engineering is the task of designing and implementing linguistically motivated electronic descriptions of natural language (so-called grammars). These grammars are expressed within well-defined theoretical frameworks and offer a fine-grained description of natural language. While grammars were first used to describe syntax, that is to say, the relations between constituents in a sentence, they often go beyond syntax and include, for instance, semantic information.
I have been working with colleagues (see e.g. (Crabbé et al., 2013), (Petitjean et al., 2016)) on the definition and implementation of description languages for grammar engineering, that is, formal languages that help linguists describe various dimensions of language (syntax, semantics, morphology). We are also interested in applying these description languages to the actual description of natural languages such as Ikota (Duchier et al., 2012) or Arabic (Ben Khelil et al., 2016).
Related projects include:
- eXtensible Meta-Grammar 2 (XMG2)
- Formal Grammars (Grammaires formelles), a collaboration between the CS and CL departments of the University of Orléans
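To give a flavour of what describing grammars through reusable, combinable fragments involves, here is a deliberately toy sketch in Python. It is neither XMG syntax nor its compiler: it assumes that a fragment is a flat map from node names to (category, parent) pairs, that identically named nodes in two fragments denote the same node, and it ignores precedence constraints and tree-description solving, which a real metagrammar compiler has to handle. All names (`merge`, `conj`, `disj` and the fragments) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# A (drastically simplified) tree fragment: node name -> (category, parent name).
# Nodes sharing a name across fragments are assumed to be the same node.
Fragment = Dict[str, Tuple[str, Optional[str]]]
# A class denotes a set of alternative fragments.
Class = List[Fragment]


def merge(a: Fragment, b: Fragment) -> Optional[Fragment]:
    """Combine two fragments; return None if they describe a shared node differently."""
    out = dict(a)
    for node, info in b.items():
        if node in out and out[node] != info:
            return None  # incompatible descriptions of the same node
        out[node] = info
    return out


def conj(*classes: Class) -> Class:
    """Conjunction: pick one alternative from each class and merge them all."""
    results: Class = []
    for combo in product(*classes):
        acc: Optional[Fragment] = {}
        for frag in combo:
            if acc is None:
                break  # an earlier merge already failed
            acc = merge(acc, frag)
        if acc is not None:
            results.append(acc)
    return results


def disj(*classes: Class) -> Class:
    """Disjunction: the union of the alternatives of each class."""
    return [frag for cls in classes for frag in cls]


# Hypothetical reusable fragments for a transitive verb family.
spine = [{"S": ("S", None), "VP": ("VP", "S"), "V": ("V", "VP")}]
canonical_subject = [{"S": ("S", None), "Subj": ("NP", "S")}]
clitic_subject = [{"S": ("S", None), "Subj": ("Cl", "S")}]
canonical_object = [{"VP": ("VP", "S"), "Obj": ("NP", "VP")}]

subject = disj(canonical_subject, clitic_subject)
transitive = conj(spine, subject, canonical_object)
```

Here `transitive` contains two descriptions, one with a nominal subject and one with a clitic subject, both sharing the same verbal spine and object fragment; conjoining `canonical_subject` with `clitic_subject` directly yields no description, because they disagree on the `Subj` node. Factoring shared structure this way is what makes hand-written grammars easier to maintain than listing every elementary tree separately.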
Parsing (also known as syntactic analysis) is the task of computing a representation of the relations between the words of a string. Parsing usually relies on a formal description of the language (a grammar) and produces a tree structure (a constituency tree or a dependency structure, depending on the framework one is working with).
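As a minimal illustration of a grammar-based parser producing a constituency tree, the sketch below implements the classic CKY algorithm for a toy context-free grammar in Chomsky normal form. It is much simpler than TAG parsing, and the lexicon, rules and function name are hypothetical; it also keeps only one tree per category and span, so it does not enumerate all analyses of an ambiguous sentence.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy context-free grammar in Chomsky normal form (hypothetical, for illustration).
LEXICON: Dict[str, Set[str]] = {"John": {"NP"}, "Mary": {"NP"}, "sees": {"V"}}
BINARY: Dict[Tuple[str, str], Set[str]] = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}


def cky(words: List[str], goal: str = "S") -> Optional[tuple]:
    """Return one constituency tree for words rooted in goal, or None if there is none."""
    n = len(words)
    # chart[i][j] maps a category to one tree covering words[i:j]
    chart: List[List[Dict[str, tuple]]] = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, set()):
            chart[i][i + 1][cat] = (cat, word)
    for span in range(2, n + 1):  # fill the chart bottom-up, by increasing span length
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two daughters
                for left, ltree in chart[i][k].items():
                    for right, rtree in chart[k][j].items():
                        for parent in BINARY.get((left, right), set()):
                            # keep only the first tree found for this category and span
                            chart[i][j].setdefault(parent, (parent, ltree, rtree))
    return chart[0][n].get(goal)
```

For instance, `cky(["John", "sees", "Mary"])` returns the nested tuple `("S", ("NP", "John"), ("VP", ("V", "sees"), ("NP", "Mary")))`, while `cky(["sees", "John"])` returns `None` since no `S` spans the whole input.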
I have worked on parsing natural language with mildly context-sensitive grammars, namely Tree-Adjoining Grammars (TAG). The objectives are manifold and include the following:
- enhancing practical TAG parsing, see e.g. (Gardent et al., 2014)
- performing semantic construction based on syntax, following Montague's legacy, see e.g. (Gardent and Parmentier, 2005)
I also developed, together with colleagues (Duchier et al., 2014), a parsing prototype for Property Grammars, a constraint-based grammar formalism. Property Grammars differ from generative formalisms insofar as they can describe the syntax of ungrammatical or partially grammatical utterances, thus providing a formal framework for grammaticality judgement.
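The toy sketch below illustrates the idea behind property-based grammaticality judgement: rather than simply accepting or rejecting a phrase, each property (here uniqueness, obligation and linearity; requirement and exclusion would be written the same way) is checked individually, and the set of violated properties characterises the input. The noun-phrase property set, the helper names and the satisfied-ratio score are hypothetical simplifications, not the model theory of (Duchier et al., 2014).

```python
from typing import Callable, List, Tuple

# A property is a (description, predicate over the daughter categories) pair.
Property = Tuple[str, Callable[[List[str]], bool]]


def uniqueness(cat: str) -> Property:
    # cat occurs at most once
    return (f"unique({cat})", lambda cats: cats.count(cat) <= 1)


def obligation(cat: str) -> Property:
    # cat must occur (typically the head of the phrase)
    return (f"oblig({cat})", lambda cats: cat in cats)


def linearity(a: str, b: str) -> Property:
    # every occurrence of a precedes every occurrence of b (vacuously true if one is absent)
    def check(cats: List[str]) -> bool:
        pos_a = [i for i, c in enumerate(cats) if c == a]
        pos_b = [i for i, c in enumerate(cats) if c == b]
        return all(i < j for i in pos_a for j in pos_b)
    return (f"{a} < {b}", check)


# Hypothetical toy properties for a noun phrase.
NP_PROPERTIES: List[Property] = [
    uniqueness("Det"),
    obligation("N"),
    linearity("Det", "N"),
    linearity("Adj", "N"),
]


def evaluate(cats: List[str], properties: List[Property]) -> Tuple[List[str], float]:
    """Return the violated properties and the proportion of satisfied ones."""
    violated = [name for name, check in properties if not check(cats)]
    return violated, 1 - len(violated) / len(properties)
```

With these definitions, `["Det", "Adj", "N"]` violates nothing, whereas `["N", "Det"]` is not simply rejected: it is described as violating only `Det < N`, which is the kind of fine-grained diagnosis a generative grammar does not provide.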
Multiword expressions (MWEs) are sequences of words with some unpredictable properties, such as "to count somebody in" (to rely on somebody) or "to take a haircut" (to suffer a loss). Processing such expressions is particularly difficult because of their highly heterogeneous behaviour at the lexical, syntactic and semantic levels.
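One facet of this heterogeneity is that the components of an MWE may be inflected and separated by other words, as in "to count somebody in". The sketch below assumes the sentence has already been lemmatised and uses a hypothetical `match_mwe` helper that searches, with backtracking, for the lexicalised components of an MWE in order, allowing at most `max_gap` intervening tokens between consecutive components.

```python
from typing import List, Optional


def match_mwe(lemmas: List[str], pattern: List[str], max_gap: int = 3) -> Optional[List[int]]:
    """Return the token indices of the first in-order occurrence of pattern in lemmas,
    with at most max_gap intervening tokens between consecutive components, or None."""

    def extend(prev: int, k: int) -> Optional[List[int]]:
        if k == len(pattern):
            return []  # all components found
        # candidate positions for component k, within the allowed gap after prev
        for j in range(prev + 1, min(len(lemmas), prev + 2 + max_gap)):
            if lemmas[j] == pattern[k]:
                rest = extend(j, k + 1)
                if rest is not None:
                    return [j] + rest
        return None  # no valid continuation: backtrack

    for start, lemma in enumerate(lemmas):
        if pattern and lemma == pattern[0]:
            rest = extend(start, 1)
            if rest is not None:
                return [start] + rest
    return None
```

For example, `match_mwe(["you", "can", "count", "me", "in"], ["count", "in"])` returns `[2, 4]`. However, the same call on the lemmas of "count the coins in the box" also returns a match (`[0, 3]`) although the reading is literal, which shows why surface matching alone is insufficient and why syntactic and semantic information matters for MWE processing.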
Starting with my participation in the PARSEME COST Action (Savary et al., 2015) led by Agata Savary, I have paid increasing attention to the representation of these expressions in linguistic resources and to their impact on parsing (see e.g. (Waszczuk et al., 2016)). Ongoing projects on this topic include:
- PARSEME-FR (ANR Project)
- Phraseology and Multiword Expressions series, to appear at Language Science Press
During my PhD (2003-2007) under the supervision of Claire Gardent, I worked on the semi-automatic generation of real-size Tree-Adjoining Grammars, which led to the development of the XMG and SemConst software tools. The former is a compiler for the XMG description language (a language for describing both syntactic trees, via reusable tree fragments, and flat semantic representations), and the latter is a semantic wrapper for the DyALog system.
During a post-doctoral visit to Laura Kallmeyer's group at the University of Tübingen in 2007-2008, I worked on a parsing architecture for mildly context-sensitive formalisms which uses Range Concatenation Grammar (RCG), a formalism that can represent all languages parsable in polynomial time, as a pivot formalism (Parmentier et al., 2008):
- Tuebingen Linguistic Parsing Architecture (TuLiPA)
- Anastasia Shimorina - PhD in Computer Science, co-supervision with Claire Gardent, Nancy, 2017 - ongoing.
- Chérifa Ben Khelil - PhD in Computer Science (cotutelle), co-supervision with Denys Duchier (Orléans) and Chiraz Zribi (Tunis), 2015 - ongoing.
- Jakub Waszczuk - PhD in Computer Science, thesis entitled Leveraging MWEs in practical TAG parsing: towards the best of the two worlds, co-supervision with Agata Savary, Blois, 2013 - 2017. Now post-doctoral research fellow at the University of Düsseldorf, Germany.
- Simon Petitjean - MSc in Computer Science, Orléans, 2010; PhD in Computer Science, thesis entitled Génération Modulaire de Grammaires Formelles (Modular Generation of Formal Grammars), co-supervision with Denys Duchier, Orléans, 2010 - 2014. Now research fellow at the University of Düsseldorf, Germany.
- Kilian Evang - BA in Computational Linguistics, Tübingen, 2008. Now post-doctoral research fellow at the University of Düsseldorf, Germany.
- Johannes Dellert - BA in Computational Linguistics, Tübingen, 2008. Now PhD candidate at the University of Tübingen, Germany.
- Brice Ambrosiak - Licence Mathématiques-Informatique (Undergraduate Studies in Computer Science), Nancy, 2007. Now software engineer at ARHS Cube, Luxembourg.
Chérifa Ben Khelil, Denys Duchier, Yannick Parmentier, Chiraz Zribi, and Fériel Ben Fraj. ArabTAG: from a Handcrafted to a Semi-automatically Generated TAG. In TAG+12: 12th International Workshop on Tree-Adjoining Grammars and Related Formalisms. Düsseldorf, Germany, 06 2016. URL: https://hal.archives-ouvertes.fr/hal-01320995.
Benoît Crabbé, Denys Duchier, Claire Gardent, Joseph Le Roux, and Yannick Parmentier. XMG: eXtensible MetaGrammar. Computational Linguistics, 39(3):591–629, 09 2013. URL: https://hal.archives-ouvertes.fr/hal-00768224.
Denys Duchier, Thi-Bich-Hanh Dao, and Yannick Parmentier. Model-Theory and Implementation of Property Grammars with Features. Journal of Logic and Computation, 24(2):491–509, 03 2014. URL: https://hal.archives-ouvertes.fr/hal-00782398, doi:10.1093/logcom/exs080.
Denys Duchier, Brunelle Magnana Ekoukou, Yannick Parmentier, Simon Petitjean, and Emmanuel Schang. Describing Morphologically-rich Languages using Metagrammars: a Look at Verbs in Ikota. In Workshop on "Language technology for normalisation of less-resourced languages", 8th SALTMIL Workshop on Minority Languages and the 4th workshop on African Language Technology, 55–60. Istanbul, Turkey, 05 2012. URL: https://hal.archives-ouvertes.fr/hal-00688643.
Claire Gardent and Yannick Parmentier. Large scale semantic construction for Tree Adjoining Grammar. In Philippe Blache, Edward Stabler, Joan Busquets, and Richard Moot, editors, Logical Aspects in Computational Linguistics - LACL'05, volume 3492 of Lecture Notes in Computer Science, 131–146. Springer, 2005. URL: https://hal.archives-ouvertes.fr/inria-00000251.
Claire Gardent, Yannick Parmentier, Guy Perrier, and Sylvain Schmitz. Lexical Disambiguation in LTAG using Left Context. In Zygmunt Vetulani and Joseph Mariani, editors, Human Language Technology. Challenges for Computer Science and Linguistics. 5th Language and Technology Conference, LTC 2011, Poznan, Poland, November 25-27, 2011, Revised Selected Papers, volume 8387 of Lecture Notes in Computer Science (LNCS) series / Lecture Notes in Artificial Intelligence (LNAI) subseries, 67–79. Springer, 07 2014. URL: https://hal.archives-ouvertes.fr/hal-00921246, doi:10.1007/978-3-319-08958-4_6.
Yannick Parmentier, Laura Kallmeyer, Timm Lichte, Wolfgang Maier, and Johannes Dellert. TuLiPA: A Syntax-Semantics Parsing Environment for Mildly Context-Sensitive Formalisms. In 9th International Workshop on Tree-Adjoining Grammar and Related Formalisms (TAG+9), 121–128. Tübingen, Germany, 06 2008. URL: https://hal.archives-ouvertes.fr/inria-00288429.
Simon Petitjean, Denys Duchier, and Yannick Parmentier. XMG2: Describing Description Languages. In Maxime Amblard, Philippe de Groote, Sylvain Pogodalla, and Christian Rétoré, editors, Logical Aspects of Computational Linguistics (LACL 2016), volume 10054 of Lecture Notes in Computer Science, 255–272. Nancy, France, 12 2016. Springer-Verlag. URL: https://hal.archives-ouvertes.fr/hal-01361316, doi:10.1007/978-3-662-53826-5_16.
Agata Savary, Manfred Sailer, Yannick Parmentier, Michael Rosner, Victoria Rosén, Adam Przepiórkowski, Cvetana Krstev, Veronika Vincze, Beata Wójtowicz, Gyri Smørdal Losnegaard, Carla Parra Escartín, Jakub Waszczuk, Mathieu Constant, Petya Osenova, and Federico Sangati. PARSEME – PARSing and Multiword Expressions within a European multilingual network. In 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC 2015). Poznań, Poland, 11 2015. URL: https://hal.archives-ouvertes.fr/hal-01223349.
Jakub Waszczuk, Agata Savary, and Yannick Parmentier. Promoting multiword expressions in A* TAG parsing. In 26th International Conference on Computational Linguistics (COLING 2016). Osaka, Japan, 12 2016. URL: https://hal.archives-ouvertes.fr/hal-01378903.