WebNLG Challenge 2017

Info

The WebNLG Challenge 2017 is over. It was held in April-October 2017.

General Information

Task

The WebNLG challenge consists of mapping data to text. The training data consists of data/text pairs where the data is a set of triples extracted from DBpedia and the text is a verbalisation of these triples. For instance, given the three DBpedia triples shown in (a), the aim is to generate a text such as (b).

a. (John_E_Blaha birthDate 1942_08_26) (John_E_Blaha birthPlace San_Antonio) (John_E_Blaha occupation Fighter_pilot)
b. John E Blaha, born in San Antonio on 1942-08-26, worked as a fighter pilot

As the example illustrates, the task involves specific NLG subtasks such as sentence segmentation (how to chunk the input data into sentences), lexicalisation (of the DBpedia properties), aggregation (how to avoid repetitions) and surface realisation (how to build a syntactically correct and natural sounding text).
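To make the task concrete, here is a toy, hypothetical template-based verbaliser for the triples in (a); it is not a challenge system, and it deliberately skips the subtasks listed above:

```python
# Toy illustration of the data-to-text task (hypothetical templates,
# not part of the challenge). Each triple is mapped to one sentence.
# Note the repeated subject and the raw date: a real system would
# aggregate the sentences and normalise "1942 08 26" to "1942-08-26".
triples = [
    ("John_E_Blaha", "birthDate", "1942_08_26"),
    ("John_E_Blaha", "birthPlace", "San_Antonio"),
    ("John_E_Blaha", "occupation", "Fighter_pilot"),
]
templates = {
    "birthDate": "{s} was born on {o}.",
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} worked as a {o}.",
}

def clean(entity):
    """Turn a DBpedia identifier into a surface form."""
    return entity.replace("_", " ")

text = " ".join(
    templates[p].format(s=clean(s), o=clean(o)) for s, p, o in triples
)
```

The gap between this naive output and text (b) is exactly what the segmentation, lexicalisation and aggregation subtasks have to bridge.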

Data

The WebNLG Challenge dataset consists of 21,855 data/text pairs with a total of 8,372 distinct data inputs. The inputs describe entities belonging to 9 distinct DBpedia categories, namely Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam and WrittenWork. The WebNLG data is licensed under the CC Attribution-Noncommercial-Share Alike 4.0 International license. For a more detailed description of the dataset, see here.

After the challenge had finished, a larger dataset was released, describing 15 DBpedia categories. The new categories include CelestialBody, MeanOfTransportation, City, Athlete, Politician and Artist.

References

Creating Training Corpora for NLG Micro-Planners. Claire Gardent, Anastasia Shimorina, Shashi Narayan and Laura Perez-Beltrachini. Proceedings of ACL 2017. PDF
Building RDF Content for Data-to-Text Generation. Laura Perez-Beltrachini, Rania Sayed and Claire Gardent. Proceedings of COLING 2016. Osaka (Japan). PDF
The WebNLG Challenge: Generating Text from DBpedia Data. Emilie Colin, Claire Gardent, Yassine Mrabet, Shashi Narayan and Laura Perez-Beltrachini. Proceedings of INLG 2016. PDF

To cite the dataset and/or challenge, use:

@inproceedings{gardent2017creating,
    title = "Creating Training Corpora for {NLG} Micro-Planners",
    author = "Gardent, Claire  and
      Shimorina, Anastasia  and
      Narayan, Shashi  and
      Perez-Beltrachini, Laura",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    doi = "10.18653/v1/P17-1017",
    pages = "179--188",
    url = "https://www.aclweb.org/anthology/P17-1017.pdf"
}

Important Dates

  • 14 April 2017: Release of Training and Development Data
  • 30 April 2017: Release of Baseline System
  • 18 August 2017: Release of Test Data
  • 18 August - 22 August 2017: Test data submission period
    • Fill in the form and retrieve data
    • Submit test data outputs at the latest 48 hours after download and no later than August 22nd.
  • 25 August 2017: Entry submission deadline (extended from 22 August)
  • 5 September 2017: WebNLG meeting at INLG 2017 and presentation of the results of the automatic evaluation
  • 30 October 2017: Results of human evaluation (moved from 30 September)

Organising Committee

  • Claire Gardent, CNRS/LORIA, Nancy, France
  • Anastasia Shimorina, CNRS/LORIA, Nancy, France
  • Shashi Narayan, School of Informatics, University of Edinburgh, UK
  • Laura Perez-Beltrachini, School of Informatics, University of Edinburgh, UK

Contacts

webnlg2017@inria.fr

Acknowledgments

The WebNLG challenge is funded by the WebNLG ANR Project.

Participation in the Challenge

If you plan to participate in the WebNLG challenge, the procedure is as follows. All requests should be sent to webnlg2017@inria.fr.

Registration

Please register using the following form.

Test Data

The test data will consist of around 1,700 meaning representations (sets of DBpedia triples), equally distributed in terms of size (1 to 7 triples) and divided into two halves. The first half will contain inputs from DBpedia categories seen in the training data (Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam, City, and WrittenWork); the second half will contain inputs extracted for entities belonging to 5 unseen categories.

Submitting Results

The results must be submitted to the organisers (by email to webnlg2017@inria.fr) within 48 hours of receiving the test data. To allow for a fair comparison, late submissions will be rejected.

In addition to system outputs, participants are requested to email the organisers (webnlg2017@inria.fr) a two-page description of their system. This description will be made available on the WebNLG challenge portal.

Data Format

  • Test Data

    Test data will be in the same format as training data (see documentation), but without <lex> sections. Each set of DBpedia triples has an ID.

    An example of the test data is here.

  • Submission Entry

    Your submission file must be in plain text, lowercased and tokenised. Multiple verbalisations per set of DBpedia triples are not allowed.

    An example of a submission file is here.

    Each line corresponds to a verbalisation of a DBpedia triple set. Line 1 must contain the verbalisation of the triple set with ID=1, line 2 the verbalisation of the triple set with ID=2, and so on.
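The required ID-based line ordering can be produced with a small script such as the following sketch (filenames and the `predictions` structure are illustrative, not part of the challenge tooling):

```python
# Hypothetical sketch: write one lowercased, tokenised verbalisation
# per line, ordered by triple-set ID (line 1 <-> ID=1, line 2 <-> ID=2).
predictions = {2: "second text .", 1: "first text ."}  # ID -> verbalisation

# Sort by ID so line numbers line up with triple-set IDs.
lines = [predictions[i] for i in sorted(predictions)]

with open("submission.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```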

Evaluation

Evaluation will proceed in two steps.

First, the results of automatic metrics (BLEU, TER, METEOR) will be provided. We will provide global and detailed results (per DBpedia category, per input size, per category and input size, etc.). These results will be presented at the INLG conference in Santiago de Compostela, Spain on September 5th.

Second, the results of a human evaluation will be provided. The human evaluation will seek to assess such criteria as fluency, grammaticality and appropriateness (does the text correctly verbalise the input?).

WebNLG Baseline System

For the WebNLG challenge, we provide a baseline system which can serve as a starting point for your experiments.

Scripts to reproduce our experiments are available on GitLab.

Preparing data

Linearisation, tokenisation, delexicalisation

  • Unpack the archive with the WebNLG dataset into a data-directory folder.
  • Run a preprocessing script.

    python3 webnlg_baseline_input.py -i <data-directory>
    

The script extracts tripleset-lexicalisation pairs, linearises triples, performs tokenisation and delexicalisation using the exact match, and writes source and target files.

After the preprocessing, an original tripleset-lexicalisation pair [1] is transformed into a pair of source and target sequences [2].

Original [1]

<modifiedtripleset>
    <mtriple>Indonesia | leaderName | Jusuf_Kalla</mtriple>
    <mtriple>Bakso | region | Indonesia</mtriple>
    <mtriple>Bakso | ingredient | Noodle</mtriple>
    <mtriple>Bakso | country | Indonesia</mtriple>
</modifiedtripleset>
<lex>
Bakso is a food containing noodles;it is found in Indonesia where Jusuf Kalla is the leader.
</lex>

Modified [2]

source files *.triple:

COUNTRY leaderName LEADERNAME FOOD region COUNTRY FOOD ingredient INGREDIENT FOOD country COUNTRY

target files *.lex:

FOOD is a food containing noodles ; it is found in COUNTRY where LEADERNAME is the leader .
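The transformation from [1] to [2] can be sketched as follows. This is hypothetical illustration code, not the actual webnlg_baseline_input.py; the real script also handles tokenisation and file output, and derives the placeholders from the entities' DBpedia categories:

```python
# Hypothetical sketch of linearisation + exact-match delexicalisation.
def delexicalise(mtriples, lex, placeholders):
    """mtriples: 'subject | property | object' strings as in the XML above;
    lex: the lexicalisation string;
    placeholders: entity -> placeholder (e.g. keyed by DBpedia category)."""
    source = []
    for m in mtriples:
        s, p, o = (t.strip() for t in m.split("|"))
        # Linearise: subject, property, object, with entities delexicalised.
        source += [placeholders.get(s, s), p, placeholders.get(o, o)]
    for entity, tag in placeholders.items():
        # Exact string match in the text, with underscores as spaces.
        lex = lex.replace(entity.replace("_", " "), tag)
    return " ".join(source), lex

src, tgt = delexicalise(
    ["Bakso | country | Indonesia"],
    "Bakso is found in Indonesia .",
    {"Bakso": "FOOD", "Indonesia": "COUNTRY"},
)
# src == "FOOD country COUNTRY"
# tgt == "FOOD is found in COUNTRY ."
```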

The script writes training and validation files which are used as input to neural generation, as well as reference files for evaluation.

Training a model and generating verbalisations

A simple sequence-to-sequence model with the attention mechanism was trained with the OpenNMT toolkit, using the default parameters for training and translating.

  1. Install OpenNMT.

  2. Navigate to the OpenNMT directory.

  3. Process data files and convert them to the OpenNMT format.

    th preprocess.lua \
    -train_src <data-directory>/train-webnlg-all-delex.triple \
    -train_tgt <data-directory>/train-webnlg-all-delex.lex \
    -valid_src <data-directory>/dev-webnlg-all-delex.triple \
    -valid_tgt <data-directory>/dev-webnlg-all-delex.lex \
    -src_seq_length 70 \
    -tgt_seq_length 70 \
    -save_data baseline
    

    A baseline-train.t7 file will be generated; it is used in the training phase.

  4. Train the model.

    th train.lua -data baseline-train.t7 -save_model baseline
    

    After training for 13 epochs, the script outputs the model file baseline_epoch13_*.t7. Training takes several hours on a GPU.

  5. Translate.

    th translate.lua -model baseline_epoch13_*.t7 -src <data-directory>/dev-webnlg-all-delex.triple -output baseline_predictions.txt
    

    The script generates the file baseline_predictions.txt.

Relexicalisation

  • Relexicalise data.

    python3 webnlg_relexicalise.py -i <data-directory> -f <OpenNMT-directory>/baseline_predictions.txt
    

    The script generates the file relexicalised_predictions.txt with the initial RDF subjects and objects.
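Conceptually, relexicalisation inverts the delexicalisation step by substituting the original entities back in for the placeholders. A minimal hypothetical sketch (not the actual webnlg_relexicalise.py, which recovers the mapping from the input data):

```python
# Hypothetical relexicalisation sketch: map placeholders in a predicted
# sequence back to the original DBpedia subjects/objects.
def relexicalise(prediction, mapping):
    """mapping: placeholder -> original entity (recorded at delex time)."""
    for tag, entity in mapping.items():
        prediction = prediction.replace(tag, entity.replace("_", " "))
    return prediction

out = relexicalise("FOOD is found in COUNTRY .",
                   {"FOOD": "Bakso", "COUNTRY": "Indonesia"})
# out == "Bakso is found in Indonesia ."
```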

Evaluating on a development set

  • BLEU-score

    Calculate BLEU on the development set. We use multi-bleu.pl from Moses SMT. (Note that the official script for MT evaluation is mteval-v13a.pl.)

    ./calculate_bleu_dev.sh
    

    BLEU = 54.03

    Additional note about BLEU scoring: multi-bleu.pl does not behave correctly when the number of references differs across test instances (e.g., one test instance has 3 references and another has 5), which is why the challenge evaluation was carried out with three references only.

    Consider using other scripts to calculate BLEU.
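For reference, the metric itself can be sketched in a few lines of pure Python. This is an illustration only, not the official scorer (mteval-v13a.pl), but unlike multi-bleu.pl it naturally accepts a different number of references per instance:

```python
# Minimal sentence-level BLEU sketch (clipped n-gram precisions,
# geometric mean, brevity penalty). Illustration only.
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, references, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        precisions.append(clipped / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty against the closest reference length.
    closest = min(references,
                  key=lambda r: (abs(len(r) - len(hypothesis)), len(r)))
    bp = (1.0 if len(hypothesis) >= len(closest)
          else math.exp(1 - len(closest) / len(hypothesis)))
    return bp * geo_mean
```

A hypothesis identical to a reference scores 1.0; a hypothesis sharing no n-grams with any reference scores 0.0.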

  • Prepare input files for other evaluation metrics.

    python3 metrics.py
    
  • METEOR

    Download and install METEOR.

    Navigate to the METEOR directory (cd meteor-1.5/).

    java -Xmx2G -jar meteor-1.5.jar <data-directory>/relexicalised_predictions.txt <data-directory>/all-notdelex-refs-meteor.txt -l en -norm -r 8
    

    METEOR = 0.39

  • TER

    Download and install TER.

    Navigate to the TER directory (cd tercom-0.7.25/).

    java -jar tercom.7.25.jar -h <data-directory>/relexicalised_predictions-ter.txt -r <data-directory>/all-notdelex-refs-ter.txt
    

    TER = 0.40

Challenge Results

Participant Submissions

Download .zip with all the submissions, teams' reports, and a baseline output; or download the same data per team.

Automatic Evaluation Results

  • The WebNLG Challenge: Generating Text from RDF Data. C. Gardent, A. Shimorina, S. Narayan, L. Perez-Beltrachini. Proceedings of INLG 2017. PDF
  • Scripts to reproduce results are here.
  • The Jupyter notebook with automatic results can be found here. (credit: Abelardo Vieira Mota)

Human Evaluation Results

  • The WebNLG Challenge: Report on Human Evaluation. A. Shimorina, C. Gardent, S. Narayan, L. Perez-Beltrachini. PDF-v1 PDF-v2 (corrected version)
  • Human scores, references and scripts are here.