Recipe Execution Benchmark | a benchmark for natural language understanding


This benchmark for recipe understanding in autonomous agents aims to advance natural language understanding by providing a setting in which performance can be measured on an everyday human activity: cooking. Demonstrating deep understanding of such an activity requires both linguistic and extralinguistic skills, including reasoning with domain knowledge. To this end, the benchmark provides a number of recipes written in natural (human) English that should be converted into procedural semantic networks of cooking operations, which can be interpreted and executed by autonomous agents. Also included is a system, supporting one-click installation and execution, that can perform recipe execution tasks in simulation, allowing predicted networks to be both analysed and evaluated. The provided evaluation metrics are mostly simulation-based, since deep understanding of a recipe is best demonstrated by effectively taking all the actions required for cooking the intended dish.
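To give a feel for what a procedural semantic network of cooking operations might look like, here is a minimal sketch in Python. It is not the benchmark's actual representation or API (the names `Operation`, `execute`, and the `simulator.apply` interface are all illustrative assumptions); it only shows the general idea of a recipe as a directed graph of operations whose outputs feed later steps.

```python
# Hypothetical sketch, NOT the benchmark's actual format: a recipe step as a
# node in a procedural semantic network. Each operation names an action,
# carries its arguments (ingredients, tools, settings), and points to the
# upstream operations that produce its inputs.
from dataclasses import dataclass, field


@dataclass
class Operation:
    name: str                  # e.g. "fetch", "transfer", "mix", "bake"
    arguments: dict            # ingredients, tools, quantities, settings
    inputs: list = field(default_factory=list)  # upstream operations


def execute(op, simulator):
    """Execute the network rooted at `op`: run all upstream operations
    first, then apply this operation to their products in simulation."""
    products = [execute(parent, simulator) for parent in op.inputs]
    return simulator.apply(op.name, op.arguments, products)


# A two-step fragment: fetch butter, then melt it.
fetch = Operation("fetch", {"ingredient": "butter", "quantity": "226 g"})
melt = Operation("melt", {"tool": "microwave"}, inputs=[fetch])
```

Under this sketch, evaluating a predicted network amounts to walking it in dependency order and letting a simulated kitchen apply each operation, which is what makes simulation-based metrics possible.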

The full benchmark is available both standalone and as part of the Babel toolkit. Both options provide the same benchmark functionality, but the Babel toolkit additionally allows the system to be extended.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 951846