08 March 2021

Understanding Everyday Activities

Robert Porzel, UHB.

If the proof of the pudding is in the eating, then the ultimate test of understanding an instruction is its proper execution. This view greatly expands the scope of natural language understanding beyond the usual syntactic and semantic analysis. In this part of the MUHAI project, we seek to operationalize the basic principles of human-centric AI so that machines will be able to understand how to perform everyday actions in the cooking domain. This involves moving away from executing fully explicit, standardised instructions towards understanding instructions conveyed through natural language dialogues. The key challenge is integrating world knowledge and pragmatic inferencing into the understanding process, both at the level of language processing and at the level of task execution. For example, a recipe does not explicitly mention that chopping a cucumber involves a cutting board and a knife, presupposes a specific orientation of the cucumber, and yields slices of a conventional thickness. Yet this knowledge is essential for carrying out the task and must therefore be inferred from common-sense knowledge. The build-up of knowledge that generalises across recipes and ingredients is also important, as it is a precondition for adapting existing recipes to given constraints and, ultimately, for the creative design of novel recipes.

To achieve these goals, we will define two kinds of benchmarks:

one that consists of mapping existing recipes formulated in natural language to actions executed in the VR world
one that allows us to evaluate a new recipe design or variant proposal

As in all parts of the MUHAI project, the notion of meaning-based, human-centric narratives also applies in the cooking domain. These narratives give meaning to collections of a virtual agent's experiences (such as object perceptions, body postures, force dynamics and visual processing) and to structured data collections (such as recipes, images and procedures). Building narratives requires integrating multimodal sources of input (text, image, sound) and detecting patterns within a model of constructional language processing. Constructions will serve as the basic representational unit in which all of these sources are combined. The outcome of constructional language processing is a semantic analysis, including the identification of goals, plans, actions, objects, time and causation. The resulting set of analyses makes up the starting point for narratives in the domain. These narratives can be integrated with the personal dynamic memory in order for the agent to truly understand them, in the sense that they can be mapped to a series of low-level actions that a simulated agent can then execute in the VR kitchen environment.

To demonstrate the potential of this approach, MUHAI will develop two applications for recipe execution and design:

Recipe execution - This application consists of executing recipes expressed in natural language in a VR kitchen environment. This requires mapping a recipe (i.e. a sequence of instructions) to a sequence of low-level actions to be executed. The application will involve constructional language processing, consultation of the personal dynamic memory for pragmatic inference, and planning the execution of the concrete cooking actions. The application will be evaluated on the benchmarks described above.
Recipe design - This application is situated in the domain of professional recipe design. In the first phase of the project, MUHAI will focus on the challenge of building a virtual agent that can act as an assistant chef. This digital assistant needs to integrate technical cooking knowledge with a considerable memory of prior recipes, previously successful and unsuccessful variants, cooking procedures and cultural context. Most importantly, it needs to do so in an explainable, transparent manner. In a second phase, the focus will shift to a more challenging task that embraces even more aspects of human-centric AI, namely the design of new recipes. This capacity goes beyond skill and knowledge and introduces creativity.

24 May 2024

