MUHAI Data Ecosystem

Abstract
The main objective of this miniproject is to identify a design solution for MUHAI’s data ecosystem and for the integration of its components. MUHAI is a project that focuses on human-machine understanding and cooperation, which it pursues by mimicking in machines how humans make sense of experiences. To this end, it requires a data ecosystem in which representations of changing environments, and the human worldviews associated with them, can be stored and coherently integrated. For example, people's perceptions of social inequalities are influenced not only by empirical evidence and time-invariant beliefs. They are also shaped by salient events, such as news about hate crimes against ethnic minorities, by (social) media statements, social norms, political debates and uncertainties, and by the collective interpretative frames and heuristics that people use to retrieve information and make sense of it.
Accordingly, since MUHAI’s artificial intelligence aims to understand and interact with humans on these complex issues, its data ecosystem must be able to represent, store and analyse such multifaceted meaning constructs and their dynamics. To do so, this miniproject explores how to model and store these constructs as transient narrative network structures (TNN): dynamically changing graphs that represent narrations of (causally) related events, and whose relations can be enriched with provenance metadata. We will evaluate the pros and cons of alternative storage and modelling solutions, such as labelled property graph (LPG) databases and RDF(-star) triple stores. Finally, we will explore how information from different sources of structured or unstructured knowledge can be integrated into MUHAI’s ecosystem by linking these sources through a data streaming platform such as Apache Kafka.
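To make the notion of a transient narrative network more concrete, here is a minimal Python sketch. It assumes that events are plain string identifiers and that each (causal) relation carries a hypothetical provenance record made of a source, a confidence score and a validity interval. The class names (`Provenance`, `Relation`, `NarrativeNetwork`), the `snapshot` method and the example events are illustrative assumptions, not part of MUHAI's actual design. The point is that the same graph yields a different narrative depending on the moment in time, and the confidence threshold, with which it is queried.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to a relation between two events."""
    source: str                      # who or what asserted the relation
    confidence: float                # degree of belief in the relation, in [0, 1]
    valid_from: date                 # start of the period in which the relation holds
    valid_to: Optional[date] = None  # end of that period; None means "still holds"

    def holds_at(self, when: date) -> bool:
        return self.valid_from <= when and (self.valid_to is None or when <= self.valid_to)


@dataclass(frozen=True)
class Relation:
    """A directed, labelled link between two events, e.g. 'A triggers B'."""
    cause: str
    label: str
    effect: str
    provenance: Provenance


@dataclass
class NarrativeNetwork:
    """A graph of events whose relations appear and disappear over time."""
    events: set[str] = field(default_factory=set)
    relations: list[Relation] = field(default_factory=list)

    def add_relation(self, rel: Relation) -> None:
        self.events.update((rel.cause, rel.effect))
        self.relations.append(rel)

    def snapshot(self, when: date, min_confidence: float = 0.0) -> list[tuple[str, str, str]]:
        """Return the (cause, label, effect) triples that hold at `when`
        and whose confidence is at least `min_confidence`."""
        return [
            (r.cause, r.label, r.effect)
            for r in self.relations
            if r.provenance.holds_at(when) and r.provenance.confidence >= min_confidence
        ]


# Illustrative data only: event names and sources are invented for this example.
tnn = NarrativeNetwork()
tnn.add_relation(Relation("hate_crime_reported", "triggers", "debate_on_inequality",
                          Provenance("news_article_1", 0.8, date(2021, 3, 1), date(2021, 6, 30))))
tnn.add_relation(Relation("debate_on_inequality", "increases", "perceived_inequality",
                          Provenance("social_media_post_7", 0.4, date(2021, 4, 1))))

print(tnn.snapshot(date(2021, 5, 1)))                      # both relations hold
print(tnn.snapshot(date(2021, 8, 1)))                      # only the second one still holds
print(tnn.snapshot(date(2021, 5, 1), min_confidence=0.5))  # only the first one is confident enough
```

Keeping provenance on the relations rather than on the events is what lets one graph carry several, time-dependent narratives. In a real deployment this filtering would be delegated to the chosen store (an LPG database or an RDF-star triple store), which is one of the trade-offs under evaluation.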

Status
Ongoing

Progress
To start, we analysed the existing literature on labelled property graphs (LPG), RDF triple stores and related knowledge representation applications, in order to appraise the main differences between these models and the associated database solutions in terms of features and data modelling constraints. RDF, which stands for Resource Description Framework, was developed by the W3C as a standard for the Semantic Web. One of its key features is the possibility to link web resources in a way that machines can understand, and to query any such RDF statement (a triple of the form <subject, predicate, object>) across the Web without the need for centralized control. However, RDF is an abstract knowledge representation model that does not differentiate data from metadata. In particular, it was not designed to attach provenance to individual statements, such as the confidence that a given statement is true, the time frame in which it holds, or the source according to which it is true. For this reason, we are currently exploring a recent extension of RDF called RDF-star. Besides offering all the features of RDF, RDF-star allows metadata about any triple to be stored natively. Additionally, we are evaluating how RDF-star datasets can be stored in, and queried through, existing graph database solutions such as Neo4J, without loss of information.
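The following Python sketch illustrates why RDF-star and labelled property graphs fit together. In Turtle-star, a statement such as `<< :article42 :reports :hateCrime7 >> :source "news outlet A" .` attaches a source to one specific triple, and the same information maps naturally onto a property of an LPG edge, in the spirit of the reconciliation discussed by Hartig (2014). The function `rdf_star_to_edges`, the example IRIs and the simplifications (one level of quoting, every object treated as a node, one value per property key) are illustrative assumptions. They describe neither MUHAI's implementation nor the behaviour of any Neo4J import tool.

```python
from dataclasses import dataclass, field
from typing import Union

# Terms are plain strings (prefixed names or literals). In RDF-star the subject of a
# statement may itself be a quoted triple; nesting is limited to one level here.
Triple = tuple[str, str, str]
Statement = tuple[Union[str, Triple], str, str]


@dataclass
class Edge:
    """A labelled property graph edge with key/value properties."""
    source: str
    label: str
    target: str
    properties: dict[str, str] = field(default_factory=dict)
    asserted: bool = True  # False if the triple was only quoted, never asserted


def rdf_star_to_edges(statements: list[Statement]) -> list[Edge]:
    """Map asserted triples to edges and statements about quoted triples to edge properties."""
    edges: dict[Triple, Edge] = {}
    annotations: list[tuple[Triple, str, str]] = []
    for s, p, o in statements:
        if isinstance(s, tuple):
            annotations.append((s, p, o))                # metadata about a quoted triple
        else:
            edges.setdefault((s, p, o), Edge(s, p, o))  # ordinary asserted triple
    for quoted, key, value in annotations:
        # RDF-star does not assert a quoted triple; keep that distinction as a flag
        # instead of silently turning the quoted triple into a fact.
        edge = edges.setdefault(quoted, Edge(*quoted, asserted=False))
        edge.properties[key] = value                     # later values overwrite earlier ones
    return list(edges.values())


statements: list[Statement] = [
    (":article42", ":reports", ":hateCrime7"),
    ((":article42", ":reports", ":hateCrime7"), ":source", '"news outlet A"'),
    ((":article42", ":reports", ":hateCrime7"), ":confidence", '"0.8"'),
    ((":hateCrime7", ":increases", ":perceivedInequality"), ":source", '"survey B"'),
]
for edge in rdf_star_to_edges(statements):
    print(edge)
```

The second triple is only quoted, so it becomes an edge flagged `asserted=False` rather than a plain fact. Two places where a naive mapping could lose information are visible here: the asserted/quoted distinction, and repeated annotations with the same key (for example, two sources for one triple), which a fuller mapping would store as list-valued properties.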

Outputs
Demo, presentations, tutorials and code samples.
References
Moreira, E. J. V. F., & Ramalho, J. C. (2020). SPARQLing Neo4J. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
Hartig, O. (2017). Foundations of RDF* and SPARQL* (An alternative approach to statement-level metadata in RDF). In J. Reutter & D. Srivastava (Eds.), AMW 2017: 11th Alberto Mendelzon International Workshop on Foundations of Data Management and the Web, Montevideo, Uruguay, June 7-9, 2017 (Vol. 1912).
Hartig, O. (2014). Reconciliation of RDF* and property graphs. arXiv preprint arXiv:1409.3288.
Angles, R., Thakkar, H., & Tomaszuk, D. (2019). RDF and property graphs interoperability: Status and issues. CEUR Workshop Proceedings, Vol. 2369.
Apache Kafka. Kafka Streams: Architecture (Kafka 2.5 documentation). https://kafka.apache.org/25/documentation/streams/architecture

Team
Maria M. Hedblom
Susanne Putze
Carlo R. M. A. Santagiustina
Lise Stork


Meaning and Understanding in Human-centric Artificial Intelligence

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 951846