The Optique platform is built on Ontology Based Data Access (OBDA), a technology for providing uniform access to data stored in heterogeneous sources. Uniform access is achieved through an ontology. Ontologies are data description frameworks, usually formulated in languages that are readily available to end users.
User Oriented Query Language
The ontology in an OBDA system is designed according to the user’s view of the domain of interest. The ontologies we use in Optique are built from concepts and roles. Necessary concepts and roles can be specified by the user, and these, along with the relationships between them, are stored in the ontology. The system then connects the terms in the ontology to the underlying sources. This process can be automated, but may require the input of a domain expert.
The following is an example of a simple ontology. It contains the concepts DairyProduct, Seafood, Product and Supplier, and the role suppliedBy. The ontology also contains information about how these concepts and roles are connected. For example, we may require that every Seafood is a Product. If you inspect an ontology, you will find statements that look like the following.
| DairyProduct ⊑ Product | Every DairyProduct is a Product |
| Seafood ⊑ Product | Every Seafood is a Product |
| Product ⊑ ∃suppliedBy.Supplier | Every Product is related to at least one Supplier through the role suppliedBy |
In natural language, the above statements would be “every dairy product is a product”, “every seafood product is a product”, and “every product is supplied by some supplier”. The symbols used are standard Description Logic notation. The symbol ⊑ expresses containment: every member of the concept on the left is also a member of the concept on the right, which gives the concepts a hierarchical structure. The symbol ∃ is an existential quantifier: ∃suppliedBy.Supplier describes everything that is related through the role suppliedBy to at least one Supplier.
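To make the notation concrete, the three statements can be written down as plain data. The following is a minimal sketch, not the representation Optique uses internally; the class names SubConceptOf and SomeValuesFrom and the helper functions are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubConceptOf:
    """Axiom `sub ⊑ sup`: every member of `sub` is a member of `sup`."""
    sub: str
    sup: str


@dataclass(frozen=True)
class SomeValuesFrom:
    """Axiom `sub ⊑ ∃role.filler`: every member of `sub` has a `role` link to some `filler`."""
    sub: str
    role: str
    filler: str


# The example ontology from the text.
ONTOLOGY = [
    SubConceptOf("DairyProduct", "Product"),
    SubConceptOf("Seafood", "Product"),
    SomeValuesFrom("Product", "suppliedBy", "Supplier"),
]


def superconcepts(concept, axioms):
    """Return every concept that contains `concept`, following ⊑ transitively."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for ax in axioms:
            if isinstance(ax, SubConceptOf) and ax.sub == current and ax.sup not in found:
                found.add(ax.sup)
                frontier.append(ax.sup)
    return found


def describe(ax):
    """Render an axiom in the natural-language form used in the text."""
    if isinstance(ax, SubConceptOf):
        return f"Every {ax.sub} is a {ax.sup}"
    return f"Every {ax.sub} is related to at least one {ax.filler} through the role {ax.role}"


for axiom in ONTOLOGY:
    print(describe(axiom))
```

Keeping the two kinds of axiom as separate types mirrors the two symbols: ⊑ alone builds the hierarchy, while ∃ adds a requirement about roles.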
Ontology design is a collaboration between domain experts and ontology designers. In the Optique platform, the process is streamlined with a bootstrapper that lays much of the groundwork for the ontology design.
A typical OBDA query language is descriptive in nature. That is, a query describes what the user wants, not how the system should answer the query. This allows the query language to be largely independent of the data sources, which in turn helps build uniform access to heterogeneous sources. If we want a list of all suppliers that supply dairy products, we can execute the query
q(y) ← DairyProduct(x), suppliedBy(x, y)
The q(y) part names the query and the desired output. Here, we only want to output the possible values of the variable y. The rest of the query says that the value of variable x must be some DairyProduct, and that x must be related to the value of the output variable y through suppliedBy. For example, if myCheeseFactory is a supplier, someCheddar is a dairy product, and someCheddar is supplied by myCheeseFactory, then our query q would return the answer myCheeseFactory (the value of y), but not someCheddar (the value of x).
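The behaviour of this query (return every y such that some dairy product x is supplied by y) can be sketched over a handful of facts. This is an illustration only: the tuple-based fact sets and the extra individuals someSalmon and myFishery are hypothetical, and a real OBDA system would answer the query against the data sources rather than an in-memory set.

```python
# Concept facts: (concept, individual). Sample data following the example in the text;
# someSalmon and myFishery are hypothetical additions.
CONCEPT_FACTS = {
    ("DairyProduct", "someCheddar"),
    ("Seafood", "someSalmon"),
    ("Supplier", "myCheeseFactory"),
    ("Supplier", "myFishery"),
}

# Role facts: (role, subject, object).
ROLE_FACTS = {
    ("suppliedBy", "someCheddar", "myCheeseFactory"),
    ("suppliedBy", "someSalmon", "myFishery"),
}


def q(concept_facts, role_facts):
    """Return all y such that DairyProduct(x) and suppliedBy(x, y) hold for some x."""
    return {
        y
        for (role, x, y) in role_facts
        if role == "suppliedBy" and ("DairyProduct", x) in concept_facts
    }


print(q(CONCEPT_FACTS, ROLE_FACTS))
```

Only the value of y is returned. The variable x is used to connect the two conditions but never appears in the answer, which is why someCheddar is not part of the result.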
The most common query language for ontologies is SPARQL. This language lies at the heart of the Optique platform’s Visual Query System (VQS). For querying over streams, the Optique platform uses the SPARQL-like language STARQL.
Data and Axioms
The ontologies used in Optique can be divided into two parts: data and axioms. The data consists of statements like
DairyProduct(someCheddar)
suppliedBy(someCheddar, myCheeseFactory)
These statements are specific to certain entities or objects. The axioms are general statements, like the ones above, repeated here.
DairyProduct ⊑ Product
Seafood ⊑ Product
Product ⊑ ∃suppliedBy.Supplier
Axioms are statements that apply to all entities or objects of a certain description. Combining the data and axioms above, we see that someCheddar is a Product. The axioms are what make up the ontology in the Optique platform. The data statements are generated on demand, using data from the data sources.
Open World Assumption and Reasoning
In OBDA, it is common to assume that data is incomplete. This means we do not draw conclusions from the absence of information. If it is not known that someCheddar is a Product, this does not mean that it is not a Product. A key feature in OBDA is the ability to extend our data using knowledge of the domain the data lives in. Above, we saw an example of this when we observed that someCheddar must be a Product, a fact that was not represented in the original data, but which followed from the axioms.
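This inference can be written as a rule:
DairyProduct(someCheddar)
DairyProduct ⊑ Product
──────────────────────────
Product(someCheddar)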
The statements above the line are assumed to be true, in which case the statement below the line must also be true.
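The same reasoning step can be sketched in code by repeatedly applying the containment axioms to the data until nothing new follows. This is a simplified illustration under stated assumptions: it handles only axioms of the form A ⊑ B (not the ∃ axiom), the function names are hypothetical, and it materialises the derived facts, whereas an OBDA system typically reaches the same answers by rewriting queries instead.

```python
# Axioms of the form sub ⊑ sup, written as (sub, sup) pairs.
INCLUSIONS = [("DairyProduct", "Product"), ("Seafood", "Product")]

# Data: (concept, individual) pairs.
DATA = {("DairyProduct", "someCheddar")}


def saturate(concept_facts, inclusions):
    """Apply every sub ⊑ sup axiom until no new facts appear."""
    facts = set(concept_facts)
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            for concept, individual in list(facts):
                if concept == sub and (sup, individual) not in facts:
                    facts.add((sup, individual))
                    changed = True
    return facts


def is_known(concept, individual, facts):
    """Open-world lookup: True if the fact follows, None if it is unknown. Never False."""
    return True if (concept, individual) in facts else None


entailed = saturate(DATA, INCLUSIONS)
print(is_known("Product", "someCheddar", entailed))   # follows from the axioms
print(is_known("Seafood", "someCheddar", entailed))   # unknown, not false
```

Returning None rather than False for a missing fact reflects the open world assumption: the absence of Seafood(someCheddar) in the data says nothing about whether it holds.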
Mappings
The mappings are what connect the Optique platform to the data sources. The mappings are built by IT experts with detailed knowledge of the data sources.
DairyProduct(productID) ↜ “SELECT productID FROM Supplier_Offerings”
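Mappings like the one above provide the system with ways of finding members of concepts such as Product and DairyProduct, in this case through SQL queries. When answering a query, the Optique platform uses the ontology and the mappings to determine how to query the data sources. This process is divided into two steps, known as rewriting and unfolding. In rewriting, the user's query is expanded using the axioms, so that a query for Product also asks for everything the ontology says is a Product. In unfolding, the terms of the rewritten query are replaced by the source queries given in the mappings. The ontology plays the major role in the rewriting, and the mappings play the major role in the unfolding.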
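The two steps can be sketched for a query that asks for all members of a single concept. This is a minimal illustration, not Optique's algorithm: it handles only axioms of the form A ⊑ B, the Seafood_Catalog table and its mapping are hypothetical, and an in-memory SQLite database stands in for the real data sources.

```python
import sqlite3

# Axioms of the form sub ⊑ sup, written as (sub, sup) pairs.
INCLUSIONS = [("DairyProduct", "Product"), ("Seafood", "Product")]

# Mappings from concepts to SQL. The first query is the one shown in the text;
# the Seafood_Catalog table is a hypothetical second table.
MAPPINGS = {
    "DairyProduct": "SELECT productID FROM Supplier_Offerings",
    "Seafood": "SELECT productID FROM Seafood_Catalog",
}


def rewrite(concept, inclusions):
    """Rewriting: expand a concept into itself plus all of its subconcepts."""
    result = [concept]
    for current in result:  # the list grows while we iterate over it
        for sub, sup in inclusions:
            if sup == current and sub not in result:
                result.append(sub)
    return result


def unfold(concepts, mappings):
    """Unfolding: replace each mapped concept by its SQL and union the parts."""
    parts = [mappings[c] for c in concepts if c in mappings]
    return " UNION ".join(parts)


def answer(concept, connection):
    """Answer the query 'all members of concept' against the source."""
    sql = unfold(rewrite(concept, INCLUSIONS), MAPPINGS)
    if not sql:  # no mapping reaches this concept
        return []
    return sorted(row[0] for row in connection.execute(sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier_Offerings (productID TEXT);
    CREATE TABLE Seafood_Catalog (productID TEXT);
    INSERT INTO Supplier_Offerings VALUES ('someCheddar');
    INSERT INTO Seafood_Catalog VALUES ('someSalmon');
""")
print(answer("Product", conn))
```

Note that Product has no mapping of its own in this sketch. Its members are found only because rewriting adds DairyProduct and Seafood to the query before unfolding turns them into SQL.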
Uniform Access to Heterogeneous Sources
Combined, the ontology and the mappings allow uniform access to heterogeneous sources. The Optique platform uses the mappings to allow the user to access data from multiple sources through a single interface: the user-oriented ontology. Using this ontology, the user describes the needed data, and the system determines and executes the necessary queries.
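As a rough sketch of what uniform access means in practice, the example below populates the single role suppliedBy from two sources with different schemas. Everything here is assumed for illustration: the two in-memory SQLite databases, their table and column names, and the helper functions are hypothetical stand-ins for real heterogeneous sources.

```python
import sqlite3

# Two hypothetical sources with different schemas.
dairy_db = sqlite3.connect(":memory:")
dairy_db.executescript("""
    CREATE TABLE offerings (product_id TEXT, supplier TEXT);
    INSERT INTO offerings VALUES ('someCheddar', 'myCheeseFactory');
""")

fish_db = sqlite3.connect(":memory:")
fish_db.executescript("""
    CREATE TABLE catch (item TEXT, vendor_name TEXT);
    INSERT INTO catch VALUES ('someSalmon', 'myFishery');
""")

# Each mapping populates suppliedBy(product, supplier) from one source.
SUPPLIED_BY_MAPPINGS = [
    (dairy_db, "SELECT product_id, supplier FROM offerings"),
    (fish_db, "SELECT item, vendor_name FROM catch"),
]


def supplied_by():
    """Return all suppliedBy(product, supplier) pairs across every source."""
    pairs = set()
    for connection, sql in SUPPLIED_BY_MAPPINGS:
        pairs.update(connection.execute(sql).fetchall())
    return pairs


def suppliers_of(product):
    """Query in ontology terms only; no table or column names are needed."""
    return {supplier for (p, supplier) in supplied_by() if p == product}


print(suppliers_of("someSalmon"))
```

The caller of suppliers_of never sees which database or table an answer came from. Differences between the sources are confined to the mappings.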