ico

Module: Exareme federation

You are here: Home » Training programme » Module: Exareme federation

Exareme

Exareme is a federation engine which can be used to combine multiple datasources, and it is the module responsible for federation in Optique. It is also a standalone component with its own website. This tutorial shows how to use Exareme to integrate data from two datasources.

The Northwind and Northstream databases

For this tutorial we will use two databases: Northwind (NW) and Northstream (NS). NW is owned by a company which buys and sells food products, while NS contains information about beverage production. Our goal is to integrate the two databases.

To run this tutorial on your local machine, you need to run the two databases and the Exareme server using Docker, and Protege with the Ontop plugin. The 7 setup steps will guide you through the setup process.

Setup 1: Run the Northwind and Northstream container

To host your databases, you need to install Docker and run one Docker container for each of the databases. For information on how to do this, visit the GitHub repositories of Northwind and Northstream. When the containers are set up, you will be able to access both databases by pointing at the Docker machine IP and using the credentials you provided.

Setup 2: Run the Exareme container

To host the Exareme server, you also need Docker. Go to the GitHub repositories of Exareme to get instructions on how to set up the container.

Setup 3: Install Protégé with Ontop

We will use Protégé with the Ontop plugin in this demo. The easiest way to get this setup, is by downloading the  Protégé and Ontop bundle directly. Make sure that you get Ontop version 1.18 or later, since owl:sameAs is not supported for older versions.

Setup 4: Add PostgreSQL and Exareme JDBC drivers to Protégé

Before we can access the database through Ontop, we need to install JDBC drivers for PostgreSQL and Exareme.

Download the .jar files, and add the drivers to Protégé by going to Preferences > JDBC Drivers. There you can upload each .jar file and provide the correct class names:

  • Exareme driver class name: madgik.exareme.jdbc.federated.AdpDriver
  • PostgreSQL driver class name: org.postgresql.Driver

 

screen-shot-2016-12-04-at-20-21-21

Setup 5: Load ontology, mappings and queries

Download the following zip file, which contains the ontology, mapping and queries we will use. Unzip the file, and open exareme-federation.owl in Protégé.

 

 

Setup 6: Add the Exareme data source

Go to the Ontop Mappings tab in Protégé and click on Datasource manager. We will now connect to our two databases through Exareme by using the following settings:

  • Connection URL: see below
  • Database Username: adp
  • Database Password: adp
  • Driver Class: madgik.exareme.jdbc.federated.AdpDriver

Click “Test Connection” to check that everything is working.

Exareme Connection URL:

The Exareme connection URL needs to contain information about where the Exareme server is located (EXA), and connection details to each of the databases we want to connect to (DB_i).

The complete Exareme connection string is put together like this:

CONNECTION_URL = EXA DB_1 DB_2 ... DB_n

 

where EXA and DB_i is defined as:

EXA = jdbc:fedadp:https://<EXAREME_IP>:<EXAREME_PORT>/tmp/
DB_i = -fedDB-<DB_ID>-next-<JDBC_CONNECTION_STRING>-next-<DRIVER_CLASS>-next-<USERNAME>-next-<PASSWORD>-next-<SCHEMA>

 

In our case we have two databases, so we need to find EXA, DB_1 and DB_2. An example is shown below:

EXA = jdbc:fedadp:https://192.168.99.100:9090/tmp/
DB_1 = -fedDB-northstream-next-jdbc:postgresql://192.168.99.100:5432/nw-next-org.postgresql.Driver-next-postgres-next-optique-next-public
DB_2 = -fedDB-northwind-next-jdbc:postgresql://192.168.99.100:5433/northwind-next-org.postgresql.Driver-next-postgres-next-optique-next-public

 

This example gives the final Exareme connection string:

CONNECTION_URL = jdbc:fedadp:https://192.168.99.100:9090/tmp/-fedDB-northstream-next-jdbc:postgresql://192.168.99.100:5432/nw-next-
org.postgresql.Driver-next-postgres-next-optique-next-public-fedDB-northwind-next-jdbc:postgresql://192.168.99.100:5433/northwind-next-
org.postgresql.Driver-next-postgres-next-optique-next-public

 

How to setup the Exareme datasource in Protégé.

Setup 7: Turn on the Ontop reasoner

The last thing we have to do before we can run queries over our data, is to enable the Ontop reasoner by clicking Reasoner > Start reasoner. Make sure that Ontop is selected.

reasoner

How to turn on the Ontop reasoner.

Now we have completed all the setup steps. The following tasks will show you how to run queries over the databases using Exareme.

 

Task 1: Get all NW products

In this task we will look at all the products from the NW table. Open and run Query1 from the Ontop SPARQL tab in Protege. This query asks for all instances of the NorthwindProduct class. We get 77 results, one row per product from the NW table.

Query 1:

SELECT DISTINCT * WHERE { 
 ?product rdf:type nw:Product.
 ?product nw:product-productid ?nwproductid.
 ?product nw:product-productname ?nwproductname.
}

Results of Query 1:

 

Task 2: Get all NS products

In Task 1 we looked at NW products, but there are also some products in the NS product table. These are mapped to the NorthstreamProduct class. To find these products, run Query2. The results show 3 products.

Query 2:

SELECT DISTINCT * WHERE { 
 ?product rdf:type ns:Product.
 ?product ns:product-productid ?nsproductid.
 ?product ns:product-productname ?nsproductname.
}

Results of Query 2:

query2-results

All the Northstream products, returned by Query 2.

Task 3: Equal products

The NS database and the NW database are developed by two different companies, but they both contain data about products. And in fact, some of the products are actually the same, but stored in different databases with different product ids. One example of such a product is “Chang”, which stored with id 2 in NW, and id P1002 in NS. In our set of mappings, products from NW are named using the following URI template:

nw:product-{NW_ID}

while NS products are given this template:

ns:product-{NS_ID}

To link similar products represented by two different URIs, we use the owl:sameAs predicate.

The NS database already contains a table called “sameas”, which contains information about which products exist in both NS and NW. This table has been hand made by one of the product experts working in NW. The mapping collection contains a mapping which turns this into owl:sameAs triples.

sameas-mapping

Mapping turning equal products into owl:sameAs triples.

This mapping will result in two virtual triples:

nw:product-2 owl:sameAs ns/product-P1002
nw:product-3 owl:sameAs ns/product-P1001

 

Ontop will not use owl:sameAs triples by default, so to enable owl:sameAs, go to Preferences > Ontop Reasoner and click “Enable reasoning with owl:sameAs from mappings”.

Now we can run Query 3, which asks for all instances which are both a NorthwindProduct and a NorthstreamProduct, in addition to their names and ids in both NW and NS.

Query 3:

SELECT DISTINCT * WHERE { 
 ?product rdf:type nw:Product.
 ?product rdf:type ns:Product.

 ?product nw:product-productid ?nwproductid.
 ?product nw:product-productname ?nwproductname.

 ?product ns:product-productid ?nsproductid.
 ?product ns:product-productname ?nsproductname.
}

 

Results of Query 3:

query3-results

Products appearing in both NW and NS, returned by Query 3. Since each of the two products has two URIs, they both appear in two rows.

 

The result table gives us four rows, but there are in fact just two products here. The two rows with Aniseed Syrup are representing the same product, but it is listed twice, since it has two possible URIs. One of the URIs are generated by NW from the NW id, while the other one is generated by NS from the NS id. The same is true for Chang.

owl:sameAs does not make any new URIs, but it makes sure that everything which is true about the first instance, also is true for the other one.

If we replace remove the column with the URIs, we will not see the URIs, which (because of DISTINCT), removes the duplicates.

query3-modified-result

Results after removing the URI from Query 3. The two products which appears in both NW and NS. URIs are not shown, to remove duplicate rows.

 

Task 4: Use the aligned Product

In the aligned ontology we have also created a superclass Product, and both NorthwindProduct and NorthstreamProduct are contained in this class. So when we ask for the aligned product, we ask for everything which is either a NorthwindProduct, or a NorthstreamProduct, or both.

Query 4.1:

SELECT DISTINCT * WHERE { 
 ?product rdf:type al:Product.
}

 

Results of Query 4.1:

query4-1-results

The 19 first aligned products, returned by Query 4.1.

In Query 4.1, we ask for all instances of Product. This returns in total 80 instances, which are the 77 products from NW plus the 3 products from NS. In fact, this query returns exactly the same rows as Query 1 and Query 2 combined. But we already know that two products are present in both databases, so, there are actually only 78 different product. The results are not wrong, it is just that products with two different URIs appear twice.

To only get products which appear in both databases, we can ask for those products which has both a NW id and a NS id. This is done in query 4.2, and the results are exactly the same as for Query 3.

Query 4.2:

SELECT DISTINCT * WHERE { 
 ?product rdf:type al:Product.

 ?product nw:product-productid ?nwproductid.
 ?product nw:product-productname ?nwproductname.

 ?product ns:product-productid ?nsproductid.
 ?product ns:product-productname ?nsproductname.
}

Results of Query 4.2:

query3-results

Products with data from both NW and NS, returned by Query 4.2.

 

 

Task 5: Get orders affected by faulty bottling equipment

Finally, in this task we will show a case where it is useful to link data from both the NW database and the NS database. Assume that one of the employees discovers that two of the NS bottling machines are configured incorrectly, resulting in bottles filled with too little liquid. This has been the case since they started the production of beverages, so faulty beverage products sold to Northwind, which again has sold them to other companies. Northwind wants to notify their customers about this, so they need to figure out which orders has been affected by this problem. The two faulty bottling machines has ids E0016 and E0020.

Query 5 captures all of the information needed to find the affected orders. It asks for all orders with an orderdetail mentioning a product which is produced by one of the faulty bottling machines.

Query 5:

SELECT DISTINCT ?orderid WHERE { 
 ?order nw:order-orderid ?orderid.
 ?orderdetail nw:orderdetail-belongsto ?order.
 ?orderdetail nw:orderdetail-containsinfoabout ?product.
 ?product rdf:type al:Product.
 ?equipment ns:productionequipment-produces ?product.
 ?equipment ns:productionequipment-equipmentid ?equipmentid.
 FILTER(?equipmentid="E0016"^^xsd:string || ?equipmentid="E0020"^^xsd:string).
}

 

Results of Query 5:

query5-results

Query 5 returns all the 21 orders affected by the faults on the bottling machines (E0016 and E0020). Northwind can now orient the relevant customers about the problem.

This final query shows how to seamlessly query semantically over the aligned ontology and the two underlying datasources.

 

Show Comments