EXAREME: IaaS clouds in Optique


Modern applications need to process large amounts of data. Such processing tasks are typically expressed using high-level APIs or languages and are transformed into data-intensive workflows, or simply, dataflows.

Infrastructure-as-a-Service (IaaS) clouds have become attractive platforms for large-scale dataflow processing. In the Optique project, the Ontology-Based Data Access (OBDA) paradigm offers end users homogeneous, declarative access to vast amounts of relational data (both static and streaming) that may be spread across multiple federated databases. User queries are transformed into complex SQL expressions that existing relational databases cannot handle efficiently. Exareme [1], with its IaaS architecture, fulfills the need to process complex dataflows cost-effectively in distributed OBDA systems.

With Exareme, our goal has been to develop an efficient and flexible system that elevates the computational model of clouds by i) enhancing the elasticity properties of IaaS clouds, ii) defining language abstractions that declaratively express data parallelism and complex computations using user-defined functions (UDFs), and iii) providing efficient execution of UDFs using Just-In-Time (JIT) tracing compilation techniques.

The unique features of Exareme are summarized as follows:

Elasticity: Exareme elevates elasticity to a first-class citizen in cloud computing. System components are specifically designed for elastic computing. Exareme dynamically manages the size of the allocated virtual infrastructure and offers multi-dimensional optimization, which exposes trade-offs between execution time and execution cost.
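The time/cost trade-off can be illustrated with a small Python sketch. It is not Exareme's optimizer: the cost model (perfectly divisible work, a coordination overhead that grows with the number of virtual machines, billing per started time quantum) and all names such as `estimate_plan` and `pareto_front` are assumptions made for illustration only.

```python
import math
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    vms: int        # number of virtual machines allocated
    time_s: float   # estimated completion time in seconds
    cost: float     # estimated monetary cost (arbitrary currency units)


def estimate_plan(vms: int, work_s: float, sync_s: float,
                  quantum_s: float, price_per_quantum: float) -> Plan:
    """Toy cost model (an assumption, not Exareme's): the work divides
    evenly across VMs, and each extra VM adds coordination overhead."""
    time_s = work_s / vms + sync_s * (vms - 1)
    # Every VM is billed for each started quantum of the run.
    quanta = math.ceil(time_s / quantum_s)
    return Plan(vms, time_s, vms * quanta * price_per_quantum)


def pareto_front(plans: list[Plan]) -> list[Plan]:
    """Keep the plans that no other plan beats on both time and cost."""
    front = []
    best_cost = math.inf
    # Walk from fastest to slowest; a slower plan is only worth keeping
    # if it is strictly cheaper than every faster plan seen so far.
    for plan in sorted(plans, key=lambda p: (p.time_s, p.cost)):
        if plan.cost < best_cost:
            front.append(plan)
            best_cost = plan.cost
    return front


def cheapest_within_deadline(plans: list[Plan],
                             deadline_s: float) -> Optional[Plan]:
    """Pick the cheapest plan that still meets the deadline, if any."""
    feasible = [p for p in plans if p.time_s <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda p: (p.cost, p.time_s))


# Hypothetical workload: one hour of sequential work, 10 s of overhead
# per extra VM, 60 s billing quanta at 1 unit per quantum.
candidates = [estimate_plan(n, 3600, 10, 60, 1) for n in (1, 2, 4, 8, 32, 64)]
front = pareto_front(candidates)
choice = cheapest_within_deadline(front, deadline_s=1000)
```

Under this toy model, adding machines shortens the run only up to a point, after which the coordination overhead makes a larger allocation both slower and more expensive, so that plan drops off the Pareto front. A user-supplied deadline or budget then selects one point on the front.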

Declarative processing: The system offers a high-level language (ExaQL) with appropriate syntax to declare data parallelism, which enables Exareme to scale automatically and choose the appropriate degree of parallelism in each case. ExaQL is based on SQL, enhanced with a syntax that makes it easy to write data pipelines.

Native UDF execution: Exareme natively supports UDFs with arbitrary user code. The engine blends the execution of UDFs with relational operators using JIT tracing compilation techniques. This greatly speeds up execution because it reduces context switches and, most importantly, because only the relevant execution traces are used, which allows the engine to perform runtime optimizations that are not possible when the query is precompiled.

In addition to the above, the system has the following features that are specifically tailored to the needs of the Optique platform:

  • Exareme incorporates state-of-the-art techniques for common subexpression identification in the SQL expressions resulting from the ontology mappings. The optimizer identifies intermediate results that could be used more than once in a complex OBDA query and makes a cost-based decision regarding the materialization and reuse of these results.
  • Exareme provides efficient real-time processing of stream data. When connected to the stream-temporal sub-module of the Optique platform, it permits efficient execution of multiple queries on sensor value streams, in a distributed manner, capitalizing on the scalability provided by the IaaS cloud.
  • Exareme enables processing of complex dataflows that span multiple connected databases. In the federated mode of execution, the optimizer makes intelligent decisions on how these dataflows should be federated across multiple systems and the IaaS cloud, reducing data movement and increasing parallelism.
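The cost-based reuse decision described in the first item of the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not Exareme's optimizer: query plans are modeled as nested tuples, subexpressions are matched by plain structural equality (a real optimizer would also need to recognize semantically equivalent expressions), and the cost figures and function names are hypothetical.

```python
from collections import Counter


def subexpressions(expr):
    """Yield every operator subtree of a plan written as nested tuples,
    e.g. ("join", ("scan", "a"), ("scan", "b"))."""
    yield expr
    for child in expr[1:]:
        # Non-tuple arguments (table names, predicates) are leaves.
        if isinstance(child, tuple):
            yield from subexpressions(child)


def shared_subexpressions(plans):
    """Return subtrees that occur more than once across the given plans,
    mapped to their number of occurrences."""
    counts = Counter(sub for plan in plans for sub in subexpressions(plan))
    return {sub: n for sub, n in counts.items() if n > 1}


def should_materialize(uses, compute_cost, write_cost, read_cost):
    """Materialize when computing once, storing, and re-reading is cheaper
    than recomputing the result for every use (hypothetical cost units)."""
    recompute_total = uses * compute_cost
    reuse_total = compute_cost + write_cost + uses * read_cost
    return reuse_total < recompute_total


# Two hypothetical queries that share the same join.
join_ab = ("join", ("scan", "a"), ("scan", "b"))
plans = [("filter", join_ab, "x > 1"), ("aggregate", join_ab, "count")]
shared = shared_subexpressions(plans)

expensive_join = should_materialize(shared[join_ab], 100, 20, 5)
cheap_join = should_materialize(shared[join_ab], 10, 20, 5)
```

With these numbers, the expensive join is worth materializing (130 cost units against 200 for recomputation), while the cheap one is not, because writing and re-reading the result costs more than computing it twice.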

The author, Yannis Ioannidis, is a Professor at the Department of Informatics and Telecommunications of the University of Athens (UoA) and the leader of the UoA Optique team.

[1] Exareme: the name is inspired, on the one hand, by hexareme, an ancient Greek type of warship with six rows of oars moving in a coordinated fashion to obtain great speed and agility, and, on the other hand, by the long-term goal of exascale data processing. Reference: http://www.exareme.org/.