Maximally exploiting available data is increasingly critical to competitiveness. Unfortunately, accessing the relevant data is becoming ever more difficult due to the explosion in the size and complexity of data sets.
Optique targets the key bottleneck limiting exploitation of “Big Data”:
- Massive amounts of data are accumulated, in real time and over decades.
- Accessing relevant parts of the data requires in-depth knowledge of the domain and of the organisation of the data repositories.
- Existing approaches limit data access to a restricted set of predefined queries.
Maximally exploiting data requires flexible access—engineers need to explore the data in ways not supported by current applications. This typically requires an IT-expert to:
- write special purpose queries; and
- optimize queries for efficient execution.
With this process, accessing the data can take several days. In data-intensive industries, engineers spend up to 80% of their time on data access problems. Apart from the enormous direct cost, freeing up expert time would lead to even greater value creation through deeper analysis and improved decision making.
Optique will bring about a paradigm shift for data access by
- providing a semantic end-to-end connection between users and data sources;
- enabling users to rapidly formulate intuitive queries using familiar vocabularies and conceptualisations;
- seamlessly integrating data spread across multiple distributed data sources, including streaming sources;
- exploiting massive parallelism for scalability far beyond traditional RDBMSs;
and thus reducing the turnaround time for information requests to minutes rather than days.
The Optique platform will use an ontology to capture (possibly multiple) user conceptualisations, and declarative mappings to transform user queries into complete, correct, and highly optimised queries over the data sources.
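The core idea of ontology-based data access can be illustrated with a minimal sketch: the user queries an ontology concept, and declarative mappings rewrite that conceptual query into SQL over the underlying sources. All names here (concepts, tables, columns, the `rewrite` function) are hypothetical illustrations, not the actual Optique platform API.

```python
# Illustrative OBDA sketch: ontology concepts are mapped to SQL views over
# (possibly several) underlying databases; user queries are phrased against
# the concepts and rewritten into SQL. All names are hypothetical.

# Declarative mappings: ontology concept -> SQL fragment exposing it.
MAPPINGS = {
    "WellboreInterval": "SELECT id, depth_top, depth_base FROM wellbore_intervals",
    "SensorReading": (
        "SELECT sensor_id, ts, value FROM readings_plant_a "
        "UNION ALL "
        "SELECT sensor_id, ts, value FROM readings_plant_b"
    ),
}

def rewrite(concept: str, condition: str = "") -> str:
    """Rewrite a query over an ontology concept into SQL over the sources."""
    if concept not in MAPPINGS:
        raise KeyError(f"no mapping for concept {concept!r}")
    sql = f"SELECT * FROM ({MAPPINGS[concept]}) AS {concept}"
    if condition:
        sql += f" WHERE {condition}"
    return sql

print(rewrite("SensorReading", "value > 100"))
```

The point of the sketch is that the user never sees the two plant databases or their schemas: the mapping hides the union, and the engineer's condition is expressed purely in the ontology's vocabulary.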
Optique brings a unique combination of technologies to bear on these Big Data challenges.
The Optique platform will be tested and evaluated on two large-scale case studies from the energy sector:
In the Siemens scenario, diagnosis engineers in service centres for power plants try to detect events in time-stamped sensor data. To operate their visualisation and trend-detection tools they need to query several TB of sensor data and several GB of event data, such as “alarm triggered at time T,” distributed across several databases. With daily growth of 30 GB, the total amount of raw data exceeds what can currently be recorded.
In the Statoil scenario, experts in geology and geophysics develop stratigraphic models of unexplored areas based on data acquired from previous operations at nearby locations. To feed data into their advanced visual-analytics tools they need to query a pool of more than 1,000 TB of relational data, structured according to several schemata with a total of more than 2,000 tables, distributed across several databases.
- Financed as a European Commission FP7 Integrated Project
- Total budget of about 14 million €
- Four-year duration, starting 1 November 2012
- Coordinated by University of Oslo, Norway
- 10 partners from Norway, the UK, Germany, Italy, and Greece