The ASAP project proposes a unified, open-source execution framework for scalable data analytics. Data analytics tools have become essential for harnessing the power of our era's data deluge. Current technologies are restrictive, as their efficacy is usually bound to a single data and compute model, often depending on proprietary systems. The main idea behind ASAP is that no single execution model is suitable for all types of tasks and no single data model (and store) is suitable for all types of data. The project makes the following innovative contributions:
(a) A general-purpose task-parallel programming model. The runtime will incorporate and advance the features of state-of-the-art task-parallel programming models, namely: (i) irregular general-purpose computations, (ii) resource elasticity, (iii) abstraction of synchronization, data transfer, locality and scheduling, (iv) the ability to handle large sets of irregular distributed data, and (v) fault tolerance.
(b) A modeling framework that continuously evaluates the cost, quality and performance of data and computational resources in order to decide on the most advantageous store, indexing and execution pattern available.
(c) A unique adaptation methodology that will enable the analytics expert to amend the task she has submitted at an initial or later stage.
(d) A state-of-the-art visualization engine that will enable the analytics expert to obtain, in real time, accurate and intuitive results for the analytics tasks she has initiated.
Two exemplary applications that showcase the ASAP technology in the areas of Web content and large-scale business analytics will be developed. The consortium -- led by the Foundation for Research & Technology -- is well-positioned to achieve its objectives by bringing together a team of leading researchers in data-management technologies. These are combined with active industrial and leading user organizations that offer expertise in the production-level domain of data analytics.
We participate in this project in a joint collaboration with Wind.
The increasing availability of large amounts of data and digital footprints has given rise to ambitious research challenges in many fields, ranging from medical research and the financial and commercial world to the monitoring of people and the environment.