Supporting User Steering In Large-Scale Workflows With Provenance Data

Renan Souza 1, 2, 3
3 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Computational Science and Engineering (CSE) workflows are large-scale, require High Performance Computing (HPC) execution, and share the exploratory nature of science. During a long run, which often lasts hours or days, users need to steer the workflow by dynamically analyzing and adapting it to improve the quality of the results or to reduce the execution time. To steer the workflow, however, users typically perform several interactions (called user steering actions), which need to be tracked. Otherwise, users find it harder to understand what needs to be steered and how, they may steer in a misleading way, it can be difficult to explain the results that followed from their actions, and it can be impossible to reproduce those results. This thesis addresses the problem by proposing an approach that defines the fundamental concepts of user steering actions, introduces the notion of provenance of steering actions, and provides a W3C PROV-compliant data diagram to model steering action data with provenance. The approach also presents system design principles for managing steering action data: capturing the actions, explicitly relating them to the rest of the workflow data, and storing these data efficiently. Two instances of the approach were designed and built: a lightweight tool to be plugged into parallel scripts, and a solution to be used within a Parallel Workflow Management System. These are the two typical ways to conduct CSE experiments in HPC. Using real use cases from the Oil and Gas industry, the experiments show that the proposed approach enables users to understand, at runtime, how their actions directly affect the workflow results, and that the system design principles were essential to keeping the overhead added to the HPC workflows negligible.
Document type :
Theses

Cited literature [135 references]

https://hal-lirmm.ccsd.cnrs.fr/tel-02418022
Contributor : Patrick Valduriez
Submitted on : Friday, January 17, 2020 - 3:42:51 PM
Last modification on : Sunday, January 26, 2020 - 4:57:06 PM

File

tese_tex_Jan16-02.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02418022, version 3

Citation

Renan Souza. Supporting User Steering In Large-Scale Workflows With Provenance Data. Databases [cs.DB]. UFRJ, Rio de Janeiro, 2019. English. ⟨tel-02418022v3⟩
