ElasticDL is a Kubernetes-native deep learning framework built on top of TensorFlow 2.0 that supports fault tolerance and elastic scheduling.
Elastic Scheduling and Fault Tolerance
Through its Kubernetes-native design, ElasticDL enables fault tolerance and works with Kubernetes' priority-based preemption to achieve elastic scheduling of deep learning tasks.
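The following is a minimal, hypothetical sketch of one common way a training job can survive preempted workers: a master splits the data into shard tasks, hands them out to workers, and puts any task held by a worker that disappears back into the queue for the surviving workers. The class and method names (`TaskDispatcher`, `get_task`, `report_done`, `recover_worker`) are illustrative assumptions, not ElasticDL's actual API.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# A task is a (start, end) range of record indices in the training data.
Task = Tuple[int, int]


class TaskDispatcher:
    """Hypothetical master-side dispatcher; not ElasticDL's real class."""

    def __init__(self, num_records: int, records_per_task: int) -> None:
        # Split the dataset into fixed-size shards waiting to be processed.
        self.todo: Deque[Task] = deque(
            (start, min(start + records_per_task, num_records))
            for start in range(0, num_records, records_per_task)
        )
        # Shards currently assigned, keyed by worker id.
        self.doing: Dict[str, List[Task]] = {}
        self.done: List[Task] = []

    def get_task(self, worker_id: str) -> Optional[Task]:
        """Hand the next pending shard to a worker, or None if nothing is left."""
        if not self.todo:
            return None
        task = self.todo.popleft()
        self.doing.setdefault(worker_id, []).append(task)
        return task

    def report_done(self, worker_id: str, task: Task) -> None:
        """Mark a shard as finished by the worker that held it."""
        self.doing[worker_id].remove(task)
        self.done.append(task)

    def recover_worker(self, worker_id: str) -> None:
        """Put a preempted worker's unfinished shards back at the front of the queue."""
        for task in reversed(self.doing.pop(worker_id, [])):
            self.todo.appendleft(task)

    def finished(self) -> bool:
        return not self.todo and not any(self.doing.values())


# Usage: worker-1 is preempted mid-task, and worker-0 picks up its shard.
dispatcher = TaskDispatcher(num_records=10, records_per_task=4)
first = dispatcher.get_task("worker-0")   # (0, 4)
dispatcher.get_task("worker-1")           # (4, 8), lost when worker-1 is preempted
dispatcher.recover_worker("worker-1")     # (4, 8) goes back into the queue
dispatcher.report_done("worker-0", first)
while (task := dispatcher.get_task("worker-0")) is not None:
    dispatcher.report_done("worker-0", task)
print(dispatcher.done)
```

Because work is tracked per shard rather than per worker, losing a worker only costs the shards it was holding, and the job can keep running with however many workers the cluster currently grants.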
TensorFlow 2.0 Eager Execution
A distributed deep learning framework needs access to each worker's local gradients before updating the model. Eager Execution lets ElasticDL obtain them without hacking into the graph execution process.
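As a minimal sketch of why eager mode helps, the snippet below uses `tf.GradientTape` to compute a worker's local gradients as ordinary, concrete tensors, which could then be inspected or sent elsewhere (for example, to a parameter server) before any update is applied. The toy linear model, the data, and the learning rate are assumptions for illustration; how ElasticDL actually transmits and aggregates gradients is not shown.

```python
import tensorflow as tf

# Toy model y = w * x with a single trainable weight (illustrative only).
w = tf.Variable(2.0)
x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([3.0, 6.0, 9.0])  # generated with w = 3.0


def local_gradients(variables, features, labels):
    """Run the forward pass eagerly and return gradients as concrete tensors."""
    with tf.GradientTape() as tape:
        predictions = variables[0] * features
        loss = tf.reduce_mean(tf.square(predictions - labels))
    return tape.gradient(loss, variables)


grads = local_gradients([w], x, y)
# In eager mode the gradient is already a value, not a symbolic graph node.
print(grads[0].numpy())  # about -9.333

# Once the (possibly aggregated) gradients are available, apply a plain SGD step.
learning_rate = 0.1
w.assign_sub(learning_rate * grads[0])
print(w.numpy())  # about 2.933
```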
Minimalist Interface
Given a model defined with the Keras API, you can train it with a single command:
elasticdl train --model_def=mnist_functional_api.custom_model --training_data_dir=/mnist/train --output=output
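For reference, here is a hedged sketch of what a module such as `mnist_functional_api.py` might contain, assuming `--model_def` names a module plus a function (`custom_model`) that builds a Keras model with the functional API. The layer choices are illustrative, and ElasticDL may expect further definitions in the same module (for example, a loss or an optimizer) that are not shown here.

```python
import tensorflow as tf


def custom_model():
    """Build a small MNIST classifier with the Keras functional API (illustrative)."""
    inputs = tf.keras.Input(shape=(28, 28), name="image")
    # Add a channel dimension so the convolution sees 28x28x1 images.
    x = tf.keras.layers.Reshape((28, 28, 1))(inputs)
    x = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = tf.keras.layers.Flatten()(x)
    # One logit per digit class.
    outputs = tf.keras.layers.Dense(10)(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
```

Under that assumption, the `mnist_functional_api.custom_model` value in the command above would point the trainer at this function.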
Integration with SQLFlow
ElasticDL will be integrated seamlessly with SQLFlow, so that a SQL statement can launch a distributed deep learning task on ElasticDL. For example:
SELECT * FROM employee LABEL income INTO my_elasticdl_model