ElasticDL is a Kubernetes-native deep learning framework built on top of TensorFlow 2.0 that supports fault tolerance and elastic scheduling.

Elastic Scheduling and Fault Tolerance

Through its Kubernetes-native design, ElasticDL provides fault tolerance and works with the priority-based preemption of Kubernetes to achieve elastic scheduling for deep learning tasks.
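One way to picture how a training job can survive preemption: a master keeps the pending data shards in a queue and hands them to whichever workers are alive, and a preempted worker's unfinished shard goes back into the queue. The sketch below is a hypothetical illustration of that idea only, not ElasticDL's actual implementation; `TaskQueue`, its methods, and the worker names are invented for the example.

```python
from collections import deque


class TaskQueue:
    """Hypothetical master-side queue of data shards (not ElasticDL's API)."""

    def __init__(self, num_shards):
        self.todo = deque(range(num_shards))  # shards waiting for a worker
        self.doing = {}                       # worker_id -> shard in progress
        self.done = []                        # shards fully processed

    def get(self, worker_id):
        """Hand the next pending shard to a worker, or None if none remain."""
        if not self.todo:
            return None
        shard = self.todo.popleft()
        self.doing[worker_id] = shard
        return shard

    def report_done(self, worker_id):
        """Mark the worker's current shard as finished."""
        self.done.append(self.doing.pop(worker_id))

    def recover(self, worker_id):
        """Return a preempted worker's unfinished shard to the queue."""
        shard = self.doing.pop(worker_id, None)
        if shard is not None:
            self.todo.append(shard)


def run_job(num_shards=4):
    """Simulate two workers, one of which is preempted mid-shard."""
    queue = TaskQueue(num_shards)
    queue.get("worker-0")          # worker-0 takes the first shard
    queue.get("worker-1")          # worker-1 takes the second shard
    queue.recover("worker-1")      # worker-1 is preempted; its shard is requeued
    queue.report_done("worker-0")
    # The surviving worker drains the rest, including the recovered shard.
    while queue.get("worker-0") is not None:
        queue.report_done("worker-0")
    return sorted(queue.done)


if __name__ == "__main__":
    print(run_job())
```

In this simulation every shard ends up processed even though one worker disappears partway through, which is the property that makes preemption tolerable for a training job.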

TensorFlow 2.0 Eager Execution

A distributed deep learning framework needs access to each worker's local gradients before it can update the model. Eager Execution lets ElasticDL obtain them without hacking into the graph execution process.
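As a minimal sketch of what this looks like in TensorFlow 2.x, the snippet below uses `tf.GradientTape` to compute gradients on one minibatch and get them back as ordinary tensors. The tiny model, the random data, and the `local_gradients` helper are assumptions made for illustration; this is not ElasticDL's own training loop.

```python
import tensorflow as tf

# A tiny Keras model standing in for the user's model (illustrative only).
inputs = tf.keras.Input(shape=(3,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)


def local_gradients(features, labels):
    """Compute the loss and gradients for one minibatch, eagerly."""
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)
        loss = tf.reduce_mean(tf.square(predictions - labels))
    # The gradients are plain tensors, available immediately in Python.
    return loss, tape.gradient(loss, model.trainable_variables)


features = tf.random.normal((8, 3))
labels = tf.random.normal((8, 1))
loss, grads = local_gradients(features, labels)

# In a distributed setting the gradients could be sent elsewhere for
# aggregation at this point; here they are simply applied locally.
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

Because the gradients are returned to Python before any update is applied, the code that decides what to do with them (aggregate, send, or apply) stays outside the model's computation.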

Minimalist Interface

Given a model defined with the Keras API, train it with a single command:

                    
elasticdl train --model_def=mnist_functional_api.custom_model --training_data_dir=/mnist/train --output=output
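The `--model_def` value above reads as a module name followed by a function name. As a rough sketch, a hypothetical `mnist_functional_api.py` could expose `custom_model` as below, using the Keras functional API. The layer choices and input shape are assumptions for illustration, and ElasticDL may expect further definitions in the module (such as a loss or an optimizer) that are not shown because the text does not specify them.

```python
import tensorflow as tf


def custom_model():
    """Build a small MNIST classifier with the Keras functional API."""
    inputs = tf.keras.Input(shape=(28, 28), name="image")
    x = tf.keras.layers.Reshape((28, 28, 1))(inputs)  # add a channel axis
    x = tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(10)(x)  # one logit per digit class
    return tf.keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
```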
                    
                  

Integration with SQLFlow

ElasticDL will be integrated seamlessly with SQLFlow, connecting SQL statements to distributed deep learning tasks:

                    
SELECT * FROM employee LABEL income INTO my_elasticdl_model