Paleo
An analytical model to estimate the scalability and performance of deep learning systems.
Choose a setup




Weak scaling: the effective batch size of SGD grows as the number of workers increases.
Strong scaling: equivalent to serial SGD on a single worker; each worker computes on (batch size / # workers) training examples.
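As a rough sketch (not the demo's actual code), the two data-parallel scaling strategies determine the per-worker and effective batch sizes as follows; the function name and interface here are illustrative assumptions:

```python
def batch_sizes(base_batch_size, num_workers, strategy):
    """Per-worker and effective SGD batch sizes under data parallelism.

    strategy: "weak" keeps the per-worker batch fixed, so the effective
    batch grows with the number of workers; "strong" splits the base
    batch across workers, matching serial SGD on a single worker.
    Returns (per_worker_batch, effective_batch).
    """
    if strategy == "weak":
        return base_batch_size, base_batch_size * num_workers
    if strategy == "strong":
        return base_batch_size // num_workers, base_batch_size
    raise ValueError("strategy must be 'weak' or 'strong'")

per_worker, effective = batch_sizes(128, 8, "strong")  # 16 per worker, 128 effective
```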
How does it work?
Please check out our paper for details:

Hang Qi, Evan R. Sparks, and Ameet Talwalkar.
Paleo: A Performance Model for Deep Neural Networks.
ICLR 2017.

1 The current live demo only supports data parallelism and a predefined set of models and devices. Features including customization and model parallelism will be available in later releases.

2 The current live demo does not support multiple GPUs on the same host. More flexibility will be added in later releases.

3 Cost is calculated based on $0.9/hour per GPU.

Paleo Estimation
Estimated Scalability

Speedup in throughput (images/sec) relative to one worker.
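A minimal sketch of how such a speedup curve is computed; the throughput numbers below are hypothetical placeholders, not Paleo's estimates:

```python
def speedup(throughputs):
    """Speedup in throughput (images/sec) relative to one worker.

    throughputs: dict mapping worker count -> estimated images/sec.
    Returns a dict mapping worker count -> speedup over one worker.
    """
    base = throughputs[1]
    return {k: v / base for k, v in sorted(throughputs.items())}

# Hypothetical estimates; the real numbers come from the Paleo model.
curve = speedup({1: 200.0, 2: 380.0, 4: 700.0, 8: 1200.0})
```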

Estimated Training Time

Total time of the forward pass, backward pass, and weight updates for the given number of epochs.
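Given per-batch time estimates for each phase (the parameter names here are illustrative assumptions), the total training time is the sum of those phases over all batches and epochs:

```python
def training_time(t_forward, t_backward, t_update, batches_per_epoch, epochs):
    """Total time in seconds for the forward pass, backward pass, and
    weight updates over the given number of epochs."""
    return (t_forward + t_backward + t_update) * batches_per_epoch * epochs

# e.g. 30 ms fwd + 60 ms bwd + 10 ms update per batch, 1000 batches/epoch, 90 epochs
hours = training_time(0.03, 0.06, 0.01, 1000, 90) / 3600
```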

Estimated Cost

Cost for running a fixed number of epochs. Only AWS EC2 P2 instances are supported for now.3
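Using the $0.9/hour per GPU rate from footnote 3, the cost estimate reduces to a simple product; this sketch is an assumption about the arithmetic, not the demo's code:

```python
GPU_HOURLY_RATE = 0.9  # $ per hour per GPU, per footnote 3

def training_cost(training_hours, num_gpus, hourly_rate=GPU_HOURLY_RATE):
    """Dollar cost of running the full set of epochs on num_gpus GPUs."""
    return training_hours * num_gpus * hourly_rate

# e.g. 2.5 hours of estimated training time on 8 GPUs
cost = training_cost(2.5, 8)
```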