Paleo
An analytical model to estimate the scalability and performance of deep learning systems.
Choose a setup




Weak scaling: the effective batch size of SGD grows as the number of workers increases.
Strong scaling: equivalent to serial SGD on a single worker; each worker computes on (batch size / # workers) training examples.
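As a rough sketch (not the demo's actual code), the two data-parallel scaling strategies determine the per-worker and effective batch sizes as follows; the function name and interface here are illustrative assumptions:

```python
def batch_sizes(base_batch_size, num_workers, strategy):
    """Per-worker and effective SGD batch sizes under data parallelism.

    strategy: "weak" keeps the per-worker batch fixed, so the effective
    batch grows with the number of workers; "strong" splits the base
    batch across workers, matching serial SGD on a single worker.
    Returns (per_worker_batch, effective_batch).
    """
    if strategy == "weak":
        return base_batch_size, base_batch_size * num_workers
    if strategy == "strong":
        return base_batch_size // num_workers, base_batch_size
    raise ValueError("strategy must be 'weak' or 'strong'")

per_worker, effective = batch_sizes(128, 8, "strong")  # 16 per worker, 128 effective
```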
How does it work?
Please check out our paper for details:

Hang Qi, Evan R. Sparks, and Ameet Talwalkar.
Paleo: A Performance Model for Deep Neural Networks.
ICLR 2017.

1 The current live demo only supports data parallelism and a predefined set of models and devices. Features including customization and model parallelism will be available in later releases.

2 The current live demo does not support multiple GPUs on the same host. More flexibility will be added in later releases.

3 Cost is calculated based on $0.9/hour per GPU.

Paleo Estimation
Estimated Scalability

Speedup in throughput (images/sec) relative to one worker.
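A minimal sketch of how such a speedup curve is computed; the throughput numbers below are hypothetical placeholders, not Paleo's estimates:

```python
def speedup(throughputs):
    """Speedup in throughput (images/sec) relative to one worker.

    throughputs: dict mapping worker count -> estimated images/sec.
    Returns a dict mapping worker count -> speedup over one worker.
    """
    base = throughputs[1]
    return {k: v / base for k, v in sorted(throughputs.items())}

# Hypothetical estimates; the real numbers come from the Paleo model.
curve = speedup({1: 200.0, 2: 380.0, 4: 700.0, 8: 1200.0})
```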

Estimated Training Time

Total time of the forward pass, backward pass, and weight updates for the given number of epochs.
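Given per-batch time estimates for each phase (the parameter names here are illustrative assumptions), the total training time is the sum of those phases over all batches and epochs:

```python
def training_time(t_forward, t_backward, t_update, batches_per_epoch, epochs):
    """Total time in seconds for the forward pass, backward pass, and
    weight updates over the given number of epochs."""
    return (t_forward + t_backward + t_update) * batches_per_epoch * epochs

# e.g. 30 ms fwd + 60 ms bwd + 10 ms update per batch, 1000 batches/epoch, 90 epochs
hours = training_time(0.03, 0.06, 0.01, 1000, 90) / 3600
```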

Estimated Cost

Cost for running a fixed number of epochs. Only AWS EC2 P2 instances are supported for now.3
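Using the $0.9/hour per GPU rate from footnote 3, the cost estimate reduces to a simple product; this sketch is an assumption about the arithmetic, not the demo's code:

```python
GPU_HOURLY_RATE = 0.9  # $ per hour per GPU, per footnote 3

def training_cost(training_hours, num_gpus, hourly_rate=GPU_HOURLY_RATE):
    """Dollar cost of running the full set of epochs on num_gpus GPUs."""
    return training_hours * num_gpus * hourly_rate

# e.g. 2.5 hours of estimated training time on 8 GPUs
cost = training_cost(2.5, 8)
```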