2/ Clouds are expensive 💵. SkyPilot reduces cloud bills by 3-6x.
Here's SkyPilot training BERT on WikiText-103 from @huggingface on spot instances *across* AWS & GCP. (Video attached above)
SkyPilot auto-recovers from spot preemptions, cutting the cost of this job by 3x.
3/ Save more 💵💵💵 by (1) auto-stopping idle clusters and (2) auto-picking the cheapest region/cloud.
Here's an example of SkyPilot shopping for the cheapest region/cloud for allocating 8x A100 GPUs.
Single-cloud users also save on intra-cloud price differences (see blog for details).
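To make the "shopping" idea concrete, here is a minimal sketch. It is not SkyPilot's actual optimizer; the `Offer` type, the `cheapest` helper, and every price below are hypothetical placeholders, not real quotes.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Offer(NamedTuple):
    cloud: str
    region: str
    hourly_usd: float


# Hypothetical placeholder prices for an 8x A100 VM; NOT real cloud quotes.
OFFERS = [
    Offer("aws", "us-east-1", 32.0),
    Offer("aws", "us-west-2", 33.5),
    Offer("gcp", "us-central1", 29.0),
    Offer("gcp", "europe-west4", 31.0),
]


def cheapest(offers: Iterable[Offer], clouds: Optional[Set[str]] = None) -> Offer:
    """Return the lowest-priced offer, optionally restricted to some clouds."""
    candidates = [o for o in offers if clouds is None or o.cloud in clouds]
    if not candidates:
        raise ValueError("no offers match the requested clouds")
    return min(candidates, key=lambda o: o.hourly_usd)
```

Calling `cheapest(OFFERS)` compares across clouds, while `cheapest(OFFERS, clouds={"aws"})` mirrors the single-cloud case, where only the price gap between regions is exploited.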
4/ Clouds are also getting hard to use. With SkyPilot, you can run ML/Data Science jobs on any cloud, with no code change.
SkyPilot simplifies the cloud infra heavy-lifting that ML/DS users don't want to deal with.
Running an existing project is simple:
5/ Reliably spin up GPU/TPU/CPU
Clouds often run out of capacity, so requests for VMs fail.
SkyPilot solves this by:
• checking all regions and clouds
• auto-failover across them
Reliably spin up a VM with 1 command:
$ sky gpunode
$ sky tpunode
$ sky cpunode
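The failover idea behind these commands can be sketched roughly as below. This is an illustration under stated assumptions, not SkyPilot's implementation: `CapacityError`, `launch_with_failover`, and `fake_provision` are hypothetical names, and the provisioner is a stand-in rather than a real cloud API call.

```python
class CapacityError(Exception):
    """Raised by the (hypothetical) provisioner when a region has no capacity."""


def launch_with_failover(candidates, provision):
    """Try each (cloud, region) pair in order and return the first success."""
    failures = []
    for cloud, region in candidates:
        try:
            return provision(cloud, region)
        except CapacityError as err:
            # Record the failure and move on to the next region/cloud.
            failures.append((cloud, region, str(err)))
    raise RuntimeError(f"all candidates failed: {failures}")


def fake_provision(cloud, region):
    # Stand-in for a real cloud call: pretend only GCP has capacity.
    if cloud != "gcp":
        raise CapacityError(f"no capacity in {cloud}/{region}")
    return f"vm@{cloud}/{region}"


vm = launch_with_failover(
    [("aws", "us-east-1"), ("aws", "us-west-2"), ("gcp", "us-central1")],
    fake_provision,
)
```

Ordering the candidate list by price would combine failover with the cheapest-first selection described earlier in the thread.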
SkyPilot is in active use by dozens of ML/data science users in 10+ orgs.
• Groups at @berkeley_ai and @stanfordnlp use SkyPilot for training on GPUs and TPUs
• @salkinstitute uses SkyPilot to cut costs by 80% for bio batch jobs on CPU instances
• Active collaborations with several companies
Start using the cloud easily and cost-effectively today!