Modal
Serverless GPU infrastructure platform for AI workloads. Run inference, training, and batch jobs with sub-second cold starts. Scale from zero to thousands of GPUs automatically. Pay-per-second pricing.

About Modal
Modal is a high-performance AI infrastructure platform that lets developers run compute-intensive workloads without managing servers. Founded in 2021 by Erik Bernhardsson and Akshat Bubna, Modal has grown to unicorn status, serving customers like Meta, Substack, and Ramp.
The platform abstracts away infrastructure complexity through a serverless model in which developers write Python functions with simple decorators. Modal handles containerization, GPU allocation, scaling, and scheduling automatically. The company built its entire stack from scratch, including a custom container runtime written in Rust, a distributed file system, and a multi-cloud scheduler, to achieve sub-second cold starts for GPU workloads.
Modal pools capacity across AWS, GCP, and Oracle Cloud Infrastructure, dynamically routing workloads to optimize for availability and cost. This architecture allows customers to burst from zero to hundreds of GPUs in seconds, then scale back to zero, paying only for the compute time they actually use.
Key Features
- Sub-Second Cold Starts: Memory snapshotting technology loads large models into GPU memory in seconds, not minutes.
- Multi-Cloud GPU Pool: Access to T4, A100, H100, and B200 GPUs across AWS, GCP, and Oracle Cloud without quotas or reservations.
- Autoscaling: Scale automatically from zero to thousands of GPUs based on demand, then back to zero when idle.
- Serverless Pricing: Pay only for compute time used with per-second billing and no minimum usage requirements.
- Custom Runtime: Built-from-scratch container runtime, file system, and scheduler optimized for AI workloads.
- Observability: Real-time dashboard with logs, metrics, and execution traces for debugging production workloads.
- Sandboxes: Spin up thousands of isolated secure environments for executing AI-generated or untrusted code.
- GPU Notebooks: Collaborative browser-based notebooks with serverless GPU backing and auto-idle shutdown.
- Programmable Infrastructure: Define compute, storage, and networking entirely in Python code with zero YAML configuration.
- Usage Controls: Workspace budgets, incremental billing thresholds, and programmatic billing APIs for cost management.
Pricing
Starter:
- $0/month
- Includes $30/month free compute credits
- Pay per second for CPU at $0.00003942/core/sec and memory at $0.00000672/GiB/sec
- GPU instances billed at T4 $0.000164/sec, A100 $0.00044/sec, H100 $0.001097/sec
- Requires payment method on file
Team:
- $250/month + usage
- Additional compute credits included
- Access to programmatic billing APIs, tagging for cost attribution, and granular billing reports
- Incremental billing with workspace budget controls
Enterprise:
- Custom pricing
- Custom invoicing, international bank transfers, split invoices, and committed spend options
- Transact through AWS and GCP marketplaces
- Dedicated support and SLAs
Pricing last updated: March 1, 2026 at 12:00 AM
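The per-second Starter rates listed above can be turned into a rough cost estimate. The sketch below is illustrative only: the function names are hypothetical, and it assumes GPU, CPU, and memory charges simply add together, which the listing does not state explicitly. Rates are copied from the listing and may be out of date.

```python
# Per-second rates in USD, copied from the Starter plan listing.
CPU_PER_CORE_SEC = 0.00003942
MEM_PER_GIB_SEC = 0.00000672
GPU_PER_SEC = {"T4": 0.000164, "A100": 0.00044, "H100": 0.001097}
FREE_CREDITS_PER_MONTH = 30.0


def estimate_cost(seconds, gpu=None, cpu_cores=0.0, memory_gib=0.0):
    """Estimate the cost in USD of one job running for `seconds`.

    Assumption: GPU, CPU, and memory are billed additively.
    """
    rate = CPU_PER_CORE_SEC * cpu_cores + MEM_PER_GIB_SEC * memory_gib
    if gpu is not None:
        rate += GPU_PER_SEC[gpu]
    return rate * seconds


def gpu_hours_covered(gpu, credits=FREE_CREDITS_PER_MONTH):
    """Hours of GPU-only time that a given credit amount would cover."""
    return credits / (GPU_PER_SEC[gpu] * 3600)


if __name__ == "__main__":
    one_hour = 3600
    print(f"H100 for one hour: ${estimate_cost(one_hour, gpu='H100'):.2f}")
    print(f"T4 hours covered by free credits: {gpu_hours_covered('T4'):.1f}")
```

At these rates, an hour of H100 time alone comes to roughly $3.95 before CPU and memory charges, so the $30 monthly credit covers only a few hours on the largest listed GPU but considerably more on a T4.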
Use Cases
- LLM inference and fine-tuning at scale
- Batch processing and media transcoding
- Computational biotech and protein folding
- AI agent sandboxing and code execution
- Real-time speech-to-text transcription
Pros & Cons
Pros:
- Sub-second cold starts eliminate waiting for GPU allocation
- True serverless model scales to zero when not in use
- No infrastructure configuration or YAML required
- Multi-cloud capacity provides better GPU availability
- Per-second billing with no minimum usage commitments
Cons:
- Primarily Python-focused, with limited native support for other languages
- Enterprise features require custom sales engagement
Integrations
Python, JavaScript, TypeScript, Go, AWS Marketplace, GCP Marketplace, OpenTelemetry, Stripe Billing
Last edited: March 1, 2026 at 7:07 AM by Venkatraman C
