Replicate
Run open-source ML models via API — image, video, audio, LLMs, no infra needed.
Replicate is an API for running open-source ML models in the cloud with no infrastructure to manage. You call the API with inputs and get back outputs: image generation (Flux, Stable Diffusion), video, audio, language models, and more, all on demand. Developers use it to build apps that need model inference without provisioning their own GPUs, and for one-off runs of models too large or complex to self-host. The web UI lets you try any model instantly before writing a line of code.
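The sketch below shows what "call the API with inputs" looks like over plain HTTP, using only Python's standard library. It builds the request that creates a prediction but does not send it. The endpoint paths, the Bearer token header, and the `Prefer: wait` header reflect Replicate's HTTP API documentation as we understand it, so check them against the current reference. `Prefer: wait` asks the API to respond once the output is ready. The function name `build_prediction_request`, the example model, and the token are illustrative assumptions.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(
    model: str, inputs: dict, token: str, wait: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction.

    `model` is either "owner/name" (an official model, run at its latest
    version) or "owner/name:version" (a model pinned to a version id).
    """
    if ":" in model:
        # Version-pinned call: the version id goes in the JSON body.
        _, version = model.split(":", 1)
        url = f"{API_BASE}/predictions"
        payload = {"version": version, "input": inputs}
    else:
        # Official-model call: the model name goes in the URL path.
        url = f"{API_BASE}/models/{model}/predictions"
        payload = {"input": inputs}

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Ask the API to hold the response until the output is ready
        # (assumed semantics of the "Prefer: wait" header).
        headers["Prefer"] = "wait"

    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_prediction_request(
    "black-forest-labs/flux-schnell",  # example model name
    {"prompt": "a watercolor fox"},
    token="r8_example_token",  # placeholder; read a real token from the environment
)
print(req.full_url)
```

Sending the request takes one more line, `urllib.request.urlopen(req)`. Its JSON response describes the prediction, including its status and, once finished, its output. Replicate's official Python client wraps the whole exchange in a single `replicate.run(...)` call.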
Billing is usage-based and charged per second of compute. Popular image models cost fractions of a cent per run, and no subscription is required. Replicate is most often compared to fal.ai and Modal for serverless ML inference. Replicate's edge is its enormous catalogue of community-hosted models and an interface that suits developers without infrastructure experience. fal.ai tends to be faster for media generation, and Modal gives more control over custom deployments.
| Made by | Replicate |
|---|---|
| Pricing | Usage-based (per second of compute, from ~$0.0002/sec) |
| Best for | Serverless model inference, image and video generation APIs, developer prototyping |
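The following Python estimate makes "fractions of a cent per run" concrete. It assumes a flat rate equal to the ~$0.0002/sec starting price in the table and a hypothetical 2-second run. Actual rates depend on the hardware a model runs on, so treat both numbers as placeholders. `Decimal` keeps the currency arithmetic exact, and the function names are illustrative.

```python
from decimal import Decimal

# Hypothetical flat rate taken from the "~$0.0002/sec" pricing row;
# real per-second rates vary with the hardware a model runs on.
ASSUMED_USD_PER_SECOND = Decimal("0.0002")


def run_cost(seconds: Decimal, usd_per_second: Decimal = ASSUMED_USD_PER_SECOND) -> Decimal:
    """Cost in USD of one prediction billed per second of compute."""
    if seconds < 0 or usd_per_second < 0:
        raise ValueError("seconds and rate must be non-negative")
    return seconds * usd_per_second


def runs_per_dollar(seconds: Decimal, usd_per_second: Decimal = ASSUMED_USD_PER_SECOND) -> int:
    """Number of whole runs of this length that one dollar covers."""
    cost = run_cost(seconds, usd_per_second)
    if cost == 0:
        raise ValueError("a zero-cost run has no meaningful per-dollar count")
    return int(Decimal(1) // cost)


# A hypothetical 2-second image generation at the assumed rate:
print(run_cost(Decimal("2")))         # prints 0.0004 (USD), i.e. 0.04 cents
print(runs_per_dollar(Decimal("2")))  # prints 2500
```

At these assumed numbers a run costs $0.0004, or 0.04 cents, so one dollar covers 2,500 runs.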
Alternatives to Replicate
- Hugging Face: The GitHub of AI. Browse, download, and deploy 500,000+ open-source models and datasets.
- OpenRouter: Single API for 200+ LLMs. Route between Claude, GPT, Gemini, Llama, and more.
- Together AI: Fast inference API for open-source models (Llama, Mixtral, Flux) at low cost.
- Groq: Extremely fast LLM inference on custom LPU hardware.
- Fireworks AI: Fast, low-cost inference and fine-tuning for open models.
- fal.ai: Fast generative media inference (image, video, and audio APIs).