Heroku Managed Inference and Agents

Heroku AI provides access to to top models and built-in tools for agents.

Managed Inference

Managed Inference and Agents simplifies AI integration by providing access to powerful foundation models, including text, embedding, and diffusion models. Easily attach model resources to your Heroku app, and the add-on will automatically configure environment variables, enabling seamless API calls. Invoke models using the CLI plug-in or with API endpoints.

Agents

Extend Agents with tools that allow Large Language Models (LLMs) to execute actions within Heroku’s trusted environment. Deploy autonomous agents that can call APIs, run code, or interact with your app through tools like code_exec, http, or custom ones. Move from prototyping to production with optimized inference latency and minimal infrastructure management.

Model Context protocol

The Model Context Protocol (MCP) is an open standard that helps you extend Agents by connecting large language models to tools, services, and data sources. You can bring your own custom tools by deploying them as a heroku app and registering them by attaching the addon. Access all you mcp servers through a single toolkit.

Use Cases

Text Generation Use models like Claude-Sonnet to generate text, write code, or chat intelligently. Retrieval-Augmented Generation (RAG) Bring your own data to power LLMs with up-to-date, domain-specific knowledge. Personalize User Experiences: Leverage agents to deliver tailored content, recommendations, or support. Data Analysis and Business Intelligence: Deploy agents that can analyze large datasets, identify trends, generate reports, and provide actionable insights.

Metered Billing

For those customers paying by credit card, Heroku Managed Inference and Agents uses metered billing, as set forth in the Plans & Pricing tables below For enterprise customers, your usage of Heroku Managed Inference and Agents will consume your General Add-on Credits and/or Data Add-on Credits as set forth in the Plans & Pricing tables below.

Data protection

Heroku Managed Inference and Agents doesn’t store or log your prompts and completions. Heroku Managed Inference and Agents doesn’t use your prompts and completions to train any models and doesn’t distribute them to third parties for training.

Amazon Rerank 1.0

New!

Text → Score

IN $1.00/1K queries

US EU

Cohere Rerank 3.5

New!

Text → Score

IN $2.00/1K queries

US EU

Claude Opus 4.5

Text → Text

IN $5.00/1M tokens OUT $25.00/1M tokens

US EU

Claude-3-5-haiku

Text → Text

IN $0.80/1M tokens OUT $4.00/1M tokens

US EU

Claude-3-5-sonnet-latest

Text → Text

IN $3.00/1M tokens OUT $15.00/1M tokens

US EU

Claude-3-7-sonnet

Text → Text

IN $3.00/1M tokens OUT $15.00/1M tokens

US EU

Claude-3-haiku

Text → Text

IN $0.25/1M tokens OUT $1.25/1M tokens

US EU

Claude-4-5-haiku

Text → Text

IN $1.10/1M tokens OUT $5.50/1M tokens

US EU

Claude-4-5-sonnet

Text → Text

IN $3.30/1M tokens OUT $16.50/1M tokens

US EU

Claude-4-sonnet

Text → Text

IN $3.00/1M tokens OUT $15.00/1M tokens

US EU

Cohere embed multilingual

Text → Embedding

IN $0.10/1M tokens

US EU

GPT-OSS-120B

Text → Text

IN $0.15/1M tokens OUT $0.60/1M tokens

US EU

Kimi K2 Thinking

Text → Text

IN $0.60/1M tokens OUT $2.50/1M tokens

US EU

Minimax M2

Text → Text

IN $0.30/1M tokens OUT $1.20/1M tokens

US EU

Nova 2 Lite

Text → Text

IN $0.33/1M tokens OUT $2.75/1M tokens

US EU

Nova Lite

Text → Text

IN $0.06/1M tokens OUT $0.24/1M tokens

US EU

Nova Pro

Text → Text

IN $0.80/1M tokens OUT $3.20/1M tokens

US EU

Qwen3 235B

Text → Text

IN $0.22/1M tokens OUT $0.88/1M tokens

US EU

Qwen3 Coder 480B

Text → Text

IN $0.45/1M tokens OUT $1.80/1M tokens

US EU

Stable-image-ultra

Text → Image

OUT $0.14/image

US EU

Access All Models with One Add-on

Recommended

Attach Heroku Managed Inference in Standard Mode to get instant access to all supported Managed Inference models through a single add-on.

Access to 20+ AI models

Access to new models immediately when they become available

Switch models without redeploying

Pay only for what you use

One inference key for all models

Available in US and EU regions

The Heroku Managed Inference and Agent add-on may employ third-party generative AI models to provide the Service. Due to the nature of generative AI, the output that it generates may be unpredictable, and may include inaccurate or harmful responses. Customer assumes all responsibility for such output, including ensuring its accuracy, safety, and compliance with applicable laws and third-party acceptable use policies. For more information, please see the Heroku Notices and License Information Documentation.

Heroku Managed Inference and Agents

Managed Inference

Agents

Model Context protocol

Use Cases

Metered Billing

Data protection

Pricing & Availability

Amazon Rerank 1.0

Description

API Endpoint

Rate Limit

Documentation

Cohere Rerank 3.5

Description

API Endpoint

Rate Limit

Documentation

Claude Opus 4.5

Description

API Endpoints

Documentation

Claude-3-5-haiku

Description

API Endpoints

Documentation

Claude-3-5-sonnet-latest

Description

API Endpoints

Documentation

Claude-3-7-sonnet

Description

API Endpoints

Documentation

Claude-3-haiku

Description

API Endpoints

Documentation

Claude-4-5-haiku

Description

API Endpoints

Documentation

Claude-4-5-sonnet

Description

API Endpoints

Documentation

Claude-4-sonnet

Description

API Endpoints

Documentation

Cohere embed multilingual

Description

API Endpoint

Documentation

GPT-OSS-120B

Description

API Endpoints

Documentation

Kimi K2 Thinking

Description

API Endpoints

Documentation

Minimax M2

Description

API Endpoints

Documentation

Nova 2 Lite

Description

API Endpoints

Documentation

Nova Lite

Description

API Endpoints

Documentation

Nova Pro

Description

API Endpoints

Documentation

Qwen3 235B

Description