Deploy LLaMA 4

Text & Chat

LLaMA 4 is Meta's latest open-weight model family. The Scout variant is a 109B-parameter mixture-of-experts (MoE) model with 17B active parameters per token, a 10M-token context window, and native multimodal capabilities. LLaMA 3.3 70B remains a strong general-purpose option.

Deploy LLaMA 4 in minutes

Starting at $0.64/hr on dedicated GPU

Available Variants (2)

Model           Variant           GPU              VRAM    Price
LLaMA 4 Scout   Scout (109B MoE)  A100 80GB PCIe   80 GB   $1.81/hr
LLaMA 3.3 70B   Large (70B)       RTX A6000        48 GB   $0.64/hr

Prices include 30% service fee. Billed per minute while running.

Includes OpenWebUI chat interface and OpenAI-compatible API endpoint.
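Because the endpoint follows the OpenAI wire format, any OpenAI-compatible client can talk to it. A minimal sketch of a chat-completion request is below; the base URL, API key, and model id are placeholders — substitute the values shown in your instance dashboard.

```python
import json

# Placeholders -- replace with the values from your deployment dashboard.
BASE_URL = "https://your-instance.example.com/v1"
API_KEY = "your-api-key"

# Request body in the OpenAI chat-completions format.
payload = {
    "model": "llama-4-scout",  # model id as exposed by the deployment (assumption)
    "messages": [
        {"role": "user", "content": "Summarize the key risks in this clause."}
    ],
    "max_tokens": 256,
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# POST `body` to f"{BASE_URL}/chat/completions" with these headers;
# the reply text arrives in the response JSON at choices[0].message.content.
print(body)
```

The same endpoint also works with the official `openai` Python SDK by passing `base_url` and `api_key` when constructing the client.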

Use Cases

  • General-purpose AI assistants
  • Long-context document processing
  • Multimodal understanding
  • Enterprise AI applications

Ready to deploy LLaMA 4?

Pick your GPU and have it running in minutes. No infrastructure setup required.