NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Context: 131K tokens

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct, with a 128K-token context window (131,072 tokens). It is post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink the memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), NVIDIA reports strong reasoning and coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults; greedy decoding is recommended when reasoning is disabled). It is suitable for building agents, assistants, and long-context retrieval systems where a balanced accuracy-to-cost ratio and reliable tool use matter.
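The sampling settings mentioned above can be captured in a small helper. This is a minimal sketch, not an official recipe: `sampling_kwargs` is a hypothetical name, and it assumes the endpoint treats `temperature=0` as greedy decoding. How reasoning is actually switched on or off is not stated in this listing, so the helper only picks the sampling values; check NVIDIA's model card for the toggle itself.

```python
from typing import Any


def sampling_kwargs(reasoning: bool) -> dict[str, Any]:
    """Return sampling settings for a chat completion request.

    The values follow the description above: temperature 0.6 and
    top_p 0.95 were used for the reported evaluations, and greedy
    decoding is recommended when reasoning is disabled.
    """
    if reasoning:
        return {"temperature": 0.6, "top_p": 0.95}
    # Assumption: temperature 0 is interpreted as greedy decoding
    # by the OpenAI-compatible endpoint.
    return {"temperature": 0.0}


if __name__ == "__main__":
    print(sampling_kwargs(reasoning=True))
    print(sampling_kwargs(reasoning=False))
```

The returned dictionary can be unpacked into a request, for example `client.chat.completions.create(model=..., messages=..., **sampling_kwargs(True))`.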

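OpenRouter cost (per 1M tokens)

Input: $0.100

Output: $0.400

Total: $0.500

InvoiceLab (estimated monthly cost)

Based on 0.5M input + 0.5M output tokens

Supply value (excl. VAT): ₩411

VAT (10%): ₩41

Amount charged: ₩453

💡 The ₩41 VAT is eligible for an input tax credit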
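The arithmetic behind an estimate like this can be sketched as follows. The per-token prices and the 10% VAT rate come from the figures above; the exchange rate and the rounding rule are not stated on this page, so `krw_per_usd` is a hypothetical parameter, half-up rounding to whole won is an assumption, and the result is not expected to reproduce the displayed won amounts exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

# Prices from the listing, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.100")
OUTPUT_USD_PER_M = Decimal("0.400")
VAT_RATE = Decimal("0.10")
MILLION = Decimal(1_000_000)


def usd_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Token cost in USD before any currency conversion."""
    total = (Decimal(input_tokens) * INPUT_USD_PER_M
             + Decimal(output_tokens) * OUTPUT_USD_PER_M)
    return total / MILLION


def krw_invoice(usd: Decimal, krw_per_usd: Decimal) -> dict[str, Decimal]:
    """Convert to KRW and add VAT.

    krw_per_usd is hypothetical: the rate (and any markup) used by
    the service is not given in the listing.
    """
    won = Decimal("1")
    supply = usd * krw_per_usd
    vat = supply * VAT_RATE
    return {
        "supply": supply.quantize(won, rounding=ROUND_HALF_UP),
        "vat": vat.quantize(won, rounding=ROUND_HALF_UP),
        "total": (supply + vat).quantize(won, rounding=ROUND_HALF_UP),
    }


if __name__ == "__main__":
    usd = usd_cost(500_000, 500_000)  # 0.5M input + 0.5M output
    print(usd)
    print(krw_invoice(usd, Decimal("1600")))  # illustrative rate only
```

For 0.5M input and 0.5M output tokens the USD cost is 0.05 + 0.20 = 0.25 dollars; the won figures then depend entirely on the rate applied.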

Model information

Basic information

Model ID	nvidia/llama-3.3-nemotron-super-49b-v1.5
Provider	NVIDIA
Context window	131,072 tokens
Modality	text->text

Supported features

⚡ Function Calling · 📄 JSON Mode · 🌡️ Temperature · 📏 Max Tokens
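As an illustration of how these features map onto an OpenAI-style request, the sketch below builds the keyword arguments for a chat completion. The `tools`, `response_format`, `temperature`, and `max_tokens` fields follow the OpenAI chat completions schema; that this gateway forwards them unchanged is an assumption, and the `get_weather` tool and the helper names are hypothetical.

```python
import json
from typing import Any

MODEL_ID = "nvidia/llama-3.3-nemotron-super-49b-v1.5"

# Hypothetical tool definition, for illustration only.
GET_WEATHER_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_message: str, max_tokens: int = 512) -> dict[str, Any]:
    """Request kwargs that offer the model one callable tool."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [GET_WEATHER_TOOL],
        "temperature": 0.6,
        "max_tokens": max_tokens,
    }


def build_json_request(user_message: str, max_tokens: int = 512) -> dict[str, Any]:
    """Request kwargs that ask for a JSON object response (JSON mode)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "response_format": {"type": "json_object"},
        "temperature": 0.0,
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_request("What is the weather in Seoul?"), indent=2))
```

With an OpenAI SDK client these can be passed as `client.chat.completions.create(**build_tool_request("..."))`; if the model decides to call the tool, the call appears under `response.choices[0].message.tool_calls`.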

API usage

Python (OpenAI SDK compatible)

from openai import OpenAI

client = OpenAI(
    api_key="your-dream-api-key",
    base_url="https://api.invoicedream.co.kr/v1"
)

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1.5",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)

print(response.choices[0].message.content)

Node.js / TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-dream-api-key',
  baseURL: 'https://api.invoicedream.co.kr/v1'
});

const response = await client.chat.completions.create({
  model: 'nvidia/llama-3.3-nemotron-super-49b-v1.5',
  messages: [{ role: 'user', content: 'Hello' }]
});

console.log(response.choices[0].message.content);

cURL

curl https://api.invoicedream.co.kr/v1/chat/completions \
  -H "Authorization: Bearer your-dream-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/llama-3.3-nemotron-super-49b-v1.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

💡 Tip: You can use the OpenAI SDK as-is. Just change base_url!