NVIDIA: Llama 3.3 Nemotron Super 49B V1.5
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy decoding recommended when reasoning is disabled). It is suitable for building agents, assistants, and long-context retrieval systems where a balanced accuracy-to-cost ratio and reliable tool use matter.
OpenRouter base cost (per 1M tokens)
InvoiceLab (estimated monthly)
Based on 0.5M input + 0.5M output tokens
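The monthly estimate above assumes 0.5M input tokens plus 0.5M output tokens, each billed at a per-1M-token rate. A minimal sketch of that arithmetic follows; the function name `estimate_monthly_cost` and the example prices are hypothetical placeholders, not the actual rates for this model.

```python
def estimate_monthly_cost(
    input_price_per_1m: float,
    output_price_per_1m: float,
    input_tokens_m: float = 0.5,   # millions of input tokens per month (basis stated above)
    output_tokens_m: float = 0.5,  # millions of output tokens per month (basis stated above)
) -> float:
    """Return the estimated monthly cost in the same currency as the prices."""
    return input_tokens_m * input_price_per_1m + output_tokens_m * output_price_per_1m


# Hypothetical prices, for illustration only: 1.0 per 1M input, 3.0 per 1M output.
example_cost = estimate_monthly_cost(1.0, 3.0)
print(f"Estimated monthly cost: {example_cost:.2f}")
```

Substitute the per-1M-token input and output prices shown for the model to reproduce the displayed estimate.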
Model information
Basic information
| Model ID | nvidia/llama-3.3-nemotron-super-49b-v1.5 |
| Provider | NVIDIA |
| Context window | 131,072 tokens |
| Modality | text->text |
Supported features
API usage
Python (OpenAI SDK compatible)
from openai import OpenAI

# Point the standard OpenAI client at the OpenAI-compatible endpoint.
client = OpenAI(
    api_key="your-dream-api-key",
    base_url="https://api.invoicedream.co.kr/v1",
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1.5",
    messages=[
        {"role": "user", "content": "Hello"}
    ],
)
print(response.choices[0].message.content)
Node.js / TypeScript
import OpenAI from 'openai';

// Point the standard OpenAI client at the OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: 'your-dream-api-key',
  baseURL: 'https://api.invoicedream.co.kr/v1'
});

// Send a single-turn chat request to the model.
const response = await client.chat.completions.create({
  model: 'nvidia/llama-3.3-nemotron-super-49b-v1.5',
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(response.choices[0].message.content);
cURL
curl https://api.invoicedream.co.kr/v1/chat/completions \
  -H "Authorization: Bearer your-dream-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/llama-3.3-nemotron-super-49b-v1.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
💡 Tip: You can use the OpenAI SDK as-is. Only base_url (baseURL in Node.js) needs to change.
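The model description mentions explicit reasoning on/off modes, with temp = 0.6 and top_p = 0.95 used in the reported evaluations and greedy decoding recommended when reasoning is disabled. A minimal sketch of how those settings might be assembled for the same OpenAI-compatible endpoint is shown here. The helper `build_request` is hypothetical, and the `/no_think` system prompt used to switch reasoning off is an assumption that should be checked against NVIDIA's model card; whether the endpoint forwards these parameters unchanged is also assumed.

```python
MODEL_ID = "nvidia/llama-3.3-nemotron-super-49b-v1.5"


def build_request(prompt: str, reasoning: bool = True, off_toggle: str = "/no_think") -> dict:
    """Build keyword arguments for client.chat.completions.create() (hypothetical helper)."""
    messages = []
    if reasoning:
        # Sampling settings quoted in the description's evaluation setup.
        sampling = {"temperature": 0.6, "top_p": 0.95}
    else:
        # Assumption: a system prompt switches reasoning off; verify the exact string.
        messages.append({"role": "system", "content": off_toggle})
        # Greedy decoding, approximated here with temperature 0.
        sampling = {"temperature": 0.0}
    messages.append({"role": "user", "content": prompt})
    return {"model": MODEL_ID, "messages": messages, **sampling}


# Usage with the client from the Python example above:
# response = client.chat.completions.create(**build_request("Hello", reasoning=False))
```

Keeping the toggle string as a parameter makes it easy to swap in whatever mechanism the provider actually documents.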