NVIDIA: Llama 3.1 Nemotron Ultra 253B v1
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.
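As a minimal sketch, the snippet below shows reasoning mode being turned on by setting the system prompt to `detailed thinking on`; the endpoint, placeholder API key, and model ID are the same ones used in the API Usage section further down, and the user question is illustrative only.

```python
from openai import OpenAI

# Endpoint and placeholder key taken from the API Usage section below.
client = OpenAI(
    api_key="your-dream-api-key",
    base_url="https://api.invoicedream.co.kr/v1"
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-ultra-253b-v1",
    messages=[
        # Per the model card, reasoning is toggled via the system prompt.
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "Which is larger, 9.9 or 9.11? Explain."}
    ],
)

print(response.choices[0].message.content)
```

Without that system prompt, reasoning is not enabled and the model responds as a regular chat assistant.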
Pricing: OpenRouter cost (per 1M tokens) and InvoiceLab monthly estimate (based on 0.5M input + 0.5M output tokens).
Model Information
Basic Information
| Field | Value |
|---|---|
| Model ID | nvidia/llama-3.1-nemotron-ultra-253b-v1 |
| Provider | NVIDIA |
| Context window | 131,072 tokens |
| Modality | text->text |
Supported Features
API Usage
Python (OpenAI SDK compatible)
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-dream-api-key",
    base_url="https://api.invoicedream.co.kr/v1"
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-ultra-253b-v1",
    messages=[
        {"role": "user", "content": "안녕하세요"}
    ]
)

print(response.choices[0].message.content)
```

Node.js / TypeScript
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-dream-api-key',
  baseURL: 'https://api.invoicedream.co.kr/v1'
});

const response = await client.chat.completions.create({
  model: 'nvidia/llama-3.1-nemotron-ultra-253b-v1',
  messages: [{ role: 'user', content: '안녕하세요' }]
});

console.log(response.choices[0].message.content);
```

cURL
```bash
curl https://api.invoicedream.co.kr/v1/chat/completions \
  -H "Authorization: Bearer your-dream-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
    "messages": [{"role": "user", "content": "안녕하세요"}]
  }'
```

💡 Tip: You can use the OpenAI SDK as-is. Just change the `base_url`!
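Since the model card lists tool calling among the supported tasks, the sketch below shows what a tool-calling request could look like through the same OpenAI-compatible endpoint. The `get_invoice_total` function and its schema are hypothetical, and whether tool definitions are forwarded to this model should be verified against the gateway's documentation.

```python
from openai import OpenAI
import json

client = OpenAI(
    api_key="your-dream-api-key",
    base_url="https://api.invoicedream.co.kr/v1"
)

# Hypothetical tool definition, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_invoice_total",
        "description": "Return the total amount of an invoice by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {"type": "string", "description": "Invoice identifier"}
            },
            "required": ["invoice_id"]
        }
    }
}]

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-ultra-253b-v1",
    messages=[{"role": "user", "content": "What is the total of invoice INV-1024?"}],
    tools=tools,
)

# If the model chooses to call a tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```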