Llama 3.1 8B Instant


📋 Overview#

  • ID: llama-3.1-8b-instant
  • Provider: Groq
  • Authors: Meta
  • Release Date: 2024-07-23
  • Knowledge Cutoff: 2023-12-01
  • Open Weights: true
  • Context Window: 131k tokens
  • Max Output: 8k tokens
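To make the limits above concrete, here is a minimal sketch of building a chat-completion request for this model. It assumes Groq's OpenAI-compatible `/openai/v1/chat/completions` endpoint and a `GROQ_API_KEY` environment variable; the `build_request` helper and its defaults are illustrative, not part of Groq's SDK.

```python
MODEL_ID = "llama-3.1-8b-instant"
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a chat-completion payload for llama-3.1-8b-instant.

    The model reads up to ~131k context tokens but emits at most 8k,
    so the requested max_tokens is capped at 8192 (assumed exact cap).
    """
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": min(max_tokens, 8192),
    }

# To actually send it (requires a GROQ_API_KEY):
# import json, os, urllib.request
# req = urllib.request.Request(
#     GROQ_URL,
#     data=json.dumps(build_request("Hello!")).encode(),
#     headers={"Authorization": "Bearer " + os.environ["GROQ_API_KEY"],
#              "Content-Type": "application/json"},
# )
# response = json.load(urllib.request.urlopen(req))
```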

🔬 Technical Specifications#

Sampling Controls: Temperature, Top-P

🎯 Capabilities#

Feature Overview#

  • Text generation and processing
  • Supported input modalities
  • Supported output modalities
  • Temperature sampling control
  • Nucleus sampling (top-p)
  • Maximum token limit
  • Stop sequences
  • Response streaming

Input/Output Modalities#

| Direction | Text | Image | Audio | Video | PDF |
|-----------|------|-------|-------|-------|-----|
| Input     | ✓    | -     | -     | -     | -   |
| Output    | ✓    | -     | -     | -     | -   |

Core Features#

Tool Calling, Tool Definitions, Tool Choice, Web Search, File Attachments

Response Delivery#

Streaming, Structured Output, JSON Mode, Function Call, Text Format
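Since streaming is listed among the delivery options, here is a sketch of reassembling text from a streamed response. It assumes Groq emits OpenAI-style server-sent events, where each `data:` line carries a JSON chunk with the text delta at `choices[0].delta.content` and the stream ends with `data: [DONE]`.

```python
import json

def extract_stream_text(sse_lines):
    """Collect text deltas from OpenAI-style SSE chunk lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)
```

In practice these lines would come from iterating over the HTTP response body with `stream=True` set in the request.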

🎛️ Generation Controls#

Sampling & Decoding#

| Control     | Range   |
|-------------|---------|
| Temperature | 0.0-2.0 |
| Top-P       | 0.0-1.0 |

Length & Termination#

| Control        | Range     |
|----------------|-----------|
| Max Tokens     | 1-8k      |
| Stop Sequences | supported |
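The ranges in the two tables above can be checked client-side before a request goes out. This validator is a sketch (the API performs its own validation, and the 8k cap is assumed to be exactly 8192):

```python
def validate_sampling(temperature: float, top_p: float, max_tokens: int) -> dict:
    """Check generation controls against the documented ranges:
    temperature 0.0-2.0, top_p 0.0-1.0, max_tokens 1-8192."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be in [1, 8192]")
    return {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}
```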

💰 Pricing#

Pricing shown for Groq

Token Pricing#

| Input    | Output   | Reasoning | Cache Read | Cache Write |
|----------|----------|-----------|------------|-------------|
| $0.05/1M | $0.08/1M | -         | -          | -           |

💰 Cost Calculator#

Calculate costs for common usage patterns:

| Use Case                           | Input      | Output     | Total Cost |
|------------------------------------|------------|------------|------------|
| Quick chat (1K in, 500 out)        | 1k tokens  | 500 tokens | $0.000090  |
| Document summary (10K in, 1K out)  | 10k tokens | 1k tokens  | $0.000580  |
| RAG query (50K in, 2K out)         | 50k tokens | 2k tokens  | $0.002660  |
| Code generation (5K in, 10K out)   | 5k tokens  | 10k tokens | $0.001050  |

Pricing Formula:

Cost = (Input Tokens / 1M × $0.05) + (Output Tokens / 1M × $0.08)
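The formula translates directly to code; this small helper reproduces the calculator rows above.

```python
INPUT_PRICE = 0.05   # USD per 1M input tokens
OUTPUT_PRICE = 0.08  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Apply the per-token pricing formula for llama-3.1-8b-instant."""
    return (input_tokens / 1_000_000 * INPUT_PRICE
            + output_tokens / 1_000_000 * OUTPUT_PRICE)
```

For example, `request_cost(10_000, 1_000)` gives $0.000580, matching the document-summary row.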

📊 Example Costs#

Real-world usage examples and their costs:

| Usage Tier                   | Daily Volume | Monthly Tokens | Monthly Cost |
|------------------------------|--------------|----------------|--------------|
| Personal (10 chats/day)      | 10 chats     | 675k           | $0.0405      |
| Small Team (100 chats/day)   | 100 chats    | 9.0M           | $0.5400      |
| Enterprise (1000 chats/day)  | 1000 chats   | 135.0M         | $8.10        |
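The tier costs above all work out to a blended rate of $0.06 per 1M tokens, which corresponds to a 2:1 input:output token mix. This estimator is a sketch under that assumed mix:

```python
# Blended $/1M at an assumed 2:1 input:output token mix = $0.06/1M
BLENDED_PER_M = (2 * 0.05 + 1 * 0.08) / 3

def monthly_cost(monthly_tokens: float) -> float:
    """Estimate monthly spend from total token volume, assuming the
    2:1 input:output split implied by the usage-tier table."""
    return monthly_tokens / 1_000_000 * BLENDED_PER_M
```

Plugging in the tier volumes recovers the table: 675k tokens costs about $0.04 and 135M tokens about $8.10 per month.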

📋 Metadata#


Last Updated: 2025-10-19 18:13:08 UTC



Last Updated: 2025-10-21 23:55:56 UTC | Generated by ModelWiki