A large catalog of hosted LLMs, compatible with the OpenAI API

All models

320 models · Updated 2025-07-23
Cohere: Command R (03-2024)
Input: $0.0020/1k tokens
Output: $0.0060/1k tokens
cohere/command-r-03-2024
Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. Use of this model is subject to Cohere's Usage Policy and SaaS Agreement.
Released 2024-03-02 · Context: 128,000 tokens · text->text · Cohere
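Because every entry in this catalog exposes the same OpenAI-style chat-completions interface, any model ID below drops into the standard request body unchanged. A minimal sketch using the entry above (the endpoint URL is a placeholder, not a real address):

```python
import json

# Assumed endpoint -- substitute your provider's actual base URL and API key.
BASE_URL = "https://api.example.com/v1/chat/completions"

def chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-compatible chat-completions request body
    for the given catalog model ID."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

payload = chat_payload(
    "cohere/command-r-03-2024",
    "Summarize retrieval-augmented generation in one sentence.",
)
print(json.dumps(payload, indent=2))
```

The same body shape works for any other ID in the list; only the "model" string changes.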
Cohere: Command R
Input: $0.0020/1k tokens
Output: $0.0060/1k tokens
cohere/command-r
Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. Use of this model is subject to Cohere's Usage Policy and SaaS Agreement.
Released 2024-03-14 · Context: 128,000 tokens · text->text · Cohere
Cohere: Command
Input: $0.0040/1k tokens
Output: $0.0080/1k tokens
cohere/command
Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. Use of this model is subject to Cohere's Usage Policy and SaaS Agreement.
Released 2024-03-14 · Context: 4,096 tokens · text->text · Cohere
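The two per-1k rates in each entry make per-request cost a simple weighted sum. A quick sketch using the Command rates above, assuming (as is conventional) that the first rate applies to input tokens and the second to output tokens:

```python
# Cohere Command rates from the catalog entry above.
# Assumption: first listed rate = input tokens, second = output tokens.
INPUT_PER_1K = 0.0040   # USD per 1,000 prompt tokens
OUTPUT_PER_1K = 0.0080  # USD per 1,000 completion tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request at the rates above."""
    return (prompt_tokens / 1000) * INPUT_PER_1K \
         + (completion_tokens / 1000) * OUTPUT_PER_1K

cost = request_cost(2000, 500)
# 2000/1000 * 0.0040 = 0.0080; 500/1000 * 0.0080 = 0.0040; total 0.0120
print(f"${cost:.4f}")  # → $0.0120
```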
Auto Router
Free
openrouter/auto
Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used, visit the Activity page or read the model attribute of the response. Your response is priced at the routed model's rate. The meta-model is powered by Not Diamond. Requests will be routed to one of the following models:
- openai/gpt-4o-2024-08-06
- openai/gpt-4o-2024-05-13
- openai/gpt-4o-mini-2024-07-18
- openai/chatgpt-4o-latest
- openai/o1-preview-2024-09-12
- openai/o1-mini-2024-09-12
- anthropic/claude-3.5-sonnet
- anthropic/claude-3.5-haiku
- anthropic/claude-3-opus
- anthropic/claude-2.1
- google/gemini-pro-1.5
- google/gemini-flash-1.5
- mistralai/mistral-large-2407
- mistralai/mistral-nemo
- deepseek/deepseek-r1
- meta-llama/llama-3.1-70b-instruct
- meta-llama/llama-3.1-405b-instruct
- mistralai/mixtral-8x22b-instruct
- cohere/command-r-plus
- cohere/command-r
Released 2023-11-08 · Context: 2,000,000 tokens · text->text · Router
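Since the routed model determines billing, a client should record it per request. A sketch of pulling the routed model out of a response, using a hand-written sample dict in the standard OpenAI chat-completions response shape (the sample values are made up for illustration):

```python
# Hypothetical response in the standard OpenAI chat-completions shape.
# With the Auto Router, the "model" field reports which model was chosen,
# and billing follows that model's listed rates.
sample_response = {
    "id": "chatcmpl-123",
    "model": "anthropic/claude-3.5-sonnet",  # the model the router picked
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"}}
    ],
}

routed_model = sample_response["model"]
print(f"Routed to: {routed_model}")  # → Routed to: anthropic/claude-3.5-sonnet
```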
Venice: Uncensored (free)
Free
cognitivecomputations/dolphin-mistral-24b-venice-edition:free
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models.
Released 2025-07-10 · Context: 32,768 tokens · text->text · Other
TheDrummer: Valkyrie 49B V1
Input: $0.0026/1k tokens
Output: $0.0040/1k tokens
thedrummer/valkyrie-49b-v1
Valkyrie, built on NVIDIA's Llama 3.3 Nemotron Super 49B, is TheDrummer's latest model for creative writing.
Released 2025-05-24 · Context: 131,072 tokens · text->text · Other
TheDrummer: Skyfall 36B V2
Input: $0.0001/1k tokens
Output: $0.0001/1k tokens
thedrummer/skyfall-36b-v2
Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.
Released 2025-03-11 · Context: 16,384 tokens · text->text · Other
TheDrummer: Anubis Pro 105B V1
Input: $0.0020/1k tokens
Output: $0.0040/1k tokens
thedrummer/anubis-pro-105b-v1
Anubis Pro 105B v1 is an expanded and refined variant of Meta’s Llama 3.3 70B, featuring 50% additional layers and further fine-tuning to leverage its increased capacity. Designed for advanced narrative, roleplay, and instructional tasks, it demonstrates enhanced emotional intelligence, creativity, nuanced character portrayal, and superior prompt adherence compared to smaller models. Its larger parameter count allows for deeper contextual understanding and extended reasoning capabilities, optimized for engaging, intelligent, and coherent interactions.
Released 2025-03-11 · Context: 131,072 tokens · text->text · Other
Tencent: Hunyuan A13B Instruct (free)
Free
tencent/hunyuan-a13b-instruct:free
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark performance across mathematics, science, coding, and multi-turn reasoning tasks, while maintaining high inference efficiency via Grouped Query Attention (GQA) and quantization support (FP8, GPTQ, etc.).
Released 2025-07-08 · Context: 32,768 tokens · text->text · Other
Tencent: Hunyuan A13B Instruct
Input: $0.0001/1k tokens
Output: $0.0001/1k tokens
tencent/hunyuan-a13b-instruct
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark performance across mathematics, science, coding, and multi-turn reasoning tasks, while maintaining high inference efficiency via Grouped Query Attention (GQA) and quantization support (FP8, GPTQ, etc.).
Released 2025-07-08 · Context: 32,768 tokens · text->text · Other
THUDM: GLM Z1 32B (free)
Free
thudm/glm-z1-32b:free
GLM-Z1-32B-0414 is an enhanced reasoning variant of GLM-4-32B, built for deep mathematical, logical, and code-oriented problem solving. It applies extended reinforcement learning—both task-specific and general pairwise preference-based—to improve performance on complex multi-step tasks. Compared to the base GLM-4-32B model, Z1 significantly boosts capabilities in structured reasoning and formal domains. The model supports enforced “thinking” steps via prompt engineering and offers improved coherence for long-form outputs. It’s optimized for use in agentic workflows, and includes support for long context (via YaRN), JSON tool calling, and fine-grained sampling configuration for stable inference. Ideal for use cases requiring deliberate, multi-step reasoning or formal derivations.
Released 2025-04-18 · Context: 32,768 tokens · text->text · Other
THUDM: GLM Z1 32B
Input: $0.0001/1k tokens
Output: $0.0001/1k tokens
thudm/glm-z1-32b
GLM-Z1-32B-0414 is an enhanced reasoning variant of GLM-4-32B, built for deep mathematical, logical, and code-oriented problem solving. It applies extended reinforcement learning—both task-specific and general pairwise preference-based—to improve performance on complex multi-step tasks. Compared to the base GLM-4-32B model, Z1 significantly boosts capabilities in structured reasoning and formal domains. The model supports enforced “thinking” steps via prompt engineering and offers improved coherence for long-form outputs. It’s optimized for use in agentic workflows, and includes support for long context (via YaRN), JSON tool calling, and fine-grained sampling configuration for stable inference. Ideal for use cases requiring deliberate, multi-step reasoning or formal derivations.
Released 2025-04-18 · Context: 32,768 tokens · text->text · Other