海量在线大模型 兼容OpenAI API

全部大模型

350个模型 · 2026-04-03 更新
StepFun: Step 3.5 Flash
$0.0004/1k
$0.0012/1k
stepfun/step-3.5-flash
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that is incredibly speed efficient even at long contexts.
2026-01-30 262,144 text->text Other
Relace: Relace Search
$0.0040/1k
$0.012/1k
relace/relace-search
The relace-search model uses 4-12 view_file and grep tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It's designed to serve as a subagent that passes its findings to an "oracle" coding agent, who orchestrates/performs the rest of the coding task. To use relace-search you need to build an appropriate agent harness, and parse the response for relevant information to hand off to the oracle. Read more about it in the Relace documentation.
2025-12-09 256,000 text->text Other
Relace: Relace Apply 3
$0.0034/1k
$0.0050/1k
relace/relace-apply-3
Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at 10,000 tokens/sec on average. The model requires the prompt to be in the following format: {instruction} {initial_code} {edit_snippet} Zero Data Retention is enabled for Relace. Learn more about this model in their documentation
2025-09-26 256,000 text->text Other
Reka Flash 3
$0.0004/1k
$0.0008/1k
rekaai/reka-flash-3
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a 32K context length and optimized through reinforcement learning (RLOO), it provides competitive performance comparable to proprietary models within a smaller parameter footprint. Ideal for low-latency, local, or on-device deployments, Reka Flash 3 is compact, supports efficient quantization (down to 11GB at 4-bit precision), and employs explicit reasoning tags ("") to indicate its internal thought process. Reka Flash 3 is primarily an English model with limited multilingual understanding capabilities. The model weights are released under the Apache 2.0 license.
2025-03-13 65,536 text->text Other
Reka Edge
$0.0004/1k
$0.0004/1k
rekaai/reka-edge
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.
2026-03-21 16,384 text+image+video->text Other
qwen/qwen3.6-plus:free
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers major gains in agentic coding, front-end development, and overall reasoning, with a significantly improved “vibe coding” experience. The model excels at complex tasks such as 3D scenes, games, and repository-level problem solving, achieving a 78.8 score on SWE-bench Verified. It represents a substantial leap in both pure-text and multimodal capabilities, performing at the level of leading state-of-the-art models.
2026-04-02 1,000,000 text+image+video->text Qwen3
Qwen: Qwen3.5-Flash
$0.0003/1k
$0.0010/1k
qwen/qwen3.5-flash-02-23
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
2026-02-26 1,000,000 text+image+video->text Qwen3
Qwen: Qwen3.5-9B
$0.0002/1k
$0.0006/1k
qwen/qwen3.5-9b
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.
2026-03-10 256,000 text+image+video->text Qwen3
Qwen: Qwen3.5-35B-A3B
$0.0006/1k
$0.0052/1k
qwen/qwen3.5-35b-a3b
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.
2026-02-26 262,144 text+image+video->text Qwen3
Qwen: Qwen3.5-27B
$0.0008/1k
$0.0062/1k
qwen/qwen3.5-27b
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
2026-02-26 262,144 text+image+video->text Qwen3
Qwen: Qwen3.5-122B-A10B
$0.0010/1k
$0.0083/1k
qwen/qwen3.5-122b-a10b
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
2026-02-26 262,144 text+image+video->text Qwen3
Qwen: Qwen3.5 Plus 2026-02-15
$0.0010/1k
$0.0062/1k
qwen/qwen3.5-plus-02-15
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.
2026-02-16 1,000,000 text+image+video->text Qwen3