A massive catalog of hosted large models, compatible with the OpenAI API

All models

320 models · Updated 2025-07-23
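Because every model in this catalog is served behind an OpenAI-compatible endpoint, the stock `openai` Python client works unchanged once pointed at the service. A minimal sketch; the base URL and API key below are placeholders for the values issued by this service, and the model ID is taken from the list that follows:

```python
# Minimal sketch: calling a catalog model through the OpenAI-compatible API.
# base_url and api_key are placeholders; substitute your own values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                 # placeholder key
)

response = client.chat.completions.create(
    model="tngtech/deepseek-r1t2-chimera",  # any model ID from the list below
    messages=[{"role": "user", "content": "Summarize chain-of-thought prompting."}],
)
print(response.choices[0].message.content)
```

Free variants are addressed the same way, with `:free` appended to the model ID (e.g. `deepseek/deepseek-r1:free`).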
TNG: DeepSeek R1T2 Chimera
$0.0012/1k
$0.0012/1k
tngtech/deepseek-r1t2-chimera
DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent token behavior, making it suitable for long-context analysis, dialogue, and other open-ended generation tasks.
2025-07-08 163,840 text->text DeepSeek
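The two rates shown for each paid model appear to be input and output prices per 1k tokens. A quick sketch of how a request cost works out at the rates above; the token counts are illustrative, not measured:

```python
# Example cost estimate at the rates above ($0.0012 per 1k tokens each way).
# Assumes the two listed figures are input and output prices; token counts
# here are made-up example numbers.
input_price_per_1k = 0.0012   # USD per 1k prompt tokens
output_price_per_1k = 0.0012  # USD per 1k completion tokens

prompt_tokens = 8_000
completion_tokens = 2_000

cost = (prompt_tokens / 1000) * input_price_per_1k \
     + (completion_tokens / 1000) * output_price_per_1k
print(f"${cost:.4f}")  # $0.0120
```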
tngtech/deepseek-r1t-chimera:free
DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks. The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use.
2025-04-27 163,840 text->text DeepSeek
Perplexity: R1 1776
$0.0080/1k
$0.032/1k
perplexity/r1-1776
R1 1776 is a version of DeepSeek-R1 that has been post-trained to remove censorship constraints related to topics restricted by the Chinese government. The model retains its original reasoning capabilities while providing direct responses to a wider range of queries. R1 1776 is an offline chat model that does not use the Perplexity search subsystem. The model was tested on a multilingual dataset of over 1,000 examples covering sensitive topics to measure its likelihood of refusal or overly filtered responses, and its performance on math and reasoning benchmarks remains similar to the base R1 model. Read more on the blog post.
2025-02-20 128,000 text->text DeepSeek
microsoft/mai-ds-r1:free
MAI-DS-R1 is a post-trained variant of DeepSeek-R1 developed by the Microsoft AI team to improve the model’s responsiveness on previously blocked topics while enhancing its safety profile. Built on top of DeepSeek-R1’s reasoning foundation, it integrates 110k examples from the Tulu-3 SFT dataset and 350k internally curated multilingual safety-alignment samples. The model retains strong reasoning, coding, and problem-solving capabilities, while unblocking a wide range of prompts previously restricted in R1. MAI-DS-R1 demonstrates improved performance on harm mitigation benchmarks and maintains competitive results across general reasoning tasks. It surpasses R1-1776 in satisfaction metrics for blocked queries and reduces leakage in harmful content categories. The model is based on a transformer MoE architecture and is suitable for general-purpose use cases, excluding high-stakes domains such as legal, medical, or autonomous systems.
2025-04-21 163,840 text->text DeepSeek
Microsoft: MAI DS R1
$0.0012/1k
$0.0012/1k
microsoft/mai-ds-r1
MAI-DS-R1 is a post-trained variant of DeepSeek-R1 developed by the Microsoft AI team to improve the model’s responsiveness on previously blocked topics while enhancing its safety profile. Built on top of DeepSeek-R1’s reasoning foundation, it integrates 110k examples from the Tulu-3 SFT dataset and 350k internally curated multilingual safety-alignment samples. The model retains strong reasoning, coding, and problem-solving capabilities, while unblocking a wide range of prompts previously restricted in R1. MAI-DS-R1 demonstrates improved performance on harm mitigation benchmarks and maintains competitive results across general reasoning tasks. It surpasses R1-1776 in satisfaction metrics for blocked queries and reduces leakage in harmful content categories. The model is based on a transformer MoE architecture and is suitable for general-purpose use cases, excluding high-stakes domains such as legal, medical, or autonomous systems.
2025-04-21 163,840 text->text DeepSeek
deepseek/deepseek-r1-0528:free
May 28th update to the original DeepSeek R1. Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model.
2025-05-29 163,840 text->text DeepSeek
DeepSeek: R1 0528
$0.0011/1k
$0.0011/1k
deepseek/deepseek-r1-0528
May 28th update to the original DeepSeek R1. Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model.
2025-05-29 163,840 text->text DeepSeek
deepseek/deepseek-r1:free
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model & technical report. MIT licensed: Distill & commercialize freely!
2025-01-20 163,840 text->text DeepSeek
deepseek/deepseek-chat-v3-0324:free
DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 and performs well across a variety of tasks.
2025-03-24 32,768 text->text DeepSeek
DeepSeek: DeepSeek V3 0324
$0.0010/1k
$0.0034/1k
deepseek/deepseek-chat-v3-0324
DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 and performs well across a variety of tasks.
2025-03-24 163,840 text->text DeepSeek
DeepSeek: DeepSeek Prover V2
$0.0020/1k
$0.0087/1k
deepseek/deepseek-prover-v2
DeepSeek Prover V2 is a 671B-parameter model, speculated to be geared towards logic and mathematics, and likely an upgrade from DeepSeek-Prover-V1.5. Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description.
2025-04-30 163,840 text->text DeepSeek
Anthropic: Claude Sonnet 4
$0.012/1k
$0.060/1k
anthropic/claude-sonnet-4
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios. Read more at the blog post here.
2025-05-23 200,000 text+image->text Claude
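Models tagged text+image->text, such as this one, accept image input through the same chat-completions interface. A sketch using the standard OpenAI multimodal message format; the endpoint, key, and image URL are placeholders as in the earlier example:

```python
# Sketch: multimodal request to a text+image->text model via the
# OpenAI-compatible API. Endpoint, key, and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# One user message carrying both text and an image URL.
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this diagram."},
            {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```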