Despite its relatively modest 32 billion parameters, QwQ-32B has demonstrated strong performance in mathematics, programming, and general problem-solving tasks. For comparison, DeepSeek R1 uses 671 billion parameters, and OpenAI o1-mini is estimated at around 100 billion. The smaller parameter count lets the model run at lower computational cost, simplifying deployment.
The development of QwQ-32B aligns with Alibaba’s strategic focus on building AI with practical applications, as previously stated by the company’s chairman, Joseph Tsai. The new model relies on reinforcement learning methods similar to those used to create DeepSeek R1.