The Qwen team has just released their latest model, Qwen with Questions (QwQ-32B-Preview), which achieves OpenAI o1-level performance on multiple math, reasoning, and coding benchmarks according to their technical report.
I deployed QwQ-32B-Preview on an RTX 3090 with 4-bit quantization and vLLM. After evaluating the model on the Kaggle AI Mathematical Olympiad Competition II, I found that:
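For reference, here is a minimal sketch of a vLLM setup like the one described above. The AWQ checkpoint name, context length, and sampling settings are assumptions, not the exact configuration I used; swap in whatever 4-bit quant fits your GPU.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Qwen/QwQ-32B-Preview-AWQ"  # assumed 4-bit AWQ quant; substitute your own

tokenizer = AutoTokenizer.from_pretrained(MODEL)
llm = LLM(
    model=MODEL,
    quantization="awq",           # 4-bit weights let a 32B model fit in 24 GB VRAM
    max_model_len=16384,          # leave room for long reasoning traces
    gpu_memory_utilization=0.95,
)

question = "Find the remainder when 2^100 is divided by 7."
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=8192)
output = llm.generate([prompt], params)[0]
print(output.outputs[0].text)
```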
- It generates about 10K tokens on average for a single question.
- With a CoT prompt, it achieved a score similar to the Qwen2.5-72B-Math-Instruct model with TIR & RM: 8/50 on the leaderboard.
- Perhaps with better prompt engineering, such as TIR, and techniques like majority voting or RM sampling, the model could achieve better results (a rough majority-voting sketch follows this list).
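Here is what such a majority-voting (self-consistency) pass could look like, reusing `llm`, `tokenizer`, and `prompt` from the serving snippet above. It assumes the model wraps its final answer in `\boxed{...}`, which is a convention rather than a guarantee; treat it as a sketch, not a tested pipeline.

```python
import re
from collections import Counter

from vllm import SamplingParams


def extract_answer(text: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a completion, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None


# Sample several independent reasoning traces for the same question.
params = SamplingParams(n=8, temperature=0.8, top_p=0.9, max_tokens=8192)
completions = llm.generate([prompt], params)[0].outputs

# Vote over the extracted final answers and keep the most common one.
answers = [a for a in (extract_answer(c.text) for c in completions) if a is not None]
final_answer = Counter(answers).most_common(1)[0][0] if answers else None
print(final_answer)
```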
Feel free to leave a comment if you find anything interesting about this model.