
A day in the life of an Apple test engineer

The parameters recommended for thinking/general tasks work well; however, the benchmarks seemed to run forever.

qwen3.5-9b original → qwopus-9b-v3.5 (thinking/general tasks)

MMLU ↓
CMMLU ↓
JMMLU ↑
TruthfulQA ↓
HumanEval ↑

The gains outweigh the losses, but the result is not very impressive.