九维, fuck you and your dad
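My social energy is used up for today... I'm going to pick out some photos I deliberately held back from posting during my trip and enjoy them
And while I'm at it, flatten any passing netizens who are still at work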
@yzqzss Eat, eat, eat, eating is what I do best 🥺
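In the old days, carriages and horses were slow
Affection seeped into the paper; you let the burning longing cool before writing it down, and once it was written you still had to wait for that letter to cross the distance
As they say: across ten thousand miles of rivers and seas, with you in my heart, it does not feel far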
Daily life of an Apple test engineer
The params recommended for thinking/general tasks work well; however, the benchmarks felt like they were running forever
qwen3.5-9b original

qwopus-9b-v3.5
(thinking/general tasks)

MMLU↓
CMMLU↓
JMMLU↑
TRUTHFULQA↓
HUMANEVAL↑

The gains outweigh the losses, yet the result is not very impressive
https://mp.weixin.qq.com/s/0qt9NOovocqxGRM5RbZMcA

Tensei Jingo (天声人语) 2026.05.15 | The evil dragon named "maintenance of public order"
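With thinking enabled, the recommended params for both general and coding tasks perform well; the results I got earlier with thinking off were painful to look at... Friends, whatever you do, turn the thinking switch on when you use qwopus
Running even this small set of benchmarks takes 20 hours on a top-spec base M5. Tomorrow I'll use the results from the original qwen 3.5 9B for a final comparison, to see whether qwopus v3.5 really improves model performance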
Daily life of an Apple test engineer
The params recommended for thinking/general tasks work well; however, the benchmarks felt like they were running forever
Intelligence Benchmark Comparison

              Mode    Sampled         Qwopus3.5-9B-v3.5-oQ8-mtp
---------------------------------------------------------------
MMLU          Sample  1000/14042                          86.3%
CMMLU         Sample  300/11582                           82.3%
JMMLU         Sample  300/7536                            83.0%
TRUTHFULQA    Full    817                                 82.4%
HUMANEVAL     Full    164                                 88.4%

--- Detail ---

Model: Qwopus3.5-9B-v3.5-oQ8-mtp
Benchmark         Accuracy   Correct   Total   Time(s)   Think
--------------------------------------------------------------
MMLU                 86.3%       863    1000   27876.7     Yes
CMMLU                82.3%       247     300    7532.3     Yes
JMMLU                83.0%       249     300    8052.7     Yes
TRUTHFULQA           82.4%       673     817   20773.1     Yes
HUMANEVAL            88.4%       145     164      6723     Yes
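
A quick sanity check on the table above, as a minimal Python sketch: it recomputes each accuracy from the Correct/Total columns and adds up the Time(s) column to see how close the run comes to the "20 hours" mentioned earlier. The numbers are copied from the detail table; the names `ROWS`, `accuracy_pct`, and `total_hours` are made up for this illustration and are not part of any benchmark tool.

```python
# Values copied from the detail table: (correct, total, time in seconds).
ROWS = {
    "MMLU": (863, 1000, 27876.7),
    "CMMLU": (247, 300, 7532.3),
    "JMMLU": (249, 300, 8052.7),
    "TRUTHFULQA": (673, 817, 20773.1),
    "HUMANEVAL": (145, 164, 6723.0),
}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal like the table."""
    return round(100 * correct / total, 1)


def total_hours(rows: dict) -> float:
    """Sum of the per-benchmark wall-clock times, converted to hours."""
    return sum(seconds for _, _, seconds in rows.values()) / 3600


if __name__ == "__main__":
    for name, (correct, total, _) in ROWS.items():
        print(f"{name:<12} {accuracy_pct(correct, total):5.1f}%")
    print(f"total: {total_hours(ROWS):.1f} h")
```

The recomputed percentages match the table, and the summed time comes out a little under 20 hours, which is consistent with the rough figure in the post.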
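Be careful with hf-mirror's hfd.sh acceleration script...
When a model download fails, the script silently cleans up the directory on its own, yet it doesn't even handle spaces in the path. The quality of this script is really something
The day it actually runs rm -rf / usr/local is going to be fun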
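
To show why an unhandled space in a path is dangerous for a cleanup step, here is a minimal Python sketch. It is not hfd.sh's actual code; `naive_cleanup_command`, `quoted_cleanup_command`, and the example path are hypothetical. It only builds command strings and shows how a POSIX shell would split them into arguments (via `shlex.split`); nothing is executed.

```python
import shlex


def naive_cleanup_command(path: str) -> str:
    """Hypothetical cleanup command built WITHOUT quoting the path."""
    return f"rm -rf {path}"


def quoted_cleanup_command(path: str) -> str:
    """Same command, with the path quoted for a POSIX shell."""
    return f"rm -rf {shlex.quote(path)}"


if __name__ == "__main__":
    # Hypothetical download directory containing a space.
    path = "/tmp/my models/qwopus"

    # The shell would see two separate targets: '/tmp/my' and 'models/qwopus'.
    print(shlex.split(naive_cleanup_command(path)))

    # With quoting, the path stays a single argument.
    print(shlex.split(quoted_cleanup_command(path)))
```

In the unquoted case the command no longer targets the intended directory at all; a path like `/ usr/local` would split into `/` and `usr/local`, which is exactly the scenario joked about above.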