Daily Life of an Apple Test Engineer (苹果测试工程师的日常)

23:13 · May 17, 2026 · Sun

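In the old days, carts and horses were slow.
Feelings soaked into the paper: you let a burning longing cool before writing it down, and once it was written you still had to wait for that letter to travel its long road.
As the saying goes, rivers and seas ten thousand li apart; with you in my heart, the distance never feels far.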

14:23 · May 17, 2026 · Sun

苹果测试工程师的日常

params recommended for thinking/general tasks work well; however, the benchmarks felt like they were running forever

qwen3.5-9b original
→
qwopus-9b-v3.5
(thinking/general tasks)

MMLU↓
CMMLU↓
JMMLU↑
TRUTHFULQA↓
HUMANEVAL↑

More gains than losses, yet not very impressive.

13:44 · May 17, 2026 · Sun

https://fixupx.com/i/status/2055841151970930785

Liver-and-gallbladder soap, lol

FixupX

中国ほのぼの商品館 (@honobonochina)

Liver & Gallbladder Soap
1,167 yen

23:52 · May 15, 2026 · Fri

https://mp.weixin.qq.com/s/0qt9NOovocqxGRM5RbZMcA

Tensei Jingo (天声人语) 2026.05.15 | The Evil Dragon Called “Public Order Maintenance”

22:31 · May 15, 2026 · Fri

20:15 · May 15, 2026 · Fri

https://mp.weixin.qq.com/s/7EDO2rTjxl49Y4od65xYog

Is “青色” green or blue?

19:37 · May 15, 2026 · Fri

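With thinking enabled, the parameter presets for general/coding tasks both perform well; the results I got earlier with thinking off were a disaster… Friends, make sure to turn the thinking switch on when you use qwopus!
Running even this small set of benchmarks takes 20 hours on a top-spec base M5. Tomorrow I'll do a final comparison against the results for the original Qwen 3.5 9B to see whether qwopus v3.5 really improves the model.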

19:36 · May 15, 2026 · Fri

苹果测试工程师的日常

params recommended for thinking/general tasks work well; however, the benchmarks felt like they were running forever

Intelligence Benchmark Comparison

              Mode    Sampled         Qwopus3.5-9B-v3.5-oQ8-mtp
---------------------------------------------------------------
MMLU          Sample  1000/14042                          86.3%
CMMLU         Sample  300/11582                           82.3%
JMMLU         Sample  300/7536                            83.0%
TRUTHFULQA    Full    817                                 82.4%
HUMANEVAL     Full    164                                 88.4%

--- Detail ---

Model: Qwopus3.5-9B-v3.5-oQ8-mtp
Benchmark         Accuracy   Correct   Total   Time(s)   Think
--------------------------------------------------------------
MMLU                 86.3%       863    1000   27876.7     Yes
CMMLU                82.3%       247     300    7532.3     Yes
JMMLU                83.0%       249     300    8052.7     Yes
TRUTHFULQA           82.4%       673     817   20773.1     Yes
HUMANEVAL            88.4%       145     164      6723     Yes
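A quick integer-only check of the table: recompute each accuracy from Correct/Total and add up the Time(s) column. The total comes to about 70958 s, or 19.7 hours, which lines up with the roughly 20 hours mentioned in the 19:37 post. The struct and function names are just for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Rows copied from the Detail table. Time(s) is stored in tenths of a second
 * (27876.7 s -> 278767) so every computation below is exact integer math. */
struct bench_row {
    const char *name;
    int correct;
    int total;
    long time_ds;
};

static const struct bench_row ROWS[] = {
    {"MMLU", 863, 1000, 278767},
    {"CMMLU", 247, 300, 75323},
    {"JMMLU", 249, 300, 80527},
    {"TRUTHFULQA", 673, 817, 207731},
    {"HUMANEVAL", 145, 164, 67230},
};
#define NROWS (sizeof ROWS / sizeof ROWS[0])

/* Accuracy in tenths of a percent, rounded half up: 863/1000 -> 863 (86.3%). */
int accuracy_tenths_pct(const struct bench_row *r) {
    assert(r->total > 0);
    return (int)(((long)r->correct * 1000 + r->total / 2) / r->total);
}

/* Sum of the Time(s) column, in tenths of a second. */
long total_time_ds(void) {
    long sum = 0;
    for (size_t i = 0; i < NROWS; i++)
        sum += ROWS[i].time_ds;
    return sum;
}

void print_summary(void) {
    for (size_t i = 0; i < NROWS; i++) {
        int t = accuracy_tenths_pct(&ROWS[i]);
        printf("%-12s %d.%d%%\n", ROWS[i].name, t / 10, t % 10);
    }
    long ds = total_time_ds();
    /* 36000 ds per hour, 600 ds per minute */
    printf("total: %ld.%ld s = %ld h %ld min\n",
           ds / 10, ds % 10, ds / 36000, (ds % 36000) / 600);
}
```

print_summary() prints 86.3%, 82.3%, 83.0%, 82.4% and 88.4%, matching the Accuracy column. It then prints `total: 70957.8 s = 19 h 42 min`.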

19:26 · May 15, 2026 · Fri

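Be careful with hf-mirror's hfd.sh download-acceleration script…
When a model download fails, the script silently cleans up the directory on its own, yet it doesn't even handle spaces in the path. The quality of this script is really something.
The day it actually runs rm -rf / usr/local, things will get fun.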
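hfd.sh's own code isn't quoted here, so this is not a patch for it. In a shell script the usual fix is simply quoting the variable, as in rm -rf -- "$dir". What follows is a general sketch, in C, of cleanup that spaces cannot trip up. The directory path is handed to nftw(3) as one string, so no shell word splitting is involved, and empty paths and "/" are refused outright. cleanup_download_dir is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw callback: delete one entry. With FTW_DEPTH a directory is visited only
 * after all of its contents, so remove() sees it already empty. */
static int remove_entry(const char *path, const struct stat *sb, int typeflag,
                        struct FTW *ftwbuf) {
    (void)sb;
    (void)typeflag;
    (void)ftwbuf;
    return remove(path); /* unlink() for files/symlinks, rmdir() for dirs */
}

/* Hypothetical cleanup helper: the path travels as a single C string and never
 * through a shell, so "my models" or "/ usr/local" cannot be word-split into
 * two targets. Refuses empty paths and "/". */
int cleanup_download_dir(const char *dir) {
    if (dir == NULL || dir[0] == '\0' || strcmp(dir, "/") == 0)
        return -1;
    /* FTW_PHYS: report symlinks as links instead of following them */
    return nftw(dir, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```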

17:58 · May 15, 2026 · Fri

Welcome to the Black Parade

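The fascinating thing about this question is that even StackOverflow can't give the right answer: https://stackoverflow.com/questions/47968861/does-python-logging-support-multiprocessing: the top-voted answers are all wrong https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix: the top-voted answers are all wrong One answer is especially misleading; this blog (https://www.no…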

But the second-highest-voted answer is actually correct.

17:58 · May 15, 2026 · Fri

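The fascinating thing about this question is that even StackOverflow can't give the right answer:
https://stackoverflow.com/questions/47968861/does-python-logging-support-multiprocessing: the top-voted answers are all wrong
https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix: the top-voted answers are all wrong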

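One answer is especially misleading. This blog (https://www.notthewizard.com/2014/06/17/are-files-appends-really-atomic/) ran an O_APPEND experiment in bash. It has 20 processes run echo "$line" >> /tmp/out.tmp in parallel; since echo appends \n by default, it then checks two things. First, whether /tmp/out.tmp contains the expected number of lines and bytes (data integrity). Second, whether every line has the expected byte count (atomicity of a single write). The result: on Linux ext3, 4096 is the dividing line. At 4097 bytes fragments appear, and writes from different processes interleave. I can reproduce this on Linux 6.17 ext4.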

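The wrong conclusion drawn from this experiment poisoned SO for years, with a flood of answers parroting the PIPE_BUF (4096) figure. But the experiment is flawed because of an obscure behavior of bash's echo. When the data echo prints (including the \n suffix) exceeds 4096 bytes and the output fd is a regular file, echo splits it into multiple buffers and calls write several times. So the experiment never tested the atomicity of write at all.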

# $ strace -e write -fTtt bash -c 'line=$(printf "%4096s" "" | tr " " A); echo "$line" >> /tmp/a'
16:36:20.792871 write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096 <0.000043>
16:36:20.792936 write(1, "\n", 1)       = 1 <0.000007>
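To actually measure write() atomicity, each line and its '\n' have to go out in a single write() call, which is exactly what echo fails to do above. Below is a minimal C sketch of the corrected experiment, assuming Linux and a local filesystem such as ext4 or tmpfs. Every child opens the file itself with O_APPEND, so no open file description is shared. The harness then repeats the blog's checks (line count, per-line length) and adds one more: no line may mix bytes from two writers. append_experiment, child_writer and the 8192-byte line length are illustrative choices, not taken from the blog.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body: open the file separately with O_APPEND and emit each line,
 * payload plus '\n', with exactly ONE write() call. Never returns. */
static void child_writer(const char *path, char letter, size_t line_len, int nlines) {
    int fd = open(path, O_WRONLY | O_APPEND);
    char *buf = malloc(line_len + 1);
    if (fd < 0 || buf == NULL)
        _exit(1);
    memset(buf, letter, line_len);
    buf[line_len] = '\n';
    for (int i = 0; i < nlines; i++)
        if (write(fd, buf, line_len + 1) != (ssize_t)(line_len + 1))
            _exit(1);
    _exit(0);
}

/* Hypothetical harness: nproc children append nlines lines each. Returns the
 * number of problems found (torn or mixed lines, wrong line count, trailing
 * fragment), 0 if everything is intact, or -1 on setup failure. */
int append_experiment(const char *path, int nproc, size_t line_len, int nlines) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            child_writer(path, (char)('A' + p % 26), line_len, nlines);
    }
    int status, failed = 0;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    if (failed)
        return -1;

    /* The blog's checks (line count, per-line length) plus one extra:
     * every line must consist of a single writer's letter. */
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int problems = 0, lines = 0, first = -1, mixed = 0, c;
    size_t len = 0;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            if (len != line_len || mixed)
                problems++;
            lines++;
            len = 0;
            first = -1;
            mixed = 0;
        } else {
            if (first < 0)
                first = c;
            else if (c != first)
                mixed = 1;
            len++;
        }
    }
    fclose(f);
    if (len != 0)
        problems++; /* bytes after the last '\n' */
    if (lines != nproc * nlines)
        problems++; /* lost or extra lines */
    return problems;
}
```

append_experiment("/tmp/out.tmp", 20, 8192, 100) mirrors the blog's 20-process setup, with lines twice the supposed 4096 limit. Given the generic_file_write_iter locking discussed in this post, it is expected to report 0 problems.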

The truth is that O_APPEND writes are atomic across processes. Here is generic_file_write_iter, at https://elixir.bootlin.com/linux/v6.18.29/source/mm/filemap.c#L4406:

ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
[...]
  inode_lock(inode);
  ret = generic_write_checks(iocb, from);
  if (ret > 0)
    ret = __generic_file_write_iter(iocb, from);
  inode_unlock(inode);
[...]
}

generic_write_checks() advances the position via if (iocb->ki_flags & IOCB_APPEND) iocb->ki_pos = ..., and then __generic_file_write_iter() writes the data. Both steps run under the inode_lock(inode) semaphore, so they are safe across processes.
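The property above, offset chosen and data copied under one lock, can be modelled in userspace. This is a toy sketch, not kernel code. A pthread mutex stands in for inode_lock(inode). toy_append picks the offset from the current size, like the IOCB_APPEND branch, and copies the bytes before unlocking. toy_file, toy_append and toy_run are hypothetical names. The sketch assumes POSIX threads; older glibc may need -pthread at link time.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Toy in-memory "file": a byte buffer plus its current size. The mutex stands
 * in for inode_lock(inode). A userspace model, not kernel code. */
#define TOY_CAP (1 << 20)
#define TOY_LINE 5000 /* deliberately larger than 4096 */

struct toy_file {
    pthread_mutex_t lock;
    size_t size;
    char data[TOY_CAP];
};

struct toy_file *toy_open(void) {
    struct toy_file *f = calloc(1, sizeof *f);
    if (f != NULL && pthread_mutex_init(&f->lock, NULL) != 0) {
        free(f);
        return NULL;
    }
    return f;
}

/* Append = choose the offset (the IOCB_APPEND step) + copy the bytes (the
 * __generic_file_write_iter step), both under ONE lock, so no other appender
 * can claim the same offset in between. */
int toy_append(struct toy_file *f, const char *buf, size_t len) {
    int rc = -1;
    pthread_mutex_lock(&f->lock);
    size_t pos = f->size;                /* offset picked under the lock */
    if (pos + len <= TOY_CAP) {
        memcpy(f->data + pos, buf, len); /* bytes copied under the same lock */
        f->size = pos + len;
        rc = 0;
    }
    pthread_mutex_unlock(&f->lock);
    return rc;
}

struct toy_worker { struct toy_file *f; char letter; int nlines; };

static void *toy_worker_main(void *p) {
    struct toy_worker *w = p;
    char line[TOY_LINE + 1];
    memset(line, w->letter, TOY_LINE);
    line[TOY_LINE] = '\n';
    for (int i = 0; i < w->nlines; i++)
        toy_append(w->f, line, sizeof line);
    return NULL;
}

/* Runs nthreads concurrent appenders on a fresh toy_file, then counts records
 * that are not one letter repeated TOY_LINE times plus '\n' (or lost appends). */
int toy_run(struct toy_file *f, int nthreads, int nlines) {
    pthread_t tids[26];
    struct toy_worker ws[26];
    assert(nthreads > 0 && nthreads <= 26);
    for (int i = 0; i < nthreads; i++) {
        ws[i] = (struct toy_worker){f, (char)('a' + i), nlines};
        if (pthread_create(&tids[i], NULL, toy_worker_main, &ws[i]) != 0)
            abort();
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    size_t rec = TOY_LINE + 1, nrec = f->size / rec;
    int bad = f->size != (size_t)nthreads * (size_t)nlines * rec;
    for (size_t r = 0; r < nrec; r++) {
        const char *s = f->data + r * rec;
        int ok = s[TOY_LINE] == '\n';
        for (size_t k = 1; ok && k < TOY_LINE; k++)
            ok = s[k] == s[0];
        bad += !ok;
    }
    return bad;
}
```

Because toy_append holds the lock from offset selection through the copy, toy_run is expected to return 0 even with 5000-byte records.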

Moreover, even without O_APPEND, modern Linux provides a good deal of write atomicity. The last section of man 2 write, BUGS, reads:

BUGS
According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread Interactions with Regular File Operations"):

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: ...

Among the APIs subsequently listed are write() and writev(2). And among the effects that should be atomic across threads (and processes) are updates of the file offset. However, before Linux 3.14, this was not the case: if two processes that share an open file description (see open(2)) perform a write() (or writev(2)) at the same time, then the I/O operations were not atomic with respect to updating the file offset, with the result that the blocks of data output by the two processes might (incorrectly) overlap. This problem was fixed in Linux 3.14.

In other words, if two processes' fds refer to the same open file description (see the image), they share one file offset and automatically get atomicity and mutual exclusion.
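This is a sketch of the shared-description case, assuming Linux 3.14 or later. The parent opens the file once without O_APPEND, then forks writers that use the inherited fd. That is the same shape as a master process handing its log fd to forked workers. If two writes could race on the shared offset, one would overwrite the other, and the file would come out short or contain mixed records. shared_fd_experiment and its parameters are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical harness: open ONCE, WITHOUT O_APPEND, then fork nproc children
 * that write through the inherited fd, i.e. one shared open file description
 * and one shared file offset. Returns the number of broken or missing
 * records (0 = all intact), or -1 on setup error. */
int shared_fd_experiment(const char *path, int nproc, size_t line_len, int nlines) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644); /* no O_APPEND */
    if (fd < 0)
        return -1;
    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) { /* child: one write() per line via the shared fd */
            char *buf = malloc(line_len + 1);
            if (buf == NULL)
                _exit(1);
            memset(buf, 'A' + p % 26, line_len);
            buf[line_len] = '\n';
            for (int i = 0; i < nlines; i++)
                if (write(fd, buf, line_len + 1) != (ssize_t)(line_len + 1))
                    _exit(1);
            _exit(0);
        }
    }
    close(fd); /* the parent itself never writes */
    int status, failed = 0;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    if (failed)
        return -1;

    /* All records have the same size, so with no offset race they are laid
     * out back to back and can be checked one fixed-size chunk at a time. */
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    char *rec = malloc(line_len + 1);
    if (rec == NULL) {
        fclose(f);
        return -1;
    }
    int bad = 0, count = 0;
    size_t n;
    while ((n = fread(rec, 1, line_len + 1, f)) > 0) {
        count++;
        int ok = n == line_len + 1 && rec[line_len] == '\n';
        for (size_t k = 1; ok && k < line_len; k++)
            ok = rec[k] == rec[0];
        bad += !ok;
    }
    free(rec);
    fclose(f);
    if (count != nproc * nlines)
        bad++; /* overwritten records would leave the file short */
    return bad;
}
```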

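The kernel implementation is in file_needs_f_pos_lock at https://elixir.bootlin.com/linux/v6.18.29/source/fs/file.c#L1200. If the reference count is greater than one, i.e. the file is shared by multiple processes, the kernel takes mutex_lock(&file->f_pos_lock).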

static inline bool file_needs_f_pos_lock(struct file *file)
{
  if (!(file->f_mode & FMODE_ATOMIC_POS))
    return false;
  if (__file_ref_read_raw(&file->f_ref) != FILE_REF_ONEREF)
    return true;
[...]
}

FMODE_ATOMIC_POS is set automatically for regular files (and directories), and the comment notes that this is a SUSv4 requirement:

  /* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */
  if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
    f->f_mode |= FMODE_ATOMIC_POS;
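The two checks quoted above reduce to a small predicate. What follows is a userspace model only, for illustration. mode stands for the inode's i_mode, and file_refs is a plain reference count, whereas the kernel compares an encoded raw counter against FILE_REF_ONEREF. The elided [...] part of file_needs_f_pos_lock is not modelled, so false here only means "not required by these two checks". All function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Model of the open-time rule: FMODE_ATOMIC_POS for regular files and dirs. */
bool model_has_atomic_pos(mode_t mode) {
    return S_ISREG(mode) || S_ISDIR(mode);
}

/* Model of the two quoted checks in file_needs_f_pos_lock. */
bool model_needs_f_pos_lock(mode_t mode, long file_refs) {
    if (!model_has_atomic_pos(mode))
        return false;
    if (file_refs != 1) /* kernel: raw counter != FILE_REF_ONEREF */
        return true;
    return false;       /* the elided checks are not modelled here */
}

/* Classify an existing path with stat(): 1 = would get FMODE_ATOMIC_POS,
 * 0 = would not, -1 = stat failed. */
int model_path_has_atomic_pos(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return model_has_atomic_pos(st.st_mode) ? 1 : 0;
}
```

For example, a regular file held by two processes after fork needs the lock under this model. A character device such as /dev/null never gets FMODE_ATOMIC_POS.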

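Quite a few well-known services actually rely on this behavior. For example, nginx workers inherit their log fds from the master via fork, so they would work correctly even without O_APPEND. gunicorn, the well-known Python WSGI server, uses the same fork-inherited log fd pattern. Someone once asked how it keeps multi-process log output free of races, and I was stumped by that for a long time too: https://github.com/benoitc/gunicorn/issues/1272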

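This was the last unsolved mystery on the Trello board where, nine years ago in Beijing, I recorded what I studied and pondered day and night. Although I'm somewhat anxious about how powerful LLMs have become, I'm still glad that with AI assistance I can solve complex problems faster than ever before. (That leaves more time for slacking off and playing games 😉)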

(@refault_any dug up a 2002 flame war in which Linus brings up O_APPEND, also a fun read: https://lore.kernel.org/all/Pine.LNX.4.33.0208011613440.1315-100000@penguin.transmeta.com/)

Stack Overflow

Does python logging support multiprocessing?

I have been told that logging can not be used in Multiprocessing. You have to do the concurrency control in case multiprocessing messes the log.

But I did some test, it seems like there is no prob...

11:01 · May 15, 2026 · Fri

https://fixupx.com/i/status/2055010623910785535

FixupX · github.com

yetone (@yetone)

This article is so great that I turned it into an Agent Skill.

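Install this Skill with your own coding agent, and you can easily apply these "best practices" to refactor or build a desktop app that is both easy to take cross-platform and extremely close to native performance.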

https://github.com/yetone/native-feel-skill

Quoting Pedro Duarte (@peduarte)

everything you need to know about how the team built…

23:38 · May 14, 2026 · Thu

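An old classmate went to Beijing for the first time and has had quite a rough time. Every day he calls me to complain that food in Beijing is both expensive and awful. (He tried douzhi and actually finished it; I think he's already 100 times more badass than me.)
I asked friends in Beijing what good food there is near where he's staying on his business trip, hoping to improve his impression of Beijing in his last few days. Turns out he's so desperate that he's pinned his last hopes on tanghulu…
I told him people don't really eat that in summer, and he asked me, then what do you eat in summer?
Yeah, what do you eat? For a moment I honestly couldn't think of any answer other than willow catkins lol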

20:51 · May 14, 2026 · Thu

#晚安世界 (good night, world)
https://www.bilibili.com/video/BV1wN5R68EBq/
Entends l'écho (Hear the echo)

Bilibili

[Cover] L'amour nous désarme / Surrender to Love (from the rock musical Le Rouge et le Noir) by Laurent Ban

Excerpt from the French musical Le Rouge et le Noir. Uploaded by LaurentBanOff; Laurent Bàn is a French singer, musical theatre performer, and artist.


20:47 · May 14, 2026 · Thu

Is today some especially auspicious day? Why is there so much good stuff to feast on?

19:39 · May 14, 2026 · Thu

苹果测试工程师的日常

https://fixupx.com/LfXAMDg4PE50i9e/status/2053291937743130932

I feel like it looks just like Sacabambaspis…
https://store.line.me/stickershop/product/31842022/ja

19:39 · May 14, 2026 · Thu

https://fixupx.com/LfXAMDg4PE50i9e/status/2053291937743130932

🧵 Thread • FixupX

昔の芸術をつぶやくよ (@LfXAMDg4PE50i9e)

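A pleasantly laid-back, deflated vibe. The flipped-up tail and the eyes are so cute. The image shows a whale carving (around the 12th to 16th century) by the Chumash people, who are said to have lived in southern California.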

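Apparently it served as a kind of amulet, a prayer for safety at sea and a big catch. I'd love one as a decoration for my entryway. It is currently held by the Montreal Museum of Fine Arts.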

19:07 · May 14, 2026 · Thu

苹果测试工程师的日常

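I asked a netizen to help test the thinking / general tasks parameter config, since the computer got too hot to keep running. On my side, this afternoon I added some tests for nonthinking / reasoning tasks, and they still haven't finished (see the screenshot for the results so far). Honestly, at this point I've lost most of my confidence in qwopus. Since the netizen doesn't want to keep testing, I'll just give up on this model too. Tomorrow I'll test how smart the original qwen 3.6 is; barring surprises, that's what I'll run as my local LLM from now on.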

params recommended for thinking/general tasks work well;
however, the benchmarks felt like they were running forever

18:50 · May 14, 2026 · Thu

All I did was play 卫戍协议 (Garrison Protocol) for an hour and a half 😅