Holo3.1: Computer Use Agents on Consumer GPUs, as France's H Company Makes Local Deployment Its Core Selling Point

On June 1, 2026, the French AI company H Company published "Holo3.1: Fast & Local Computer Use Agents" on its official blog.

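When the previous generation, Holo3, launched in March, it scored 78.85% on OSWorld-Verified, already the ceiling for open-source computer-use models. But H Company ran into a problem: a high benchmark score does not guarantee that a model works well in production.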

As teams moved Holo3 from evaluation to production, we repeatedly observed the same challenge: strong performance in one setting does not necessarily transfer to another.

Holo3.1 is not a simple benchmark bump; it is a systematic fix for production usability.

Official positioning: robustness along three dimensions

The H Company blog states the direction of the Holo3.1 improvements explicitly:

Holo3.1 improves robustness across the three dimensions that matter most in production: environments (web, desktop, mobile), agent frameworks, and deployment targets.

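Dimension	Problem in Holo3	Improvement in Holo3.1
Environments	Strong on browser and desktop, weak on mobile	Large gains on mobile
Agent frameworks	Structured JSON output only	Adds a function-calling protocol
Deployment targets	Cloud inference only	First release of quantized weights, enabling local inference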

Model family: from 0.8B to 35B-A3B

Holo3.1 ships in a full range of sizes:

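Model	Parameters	Positioning
Holo3.1-0.8B	0.8 billion	Ultra-light, on-device inference
Holo3.1-4B	4 billion	Light, consumer GPUs
Holo3.1-9B	9 billion	Balanced
Holo3.1-35B-A3B	35 billion total / 3 billion active	Flagship, SOTA performance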

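The 35B-A3B model uses a mixture-of-experts (MoE) architecture: 35 billion total parameters, of which only about 3 billion are active for each token. Its per-token compute cost is therefore close to that of a 3B dense model, while its capability is closer to that of a 35B model. All 35 billion weights still have to be held in memory.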

All models are based on the Qwen model family and post-trained on H Company's own computer-use data.

Mobile: AndroidWorld from 67% to 79.3%

The official mobile benchmark gains:

Model	AndroidWorld (Holo3)	AndroidWorld (Holo3.1)	Gain (percentage points)
35B-A3B	67%	79.3%	+12.3
4B / 9B	58%	71%	+13

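A gain of 12 to 13 percentage points is substantial rather than incremental. H Company attributes it to training: mobile devices, alternative agent frameworks, and different execution harnesses all introduce distribution shift, and Holo3.1 was trained specifically against those shifts.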
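
The gains in the table are absolute differences in percentage points, which is easy to confuse with a relative improvement. The short sketch below computes both from the scores quoted above; the function name is illustrative only.

```python
def gains(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct * 100
    return absolute, relative


# AndroidWorld scores as quoted in the table above.
scores = [("35B-A3B", 67.0, 79.3), ("4B / 9B", 58.0, 71.0)]

for name, before, after in scores:
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.1f} points, {relative:.1f}% relative to Holo3")
```

Read this way, the 35B-A3B model gains 12.3 points in absolute terms, which is a relative improvement of roughly 18% over its Holo3 score.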

Cross-framework: the function-calling protocol

Holo3 supported only structured JSON output, which tied it to specific frameworks. Holo3.1 adds function-calling support, the key to plugging into third-party agent frameworks.

From the official blog:

Across OSWorld and our internal benchmark suite covering e-commerce, business software, and collaboration workflows, function-calling and native execution now achieve near-parity performance.

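Function calling and native execution now perform at near parity. In practice, running Holo3.1 through OpenClaw, Hermes, or any other framework that supports function calling should cost little performance compared with H Company's own framework.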
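
To make the idea of a function-calling interface concrete, here is a minimal sketch of how a framework might declare GUI actions as tools and validate a tool call returned by a model. It follows the common JSON Schema style used by OpenAI-compatible chat APIs. The tool names (`click`, `type_text`) and their parameters are hypothetical and are not taken from Holo3.1's actual action space, which the blog post does not describe.

```python
import json

# Hypothetical tool declarations; NOT Holo3.1's real action space.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "click",
            "description": "Click at a screen coordinate.",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {"type": "integer"},
                    "y": {"type": "integer"},
                },
                "required": ["x", "y"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "type_text",
            "description": "Type text into the focused element.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    },
]


def parse_tool_call(tool_call: dict) -> tuple[str, dict]:
    """Check a tool call against TOOLS and return (name, arguments)."""
    name = tool_call["function"]["name"]
    # Arguments conventionally arrive as a JSON-encoded string.
    args = json.loads(tool_call["function"]["arguments"])
    spec = next((t["function"] for t in TOOLS if t["function"]["name"] == name), None)
    if spec is None:
        raise ValueError(f"unknown tool: {name}")
    missing = [k for k in spec["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments for {name}: {missing}")
    return name, args


# A hand-written example of what a model's tool call could look like.
example_call = {
    "function": {"name": "click", "arguments": json.dumps({"x": 640, "y": 360})}
}
print(parse_tool_call(example_call))
```

Because the contract is just a tool list plus validated calls, any framework that speaks this convention can drive the model without knowing its native output format, which is the decoupling the near-parity result is about.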

In addition, inside the harness of HoloTab (H Company's browser extension product), Holo3.1 improves on Holo3 by more than 25%.

Quantized weights: the core of local inference

This is the most important new feature in Holo3.1: H Company is releasing quantized weights for the first time.

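Format	Description	Target hardware
FP8	8-bit floating point	DGX Spark / data-center GPUs
Q4 GGUF	4-bit quantization	Consumer GPUs (4 GB to 24 GB of VRAM)
NVFP4 (W4A16)	NVIDIA-specific 4-bit format (4-bit weights, 16-bit activations)	NVIDIA RTX / DGX Spark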
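
A rough way to see which model fits which GPU is to estimate the memory taken by the weights alone: parameters times bits per weight, divided by 8. The sketch below applies this to the four model sizes. It is a lower bound under simplifying assumptions: it uses nominal bit widths, ignores quantization metadata, the KV cache, activations, and any vision encoder, and counts all 35 billion parameters for the MoE model because every expert must be resident.

```python
# Nominal bits per weight; real quantized files carry extra metadata.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "4-bit": 4}

# Total parameter counts from the model table (35B-A3B counts all experts).
TOTAL_PARAMS = {
    "Holo3.1-0.8B": 0.8e9,
    "Holo3.1-4B": 4e9,
    "Holo3.1-9B": 9e9,
    "Holo3.1-35B-A3B": 35e9,
}


def weight_gb(params: float, bits: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for model, params in TOTAL_PARAMS.items():
    row = ", ".join(
        f"{fmt}: {weight_gb(params, bits):.1f} GB"
        for fmt, bits in BITS_PER_WEIGHT.items()
    )
    print(f"{model} -> {row}")
```

Under these assumptions the 4-bit weights of the 4B model need about 2 GB, while those of 35B-A3B need about 17.5 GB before any runtime overhead, so the low end of the 4 GB to 24 GB range applies only to the smaller models.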

Minimal accuracy loss

From the official blog:

FP8 and NVFP4 achieve the same OSWorld scores, just ~2 points under the full precision BF16 checkpoint.

FP8 and NVFP4 score only about 2 points below BF16 on OSWorld, so quantization costs very little accuracy.

Significant speedup

Precision	Relative throughput (vs. BF16)
BF16	1×
FP8	~1.23×
NVFP4 W4A16	1.74×

On DGX Spark, NVFP4 delivers 1.41× the total token throughput of FP8 and 1.74× that of BF16.
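
The FP8 figure in the table is not quoted directly in the sentence above, but it follows from the two NVFP4 ratios. A quick check of the arithmetic, using only the numbers given here:

```python
# Ratios quoted for DGX Spark.
NVFP4_VS_BF16 = 1.74
NVFP4_VS_FP8 = 1.41

# If NVFP4 = 1.74 x BF16 and NVFP4 = 1.41 x FP8, then FP8 = (1.74 / 1.41) x BF16.
fp8_vs_bf16 = NVFP4_VS_BF16 / NVFP4_VS_FP8

print(f"FP8 throughput relative to BF16: {fp8_vs_bf16:.2f}x")
```

The result rounds to 1.23, which matches the approximate FP8 entry in the table.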

End-to-end agent speed

More important is the end-to-end time per agent step:

Agent harness optimizations we developed with NVIDIA combined with the NVFP4 quantization above deliver a compound ~2× end-to-end speedup over the FP8 baseline, cutting average step time from 6.8s to 3.3s.

Average step time falls from 6.8 s to 3.3 s, roughly a 2× speedup. This is the compound effect of quantization and agent harness optimizations.
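
To put the per-step numbers in task terms, the sketch below computes the speedup and the wall-clock time of a multi-step task. The 50-step task length is a hypothetical figure chosen for illustration, not a number from the blog post.

```python
FP8_STEP_S = 6.8    # average step time, FP8 baseline
NVFP4_STEP_S = 3.3  # average step time, NVFP4 plus harness optimizations


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of old step time to new step time."""
    return before_s / after_s


def task_minutes(steps: int, step_s: float) -> float:
    """Wall-clock minutes for a task, assuming every step takes step_s seconds."""
    return steps * step_s / 60


STEPS = 50  # hypothetical task length
print(f"speedup: {speedup(FP8_STEP_S, NVFP4_STEP_S):.2f}x")
print(f"{STEPS} steps at FP8:   {task_minutes(STEPS, FP8_STEP_S):.1f} min")
print(f"{STEPS} steps at NVFP4: {task_minutes(STEPS, NVFP4_STEP_S):.1f} min")
```

For a task of that length, the difference is between waiting a little under six minutes and a little under three.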

Consumer hardware: a local agent that actually runs

The H Company blog devotes a paragraph to deployment on consumer hardware:

The agent itself runs locally on a Windows or Mac machine, while the model can either run on that same machine — we include reference numbers for Apple Silicon — or on a DGX Spark on the same network. In both cases execution stays fully private and local, with nothing leaving the user's network.

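Key points:

The agent runs locally on a Windows or Mac machine
The model can run on the same machine (including Apple Silicon)
It can also run on a DGX Spark on the same local network
All execution stays private and local and never leaves the user's network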
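
One way to enforce the "nothing leaves the network" property in an agent's configuration is to refuse any model endpoint that is not a loopback or private address. The sketch below shows such a check using only the Python standard library. It assumes the model is served behind an HTTP endpoint, which the blog post does not specify, and the example URLs are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def stays_on_local_network(endpoint_url: str) -> bool:
    """Return True if the endpoint host is loopback or a private IP address."""
    host = urlparse(endpoint_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name cannot be judged without resolving it; reject to be safe.
        return False
    return addr.is_loopback or addr.is_private


# Hypothetical endpoints: same machine, a DGX Spark on the LAN, a cloud API.
for url in (
    "http://127.0.0.1:8080/v1",
    "http://192.168.1.50:8000/v1",
    "https://api.example.com/v1",
):
    print(url, "->", "local" if stays_on_local_network(url) else "rejected")
```

Rejecting unresolved host names is a deliberately conservative choice; a real deployment might resolve the name first and then apply the same address check.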

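"Nothing leaving the user's network" is the most fundamental difference between Holo3.1 and every closed cloud model. With Claude Fable 5 having been forcibly shut down by the US government, this property is not marketing copy; it is a business continuity safeguard.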

Relationship with OpenClaw

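The function-calling support added in Holo3.1 means it can plug into OpenClaw (an open-source AI agent framework) natively. On the same day, June 1, 2026, OpenClaw released v2026.6.1-beta.3, which strengthens runtime recovery in scenarios such as interrupted tool calls and invalidated sessions.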

Together, Holo3.1 and OpenClaw form a fully open-source, fully local computer-use agent stack:

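Component	Source	Terms
Model	Holo3.1 (H Company)	Open weights
Framework	OpenClaw	Open source
Inference	Q4 GGUF running locally	No API fees
Data	Never leaves the user's network	No data retention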

Tokens are free not because of a promotion, but because the model runs on your own GPU.

H Company background

Timeline from the official hcompany.ai blog:

Date	Event
June 2025	Released Runner H (computer-use agent product)
October 2025	Open-sourced Holo1.5 (3B / 7B / 72B)
November 2025	Released Holo2 (235B-A22B)
March 2026	Joined the NVIDIA Nemotron Coalition
March 2026	Released Holo3 (OSWorld-Verified 78.85%)
April 2026	Released HoloTab (browser extension)
April 2026	Released Holotron 3 Nano (based on NVIDIA Nemotron)
June 2026	Released Holo3.1

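H Company is a French AI startup positioned around the "Autonomous Enterprise." Its co-founders are Charles Kantor and Laurent Sifre, and its CEO is Gautier Cloix. It has a strategic partnership with FDJ UNITED (Europe's largest betting and gaming operator) and works closely with NVIDIA as a member of the Nemotron Coalition.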

Industry impact

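1. Local computer-use agents move from concept to product. With quantized weights in Q4 GGUF format, Holo3.1 lets consumer GPUs (4 GB to 24 GB of VRAM) run a computer-use agent. This is not merely "possible in theory": H Company says it provides reference numbers for Apple Silicon and DGX Spark.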

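2. An alternative to closed cloud models emerges. With Fable 5 banned, the fully local Holo3.1 + OpenClaw stack has practical appeal for enterprises and individuals alike: the model cannot be shut down remotely, and data does not leave the network.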

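3. Quantization is no longer a performance sacrifice. NVFP4 scores only about 2 points below BF16 on OSWorld while delivering 1.74× the throughput. In production, that trade-off is almost always worth it.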

4. Function calling decouples model from framework. Holo3.1 can run in OpenClaw, Hermes, or any framework that supports function calling, so users can mix and match the model layer and the framework layer.

5. Mobile computer use starts to mature. A score of 79.3% on AndroidWorld means mobile automation is no longer just an experiment and can enter production evaluation.

Honest limitations

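No specific ScreenSpot-Pro score is given. The Holo3.1 blog lists ScreenSpot-Pro among its benchmarks but presents the result only as a chart, without exact numbers.
The Apple Silicon reference numbers are not spelled out. The blog says "we include reference numbers for Apple Silicon," but does not state specific inference speeds.
4 GB of VRAM is enough for the 0.8B and 4B models, not for 35B-A3B. The "runs on a consumer GPU" claim depends on model size: the Q4 GGUF build of 35B-A3B still needs considerably more VRAM.
The license for Holo3.1 is not stated in the blog. The Holo1.5 series was released with open weights, but the exact license for Holo3 and Holo3.1 should be confirmed on the HuggingFace model cards.
The agent harness optimizations for local deployment are not available yet. The blog says "These improvements and more will land in an upcoming desktop agent harness," which implies that the current release does not yet include a complete desktop agent harness.
There is no direct comparison with Qwen3.7-Plus on ScreenSpot-Pro. Qwen3.7-Plus scores 79.0 on ScreenSpot-Pro; with no published Holo3.1 figure, the two cannot be compared directly.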

Closing thoughts

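What is most worth remembering about Holo3.1 is not any single benchmark number, but that it turned local deployment from a secondary feature into the core selling point.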

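In June 2026, the same month Claude Fable 5 was forcibly shut down by the US government, a French company released a set of computer-use agent models that can run entirely locally, lose almost no accuracy after quantization, and run faster as a result.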

This is no coincidence. H Company spells it out in its blog:

Users want to run the same computer-use capabilities across desktop and mobile environments, with seamless integration with different agent frameworks. They want deployment flexibility, from cloud inference to fully local execution on end-user devices.

In short, users want deployment flexibility, from cloud inference to fully local execution.

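When a closed cloud model can be shut down by a government order, when API pricing can change at any time, and when data retention policies can change without user consent, local deployment stops being just a technical choice and becomes a hedge against risk.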

Holo3.1 is not the strongest computer-use model (Qwen3.7-Plus and GPT-5.4 score higher on some benchmarks), but it may be the most reassuring one of mid-2026.

Official blog post: hcompany.ai/holo3.1
HuggingFace model page: huggingface.co/H-Company
OpenClaw framework: github.com/OpenClaw
