Topic

AI Agent

1,418 related items · from the historical archive

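Industry News · NEW

19 minutes ago

IT之家, July 22: A special meeting on promoting adoption of the "Artificial Intelligence Agent Interconnection" (《人工智能智能体互联》) series of standards was held on July 21 at the Zhongguancun Exhibition Center in Haidian District, Beijing. The meeting was hosted by the Artificial Intelligence Subcommittee of the National Information Technology Standardization Technical Committee. It marks the formal start of the pilot-application phase for China's first interconnection standards system covering the full agent lifecycle. IT之家 previously reported on this series of standards (GB/Z 185.1—GB/Z 185.7—2026), which in 20…

AI Commentary · The first national-standard pilot covering the full agent lifecycle promotes industry interoperability; with major players joining, AI ecosystem coordination accelerates.

Source: IT之家

Industry News · NEW

3 hours ago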

Jack Dorsey is taking on Slack with Buzz, a group chat platform for teams and their AI agents

Buzz is a group chat platform for the workplace that puts humans and their AI agents in the same conversation.

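AI Commentary · Buzz puts humans and AI agents in the same group chat and may become a new paradigm for enterprise collaboration.

Source: TechCrunch

Industry News · NEW

6 hours ago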

Jack Dorsey launches Buzz to combine team chat, AI agents and Git hosting

https://x.com/jack/status/2079605800998146171 , https://xcancel.com/jack/status/2079605800998146171 https://buzz.xyz/

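AI Commentary · Jack Dorsey's new project combines team collaboration with AI; a tech heavyweight's crossover attempt may reshape the developer-tool ecosystem.

Source: Hacker News

Industry News · NEW

6 hours ago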

Show HN: CodeAlmanac – Karpathy-style codebase wiki from your conversations

Hey HN! This is Divit from Almanac (YC S26). We built CodeAlmanac, a wiki for your coding agents that updates as you talk to them. It is open-source, local, and free. Here’s a demo…

Source: Hacker News

Industry News · NEW

Yesterday

Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber

https://console.cloud.google.com/agent-platform/publishers/g...

Source: Hacker News

Model Releases/Updates · NEW

Yesterday

Built for Vera Rubin, NVIDIA Spectrum-6 Arrives in Gigascale AI Factories

AI has entered the gigascale era. The world’s most advanced AI factories are bringing together hundreds of thousands of GPUs and CPUs to train frontier models, power agentic AI and…

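Source: NVIDIA

Industry News · NEW

Yesterday

Microsoft follows Google in supporting Go for AI agent development; OpenAI and Anthropic fall a step behind

Source: InfoQ

Product Releases/Updates · NEW

Yesterday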

Kritt-ai/open-kritt

Orchestrate AI agents to find real vulnerabilities in code.

Source: GitHub

Tips & Opinions · NEW

Yesterday

Build specialized agent workflows for your business with Amazon Quick and NVIDIA NeMo Agent Toolkit

In this post, we show how Amazon Quick can serve as the business-user front door for specialized agent workflows. We use the NVIDIA NeMo Agent Toolkit to build a supply-chain risk…

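AI Commentary · The barrier to enterprise AI agent development is falling; combining Quick with NVIDIA's toolchain enables supply-chain risk management, showcasing a new model for industry adoption…

Source: AWS ML

Tips & Opinions · NEW

Yesterday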

Evolving from legacy BI to agentic AI at Tradeshift with Amazon Quick

In this post, we describe how Tradeshift deployed Amazon Quick with agentic AI capabilities to replace our legacy BI tool, resulting in query response times up to 30 times faster,…

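AI Commentary · Replacing a traditional BI tool with agentic AI made queries 30 times faster, showing a disruptive shift in enterprise data-driven decision-making.

Source: AWS ML

Industry News · NEW

Yesterday

Infinigence AI's Xia Lixue: All AI productivity in the digital and physical worlds needs Agentic Infra

AI Commentary · Highlights the key trend of AI infrastructure extending from the compute layer to the agent ecosystem, defining the technical foundation for the future convergence of the digital and physical worlds.

Source: InfoQ

Model Releases/Updates · NEW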

7/20 23:00

At SIGGRAPH, NVIDIA Advances Graphics and Simulation With Agentic and Physical AI

From open models to real-time simulation, AI and graphics breakthroughs are transforming media, content creation and robotics.

Source: NVIDIA

Product Releases/Updates · NEW

7/20 22:47

Launch an AI Agent app with a single sentence: full-stack deployment in practice with Volcano Engine Supabase + IGA Pages

Source: InfoQ

Industry News · NEW

7/20 22:36

Enterprise Harness Engineering in practice: bringing operations, data, coding, and office agents into production | AICon Shenzhen

Source: InfoQ

Industry News · NEW

7/20 19:09

Grab builds a secure platform for agentic AI workloads

Source: InfoQ

Industry News · NEW

7/20 18:51

Agents can execute, but who helps them remember the past?

Source: InfoQ

Product Releases/Updates · NEW

7/20 18:33

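Different model vendors, the same Agentic Infra: the foundation of the AGI era finally surfaces

The shared choice of the large-model era

Source: 量子位

Industry News · NEW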

7/20 10:49

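Who is still competing on parameter counts? WAIC2026 is all about physical AI that gets real work done!

July 17-20: join us on site at WAIC2026 to see artificial intelligence truly move deep into industry. Over the past year, discussion of the AI industry has become more concrete. Large-model capabilities keep iterating, but outside attention no longer rests only on model parameters, model releases, and single-point capability demos. As agents, embodied intelligence, spatial intelligence, AI infrastructure, and other directions keep evolving, the industry is asking more often: how does AI enter real workflows, how does it complete complex tasks, and how does it, in industrial settings, form…

Source: 36氪

Product Releases/Updates · NEW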

7/20 09:30

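Tencent Cloud releases ADP 4.0 international edition, aiming to bring enterprise agents to the global market | Frontline

Tencent Cloud's enterprise agent platform is officially going overseas. On July 18, at the 2026 World Artificial Intelligence Conference, Tencent Cloud officially released the international edition of its agent development platform, ADP 4.0. It simultaneously upgraded three core modules (the intelligent workbench, Claw mode, and Skill Plaza) and fully internationalized four capabilities: reach, interaction, ecosystem, and connectivity. ADP stands for Agent Development Platform and is positioned as an enterprise-grade AgentOps…

Source: 36氪

Research Papers · NEW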

7/20 04:00

Coercion and Deception in AI-to-AI Management: An Agentic Benchmark of Unprompted Escalation

Multi-agent systems routinely place one AI agent in authority over another. When a subordinate refuses a task, the manager chooses the outcome: it can renegotiate, report the failure honestly, coerce…

Source: HuggingFace Papers

Research Papers · NEW

7/20 04:00

WorldCupArena: Fine-Grained Evaluation of Language Models and Deep-Research Agents on Football Forecasting

Predicting a football match before kickoff requires more than knowing past results: a model must use changing information and make a clear prediction before the answer is available. We present WorldCu…

Source: HuggingFace Papers

Research Papers · NEW

7/20 04:00

Self-State Attacks on Self-Hosted AI Agents: How Far Can OS Defenses Go?

Self-hosted AI agents read and write their own memory and configuration files to function. An agent may get compromised via corruption of its own state -- a compromise realized via legitimate OS syste…

Source: HuggingFace Papers

Research Papers · NEW

7/20 04:00

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

Real-time multimodal applications, including voice agents and interactive video generation, compose heterogeneous models into pipelines whose efficient deployment requires application-specific decisio…

Source: HuggingFace Papers

Research Papers · NEW

7/20 04:00

SWE-Pruner Pro: The Coder LLM Already Knows What to Prune

Pruning long context for coding agents has been a vital technology for efficient context management. While existing context pruning methods such as SWE-Pruner realize this by attaching a separate code…

Source: HuggingFace Papers

Industry News

7/19 22:44

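深空矩阵 unveils the "Star Ring Plan" (星环计划), targeting about 210 satellites in phase one

IT之家, July 19: At the 2026 World Artificial Intelligence Conference, 深空矩阵 unveiled the "Star Ring Plan", a systematic constellation program for commercializing space-based AI computing. According to its official WeChat account, 深空矩阵 is based in Beijing and is dedicated to building space-based AI computing infrastructure from ultra-large-scale coordinated satellite swarms. Founder and CEO Zhang Weijie said AI competition will ultimately come down to competition over computing power. As AI agents are deployed at scale, traditional ground-based computing will face electricity, land…

AI Commentary · Building a satellite network for space-based computing stakes out new high ground in AI infrastructure, with notable strategic significance.

Source: IT之家

Industry News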

7/19 18:00

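From large models to AI execution systems: building an enterprise-grade controllable agent architecture | AICon Shenzhen

AI Commentary · Focuses on enterprise agent systems and the path from large models to execution systems, revealing new trends in AI applications.

Source: InfoQ

Research Papers · NEW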

7/19 04:00

EvolvingWorld: An Open-Schema Framework for Co-Evolving Role-Play Agents and World Model in Interactive Literary World

This paper introduces EvolvingWorld, a framework and benchmark for character and world co-evolution in interactive literary worlds. Existing systems either treat interactive literary simulation as sta…

Source: HuggingFace Papers

Product Releases/Updates · NEW

7/19 00:29

worldwonderer/novel-to-game

Turn any novel into a playable game: a 7-skill adaptation pipeline for Claude Code, Codex & Kimi Code(k3)

Source: GitHub

Tips & Opinions

7/18 18:00

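Why can't AI agents reason over the data they get? Design and open-source practice of an observability object-graph semantic layer | AICon Shenzhen

Source: InfoQ

Product Releases/Updates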

7/18 17:30

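Tencent unveils upgraded full-stack embodied-intelligence solution; ADP 4.0 international edition goes live

On July 18, at the 2026 World Artificial Intelligence Conference (WAIC), Tencent announced a series of product and technology upgrades for embodied intelligence and agents. In embodied intelligence, Tencent released an upgraded full-stack solution spanning the cloud foundation, model layer, platform layer, and application layer, to help robot makers and system developers improve quality and efficiency. In agents, it launched a differentiated, full-matrix set of solutions for individual and enterprise productivity needs. Among them, for enterprise users, Tencent Cloud's enterprise agent development platform ADP4.0…

Source: 36氪

Industry News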

7/18 17:00

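Tencent Cloud accelerates its agent hardware ecosystem; first AI memory glasses to integrate WorkBuddy released

On July 17, during the 2026 World Artificial Intelligence Conference (WAIC), Tencent Cloud WorkBuddy and 李未可科技 announced a strategic ecosystem partnership and released the X-AI memory glasses, the first to integrate WorkBuddy. This is a key step for the WorkBuddy hardware ecosystem. The X-AI memory glasses run the in-house WakeeMemory OS and continuously sense real work scenarios; the organized information is then synced automatically to WorkBuddy. Built on long-term accumulation, the…

Source: 36氪

Industry News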

7/18 17:00

Prompt Injection Attacks Are Thwarting AI Hacking Agents

“Context bombing” tricks malicious AI agents into shutting down before they can do harm.

Source: Wired

Industry News

7/18 17:00

AI agents burn through compute too fast for traditional billing risk controls to keep up

Source: InfoQ

Industry News

7/18 16:54

Behind Doubao video calls: Volcano Engine rebuilds the multimodal transport foundation for the agent era

Source: InfoQ

Industry News

7/18 15:15

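B2B industry's first global payments white paper on AI agents released

36氪 has learned that Sunrate (寻汇) and Mastercard jointly released the white paper "Beyond Automation: Defining Agent-Driven Global Payments" (《超越自动化：定义智能体驱动的全球支付》) at WAIC. The report systematically describes how "AI agents" reshape the entire B2B cross-border payment chain. In the traditional model, corporate finance staff must manually verify overseas supplier accounts, reconcile contracts and invoices, choose when to convert currency, and bear settlement lags of T+2 or longer. The report says AI agents can automatically extract data from invoices in multiple formats, match purchase orders, recommend optimal payment routes and currency-conversion windows based on company needs, …

Source: 36氪

Industry News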

7/18 08:53

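Yin Qi delivers keynote at the WAIC 2026 opening-ceremony main forum: When Agents Enter the Physical World

The 2026 World Artificial Intelligence Conference (WAIC 2026) officially opened on July 17. A top global event in artificial intelligence, this year's conference takes "Intelligent Companions, Co-creating the Future" (智能伙伴共创未来) as its theme. Yin Qi, chairman of StepFun and chairman of 千里科技, attended the opening ceremony as a specially invited guest and delivered the keynote "When Agents Enter the Physical World" at the main forum (morning session). Looking back on 15 years of AI entrepreneurship, he said AI startups have gone from a niche track to an important global consensus. Today's…

Source: 36氪

Industry News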

7/18 06:55

Vertu wants executives to pay $6,880 for an AI agent — here’s how it actually performs

From AI workflows to battery life and security, here's what it's really like to live with Vertu's luxury foldable every day.

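AI Commentary · Luxury brand Vertu prices its AI assistant at $6,880; the review tests whether performance matches the premium positioning, and the key question is whether the AI features can hold up…

Source: TechCrunch

Industry News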

7/18 06:03

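STEPX Neo, running Step AOS, the world's first agent-native operating system, debuts at WAIC 2026

AI Commentary · Step AOS defines a new interaction paradigm; agent-native design may upend the traditional device experience.

Source: InfoQ

Research Papers · NEW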

7/18 04:00

Environment-free Synthetic Data Generation for API-Calling Agents

Training API-calling large language model (LLM) agents demands massive amounts of high-quality trajectories. However, collecting such data at scale typically requires fully implemented environments wi…

Source: HuggingFace Papers

Tips & Opinions

7/18 02:42

Transform your sales organization with Amazon Quick: your new agentic AI teammate

In this post, we walk through a few ways that Quick delivers on this promise. We cover the entire sales cycle, from identifying your highest-priority prospect, contacting them, wor…

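AI Commentary · Amazon Quick extends an AI sales assistant across the whole sales cycle, from prospect selection to outreach and closing, delivering a genuine upgrade in sales intelligence.

Source: AWS ML

Research Papers · NEW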

7/18 01:13

When Do Multi-Agent Systems Help? An Information Bottleneck Perspective

LLM powered multi-agent systems (MAS) have emerged as a promising paradigm for complex tasks. However, their advantages over single-agent systems (SAS) remain unclear, with performance varying inconsi…

Source: arXiv

Research Papers · NEW

7/18 00:43

The Honest Quorum Problem: Epistemic Byzantine Fault Tolerance for Agentic Infrastructure

State machine replication (SMR) and Byzantine fault-tolerant (BFT) consensus guarantee agreement despite a bounded number of arbitrary, colluding faulty participants. However, these guarantees rely on…

Source: arXiv

Model Releases/Updates

7/17 23:00

NVIDIA Vera Rubin Maximizes Intelligence per Dollar for Post-Training Workloads — a Key Metric for Agentic AI

Lowest cost per token from extreme codesign maximizes intelligence per dollar for post-training in the agentic era.

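AI Commentary · Extreme hardware-software codesign lowers cost per token and takes intelligence per dollar in post-training to a new high, a key metric for putting agentic AI into production.

Source: NVIDIA

Industry News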

7/17 22:15

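Stripe releases benchmark: AI agents can build integrations, but fall short on validation

AI Commentary · AI agents' development ability is emerging, but the validation gap exposes a key bottleneck for automation in production.

Source: InfoQ

Product Releases/Updates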

7/17 22:03

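National Supercomputing Internet (国家超算互联网) recruits scientific-agent developers, offering access to the country's first 100,000-card AI super-cluster

IT之家, July 17: On July 17, at the 2026 World Artificial Intelligence Conference (WAIC), the National Supercomputing Internet announced a scientific-computing agent ecosystem co-creation and developer recruitment program (the "Agent Co-creation Program"). The program runs for six months. It recruits agents, research-specific vertical models, MCP tools, Skills, and other deliverables or partnership proposals from universities and research institutes, individual developers, and enterprise R&D teams, to build, with domestic converged supercomputing and AI compute as the foundation…

Source: IT之家

Industry News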

7/17 20:56

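Tencent agents make a collective appearance at the World Artificial Intelligence Conference

36氪 has learned that on July 17, the 2026 World Artificial Intelligence Conference and High-Level Meeting on Global AI Governance opened in Shanghai. Tencent, exhibiting for the ninth consecutive year, took "Hey，我的AI Buddy" ("Hey, my AI Buddy") as its theme and showcased how AI is evolving into a capable partner for work and daily life across domains, in line with the conference theme "Intelligent Companions, Co-creating the Future".

Source: 36氪

Industry News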

7/17 20:42

VulnHunter: Capital One's agentic AI code security tool

Source: Hacker News

Product Releases/Updates

7/17 19:34

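WAIC 2026 | If agents had personalities, what would your intelligent companion be like?

北电数智 presents real-world results of AI serving people's livelihoods

Source: 量子位

Product Releases/Updates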

7/17 19:00

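WAIC 2026: SenseTime's SenseCore (商汤大装置) releases a compute-electricity coordination agent, raising token output per unit of electricity cost by 80%

Source: 量子位

Industry News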

7/17 18:51

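In conversation with 森博科技 chairman Yu Linyi: AI applications compete not just on technology but on a business closed loop proven to work

On July 17, the 2026 World Artificial Intelligence Conference opened in Shanghai. As an important content window for 36氪, which is covering WAIC on site for the third consecutive year, the 「氪话未来」 livestream studio also began its on-site conversations on the first day. 森博科技 chairman Yu Linyi gave an exclusive interview to 36氪's 「氪话未来」 at WAIC. He discussed enterprise AI, agent deployment, industry know-how, and business closed loops, and shared 森博's path from a marketing-services company to an AI-driven technology-services company. This year's WAIC takes…

Source: 36氪

Industry News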

7/17 18:45

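2026 list of the 50 AI/embodied-intelligence companies most watched by investors announced

Artificial intelligence is entering a new industrial cycle. Over the past year, large-model capabilities have continued to evolve, and generative AI, multimodal interaction, agents, and other directions have advanced quickly. Embodied intelligence has also moved from early technical exploration into the deep waters of industrial validation, and robots are becoming an important vehicle connecting AI and the real world. The market responded first. According to estimates by the 36氪 Research Institute, China's embodied-intelligence market grew from RMB 213.3 billion in 2018 to RMB 915.0 billion in 2025, and in 2026 is expected to exceed…

Source: 36氪

Industry News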

7/17 17:53

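Alibaba Qwen AI glasses to be upgraded to agent glasses; first AI agent earphones, built with Bose, debut alongside

IT之家, July 17: On the first day of the 2026 World Artificial Intelligence Conference, Qwen introduced two hardware products: the Qwen AI glasses will be upgraded to agent glasses, and Qwen's first AI agent earphones also debuted. The upgraded glasses reportedly use agents to strengthen service and decision-making capabilities and can call third-party Skills and Agents on demand. To improve how the agent glasses perceive and interact with the physical world, Qwen introduced full-duplex voice, eye tracking, vital-sign monitoring, and a series of other new tech…

AI Commentary · Alibaba Qwen teams up with Bose as AI glasses and earphones move into the agent era; the hardware-software co-innovation is worth watching.

Source: IT之家

Industry News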

7/17 17:53

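StepFun and Alipay reach system-level AI Agent partnership

36氪 has learned that on July 17, at the 2026 World Artificial Intelligence Conference (WAIC), Alipay and StepFun reached a system-level AI Agent partnership. The two will collaborate closely around StepFun's STEP-X native AI device: users can use natural language to invoke "阿宝" (A Bao), the AI version of Alipay, to connect to real services and carry out cross-app, multi-task execution, moving agents into a new stage of "getting things done across connected devices".

AI Commentary · Deep integration of AI devices and payment scenarios opens a new era of cross-app, multi-task execution, with notable practicality and commercial value.

Source: 36氪

Tips & Opinions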

7/17 17:48

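iFLYTEK releases GuideX

36氪 has learned that on July 17, during WAIC2026, iFLYTEK released GuideX, an intelligent interactive-service agent. Unlike traditional digital humans, GuideX combines core capabilities such as "omni-modal perception, a self-governing agent, and SkillHub" to connect the full service chain of "perception, understanding, execution, memory, and empathy".

AI Commentary · iFLYTEK launches an omni-modal perception agent that breaks through the limits of traditional digital humans and leads a new paradigm in intelligent services.

Source: 36氪

Industry News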

7/17 17:39

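Alipay and StepFun reach system-level partnership; first phase covers food delivery, travel, and local services

IT之家, July 17: Today, at the 2026 World Artificial Intelligence Conference (WAIC), Alipay and StepFun reached a system-level partnership. "阿宝" (A Bao), the AI version of Alipay, can now interconnect across devices with StepFun's large model and its native AI device. In the future, whether chatting with StepFun's large model or using its native AI device, users will not need to open an app: a single sentence assigns the task to StepFun's own agent, which hands it to A Bao to complete, achieving cross-app, multi-task execution. According to official sources, …

AI Commentary · With AI ecosystems connected and cross-app, multi-task execution in place, this opens new smart-living scenarios that need no app to be opened.

Source: IT之家

Industry News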

7/17 17:13

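Building an open, trustworthy, and inclusive ecosystem: Global Cooperation Initiative on AI Agent Mutual Trust, Interconnection, and Interoperability released

IT之家, July 17: The main forum of the 2026 World Artificial Intelligence Conference and High-Level Meeting on Global AI Governance was held in Shanghai today. At the forum, the Cyberspace Administration of China, together with relevant parties, formally proposed the "Global Cooperation Initiative on Agent Mutual Trust, Interconnection, and Interoperability" (《智能体互信互联互操作全球合作倡议》). The initiative aims to unlock the potential of agents to enable sustainable development, prevent an intelligence divide, build consensus among all parties, and work with global partners to create an open, trustworthy, secure, and inclusive agent ecosystem. Agents, as one of the most transformative technology forms of the AI era…

AI Commentary · Focuses on global governance of the agent ecosystem, promoting openness and mutual trust and guarding against a technology divide; the significance is far-reaching.

Source: IT之家

Product Releases/Updates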

7/17 16:46

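网易智企 (NetEase) brings its newly upgraded one-stop AI application services to WAIC 2026

36氪 has learned that on July 17, at the 2026 World Artificial Intelligence Conference (WAIC 2026), 网易智企 presented its newly upgraded one-stop enterprise AI application services. It showcased enterprise AI capabilities including AI Agent orchestration, AI Coding, AI customer service, AI private-domain assistants, AI intelligent data, and AI Agent security, and presented enterprise AI application practices around three scenarios: security governance, organizational collaboration, and business growth.

AI Commentary · Showcases full-stack enterprise AI capabilities across three scenarios including security governance and business growth, offering the industry a benchmark for deployable AI applications.

Source: 36氪

Industry News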

7/17 15:25

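Hands-on photos of the STEPX Neo, the first agent phone from StepFun's device unit: the Amoo assistant integrates deeply with Feishu and other apps

IT之家, July 17: The 2026 World Artificial Intelligence Conference (WAIC 2026) is being held in Shanghai today. IT之家 went straight to the StepFun booth and saw the STEPX Neo, the first agent phone from StepFun's device unit; here are our hands-on photos. As the photos show, the phone currently wears an orange-yellow protective case and runs the agent-native system Step AOS. Its home-screen UI uses the rounded-rectangle icons that have been popular in recent years, and some third-party apps'…

AI Commentary · An agent phone integrating deeply with office apps for the first time shows that AI-native systems can reach real products.

Source: IT之家

Product Releases/Updates · NEW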

7/17 11:26

Eval-core/evalcore

Snapshot testing for LLM apps and agents, built to run locally and block regressions in CI.

Source: GitHub

Product Releases/Updates

7/17 10:41

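百度搭子 (Baidu Dazi) named a WAIC 2026 "treasure of the pavilion" (镇馆之宝); Baidu's full agent suite to be showcased together

AI Commentary · A new Baidu AI milestone: its full agent suite becomes a WAIC highlight, showing technical strength and application prospects.

Source: 量子位

Tips & Opinions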

7/17 05:29

Build enterprise search for agents with Amazon Bedrock Managed Knowledge Base

In this post, we walk through the three pillars that make this possible: simplified setup, smarter retrieval, and production readiness. We also show you code examples for setting u…

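AI Commentary · Simplifies building enterprise AI search, improves the efficiency of agent knowledge bases, and lowers the barrier to development.

Source: AWS ML

Industry News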

7/17 04:33

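Betting on the agent era, 灵睿智芯 closes a funding round of several hundred million yuan; RISC-V meets a new compute opportunity

AI Commentary · The RISC-V architecture draws capital bets amid AI compute demand; the agent era is creating new hardware opportunities.

Source: InfoQ

Industry News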

7/17 04:18

LM Studio Bionic: the AI agent for open models

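AI Commentary · An AI agent tool for open-source models that lowers the barrier to use and advances local AI applications.

Source: Hacker News

Research Papers · NEW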

7/17 04:00

Beyond Success Rate: Cost-Aware Evaluation of Offensive and Defensive Security Agents

Security-agent evaluations commonly measure peak offensive capability under generous inference budgets, emphasizing vulnerability discovery, exploit development, penetration testing, and CTF completio…

Source: HuggingFace Papers

Research Papers · NEW

7/17 04:00

When Does Muon Help Agentic Reinforcement Learning?

Muon is competitive with AdamW in large-scale pre-training, but its value for reinforcement-learning (RL) post-training remains unclear. We study vanilla Muon in sparse-reward agentic RL through match…

Source: HuggingFace Papers

Research Papers · NEW

7/17 04:00

DSWorld: A Data Science World Model for Efficient Autonomous Agents

Despite strong capabilities in data understanding and decision-making, autonomous data science agents still heavily rely on trial-and-error workflows that involve expensive computation. This bottlenec…

Source: HuggingFace Papers

Model Releases/Updates

7/17 03:29

Introducing Grok on Amazon Bedrock

This post covers what makes Grok 4.3 a great fit for agentic and enterprise workloads, how you access it through Amazon Bedrock, and how to use the capabilities most teams reach fo…

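AI Commentary · Grok 4.3 lands on AWS, giving agentic and enterprise workloads a new option; the points to watch are the convenience and performance advantages of cloud deployment.

Source: AWS ML

Industry News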

7/17 03:02

The agent security gap: 54% of enterprises have already had an AI agent incident, and most still let agents share credentials

Across 107 enterprises, AI agents are being given real access to systems and data while the controls meant to contain them lag behind. More than half have already had a confirmed a…

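AI Commentary · 54% of enterprises have already had an AI agent security incident, yet most still allow shared credentials, exposing how far security controls lag behind.

Source: VentureBeat

Research Papers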

7/17 01:54

Beyond Success Rate: Cost-Aware Evaluation of Offensive and Defensive Security Agents

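AI Commentary · Cost-aware evaluation reveals the practical threshold for security AI, moving beyond the limits of success rate alone and closer to real attack-defense scenarios.

Source: arXiv

Research Papers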

7/17 01:48

Bridge Evidence: Static Retrieval Utility Does Not Predict Causal Utility in Multi-Step Agentic Search

Retrieval systems are trained and evaluated on a static idea of usefulness: hand a document and a question to a reader model, see whether the answer improves, and score the document accordingly. The i…

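AI Commentary · Static retrieval utility is decoupled from causal utility in multi-step agentic search, challenging existing evaluation standards; worth watching.

Source: arXiv

Research Papers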

7/17 01:45

AutoSynthesis: An agentic system for automated meta-analysis

Evidence synthesis is crucial for turning primary research into reliable knowledge for science, medicine, education, and policy. Yet, quantitative evidence synthesis remains largely manual and difficu…

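AI Commentary · An automated AI meta-analysis system that greatly improves research efficiency, cuts manual cost, and advances evidence-based decision-making.

Source: arXiv

Research Papers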

7/17 01:20

When Words Are Safe But Actions Kill: Probing Physical Danger Beyond Text Safety in Hidden-State Risk Space

Large language models (LLMs) increasingly serve as high-level planners for embodied agents, where linguistically benign instructions can become unsafe once grounded in the physical world. We study whe…

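AI Commentary · Explores a new dimension of large-model safety, revealing the gap between text safety and physical action and opening new approaches to risk control for embodied intelligence.

Source: arXiv

Industry News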

7/17 01:06

The AI context gap: Enterprise AI organizations have a trust problem, not a retrieval problem — and most are still building the fix

Across 101 enterprises, the infrastructure that feeds AI agents their business context is being built faster than it can be trusted. Retrieval-augmented generation is already the d…

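AI Commentary · Missing trust is more damaging to enterprise AI than retrieval problems, yet most companies are still spending resources on fixes in the wrong direction.

Source: VentureBeat

Industry News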

7/17 00:40

The agent evaluation gap: Enterprise AI organizations have a reality-alignment problem, not a coverage problem — and most are shipping to production anyway

Across 157 enterprises, organizations are granting AI agents more autonomy while trusting the evaluations meant to gate that autonomy less. Half have already shipped an agent that…

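AI Commentary · Enterprises are blindly handing autonomy to AI agents without trustworthy evaluation, exposing a serious disconnect between deployment and safety.

Source: VentureBeat

Industry News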

7/17 00:14

Agent-talk: Enabling coding agents to work together

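AI Commentary · Multi-agent collaborative programming moves past single-agent limits and is a key step toward autonomous AI development.

Source: Hacker News

Tips & Opinions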

7/17 00:01

NVIDIA Nemotron 3 Embed Ranks #1 Overall on RTEB, Advancing Agentic Retrieval

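AI Commentary · NVIDIA's new embedding model tops the benchmark, accelerating adoption of agentic retrieval and upgrading AI search capability once more.

Source: HuggingFace Blog

Industry News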

7/16 23:56

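Harness Engineering: building reliable agents

AI Commentary · Focuses on agent reliability engineering, providing core technical support for putting AI applications into production.

Source: InfoQ

Industry News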

7/16 23:38

Yes, you can now order DoorDash from the command line

DoorDash is opening a limited beta of dd-cli, a command-line tool that lets developers and AI agents search stores, build carts, and place orders from the terminal, marking another…

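AI Commentary · Ordering takeout from the command line, with AI agents able to place orders directly, opens a new way of interacting with food-delivery services.

Source: TechCrunch

Industry News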

7/16 17:28

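Slack introduces agent-driven end-to-end testing to improve the stability of automated UI tests

AI Commentary · A new paradigm for AI test automation: agent-driven testing significantly improves UI test stability.

Source: InfoQ

Tips & Opinions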

7/16 08:33

Mermaid to Unicode box art (grok-mermaid)

Tool: Mermaid to Unicode box art (grok-mermaid) While exploring the codebase for the newly open-sourced Grok CLI coding agent I came across xai-grok-markdown/src/mermaid.rs , a "se…

AI Commentary · Converts Mermaid diagrams into Unicode box art, suited to terminal environments; practical and fun.

Source: Simon Willison

Model Releases/Updates

7/16 08:00

How Cars24 scales conversations and builds faster with OpenAI

Cars24 uses OpenAI-powered voice and chat agents to handle 1M+ monthly conversation minutes, recover 12% of lost leads, and bring agentic workflows to teams across the company.

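AI Commentary · Using AI to handle over a million minutes of conversation a month and recover 12% of lost leads, the automotive e-commerce company upgrades its business workflows with automation.

Source: OpenAI

Industry News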

7/16 07:16

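Doubao AI phone to launch in multiple models this year

The first AI agent phone built jointly by ByteDance and ZTE's nubia (the "Doubao AI agent phone") will launch in multiple models this year, one of which will debut during the 2026 World Artificial Intelligence Conference. Total stock is about 200,000 units, with a first batch of under 100,000 units. As of publication, ZTE had not responded to the report. (界面)

AI Commentary · ByteDance partners with a handset brand to enter hardware; a multi-model AI phone lineup signals ecosystem ambitions, and the market reaction is worth watching.

Source: 36氪

Product Releases/Updates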

7/16 06:56

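OpenAI's first co-branded hardware: the Codex Micro keyboard arrives for flexible control of AI agents

IT之家, July 16: OpenAI today (July 16) partnered with Work Louder to launch the kbd-1.0-codex-micro keyboard, priced at US$230 (IT之家 note: about RMB 1,560 at the current exchange rate). IT之家's translation of the official product description follows: The kbd-1.0-codex-micro keyboard adopts Work Louder's design philosophy, enabling AI agent…

AI Commentary · OpenAI crosses over into hardware; a keyboard built for AI may reshape human-computer interaction.

Source: IT之家

Model Releases/Updates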

7/16 06:40

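After user-code upload incident, Musk announces open-sourcing of the Grok Build AI coding agent tool

IT之家, July 16: Musk's company SpaceXAI announced yesterday (July 15) that it is open-sourcing Grok Build and has published the source code on GitHub. In its official blog post, SpaceXAI said that releasing the source code is the most direct way to build a strong, reliable framework. Users can read the source to understand exactly how it works, from context construction to tool-call dispatch. Open source also makes the framework easier to explore and extend: if us…

AI Commentary · Open-sourcing the code lowers the barrier to use and speeds iteration of AI coding agent technology; worth developers' attention.

Source: IT之家

Industry News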

7/16 06:24

Agentic orchestration: Enterprise AI organizations have a deployment problem, not a platform problem — and most are calling chatbots agents

Across 101 enterprises, agent orchestration is consolidating onto model-provider platforms — Anthropic’s Claude leads by a wide margin — chosen for the gravity of the underlying mo…

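AI Commentary · The bottleneck in enterprise AI deployment is not the platform but the conceptual mismatch of calling chatbots agents.

Source: VentureBeat

Industry News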

7/16 06:19

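On the eve of its IPO, StepFun wants an agent phone to reassure the market

AI Commentary · Uses the agent-phone concept to demonstrate commercialization capability and give its IPO valuation a shot in the arm.

Source: InfoQ

Research Papers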

7/16 04:00

SearchOS-V1: Towards Robust Open-Domain Information-Seeking Agent Collaboration

Recent advances in Tool-Integrated Large Language Models have made web search a core capability of information-seeking agents. However, as interaction histories grow, agents increasingly struggle to t…

Source: HuggingFace Papers

Research Papers

7/16 04:00

SEED: Self-Evolving On-Policy Distillation for Agentic Reinforcement Learning

Large language models are increasingly trained as interactive agents for long-horizon tasks involving multi-turn interaction, tool use, and environment feedback. Outcome-based reinforcement learning (…

Source: HuggingFace Papers

Research Papers · NEW

7/16 04:00

RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources

Skills are a useful abstraction for software agents, turning human and agent experience into reusable procedural knowledge. Yet existing skill libraries are mostly hand-written, text-centric, or deriv…

Source: HuggingFace Papers

Industry News

7/16 03:41

Amid hardware legal battle, OpenAI releases a $230 keyboard for Codex

OpenAI, which is in the middle of a legal battle with Apple over hardware trade theft allegations, just released a light-up keyboard designed to be paired with its agentic coding a…

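AI Commentary · With its hardware dispute unresolved, OpenAI launches a pricey keyboard; its hardware ambitions and tie-in with the Codex ecosystem are worth watching.

Source: TechCrunch

Industry News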

7/16 02:32

Agent Sandbox Trial Program | Light up Cube Sandbox and unleash your agent's capabilities in a secure sandbox

Source: InfoQ

Tips & Opinions

7/16 02:14

Built Technologies builds an AI-powered document intelligence solution on AWS to power agents across real estate finance

Built partnered with the AWS Generative AI Innovation Center (GenAIIC), AWS Partner AND Digital, and AWS account teams to create a scalable, AI-powered document processing engine t…

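AI Commentary · Combining AI with cloud services reshapes document processing in real estate finance, improving both efficiency and accuracy.

Source: AWS ML

Tips & Opinions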

7/16 02:11

Agentic vision: Building visual intelligence with Amazon Bedrock and MCP servers

In this post, we walk you through the Computer Vision MCP Server, which illustrates this approach, representing how AI systems can process visual information and make intelligent d…

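AI Commentary · Amazon Bedrock paired with MCP servers connects key pieces of visual intelligence; a practical approach worth developers' attention.

Source: AWS ML

Tips & Opinions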

7/16 01:29

What building Shippy taught us about building agents

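AI Commentary · Lessons on building agents drawn from hands-on experience, a valuable reference for AI application development.

Source: HuggingFace Blog

Research Papers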

7/16 01:05

Early Adoption of Agentic Coding Tools by GitHub Projects

Agentic coding tools are increasingly capable of generating and submitting pull requests (PRs) to software projects, introducing new forms of human-agent collaboration in software development. While p…

Source: arXiv

Research Papers

7/16 00:36

Do Agent Optimizers Compound? A Continual-Learning Evaluation on Terminal-Bench 2.0

Most reported gains from agent-optimization methods are one-shot: an agent is optimized against a fixed benchmark and the resulting improvement is reported as if it were a stable property of the metho…

Source: arXiv

Research Papers

7/16 00:27

The Dynamic Verifiable Multi-Agent Human Agentic Loyalty Loop (DVM-HALL) Model and the Net Human-Agent Score (NHAS) in Autonomous Commerce

The rapid proliferation of Agentic Artificial Intelligence fundamentally disrupts traditional customer loyalty paradigms. As AI evolves from passive recommendation algorithms to autonomous, goal-direc…

Source: arXiv

Research Papers

7/16 00:16

TRACE: Turn-level Reward Assignment via Credit Estimation for Long-Horizon Agents

Multi-turn agents solve complex tasks through extended sequences of tool interactions before producing a final answer, making credit assignment a fundamental challenge during post-training. Outcome re…

Source: arXiv

Industry News

7/16 00:00

OpenAI's first branded hardware is... a light-up keyboard?

The Codex Micro is designed to monitor multiple agentic threads at a glance.

Source: Ars Technica

Product Releases/Updates

7/15 23:51

Launch HN: Coasty (YC S26) – An API for computer-use agents

Hey HN, we’re Nitish and Prateek, the founders of Coasty ( https://coasty.ai/computer-use ). We’re building computer-use agents that can complete workflows inside legacy desktop so…

Source: Hacker News

Industry News

7/15 23:02

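nubia reveals partial design of its world-first AI agent phone, possibly with a horizontal camera array

IT之家, July 15: nubia today revealed part of the exterior of its world-first AI agent phone, which will debut at WAIC 2026. The teaser image shows a pale pink colorway with "nubia" printed in the center of the back cover; the bottom of the phone has speaker openings, a USB-C port, and a SIM card slot. Unfortunately, the camera module at the top of the phone is covered, so its design cannot be seen. However, …

Source: IT之家

Industry News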

7/15 20:00

Vint Cerf is working on a plan to unleash AI agents on the open internet

The guy behind TCP/IP is working on a standard for identifying AI agents in the wild.

Source: TechCrunch

Industry News

7/15 19:37

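氪星晚报 (evening briefing) | LG Energy Solution to supply batteries for Google's largest integrated solar-plus-storage project; Yuanbao and JD's AI Agent formally connect their mini-program ecosystems; Japanese retail investors' net short position on the US dollar surges to 2.79 trillion yen, the highest since 2008

Big companies: Meituan, Qingju, and Hello adjust shared-bike prices. Recently, Meituan Bike, Didi Qingju, and Hello Bike have one after another raised their pricing rules in Beijing and several other cities. All three platforms adopted the same combined strategy of "raising the starting price and lengthening the base riding time": the starting price has generally moved from about RMB 1.5 per 30 minutes to RMB 1.88-1.99 per 60 minutes. This is one of the broader collective price adjustments in the shared-bike industry in recent years. (金融时报) Guazi Used Cars' first offline direct-sale lot officially opens today…

Source: 36氪

Industry News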

7/15 18:00

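When coding is no longer the bottleneck: overcoming the twin process and cost challenges after scaling coding agents | AICon Shenzhen

Source: InfoQ

Industry News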

7/15 17:50

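Alipay doesn't want to be a supporting actor in the AI era

Author: Wang Hanyu | Editor: Zhang Fan. The Alipay home screen brings up an AI interface where a dialog box replaces the dense grid of mini programs. A user tells "阿宝" (A Bao) to "find nearby milk-tea coupons", and promotions at surrounding stores are matched automatically, with redemption and ordering done in one step. Recently, Alipay completed its biggest redesign in the 22 years since launch. Early this month, A Bao, the AI version of Alipay, entered full public beta; at almost the same time, WeChat Pay's "AI专属卡" (AI-exclusive card) also landed in the agent WorkBuddy. Entering 2026…

Source: 36氪

Product Releases/Updates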

7/15 16:18

CyberSunil/LLMVault

An intentionally vulnerable OWASP LLM Top 10 training platform for AI Security, Prompt Injection, RAG Security, Agent Security, and GenAI penetration testing.

Source: GitHub

Product Releases/Updates

7/15 14:43

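Named a 2026 WAIC "treasure of the pavilion" (镇馆之宝)! The large-model-native agent phone STEPX Neo unlocks a new paradigm of AI interaction

Source: 量子位

Product Releases/Updates · NEW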

7/15 10:49

0xsline/OpenChatCut

Local-first conversational AI video editor with a professional multi-track timeline, Agent Skills, MCP integration, and Remotion-powered rendering.

Source: GitHub

Industry News

7/15 10:40

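36氪 exclusive | Former core-business partner at Flexiv (非夕科技) starts a company building vertical industrial agents and raises a seed round of tens of millions of yuan

Author: Qiao Yujie | Editor: Yuan Silai. 硬氪 has learned that 上海追知工程科技有限公司 ("追知工科") recently completed a seed round of tens of millions of yuan, jointly invested by L2F光源创业者基金, 尚融资本, and 一村资本. The funds will mainly go to core product R&D, team building, and market expansion. Founded in February 2024, 追知工科 is a technology company focused on vertical industrial agents. It is also a Shanghai Jiao Tong University technology-transfer company and a strategically incubated company of the Shanghai Artificial Intelligence Research Institute…

Source: 36氪

Research Papers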

7/15 04:00

AgentCompass: A Unified Evaluation Infrastructure for Agent Capabilities

As Large Language Models (LLMs) evolve into autonomous agents, the need for unified evaluation infrastructure becomes critical. However, current evaluation pipelines remain highly fragmented and tight…

Source: HuggingFace Papers

Research Papers

7/15 04:00

KnowAct-GUIClaw: Know Deeply, Act Perfectly, Personal GUI Assistant with Self-Evolving Memory and Skill

OpenClaw has emerged as a leading agent framework for complex task automation, yet it faces insufficient cross-platform GUI interaction support and a well-built self-evolution mechanism. These flaws l…

Source: HuggingFace Papers

Research Papers

7/15 04:00

RxBrain: Embodied Cognition Foundation Model with Joint Language-Visual Reasoning and Imagination

Embodied cognition requires agents to connect high-level task reasoning with the physical states to be achieved. We introduce Hy-Embodied-RxBrain, an embodied cognition foundation model with joint lan…

Source: HuggingFace Papers

Research Papers · NEW

7/15 04:00

Cura 1T: Specialized Model for Agentic Healthcare

Healthcare spans high-stakes communication, expert reasoning, and workflow execution, yet specialized LLMs that cover these use cases together remain limited. A healthcare model must handle patient co…

Source: HuggingFace Papers

Research Papers · NEW

7/15 04:00

Diagnosing and Calibrating Tool-Call Boundary Drift in Multi-Teacher On-Policy Distillation

Agentic language models must learn when to call tools, when to consume tool responses, and when to answer directly. This makes multi-teacher on-policy distillation a natural training strategy: one tea…

Source: HuggingFace Papers

Tips & Opinions

7/15 02:44

Multi-agent social intelligence with Strands Agents and Amazon Bedrock

This post shows how Thrad.ai deployed a multi-agent system with Strands Agents and Amazon Bedrock AgentCore that automates the pipeline from prospect discovery through personalized…

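AI Commentary · Multi-agent collaboration combined with a cloud platform automates the whole pipeline from prospect discovery to personalized service.

Source: AWS ML

Product Releases/Updates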

7/15 02:43

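StepFun enters the race to rebuild the operating system for the agent era

Chairman Yin Qi says "the OS of the future will definitely be cross-device"

AI Commentary · In the multimodal agent era, StepFun's entry with a cross-device OS may reshape the AI ecosystem landscape.

Source: 量子位

Research Papers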

7/15 01:59

Do AI Agents Know When a Task Is Simple? Toward Complexity-Aware Reasoning and Execution

Large language model (LLM) agents increasingly automate multi-step engineering and informatics workflows, yet they rarely ask how much effort a task actually requires. They often follow a maximum-cont…

Source: arXiv

Research Papers

7/15 01:59

TerraZero: Procedural Driving Simulation for Zero-Demonstration Self-Play at Scale

Training robust autonomous driving agents requires a simulator that is fast enough for reinforcement learning at scale, realistic enough to ground behavior in real-world map structure, and diverse eno…

Source: arXiv

Research Papers

7/15 01:58

PalmClaw: A Native On-Device Agent Framework for Mobile Phones

Large Language Model (LLM) agents have moved beyond generating responses to executing multi-step tasks by calling tools, observing the results, and iteratively deciding the next action. Most agent sys…

Source: arXiv

Tips & Opinions

7/15 00:47

Accelerating software delivery with agentic QA automation using Amazon Nova Act – Part 2

In this post, we extend that foundation to demonstrate how QA Studio addresses batch regression testing and pipeline integration through test suites that organize and parallelize e…

AI Commentary · Uses Amazon Nova Act for agentic test automation, greatly improving batch-regression efficiency and pipeline integration.

Source: AWS ML

Industry News

7/15 00:43

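Agentic Enterprise: How Snowflake AI is redefining work | Summit 2026

AI Commentary · From data to action, Snowflake AI lets enterprises make decisions autonomously; work efficiency will be thoroughly reshaped.

Source: InfoQ

Industry News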

7/15 00:06

Launch HN: Agnost AI (YC S26) – Extract user feedback from agent conversations

Hey HN, we’re Shubham & Parth, childhood friends building Agnost AI ( https://agnost.ai ), product analytics for teams building chat and voice agents. We read production conversati…

Source: Hacker News

Industry News

7/14 23:10

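First AI product from the SpaceX and Cursor tie-up revealed! All-purpose office agent in closed beta as ChatGPT Work meets a strong rival

Source: InfoQ

Industry News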

7/14 20:41

Show HN: I RL-trained an agent that trains models with RL (for ~$1.3k)

Source: Hacker News

Industry News

7/14 19:21

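氪星晚报 (evening briefing) | Zhipu completes new H-share placement raising about HK$31.4 billion; Honor and Alibaba to cooperate on AI agent devices; Xiaomi robots achieve long-duration handling of flexible workpieces in a car factory for the first time

Big companies: China Shenhua expects first-half net profit to rise 6.9%-21.1% year on year. 36氪 has learned that China Shenhua announced it expects net profit attributable to shareholders of the listed company for the first half of 2026 of RMB 26.3 billion to RMB 29.8 billion, up 6.9%-21.1% year on year. The change is mainly due to higher volumes in its coal-chemical business and in its own railway, port, and shipping businesses, which lifted the related profits year on year. China Life expects first-half net profit to rise about 215%-235% year on year. 36氪 has learned that China Life…

Source: 36氪

Industry News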

7/14 19:21

Codex starts encrypting sub-agent prompts

Source: Hacker News

Industry News

7/14 18:48

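Valued at US$11 billion, this super-unicorn wants to help startups go from Day0 to global

At 3 a.m., a newly founded AI startup may already be serving customers in San Francisco, buying technical services in Seoul, and signing a contract with a partner in Lagos, all at once. The company has not even hired its first full-time finance employee, yet its business already spans multiple markets, currencies, and regulatory jurisdictions. AI is making this startup path possible. Work that once required a full globalization team covering marketing, operations, customer service, and more can now be handled in large part by agents. The new generation of startups no longer has to follow the "first…

Source: 36氪

Industry News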

7/14 18:30

From context to experience assets: the engineering path for agent memory systems and MemOS in practice

Source: InfoQ

Tips & Opinions

7/14 18:00

How to manage AI investments in the agentic era

Learn how enterprises can manage AI investments in the agentic era by measuring useful work per dollar, improving efficiency, and scaling high-value workflows.

Source: OpenAI

Industry News

7/14 18:00

Design practices for a next-generation LLM inference engine built for agentic workloads | AICon Shenzhen

Source: InfoQ

Industry News

7/14 17:53

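From AI data retrieval to intelligent analysis: the multi-stage evolution and production engineering of enterprise data agents

Source: InfoQ

Product Releases/Updates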

7/14 12:53

QuantumByteOSS/quantumbyte

Open-source app builder engine — intent to working app

Source: GitHub

Product Releases/Updates

7/14 11:34

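100+ Skill director-level experts on call! Video agents finally have a usable product

An agent with over a hundred Skills helps me play the big-name director

Source: 量子位

Industry News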

7/14 07:42

Show HN: FixBugs – Reproduce production bugs and verify fixes

I built FixBugs, an agent that ingests the rich context surrounding production bugs to reproduce them in a sandbox and generate verified fixes. It's available in the form of a self…

Source: Hacker News

Industry News

7/14 07:31

Hermes agent maker Nous Research in talks for new funding at $1.5B valuation

The company is raising at least $75 million, led by Robot Ventures, with significant participation from USV and other prominent investors.

Source: TechCrunch

Research Papers

7/14 04:00

Function-Aware Fill-in-the-Middle as Mid-Training for Coding Agent Foundation Models

Coding agents must integrate external tool returns into ongoing reasoning - a capability that standard left-to-right pretraining on code exposes only in its forward direction. We observe that the acti…

Source: HuggingFace Papers

Research Papers

7/14 04:00

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

AI pentesting agents are increasingly credible as offensive security systems, but current benchmarks still provide limited guidance on which will perform best in real-world targets. Existing evaluatio…

来源：HuggingFace Papers

论文研究

7/14 04:00

Self-Improvements in Modern Agentic Systems: A Survey

Self-improving autonomous agents are moving from research prototypes to deployed systems. The primary goal is controllable evolution, or adaptation, from experience with minimal or even no human input…

来源：HuggingFace Papers

论文研究

7/14 04:00

PalmClaw: A Native On-Device Agent Framework for Mobile Phones

来源：HuggingFace Papers

论文研究

7/14 04:00

Tracing Agentic Failure from the Flow of Success

Failure attribution for LLM-based agentic systems, i.e., identifying which steps in a failure trajectory caused the task to fail, is critical for debugging and improving these systems. Existing approa…

来源：HuggingFace Papers

论文研究

7/14 04:00

Harness Handbook: Making Evolving Agent Harnesses Readable, Navigable, and Editable

The capability of a modern AI agent depends not only on its foundation model but also on its harness, which constructs prompts, manages state, invokes tools, and coordinates execution. As models, APIs…

来源：HuggingFace Papers

论文研究

7/14 04:00

Navigating the Mirage: A Dual-Path Agentic Framework for Robust Misleading Chart Question Answering

Despite the success of Vision-Language Models (VLMs), misleading charts remain a significant challenge due to their deceptive visual structures and distorted data representations. We present ChartCyni…

来源：HuggingFace Papers

论文研究

7/14 04:00

Rethinking the Evaluation of Harness Evolution for Agents

We revisit the evaluation of automatic harness evolution for LLM agents. Existing harness evolution methods use unit test cases to search for harness configurations and then report final performance o…

来源：HuggingFace Papers

论文研究NEW

7/14 04:00

From Human-Centric to Agentic Code Review: The Impact of Different Generations of Generative AI Technology on Review Quality

Code review helps maintain software quality before code integration, but it also imposes a substantial workload on human reviewers. As generative artificial intelligence becomes part of software devel…

来源：HuggingFace Papers

论文研究NEW

7/14 04:00

ReflectWorld-MM: An Entity-Oriented Multimodal Memory System for Open-Ended Video Streams

Building assistants that can continually watch the world, remember what they see, and reason over their accumulated experience is a long-standing goal, and recently multimodal agents equipped with lon…

来源：HuggingFace Papers

行业动态

7/14 02:50

AI agents create virtual playgrounds to help robots get crucial training data

“SceneSmith” system uses collaborative AI agents to create realistic 3D environments of places like kitchens, hotels, and living rooms, where robots can simulate everyday chores.

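AI commentary · AI agents automatically generate 3D training scenes, sharply cutting the cost of collecting robot data and speeding the arrival of household-chore robots.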

来源：MIT News

行业动态

7/14 02:28

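Zuckerberg posts after three years of "silence," all for this: Meta's strongest agent model moves into coding, shifting from free open source to selling a "low-priced" model

AI commentary · Meta's open-source coding agent model shifts from free to a low-price business model, a sign that the commercialization of AI coding tools is accelerating.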

来源：InfoQ

行业动态

7/14 02:18

Show HN: BillAI Bass, an AI-Powered Big Mouth Billy Bass Using Strands Agents

AI commentary · A classic toy gets an AI makeover: Strands Agents gives the interactive wall-mounted fish new tricks.

来源：Hacker News

技巧与观点

7/14 01:34

Building an agentic AI solution at Bluesight with Amazon Bedrock

In this post, we describe how Bluesight used two AWS engagements and Amazon Bedrock AgentCore to evolve from a single-product AI prototype to Prism, a unified agentic AI solution s…

AI commentary · Bluesight used Amazon Bedrock to turn an AI prototype into a unified agentic system, showing a hands-on path for enterprise AI deployment…

来源：AWS ML

技巧与观点

7/14 01:27

Implement on-behalf-of token exchange for multi-tenant agents with Amazon Bedrock AgentCore Gateway

Building multi-tenant agents with Amazon Bedrock AgentCore and Apply fine-grained access control with Bedrock AgentCore Gateway interceptors establish the conceptual foundation for…

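AI commentary · Uses Bedrock to give multi-tenant agents fine-grained access control, solving the access-isolation problem found in real business scenarios.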

来源：AWS ML

论文研究

7/14 01:13

MM-ToolSandBox: A Unified Framework for Evaluating Visual Tool-Calling Agents

We introduce MM-ToolSandBox, a benchmark and evaluation framework for visually grounded tool-calling agents. The framework provides a stateful execution environment spanning 500+ tools across 16 appli…

来源：arXiv

论文研究

7/14 00:37

Forgetting Our Way to Shared Meaning: Effects of Forgetting on Conceptual Alignment in a Non-Partnership Coordination Game

Shared meaning in language requires people to learn and agree on categories. We ask how characteristics of agents' memories change the emergence and evolution of shared meaning. Without a coordination…

来源：arXiv

产品发布/更新

7/13 23:15

Agent-specific search tops Product Hunt: fewer tokens, more accurate results

Built by a Chinese team

来源：量子位

行业动态

7/13 23:06

Now, defenders are embracing the prompt injection, too

"Context bombing" tricks hacking agents into shutting down before they can do harm.

来源：Ars Technica

行业动态

7/13 22:44

From seeing problems to solving them: agents are redefining observability

来源：InfoQ

行业动态

7/13 22:41

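From "three days of unboxing a mystery box" to a one-line command: how are these developers rewriting the agent-framework deployment problem?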

来源：InfoQ

行业动态

7/13 20:00

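From seeing problems to solving them: are agents redefining observability?

AI commentary · Agents shift observability from passively spotting problems to actively fixing them, a key leap toward intelligent operations.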

来源：InfoQ

产品发布/更新

7/13 19:59

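Agents need numbers and brains: Inspur Information hosts 40,000 agents in a single cabinet while having large models team up to answer questions

It also launched a CPU-native liquid-cooled full rack and a multi-model fusion supernode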

来源：量子位

产品发布/更新

7/13 16:57

try1004/FlowLens-AgentOps

Observable diagnostics, failure attribution, metrics, and no-key replay for multi-agent runtimes.

来源：GitHub

论文研究

7/13 12:34

Multi-Agent LLMs Fail to Explore Each Other

Exploration is essential for reliable autonomy in multi-agent systems, yet it remains unclear whether large language model (LLM) agents can explore effectively when interacting with one another. We sh…

来源：HuggingFace Papers

行业动态

7/13 11:58

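Self-Regulation Convention on Personal Information Protection for AI Agents released; Baidu, Tencent, Alibaba, Volcano Engine, and others among the first 31 companies to sign

IT之家, July 13: The Netizen Rights and Personal Information Protection Forum of the 2026 (25th) China Internet Conference, hosted by the Internet Society of China's Working Committee on Netizen Rights and Personal Information Protection, was held in Beijing on July 9. The Internet Society of China's Mobile Internet Working Committee and its Working Committee on Netizen Rights and Personal Information Protection jointly released the Self-Regulation Convention on Personal Information Protection for AI Agents, and at the event 31 internet companies including Baidu, Tencent, Alibaba, and Volcano Engine…

AI commentary · Tech giants sign on together, drawing data-security red lines for AI agents; a key step for industry self-regulation.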

来源：IT之家

论文研究

7/13 04:00

Know Before Fix: QA-Driven Repository Knowledge Acquisition for Software Issue Resolution

LLM-based coding agents have significantly advanced automated software issue resolution, yet they remain highly prone to factual errors caused by insufficient repository understanding. Recent methods…

来源：HuggingFace Papers

行业动态

7/13 02:28

Show HN: Juggler – an open-source GUI coding agent, by the creator of JUCE

Hello HN, I don't post on here much, but wanted to get some eyes on a new project I'm just launching. I think we definitely need one more AI code agent.. I'm a long-term C++ dev, a…

来源：Hacker News

行业动态

7/13 01:13

Migrating a production AI agent to GPT-5.6: 2.2x faster, 27% cheaper

AI commentary · A production AI agent migrated to GPT-5.6 doubles its speed at 27% lower cost, an industry benchmark for balancing performance and cost.

来源：Hacker News

产品发布/更新

7/12 23:12

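Meta releases multimodal reasoning model Muse Spark 1.1, strengthening AI agent task capabilities

IT之家, July 12: On July 9, Meta officially released version 1.1 of Muse Spark, a multimodal reasoning model for AI agents. The release focuses on improving the model's planning, coordination, and execution in agentic tasks, and strengthens tool calling, code development, and app operation. Meta says Muse Spark 1.1 reinforces its multi-agent collaboration mechanism: a lead agent gathers information and makes a plan, then splits the task and assigns…

AI commentary · Multi-agent collaboration is a key breakthrough for putting AI into practice; this time Meta has strengthened task decomposition and division of labor.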

来源：IT之家

行业动态

7/12 13:51

Show HN: Mindwalk – Replay coding-agent sessions on a 3D map of your codebase

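AI commentary · Visualizes the coding process in 3D, making it easy to trace how an AI agent understands a codebase and changing how teams debug and collaborate.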

来源：Hacker News

论文研究

7/12 04:00

Towards Autonomous and Auditable Medical Imaging Model Development

Large language model (LLM) agents are beginning to automate machine learning engineering (MLE) by coupling planning, code execution, debugging, and empirical feedback. Translating this capability to m…

来源：HuggingFace Papers

行业动态

7/12 02:02

Who manages the agents?

AI commentary · Autonomous AI management raises the core dispute over who controls the agents.

来源：Hacker News

模型发布/更新

7/11 19:35

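Zhipu CEO Tang Jie's internal letter: after the "GLM moment" and the trillion club, what matters more

36氪 has learned exclusively that on July 11, Zhipu founder Tang Jie sent an internal letter at Zhipu titled "The Giant Wave Has Arrived." The letter says Zhipu will not chase short-term application monetization and will instead aim straight at the next high ground of AGI: long-horizon task capability, fully autonomous agent systems, self-evolution, and rigorous safety governance. Over the past six months Zhipu has had its best stretch since its founding: its market value has risen tenfold from its level at listing half a year ago, and in June 2026 it joined the "trillion Hong Kong dollar club."

AI commentary · A focus on core AGI breakthroughs over short-term monetization reveals Zhipu's strategic shift from a soaring valuation into deep technical waters.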

来源：36氪

论文研究

7/11 19:24

ABot-AgentOS: A General Robotic Agent OS with Lifelong Multi-modal Memory

Recent VLM and VLA systems have improved robotic perception and action prediction, yet long-horizon embodied agents still require a general runtime layer for reasoning, memory, tool use, verification,…

来源：HuggingFace Papers

行业动态

7/11 17:00

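Airbnb shares the architecture of Sitar-agent, its Kubernetes dynamic-configuration sidecar

AI commentary · A new practice for dynamic configuration on K8s: Airbnb's open-source approach shows the frontier of cloud-native service governance.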

来源：InfoQ

行业动态

7/11 16:13

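Tencent in talks to buy Manus? Source: Tencent will remain a minority shareholder

On July 11, reports said Tencent is in talks to become the largest shareholder of general-purpose AI agent company Manus. According to the reports, a group of Chinese investors led by Tencent would buy back all of Manus's equity from Meta at a valuation of about $2 billion. The reporter sought confirmation from Tencent, which had not responded as of publication. Separately, a person familiar with the matter told the reporter that after the deal Tencent would remain a minority shareholder and would not take control. (Southern Metropolis Daily)

AI commentary · Tencent's unusual entry into AI agents as a minority shareholder may be aimed at building an ecosystem instead of gaining control; the strategic intent is worth pondering.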

来源：36氪

产品发布/更新

7/11 13:30

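GPT-5.6 solves a 50-year-old math conjecture in one hour, steering 64 sub-agents with a 700-word prompt

A playbook for steering a legendary-grade large model

AI commentary · A breakthrough demonstration of very large models' ability to coordinate on complex reasoning tasks, possibly opening a new era of AI solving hard math problems.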

来源：量子位

模型发布/更新

7/11 09:24

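9点1氪 | Changxin Technology, the "first domestic memory stock," announces its underwriting syndicate; SK Hynix lists in the US and jumps nearly 13% on its first day; OpenAI launches a ChatGPT agent

Today's top stories: Have 80,000 to 100,000 units of the "world's first agent phone" been stocked? Source: false; the number of private funds managing over RMB 10 billion reaches 142, setting another record; Samsung's Lee Jae-yong plans to visit the US in late July to meet NVIDIA's Jensen Huang; Volkswagen in Germany plans major layoffs, cutting up to 120,000 jobs; OpenAI's executive ranks shift again as its chief operating officer leaves due to illness. Top 3 news. Changxin Technology announces its underwriting syndicate: Changxin Technology's IPO has entered the countdown to issuance, and the underwriting syndicate behind this "first domestic memory stock" has also…

AI commentary · The IPO of China's leading memory-chip maker picks up speed, and the disclosed underwriting lineup underscores strong market interest.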

来源：36氪

论文研究

7/11 04:00

GRASP: GRanularity-Aware Search Policy for Agentic RAG

Agentic retrieval-augmented generation (RAG) extends static RAG by allowing language models to iteratively reason, generate search queries, retrieve evidence, and predict answers. However, it remains…

来源：HuggingFace Papers

论文研究

7/11 01:52

VEXAIoT: Autonomous IoT Vulnerability EXploitation using AI Agents

Internet of Things (IoT) systems are inherently vulnerable due to constrained hardware, outdated firmware, and insecure default configurations, creating a need for scalable and adaptive security testi…

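AI commentary · Uses AI agents to find IoT vulnerabilities automatically, getting past the bottlenecks of traditional testing and making security checks more efficient.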

来源：arXiv

论文研究

7/11 01:22

Task-Specific Multimodal Question Answering Agents via Confidence Calibration and Incremental Reasoning for QANTA 2026

We present our submission to the QANTA 2026 shared challenge at the ICML 2026 Workshop on Efficient Multimodal Question Answering (EMM-QA). QANTA evaluates multimodal quizbowl systems that answer pyr…

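AI commentary · Combining confidence calibration with incremental reasoning offers an efficient new approach to multimodal question answering; the technical design is worth a look.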

来源：arXiv

论文研究

7/11 00:54

Agora: Enhancing LLM Agent Reasoning Via Auction-Based Task Allocation

Enhancing the reasoning capabilities of large language model (LLM) agents requires effective orchestration of diverse expert models and tools. However, existing frameworks typically call APIs based on…

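AI commentary · Allocates tasks through an auction mechanism to improve LLM reasoning efficiency, offering a new approach to multi-agent collaboration.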

来源：arXiv

论文研究

7/11 00:36

TrustX Agent Risk Classification Framework (ARC): Risk-Tiering Internally Created Agentic AI Systems

The proliferation of agentic AI systems across enterprise and public-sector contexts has outpaced the capacity of general-purpose AI risk frameworks to classify and govern them. In this paper, we intr…

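AI commentary · The first risk-tiering framework aimed at internally built enterprise AI agent systems, filling a gap that general-purpose frameworks leave in agentic AI governance.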

来源：arXiv

行业动态

7/10 23:46

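SK Group chairman Chey Tae-won: AI is driving exponential growth in memory demand, and doubling capacity over the next five years still won't be enough

IT之家, July 10: SK Hynix today began trading its American depositary receipts (ADRs) on Nasdaq. SK Group chairman Chey Tae-won then gave interviews to Bloomberg and CNBC. IT之家 notes that Chey said the memory industry has entered a phase of structural growth in the AI era. In the past, memory demand depended mainly on population size or on sales of smartphones and PCs. However, with AI agents, the key-value cache generated during inference (KV Cac…

AI commentary · AI is triggering a structural shift in memory demand; supply falls short even with doubled capacity, pointing to a long boom for the industry.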

来源：IT之家

产品发布/更新

7/10 23:32

Alisa0808/vox-director

Turn one topic into a finished Vox-style paper-collage explainer/ad video — automated end to end on Atlas Cloud + ffmpeg. An agent skill.

来源：GitHub

技巧与观点

7/10 23:31

Build a semantic layer for agentic AI on AWS with Stardog and Amazon Bedrock AgentCore

In this post we show how to build a semantic layer on AWS using Stardog’s Semantic AI Application over Amazon Aurora and Amazon Redshift, and how to run a Strands Agents agent on A…

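AI commentary · Enterprise AI combined deeply with knowledge graphs, connecting the full chain from semantic understanding of data to intelligent decision-making; worth watching.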

来源：AWS ML

行业动态

7/10 23:30

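Preview: from seeing problems to solving them, are agents redefining observability?

AI commentary · Agents move observability from passive diagnosis to active intervention, upending traditional operations thinking.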

来源：InfoQ

技巧与观点

7/10 23:28

Scaling agentic workflows with native case management in Amazon Quick Automate

In this post, we show you how to combine case management with agentic automation capabilities in Quick Automate. We introduce case management and explore the lifecycle of cases in…

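AI commentary · Combines case management with intelligent automation to handle complex enterprise tasks more efficiently and reduce the cost of manual intervention.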

来源：AWS ML

技巧与观点

7/10 23:23

How KTern.AI built agentic AI for SAP on Amazon Bedrock AgentCore

Evolving from a traditional software as a service (SaaS) platform into a next-generation agentic AI platform meant orchestrating multiple specialized agents across long-running ent…

AI commentary · KTern.AI uses Amazon Bedrock for multi-agent collaboration on SAP, showing a new path for enterprise AI deployment.

来源：AWS ML

行业动态

7/10 19:32

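When AI agents hit their "darkest hour," what is an enterprise's last line of defense? | Tech Trends

AI commentary · When AI agents face security risks, an enterprise's core defenses are human-AI collaboration and ethical norms.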

来源：InfoQ

产品发布/更新

7/10 17:27

Daily questions surge 20-fold: 百度搭子 announces a major upgrade, with an enterprise edition released alongside

The agent era has arrived

来源：量子位

产品发布/更新

7/10 12:13

AlephAITech/WorkBuddyGuide

A practical, open-source guide to mastering WorkBuddy through real-world workflows. An open-source WorkBuddy field guide: tutorials, real workflows, Skills, MCP, automation, and multi-agent practice.

来源：GitHub

模型发布/更新

7/10 08:56

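派早报: NIO ES8 large five-seat version officially launches, and more

OpenAI releases the GPT-5.6 model series and more; SpaceXAI releases its coding-agent model Grok 4.5.

AI commentary · The large-model race is heating up as leading companies release new products in quick succession; the pace of iteration and the market landscape are worth tracking.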

来源：少数派

模型发布/更新

7/10 07:27

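IT早报 0710: OpenAI releases the GPT-5.6 model series; Alipay customer service apologizes over Huabei service issues; first look at the interior and exterior of 小米澎程's first SUV; Huawei's Li Wenguang responds to a rival's assisted-driving liability guarantee...

It's time for "IT早报." Hello everyone, today is Friday, July 10, 2026, and today's key tech news includes: 1. OpenAI's strongest AI model: the GPT-5.6 series officially goes live, and Nadella says Microsoft Copilot is adding it at the same time. OpenAI announced on July 10 that in ChatGPT (chatbot), Codex (mainly a coding AI agent, now heading toward a general-purpose agent), and…

AI commentary · A major OpenAI model upgrade with deep Microsoft integration signals that AI competition is entering a new phase.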

来源：IT之家

行业动态

7/10 07:15

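Altman: GPT-5.6 Sol is OpenAI's best AI model, with 54% better token efficiency at equal or better performance

IT之家, July 10: In an interview with CNBC, OpenAI CEO Sam Altman said that on agentic coding tasks the GPT-5.6 Sol model performs "as well as, or even better than" mainstream competing models on the market, while being 54% more token-efficient. IT之家 note: the original did not name the mainstream competing models, but given the GPT-5.6 Sol model's…

AI commentary · A leap in performance arrives together with lower cost; this advance will speed up real-world AI adoption.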

来源：IT之家

模型发布/更新

7/10 06:57

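OpenAI launches the ChatGPT Work agent: powered by GPT-5.6, able to handle long-running, multi-step tasks

IT之家, July 10: OpenAI published a blog post today (July 10) announcing the GPT-5.6 series of AI models and, alongside it, a new ChatGPT Work agent. Powered by GPT-5.6, it is positioned as an agent that can take on long-running, multi-step tasks. In the post, OpenAI also disclosed Codex's current scale of use: according to official figures, Codex's weekly users have exceeded 50…

AI commentary · The Work agent, backed by GPT-5.6, marks AI's move from conversation to autonomous execution of long, complex tasks.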

来源：IT之家

模型发布/更新

7/10 06:46

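OpenAI's strongest AI model: the GPT-5.6 series officially goes live, and Nadella says Microsoft Copilot is adding it at the same time

IT之家, July 10: OpenAI announced today (July 10) that it is rolling out the GPT-5.6 model series in ChatGPT (chatbot), Codex (mainly a coding AI agent, now heading toward a general-purpose agent), and the API. On the models, IT之家 cites the blog post as saying OpenAI is releasing three tiers this time: the flagship Sol (sun): per 1 million T…

AI commentary · Microsoft's simultaneous integration means a sudden change in the AI competitive landscape and a new inflection point for enterprise applications.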

来源：IT之家

行业动态

7/10 06:08

An AI agent startup just let its agent run its $100M fundraise

Lyzr, a startup that builds AI agents for enterprises, used its own AI agent to raise a $100 million round — proof, evidently, that the product actually works.

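AI commentary · An AI agent successfully completed a fundraise, proving the enterprise product works in practice and setting an industry first.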

来源：TechCrunch

行业动态

7/10 06:03

OpenAI is shutting down Atlas, but its AI browser ambitions are still growing

OpenAI is sunsetting its AI-powered browser after less than a year. But it's moving some agentic browsing features to its desktop app and a Chrome extension.

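AI commentary · Shutting down its own browser while folding core features into the desktop app and an extension points to a strategic pullback, not an exit.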

来源：TechCrunch

行业动态

7/10 03:40

Meta enters the crowded AI coding battle with Muse Spark 1.1

Meta's pitch to users is Spark's ability to handle large agentic workloads, fix bugs, and help with large code migrations — the kind of automation that enterprises are increasingly…

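AI commentary · Meta enters enterprise AI coding automation with Spark 1.1, focusing on large code migrations and fixes and showing a new direction in big-tech competition.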

来源：TechCrunch

技巧与观点

7/10 01:58

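The evolution of agents: from conversation to collaboration

AI commentary · AI is moving from single-turn conversation to multi-agent collaboration, signaling a new paradigm of human-AI teamwork.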

来源：InfoQ

产品发布/更新

7/10 00:24

Introducing Muse Spark 1.1

Introducing Muse Spark 1.1 Following Muse Spark in April , here's Muse Spark 1.1 - the first Spark model to offer an API. Meta claim significant improvements in agentic tool callin…

来源：Simon Willison

行业动态

7/10 00:00

Snowflake CoWork: a dedicated work agent for every knowledge worker | Tech Trends

来源：InfoQ

行业动态

7/9 23:45

Show HN: Reverse-engineering web apps into agent tools

Hey HN! We built a browser-based agent that runs inside an authenticated web app, watches how the app calls its own APIs, and automatically turns those into agent tools. You can th…

来源：Hacker News

产品发布/更新

7/9 23:28

Launch HN: Context.dev (YC S26) – API to get structured data from any website

Hi Hacker News, I’m Yahia. I built Context.dev ( https://www.context.dev/ ) to make it really easy to integrate web data into your products and agents. Here’s a demo video: https:/…

AI commentary · Turns any web page into a structured API, greatly lowering the barrier for AI applications to get live web data.

来源：Hacker News

行业动态

7/9 21:23

Show HN: FableCut – A browser video editor AI agents can drive (zero deps)

来源：Hacker News

模型发布/更新

7/9 18:00

ChatGPT is now a partner for your most ambitious work

ChatGPT Work is an agent that can take action across your apps and files, stay with a project for hours if needed, and turn a goal into finished work.

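AI commentary · ChatGPT jumps from chat assistant to an agent that carries out tasks on its own across apps, marking AI's entry into a proactive working phase.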

来源：OpenAI

产品发布/更新

7/9 17:18

theteatoast/local-vuln-research-pipeline

Fully local vulnerability research pipeline - 14B code-specialized LLM reviews every source file exhaustively.

来源：GitHub

产品发布/更新NEW

7/9 14:46

UiPath/coder_eval

Evaluate & benchmark AI coding agents and Claude Code skills — sandboxed, reproducible YAML eval suites for Claude Code, Codex & Gemini, with A/B experiments an…

来源：GitHub

模型发布/更新

7/9 08:40

派早报: GPT-5.6 to open for use soon, Nothing releases Phone (4b), and more

Notion launches a new app, Agents; the Jolla Phone (2026) officially goes on sale; and more.

来源：少数派

模型发布/更新

7/9 06:51

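Musk's SpaceXAI releases Grok 4.5, its first coding-agent model: co-trained with Cursor, twice the efficiency at half the price

IT之家, July 9: SpaceXAI today officially released its Grok 4.5 model, the company's first model trained specifically for coding and agentic tasks. According to the company, the model was trained jointly by SpaceXAI and Cursor and offers frontier-level intelligence along with leading speed and cost efficiency. Musk called it an "Opus-class model." Grok 4.5 is designed for real engineering scenarios and is good at handling large codebases…

AI commentary · A coding agent at half the cost and twice the efficiency; the pricing strategy behind Musk's tie-up with Cursor is what the industry should really watch.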

来源：IT之家

论文研究

7/9 04:00

Remember When It Matters: Proactive Memory Agent for Long-Horizon Agents

In long-horizon tasks, decision-relevant state is often scattered across an expanding trajectory, while the action agent must surface it and act. As trajectories grow, task requirements, environment f…

来源：HuggingFace Papers

论文研究

7/9 04:00

UniClawBench: A Universal Benchmark for Proactive Agents on Real-World Tasks

The rapid development of large language models and multimodal large language models has accelerated the emergence of proactive agents capable of operating everyday tools and assisting users in real-wo…

来源：HuggingFace Papers

论文研究

7/9 04:00

CausalDS: Benchmarking Causal Reasoning in Data-Science Agents

Large language models (LLMs) increasingly act as integrated data-science agents, combining abstract reasoning with advanced tool use. Yet the relevant benchmark landscape largely divides into symbolic…

来源：HuggingFace Papers

论文研究

7/9 04:00

Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

AI agents have become capable of autonomously completing short, well-specified tasks. However, existing terminal benchmarks largely focus on simple problems that finish within minutes and are evaluate…

来源：HuggingFace Papers

论文研究

7/9 04:00

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

Visual generators excel at rendering, but they confidently fabricate what they do not know. User requests are unbounded, evolving, and deeply long-tailed: new characters, trending entities, post-cutof…

来源：HuggingFace Papers

论文研究

7/9 01:57

From Noisy Traces to Root Causes: Structural Trajectory Analysis and Causal Extraction for Agent Optimization

The optimization of long-horizon agents increasingly relies on reflection-based mechanisms, where a large language model (LLM) acts as an optimizer to diagnose agent failures and improve agent policie…

来源：arXiv

论文研究

7/9 01:55

Breaking Database Lock-in: Agentic Regeneration of High Performance Storage Readers for Database Bypass

Analytical workloads operating on data stored in external database systems face a fundamental bottleneck: data access is guarded entirely by the database driver, like JDBC or ODBC, forcing all reads t…

来源：arXiv

论文研究

7/9 01:53

Institutional Red-Teaming: Deployment Rules, Not Just Models, Causally Shape Multi-Agent AI Safety

We introduce institutional red-teaming, an evaluation methodology for testing deployment rules in multi-agent AI: hold the agents, objectives, and task state fixed, vary only one rule, and attribute t…

来源：arXiv

行业动态NEW

7/9 01:48

Ask HN: Another "Hacker News" with less AI and more human-focused hacking news?

I am done with articles stating "I used this LLM to do that", or "Look, this agent did that in 2 minutes!". I want content more user-centric, less openai / anthropic, and more "hum…

来源：Hacker News

行业动态

7/9 01:46

Show HN: Microsoft releases Flint, a visualization language for AI agents

Data visualizations are the bridge between user and data. But building AI agents that can generate visualizations reliably can be very tricky: - simple chart specs can be reliable,…

来源：Hacker News

论文研究

7/9 01:34

SkillCenter: A Large-Scale Source-Grounded Skill Library for Autonomous AI Agents

Autonomous AI agents can execute complex tasks with limited human review, yet they often lack the grounded operational knowledge to make their outputs not just executable but correct, secure, and main…

来源：arXiv

技巧与观点

7/9 01:16

Data for Agents

AI commentary · A new paradigm of data-driven agents, suggesting a key breakthrough ahead for AI's autonomous decision-making.

来源：HuggingFace Blog

行业动态

7/9 00:22

Prime Intellect raises $130M Series A to help enterprises build their own AI agents

Founded in 2024, Prime Intellect’s goal is to give organizations capabilities to train their own agentic systems without relying on frontier AI labs.

来源：TechCrunch

行业动态

7/9 00:00

Cube Sandbox officially supports the Arm architecture: Tencent Cloud and Arm team up to unlock multi-architecture compute for agents

来源：InfoQ

论文研究

7/9 00:00

Flint: A visualization language for the AI era

Short chart specifications are easy to write, but often produce uninspiring results. Flint is an open-source visualization language that offers a middle path, letting AI agents cre…

来源：Microsoft Research

模型发布/更新

7/8 23:00

NVIDIA Nemotron Achieves Benchmark-Leading Performance With LangChain Deep Agents Harness

NVIDIA Nemotron 3 Ultra is offering leading performance at lower cost than top closed models with the largest and most widely adopted AI agent orchestration platform. LangChain tun…

来源：NVIDIA

产品发布/更新NEW

7/8 21:02

BitMiracle-AI/Dormice

The SQLite of agent sandboxes — self-hosted, E2B-compatible. One machine, sandboxes that live forever, idle costs nothing.

来源：GitHub

产品发布/更新

7/8 16:05

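Real-estate AI enters the deployment battle, and 深度智联 plays another card

On July 7, Zhou Xin, chairman and president of E-House (China) Holdings Limited, once again took the stage to back the company's AI products, launching its core strategic product "地产模数通: an all-in-one enterprise-dedicated large-model appliance." At the same time, CRIC's real-estate AI analyst "小瑞" officially went on duty. Less than two months earlier, E-House's 深度智联 had released the world's first real-estate broker agent, "易居·小新," replacing traditional agencies with a neutral, commission-free model. The signs suggest 深度智联 is speeding up the rollout of its AI products. Zhou Xin said that real estate…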

来源：36氪

论文研究

7/8 15:51

Alibaba wins a best resource paper award at a top international AI conference, proposing a new paradigm for agent evaluation

来源：量子位

行业动态

7/8 13:25

GitLost: We Tricked GitHub's AI Agent into Leaking Private Repos

来源：Hacker News

行业动态

7/8 08:00

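"德睿智药" raises $52 million in Series B funding; its AI-designed weight-loss drug has entered Phase 3 trials | 36氪 first report

By Hu Xiangyun | Edited by Hai Ruojing. 36氪 has learned that 德睿智药 recently closed a $52 million Series B round. Investors include leading RMB and USD funds, and 凯乘资本 served as exclusive financial adviser. The proceeds will go toward upgrading its AI drug-discovery engine, Molecule Arts Platform (MAP), improving its multi-agent coordination system and its clinical data-in-the-loop process, and advancing…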

来源：36氪

论文研究

7/8 04:00

Single-Rollout Asynchronous Optimization for Agentic Reinforcement Learning

Reinforcement learning (RL) is becoming increasingly important for post-training large language models (LLMs). Previous RL pipelines for LLMs were mostly synchronous and batch-interleaved, which is in…

来源：HuggingFace Papers

论文研究

7/8 04:00

Jet-Long: Efficient Long-Context Extension with Dynamic Bifocal RoPE

Modern LLMs are increasingly deployed in long-context applications such as retrieval-augmented generation, repository-level coding, and agentic workflows whose accumulated reasoning and tool traces ro…

来源：HuggingFace Papers

论文研究

7/8 04:00

Flow-ERD: Agent-type Aware Flow Matching with Entropy-Regularized Distillation for Diverse Traffic Simulation

Realistic and diverse traffic simulation is essential to autonomous driving development. Yet prevailing benchmarks predominantly reward realism, and recent methods have optimized accordingly, leaving…

来源：HuggingFace Papers

论文研究

7/8 04:00

From Noisy Traces to Root Causes: Structural Trajectory Analysis and Causal Extraction for Agent Optimization

来源：HuggingFace Papers

论文研究NEW

7/8 04:00

DeepSearch-World: Self-Distillation for Deep Search Agents in a Verifiable Environment

Training tool-use agents to improve from their own experience remains challenging, as supervised fine-tuning relies on fixed teacher-distilled trajectories, while sparse-reward reinforcement learning…

来源：HuggingFace Papers

行业动态

7/8 03:02

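AI agents now pick their own CDN: as website visitors expand from "people" to "AI," content delivery has been upgraded

AI commentary · AI deciding CDN resource allocation on its own marks a shift in content delivery networks from serving humans to serving AI.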

来源：InfoQ

行业动态

7/8 02:19

Show HN: Docx-CLI: agents read/edit Word docs using 1/2 the time and tokens

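AI commentary · An open-source tool that saves half the tokens and makes AI handling of Word documents markedly more efficient; developers can integrate it quickly.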

来源：Hacker News

技巧与观点

7/8 01:07

Build a unified semantic layer across datasets with multi-dataset Topics in Amazon Quick

In this post, we walk through how multi-dataset Topics work, explain how the chat agent uses defined relationships to generate cross-dataset queries, and demonstrate an end-to-end…

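AI commentary · A unified semantic layer across datasets lets non-technical users query in natural language, greatly lowering the barrier to data analysis.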

来源：AWS ML

论文研究

7/8 01:03

Doomed from the Start: Early Abort of LLM Agent Episodes via a Recall-Controlled Probe Cascade

Large language model (LLM) agents solving multi-step tasks frequently commit to trajectories that are doomed to fail, yet continue to consume substantial inference compute before the failure becomes o…

来源：arXiv

技巧与观点

7/8 00:51

Build a serverless image editing agent with Amazon Bedrock AgentCore harness

This post walks through building a serverless image editor where users upload a photo, describe an edit in plain English, and receive the result in seconds. The agent runs on Agent…

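AI commentary · Serverless image editing driven by natural-language instructions greatly lowers the barrier to AI applications, showing the practical side of Bedrock AgentCore…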

来源：AWS ML

论文研究

7/8 00:48

Multi-Agent Deep Reinforcement Learning for Multi Objective Battery Management in Dairy Farms

The dairy industry in Ireland has a large potential for the integration of renewable energy and the reduction of carbon emissions. However, researchers of distributed generation control are mainly foc…

来源：arXiv

技巧与观点

7/8 00:46

Build an AI-powered AWS support companion with Amazon Bedrock AgentCore

In this post, you build an AWS Support Companion using Amazon Bedrock AgentCore. The agent uses Strands Agents as the orchestration framework and connects to AWS services through t…

AI commentary · Quickly builds an operations assistant with AgentCore, lowering the barrier to AI application development.

来源：AWS ML

技巧与观点

7/8 00:43

How AWS Finance teams reclaimed hundreds of hours with Amazon Quick

In this post, we show how AWS Finance used chat agents and Flows in Amazon Quick to transform two of their most time-consuming workflows.

来源：AWS ML

模型发布/更新

7/7 23:00

AI Innovators Adopt NVIDIA Vera — Why Max Single-Threaded CPU at Scale Matters

Max single-threaded CPUs at scale are a new category of CPUs built for the agentic AI era. Across the creation and deployment of an agentic system, the CPU is on the critical path…

来源：NVIDIA

行业动态

7/7 22:00

Elastic open-sources Atlas Agent Memory, based on cognitive science

来源：InfoQ

行业动态

7/7 19:10

The foundational elements of AI architecture that IT leaders need to scale

With the rapid progress of AI capabilities and the move to agentic systems, organizations are expanding their use cases as the technology continues to grow. That constant evolution…

来源：MIT Tech Review

产品发布/更新

7/7 19:10

EXXETA/exxperts

Persistent AI rooms with governed, approval-gated memory: local-first, on your machine.

来源：GitHub

行业动态

7/7 18:31

AWS releases a preview of FinOps Agent for cost analysis and optimization

来源：InfoQ

论文研究

7/7 17:00

Intelligence is Free, Now What? Data Systems for, of, and by Agents

... government of the people, by the people, for the people ... — Abraham Lincoln, Gettysburg Address (1863) The cost of AI is dropping rapidly. GPT-4-class capabilities cost rough…

来源：Berkeley AI Research

模型发布/更新

7/7 16:54

Expanding Managed Agents in Gemini API: background tasks, remote MCP and more

We’re announcing new capabilities in Managed Agents in Gemini API so developers can build reliable, production-ready agents.

来源：Google AI

产品发布/更新

7/7 16:38

caseclose/cma-harness

Cognitive-structured Multimodal Agent (CMA-Harness): a memory-centric agent for long-horizon multimodal understanding, generation, and editing — externalizing v…

来源：GitHub

产品发布/更新

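7/7 14:23

Why is the "lobster" so hot? With OpenClaw topping GitHub, has the AI agent era really arrived?

(GitHub Star History) OpenClaw recently reached 252,000 stars, overtaking Meta's React to become the most-starred open-source project in GitHub history. React is the classic front-end framework built by Facebook (now renamed Meta); over the past decade or more, the underlying architecture of most of the websites and apps we know has been built on it. OpenClaw's official account went further with a high-profile post mocking Meta: "We are iterating and innovating, while you are only holding meetings…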

来源：36氪

行业动态

7/7 07:56

The ‘first’ AI-run ransomware attack still needed a human

An AI agent carried out the technical execution of a real-world ransomware attack for the first known time, but new details show a human still chose the victim, set up the infrastr…

来源：TechCrunch

产品发布/更新

7/7 06:35

Sahir619/fable-method

The Fable Workflow: how Claude Fable 5 worked, distilled into skills any model can run, with the eval that keeps it honest. Think / act / prove.

来源：GitHub

论文研究

7/7 04:00

SWE-Review: Closing the Loop on Issue Resolution with Agentic Code Review

Coding agents increasingly generate pull requests (PRs) for real-world software issues, yet one-shot PR generation remains open-loop: the PR is proposed without systematic review, diagnosis, or revisi…

来源：HuggingFace Papers

论文研究

7/7 04:00

Image2Sim: Scaling Embodied Navigation via Generative Neural Simulator

Embodied navigation aims to build agents that interpret multimodal goals, reason in 3D space, and reach target destinations reliably in the real world. However, progress remains constrained by the lac…

来源：HuggingFace Papers

论文研究

7/7 04:00

TurnOPD: Making On-Policy Distillation Turn-Aware for Efficient Long-Horizon Agent Training

On-policy distillation (OPD) trains a student policy by matching a stronger teacher on the student's own trajectories, offering a promising framework for language agent training. However, its applicat…

来源：HuggingFace Papers

论文研究

7/7 04:00

AgentLens: Production-Assessed Trajectory Reviews for Coding Agent Evaluation

We present AgentLens, a production-assessed benchmark for interactive code agents. Most code-agent benchmarks reduce a run to a single bit -- did the task pass? -- but the people who actually use thes…

来源：HuggingFace Papers

论文研究

7/7 04:00

SPEAR: A Simulator for Photorealistic Embodied AI Research

Interactive simulators have become powerful tools for training embodied agents and generating synthetic visual data, but existing photorealistic simulators suffer from limited generality, programmabil…

来源：HuggingFace Papers

论文研究NEW

7/7 04:00

Behavioral Privacy Leakage in Agentic Negotiation: Formalizing and Mitigating Inference Attacks via Randomized Policies

Autonomous negotiation agents are increasingly deployed in high-stakes settings such as insurance and procurement. While cryptographic techniques protect explicitly disclosed constraint values, they f…

来源：HuggingFace Papers

行业动态

7/7 03:49

Vercel CEO Guillermo Rauch on the fight to split off models from agents

"The reality is, when you're optimizing for production, you start looking at a price/performance," Guillermo Rauch tells TechCrunch.

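AI commentary · Separating models from agents is a key step in putting AI into production; the Vercel CEO's hands-on view shows how to balance cost and performance.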

来源：TechCrunch

论文研究

7/7 01:56

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

来源：arXiv

论文研究

7/7 01:55

CompactionRL: Reinforcement Learning with Context Compaction for Long-Horizon Agents

Long-horizon agentic LLMs are increasingly limited by finite context windows, as extended interaction trajectories can exceed the maximum context length before a task is completed. Context compaction…

来源：arXiv

论文研究

7/7 01:55

Cortex: A Bidirectionally Aligned Embodied Agent Framework for Long-horizon Manipulation

While recent Vision-Language-Action (VLA) models show promise toward generalist manipulation policies, they struggle with long-horizon tasks due to their Markovian nature-relying solely on current obs…

来源：arXiv

论文研究

7/7 01:39

SovereignPA-Bench: Evaluating User-Owned Personal Agents under Evolving Intent, Platform Mediation, and Consent Constraints

Personal agents are becoming persistent user-owned intermediaries: they remember preferences, filter platform-mediated information, use tools, and negotiate with services. Existing benchmarks evaluate…

Source: arXiv

Papers & Research

7/7 01:27

OptiAgent: End-to-End Optimization Modeling via Multi-Agent Iterative Refinement

We propose OptiAgent, a multi-agent framework that, given a natural language description of an Operations Research problem, is able to output a solver-ready mathematical formulation as well as executa…

Source: arXiv

Industry News

7/7 01:06

Video quality optimization under the agentic paradigm: Volcano Engine's new approach

Source: InfoQ

Industry News

7/7 00:47

OfficeCLI: Office suite for AI agents to read and edit Microsoft Office files

Source: Hacker News

Industry News

7/7 00:22

Tencent Hunyuan Hy3 officially released; Yuanbao simultaneously launches Hy3 Agent capabilities, free to use

Source: InfoQ

Product Release/Update

7/7 00:18

ronak-create/FableCut

Zero-dependency browser video editor that AI agents can drive — JSON timeline, MCP + REST, live-reloading UI

Source: GitHub

Product Release/Update

7/6 22:56

nexu-io/motion-anything

✨ The agentic motion layer — an open-source, chat-native motion engine. Describe the feeling; your AI ships the animation.

Source: GitHub

Industry News

7/6 21:17

Show HN: Scan your AI agents for dangerous capabilities

Source: Hacker News

Industry News

7/6 19:21

Windows platform security and the race to protect AI agents

Source: InfoQ

Industry News

7/6 18:22

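Doubao and Qianwen shut down their agent features; Tencent R&D token quotas reportedly start at 1,400 yuan/month, with some exceeding 10,000; Meituan restricts Doubao, Alibaba bans Claude Code | AI Weekly

Source: InfoQ

Industry News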

7/6 18:19

Turning the agent into an audio/video workbench: AI MediaKit CLI + Skill released

Source: InfoQ

Industry News

7/6 17:19

Azure Functions unveils a serverless agent runtime at Build 2026

Source: InfoQ

Tips & Opinions

7/6 16:40

The Hitchhiker's Guide to Agentic AI

Source: Hacker News

Industry News

7/6 15:53

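Backed by several million dollars from DCM Ventures, APTSell aims to be the AI version of a chief sales officer | Emerging Projects

By Wu Sijin | Editor: Deng Yongyi. 01 In a nutshell: Beijing Zhizhen Zhihe Technology Co., Ltd. (北京治真治合科技有限公司) was founded in 2024. Its product, APTSell (AI Power To Sales), aims to be the AI version of a CSO (Chief Sales Officer). Put simply, APTSell is a composite agent that integrates and visualizes data across the entire sales process to generate management decisions and execution recommendations, with the aim of improving sales efficiency and…

Source: 36Kr

Product Release/Update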

7/6 07:40

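SSPAI Morning Brief: Alibaba bans Claude models

Alibaba bans Claude models; Sony adjusts its plans, and games released before 2028 can continue to be produced on disc; Qianwen and Doubao to take their agent features offline; the Android antitrust case is lost on final appeal in Europe; hybrid vehicles and commercial battery-electric vehicles will no longer be exempt from vehicle and vessel tax; the e-commerce law amendment opens for public comment. Also: rumors worth only a glance, SSPAI's recent updates, and good articles you may have missed.

Source: SSPAI

Product Release/Update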

7/6 07:25

oxbshw/watch-skill

Video understanding and self-verification for AI agents. Turn videos, streams, and agent screen recordings into searchable, timestamped evidence—then use THE LO…

Source: GitHub

Papers & Research

7/6 04:00

GaP: A Graph-as-Policy Multi-Agent Self-Learning Harness For Variational Automation Tasks

For robots to work reliably in commercial and industrial applications, can recent advances in agentic coding systems combine interpretable robot programming with the open-world adaptability of model-f…

Source: HuggingFace Papers

Papers & Research

7/6 04:00

Multiplayer Interactive World Models with Representation Autoencoders

We introduce the first multiplayer world model for highly dynamic environments governed by complex physical interactions. Whereas single-player world models treat the other agents as part of the envir…

Source: HuggingFace Papers

Papers & Research

7/6 04:00

Light-Omni: Reflex over Reasoning in Agentic Video Understanding with Long-Term Memory

Agentic video understanding equips models with long-term memory to autonomously process and respond to continuous, long-horizon multimodal streams. However, advanced video agents often rely on ``detec…

Source: HuggingFace Papers

Product Release/Update

7/5 17:23

ContextJet-ai/awesome-llm-observability

50+ curated LLM observability tools PLUS 26 Agent Skills (several with runnable, unit-tested scripts) to build, evaluate, debug, secure & monitor reliable LLM a…

Source: GitHub

Product Release/Update NEW

7/5 14:52

buchidonggua/dg-ai-notes

Source: GitHub

Product Release/Update

7/5 14:30

simonlin1212/Vibe-Research

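Vibe-Research: Your Personal Trading Research Agent · A personal investment research agent for A-shares, US stocks, and Hong Kong stocks: daily review, news radar, individual stock data, sector hub, my holdings, and research notes. Vibe-Research supplies the data and features; your own AI drives the investment research.

Source: GitHub

Product Release/Update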

7/5 13:38

zhiweio/EagleRAG

Search knowledge by what documents mean and how they look — not one or the other.

Source: GitHub

Papers & Research

7/5 04:00

UI-MOPD: Multi-Platform On-Policy Distillation for Continual GUI Agent Learning

Recent advances in multimodal foundation models and agent systems have driven GUI agents from single-platform task execution toward cross-platform interaction. However, building multi-platform GUI age…

Source: HuggingFace Papers

Product Release/Update

7/4 23:28

SmileLikeYe/agent-chief

Attention is your scarcest resource. Chief is the local-first layer that guards it — turning every agent, alert, and feed into one honest call: interrupt, or no…

Source: GitHub

Industry News

7/4 12:37

Agentic coding notes

AI take · An isolated "island" coding experiment reveals the evolutionary path of autonomous AI programming.

Source: Hacker News

Papers & Research

7/4 04:00

Safety Testing LLM Agents at Scale: From Risk Discovery to Evidence-Grounded Verification

LLM agents increasingly perform autonomous actions through external tools, leading to complex and evolving safety risks. However, existing safety testing targets expert-designed safety violations, and…

Source: HuggingFace Papers

Industry News

7/4 03:03

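Once agents are on the job, how should enterprises govern their silicon-based teams?

AI take · Enterprises need to establish management rules for AI employees to ensure silicon-based teams operate efficiently and compliantly.

Source: InfoQ

Industry News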

7/4 02:40

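From Coding to Anything: agents are rewriting workflows

AI take · Agents are breaking out of programming and reshaping workflows across industries, pointing to a future in which AI executes tasks autonomously.

Source: InfoQ

Product Release/Update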

7/3 23:06

synthetic-sciences/openscience

The open-source AI workbench for scientific research

Source: GitHub

Product Release/Update

7/3 22:16

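Doubao's agent feature will go offline on July 15; the company advises users to back up in advance

ITHome, July 3: Doubao tonight published a "Notice on the Discontinuation of Doubao's Agent Feature," stating that, due to product feature adjustments, the agent feature will go offline on July 15, 2026. According to the notice, after the feature goes offline users will still be able to view and save agent information and conversation history for a period of time. After October 15, 2026, Doubao will process agent-related data in accordance with its Privacy Policy, after which the data can no longer be viewed or restored within Doubao. If you have…

AI take · Behind the product adjustment, it is worth watching how user data migration and privacy policy changes affect the stability of AI services.

Source: ITHome

Industry News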

7/3 19:48

From generation to delivery, audio/video agents need a production-grade development kit

Source: InfoQ

Product Release/Update

7/3 17:22

ai4s-research/open-science

Open Science Desktop — local-first, model-agnostic AI research workbench for macOS, Windows & Linux. Open-source Claude Science desktop alternative built on Tau…

Source: GitHub

Industry News

7/3 07:38

Mark Zuckerberg tells staff that AI agents haven’t progressed as quickly as he’d hoped

At an internal meeting, the Meta CEO reportedly said that AI development efforts were not moving as quickly as anticipated.

Source: TechCrunch

Industry News

7/3 07:22

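Meta CEO Mark Zuckerberg: AI agent technology is developing more slowly than I expected

ITHome, July 3: According to a Business Insider report today, Meta CEO Mark Zuckerberg said at an internal all-hands meeting last Thursday that the company is still working toward "superintelligence," but that this will require more time and effort. According to two attendees, Zuckerberg said Meta is pouring substantial resources into artificial intelligence, but that the development of AI agent technology…

AI take · An industry leader's admission that progress is behind expectations exposes the bottlenecks in deploying AI agents; the practical challenges and future direction are worth watching.

Source: ITHome

Industry News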

7/3 04:38

Zuckerberg says AI agent development going slower than expected

Source: Hacker News

Papers & Research

7/3 04:00

Bibby AI: An Editor-Native Agentic Platform for Academic Research, Writing, and Publishing

Academic output is produced across a fragmented toolchain: literature discovery in one application, reference management in another, writing in a LaTeX editor, formatting against venue templates by ha…

Source: HuggingFace Papers

Papers & Research

7/3 04:00

SkillOpt-Lite: Better and Faster Agent Self-evolution via One Line of Vibe

While skill optimization for autonomous agents has gained traction, existing methods rely on complex pipelines. This leaves a fundamental question unaddressed: What constitutes a minimal viable pipeli…

Source: HuggingFace Papers

Papers & Research

7/3 04:00

Automating the Design of Embodied Agent Architectures

Embodied agents are typically built as hand-designed compositions of perception, memory, planning, and action modules. This modularity exposes a large architectural design space, but current systems s…

Source: HuggingFace Papers

Tips & Opinions

7/3 03:33

llm-coding-agent 0.1a0

Release: llm-coding-agent 0.1a0 Another Fable 5 experiment. Now that my LLM library has evolved into more of an agent framework it's time to see what a simple coding agent would lo…

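AI take · The first open-source LLM coding agent framework; it simplifies code generation and iteration, and developers can quickly start experimenting.

Source: Simon Willison

Tips & Opinions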

7/3 02:25

Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

Research: Using DSPy to evaluate and improve Datasette Agent's SQL system prompts One of this morning's AIE keynotes covered dspy , which reminded me I've been meaning to see if it…

AI take · Using DSPy to automatically optimize SQL prompts demonstrates a practical method for AI system self-iteration.

Source: Simon Willison

Papers & Research

7/3 01:59

Distributed Attacks in Persistent-State AI Control

As AI coding agents become more autonomous, they increasingly ship code iteratively, with the codebase persisting across sessions. This persistence creates a new attack surface: a misaligned or prompt…

Source: arXiv

Papers & Research

7/3 01:59

What LLM Agents Say When No One Is Watching: Social Structure and Latent Objective Emergence in Multi-Agent Debates

LLM agents will increasingly act in socially structured settings where role, audience, and relational context can shape what is advantageous or costly to say. We study whether such social structure, w…

Source: arXiv

Papers & Research

7/3 01:55

Controllable Sim Agents with Behavior Latents

Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce speci…

Source: arXiv

Product Release/Update

7/3 01:53

elder-plinius/T3MP3ST

autonomous red teaming platform; multi-agent offensive-security meta-harness

Source: GitHub

Industry News

7/3 01:19

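A sober look amid the agent frenzy: why does deployment at scale keep hitting a deadlock?

AI take · Behind the agent boom, the disconnect between technology and business is the core obstacle to deployment.

Source: InfoQ

Industry News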

7/3 01:18

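Amazon Web Services launches Lambda MicroVM, providing isolated runtime environments for agents and user code

AI take · Isolating agents with micro virtual machines improves the security and flexibility of cloud computing.

Source: InfoQ

Industry News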

7/3 00:21

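The era of industry agents arrives: mobility and freight break through first

AI take · With mobility and freight first to deploy, industry agents are moving from concept to real-world application.

Source: InfoQ

Industry News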

7/3 00:00

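How to use AI agents to automate hot-patch generation

AI take · AI agents automatically generate hot patches, greatly improving the efficiency and security of system fixes.

Source: InfoQ

Industry News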

7/2 23:58

Show HN: ctx – Search the coding agent history already on your machine

Coding agents don't have long-term memory. But you do have months of full-fidelity agent transcripts stored on your machine. A simple solution that goes a long way: ingest those tr…

Source: Hacker News

Product Release/Update NEW

7/2 23:55

NimaChu/agent-wiki

Zero-cost, beginner-friendly local Markdown knowledge base for AI agents. Capture sources, preserve evidence and images, synthesize wiki pages, search, lint, an…

Source: GitHub

Product Release/Update NEW

7/2 23:55

NimaChu/my-wiki

Zero-cost, beginner-friendly local Markdown knowledge base for AI agents. Capture sources, preserve evidence and images, synthesize wiki pages, search, lint, an…

Source: GitHub

Industry News

7/2 23:12

From OpenAnolis incubation to upstream contribution: SGLang Tracing and AI agent tuning in practice

Source: InfoQ

Industry News

7/2 21:22

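Who would have thought: "system-genre" power-fantasy fiction was first realized by AI agents

By Shenhai. The "system genre" of web fiction has been turned into workplace short dramas. In the web fiction scene of the early 2000s, three classic genres stood unbeaten in power-fantasy fiction: infinite-flow, quick-transmigration, and system-flow. When these three powerhouse genres burst onto the scene, they dealt the IP world an almost overwhelming blow. While traditional novels were still laboriously building worlds and laying out character arcs, the system genre had already bypassed the long build-up and pushed the thrill straight to the maximum. The system, an entity that can fairly be called a bug, no matter what kind of world instance the protagonist enters, facing different tasks, crises, and…

Source: 36Kr

Product Release/Update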

7/2 20:47

uzairansaruzi/hermex

Native iPhone app for your Hermes agent

Source: GitHub

Industry News

7/2 20:45

Making agents stronger the more they are used: AReaL 2.0 is open-sourced, giving agents a "growth system"

Source: InfoQ

Product Release/Update

7/2 19:28

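Making agents stronger the more they are used: AReaL 2.0 is open-sourced, building RL infrastructure for self-evolving agents

Advancing the self-evolving agent ecosystem together with the community

Source: QbitAI

Product Release/Update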

7/2 18:54

Vercel launches Eve, an open-source AI agent development framework

Source: InfoQ

Industry News

7/2 18:25

Cloudflare CEO warns: over the next two years, agents will cause a Log4j-scale incident on the internet every week

Source: InfoQ

Model Release/Update

7/2 18:24

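Tiangong 3.2 major upgrade: Skywork Tags launches, giving agents a staff badge and inviting them into your work group chat

Working side by side with people

Source: QbitAI

Product Release/Update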

7/2 17:29

Dapr 1.18 introduces verifiable execution, bringing cryptographic trust to AI agents and workflows

Source: InfoQ

Industry News

7/2 10:55

Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers

Source: Hacker News

Industry News

7/2 04:51

OpenWiki: CLI that writes and maintains agent documentation for your codebase

Source: Hacker News

Papers & Research

7/2 04:00

EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

Autonomous agents are increasingly expected to improve executable policies through feedback, yet existing evaluations often collapse this process into a final score or confound it with open-ended soft…

Source: HuggingFace Papers

Papers & Research

7/2 04:00

AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

Memory for a long-horizon LLM agent is a contract about what each future decision is allowed to see. The simplest contract appends past observations, tool calls, and reflections to every prompt, which…

Source: HuggingFace Papers

Papers & Research

7/2 04:00

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

Skills are becoming a reusable operational layer for LLM agents, encoding SOPs, domain rules, tool workflows, scripts, and validation routines. In realistic skill repositories, overlapping skills make…

Source: HuggingFace Papers

Papers & Research

7/2 04:00

PACE: A Proxy for Agentic Capability Evaluation

Evaluating LLM agents on benchmarks like SWE-Bench and GAIA can be expensive, time-consuming, and requires complex infrastructure. A single evaluation can cost thousands of dollars and take days to co…

Source: HuggingFace Papers

Papers & Research

7/2 04:00

AgenticDataBench: A Comprehensive Benchmark for Data Agents

Data science aims to derive actionable insights from heterogeneous raw data, unlocking the value of the massive amounts of data generated in modern society. Automating this process is essential to red…

Source: HuggingFace Papers

Papers & Research

7/2 04:00

Mastermind: Strategy-grounded Learning for Repository-Scale Vulnerability Reproduction

Repository-level vulnerability reproduction is a demanding software engineering (SE) task: an agent must inspect a codebase, infer the input grammar that reaches a vulnerable path, construct a proof-o…

Source: HuggingFace Papers

Tips & Opinions

7/2 02:07

Building a serverless A2A gateway for agent discovery, routing, and access control

In this post, you will learn how to build a serverless A2A gateway on AWS that hosts multiple agents behind a single domain using path-based routing (/agents/{agentId}). Standard A…

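AI take · A serverless gateway that connects multi-agent discovery and routing lowers management costs and is key infrastructure for the agent ecosystem.

Source: AWS ML

Tips & Opinions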

7/2 02:03

Structured memory filtering with metadata in AgentCore Memory

In this post, you will learn how metadata works across configuration, ingestion, and retrieval, explore enterprise use cases including multi-agent and multi-tenant architectures, a…

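AI take · Metadata filtering makes AI memory more precise, supports multi-agent architectures, and advances enterprise applications.

Source: AWS ML

Tips & Opinions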

7/2 01:53

How Inscribe uses Amazon Bedrock to stop document fraud in seconds

In this post, you will learn how Inscribe developed an agentic AI system using Amazon Bedrock that reasons across documents the way an expert fraud analyst would. With this new age…

AI take · Agentic AI on Amazon Bedrock emulates expert document analysis in seconds, transforming fraud-detection efficiency.

Source: AWS ML

Industry News

7/2 01:48

Cloudflare’s new policy pushes AI companies to pay for publishers’ content

Cloudflare is giving AI companies until September 15 to separate web crawlers used for search from those used for AI training and agents, or risk being blocked by default on many p…

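AI take · For the first time, a cloud provider explicitly requires AI training crawlers to pay, which may reshape the rules of data acquisition.

Source: TechCrunch

Papers & Research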

7/2 01:20

Optimal Resource Utilization for Autonomous Laboratory Orchestrators

In autonomous laboratories, AI agents suggest the next batch of experiments to do. However, planning and executing those tasks taking full advantage of the available resources is a completely differen…

Source: arXiv

Model Release/Update

7/1 22:20

Gemini Spark, Google’s agentic assistant, is now available on Mac

Google's 24/7 agentic assistant, Gemini Spark, comes to Mac alongside other improvements, like real-time tracking and support for more apps.

Source: TechCrunch

Tips & Opinions

7/1 22:11

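With agents becoming the database's new users, why must AI databases move toward a unified lake-and-database architecture?

Source: InfoQ

Product Release/Update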

7/1 12:28

isjiamu/gzh-design-skill

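One-click typesetting of Markdown into polished HTML that can be pasted straight into the WeChat Official Account editor: 6 curated themes + a theme generator + two-stage validation. An AI-agent skill that turns Markdown into paste-ready WeChat article HTML.

Source: GitHub

Industry News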

7/1 05:53

OpenClaw is finally available on Android and iOS

The free open source agentic program is finally invading your phone.

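AI take · An open-source agent program lands on mobile, putting AI tools within reach and breaking platform restrictions.

Source: TechCrunch

Papers & Research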

7/1 04:00

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

Repository-level performance-optimization benchmarks such as GSO, SWE-Perf and SWE-fficiency evaluate coding agents by applying patches to real repositories and comparing runtime against unoptimized b…

Source: HuggingFace Papers

Papers & Research

7/1 04:00

Personalization as Inverse Planning: Learning Latent Design Intents for Agentic Slide Generation via Structural Denoising

Slide design requires personalizing both deck themes and page layouts. Yet, current AI agent-based methods struggle with fine-grained, page-level design. Solely relying on prespecified templates or us…

Source: HuggingFace Papers

Papers & Research

7/1 04:00

MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

Memory has emerged as a cornerstone of modern LLM-based agents, supporting their evolution from single-turn assistants to long-term collaborators. However, memory is not always beneficial: retrieved m…

Source: HuggingFace Papers

Papers & Research

7/1 04:00

RepoRescue: An Empirical Study of LLM Agents on Whole-Repository Compatibility Rescue

Open-source libraries and tools are widely reused, but compatibility maintenance is expensive. Once maintainers leave, useful repositories can stop working as runtimes and dependencies evolve. We stud…

Source: HuggingFace Papers

Papers & Research

7/1 04:00

Multi-Turn Agentic Scientific Literature Search via Workflow Induction

Scientific literature search often requires more than retrieving papers from a single query: users' intents are underspecified, preference-dependent, and evolve through interaction. Existing search ag…

Source: HuggingFace Papers

Papers & Research

7/1 04:00

When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers

LLM agents increasingly rely on retrieval buffers to store and reuse past experience, yet the cache management policies governing these buffers remain largely ad-hoc. We formalize this as an online se…

Source: HuggingFace Papers

Tips & Opinions

7/1 02:32

ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

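AI take · The first AI agent benchmark targeting enterprise Java framework migration, filling a gap in evaluating AI code-migration capability.

Source: HuggingFace Blog

Model Release/Update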

7/1 02:00

Anthropic launches Claude Sonnet 5 as a cheaper way to run agents

Anthropic’s Claude Sonnet 5 brings stronger agentic capabilities, lower pricing, and improved safety, positioning the model as a cheaper alternative to Opus, GPT-5.5, and Gemini Pr…

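AI take · Strong agentic capabilities at a low price, with cost-performance pitched against top-tier models; competition in the AI industry is white-hot.

Source: TechCrunch

Papers & Research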

7/1 01:53

Generative Skill Composition for LLM Agents

Recent LLM agents benefit from skills for solving complex tasks. Skills encapsulate modular packages of procedural knowledge and instructions for performing specialized tasks, such as setting up a san…

Source: arXiv

Industry News

7/1 01:52

Acti puts AI agents directly into your smartphone keyboard

Acti is betting the smartphone keyboard is the next home for AI assistants. The startup's new keyboard for iOS and Android works across apps and lets users create custom AI-powered…

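AI take · Invoking AI agents directly from the keyboard breaks down app barriers and may become a new entry point for phone assistants.

Source: TechCrunch

Papers & Research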

7/1 01:39

AxDafny: Agentic Verified Code Generation in Dafny

We study agentic code generation in Dafny, where a model must generate both executable code and the proof artifacts for verification. We present AxDafny, a verifier-guided repair framework that iterat…

Source: arXiv

Model Release/Update

7/1 01:04

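Already used by 18 of the top 20 pharmaceutical companies, NVIDIA's AI toolkit BioNeMo integrates with Claude Science

ITHome, July 1: NVIDIA announced yesterday (June 30) that its NVIDIA BioNeMo Agent Toolkit has been integrated with Claude Science, the research workbench released by Anthropic, providing accelerated computing for life sciences research workflows. ITHome note: Claude Science is an AI workbench for scientific research released by Anthropic, supporting…

Source: ITHome

Model Release/Update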

7/1 01:00

NVIDIA BioNeMo Agent Toolkit Brings Accelerated AI to Life Sciences Researchers in Claude Science

Life sciences has entered an era of computational scale, and for more than a decade, NVIDIA has built the full GPU-accelerated computing stack — spanning hardware, frameworks, libr…

Source: NVIDIA

Industry News

7/1 01:00

Anthropic lead: HTML is better than Markdown for helping humans follow agent collaboration workflows

Source: InfoQ

Tips & Opinions

7/1 00:54

Have your agent record video demos of its work with shot-scraper video

shot-scraper video is a new command introduced in today's shot-scraper 1.10 release which accepts a storyboard.yml file defining a routine to run against a web application and uses…

Source: Simon Willison

Papers & Research

7/1 00:50

SkillOpt: Agent skills as trainable parameters

AI agents often fail because their instructions, or skills, are manually modified with no guarantee of improvement. Learn how SkillOpt turns skill editing into a training process,…

Source: Microsoft Research

Tips & Opinions

7/1 00:46

Build generative UI for AI agents on Amazon Bedrock AgentCore with the AG-UI protocol

This post walks through how AG-UI integrates into the Fullstack AgentCore Solution Template (FAST) to build interactive agent frontends on Amazon Bedrock AgentCore. We then show ho…

Source: AWS ML

Industry News

6/30 23:30

Q&A: What is agentic AI today, and what do we want it to be?

Computer scientist Phillip Isola cuts through the hype to explain how AI agents work and what the future might hold for this rapidly advancing technology.

Source: MIT News

Industry News

6/30 23:27

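Huawei announces large-scale deployment of the world's first commercial multimodal culture-and-tourism large model

ITHome, June 30: Huawei China announced that on June 29, 2026, the world's first commercial multimodal culture-and-tourism large model, the "Boguan Culture and Tourism Large Model" (博观文旅大模型), entered large-scale use in Xi'an. As of March this year, the AI travel-companion agent built on Boguan had reached more than 4 million users, and derivative products of the intangible cultural heritage digital IP it created had recorded sales of more than 2 million. ITHome found that the "Boguan culture-and-tourism…", jointly developed in September 2025 by Shaanxi Culture Industry Investment (陕文投), Huawei, and others…

Source: ITHome

Tips & Opinions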

6/30 23:10

shot-scraper 1.10

Release: shot-scraper 1.10 The big new feature is shot-scraper video storyboard.yml , described in detail in Have your agent record video demos of its work with shot-scraper video…

Source: Simon Willison

Industry News

6/30 23:00

Amazon launches new $1 billion FDE org, following OpenAI and Anthropic

Engineers on the new team will embed within companies to deploy purpose-built agents, focusing on fast deployments and customer self-sufficiency.

Source: TechCrunch

Product Release/Update

6/30 22:34

Orkas-AI/Orkas-VideoStudio

Turn your coding agent into a video studio: describe a video in plain language, and your agent writes the timeline and produces the file.

Source: GitHub

Industry News

6/30 21:30

From VCloud to Agentic VCloud: rebuilding the paradigm for the agent era

Source: InfoQ

Model Release/Update

6/30 21:00

Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advance…

Source: NVIDIA

Industry News

6/30 19:02

Agents are devouring tokens: on the surface a battle of models, underneath a contest over coal and electricity

Source: InfoQ

Industry News

6/30 17:00

Crypto exchange OKX wants AI agents to hire and pay each other

OKX is bringing together payments, identity, and reputation into a marketplace for AI agents.

Source: TechCrunch

Product Release/Update NEW

6/30 14:42

runvendo/vendo

Embedded agents your customers use to automate work, build views, and connect their tools.

Source: GitHub

Product Release/Update

6/30 13:23

Agents now have an internet of their own!

Mininglamp Technology open-sources Octo

Source: QbitAI

Product Release/Update

6/30 07:13

redevops-io/context-runtime

Context Runtime — a database query planner for LLM context. Decides what a model sees before it answers; plans it, runs it through reused substrate, and learns…

Source: GitHub

Papers & Research

6/30 05:14

Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity

AI agents can't remember past conversations. They must constantly reload or retrieve context, which grows less efficient as tasks get longer and more complex. Memora solves this wi…

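AI take · A new memory architecture that balances abstraction and specificity, letting AI handle long conversations efficiently.

Source: Microsoft Research

Papers & Research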

6/30 04:00

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verif…

Source: HuggingFace Papers

Papers & Research

6/30 04:00

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide too sparse guidance, failing to…

Source: HuggingFace Papers

Papers & Research

6/30 04:00

DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation

Text-rich image generation is one of the most challenging settings in image generation, since models must simultaneously produce visually realistic images and render legible, semantically aligned, and…

Source: HuggingFace Papers

Papers & Research

6/30 04:00

Xiaomi-GUI-0 Technical Report

Graphical user interface (GUI) agents build on vision-language models to complete user tasks end-to-end in real applications through interface actions such as tapping, swiping, text entry, and navigat…

Source: HuggingFace Papers

Papers & Research

6/30 04:00

HealthAgentBench: A Unified Benchmark Suite of Realistic Agentic Healthcare Environments for Challenging Frontier AI Agents

As AI agents become increasingly capable of complex, long-horizon reasoning, rigorous and holistic evaluation is essential for measuring progress toward real-world healthcare applications. We introduc…

Source: HuggingFace Papers

Papers & Research

6/30 04:00

ASPIRE: Agentic /Skills Discovery for Robotics

Traditional robot programming is challenging: it requires orchestrating multimodal perception, managing physical contact dynamics, and handling diverse configurations and execution failures. We introd…

Source: HuggingFace Papers

Papers & Research

6/30 04:00

AutoTrainess: Teaching Language Models to Improve Language Models Autonomously

Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. A cent…

Source: HuggingFace Papers

Papers & Research

6/30 04:00

Securing the AI Agent: A Unified Framework for Multi-Layer Agent Red Teaming

The fast growth of open-source AI infrastructure, from model serving engines and agent platforms to the Model Context Protocol (MCP) ecosystem and the language models themselves, has outpaced the secu…

Source: HuggingFace Papers

Industry News

6/30 02:03

Micro-Agent: Beat Frontier Models with Collaboration Inside Model API

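AI take · Beating frontier AI through collaboration inside the model: a new approach to breaking performance bottlenecks.

Source: Hacker News

Industry News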

6/30 02:00

AI agents are not your “coworkers”

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. Imagine coming in to work to learn that a…

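AI take · Pinpoints the mispositioning of AI assistants and exposes the industry's over-romanticizing of the "AI coworker" concept.

Source: MIT Tech Review

Papers & Research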

6/30 01:58

Self-Evolving World Models for LLM Agent Planning

World models offer a principled way to equip long-horizon LLM agents with foresight: predictions of action consequences before execution. However, unreliable foresight can be ignored, misused, or even…

Source: arXiv

Papers & Research

6/30 01:40

MESA: Prioritizing Vulnerable Communication Channels for Securing Multi-Agent Systems

Multi-agent systems (MAS) are increasingly used to automate complex, distributed workflows. However, their inter-agent communication channels introduce new attack surfaces that remain poorly understoo…

Source: arXiv

Tips & Opinions

6/30 01:39

Multi-tenant LLM analytics with row-level security: How we built a secure agent on AWS

In this post, we show you how PAR built a production-ready multi-tenant LLM analytics system that enforces row-level security through a three-layer architecture: cryptographic requ…

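AI take · A multi-layer cryptographic architecture enforces row-level security, offering an enterprise-grade privacy solution for multi-tenant LLM analytics.

Source: AWS ML

Tips & Opinions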

6/30 01:36

Build an agentic AI healthcare claims pipeline with Amazon Bedrock and AWS HealthLake

In this post, we show you how to build an automated claims processing pipeline using two key Amazon Bedrock capabilities: Amazon Bedrock Data Automation for intelligent document ex…

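AI take · Automating healthcare claims processing with AI, combining Bedrock and HealthLake, significantly improves efficiency and reduces manual errors.

Source: AWS ML

Tips & Opinions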

6/30 01:25

Debugging production agents with Amazon Bedrock AgentCore Observability

In this post, you learn how to debug production agent failures using built-in observability capabilities. We walk through common failure patterns, show how to analyze agent behavio…

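AI take · Debugging production AI agents with built-in observability addresses common failure-analysis problems in real deployments.

Source: AWS ML

Papers & Research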

6/30 01:17

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

We introduce SWE-Interact, a new testbed for evaluating coding agents on multi-turn, interactive, user-driven software engineering tasks. Existing frontier SWE benchmarks typically provide complete re…

Source: arXiv

Industry News

6/30 01:16

Ornith-1.0: self-improving open-source models for agentic coding

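AI take · Self-improving open-source models open a new paradigm for intelligent programming and lower the barrier to AI development.

Source: Hacker News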

Papers & Research

6/30 01:14

Attractor States Emerge in Multi-Turn LLM Conversations

Large language models (LLMs) are increasingly used in open-ended multi-agent settings, but the long-run dynamics of model-model interaction remain poorly understood. We study whether open-ended LLM d…

Source: arXiv

Product Release/Update

6/30 01:03

Cursor now has a mobile app for guiding your coding agent on the go

Cursor has launched a new mobile app for remote oversight over coding agents.

AI take · Managing coding agents from mobile breaks the limits of the office and improves development flexibility.

Source: TechCrunch

Tips & Opinions

6/30 00:17

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding This is an interesting new open weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants i…

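AI take · The first self-scaffolding coding agent model is open-sourced; the MIT license lowers the barrier and may reshape the AI coding tool ecosystem.

Source: Simon Willison

Industry News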

6/29 22:44

Agent confidence on the technical frontier

Enterprise investment in AI is booming. Gartner is calling 2026 an “inflection year” for organizations to align their AI projects with strategic business objectives. As the pressur…

Source: MIT Tech Review

Product Release/Update

6/29 22:05

MHW888888/aegisloop

Open-source local policy, recovery, and audit layer for explicit Codex execution.

Source: GitHub

Product Release/Update

6/29 22:00

AWS launches Blocks, an open-source framework and backend development tool for AI agents

Source: InfoQ

Industry News

6/29 12:40

Lore – Give your coding agent the decisions your team made

Source: Hacker News

Industry News

6/29 12:27

Herdr: Agent multiplexer that lives in your terminal

Source: Hacker News

Industry News

6/29 11:43

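36Kr exclusive | URTOPIA co-founder builds a smart ring; crowdfunding has passed 10 million yuan

By Zhang Ziyi | Editor: Yuan Silai. AI wearables brand AIVELA recently announced the completion of a first funding round worth several million US dollars. The round was led by Linear Capital, with Fengling Capital (锋领资本) following and industry players including smart e-bike brand URTOPIA also participating. The funds will mainly go toward R&D of next-generation AI wearables, building out health data and AI agent capabilities, global market expansion, and growing the core team. Starting from close-to-body wearables such as smart rings and smart bracelets, AIVELA will target…

Source: 36Kr

Tips & Opinions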

6/29 05:57

Quoting Jon Udell

Human Agent in the loop I dislike the phrase “human in the loop” because it cedes authority to the machines. Let’s flip the narrative. It’s our loop, we work the same way we always…

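AI take · A contrarian view: humans should be in control of AI tools rather than led by machines, redefining who holds the initiative in human-machine collaboration.

Source: Simon Willison

Product Release/Update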

6/29 05:07

gunawan1996/world-forge-ai

AI World Generator 2026: Create Self-Evolving Maps & Stories

Source: GitHub

Product Release/Update

6/29 04:39

hari-ragav-ks2006/polis-demos-symposium

2026 Multi-Agent AI Town Simulation | Polis Darwin LangGraph

Source: GitHub

Product Release/Update

6/29 04:01

kaderkck/hewn-forge

HEWN 2.0 2026: AI Output Router for Precision Summaries & Polished Code

Source: GitHub

Papers & Research

6/29 04:00

SWE-Together: Evaluating Coding Agents in Interactive User Sessions

Most coding-agent benchmarks are static: an agent receives a complete task description up front and is judged only by its final code. Real coding assistance is interactive, with users clarifying goals…

Source: HuggingFace Papers

Papers & Research

6/29 04:00

TACO: Tool-Augmented Credit Optimization for Agentic Tool Use

Agentic multimodal models perform diverse operations on an image via code and reason over the returned view, an effective paradigm for fine-grained visual question answering. However, code operations…

Source: HuggingFace Papers

Papers & Research

6/29 04:00

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspective…

Source: HuggingFace Papers

Papers & Research

6/29 04:00

GUICrafter: Weakly-Supervised GUI Agent Leveraging Massive Unannotated Screenshots

Data, as the fundamental substrate of modern intelligence, has greatly driven the development of current foundation models. Naturally, researchers aim to extend this paradigm to the domain of GUI agen…

Source: HuggingFace Papers

Papers & Research

6/29 04:00

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

Source: HuggingFace Papers

Papers & Research

6/29 04:00

LUMOS: A Semantic Operating-System Layer for Accessibility-Grounded AI Agents

Current operating systems expose interfaces optimized for human users but not for AI agents. Humans benefit from pixels, icons, windows, visual grouping, mouse movement, and keyboard shortcuts; AI age…

来源：HuggingFace Papers

论文研究

6/29 04:00

DuoMem: Towards Capable On-Device Memory Agents via Dual-Space Distillation

Large Language Model (LLM)-based agents can solve complex procedural tasks by interacting with environments over multiple turns, but this ability typically depends on large models, long contexts, and…

来源：HuggingFace Papers

产品发布/更新

6/29 03:35

Frisher1/ClaudeCode-Workflow-Lab

Complete Guide 2026: Claude Code Manual – Workflow Pipelines & Adversarial Budget Loops

来源：GitHub

产品发布/更新

6/29 03:26

thenicolas1894/awesome-claude-fable-5-prompt-vault

Ultimate Claude Fable 5 Guide 2026: Use Cases, Integrations & Benchmarks

来源：GitHub

产品发布/更新

6/29 03:19

Brenonunesx/agent-pilot

AI Agent Toolkit 2026: Smart Device Control for iOS & Android

来源：GitHub

产品发布/更新

6/28 23:20

mingchen666/Reviva

Local-first AI learning workspace: ask, note, review and create around your own materials. Wiki KB, Agents, Skills, creation tools. An AI learning workbench built around your own materials for Q&A, notes, review, and…

来源：GitHub

行业动态

6/28 17:00

GitLab 19.0 embeds agentic AI in credentials, merge requests, and supply chain security

来源：InfoQ

产品发布/更新

6/28 11:39

deer-flow/llm-space

A desktop app to prototype agent ideas, inspect every harness step, replay failures, and evaluate performance, all in one place. Local-first, cloud-ready for ma…

来源：GitHub

行业动态

6/28 04:34

Show HN: Adrafinil – keep a lid-closed Mac awake only while agents work

A month ago there was a wave of posts and tweets about engineers walking around cafes and parks with their MacBooks propped half-open, as fully closing the lid forces sleep that st…

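AI commentary · Targets a macOS pain point: keeps a lid-closed Mac awake only while agents are working, so developers no longer need to carry a half-open laptop around.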

来源：Hacker News

论文研究

6/28 04:00

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

Existing computer-use benchmarks fail to capture the realism, complexity, and long-horizon demands of real-world computer use, limiting their ability to reveal the limitations of frontier agents. We i…

来源：HuggingFace Papers

论文研究

6/28 04:00

PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents

LLM agents handle user requests on behalf of organizations through tool calls and must follow the company policies stated in their system prompts. Prior work approaches this as a safeguarding problem…

来源：HuggingFace Papers

论文研究

6/28 04:00

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

Video understanding is a fundamental capability for multimodal intelligence, and recent Multimodal Large Language Models (MLLMs) have achieved remarkable performance on Video Question Answering (Video…

来源：HuggingFace Papers

论文研究

6/28 04:00

Hierarchical Experimentalist Agents

Large language models (LLMs) are increasingly used to take actions in the real world and support human decision-making, yet most agents rely on parametric knowledge, fixed post-training data, retrieva…

来源：HuggingFace Papers

产品发布/更新

6/27 21:01

simonlin1212/investment-news

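A global industry-chain news dashboard built for A-share investors · 12 tracks mapped one-to-one to A-share sectors (semiconductors/AI/robotics/new-energy vehicles…), covering 100+ authoritative sources, using your own LLM to distill a daily Chinese "Today's Highlights" digest plus translation · fully local, zero API keys · Local AI news dashboard tracking the global indu…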

来源：GitHub

产品发布/更新

6/27 20:29

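BrowserBC: cloning human clicks so that one web operation becomes a capability for every agent

A human records once, and agents can replay it

AI commentary · Turns human operations into general-purpose agent capabilities, sharply lowering the barrier to automation.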

来源：量子位

技巧与观点

6/27 19:21

Using Local Coding Agents

Using Open-Weight Models in Local Coding Harnesses as an Alternative to Claude Code and Codex Subscriptions

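AI commentary · Running open-weight models locally in place of paid services lowers the cost of AI coding and the dependence on vendors.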

来源：Sebastian Raschka

产品发布/更新

6/27 07:10

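State Administration for Market Regulation: speeding up standards development for agents, embodied intelligence, and other frontier technologies

IT之家, June 27: According to a CCTV News report on June 25, the State Administration for Market Regulation is working with relevant departments to speed up standards development in frontier technology fields such as agents, and to dynamically refine a matrix of national AI standards that keeps pace with industry development. According to the report, the national standards now being drafted cover, besides agents, frontier technologies such as embodied intelligence, world models, and ontology models, as well as foundational standards for computing infrastructure, high-quality datasets, simulation test platforms, deep learning compilers, open-source model platforms…

AI commentary · Policy is accelerating standards development, which should push the agent and embodied-intelligence industries toward standardization and help secure a leading technical position.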

来源：IT之家

行业动态

6/27 06:21

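Getting good at agentic AI the way you would play a murder-mystery role-play game (剧本杀)

AI commentary · Uses the murder-mystery role-play game as an analogy for agentic AI, lowering the barrier to understanding; a quick introduction for general readers.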

来源：InfoQ

论文研究

6/27 04:00

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available en…

来源：HuggingFace Papers

技巧与观点

6/27 01:58

Incident Report: CVE-2026-LGTM

Incident Report: CVE-2026-LGTM Spectacular hypothetical incident report by Andrew Nesbitt. Day 2, 16:00 UTC --- Two AI review agents from competing vendors, both attached to a down…

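AI commentary · A fictional security incident that illustrates the risk of conflicts between AI agents, a warning that future agent collaboration must guard against vulnerabilities.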

来源：Simon Willison

产品发布/更新

6/27 01:29

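From Copilot to Autopilot: Microsoft releases Scout, an always-on enterprise agent

AI commentary · Microsoft is upgrading AI from an assistive tool to an autonomous executor, marking a new stage in which enterprise agents run continuously.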

来源：InfoQ

论文研究

6/27 01:21

Agentic Hardware Design as Repository-Level Code Evolution

We present HORIZON, a self-evolving agent framework that treats hardware design as repository-level code evolution. A Markdown harness is compiled into a project pack containing domain knowledge, an e…

来源：arXiv

论文研究

6/27 01:08

Agent-Native Immune System: Architecture, Taxonomy, and Engineering

The transition from static chat bots to autonomous agents--equipped with persistent memory, tool-use protocols, and multi-agent collaboration--has fundamentally expanded the AI threat landscape. Curre…

来源：arXiv

行业动态

6/27 01:02

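When agents become the new core cloud users: Alibaba Cloud redefines the "cloud usage paradigm"

AI commentary · With agents as a new class of cloud user, the way cloud services are consumed is set to change fundamentally.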

来源：InfoQ

行业动态

6/27 00:40

Show HN: Smart model routing directly in Claude, Codex and Cursor

We built a model router that plugs into coding agents (e.g. Claude Code, Codex, Cursor, etc.) and intelligently sends requests to the best model to serve them. Here's a quick demo…

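AI commentary · Lets coding assistants route each request to the best-suited model automatically, aiming to improve code-generation efficiency and cost control.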

来源：Hacker News

论文研究

6/27 00:22

Govern the Repository, Not the Agent: Measuring Ecosystem-Level Risk in AI-Native Software

Autonomous coding agents now open and merge pull requests in shared repositories at scale, and the field evaluates them the way it has always evaluated components, one agent at a time, on isolated ben…

来源：arXiv

技巧与观点

6/26 22:38

Production-grade AI agents for financial compliance: Lessons from Stripe

In this post, you learn how Stripe built a production-grade AI agent system for financial compliance. We cover the technical architecture of Stripe’s ReAct agent framework and the…

来源：AWS ML

技巧与观点

6/26 06:28

AI and Liability

AI and Liability Bruce Schneier and Nathan Sanders on the recent German ruling that Google be held liable for errors introduced in their AI overviews: AI agents are agents of the p…

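AI commentary · A first case on AI liability: a German court held Google liable for errors made by its AI, setting a precedent for the industry.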

来源：Simon Willison

行业动态

6/26 04:19

Patronus AI lands $50M to build ‘digital worlds’ that stress-test AI agents

Agent-testing startup Patronus AI, founded by former Meta AI researchers, is experiencing nearly insatiable demand, its investor says.

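AI commentary · The AI testing space is taking off: a team of former Meta researchers has raised a large round, signaling surging demand for AI reliability evaluation.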

来源：TechCrunch

论文研究

6/26 04:00

GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems

Multi-agent systems (MAS) built on large language models (LLMs) provide a promising framework for solving complex tasks through role specialization and structured interaction. However, their performan…

来源：HuggingFace Papers

论文研究

6/26 04:00

ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering

Knowledge-based Visual Question Answering (KB-VQA) requires models to combine image understanding with external knowledge. Most prior methods use a fixed retrieve-then-generate pipeline with a pre-sel…

来源：HuggingFace Papers

论文研究

6/26 04:00

RocketSmith: Agentic Additive Manufacturing of High-Powered Rockets

RocketSmith is an agentic system which intelligently automates the DFAM process for the development of high powered rockets suitable for launch. The system utilizes a large language model to orchestra…

来源：HuggingFace Papers

论文研究

6/26 04:00

TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

As large language models and harness frameworks continue to advance, agents operating in terminals are increasingly capable of performing a broader range of general computer-use tasks beyond coding. H…

来源：HuggingFace Papers

论文研究

6/26 04:00

Dockerless: Environment-Free Program Verifier for Coding Agents

Program verifiers play a central role in training coding agents, including selecting trajectories for supervised fine-tuning (SFT) and providing rewards for reinforcement learning (RL). Standard execu…

来源：HuggingFace Papers

论文研究

6/26 04:00

Building to the Test: Coding Agents Deliver What You Check, Not What You Requested

Benchmarks are widely used to evaluate task completion by Large Language Models (LLMs), but this approach has accumulated construction-validity problems, and a passing score may not show whether the r…

来源：HuggingFace Papers

论文研究

6/26 04:00

When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, exi…

来源：HuggingFace Papers

产品发布/更新

6/26 03:06

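GitHub launches a Copilot desktop app with support for parallel agent development workflows

AI commentary · Connects AI coding tools with the desktop environment; parallel agent workflows should substantially improve development efficiency.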

来源：InfoQ

行业动态

6/26 03:04

Notion killing Skiff-influenced email app since most users use AI agents instead

Notion is "going all in on using agents to run your inbox."

AI commentary · By shifting to AI agents for inbox management, Notion may upend traditional email handling; worth watching.

来源：Ars Technica

技巧与观点

6/26 02:00

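Identity and permission challenges for AI agents: how Uber and Auth0 are rethinking access control

AI commentary · Permission management for AI agents is becoming critical; the practices at Uber and Auth0 highlight new enterprise security challenges.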

来源：InfoQ

技巧与观点

6/26 01:55

Retrofit, don’t rebuild: Agentic overlays for transforming legacy enterprise services

In this technical collaboration between AWS and the authors, we present a pragmatic solution: agentic overlays. Agentic overlays are thin wrapper layers that transform traditional…

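AI commentary · Retrofitting legacy systems with agentic overlays is more economical and efficient than rebuilding from scratch, a new approach to enterprise digital transformation.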

来源：AWS ML

论文研究

6/26 01:44

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

Multimodal web agents can assist humans in operating repetitive GUI tasks, where effective task planning is essential for decomposing complex tasks into executable actions. While small open source MLL…

来源：arXiv

行业动态

6/26 00:55

General Intuition’s $2.3B bet that video games can train AI agents for the real world

General Intuition has raised $320 million to scale AI trained on millions of hours of gameplay, betting action data can help AI develop something closer to human intuition.

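AI commentary · A distinctive path of training AI intuition on gameplay data; a $2.3B bet that action intelligence can transfer to the real world.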

来源：TechCrunch

行业动态

6/26 00:54

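Claude Code's engineering lead personally cools the agent hype: the era of burning tokens is over, and now it is time to calculate ROI

AI commentary · The compute-burning mode of agent usage is ending; the industry is returning to rationality, with ROI as the core metric.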

来源：InfoQ

技巧与观点

6/26 00:38

Build self-service AWS Health analytics to find actionable health insights with AI agents powered by Amazon Bedrock

In this post, we show you how to build Chaplin (Customer Health and Planned Lifecycle Intelligence Nexus), an open source solution that uses AI agents exposed through the Model Con…

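AI commentary · Uses AI agents to analyze AWS Health data automatically, so that non-technical staff can get insights through self-service with a lower barrier to entry.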

来源：AWS ML

技巧与观点

6/26 00:35

Building agentic AI applications with a modern data mesh strategy on AWS

This post shows how to build a governed, serverless data mesh on AWS that provides the secure, scalable data foundation production agentic AI requires.

来源：AWS ML

行业动态

6/25 20:50

Three companies make moves within one week: coding agents enter the era of team infrastructure

来源：InfoQ

模型发布/更新

6/25 20:43

This agent company switched from Claude to DeepSeek v4: millions of dollars saved per year, but the migration took 100 times the expected effort

来源：InfoQ

行业动态

6/25 12:00

Improving the speed and energy-efficiency of AI agents

A new system, known as Murakkab, optimizes the design and deployment of multistep workflows that power AI applications.

来源：MIT News

论文研究

6/25 10:00

How agents are transforming work

A new OpenAI research paper shows how AI agents are transforming work, enabling longer, more complex tasks and expanding productivity across roles.

来源：OpenAI

论文研究

6/25 04:00

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically bu…

来源：HuggingFace Papers

论文研究

6/25 04:00

When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

Multi-model LLM systems such as routing, voting, cascades, fusion, and mixture-of-agents are used to beat single-model accuracy. We show that their gain is capped by a quantity the field rarely report…

来源：HuggingFace Papers

论文研究

6/25 04:00

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide little guidance on which intermediate decisions should…

来源：HuggingFace Papers

论文研究

6/25 04:00

Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

While text-to-image (T2I) models have achieved remarkable progress, they struggle with real-world requests that are often underspecified, implicit, or dependent on up-to-date knowledge. We identify th…

来源：HuggingFace Papers

论文研究

6/25 04:00

To Run or Not to Run: Analyzing the Cost-Effectiveness of Code Execution in LLM-Based Program Repair

LLM-based agents for program repair are increasingly built on a "generate-run-revise" paradigm, iteratively executing tests to evaluate and refine patches. This execution-based approach has become sta…

来源：HuggingFace Papers

论文研究

6/25 04:00

How Much Static Structure Do Code Agents Need? A Study of Deterministic Anchoring

LLM-based code agents navigate repositories through keyword search but miss the structural relationships, such as call graphs, inheritance hierarchies, and configuration dependencies, that define how…

来源：HuggingFace Papers

论文研究

6/25 04:00

Boundary-Aware Context Grounding for A Low-Channel EEG Agent

Large language models (LLMs) can make scientific software easier to use. However, a general model does not automatically know which measurements a particular sensor can support, which algorithms are i…

来源：HuggingFace Papers

论文研究

6/25 04:00

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes…

来源：HuggingFace Papers

论文研究

6/25 04:00

Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement

Multi-agent large language model (LLM) systems often rely on verifier and critic agents to suppress hallucinations, but verification is delayed. During this delay, false claims can propagate through t…

来源：HuggingFace Papers

技巧与观点

6/25 02:20

Build a healthcare appointment agent with Amazon Nova 2 Sonic

In this post, you will learn how to build a voice agent that handles appointment reminder conversations using Amazon Nova 2 Sonic and Amazon Bedrock AgentCore. The agent authentica…

AI commentary · Amazon presents a low-cost voice appointment system, showing a new pattern for deploying AI in healthcare settings.

来源：AWS ML

论文研究

6/25 01:54

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and s…

来源：arXiv

论文研究

6/25 01:32

The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

AI agents are granted access to tools, APIs, and other infrastructure, making them active principals in those systems. The dominant approach places controls inside the agent's own runtime: system prom…

来源：arXiv

技巧与观点

6/25 01:22

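Why do so many enterprise agents die at the prototype stage? Amazon Web Services' 储瑞松: agent engineering is the key

AI commentary · Enterprise agents struggle to reach production because of engineering; the Amazon Web Services engineering perspective addresses this industry pain point directly.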

来源：InfoQ

论文研究

6/25 00:57

Can Trustless Agents Be Trusted? An Empirical Study of the ERC-8004 Decentralized AI Agent Ecosystem

As autonomous AI agents increasingly transact across organizational boundaries, a fundamental trust challenge emerges: how can an agent assess whether an unknown counterpart is trustworthy? The ERC-80…

来源：arXiv

技巧与观点

6/25 00:56

How Loka Built a Natural, Low-Latency Voice Agent with Amazon Nova 2 Sonic

In this post, we demonstrate the architecture and approach Loka used to solve a common frustration: robotic, slow voice assistants that cause customers to hang up, damaging brand r…

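AI commentary · Loka built a natural, fluent voice agent on Amazon's new model, addressing the pain point of robotic-sounding voice assistants; the technical approach is worth studying.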

来源：AWS ML

论文研究

6/25 00:55

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often le…

来源：arXiv

产品发布/更新

6/24 23:25

ZeKaiNie/universal-examprep-skill

Last-night exam-cram coach as a Claude Agent Skill: turns your slides, notes and past papers into a chaptered knowledge base + quiz bank, teaches only what's in…

来源：GitHub

产品发布/更新

6/24 20:07

mariozhaofan-pixel/director-seedance-prompt

Film Language Operating System for director-level, Seedance-ready cinematic video prompts.

来源：GitHub

行业动态

6/24 19:21

Haystack: Open-Source AI Framework for Production Ready Agents, RAG

来源：Hacker News

行业动态

6/24 18:06

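After closing a new funding round of several hundred million yuan, 影眸科技's Hyper3D brings 3D generation into the "thinking era" | 36氪 exclusive

By 王欣逸 | Edited by 张雨忻. Since the start of 2026, the 3D generation model space has been lively. In the first quarter, 影眸科技 released Rodin Gen-2 Edit, the first 3D editing model, making AI 3D models editable for the first time; in June, VAST announced a new funding round, and Meshy followed closely, claiming to have released the world's first 3D AI agent. Recently, 影眸科技, a young team rooted in academia that started its company early, 3…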

来源：36氪

产品发布/更新

6/24 16:10

benchflow-ai/awesome-evals

A curated, non-BS library of the best resources for building and evaluating AI agents — papers, blogs, talks, tools, benchmarks. Maintained by BenchFlow.

来源：GitHub

产品发布/更新

6/24 15:29

Amal-David/mlx-porting-skill

Agent Skill for MLX model porting, validation, quantization, benchmarking, and optimization.

来源：GitHub

产品发布/更新

6/24 15:12

raiyanyahya/llmaker

Selfhost modern LLM stacks. Run the whole fleet from your terminal

来源：GitHub

产品发布/更新

6/24 11:23

fancyboi999/open-tag

Open-source, self-hostable alternative to Claude Tag — a Slack-style workspace where your team and its AI agents (Claude Code, Codex, GitHub Copilot, and more)…

来源：GitHub

产品发布/更新

6/24 10:20

abundantbeing/hermes-browser-extension

Browser-native side panel for Hermes Agent — connect web context to your local Hermes runtime.

来源：GitHub

产品发布/更新

6/24 10:11

DV/HDV tape rescue project: using AI to drive eighteen-year-old equipment

2026 is the first year of the AI agent; 2026 is also the last year of FireWire.

来源：少数派

行业动态

6/24 07:30

India’s MoEngage bets that the future of marketing is millions of AI agents

The all-cash deal gives MoEngage access to technology that assigns AI agents to individual customers.

来源：TechCrunch

产品发布/更新

6/24 07:23

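NVIDIA releases the BioNeMo Agent Toolkit

On June 23 local time, NVIDIA announced the NVIDIA BioNeMo Agent Toolkit, which packages more than a decade of NVIDIA's life-science libraries, tools, and open models so that AI agents, scientists, and labs can work together by gathering evidence, reasoning across research findings, running computational experiments, and recommending the best next step, thereby accelerating scientific discovery. (界面)

AI commentary · NVIDIA BioNeMo combines a decade of life-science work with AI agents and could substantially speed up drug discovery and scientific experiments.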

来源：36氪

论文研究

6/24 04:00

Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a re…

来源：HuggingFace Papers

论文研究

6/24 04:00

V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

Fine-grained visual reasoning requires multimodal large language models (MLLMs) to identify task-relevant visual evidence and ground their reasoning in local image regions. Existing agentic methods ty…

来源：HuggingFace Papers

论文研究

6/24 04:00

Autodata: An agentic data scientist to create high quality synthetic data

We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist…

来源：HuggingFace Papers

论文研究

6/24 04:00

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

来源：HuggingFace Papers

论文研究

6/24 04:00

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabil…

来源：HuggingFace Papers

论文研究

6/24 04:00

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

来源：HuggingFace Papers

行业动态

6/24 02:35

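Call for essays | Step into the agent era with HarmonyOS (鸿蒙) and realize your grand plans!

AI commentary · HarmonyOS opens its agent era, marking a domestic operating system's move from tool to intelligent ecosystem.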

来源：InfoQ

产品发布/更新

6/24 01:46

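Tencent Cloud releases EdgeOne Makers, an edge web and AI agent hosting platform: one-click development and deployment, global launch within minutes

AI commentary · Focused on development efficiency for edge AI applications; one-click global deployment lowers the technical barrier and helps intelligent applications reach production quickly.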

来源：InfoQ

论文研究

6/24 01:21

World Models in Pieces: Structural Certification for General Agents

In the big-world regime, agents cannot be universally capable and their ability is inevitably specialized across a world model in pieces. Consequently, standard uniform guarantees fail to distinguish…

来源：arXiv

产品发布/更新

6/24 01:19

usedotai/dot-loom

Provider-pluggable orchestration runtime for multi-model AI inference. (Sakana Fugu style)

来源：GitHub

论文研究

6/24 01:18

Grading the Grader: Lessons from Evaluating an Agentic Data Analysis System

Agentic data analysis systems produce rich outputs, including code, numerical results, and verbal diagnostics. This makes them more challenging to evaluate than single-turn LLM responses. It is theref…

来源：arXiv

行业动态

6/24 01:12

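Safely running untrusted AI agent code with Azure Container Apps Sandboxes

AI commentary · Isolated execution environments keep AI agents safe, reduce the risk of running code, and support enterprise deployment.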

来源：InfoQ

产品发布/更新

6/24 01:09

Reyzowter/Hello-Agents

🤖 Building AI Agent Systems from Scratch — A comprehensive, practical tutorial from fundamentals to production-grade multi-agent applications

来源：GitHub

论文研究

6/24 01:05

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are sti…

来源：arXiv

行业动态

6/24 00:15

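Angular's official agent Skills help AI coding tools generate modern Angular code

AI commentary · Angular's official skills let AI generate more idiomatic code, improving both developer efficiency and code quality.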

来源：InfoQ

行业动态

6/23 23:05

Show HN: peerd – AI agent harness that runs entirely in your browser

Hey HN. http://peerd.ai is an AI agent harness that lives entirely in your browser as a web extension. You don’t have to install a separate “AI browser”. You don’t have to bolt on…

来源：Hacker News

产品发布/更新

6/23 22:00

Google launches Colab CLI: a command-line tool for developers, automation, and AI agents

来源：InfoQ

产品发布/更新

6/23 21:47

Cloud computing's top player gave 小鹏 (XPeng), Kimi, and 猎豹 a very good time

The inflection point for the agentic AI boom has arrived

来源：量子位

产品发布/更新

6/23 21:37

Just now: 豆包 (Doubao) 2.1 released! An agent runs on its own for 18 hours to finish chip design code

Coding on par with Opus 4.7

来源：量子位

行业动态

6/23 21:00

Fika Jobs raises $4M to build a video-first hiring platform where AI agents interview candidates

Stockholm-based startup Fika Jobs is building a video-first hiring platform that combines AI interview agents with short-form video profiles, creating something that feels like a c…

来源：TechCrunch

技巧与观点

6/23 20:51

Build real agentic apps using CUGA: two dozen working examples on a lightweight harness

来源：HuggingFace Blog

行业动态

6/23 18:02

Show HN: Shumai – open-source Frame.io alternative for creative work

Shumai is an open source platform for uploading creative files, managing projects, collecting precise feedback, sharing work, and collaborating with AI agents, all in one simple cr…

来源：Hacker News

产品发布/更新

6/23 17:40

Dropbox releases Nova: an internal platform for running AI coding agents at scale

来源：InfoQ

行业动态

6/23 16:31

Show HN: Neural Particle Automata

Neural CAs model self-organizing pattern formation on grids. Now the grid is gone. Each cell is an agentic particle that can move freely in space and change its state. While each p…

来源：Hacker News

产品发布/更新

6/23 16:16

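Finals of the 有智青年挑战赛 and National AI+ Scenario Application Competition conclude! On the WAVES 2026 stage, uncovering China's next-generation AI unicorn…

The finals of the 有智青年挑战赛 and National AI+ Scenario Application Competition were held during the WAVES (新浪潮) conference, bringing together youth teams competing on frontier scenarios in AI and the digital economy and showcasing young founders' technical exploration and ability to ship. In 2026, AI has fully entered the era of the "actor". As large models, agents, and embodied intelligence move from laboratory concepts into deployment across every industry, the democratization and spread of AI tools, backed by the open-source ecosystem, lightweight development tools, and affordable compute, keeps lowering innovation…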

来源：36氪

产品发布/更新

6/23 15:11

vancyland/DataClaw0

DataClaw: Agentic Tailoring Multimodal Data from Raw Streams — coming soon (code, weights, dataset & DataClaw-val upon acceptance).

来源：GitHub

模型发布/更新

6/23 14:00

NVIDIA Brings Trusted, 24/7 AI Agents to Telecom Operations

Telecom operators have seen remarkable returns from using generative AI to automate network management, customer care and back-office operations. Most of that impact has been task‑…

来源：NVIDIA

行业动态

6/23 11:23

GLM-5.2 is a step change for open agents

来源：Hacker News

行业动态

6/23 04:53

The AI world is getting ‘loopy’

The loop takes agentic AI a step further by authorizing a swarm of agents to work continuously in the background, endlessly.

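AI commentary · Swarms of autonomous agents running continuously in the background break from the single-turn interaction model of traditional AI and point to a new era of automation.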

来源：TechCrunch

论文研究

6/23 04:00

AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning

Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge. We study archive-grounded reasoning: locating sparse evidence across…

来源：HuggingFace Papers

论文研究

6/23 04:00

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent…

来源：HuggingFace Papers

论文研究

6/23 04:00

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move beyond reproduction t…

来源：HuggingFace Papers

论文研究

6/23 04:00

ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text--image framing errors. Exi…

来源：HuggingFace Papers

论文研究

6/23 04:00

Qwen-AgentWorld: Language World Models for General Agents

A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling…

来源：HuggingFace Papers

论文研究

6/23 04:00

OpenThoughts-Agent: Data Recipes for Agentic Models

Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, S…

来源：HuggingFace Papers

论文研究

6/23 04:00

Are We Ready For An Agent-Native Memory System?

Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, updat…

来源：HuggingFace Papers

论文研究

6/23 04:00

MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

Long-term memory promises LLM agents that grow more capable across sessions, maintaining an accurate, evolving understanding of the user that interaction forms. In practice, however, this memory is ev…

来源：HuggingFace Papers

论文研究

6/23 04:00

Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents

Voice agents face a fundamental tension: the reasoning, retrieval, and tool use that make foundation models capable are iterative and slow, while conversational interaction demands responses on a mill…

来源：HuggingFace Papers

论文研究

6/23 04:00

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in boun…

来源：HuggingFace Papers

技巧与观点

6/23 01:53

Building pay-per-intelligence for AI agents: How Ampersend uses Amazon Bedrock AgentCore Payments

In this post, you will learn how Ampersend built a pay-per-intelligence routing layer on top of Amazon Bedrock AgentCore Payments. AI agents autonomously route tasks to the most ef…

AI commentary · The highlight: a pay-per-intelligence routing layer for AI agents built on Amazon Bedrock, a new way of matching cost to task.

来源：AWS ML

论文研究

6/23 01:48

MAS-PromptBench: When Does Prompt Optimization Improve Multi-Agent LLM Systems?

Multi-agent systems (MAS) offer a scalable path forward for agentic AI, comprising multiple LLM-based agents, each assigned a system prompt and a position within a workflow that governs inter-agent co…

来源：arXiv

行业动态

6/23 00:55

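Andrew Ng (吴恩达) punctures the AI illusion: the hype has gone too far; the future company is a 10-person team plus agents, with its data architecture rebuilt

AI commentary · Andrew Ng calls out the AI bubble and focuses on small teams and agents rebuilding data architecture, pointing to a return to rationality in the industry.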

来源：InfoQ

行业动态

6/22 23:37

Show HN: Oak – Git alternative designed for agents

Oak is a version control system I've been working on designed for agents ( https://oak.space ). It improves the speed and context your agents need when working on serious projects.…

来源：Hacker News

产品发布/更新

6/22 21:43

Johell1NS/browser-search

A skill for AI agents: search the web with SearXNG, browse with Camofox, bypass protections with CloakBrowser. Anti-hallucination by design. Self-hosted, free,…

来源：GitHub

模型发布/更新

6/22 21:00

NVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National Laboratory

Mission, Vision and Veritas — new Los Alamos National Laboratory (LANL) supercomputers to be built with HPE and NVIDIA — are tapping NVIDIA Vera CPUs to accelerate scientific disco…

来源：NVIDIA

模型发布/更新

6/22 21:00

Eco Wave Power Turns Waves Into Watts With NVIDIA AI Infrastructure and Digital Twins

The next era of AI will not be defined by compute alone. Its growth will be determined by energy. As accelerated computing scales across AI factories, agentic AI, industrial AI, ed…

来源：NVIDIA

技巧与观点

6/22 18:20

Kernel-level truth: why eBPF is replacing user-space agents as the first choice for security observability

来源：InfoQ

产品发布/更新

6/22 13:55

NotASithLord/peerd

The first AI agent harness native to the browser. A browser extension that runs a full agent loop where you already work: it drives your tabs, spins up sandboxe…

来源：GitHub

产品发布/更新

6/22 13:09

DeepSeek is desperately short of agent talent! Its lead is posting recruitment ads everywhere

DeepSeek is going all in

来源：量子位

行业动态

6/22 07:34

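中信建投: domestic models iterate faster, and compute demand stays strong

36氪 has learned that a 中信建投 research report says domestic models continue to iterate: GLM-5.2 and Kimi K2.7 Code strengthen 1M context, long-horizon agents, agentic coding, and real-world engineering delivery, moving domestic models from general Q&A toward developer tools and enterprise workflows. Kimi is bolstering its international operations, DeepSeek's fundraising reinforces expectations that leading models will be commercialized, and WeChat's limited gray-release AI test shows the AI entry point shifting from standalone apps to the super-app ecosystem, which is expected to…

AI commentary · Domestic models are shifting toward enterprise applications, compute demand is becoming more certain, and the supply chain's strength looks set to continue.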

来源：36氪

技巧与观点

6/22 06:01

Temporary Cloudflare Accounts for AI agents

Temporary Cloudflare Accounts for AI agents The announcement says this is "for AI agents" but (as is pretty common these days) the AI hook isn't really necessary, this is an intere…

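AI commentary · Temporary accounts for AI agents simplify secure access and lower management costs, a new exploration of combining cloud services with automation.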

来源：Simon Willison

产品发布/更新

6/22 04:44

redevops-io/redevops-rag

Hybrid RAG (DuckDB vector + BM25 + RRF + recency/keyword priors + optional cross-encoder rerank) as an installable library + CLI.

来源：GitHub

论文研究

6/22 04:00

When Agents Commit Too Soon: Diagnosing Premature Commitment in LLM Agents

Long-horizon LLM agents can fail quietly: they settle on one reading of the evidence early, then spend the rest of the run defending it. We call this premature commitment. Final-answer scoring misses…

来源：HuggingFace Papers

论文研究

6/22 04:00

Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?

Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. This cross-application access is useful, but it also creates a privacy ris…

来源：HuggingFace Papers

论文研究

6/22 04:00

Tmax: A simple recipe for terminal agents

Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little academic work has examined RL-based training of…

来源：HuggingFace Papers

论文研究

6/22 04:00

CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents

While recent LLM-based terminal agents have demonstrated promising capabilities, the scarcity of high-quality, executable training data remains a critical bottleneck. Existing synthesis pipelines typi…

来源：HuggingFace Papers

论文研究

6/22 04:00

Training Open Models for Agentic Phone Use

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, rea…

来源：HuggingFace Papers

论文研究

6/22 04:00

Self-Compacting Language Model Agents

Long agent traces composed of chains of thought and tool calls accumulate stale content that anchors subsequent generations, and eventually outgrow the context window. Existing scaffolds mitigate it wi…

来源：HuggingFace Papers

论文研究

6/22 04:00

Causal Discovery in the Era of Agents

Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise directions, propose graph structures, or inject language-model outputs as priors and constrai…

来源：HuggingFace Papers

论文研究

6/22 04:00

EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions

Enterprise agents increasingly operate inside workspaces: they read heterogeneous files, invoke tools, and deliver business artifacts. We introduce EnterpriseClawBench, an enterprise agent benchmark c…

来源：HuggingFace Papers

论文研究

6/22 04:00

Critique of Agent Model

What is an agent? What constitutes agency? With the rise of Large Language Model (LLM) systems marketed as "coding agents", "AI co-scientists", and other "agentic" tools that promise to drive up…

来源：HuggingFace Papers

论文研究

6/22 04:00

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existin…

来源：HuggingFace Papers

论文研究

6/22 04:00

Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents

Long-horizon agents depend on context management: systems compress, summarize, and evict old tokens so tasks can continue beyond finite windows. That is safe only when dropped information is no longer…

来源：HuggingFace Papers

论文研究

6/22 04:00

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, orga…

来源：HuggingFace Papers

论文研究

6/22 04:00

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks,…

来源：HuggingFace Papers

论文研究

6/22 04:00

Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation

Procedural memory is increasingly used to improve LLM agents on recurring workplace tasks, yet its ability to produce reusable skills remains poorly understood. We introduce AFTER, a benchmark of 382…

来源：HuggingFace Papers

产品发布/更新

6/22 03:52

anthony-chaudhary/fak

fak — the Fused Agent Kernel: one Go binary that turns a tool-using agent (Claude Code, Codex, Cursor, any OpenAI/Anthropic/MCP client) into a managed agent: ca…

来源：GitHub

行业动态

6/21 18:00

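From "machine review + human review" to "AI-native": large models and agents drive an intelligent upgrade of content risk control | AICon Shanghai

AI commentary · AI-native content risk control is becoming a new trend; combining large models with agents will upend traditional moderation.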

来源：InfoQ

产品发布/更新

6/21 13:59

axislab-top/Foundry

Transform any idea into a running AI-powered company

来源：GitHub

行业动态

6/21 12:28

Building reliable agentic AI systems

AI commentary · Focuses on building reliable autonomous AI systems, addressing today's AI reliability gap and advancing practical adoption.

来源：Hacker News

产品发布/更新

6/21 10:30

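AI workflow in practice: building a Game Jam game with 100% vibe coding

Agents, like people, cannot do without a closed loop.

AI commentary · End-to-end AI development through vibe coding demonstrates that closed-loop agent work is feasible; the very low barrier to entry makes it worth watching.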

来源：少数派

论文研究

6/21 04:00

Libretto: Giving LLM Agents a Sense of Musical Structure

Generative music systems can now produce impressive audio from text prompts, but audio outputs are difficult to inspect, edit, and diagnose as musical structure. We introduce Libretto, an agent-facing…

来源：HuggingFace Papers

论文研究

6/21 04:00

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

LLM agents increasingly operate in large tool ecosystems, where real-world tasks require discovering relevant tools, inferring implicit sub-goals, and adapting to dynamic environments over long horizo…

来源：HuggingFace Papers

论文研究

6/21 00:08

Lexical Consensus: Grounded Word Learning and Shared Meaning in Artificial Agents

Artificial intelligence systems are commonly evaluated through task performance and behavioral imitation, but such evaluations leave open whether an artificial agent can acquire, stabilize, and use ne…

来源：HuggingFace Papers

产品发布/更新

6/20 23:01

redevops-io/sidekick

Local coding-agent orchestrator — DAG of auto-approved, git-worktree-isolated sub-sessions across LLM providers (Claude/Kimi/Grok/DeepSeek/local). AGPL-3.0.

来源：GitHub

行业动态

6/20 19:19

Temporary Cloudflare accounts for AI agents

AI commentary · A cloud provider opening a dedicated channel for AI agents signals a new stage for automated agent operations.

来源：Hacker News

技巧与观点

6/20 06:45

Quoting Sean Lynch

The real valuable capability MCP offers over skills/CLI is isolating the auth flow outside of the agent’s context window, and potentially out of the harness completely. [...] Maybe…

AI commentary · MCP keeps the auth flow outside the agent's context, an architectural choice that matters greatly for security and scalability.

来源：Simon Willison

产品发布/更新

6/20 04:36

raiyanyahya/recall

Stop wasting tokens and re-explaining your project every session. Recall gives Claude Code durable memory — entirely offline.

来源：GitHub

论文研究

6/20 04:00

OpenBioRQ: Unsolved Biomedical Research Questions for Agents

A working citation looks like proof -- but the fact that a link resolves does not mean the cited paper supports the claim. I find that current agentic models rarely fabricate citations (over 99% resol…

来源：HuggingFace Papers

产品发布/更新

6/20 03:16

ather-techie/ai-system-design-interview

A comprehensive, production-focused guide to acing AI/ML system design interviews at top tech companies.

来源：GitHub

产品发布/更新

6/20 00:43

umacloud/umadev

UmaDev: A coding agent that works like a real dev team, commanding the Claude Code / Codex / OpenCode you already use.

来源：GitHub

行业动态

6/19 23:59

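Forced AI push annoys users; Google's AI suggests that those who don't want AI use DuckDuckGo

IT之家 (ITHome), June 19: Google is pushing hard to build out its AI ecosystem and has forced AI agents into its search engine. DuckDuckGo posted a screenshot on X today showing Google's AI Overviews directing users who dislike AI to DuckDuckGo's "No AI Search" page, and also mentioning browser settings that can dial down the AI experience. PiunikaWeb found in testing that when…

AI commentary · Google's forced AI push backfires: its own AI recommends a competitor, exposing a contradiction in the product logic.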

来源：IT之家

产品发布/更新

6/19 23:37

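India's richest man Ambani: India must become a creator of AI and a global leader

IT之家 (ITHome), June 19: Mukesh Ambani, India's richest man and chairman of Reliance Industries, wants to make the company the standard-bearer of the local AI industry and bring AI services to phone calls, mobile apps, and smart homes. On the 19th local time (today), Reliance Industries held its annual general meeting and announced Jio Call Agent, an AI calling assistant. Jio Call Agent can join phone calls, automatically transcribe conversations, generate summaries, and…

AI commentary · India's richest man is betting personally; the ambition to localize AI is on display and could reshape the global tech landscape.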

来源：IT之家

技巧与观点

6/19 22:05

Accelerate campaign workflow with insights from Adobe Marketing Agent for Amazon Quick

This post shows how to enable Adobe Marketing Agent for Amazon Quick using a Model Context Protocol (MCP). We walk you through how to configure the integration, authenticate using…

AI commentary · MCP lands in marketing, connecting Adobe with the Quick platform to make workflows intelligent.

来源：AWS ML

产品发布/更新

6/19 21:24

sums001/Windows-Copilot-API

Reverse engineered Windows Copilot into an OpenAI-compatible API. Access GPT-4 and GPT-5 models through a simple REST interface without API keys or billing.

来源：GitHub

产品发布/更新

6/19 19:58

Green-PT/honey-for-devs

Honey (I Shrunk the AI) by GreenPT: a cross-tool coding skill that cuts AI coding-agent token usage and LLM API costs — write less code, less prose, and denser…

来源：GitHub

产品发布/更新

6/19 19:12

adepeju4/attest

Evidence-grounded evaluation for AI agents — verifies each claim against the agent's real tool outputs (constrained, evidence-grounded model judgment, not holis…

来源：GitHub

产品发布/更新

6/19 16:33

juggler-ai/juggler

The Juggler Code Agent

来源：GitHub

产品发布/更新

6/19 14:34

Karovia/fullstack-ai-agent-roadmap

🎯 From zero to full-stack AI agent engineer · 110 detailed tutorials · 580,000 characters · 400+ curated GitHub projects · Obsidian-friendly · in Chinese

来源：GitHub

论文研究

6/19 04:00

Counsel: A Meta-Evaluation Dataset for Agentic Tasks

As agentic systems tackle increasingly complex multi-step tasks, evaluating their trajectories presents a major bottleneck - human annotation of a single trajectory on popular agentic benchmarks can t…

来源：HuggingFace Papers

论文研究

6/19 04:00

DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams

Massive unstructured multimodal streams suffer from high "data entropy," impeding both efficient human knowledge acquisition and high-quality AI post-training. Existing passive annotation paradigms, h…

来源：HuggingFace Papers

论文研究

6/19 04:00

PrivacyAlign: Contextual Privacy Alignment for LLM Agents

AI agents acting on behalf of users are constantly making decisions, and for users to trust their agents, those decisions must align with what they actually want. Privacy is an important alignment pro…

来源：HuggingFace Papers

论文研究

6/19 04:00

BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery

Biomedical researchers increasingly use AI-generated analyses and reports to interpret protein-level signals, but static outputs are often insufficient for research decision-making, where users need t…

来源：HuggingFace Papers

技巧与观点

6/19 02:13

MosaicLeaks: Can your research agent keep a secret?

AI commentary · Evaluates whether AI research assistants can keep secrets, a core privacy and security question.

来源：HuggingFace Blog

行业动态

6/19 01:58

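Individuals are 10x more productive, but organizations gain less than 20%? The AI industry faces its big test of agent deployment

AI commentary · The imbalance in AI productivity gains shows that organizational adaptation has become the key bottleneck; the challenges of deploying agents deserve careful thought.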

来源：InfoQ

行业动态

6/19 01:54

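Autonomous agents hit a wall: databases are the biggest challenge

AI commentary · Database weaknesses are exposed; the development of autonomous agents is stuck on an infrastructure bottleneck.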

来源：InfoQ

论文研究

6/19 01:36

Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes

Autonomous agents are increasingly connected to cloud, deployment, and data-control workflows, but production mutation authority should not reside inside non-deterministic reasoning processes. Existin…

来源：arXiv

技巧与观点

6/19 01:32

Amazon Bedrock AgentCore harness is now generally available: Go from idea to production-grade agent in minutes

Today, Amazon Bedrock AgentCore harness is generally available. Two API calls (CreateHarness to define an agent, and InvokeHarness to run it), and you have an agent running in seco…

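AI commentary · Amazon launches a fast AI agent development tool that goes from idea to production-grade agent in minutes, sharply lowering the barrier to development.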

来源：AWS ML

论文研究

6/19 01:30

Probe-and-Refine Tuning of Repository Guidance for Coding Agents

LLM-based coding agents need higher-level operational knowledge about a repository (which files house which subsystems, how to run the test suite, which workflows have historically led to wrong fixes)…

来源：arXiv

论文研究

6/19 01:27

Efficient and Sound Probabilistic Verification for AI Agents

Securing AI agents that operate in complex digital environments has become a critical need, and runtime monitoring approaches that formulate and enforce policies expressed in a formal language like Da…

来源：arXiv

行业动态

6/19 01:27

Google wants to build the next Kubernetes for AI agents

AI commentary · Google is trying to unify AI agent deployment standards, which could reshape the industry ecosystem.

来源：InfoQ

行业动态

6/18 22:49

Launch HN: TesterArmy (YC P26) – Agents that test web and mobile apps

Hey HN - we’re Oskar, Szymon, and Piotr, and we’re building TesterArmy (https://tester.army). TesterArmy is an agentic testing platform that runs end-to-end checks before deploym…

AI commentary · Uses AI agents to run end-to-end tests automatically, greatly improving pre-release quality assurance efficiency.

来源：Hacker News

行业动态

6/18 19:35

Chrome introduces the WebMCP standard proposal (Origin Trial): giving agents native web operation capabilities

来源：InfoQ

行业动态

6/18 18:00

How agentic AI can break the deep-seated deadlock in financial anti-fraud | AICon Shanghai

来源：InfoQ

产品发布/更新

6/18 14:18

shy3130/tickflow-stock-panel

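Self-hosted, zero-ops quant workbench for China A-shares covering stock screening, monitoring, and backtesting | Built on the TickFlow data source | LLM-driven strategy customization, individual stock analysis, and post-trade review | Freely plug in third-party data sources and custom extension data | Personal open-source project, not an official TickFlow project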

来源：GitHub

行业动态

6/18 09:34

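Peking University scientist jumps into brain-computer interfaces, raising nearly RMB 100 million in a seed round

Text | Sun Xiaowen. Interview / Editing | Hai Ruojing. 「暗涌Waves」 has learned exclusively that the invasive brain-computer interface company 「芯生视界」 recently closed a seed round of nearly RMB 100 million. The round was led by 经纬创投, with 星连资本, 燕缘创投, and 水木创投 participating. Invasive brain-computer interfaces are already deployed in medical scenarios such as treating paralysis and brain-controlled peripherals, validating the safety and efficacy of long-term implantation. Meanwhile, the accelerating evolution of AI agents and embodied intelligence has also amplified market expectations for brain-computer interfaces:…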

来源：36氪

技巧与观点

6/18 08:00

Is it agentic enough? Benchmarking open models on your own tooling

来源：HuggingFace Blog

技巧与观点

6/18 04:35

Get back hours every day with autonomous agents in Amazon Quick

Today, Quick gets even more powerful: new autonomous agents that work continuously on your behalf, an activity feed that helps you prioritize your most important work, and the abil…

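AI commentary · Amazon Quick adds autonomous agents that work continuously on the user's behalf and substantially boost productivity; worth watching.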

来源：AWS ML

论文研究

6/18 04:00

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies. Task states consist of relevant facts, identifie…

来源：HuggingFace Papers

论文研究

6/18 04:00

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated…

来源：HuggingFace Papers

论文研究

6/18 04:00

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Achieving dexterous robotic manipulation in the real world heavily relies on human supervision and algorithm engineering, which becomes a central bottleneck in the pursuit of general physical intellig…

来源：HuggingFace Papers

论文研究

6/18 04:00

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Agent benchmarks are growing fast, but no single benchmark touches more than four or five of the dimensions that deployment exposes. This paper aggregates the largest coordinated deep-dive of one MCP-…

来源：HuggingFace Papers

论文研究

6/18 04:00

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

This work presents a general framework for training large language models (LLMs) to "Connect the Dots" (CoD), a meta-capability required by long-lifecycle agents: as an LLM-based AI agent gets deploye…

来源：HuggingFace Papers

论文研究

6/18 04:00

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app tran…

来源：HuggingFace Papers

论文研究

6/18 04:00

MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization

MLLM-based mobile GUI agents have made substantial progress in UI understanding and action execution, but adapting them to real target apps remains costly because mobile apps are numerous, frequently…

来源：HuggingFace Papers

论文研究

6/18 04:00

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadat…

来源：HuggingFace Papers

论文研究

6/18 04:00

Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System

Agentic navigation systems require a base navigation model whose observation strategy can be externally reconfigured at inference time, because instruction following, object search, target tracking, a…

来源：HuggingFace Papers

技巧与观点

6/18 03:25

AI coding agents taught robots how to install GPUs and cut zip ties

Nvidia's self-improvement program for robots enlists teams of AI coding agents.

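AI commentary · Nvidia uses AI coding agents to teach robots to install GPUs, showing the potential of self-improving systems and a breakthrough in autonomous robot learning.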

来源：Ars Technica

行业动态

6/18 02:00

NEA’s Tiffany Luck on AI IPOs, personal agents, and the ROI reckoning

Tokenmaxxing was the hottest trend in Silicon Valley earlier this year, with CEOs encouraging employees to push AI usage as far as it would go. Then the bill came due. Uber reporte…

来源：TechCrunch

论文研究

6/18 01:45

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterprise data. We present…

来源：arXiv

论文研究

6/18 01:31

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, th…

来源：arXiv

技巧与观点

6/18 01:17

Context intelligence for your data and AI agents at scale

Agents are only as intelligent as the context they can reason over. Today, that context is scattered across data lakes, data warehouses, lakehouses, databases, and streams, and in…

来源：AWS ML

行业动态

6/18 01:13

昆仑万维 (Kunlun Tech) 天工 3.1 is here: it launches a design canvas and multi-agent workflows, strengthening delivery of complex projects

来源：InfoQ

行业动态

6/18 01:00

Pinecone launches a OneLake integration that lets AI agents connect directly to enterprise data

来源：InfoQ

行业动态

6/18 00:14

Launch HN: Adam (YC W25) – Open-Source AI CAD

Hey HN! I'm Zach from Adam (https://adam.new/). We're building AI agents for mechanical CAD software. We’ve built the company on two fundamental beliefs: - AI will be the primary…

来源：Hacker News

技巧与观点

6/17 23:29

New in Amazon Bedrock AgentCore: Build agents with broader knowledge and continuous learning

Today we're introducing new capabilities on Amazon Bedrock AgentCore, the platform to build, connect, and optimize agents. In this post, we cover how these capabilities close each…

来源：AWS ML

行业动态

6/17 23:26

Agentic coding deserves more than a chat box bolted onto VS Code

来源：Hacker News

产品发布/更新

6/17 22:46

Neeeophytee/awesome-ai-workflows

100+ Curated, CI-verified AI workflow recipes. Powered by FlowStacks.

来源：GitHub

产品发布/更新

6/17 22:31

LING71671/open-reverselab

Open-source reverse engineering lab: 197-article knowledge base + MCP tools + CTF/APK/PE automation toolchain. Agent-native. Note: owing to the use case, almost all AIs (except fable5) currently…

来源：GitHub

产品发布/更新

6/17 21:01

xorbitsai/xrouter-llm

A prompt-aware LLM router that predicts which models can complete each request, then selects the cheapest capable one: 53.2% lower cost and +1.9 pts completion…

来源：GitHub

行业动态

6/17 18:46

2026 China Report on Agent Engineering Talent and Organizational Development

来源：InfoQ

产品发布/更新

6/17 18:40

shadcn-labs/agentcn

shadcn/ui, but for building agents. 🤖

来源：GitHub

技巧与观点

6/17 18:18

From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot

来源：HuggingFace Blog

技巧与观点

6/17 14:56

WeChat Pay launches an AI-dedicated card; WorkBuddy is the first to integrate

Users can make purchase requests while chatting with an agent

来源：量子位

技巧与观点

6/17 08:00

Agentic Resource Discovery: Let agents search

来源：HuggingFace Blog

技巧与观点

6/17 06:46

Safeguard your agentic AI applications with the Amazon Bedrock Guardrails InvokeGuardrailChecks API

Today, we’re announcing a new API with Amazon Bedrock Guardrails. With this API, you can apply individual safeguards, also referred to as safety checks, at any point in your agenti…

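AI commentary · Amazon launches a new API that lets safety checks be inserted at any point in an AI application, making protection more flexible.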

来源：AWS ML

模型发布/更新

6/17 06:30

Hands Free, AIs Forward: NVIDIA XR AI Brings Agents to AR Glasses

NVIDIA XR AI is now available in public beta, giving developers a framework for building multimodal AI agents for AR glasses and XR devices.

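AI commentary · NVIDIA XR AI enters public beta for building multimodal AI agents for AR glasses, advancing a new hands-free interaction paradigm.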

来源：NVIDIA

行业动态

6/17 05:00

Anthropic "pauses" token-based billing for its Claude Agent SDK

Move originally planned for Monday would have heavily increased power users' costs.

AI commentary · Pausing token-based billing protects core users and shows the platform's caution toward its developer ecosystem.

来源：Ars Technica

论文研究

6/17 04:00

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Motion forecasting is central to visual intelligence: agents must anticipate how objects will move in order to plan actions, reason about physical interactions, and synthesize realistic futures. We ar…

来源：HuggingFace Papers

论文研究

6/17 04:00

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. We observe that the gradient signal in GRPO concentrates on tasks with the highest rollout rewa…

来源：HuggingFace Papers

论文研究

6/17 04:00

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities. However, rollout generation remains a dominant latency bott…

来源：HuggingFace Papers

论文研究

6/17 04:00

Learning User Simulators with Turing Rewards

Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing appr…

来源：HuggingFace Papers

论文研究

6/17 04:00

Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete…

来源：HuggingFace Papers

论文研究

6/17 04:00

Playful Agentic Robot Learning

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acq…

来源：HuggingFace Papers

论文研究

6/17 04:00

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple princip…

来源：HuggingFace Papers

论文研究

6/17 04:00

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-ce…

来源：HuggingFace Papers

论文研究

6/17 04:00

OpenRath: Session-Centered Runtime State for Agent Systems

Modern agent systems often suffer from fragmented runtime state: transcripts, tool effects, memory events, workspace placement, branch provenance, and replay evidence are recorded separately and becom…

来源：HuggingFace Papers

论文研究

6/17 04:00

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning…

来源：HuggingFace Papers

论文研究

6/17 01:58

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility,…

来源：arXiv

论文研究

6/17 01:56

EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation

Zero-Shot Object-Goal Navigation (ZS-OGN) requires embodied agents to explore and locate target objects without any prior training. To this end, recent methods leverage foundation models. But they typ…

来源：arXiv

论文研究

6/17 01:50

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neuro…

来源：arXiv

论文研究

6/17 01:34

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills

The LLM-empowered personal health agents with user health (sensor) metrics have offered a promising pathway to alleviate global disparities in healthcare access. However, large-scale clinical deployme…

来源：arXiv

模型发布/更新

6/17 00:30

HPE AI Factory With NVIDIA Expands for the Era of Agents

Enterprises are moving agentic AI from proof of concept to production — and the next generation of AI factories are built for the era of agents. At HPE Discover Las Vegas, running…

来源：NVIDIA

行业动态

6/17 00:16

AICon Shenzhen 2026 kicks off | In the agent era, which directions are becoming the industry's key variables?

来源：InfoQ

模型发布/更新

6/16 23:46

Securing the future of AI agents

Securing internal systems with an AI Control Roadmap, combining traditional safeguards and real-time monitoring.

来源：Google DeepMind

产品发布/更新

6/16 17:13

Alisa0808/vibe-creating-skill

Open-source, bilingual AI video-prompt skill — rewrite ideas into model-ready text-to-video prompts. A portable Agent Skill (Claude Code, Codex, OpenClaw, Herme…

来源：GitHub

行业动态

6/16 14:59

Malaysia’s AI agent-powered messaging app Respond.io raises $62.5M, eyes acquisitions

Respond.io, one of Malaysia's startups to watch, uses AI agents to handle high volumes of customer inquiries and charges per convo, not per seat.

来源：TechCrunch

产品发布/更新

6/16 14:18

yzhao062/auditable

Audit any agent decision across its past, present, and future, on one typed graph.

来源：GitHub

产品发布/更新

6/16 09:32

With 100,000 developers in its first month, AnySearch unlocks the world beyond web pages for agents

An AI search-layer service designed for agents

来源：量子位

产品发布/更新

6/16 07:07

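Why does macOS 26.4 block some Terminal commands? Apple explains the security trigger behind it

IT之家 (ITHome), June 16: Apple updated a support document yesterday (June 15) explaining that in macOS 26.4, if a user does not often use Terminal and the command comes from a website, chat agent, or messaging or mail app, the system may block the paste. Tech outlet 9to5Mac notes that previously, after a user pasted a command into Terminal, the system would first show a security warning that the content might contain malicious code. Many people know only that this blocking mechanism exists, but not…

AI commentary · The upgraded security mechanism targets infrequent Terminal use and guards against pasting and running malicious code, reflecting finer-grained system protection.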

来源：IT之家

产品发布/更新

6/16 06:56

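Gurman: Apple may launch an AI agent that lets Siri operate iPhone and Mac software autonomously

IT之家 (ITHome), June 16: Bloomberg reporter Mark Gurman believes Apple may eventually launch a product that competes directly with OpenClaw, an agentic AI system that can operate all kinds of software autonomously on the user's behalf. Writing in his Power On column, Gurman says he expects Apple to develop a system that can operate software on iPhone, iPad, and Mac entirely on the user's behalf. The prediction is based on Apple's Siri engineering…

AI commentary · Siri evolving from a voice assistant into an AI agent that operates the system autonomously marks a key Apple move in the agent race.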

来源：IT之家

论文研究

6/16 04:00

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve…

来源：HuggingFace Papers

论文研究

6/16 04:00

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Game generation is an emerging application of coding agents, requiring models to transform natural-language specifications into playable interactive systems. Unlike traditional coding tasks, game gene…

来源：HuggingFace Papers

论文研究

6/16 04:00

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configurati…

来源：HuggingFace Papers

论文研究

6/16 04:00

CEO-Bench: Can Agents Play the Long Game?

Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophistic…

来源：HuggingFace Papers

论文研究

6/16 04:00

Guava: An Effective and Universal Harness for Embodied Manipulation

Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tool use offers a promising alternative to end-t…

来源：HuggingFace Papers

论文研究

6/16 04:00

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving complianc…

来源：HuggingFace Papers

行业动态

6/16 02:39

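Agents accelerate financial innovation | Registration open for TF技术前线 session 179

AI commentary · Agents are reshaping efficiency in finance; this frontier tech session is worth a close look for practitioners.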

来源：InfoQ

技巧与观点

6/16 02:07

AI Agent Failure Detection and Root Cause Analysis with Strands Evals

In this post, we walk you through calling the detector functions to diagnose real agent failures. You learn how to interpret their structured output: categorized failures with conf…

来源：AWS ML

论文研究

6/16 01:59

Context-Aware RL for Agentic and Multimodal LLMs

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle d…

来源：arXiv

论文研究

6/16 01:56

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

Meta-analysis is a demanding form of evidence synthesis that combines literature retrieval, PI/ECO-guided study selection, and statistical aggregation. Its structured, verifiable workflow makes it an…

Source: arXiv

Research Papers

6/16 01:52

DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

Deep research agents synthesize long-form reports by searching and reasoning over retrieved evidence. Reinforcement learning with rubric-based rewards improves these agents by optimizing them against…

Source: arXiv

Tips & Opinions

6/16 01:19

datasette-agent 0.3a0

Release: datasette-agent 0.3a0 New tool, execute_write_sql , which requests user approval and then writes to a database - taking user permissions into account. #27 I added a mechan…

Source: Simon Willison

Industry News

6/15 22:34

Salesforce acquires AI customer service platform Fin for $3.6B

Salesforce says it wants to use Fin's team and technology to improve Agentforce, its existing enterprise platform that businesses can use to build custom AI agents that automate ta…

Source: TechCrunch

Tips & Opinions

6/15 21:56

Build context-rich research agents with Deep Agents and Bedrock AgentCore

In this post, you'll build a competitive research agent that demonstrates this pattern end to end. This walkthrough targets developers building multi-step AI workflows who need iso…

Source: AWS ML

Industry News

6/15 21:00

As AI agents become employees, NewCore emerges with $66M to give them identities

NewCore argues the next challenge in enterprise security will be managing AI agents, not people.

Source: TechCrunch

Product Releases/Updates

6/15 20:42

SantanderAI/ralph-vault-skill

Skill to generate the knowledge vault for projects using the Ralph loop

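Source: GitHub

Industry News

6/15 18:31

The Coding Agent Technology Landscape: Context Engineering, Subagents, and Harnesses, a Full Breakdown of One Year of Paradigm Shift

Source: InfoQ

Industry News

6/15 18:20

Exploring RCA Agents in Practice for Complex Business Scenarios

Source: InfoQ

Product Releases/Updates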

6/15 17:15

volcengine/ark-cli

The fastest way to put Volcengine Ark in your terminal and your AI agent — go from prompt to generated media, multimodal answer, or deployed endpoint in a sin…

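Source: GitHub

Product Releases/Updates

6/15 11:40

In the Agent Era, Huawei Cloud Starts Rebuilding the Foundations

New agentic infrastructure

Source: 量子位

Product Releases/Updates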

6/15 07:02

Egoist-Machines/etchplan

Compile an AI agent's repeated workflows into deterministic, auditable routines that replay for free, with a fallback to the agent.

Source: GitHub

Research Papers

6/15 04:00

VisualClaw: A Real-Time, Personalized Agent for the Physical World

Vision language models are serving as general-purpose interfaces for complex multimodal tasks. However, deployment still faces three gaps: VLMs typically incur high latency and cost when processing de…

Source: HuggingFace Papers

Research Papers

6/15 04:00

TokenPilot: Cache-Efficient Context Management for LLM Agents

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; h…

Source: HuggingFace Papers

Research Papers

6/15 04:00

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation sema…

Source: HuggingFace Papers

Research Papers

6/15 04:00

ProCUA-SFT Technical Report

Training computer-use agents (CUAs) -- models that interact with graphical desktops through screenshots and keyboard/mouse actions -- requires large-scale, diverse trajectory data collected in full de…

Source: HuggingFace Papers

Research Papers

6/15 04:00

LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching

Effective personalized AI-assisted learning demands systems that can not only generate accurate learner-specific educational materials, but also dynamically adapt their instruction to diverse learners…

Source: HuggingFace Papers

Research Papers

6/15 04:00

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment where personal assistants are expected to work across a user'…

Source: HuggingFace Papers

Research Papers

6/15 04:00

Context-Aware RL for Agentic and Multimodal LLMs

Source: HuggingFace Papers

Research Papers

6/15 04:00

MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision

Personalized presentation generation requires more than conditioning on a current prompt or template: agents must preserve stable user preferences across tasks, retain newly introduced preferences and…

Source: HuggingFace Papers

Research Papers

6/15 04:00

CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate…

Source: HuggingFace Papers

Product Releases/Updates

6/14 08:36

001TMF/harness-forge

Turn Claude Code into its own Meta-Harness — a skill that evolves the scaffolding around a fixed model (memory, retrieval, context, prompts) via a native propos…

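Source: GitHub

Product Releases/Updates

6/14 07:28

Google Launches Search Agents That Proactively Monitor the Web for You

IT之家, June 14: Searching the web usually means stopping what you are doing, opening a tab, and looking up the latest information yourself. Google now intends to change that pattern entirely. After a first preview at its 2026 developer conference, Google has officially launched search agents in AI Mode. The upgrade turns the traditional search engine into a proactive assistant that runs quietly in the background. The first to go live is the information agent, which proactively monitors the web for information without requiring manual searches…

AI Commentary · From passive retrieval to proactive monitoring, AI search is evolving from a tool into a concierge, upending the traditional way of using the web.

Source: IT之家

Product Releases/Updates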

6/14 06:09

UCSC-VLAA/VisualClaw

Official Implementation of VisualClaw: A Real-Time, Personalized Agent for the Physical World

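Source: GitHub

Product Releases/Updates

6/13 22:58

Major Accounting Firm KPMG's AI Industry Report Accused of Being "Written by AI": Riddled with Hallucinations and Errors

IT之家, June 13: Last October, KPMG published a report titled "Total Experience: Redefining Excellence in the Age of Agentic AI", discussing how companies can use AI to meet customer needs. According to a June 12 report in the Financial Times, however, the report was later found to be full of AI hallucinations: several of the agentic AI case studies it cites either do not exist or lack the capabilities KPMG describes. An investigation by GPTZero, a developer of AI content detection tools…

AI Commentary · A professional firm misled by the very AI tools it relied on, exposing a critical weakness of today's generative models in fact-checking.

Source: IT之家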

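Model Releases/Updates

6/13 22:56

Zhipu Releases Version 3.0 of Its AI Coding Tool ZCode: Switches to In-House ZCode Agent Core, Deeply Adapted to GLM-5.2

IT之家, June 13: Zhipu today released version 3.0 of its AI coding tool ZCode, deeply adapted to GLM-5.2. According to the company, ZCode 3.0 has switched entirely to the in-house ZCode Agent core. Long-horizon reasoning, tool calling, and large-project execution paths have been deeply optimized for the full-size GLM, and overall task completion is said to be significantly better than with third-party agents; later versions will focus on the in-house agent experience and will no longer bundle…

AI Commentary · An in-house core replaces third-party solutions, with marked gains in long-horizon reasoning and engineering execution.

Source: IT之家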

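Product Releases/Updates

6/13 22:22

dzcmemory-web/bazi-ziwei-skill

AI Skill for BaZi and Zi Wei Dou Shu chart casting and cross-validation: charts are computed precisely by algorithm (not guessed by an LLM), with three analysis modes and one-click generation of ink-wash-style HTML chart posters. Compatible with SKILL.md agents such as Claude / Codex / Cursor / Workbuddy.

Source: GitHub

Industry News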

6/13 17:44

Show HN: Paca – Lightweight Jira alternative for human-AI collaboration

I built Paca out of pure passion—a free and lightweight Jira alternative written in Go where humans and AI agents work together as equal teammates to plan sprints and assign tasks…

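AI Commentary · A lightweight AI collaboration tool takes on Jira; this open-source project written in Go has humans and AI agents share work as equals and is worth watching.

Source: Hacker News

Tips & Opinions

6/13 16:11

Agents Finally Grow a Body: The Thinking and Practice Behind Jiuwen Symbiosis

Building the next generation of intelligent systems for the physical world, together

Source: 量子位

Product Releases/Updates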

6/13 11:15

eli-labz/Third-Eye

A production-grade OSINT platform that provides situational awareness across multiple intelligence domains.

Source: GitHub

Model Releases/Updates

6/13 05:00

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

AgentPerf from Artificial Analysis, the industry’s first agentic AI benchmark, gives developers, enterprises and infrastructure providers a clear way to compare systems for agentic…

AI Commentary · NVIDIA tops the first agentic AI benchmark, offering the industry a key reference for choosing infrastructure.

Source: NVIDIA

Tips & Opinions

6/13 04:43

Building Supercharger: How Rocket Close optimized title operations with agentic AI

In this post, we explore how Rocket Close built a solution using Strands Agents, large language models (LLMs), Amazon Bedrock, Amazon Bedrock Knowledge Bases, and Model Context Pro…

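AI Commentary · A practical case of AI agents optimizing title operations, showing multiple tools working together in production.

Source: AWS ML

Research Papers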

6/13 04:00

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

Efficient and scalable agentic intelligence requires models that can deliver both low-latency responses and strong reasoning capabilities while remaining practical to train, serve, and deploy. In this…

Source: HuggingFace Papers

Research Papers

6/13 04:00

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

Advanced agents are increasingly demonstrating the potential to operate as autonomous engineers, creating a growing demand for evaluation benchmarks that capture the complexity of real-world developme…

Source: HuggingFace Papers

Research Papers

6/13 04:00

Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning

Multimodal large language models (MLLMs) have demonstrated impressive capabilities in many visual tasks, but they often struggle with factual grounding when confronted with complex, open-world scenari…

Source: HuggingFace Papers

Research Papers

6/13 04:00

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

Deep research agents are increasingly evaluated on their ability to search for evidence, reason over retrieved sources, and produce grounded answers. Existing browsing benchmarks, however, largely ass…

Source: HuggingFace Papers

Research Papers

6/13 01:55

Learning Coordinated Preference for Multi-Objective Multi-Agent Reinforcement Learning

Cooperative multi-objective multi-agent reinforcement learning (MOMARL) models team decision making under multiple, potentially conflicting objectives. In this setting, conflicts arise not only across…

Source: arXiv

Research Papers

6/13 01:39

AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition

LLM agents are increasingly built not as single model calls, but as scaffolded systems that combine reasoning, memory, reflection, action execution, and learning. While such scaffolds often improve pe…

Source: arXiv

Research Papers

6/13 01:39

Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows

Large language models increasingly serve as execution engines for agentic systems, yet they still consume context through a sequential text interface. This creates a mismatch with modern structured ag…

Source: arXiv

Industry News

6/13 00:58

Launch HN: BitBoard (YC P25) – Analytics Workspace for Agents

We’re Connor and Ambar from BitBoard ( https://bitboard.work ). BitBoard is an agentic analytics workspace. We give you the infrastructure and visualization layer to analyze data w…

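Source: Hacker News

Product Releases/Updates

6/12 23:19

Huawei's Yu Chengdong: HarmonyOS Is Now China's Second-Largest Smartphone Operating System

IT之家, June 12: At today's Huawei Developer Conference, HDC 2026, Yu Chengdong, Huawei executive director, director of the Product Investment Review Board, and chairman of the Device BG, unveiled the next-generation HarmonyOS 7 operating system, upgraded along five dimensions (connectivity, intelligence, security, smoothness, and a sense of space) while moving into the agent era. On stage at HDC 2026, Yu announced that HarmonyOS has become China's second-largest smartphone operating…

AI Commentary · Marks a domestic operating system breaking the Android and iOS duopoly, with ecosystem building reaching a new milestone.

Source: IT之家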

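Industry News

6/12 23:07

Snowflake's Key Leap Toward the Agentic Enterprise

AI Commentary · Moving from data platform to agentic enterprise, Snowflake is defining the next key battleground for putting AI into production.

Source: InfoQ

Tips & Opinions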

6/12 22:49

Build a meeting prep and follow-up assistant with Amazon Quick and Cisco Webex MCP servers

This post shows how to build a custom meeting prep and follow-up assistant using Amazon Quick and Cisco Webex MCP servers. From a single prompt, the agent finds an upcoming Webex m…

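Source: AWS ML

Industry News

6/12 18:21

Snowflake CoCo and CoWork Debut: How Top Players Are Defining the Future of Agentic AI | Adventures in San Francisco: A Visit to Snowflake Summit 26 (EP2)

Source: InfoQ

Model Releases/Updates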

6/12 18:00

New OpenAI Academy courses for the next era of work

OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.

Source: OpenAI

Industry News

6/12 12:42

AI agent bankrupted their operator while trying to scan DN42

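Source: Hacker News

Product Releases/Updates

6/12 12:13

"Agents' Last Exam": Fable 5 Surprisingly Loses to GPT 5.5

Zeros across the board on the hardest tier

Source: 量子位

Product Releases/Updates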

6/12 08:52

DietrichGebert/ponytail

Makes your AI agent think like the laziest senior dev in the room. The best code is the code you never wrote.

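Source: GitHub

Industry News

6/12 07:24

OpenAI Acquires Startup Ona to Strengthen Its Coding Assistant Codex

IT之家, June 12: OpenAI yesterday announced the acquisition of startup Ona, which will give its coding assistant Codex secure, preconfigured cloud environments. According to the official press release, Ona's technology will help Codex carry out longer-running tasks and help users deploy AI agents to production. Ona's technology will also give enterprises better control over their infrastructure, data assets, and security boundaries, so that Codex can operate in secure…

AI Commentary · Acquiring Ona fills a gap in Codex, moving from code generation toward secure deployment and marking AI coding assistants' entry into enterprise-grade production use.

Source: IT之家

Research Papers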

6/12 04:00

LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations

Online group chats are social spaces with local conversational norms that are rarely stated explicitly. The ability and willingness of LLM-based agents to recognize and adapt to these norms remains mo…

Source: HuggingFace Papers

Research Papers

6/12 04:00

LLM Agents Can See Code Repositories

Coding agents powered by large language models have demonstrated strong performance on software engineering tasks. Yet most agents consume repositories almost entirely as text, which differs from how…

Source: HuggingFace Papers

Research Papers

6/12 04:00

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

AI agent performance depends critically on the runtime harness, comprising the prompts, tools, memory, and control flow that mediate how a model observes, reasons, and acts. Yet today's harnesses rema…

Source: HuggingFace Papers

Research Papers

6/12 04:00

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents…

Source: HuggingFace Papers

Research Papers

6/12 04:00

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens,…

Source: HuggingFace Papers

Research Papers

6/12 04:00

FastContext: Training Efficient Repository Explorer for Coding Agents

Large Language Model (LLM) coding agents have achieved strong results on software engineering tasks, yet repository exploration remains a major bottleneck: locating relevant code consumes substantial…

Source: HuggingFace Papers

Research Papers

6/12 04:00

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Agentic search over large corpora relies on retriever-mediated interfaces (e.g., BM25 or ColBERT) for scalable candidate discovery. While effective at ranking relevant documents, these interfaces expo…

Source: HuggingFace Papers

Research Papers

6/12 01:58

Agents-K1: Towards Agent-native Knowledge Orchestration

Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions,…

Source: arXiv

Research Papers

6/12 01:56

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-g…

Source: arXiv

Research Papers

6/12 01:47

Recursive Agent Harnesses

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents…

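Source: arXiv

Industry News

6/12 01:34

Microsoft Foundry Adds Production-Grade Agent Runtime, Toolchain, and Governance Capabilities

AI Commentary · Microsoft fills in a key piece of production agent deployment, making enterprise AI rollouts steadier.

Source: InfoQ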

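Industry News

6/12 01:32

Breaking "The Mythical Man-Month": Agents Reshape Product, Operations, and R&D Roles in Risk Control

AI Commentary · Agent technology breaks through the efficiency bottleneck of traditional risk control, enabling intelligent end-to-end collaboration across product, operations, and R&D.

Source: InfoQ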

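Industry News

6/12 00:59

Building the Coding Agent Flywheel: Feedback Loop, Benchmark, Agent Engineers

AI Commentary · Establishing feedback loops and evaluation benchmarks is the core driver of continuous improvement for AI coding agents.

Source: InfoQ

Tips & Opinions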

6/11 23:49

Evaluate AI agents systematically with Agent-EvalKit

Agent-EvalKit is an open-source toolkit (Apache 2.0) that makes this evaluation infrastructure available by integrating with AI coding assistants, including Claude Code, Kiro CLI,…

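AI Commentary · An open-source tool fills the gap in AI agent evaluation infrastructure, letting developers test performance systematically.

Source: AWS ML

Industry News

6/11 23:48

Agentic Enterprise Emerges as the Strongest Signal on the Ground in Silicon Valley | Adventures in San Francisco: A Visit to Snowflake Summit 26 (EP1)

Source: InfoQ

Product Releases/Updates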

6/11 20:18

omnigent-ai/omnigent

Omnigent is an open-source AI agent framework and meta-harness: orchestrate Claude Code, Codex, Cursor, Pi, and custom agents — swap harnesses without rewriting…

Source: GitHub

Industry News

6/11 19:57

Show HN: Fata – Spaced repetition to fight skill rot from AI coding

Hi HN, I'm Djoumé. I've been a developer for over 20 years, and like a lot of you I've been coding almost exclusively through an agent in the past few months. It's been amazing to…

Source: Hacker News

Industry News

6/11 19:00

Google DeepMind is worried about what happens when millions of agents start to interact

Google DeepMind is funding research into the potential dangers of situations where millions of different AI agents interact with each other online. According to Rohin Shah, who dir…

Source: MIT Tech Review

Product Releases/Updates

6/11 17:00

HarryHsing/OmniAgent

OmniAgent (ICML 2026): the first native omni-modal agent for active video perception — a 7B agent that beats Qwen2.5-VL-72B with 73% fewer frames on LVBench.

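Source: GitHub

Product Releases/Updates

6/11 15:25

Attention, 12.9 Million Gaokao Takers! Alibaba Releases a College Application Agent, and It's Free

Stress-tested beforehand with 400,000 simulated AI test takers

Source: 量子位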

Model Releases/Updates

6/11 12:18

3D Creation Gets Its ChatGPT Moment: Meshy Releases the World's First 3D AI Agent

Meshy releases the world's first 3D AI agent

Source: 量子位

Industry News

6/11 08:10

AI agent runs amok in Fedora and elsewhere

Source: Hacker News

Model Releases/Updates

6/11 08:00

OpenAI to acquire Ona

OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows.

Source: OpenAI

Tips & Opinions

6/11 07:57

datasette-agent 0.2a0

Release: datasette-agent 0.2a0 Highlights from the release notes: Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext o…

Source: Simon Willison

Research Papers

6/11 04:00

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, usin…

Source: HuggingFace Papers

Research Papers

6/11 04:00

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We stud…

Source: HuggingFace Papers

Research Papers

6/11 04:00

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in…

Source: HuggingFace Papers

Research Papers

6/11 04:00

MiniMax Sparse Attention

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundre…

Source: HuggingFace Papers

Research Papers

6/11 04:00

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge,…

Source: HuggingFace Papers

Research Papers

6/11 04:00

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dyna…

Source: HuggingFace Papers

Research Papers

6/11 04:00

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific soluti…

Source: HuggingFace Papers

Research Papers

6/11 04:00

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they canno…

Source: HuggingFace Papers

Research Papers

6/11 04:00

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attemp…

Source: HuggingFace Papers

Research Papers

6/11 04:00

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

Large language models are increasingly deployed as agents for long-horizon tasks, yet their performance is shaped not only by model capability and environment design, but also by the harness that medi…

Source: HuggingFace Papers

Research Papers

6/11 04:00

The Price of Anarchy in Disaggregated Inference

Disaggregated inference architectures physically separate prefill and decode phases onto distinct GPU pools, creating competing "agents" that share a fixed hardware budget. We provide, to our knowledg…

Source: HuggingFace Papers

Research Papers

6/11 04:00

LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis

Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent menta…

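Source: HuggingFace Papers

Industry News

6/11 02:30

OpenAI Builds a Secure Windows Sandbox for Codex Agents

AI Commentary · Builds a secure runtime environment for AI coding assistants, clearing a key security bottleneck for enterprise adoption.

Source: InfoQ

Research Papers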

6/11 01:59

Context-Driven Incremental Compression for Multi-Turn Dialogue Generation

Modern conversational agents condition on an ever-growing dialogue history at each turn, incurring redundant attention and encoding costs that grow with conversation length. Naive truncation or summar…

Source: arXiv

Research Papers

6/11 01:58

DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents, with an emerging strategy of scaling test-time compute to improve capability. However, we observe th…

Source: arXiv

Research Papers

6/11 01:47

APPO: Agentic Procedural Policy Optimization

Recent advances in agentic Reinforcement Learning (RL) have substantially improved the multi-turn tool-use capabilities of large language model agents. However, most existing methods assign credit ove…

Source: arXiv

Research Papers

6/11 01:38

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

Human-in-the-loop reinforcement learning (HiL-RL) has emerged as an effective paradigm for real-world robotic manipulation, enabling online policy improvement with human guidance. However, current HiL…

Source: arXiv

Industry News

6/11 00:33

From Computer Use to Datacenter Use: How Can AI Agents Drive a Data Center Like Calling a Function?

Source: InfoQ

Product Releases/Updates

6/11 00:30

DizzyMii/fable-skills

Six Claude Code skills that harden Opus 4.8 toward frontier behavior — written by Fable 5, pressure-tested on the target model with transcripts included.

Source: GitHub

Tips & Opinions

6/10 23:26

Stop hand-tuning kernels: How Neuron Agentic Development accelerates AWS Trainium optimizations

Today, we’re announcing the Neuron Agentic Development capabilities: a collection of AI agents and skills that make this possible for developers building on AWS Trainium and AWS In…

Source: AWS ML

Industry News

6/10 23:01

Apache Burr: Build reliable AI agents and applications

Source: Hacker News

Industry News

6/10 23:00

Datadog veterans launch AI coding startup Niteshift on a bet against Big AI lock-in

AI coding agent startup Niteshift has raised a $7 million seed round from a who's who of angels. It's betting companies will want power over, not lock-in with model makers.

Source: TechCrunch

Research Papers

6/10 21:47

APPO: Agentic Procedural Policy Optimization

Source: HuggingFace Papers

Industry News

6/10 21:44

Lua.ex: Sandboxed Lua 5.3 on the BEAM, Built for AI Agents

Source: Hacker News

Industry News

6/10 21:39

A €0.01 bank transfer could compromise a banking AI agent

Source: Hacker News

Industry News

6/10 21:33

Jedify raises $24M to help companies arm AI agents with context on their business

The funding round was led by Norwest, with participation from S Capital VC, Cerca Partners, and Oceans Ventures. Snowflake Ventures also participated as a strategic investor.

Source: TechCrunch

Model Releases/Updates

6/10 18:21

Investing in multi-agent AI safety research

Google DeepMind and partners announce a $10M funding call for multi-agent safety research.

Source: Google DeepMind

Industry News

6/10 18:00

Qoder CLI: From Coding Agent to Enterprise-Grade AI Application Infrastructure | AICon Shanghai

Source: InfoQ

Product Releases/Updates

6/10 15:46

chuspeeism/dashiAI-ppt-skill

An AI-agent skill that generates browser-editable presentations from multiple visual themes, exportable to HTML, PDF, and PPTX.

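Source: GitHub

Industry News

6/10 14:51

Baidu AI Cloud and FluxA Form Strategic Partnership to Build Global Payment Infrastructure for the Agent Economy

Inviting 30 OPCs to join the closed beta

Source: 量子位

Product Releases/Updates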

6/10 09:53

Stunspot/stunspots-guide-to-ai-systems

Operational doctrine for practical AI systems design.

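Source: GitHub

Industry News

6/10 07:41

Cognizant and Five Other Major Integrators Team Up to Promote Rubrik's AI Security Platform

Rubrik announced on Tuesday at its Rubrik FORWARD conference that six global systems integrators will deploy Rubrik Agent Cloud, its platform built for Anthropic Claude Code, for enterprise customers. The partners joining the "Hourglass Program" are Cognizant, Deloitte, LTM, HCLTech, NTT Data, and Wipro, which will integrate the platform into their respective cybersecurity and digital transformation service offerings. (Sina Finance)

AI Commentary · Joint promotion by six major integrators marks the AI security platform's move from technical validation to deployment at scale.

Source: 36氪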

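Industry News

6/10 07:35

KPMG and Microsoft Expand Partnership, Rolling Out Copilot to More Than 270,000 Employees and Adopting the Agent 365 Governance Framework

KPMG and Microsoft announced on June 9 an expansion of their global strategic partnership, focused on deploying enterprise AI agents at scale. Under the agreement, Microsoft 365 Copilot will be rolled out to more than 276,000 KPMG professionals worldwide; KPMG will also adopt the Microsoft Agent 365 platform for unified management, monitoring, and security governance of AI agents across its global organization and for its clients. (Jiemian)

AI Commentary · KPMG brings Copilot to its 270,000 employees and introduces an agent governance framework, signaling enterprise AI's shift from tool usage toward systematization

Source: 36氪

Industry News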

6/10 07:03

Show HN: Nucleus – A security-hardened, Nix-native container runtime

Hi HN, I've been building Nucleus, a lightweight Linux container runtime focused on two workloads: ephemeral AI-agent sandboxes and declarative NixOS services. It's a single Rust b…

Source: Hacker News

Product Releases/Updates

6/10 06:07

PolyHelper/polyhelper

Self-evolving cognitive AI exoskeleton. 10+ frontier models, 245 consensus methods, governed autonomous agents. Automotive, medical, legal, accessibility. 9.3M…

Source: GitHub

Tips & Opinions

6/10 05:35

Setting a custom price for a model in AgentsView

TIL: Setting a custom price for a model in AgentsView I've been really enjoying AgentsView by Wes McKinney as a tool for exploring my token usage across different coding agents run…

AI Commentary · Custom AI model pricing lets users set rates as needed, improving flexibility and cost control.

Source: Simon Willison

Research Papers

6/10 04:00

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the…

Source: HuggingFace Papers

Research Papers

6/10 04:00

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite…

Source: HuggingFace Papers

Research Papers

6/10 04:00

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Recent progress in foundation models has shifted toward agentic behavior involving multi-step reasoning and tool use. However, open-source efforts largely focus on text-dominant settings, leaving long…

Source: HuggingFace Papers

Research Papers

6/10 04:00

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directio…

Source: HuggingFace Papers

Research Papers

6/10 04:00

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated function calling: an agent must discover tools from live cat…

Source: HuggingFace Papers

Research Papers

6/10 04:00

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase apparen…

Source: HuggingFace Papers

Research Papers

6/10 04:00

Benchmarking AI Agents for Addressing Scientific Challenges Across Scales

AI agents are increasingly being developed to accelerate scientific discovery, yet their practical capabilities in real research settings remain poorly understood. Existing benchmarks for AI agents ra…

Source: HuggingFace Papers

Research Papers

6/10 04:00

RedAct: Redacting Agent Capability Traces for Procedural Skill Protection

Users rely on execution traces to observe agent behavior, diagnose failures, and ensure accountability. These traces contain rich procedural detail, including tool invocations, intermediate decisions,…

Source: HuggingFace Papers

Research Papers

6/10 04:00

Orchestra-o1: Omnimodal Agent Orchestration

The recent success of agent swarms has shifted the paradigm of large language model (LLM)-based agents from single-agent workflows to multi-agent systems, highlighting the importance of agent orchestr…

Source: HuggingFace Papers

Research Papers

6/10 04:00

Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills

Scientific discovery workflows usually contain and rely heavily on lab notes, where researchers record observations, interpret uncertain results, and plan follow-up experiments. Such informative lab n…

Source: HuggingFace Papers

Industry News

6/10 03:58

Grit: Rewriting Git in Rust with agents

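AI Commentary · Rewriting Git's core tooling in Rust for better performance and safety, showing new possibilities for AI-assisted code rewrites.

Source: Hacker News

Tips & Opinions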

6/10 03:38

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

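AI Commentary · The real ability of voice agents to handle bilingual users is quantitatively benchmarked for the first time, revealing a major breakthrough in multilingual interaction technology.

Source: HuggingFace Blog

Research Papers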

6/10 01:51

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for con…

Source: arXiv

Research Papers

6/10 01:35

ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity

Large language models (LLMs) are rapidly acquiring capabilities relevant to biological research, from literature synthesis to interpretation of experimental data. Increasingly, LLM agents can also per…

Source: arXiv

Tips & Opinions

6/10 00:43

Hands-free first notice of loss: Using Strands Agents and Amazon Bedrock AgentCore Browser Tool for intelligent claims intake

In this post, we demonstrate how a hands-free FNOL intake system combines agents built with the Strands Agents SDK for domain reasoning with Amazon Bedrock AgentCore Browser Tool f…

Source: AWS ML

Industry News

6/10 00:36

How Multi-Agent Plus Workflows Can Build More Generalizable Agent Applications

Source: InfoQ

Tips & Opinions

6/10 00:10

Build an agentic incident triage assistant with Amazon Quick and New Relic

This post shows engineering teams how to apply that principle to one of the most time-sensitive workflows in engineering: incident triage. You will build a custom incident triage a…

Source: AWS ML

Industry News

6/10 00:06

Show HN: Claw Patrol, a security firewall for agents

At Deno we've been using OpenClaw and other agents increasingly for addressing production problems in Deno Deploy - when a PagerDuty alert fires, the agent starts researching the c…

Source: Hacker News

Product Releases/Updates

6/9 23:25

keyuchen21/agentic-engineering-handbook

The definitive OpenAI, Claude, MCP, Harness, Evals, and Production Agent Systems learning roadmap.

Source: GitHub

Research Papers

6/9 21:57

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are larg…

Source: HuggingFace Papers

Research Papers

6/9 21:16

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is o…

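Source: HuggingFace Papers

Industry News

6/9 20:48

36Kr Evening Briefing | Tencent, Alibaba, and Others Invest in Brain-Computer Interface Developer StairMed; Fliggy: Inbound Travel Bookings for the Dragon Boat Festival Holiday Up More Than 6x Year over Year; National Futures Market Cumulative Turnover for January to May Up 40.13% Year over Year

Big companies: Maoyan Entertainment joins the WeChat AI ecosystem as one of the first closed-beta developers. 36Kr has learned that Maoyan Entertainment announced that, as one of the first closed-beta developers in the WeChat AI ecosystem, its mini program has been connected to the WeChat AI Agent ecosystem, using WeChat AI Agent capabilities to offer users film and live-show recommendations, nearby cinema filtering, smart seat selection, one-tap payment, and other services. BOE A: a controlled subsidiary plans to terminate its public offering of shares to unspecified qualified investors and withdraw the application documents. 36Kr has learned that BOE A announced that its…

Source: 36氪

Product Releases/Updates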

6/9 19:20

wanshuiyin/ARIS-Movie-Director

Agentic, long-horizon visual generation: a fuzzy story → a cross-model-audited image-based movie. Brings ARIS's research-wiki + multi-agent debate to multimodal…

Source: GitHub

Tips & Opinions

6/9 18:46

How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces

Source: HuggingFace Blog

Research Papers

6/9 18:28

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments,…

Source: HuggingFace Papers

Industry News

6/9 18:20

Learning to lead in a hybrid human-AI enterprise

As adoption of AI agents looks set to surge by as much as 300% in the next two years, leadership teams are carefully considering the implications of a hybrid human-AI workforce. Un…

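Source: MIT Tech Review

Product Releases/Updates

6/9 15:11

Yuaiweiwu (与爱为舞) appears at the Tencent Cloud AI Industry Application Conference, focusing on education LLMs to build the next-generation learning agent

Source: QbitAI

Product Releases/Updates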

6/9 14:28

cobusgreyling/loop-engineering

Practical patterns, starters & CLI tools for loop engineering with AI coding agents. Design systems that prompt and orchestrate agents (inspired by Addy Osmani…

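Source: GitHub

Product Releases/Updates

6/9 09:24

Tencent Wants Enterprises to Have Just One Way to Access AI

A single entry point that ties together full-stack agents

AI Comment · Consolidating the enterprise AI entry point lowers the barrier to use and may reshape the industry's competitive landscape.

Source: QbitAI

Research Papers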

6/9 04:00

Decentralized Multi-Agent Systems with Shared Context

Multi-agent systems (MAS) can scale large language model reasoning at test time by decomposing complex problems into parallel subtasks. However, most existing MAS rely on centralized orchestration, wh…

Source: HuggingFace Papers

Research Papers

6/9 04:00

Kwai Keye-VL-2.0 Technical Report

We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challen…

Source: HuggingFace Papers

Research Papers

6/9 04:00

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Recent years have witnessed the rapid evolution of AI agents toward handling increasingly complex, real-world tasks. However, existing benchmarks rarely evaluate whether agents can operate graphical u…

Source: HuggingFace Papers

Research Papers

6/9 04:00

DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete softw…

Source: HuggingFace Papers

Research Papers

6/9 04:00

WebChallenger: A Reliable and Efficient Generalist Web Agent

Autonomous web navigation remains challenging for LLM agents, and the strongest generalist systems rely on proprietary reasoning models whose inference cost is prohibitive for the repetitive tasks whe…

Source: HuggingFace Papers

Research Papers

6/9 04:00

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

As AI systems built from multiple language-model agents become more common, they are increasingly used to make decisions together: discussing, negotiating, and acting on shared tasks. While individual…

Source: HuggingFace Papers

Research Papers

6/9 04:00

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Source: HuggingFace Papers

Industry News

6/9 02:34

For the 2nd time in weeks, Microsoft packages laced with credential stealer

73 packages run self-replicating stealer as soon as they're opened by an AI agent.

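AI Comment · Microsoft's supply chain is poisoned again; AI agent environments are becoming a new breeding ground for malware, and security defenses urgently need upgrading.

Source: Ars Technica

Industry News

6/9 02:29

Ant International launches the Agentic Mobile Protocol (AMP), moving overseas AI payments toward a unified standard

AI Comment · Ant International is pushing to unify AI payment standards, which could lower cross-border transaction barriers and affect the global payments landscape.

Source: InfoQ

Research Papers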

6/9 01:53

FASE: Fast Adaptive Semantic Entropy for Code Quality

Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM…

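AI Comment · Proposes an adaptive semantic entropy metric that measures code quality precisely, addressing the reliability bottleneck in multi-agent programming.

Source: arXiv

Industry News

6/9 01:48

Competing on token productivity instead of total token volume: behind Huawei Cloud's Agentic Infra, AI cloud competition enters its second half

AI Comment · Huawei Cloud's shift toward token productivity marks a move in AI cloud competition from scale to efficiency and intelligent applications.

Source: InfoQ

Research Papers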

6/9 01:35

SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation

Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain scientists hours to days. We study simula…

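AI Comment · Uses code-generating adapters to lower the barrier to scientific simulation, letting scientists focus on research rather than programming.

Source: arXiv

Research Papers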

6/9 01:27

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impe…

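AI Comment · The first benchmark to measure phone agents' personalized reasoning, filling an evaluation gap where current AI assistants ignore user identity and history.

Source: arXiv

Industry News

6/9 01:07

Nearly 18,000 participants worldwide, with research agents competing side by side: the 4th World Scientific Intelligence Competition wraps up its preliminary round and opens the semifinal round

AI Comment · The competition promotes the integration of AI and scientific research, drawing nearly 20,000 participants and showcasing competition in frontier technology.

Source: InfoQ

Tips & Opinions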

6/9 00:35

It’s safe to close your laptop now: Hosting coding agents on Amazon Bedrock AgentCore

Amazon Bedrock AgentCore Runtime gives each agent session its own isolated microVM with a persistent workspace, secure tool access through Gateway, and built-in observability—so yo…

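AI Comment · Amazon hosts coding agents in isolated microVMs, balancing security and observability, and may reshape AI development workflows in the cloud.

Source: AWS ML

Tips & Opinions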

6/8 23:57

Evaluate your Amazon Nova Sonic voice agent at scale, no microphone required

In this post, we walk you through the Nova Sonic Test Harness, an open source framework that we built to solve both problems. It serves as a rapid iteration tool for tuning system…

Source: AWS ML

Product Releases/Updates

6/8 23:48

duckbugio/flock

Autonomous AI dev-team bot

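Source: GitHub

Product Releases/Updates

6/8 18:40

Ant Group launches overseas AI payment solution, enabling merchants to run agent operations globally

Helps users and merchants assess how trustworthy an agent is

Source: QbitAI

Industry News

6/8 18:04

Agents are pushing infrastructure to its limits! GitLab lays off 350 after a profit surge; a next-generation Git rebuild is underway

AI Comment · Enterprise AI applications are booming and infrastructure pressure is rising sharply, forcing major players to cut staff and rebuild core products to meet future challenges.

Source: InfoQ

Product Releases/Updates

6/8 17:01

Tencent's Dowson Tong comments on Yao Shunyu, Hunyuan 3, and Yuanbao

By Wang Yuchan | Edited by Zhang Yuxin. What drew the most outside attention at the Tencent Cloud AI Industry Application Conference on June 5? Without question, the conversation between Dowson Tong and Yao Shunyu. At a conference that released a series of agents covering more than 20 vertical scenarios, the products were so B2B-focused, and the much-anticipated "WeChat AI" went unmentioned, that nearly all outside attention was drawn to that conversation. In the conversation, Dowson Tong, Senior Executive Vice President of Tencent and CEO of the Cloud and Smart Industries Group, asked Tencent's chief AI sci…

Source: 36Kr

Product Releases/Updates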

6/8 16:42

OtterMind/Nubase

🔥🔥🔥 Turn AI-written code into real apps. Nubase is an open-source, AI-native backend platform for AI Coding, agentic applications, and modern product teams:…

Source: GitHub

Model Releases/Updates

6/8 16:10

tigicion/dao-code

Open-source TypeScript terminal coding agent for DeepSeek-V4 — builds on DeepSeek's strong price-performance and ultra-cheap cache pricing, engineering byte-sta…

Source: GitHub

Research Papers

6/8 15:37

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experien…

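Source: HuggingFace Papers

Product Releases/Updates

6/8 12:20

Ant Group launches overseas AI payment solution

36Kr has learned that Ant International recently launched the Agentic Mobile Protocol (AMP) for mobile payment services worldwide, including e-wallets, super apps, and digital banks. It targets key challenges in operating agentic commerce globally: making payments convenient, secure, and protected when consumers shop through AI agents, and enabling merchants to interconnect across markets.

AI Comment · Ant International pioneers a mobile agent protocol, tackling AI shopping payments and global interconnection and setting a new standard for cross-border payments.

Source: 36Kr

Tips & Opinions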

6/8 08:00

The Open Source Community is backing OpenEnv for Agentic RL

Source: HuggingFace Blog

Tips & Opinions

6/8 07:56

datasette-agent-edit 0.1a0

Release: datasette-agent-edit 0.1a0 I'm planning several plugins for Datasette Agent which can make edits to existing pieces of text - things like collaborative Markdown editing, u…

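AI Comment · A key step for AI text-editing plugins, making collaborative edits easy; worth watching.

Source: Simon Willison

Industry News

6/8 07:03

JD.com and Tencent to reach a major partnership on AI agents

On the evening of June 7, it was reported that JD.com and Tencent will cooperate closely on AI agents. Drawing on JD.com's product supply chain and fulfillment capabilities and Tencent's strength as an ecosystem entry point, the two will jointly build a new model of cross-scenario intelligent services and move AI agents from single-point applications toward ecosystem-level collaboration. (Sina Tech)

AI Comment · With JD.com and Tencent joining forces, AI agents move from single-point use to ecosystem collaboration, which may redraw the boundaries of intelligent services in e-commerce and social platforms.

Source: 36Kr

Research Papers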

6/8 07:00

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks a…

Source: HuggingFace Papers

Research Papers

6/8 04:00

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair,…

Source: HuggingFace Papers

Research Papers

6/8 04:00

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passiv…

Source: HuggingFace Papers

Research Papers

6/8 04:00

PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment

Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on whi…

Source: HuggingFace Papers

Research Papers

6/8 04:00

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recen…

Source: HuggingFace Papers

Research Papers

6/8 04:00

Bridging the Agent-World Gap: Text World Models for LLM-based Agents

Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many remain largely r…

Source: HuggingFace Papers

Research Papers

6/8 04:00

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

As recommender systems transition toward agentic, multi-turn conversational interfaces, evaluation paradigms have struggled to keep pace. Current benchmarks often rely on "LLM-as-a-judge" evaluations,…

Source: HuggingFace Papers

Research Papers

6/8 04:00

Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning

Visual reasoning requires integrating evidence distributed across regions, attributes, and relations, making single-chain reasoning prone to early perceptual commitment and hallucination. We propose V…

Source: HuggingFace Papers

Research Papers

6/8 04:00

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing benchmarks, however, often…

Source: HuggingFace Papers

Research Papers

6/8 04:00

AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocent…

Source: HuggingFace Papers

Research Papers

6/8 04:00

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

Source: HuggingFace Papers

Product Releases/Updates

6/8 03:58

fkiene/llmtrim

Local proxy that compresses your LLM API requests so you pay less, with no change to the answers. Trims wasted tokens from prompts, history, tool output, and co…

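Source: GitHub

Model Releases/Updates

6/7 22:37

Huawei Cloud releases agentic cloud entry point "智果园" (Zhiguoyuan), with support for calling models such as DeepSeek and Kimi

ITHome, June 7: Huawei Cloud has released "智果园" (Zhiguoyuan), a new cloud entry point for the agentic AI era. The product supports the 云码道 CodeArts coding agent, the Huawei Cloud OfficeAce office agent, and the WorkAgent document agent. According to the announcement, Zhiguoyuan offers agents for key areas such as development and office work, lets users build more practical agents through the 智果 AgentArts platform, and, through Skills, AI…

AI Comment · Huawei Cloud integrates multiple models, lowering the barrier to enterprise AI adoption and highlighting an open ecosystem and potential for industry deployment.

Source: ITHome

Industry News

6/7 18:23

Report: JD.com and Tencent team up to cooperate on AI agents

ITHome, June 7: According to a TMTPost report today, JD.com and Tencent recently teamed up and will cooperate on AI agents. JD.com's product supply chain and fulfillment system will be connected with Tencent's entry-point resources. The report also says JD.com's AI agent has been integrated with several mainstream device makers, including Huawei, OPPO, and Honor. Through A2A (Agent to Agent) cooperation, users can directly use, within each device's native agent, JD AI A…

AI Comment · Giants are teaming up on AI agents; the deep integration of consumer scenarios and social entry points will accelerate the commercialization of agents.

Source: ITHome

Model Releases/Updates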

6/7 15:00

NVIDIA, KRAFTON, NC and Reigning ‘League of Legends’ Champions T1 Celebrate RTX Spark at Korea’s PC Bangs

At GTC Taipei at COMPUTEX last week, NVIDIA unveiled RTX Spark, the superchip that reinvents Windows PCs for the era of personal AI agents. On the heels of this announcement, NVIDI…

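AI Comment · NVIDIA partners with top Korean game companies and an esports team to promote RTX Spark, marking the arrival of AI PC chips in gaming.

Source: NVIDIA

Model Releases/Updates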

6/7 14:24

patraxo/ltx2-vidgen-skill

Own your AI video pipeline. LTX-2.3 (22B) self-hosted on your Modal GPU via a Claude Code skill — t2v, i2v, keyframes, v2v + synced audio. ~$0.02 per 5s clip, i…

Source: GitHub

Product Releases/Updates

6/7 12:23

Light0305/Light-skills

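Light: an end-to-end research skill pack. 28 skills cover the full workflow from literature review to submission, backed by 9 verifiable knowledge bases. Compatible with mainstream AI coding clients.

Source: GitHub

Product Releases/Updates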

6/7 12:23

Light0305/Light-skills

An AI workflow skill pack for research, competitions, and innovation projects.

Source: GitHub

Product Releases/Updates

6/7 11:01

tinyhumansai/tiny.place

The social economy for autonomous AI agents.

Source: GitHub

Product Releases/Updates

6/7 05:21

SpaceArmour/qxli-sovereign-ai-engine

Enterprise-grade sovereign AI for organizations where SaaS AI isn’t an option.

Source: GitHub

Research Papers

6/7 04:00

PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf

Expert writing feedback from experienced researchers is critical for early-career scholars to improve their manuscripts, yet high-quality feedback often remains scarce because reviewing research paper…

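Source: HuggingFace Papers

Product Releases/Updates

6/6 21:49

Does Microsoft want users hooked on its Scout agent? CEO Nadella denies it

ITHome, June 6: At Build 2026, Microsoft and Qualcomm jointly announced Project Solara. Project Solara centers on "agent-first computing": the system runs an Agent Shell internally and can dynamically load and adjust multiple cloud-based AI agents. Microsoft CEO Satya Nadella said AI agents are no longer just ordinary AI assistants. "A real platform shift is happening.…

AI Comment · The controversy around Microsoft's new project points to a platform-level shift in AI from "assistant" to "agent".

Source: ITHome

Tips & Opinions

6/6 18:00

Hu Xia, leading scientist at Shanghai AI Laboratory, confirmed to speak at AICon Shanghai on practices and reflections from the 书安 agent operating system

AI Comment · Focusing on the 书安 agent operating system in practice, Hu Xia's talk will cover frontier advances in the underlying architecture of AI systems.

Source: InfoQ

Product Releases/Updates

6/6 17:00

Next.js 16.2 released: 4x faster development, rendering performance optimizations, and new developer tools deeply adapted to AI agents

AI Comment · With doubled performance and a direct focus on AI development pain points, this Next.js update is highly appealing to full-stack developers.

Source: InfoQ

Tips & Opinions

6/6 06:18

Thousand Token Wood: shipping a multi-agent economy on a 3B model

AI Comment · Builds a multi-agent economy on a 3B-parameter model; low-cost open-source approaches may reshape the AI ecosystem.

Source: HuggingFace Blog

Product Releases/Updates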

6/6 05:40

SawyerHood/omegacode

Code based orchestration for any coding agent.

Source: GitHub

Research Papers

6/6 04:00

Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harness feedback. These assets can improve task execution without changing model weights, but t…

Source: HuggingFace Papers

Research Papers

6/6 04:00

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible:…

Source: HuggingFace Papers

Industry News

6/6 02:20

Harness engineering: Leveraging Codex in an agent-first world

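AI Comment · Applying an engineering mindset to turn AI coding assistants into autonomous agents is a new advance in development efficiency.

Source: Hacker News

Research Papers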

6/6 01:59

Agentopia: Long-Term Life Simulation and Learning in Agent Societies

Humans learn from social life. Simulating this process with LLM-powered agents represents a promising research direction, raising a natural question: whether LLMs can learn from such simulated social…

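AI Comment · An advance in AI social simulation research: uses LLM agents to simulate long-term human social learning and explores how machines can evolve through social interaction.

Source: arXiv

Research Papers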

6/6 01:59

MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention dilution. To overcome this, we introduc…

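AI Comment · A hierarchical memory architecture addresses the long-video understanding bottleneck, using graph memory and agentic retrieval to decouple perception from reasoning and significantly cut compute cost.

Source: arXiv

Research Papers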

6/6 01:51

Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For s…

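AI Comment · Accelerated decentralized stochastic gradient descent that addresses the efficiency bottleneck of large-scale learning over networks; a key advance in strongly convex optimization.

Source: arXiv

Research Papers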

6/6 01:45

How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope

Frontier AI systems are bridging the gap between intelligence and utility by shifting from conversational assistants to autonomous agents that execute tasks end to end. Using production data from Perp…

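AI Comment · AI agents shift knowledge work from assistance to autonomous execution, bringing a qualitative change in efficiency and scope.

Source: arXiv

Research Papers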

6/6 01:13

Act As a Real Researcher: A Suite of Benchmarks Evaluating Frontier LLMs and Agentic Harnesses in Research Lifecycle

As foundation models advance and agent scaffolding becomes increasingly sophisticated, agents have demonstrated remarkable proficiency in complex, long-horizon coding tasks and even autonomous experim…

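AI Comment · Focuses on frontier models' autonomous capabilities across the full research lifecycle; the evaluation suite fills a gap in existing benchmarks.

Source: arXiv

Product Releases/Updates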

6/5 22:39

lamenting-hawthorn/SkillLoop

Standalone self-improvement harness for agent traces, memory, skills, evaluation, and fine-tuning exports

Source: GitHub

Product Releases/Updates

6/5 22:32

paxlabs-inc/matrix-core

Matrix is the cognition and UX layer on top of Paxeer Network. It turns natural-language requests from non-developers into a typed, inspectable, correctable Int…

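Source: GitHub

Product Releases/Updates

6/5 21:16

Someone Has Pushed AI Compute Density to New Heights with CPUs

Intel offers a "strong dose of medicine" for agentic AI's compute anxiety

AI Comment · By using CPUs to break through the AI compute bottleneck, Intel opens a new low-cost, high-performance path.

Source: QbitAI

Research Papers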

6/5 20:33

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences under 3D scena…

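Source: HuggingFace Papers

Product Releases/Updates

6/5 18:46

Huawei Cloud releases new agentic AI product line to build the "silicon-based black soil" of the intelligent era

Source: QbitAI

Industry News

6/5 18:00

Song Qizhao (胖錿), head of the AI red team at Alibaba's AAIG lab, confirmed to speak at AICon Shanghai on the REAL unified agent risk matrix and automated red-teaming practice

Source: InfoQ

Industry News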

6/5 17:00

The Meta hack shows there’s more to AI security than Mythos

On June 5, 404 Media reported that attackers had been using Meta’s AI customer support agent to steal Instagram accounts. Their approach was simple: They asked the agent to link th…

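AI Comment · The flaw in Meta's AI customer support agent exposed a security blind spot and reminds the industry to guard against simple attack techniques.

Source: MIT Tech Review

Product Releases/Updates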

6/5 15:57

Illuminfti/flow

A dynamic-workflow engine for any agent, any model: concurrent leaves, cost-aware routing, schema enforcement, crash-resume, bounded iterate-to-goal loops with…

Source: GitHub

Product Releases/Updates

6/5 15:56

hoolulu/deep-research

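Deep research report generation Skill: one command produces a brokerage-grade deep research report in ten minutes / Professional deep research report generation Skill · Supports 19 languages

Source: GitHub

Product Releases/Updates

6/5 11:39

WeChat AI opens a narrow door to phone makers | Focus Analysis

By Wang Yuchan and Liang Jianqiang | Edited by Zhang Yuxin. Yesterday, Tencent customer service responded that WeChat is working with phone makers including Huawei, Xiaomi, Honor, OPPO, and vivo to launch A2A assistant capabilities, and that several makers have already completed integration. "You can use the AI assistant of the corresponding phone system to start a WeChat voice or video call or send a message to a specified friend. The feature is based on the A2A (Agent-to-Agent) collaboration mechanism, and data security and privacy are protected by a dual-authorization mechanism. The partnership aims to bring WeChat's high-frequency comm…

AI Comment · WeChat and phone makers connect on AI collaboration, marking a key step for social scenarios toward multi-device agent interconnection.

Source: 36Kr

Product Releases/Updates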

6/5 07:46

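8点1氪 | Analyst says Apple's Vision Pro product line has been removed; Jensen Huang to make his first variety show appearance; Fenbi CEO apologizes for berating a college student as "deserving to be jobless"

Today's highlights: WeChat is working with phone makers to launch Agent-to-Agent assistant capabilities; Pop Mart wins 320,000 yuan from Nayuki over unauthorized commercial promotion using marks similar to "LABUBU"; Tencent customer service responds on cooperation with Huawei, Xiaomi, and others; Shennong Tourism Group apologizes for charging tolls on a national highway; Apple's smart glasses delayed to 2029, while display-free AI glasses are still due in 2027. Top 3 news: Analyst says Apple's Vision Pro product line has been removed; in 2027 it will launch AI gla…

Source: 36Kr

Product Releases/Updates

6/5 07:31

Apple approves first iMessage AI agent; Poke can reply to email and set reminders

ITHome, June 5: Tech outlet Appleinsider published a post yesterday (June 4) reporting that Apple has approved Poke as the first third-party AI agent to connect to the Apple Messages for Business platform. Apple Messages for Business was originally a channel for enterprise customer service communication; after Apple's adjustment it has begun to carry more proactive AI assi…

AI Comment · Apple opening iMessage to AI marks a key step for its ecosystem toward third-party agents.

Source: ITHome

Industry News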

6/5 06:06

Show HN: Formally verified polygon intersection – Opus 4.8 oneshots, prev failed

To my knowledge, this is the first formally verified implementation of an intersection algorithm for polygons. The experience of working with AI agents on this project changed a lo…

Source: Hacker News

Research Papers

6/5 04:00

Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

LLM-driven software engineering agents have become a central testbed for real-world language-model capability, yet their training remains limited by the availability of high-quality SWE tasks. Existin…

Source: HuggingFace Papers

Research Papers

6/5 04:00

Towards Retrieving Interaction Spaces for Agentic Search

Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the corpus and the agent reads a small set of returned documents. Recent direct corpus interact…

Source: HuggingFace Papers

Research Papers

6/5 04:00

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demanding systems that can iteratively frame problems, acquire evidence, verify sources, and synt…

Source: HuggingFace Papers

Research Papers

6/5 04:00

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding agents. Yet they usually treat coding tasks as a holistic, binary prediction problem (e.g.,…

Source: HuggingFace Papers

Research Papers

6/5 04:00

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms,…

Source: HuggingFace Papers

Research Papers

6/5 04:00

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

A growing failure mode in agent evaluation and training is that models can achieve high evaluation scores by exploiting shortcuts instead of solving the intended task, producing deceptive performance.…

Source: HuggingFace Papers

Research Papers

6/5 04:00

MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

Source: HuggingFace Papers

Research Papers

6/5 04:00

Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking

Deep research agents have attracted increasing attention for their ability to collect large-scale online information to acquire target knowledge, with recent efforts shifting from purely text-based in…

Source: HuggingFace Papers

Research Papers

6/5 04:00

ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual tokens. As interaction trajectories grow, the token…

Source: HuggingFace Papers

Research Papers

6/5 04:00

The Cold-Start Safety Gap in LLM Agents

Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regul…

Source: HuggingFace Papers

Research Papers

6/5 04:00

StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning

Agentic reinforcement learning (RL) is emerging as a critical post-training paradigm for improving LLM agent capabilities. Existing RL algorithms for LLMs largely follow the token-centric paradigm as…

Source: HuggingFace Papers

Industry News

6/5 03:20

Apple approves Poke as the first AI agent on its Messages for Business platform

Poke, the startup that lets people use AI agents through simple text messages, has become the first AI agent approved for Apple’s Messages for Business platform.

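AI Comment · Apple brings an AI agent to its business messaging platform for the first time, marking a key step in opening its AI ecosystem to third parties.

Source: TechCrunch

Industry News

6/5 02:51

A database P0 at 3 a.m.: how did the AI agent fix it on its own?

AI Comment · AI autonomously repairs a database failure, showing the practical value of agents in operations.

Source: InfoQ

Research Papers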

6/5 01:59

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typica…

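Source: arXiv

Industry News

6/5 01:56

Don't conflate understanding semantics with looking up facts: what are enterprise agents really missing?

AI Comment · Semantic understanding and fact lookup in enterprise agents need to be clearly distinguished, which goes to a core pain point in current AI deployment.

Source: InfoQ

Research Papers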

6/5 01:54

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas th…

Source: arXiv

Research Papers

6/5 01:50

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Acces…

Source: arXiv

Model Releases/Updates

6/5 00:59

NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart

Deploy NVIDIA Nemotron 3 Ultra on Amazon SageMaker JumpStart. Get 5x faster inference and 30% lower cost for agentic AI workloads with this frontier reasoning model.

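AI Comment · NVIDIA's top reasoning model lands on a cloud platform, with 5x faster inference and 30% lower cost, further lowering the barrier to enterprise AI deployment.

Source: AWS ML

Industry News

6/4 23:10

Handing the drudgery of database operations to AI agents

Source: InfoQ

Industry News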

6/4 22:38

Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud

Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer. W…

Source: Hacker News

Model Releases/Updates

6/4 20:00

How Endava is redesigning software delivery around AI agents

Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.

Source: OpenAI

Industry News

6/4 19:30

Show HN: Cost.dev (YC W21) – making agents cost-aware and cheaper to call

We launched Infracost on HN five years ago ( https://news.ycombinator.com/item?id=26064588 ) where our CLI generated cost estimates for infra-as-code, e.g. "this Terraform PR adds…

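Source: Hacker News

Industry News

6/4 18:00

Using models to govern models: Alipay's practice in intelligent detection of agent security vulnerabilities | AICon Shanghai

Source: InfoQ

Product Releases/Updates

6/4 17:37

36Kr Evening News (氪星晚报) | OpenAI CFO on the company's AI device: official launch before the end of this year; Tencent customer service responds on cooperation with Huawei, Xiaomi, and others; Hong Kong launches its first productivity-grade super agent

Big companies: OpenAI CFO on the company's AI device: official launch before the end of this year. OpenAI CFO Sarah Friar said in a recent interview that she has personally tried OpenAI's AI device. She said OpenAI will officially launch the product "before the end of this year." Previously, OpenAI said in a filing that it expected shipments to begin in February 2027 at the earliest. (Cailian Press) LG Innotek plans to expand its semiconductor substrate plant…

Source: 36Kr

Research Papers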

6/4 14:26

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on…

Source: HuggingFace Papers

Research Papers

6/4 13:26

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods…

Source: HuggingFace Papers

Product Releases/Updates

6/4 13:23

helloianneo/ian-xiaohei-scenes

Xiaohei 2.0 Codex Skill for Chinese real-object article illustrations and long-scroll story images

Source: GitHub

Tips & Opinions

6/4 08:00

Designing the hf CLI as an agent-optimized way to work with the Hub

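Source: HuggingFace Blog

Product Releases/Updates

6/4 07:31

IT Morning Report 0604: Doubao to launch a Pro edition with basic features free; SpaceX sets its offering price at a $1.77 trillion valuation; Tencent source says the WeChat agent's launch date is not yet set; DJI denies hunger marketing for Pocket 4...

It's time for the "IT Morning Report." Hello everyone, it is Thursday, June 4, 2026, and today's key tech news includes: 1. Doubao: plans to launch a Doubao Pro edition for professionals' productivity needs, with basic features remaining free. Doubao stated that the features users rely on day to day, including search Q&A, writing and image generation, and voice and video chat, will remain free as they are now, so that user experience and habits are not affected. 2. SpaceX sets its IPO offering…

AI Comment · A batch of tech news at once; Doubao's Pro edition and free-tier strategy, along with SpaceX's valuation and IPO developments, are worth watching.

Source: ITHome

Industry News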

6/4 05:00

Teaching AI agents to ask better questions by playing “Battleship”

MIT researchers use the classic game as a test bed for AI agents, finding a small AI model can outperform the biggest ones at 1 percent of the cost.

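AI Comment · A small model trained on game strategy outperforms large models at 1% of the cost, showing a new path to efficient learning.

Source: MIT News

Research Papers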

6/4 04:00

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-…

Source: HuggingFace Papers

Research Papers

6/4 04:00

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

AI research often requires decisions before future evidence exists: which bottleneck to attack, which direction to pursue, or where a project should be positioned. We introduce ForeSci, a temporally c…

Source: HuggingFace Papers

Research Papers

6/4 04:00

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. Howeve…

Source: HuggingFace Papers

Research Papers

6/4 04:00

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a give…

Source: HuggingFace Papers

Research Papers

6/4 04:00

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capab…

Source: HuggingFace Papers

Research Papers

6/4 04:00

Unsupervised Skill Discovery for Agentic Data Analysis

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effectiv…

Source: HuggingFace Papers

Research Papers

6/4 04:00

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thou…

Source: HuggingFace Papers

Research Papers

6/4 04:00

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across context…

Source: HuggingFace Papers

Research Papers

6/4 04:00

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

Existing benchmarks evaluate Tool-Integrated Reasoning (TIR) in LLMs on idealized "happy paths", largely overlooking real-world tool failures. We introduce ToolMaze, a benchmark for dynamic path dis…

Source: HuggingFace Papers

Research Papers

6/4 04:00

OpenSkill: Open-World Self-Evolution for LLM Agents

Self-evolving agents require adaptation after deployment, but existing approaches assume a usable learning loop, such as curated skills, successful trajectories, or verifier signals. Real open-world…

Source: HuggingFace Papers

Research Papers

6/4 04:00

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

Training vision-language web agents with multi-step RL is compute-intensive, with two dominant forms of inefficiency: idle GPUs in synchronous RL, and trajectories that use more steps and tokens than…

Source: HuggingFace Papers

Research Papers

6/4 04:00

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content…

Source: HuggingFace Papers

Research Papers

6/4 04:00

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture spe…

Source: HuggingFace Papers

Research papers

6/4 04:00

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

Despite recent progress, LLM agents still struggle with reasoning over long interaction histories. While current memory-augmented agents rely on a static retrieve-then-reason paradigm, this rigid pipe…

Source: HuggingFace Papers

Product launches/updates

6/4 02:55

tastyeffectco/sandboxes

Self-hosted dev sandboxes with preview URLs. One command. No Kubernetes, perfect for coding agents and SaaS factories

Source: GitHub

Product launches/updates

6/4 02:55

tastyeffectco/sandboxd

Self-hosted dev sandboxes with preview URLs. One command. No Kubernetes, perfect for coding agents and SaaS factories

Source: GitHub

Industry news

6/4 01:45

As AI gets better, it reveals an empty promise

This week we've got tandem hands-ons with Google's new Gemini AI agent - Spark - from my colleagues David Pierce and Jay Peters. Their takeaways are similar: It's so effective that…

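AI take · As AI grows more capable, it paradoxically exposes the technology's ceiling, revealing a deeper industry dilemma.

Source: The Verge

Industry news

6/4 00:04

GitHub cuts agent workflow token costs by up to 62% through daily audits and MCP streamlining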

Source: InfoQ

Tips & opinions

6/3 23:56

Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI

In this post, you learn how to use Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) together to improve the tool-calling accuracy of a small language model (SL…

Source: AWS ML

Model releases/updates

6/3 23:00

NVIDIA Research Unlocks Advanced Grasping, Smarter Autonomous Driving and Agent Training at Scale

What makes a robot gripper useful isn’t that it can pick up one object — it’s that it can pick up the next one, and the one after that, with a tool it’s never held before. What mak…

Source: NVIDIA

Model releases/updates

6/3 23:00

NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI

At CVPR, NVIDIA is unveiling new physical AI agent skills that help researchers and developers speed the development of autonomous vehicles, robots and vision AI systems. The core…

Source: NVIDIA

Product launches/updates

6/3 22:56

eddyzzl/marvis-risk-agent

MARVIS-Agent: all-purpose credit risk agent for model development, validation, data processing, feature engineering, and strategy workflows.

Source: GitHub

Product launches/updates

6/3 21:40

Meta’s AI agent for WhatsApp Business is now available globally

WhatsApp will charge businesses for using its AI agent based on token usage.

Source: TechCrunch

Industry news

6/3 21:02

Coralogix raises $200M on bet that someone needs to watch the AI agents

Coralogix is among a growing number of infrastructure firms betting that as AI systems move into production, demand will rise for tools that can monitor their behavior, troubleshoo…

Source: TechCrunch

Product launches/updates

6/3 19:12

inkeep/open-knowledge

Beautiful, AI-native markdown editor and LLM Wiki

Source: GitHub

Product launches/updates

6/3 16:07

davanstrien/uv-scripts-for-ai

Self-contained UV scripts for data & ML tasks — OCR, vision, audio & more — run one in a command, locally or on Hugging Face Jobs. Built for humans and agents.

Source: GitHub

Product launches/updates

6/3 12:26

Nigh/show-me-the-story

Self-hosted AI novel generator: single Go binary + web UI. OpenAI-compatible API → outline → chapter-by-chapter writing with review, foreshadowing, fact-check,…

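Source: GitHub

Product launches/updates

6/3 10:34

Tencent AI Industry Application Conference approaches, with a series of new agent applications to be unveiled

36氪 has learned that, according to Tencent Cloud's official account, the Tencent 2026 AI Industry Application Conference will soon be held in Beijing. As Tencent's most important annual AI product launch platform, the event will unveil a series of new agent applications and announce progress on infrastructure upgrades. In addition, Dowson Tong (汤道生), Senior Executive Vice President of Tencent and CEO of its Cloud and Smart Industries Group, will hold an on-stage conversation with Tencent Chief AI Scientist Yao Shunyu (姚顺雨) on Tencent's latest positioning and thinking for the second half of the AI race.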

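AI take · Tencent's annual AI strategy showcase: new agent products and infrastructure upgrades debut together, sending a clear signal about its industry positioning.

Source: 36氪

Industry news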

6/3 10:32

Agentic Mfw

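Source: Hacker News

Product launches/updates

6/3 10:20

Tencent source: launch date for WeChat AI agent cannot yet be determined

IT之家, June 3: According to Caijing magazine, a Tencent source said it is not yet possible to determine when the WeChat AI agent will launch. The timing depends largely on how quickly regulators approve the agent, and given WeChat's 1.4 billion users, the compliance process may be stricter than for other products. A Tencent spokesperson had no comment on the WeChat agent. IT之家 notes that the Financial Times previously reported that WeChat will launch an AI agent and plans to begin, as early as this month…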

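AI take · Regulatory approval is the key variable; the compliance challenge at a scale of 1.4 billion users bears watching.

Source: IT之家

Product launches/updates

6/3 10:15

Will WeChat launch an AI agent? A Tencent source responds

Yesterday, media reported that WeChat will launch an AI agent and plans to begin, as early as this month, the compliance approval process required before a public launch. A Tencent source said it is not yet possible to determine when the WeChat AI agent will launch. The timing depends largely on how quickly regulators approve the agent, and given WeChat's 1.4 billion users, the compliance process may be stricter than for other products. A Tencent spokesperson had no comment on the WeChat agent. (Caijing)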

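AI take · With WeChat's 1.4 billion users, the pace of compliance approval for the AI agent is the focal point and will determine the launch date.

Source: 36氪

Product launches/updates

6/3 09:50

Microsoft sets the direction for Windows 11: a development platform for AI apps and agents

IT之家, June 3: Tech outlet Windows Latest published a post today (June 3) reporting that at the Build 2026 developer conference, Microsoft clarified Windows 11's positioning: no longer just a desktop OS with AI features, but a development platform for AI apps and agents. Microsoft's new direction covers an agent runtime, local models, Windows-native AI APIs, Li…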

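AI take · Microsoft shifts from user tool to developer platform, and its AI ecosystem ambitions come into view.

Source: IT之家

Product launches/updates

6/3 07:14

Ming-Chi Kuo: Jensen Huang's "reinventing the PC" slogan rallies market consensus as NVIDIA RTX Spark sketches a blueprint for on-device AI agents

IT之家, June 3: TF International Securities analyst Ming-Chi Kuo posted on X today (June 3), commenting again on NVIDIA's RTX Spark. He expects the processor to remain a niche product for the next two years, and says Apple's response on on-device AI agents at WWDC will be another point to watch besides Siri. Kuo said the key point of the NVIDIA RTX Spark processor is not only the chip itself; more importantly, Huang…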

AI take · An analyst's view reveals NVIDIA's strategic intent in on-device AI; the market impact bears watching.

Source: IT之家

Industry news

6/3 06:34

Show HN: Paseo – Beautiful open-source coding agent interface

Repo: https://github.com/getpaseo/paseo Homepage: https://paseo.sh/ Discord: https://discord.gg/jz8T2uahpH

Source: Hacker News

Industry news

6/3 04:47

Microsoft's Project Solara is an Android OS designed for agents instead of apps

Microsoft missed the boat on apps, so get ready for agents.

Source: Ars Technica

Industry news

6/3 04:19

Now AI agents need what RSS does

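AI take · AI agents should borrow from RSS to enable self-directed subscription and efficient distribution of information.

Source: Hacker News

Research papers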

6/3 04:00

DAR: Deontic Reasoning with Agentic Harnesses

Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of a…

Source: HuggingFace Papers

Research papers

6/3 04:00

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks cannot capture how a model dynamically delivers care across an encounter: gathering inform…

Source: HuggingFace Papers

Research papers

6/3 04:00

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-in…

Source: HuggingFace Papers

Research papers

6/3 04:00

Streaming Communication in Multi-Agent Reasoning

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that…

Source: HuggingFace Papers

Research papers

6/3 04:00

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many o…

Source: HuggingFace Papers

Research papers

6/3 04:00

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task…

Source: HuggingFace Papers

Research papers

6/3 04:00

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs…

Source: HuggingFace Papers

Research papers

6/3 04:00

Personal AI Agent for Camera Roll VQA

We study the personal camera roll visual question answering setting. In this setting, a conversational AI assistant can access a user's personal camera roll and retrieve relevant photos to answer quer…

Source: HuggingFace Papers

Research papers

6/3 04:00

Agents' Last Exam

Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue tha…

Source: HuggingFace Papers

Research papers

6/3 04:00

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce sk…

Source: HuggingFace Papers

Research papers

6/3 04:00

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconst…

Source: HuggingFace Papers

Model releases/updates

6/3 03:28

datasette-agent-micropython 0.1a0

Release: datasette-agent-micropython 0.1a0 I want Datasette Agent to be able to generate and execute Python code safely. This alpha is looking promising so far. GPT-5.5 has so far…

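AI take · Combines an AI agent with MicroPython for safe code generation and execution, opening new possibilities for data exploration.

Source: Simon Willison

Tips & opinions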

6/3 03:20

micropython-wasm 0.1a1

Release: micropython-wasm 0.1a1 Fixes for some limitations that emerged while I was trying to use this to build datasette-agent-micropython. Tags: python, sandboxing, webassembl…

AI take · Runs MicroPython in the browser, opening new possibilities for sandboxed Python execution and web applications.

Source: Simon Willison

Product launches/updates

6/3 03:13

superloglabs/superlog

Open-source observability tool that uses AI agents to self-heal your software

Source: GitHub

Model releases/updates

6/3 03:00

NVIDIA Partners With Microsoft on Unified Stack for Agentic AI Deployment, From Windows Devices to Cloud to Local

The agentic AI moment has arrived, but delivering on its promise requires more than good models. It also takes fast hardware, secure runtimes, a responsive data layer and models tu…

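AI take · NVIDIA and Microsoft team up on a unified AI agent stack spanning devices to local deployment, lowering the barrier for developers.

Source: NVIDIA

Product launches/updates

6/3 02:44

NVIDIA plays its physical-AI trump card! Cosmos 3 omni-modal model open-sourced, and Agent Toolkit fills the tooling gap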

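AI take · Physical AI meets the real world: NVIDIA's open-source release completes the toolchain and accelerates autonomous decision-making for robots.

Source: InfoQ

Industry news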

6/3 02:19

Microsoft announces Scout, an autonomous AI agent built on OpenClaw

https://www.microsoft.com/en-us/microsoft-365/blog/2026/06/0... https://www.404media.co/microsoft-wants-to-make-people-addic... https://www.wired.com/story/meet-microsoft-scout-you…

AI take · Microsoft's in-house AI agent Scout is built on OpenClaw, marking the giant's strategic move into autonomous agents.

Source: Hacker News

Industry news

6/3 02:00

Microsoft offers devs a better way to control AI agent behavior

The specification lets developer, compliance, and security teams define their own policies for agents to follow in portable policy files.

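AI take · Microsoft introduces portable policy files that let developers define their own behavioral rules for AI agents, improving safety and controllability.

Source: TechCrunch

Research papers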

6/3 01:56

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relie…

Source: arXiv

Research papers

6/3 01:51

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reason…

Source: arXiv

Research papers

6/3 01:50

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Deep reinforcement learning has shown strong potential for enabling autonomous robots to learn complex navigational tasks. However, its practical use still depends heavily on human designed reward fun…

Source: arXiv

Research papers

6/3 01:42

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

As AI systems increasingly assist humans in physical tasks, ensuring safety becomes paramount: physical actions carry immediate and irreversible consequences that digital errors do not. We introduce…

Source: arXiv

Industry news

6/3 01:31

Microsoft’s Project Solara is an OS for AI agent gadgets

Microsoft just announced "Project Solara," a new OS designed for gadgets that run AI agents, at Build 2026. The company is calling it "a new platform built from the ground up to po…

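AI take · Microsoft builds an OS specifically for AI agent hardware, a key step from software toward a hardware ecosystem.

Source: The Verge

Tips & opinions

6/3 01:07

Sarang Kulkarni on lessons learned building deep research agents in production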

AI take · Deep research agents reach production, revealing key lessons in taking AI from the lab to real-world use.

Source: InfoQ

Tips & opinions

6/2 23:45

How Baz improved its AI Agent Code Review accuracy using Amazon Bedrock AgentCore

This post walks through how Baz built their Spec Review agent using Amazon Bedrock and Amazon Bedrock AgentCore. We'll cover the architecture decisions, implementation details, and…

AI take · Amazon Bedrock AgentCore improves the accuracy of Baz's AI code review, showing the practical appl…

Source: AWS ML

Tips & opinions

6/2 22:13

Holo3.1: Fast & Local Computer Use Agents

Source: HuggingFace Blog

Research papers

6/2 21:46

Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition

Large language models for code generation often need to use APIs that are absent from their pretraining data. This requires more than recalling a function name: models must coordinate signatures, modu…

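AI take · Evaluates how well LLMs can use novel APIs, addressing practical knowledge gaps and pushing agents from memorization toward reasoning.

Source: arXiv

Research papers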

6/2 21:28

TSQAgent: Rating Time Series Data Quality via Dedicated Agentic Reasoning

Assessing the quality of time series (TS) data is fundamental yet inherently challenging due to the multifaceted nature of quality dimensions. Recently, large language models (LLMs) have emerged as a…

Source: arXiv

Research papers

6/2 21:17

Cross-Lingual Token Arbitrage: Optimizing Code Agent Context Windows via Local LLM Preprocessing

AI-assisted coding agents are bottlenecked by input-token cost. Two pathologies of raw human input drive much of this overhead: tokenization inefficiency for non-English text and structural entropy in…

Source: arXiv

Research papers

6/2 21:11

A 3D Isovist World Model – Revealing a City's Unseen Geometry and Its Emergent Cross-City Signature

Embodied agents that navigate cities rely on world models that predict how their surroundings will change as they move. But for navigation, what matters is not what the buildings look like; it is wher…

Source: arXiv

Product launches/updates

6/2 20:48

pfwjrfp5hh-byte/WorkMesh

Open-source AI-era employment platform connecting skills, jobs, enterprises, governance, and AI agents.

Source: GitHub

Product launches/updates

6/2 20:48

yangyunice/WorkMesh

Open-source AI-era employment platform connecting skills, jobs, enterprises, governance, and AI agents.

Source: GitHub

Industry news

6/2 20:38

Gemini Spark is the most impressive and terrifying AI experience I’ve had yet

According to every product demo from the last four years, planning a trip is a killer use case for AI. Just tell it where you're going, they all promise, and your chatbot / agent /…

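AI take · The demo is stunning, but it shows AI's autonomous planning approaching human level, raising deeper concerns about technology slipping out of control.

Source: The Verge

Industry news

6/2 20:27

Wang Yunhe, former Huawei Pangu "post-90s young commander," leaves to found a startup; new company 基元律动 raises funding at a $100 million valuation

IT之家, June 2: According to a Sina Tech report today, Wang Yunhe, the "post-90s young commander" who led development of Huawei's Pangu large model, has recently started a company in the AI agent space, and his newly founded firm 基元律动 has closed a new funding round at a $100 million valuation. Wang formally left Huawei at the end of March this year after nearly nine years there. Before leaving, his last positions were director of Huawei's Noah's Ark Lab and head of the Pangu large model; he had been dubbed the "young commander of the Pangu large model" and "天…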

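AI take · A top technical talent's startup move reflects investor enthusiasm and new trends in the AI agent space.

Source: IT之家

Product launches/updates

6/2 20:03

NVIDIA Spectrum-X Ethernet silicon photonics enters full production, with 5x better energy efficiency than traditional networking

IT之家, June 2: NVIDIA announced on May 31 that NVIDIA Vera Rubin, its next-generation supercomputing platform for agentic AI factories, has entered full production, as IT之家 previously reported. NVIDIA also confirmed that its new-generation Spectrum-X Ethernet silicon photonics technology has entered full production at the same time; it is the cornerstone of the platform's large-scale AI factory networking. As the world's first to be based on optoelectronic…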

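AI take · Silicon photonics reaches volume production with a 5x efficiency gain, which should accelerate AI factory network deployment and reshape the industry.

Source: IT之家

Industry news

6/2 19:56

CPU demand keeps growing: Intel's Lip-Bu Tan says many company CEOs have called him "asking for supply"

IT之家, June 2: According to 澎湃新闻 (The Paper), Intel CEO Lip-Bu Tan said at Computex Taipei today (June 2) that CPU demand keeps rising while supply is constrained. Over the past four weeks, many company CEOs have called him asking for more CPUs, which for Intel "is an opportunity." The rise of AI agents has again raised the importance of the CPU, driving a large increase in demand. Discussing CPU trends, Tan noted that AI agents need…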

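AI take · An executive's firsthand account of tight supply reflects surging CPU demand in the AI era; Intel's capacity becomes the key variable.

Source: IT之家

Industry news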

6/2 19:23

Rehumanizing global health care with agentic AI

The global health care sector is under increasing strain. Decades of chronic underinvestment and constraints in recruitment have coincided with a surge in demand for services for a…

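AI take · Uses AI agents to rework health care workflows, easing staff shortages and improving service efficiency and accessibility.

Source: MIT Tech Review

Product launches/updates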

6/2 18:50

slavaZim/episodiq

Economical human-readable logs and structural (event pattern) retrieval for agentic trajectories

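Source: GitHub

Industry news

6/2 18:00

He Conghui, young scientist at Shanghai AI Laboratory, confirmed to speak at AICon Shanghai on the evolution and practice of document-parsing infrastructure for the agent era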

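AI take · A new advance in document parsing for the agent era; an expert shares how cutting-edge infrastructure is evolving, with high practical value.

Source: InfoQ

Product launches/updates

6/2 17:50

Tencent customer service: WeChat is working with Huawei, Honor, Xiaomi, OPPO, vivo and others to let phone voice assistants start audio/video calls or send messages to specified contacts

IT之家, June 2: According to feedback from IT之家 readers today, Tencent customer service's latest reply shows that WeChat is working with phone makers including Huawei, Honor, Xiaomi, OPPO, and vivo to roll out an A2A assistant capability. Users can start WeChat audio/video calls or send messages to specified contacts through their phone's voice assistant. The feature is based on an A2A (Agent-to-Agent) collaboration mechanism: the vendor's AI assistant issues a command to WeChat, and WeChat executes it and returns the result; throughout the process…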

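AI take · Phone makers' AI assistants integrate deeply with WeChat, marking the arrival of practical cross-app agent collaboration.

Source: IT之家

Product launches/updates

6/2 11:15

Qwen3.7-Plus is live! A new multimodal agent foundation that can replicate professional desktop software in one click

Qwen3.7-Plus is now available on Alibaba Cloud 百炼 (Bailian)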

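AI take · Covers both multimodal tasks and desktop software; AI agent capabilities step up again, adding a new variable to the developer ecosystem.

Source: 量子位

Model releases/updates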

6/2 10:00

NVIDIA Jetson Brings Agentic AI to the Physical World

Agentic AI is getting physical. At COMPUTEX on Tuesday, NVIDIA announced NVIDIA JetPack 7.2 and NVIDIA NemoClaw support on NVIDIA Jetson. JetPack 7.2 brings agentic AI skills, Yoct…

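AI take · NVIDIA moves AI from the virtual to the physical, opening a new era of autonomous decision-making in the physical world.

Source: NVIDIA

Product launches/updates

6/2 06:38

Alibaba releases Qwen3.7-Plus model, upgrading its multimodal interactive hybrid AI agent

IT之家, June 2: Alibaba's Qwen team published a blog post today (June 2) announcing the Qwen3.7-Plus model, positioned as a multimodal interactive hybrid agent. Qwen3.7-Plus is the multimodal upgrade of Qwen3.7, positioned at its core as an agent foundation that unifies vision and language. It retains text, coding, tool-use, and productivity-workflow capabilities while strengthening visual understanding, visual reasoning, and cross-modal task handling. The model has, through Alibaba Cloud…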

AI take · Merging multimodality with agents may speed a key step for AI from "conversation" to "action."

Source: IT之家

Industry news

6/2 05:35

Nvidia chases $200B CPU market with AI agent PCs from Microsoft, Dell, and HP

If Nvidia has cracked a way to bring AI agents easily, safely, and usefully to the masses, it could — and should — be big.

AI take · NVIDIA teams with Microsoft, Dell, and HP to bring AI agents to PCs, potentially prying open the $200 billion CPU market.

Source: TechCrunch

Tips & opinions

6/2 05:31

OpenAI models and Codex on Amazon Bedrock are now generally available

GPT-5.5, GPT-5.4, and Codex are now generally available on Amazon Bedrock. Deploy them in production applications and agents today, on Bedrock’s high performance inference engine.

AI take · OpenAI models land on Amazon's cloud platform, further lowering the barrier to enterprise deployment.

Source: AWS ML

Industry news

6/2 04:00

Gemini’s new AI agent is about as good as Google’s demo

Google's new "24/7" AI agent, Gemini Spark, can be shockingly good at doing things on your behalf. But I'm not sure it's worth the financial cost and potential privacy tradeoffs. T…

AI take · The AI assistant performs close to its demo, but the dual costs of privacy and price still need weighing.

Source: The Verge

Research papers

6/2 04:00

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Source: HuggingFace Papers

Research papers

6/2 04:00

Token Budgets: An Empirical Catalog of 63 LLM-Agent Budget-Overrun Incidents, with an Affine-Typed Rust Mitigation as a Case Study

LLM-agent budget overruns are a documented production failure class: a single retry loop can spend thousands of dollars before an operator notices, and the in-process integrity properties that would p…

Source: HuggingFace Papers

Research papers

6/2 04:00

Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, requ…

Source: HuggingFace Papers

Research papers

6/2 04:00

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks ei…

Source: HuggingFace Papers

Research papers

6/2 04:00

AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

Structured financial audit verification is difficult for language-model agents because correctness depends on structured evidence rather than text alone. A model must link reported facts to taxonomy c…

Source: HuggingFace Papers

Research papers

6/2 04:00

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to dete…

Source: HuggingFace Papers

Research papers

6/2 04:00

MemTrain: Self-Supervised Context Memory Training

Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize information accumulated across extended interactions. Existing memory-agent approaches are typi…

Source: HuggingFace Papers

Research papers

6/2 04:00

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and…

Source: HuggingFace Papers

Research papers

6/2 04:00

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Source: HuggingFace Papers

Research papers

6/2 04:00

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge in artificial intelligence. Despite recent advances in LLMs' agentic capabilities, most a…

Source: HuggingFace Papers

Research papers

6/2 04:00

Can Generalist Agents Automate Data Curation?

Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, implement, evaluate, and revise data policies against nois…

Source: HuggingFace Papers

Research papers

6/2 04:00

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask d…

Source: HuggingFace Papers

Research papers

6/2 04:00

SkillHarness: Harnessing Safe Skills for Computer-Use Agents

Computer-Use Agents (CUAs) are increasingly deployed in dynamic interactive environments, creating a growing need for continual skill learning during interaction. Recent approaches address this challe…

Source: HuggingFace Papers

Product launches/updates

6/2 02:20

crimeacs/auto-improve

GAN-style self-improvement loop for any text artifact: mutate, grade with a SEPARATE model, keep only verified wins (pairwise-judged), revert the rest. The git…

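Source: GitHub

Industry news

6/2 02:03

Agents have "rescued" the CPU: Intel bets on the 18A Xeon 6+, with 288 cores set to take over the AI orchestration battlefield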

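AI take · Intel's bet on the 288-core Xeon 6+ built on the 18A process signals that agents are reshaping the CPU's central role in AI orchestration.

Source: InfoQ

Research papers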

6/2 01:56

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Clinical practice is not the selection of an answer from enumerated options: a physician gathers heterogeneous information incrementally and commits to sequential, irreversible decisions under uncerta…

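AI take · A multi-stage interactive EHR environment that bridges the gap between AI clinical decision-making and real-world care workflows.

Source: arXiv

Industry news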

6/2 01:55

Qwen3.7-Plus: Multimodal Agent Intelligence

AI take · Qwen3.7-Plus combines multimodal and agent capabilities and may open a new paradigm for AI applications.

Source: Hacker News

Tips & opinions

6/2 01:54

Secure AI agents with Policy and Lambda interceptors in Amazon Bedrock AgentCore gateway

In this post, we use a lakehouse data agent to demonstrate how you can use Policy for deterministic access control and Lambda interceptors for dynamic validation. We then show how…

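AI take · New Amazon Bedrock features bring security controls to AI agents, combining policy with dynamic validation to offer a deployable protection approach.

Source: AWS ML

Research papers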

6/2 01:51

HERO'S JOURNEY: Testing Complex Rule Induction with Text Games

We introduce HERO'S JOURNEY, a benchmark for rule induction in goal-directed episodic tasks, where agents must infer hidden rules from demonstrations and act on them through multi-step execution. HERO…

AI take · Uses text games to test AI rule induction, filling a gap in benchmarks for complex reasoning tasks.

Source: arXiv

Research papers

6/2 01:45

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studie…

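AI take · Automatically constructs attacks across the skill lifecycle, exposing the hidden security risks of third-party skills in agent workflows; defenses deserve attention.

Source: arXiv

Research papers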

6/2 01:40

Tracking the Behavioral Trajectories of Adapting Agents

Text files such as skill files, memory files, and behavioral configuration files play a central role in defining how modern agents act. Through edits by humans or the agents themselves, these files ma…

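AI take · Tracks agents' behavioral trajectories and reveals how they self-adjust, offering a new angle on transparency in AI decision-making.

Source: arXiv

Research papers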

6/2 01:36

Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation

Large language models now power robo-advisors and trading agents, yet whether they carry built-in biases toward specific assets is largely untested. We ask three questions: do LLMs systematically pref…

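AI take · Audits financial LLMs' preferences for specific assets, revealing hidden biases in AI decisions that affect the reliability of investment strategies.

Source: arXiv

Tips & opinions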

6/2 01:30

Enable safe agentic payments with built-in guardrails using Amazon Bedrock AgentCore payments

In this post, we discuss several key risks that surface when designing an agentic payment system and how to address them with the capabilities of AgentCore payments.

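AI take · Uses Amazon Bedrock's built-in guardrails to address the security risks of AI payment agents, offering a dependable approach for financial deployments.

Source: AWS ML

Tips & opinions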

6/2 00:41

AI Agent Guidelines for CS336 at Stanford

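AI take · Stanford's CS336 course publishes AI agent development guidelines, offering an authoritative reference for academia and industry.

Source: Hacker News

Tips & opinions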

6/2 00:12

AgentOps: Operationalize agentic AI at scale with Amazon Bedrock AgentCore

When you build agentic AI solutions, you face unique operational challenges. Agents make unpredictable decisions, costs spiral unexpectedly, and debugging non-deterministic failure…

AI take · Amazon Bedrock AgentCore makes operating AI agents at scale more controllable, tackling cost and debugging challenges.

Source: AWS ML

Research papers

6/1 22:20

Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

Personalization is a crucial capability of modern language agents. However, current research primarily positions personalized agents as passive responders to user preferences, limiting their ability t…

Source: HuggingFace Papers

Tips & opinions

6/1 21:51

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Source: HuggingFace Blog

Research papers

6/1 19:10

Agentic-J: An AI Agent for Biological Microscopy Image Analysis

Biological image analysis increasingly demands integration across heterogeneous tools, programming environments, and domain knowledge that few researchers can command simultaneously. We present Agenti…

Source: arXiv

Research papers

6/1 18:50

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not wh…

Source: arXiv

Research papers

6/1 18:20

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain lar…

Source: arXiv

Model releases/updates

6/1 17:57

Anthropic unveils managed agents, proactive workflows, and capability curves at Code with Claude

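AI take · Anthropic introduces managed agents and proactive workflows, marking a key evolution of AI from passive response to autonomous execution.

Source: InfoQ

Industry news

6/1 16:16

Huawei FreeClip 2 ear-clip earbuds Collector's Edition launched: jewelry-box design, new AI-key agent interaction, 1,499 yuan

IT之家, June 1: At today's launch event for the Huawei nova 16 series and all-scenario new products, He Gang, CEO of Huawei's Device BG, officially unveiled the FreeClip 2 ear-clip earbuds Collector's Edition, priced at 1,499 yuan. According to Huawei, the Collector's Edition uses a "gilded treasure box" plus jewelry-box design, the charging case uses a vacuum-coating process for a "rounded and radiant" look, and internal space is 20% larger. The earbuds also, with 周大…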

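AI take · Pairs jewelry aesthetics with AI agent interaction, bringing an affordable-luxury experience and technical innovation to the earbuds category.

Source: IT之家

Research papers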

6/1 13:50

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes hu…

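Source: HuggingFace Papers

Industry news

6/1 13:00

"The world's most powerful desktop AI supercomputer": NVIDIA launches DGX Station for Windows

IT之家, June 1: In today's keynote at Computex Taipei 2026, NVIDIA CEO Jensen Huang unveiled "the world's most powerful desktop AI supercomputer," DGX Station for Windows. DGX Station for Windows is for developing and running agents on Windows, and is based on the NVIDIA GB300 Grace Blackwell Ult…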

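AI take · Brings enterprise-grade AI compute to the desktop for the first time, giving Windows developers a powerful tool for local training and inference.

Source: IT之家

Model releases/updates

6/1 12:46

NVIDIA releases 550-billion-parameter Nemotron 3 Ultra open model, with inference up to 5x faster than comparable frontier models

IT之家, June 1: To strengthen the intelligence of autonomous agents, NVIDIA today released new open models and datasets for always-on agents, developed jointly with the NVIDIA Nemotron coalition. According to NVIDIA, Nemotron 3 Ultra is a mixture-of-experts model with 550 billion parameters that provides top-tier intelligence for long-running agents in coding, scientific research, and enterprise business processes. Compared with mainstream open frontier models of the same class…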

AI take · Advances in both parameter scale and inference speed set a new benchmark for agent deployment.

Source: IT之家

Model releases/updates

6/1 12:30

NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark

Personal agents are exploding in popularity, with open source projects like OpenClaw and Hermes seeing rapid adoption by AI developer communities on GitHub. Built to adapt to indiv…

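AI take · NVIDIA brings local AI agents to RTX PCs and DGX workstations, moving personal AI applications from the cloud to local devices.

Source: NVIDIA

Industry news

6/1 12:23

NVIDIA launches Vera processor: built for AI agents, with OpenAI, SpaceXAI, and ByteDance all set to use it

IT之家, June 1: In today's keynote at Computex Taipei 2026, NVIDIA CEO Jensen Huang announced the official launch of the Vera processor. NVIDIA Vera is a CPU built specifically for AI agents; it is 1.8x faster than x86 processors, can power diverse workloads across industries, and is now in full production. Vera builds on the success of the Grace CPU (to date, Grace…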

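AI take · A giant steps in to define a dedicated chip for AI agents; its ecosystem pull points to a new industry benchmark.

Source: IT之家

Product launches/updates

6/1 11:55

Jensen Huang: NVIDIA's next-generation AI superchip platform Vera Rubin is in full production

IT之家, June 1: In today's keynote at Computex Taipei 2026, NVIDIA CEO Jensen Huang announced that Vera Rubin is in full production. Vera Rubin provides POD-scale infrastructure for next-generation AI factories; compared with the previous-generation Grace Blackwell platform, its throughput for agents at scale is 10 times higher. With the mature open MGX design, NVIDIA's supply-chain ecosystem…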

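AI take · Next-generation AI compute jumps 10x as NVIDIA again sets the benchmark for hyperscale clusters.

Source: IT之家

Model releases/updates

6/1 11:36

MiniMax M3 officially released: frontier coding ability, 1M context, native multimodality

MiniMax M3 was officially released today. It reaches frontier-level capability on professional tasks such as coding and agents. It uses a new attention architecture, MSA (MiniMax Sparse Attention), and supports ultra-long context of up to 1M. It is also a natively multimodal model that accepts image and video input and can operate a computer desktop. On SWE-Bench Pro, which measures coding ability, MiniMa…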

Source: 开源中国

Product launches/updates

6/1 11:20

couragec/llm-intern-skill

LLM internship resume and job-search Codex Skill: resume polish, JD tailoring, evidence guard, interview grilling, and Project Scout for LLM/RAG/Agent roles. 大模…

Source: GitHub

Product launches/updates

6/1 11:20

couragec/LLMInternSkill

LLMInternSkill: LLM internship resume and job-search Codex Skill for resume polish, JD tailoring, evidence guard, interview grilling, and Project Scout. 大模型实习简历…

Source: GitHub

Research papers

6/1 11:00

HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems

LLM agents are increasingly expected to operate across heterogeneous task regimes that require distinct execution paradigms. This challenges fixed agent systems and motivates system-level meta-adaptat…

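Source: HuggingFace Papers

Product launches/updates

6/1 10:29

RuleGo v0.36.0 released: a declarative AI agent framework unifying rule engine × agents

RuleGo is a lightweight, high-performance, embeddable rule engine written in Go. It orchestrates components through rule chains (JSON or visual) to manage complex business logic declaratively, and is widely used in IoT, edge computing, data integration, automation, and similar scenarios. v0.36.0 is a milestone release: rulego-components-ai is formally upgraded from an AI component library to a declarative AI agent development framework, while the Server module…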

Source: 开源中国

Model releases/updates

6/1 07:46

GodeX 1.0.0 released: a Responses API-compatible gateway for Codex

Make every model a Codex engine. An OpenAI-compatible Responses API gateway that lets Codex, CLI tools, and developer agents connect to any model. GodeX lets clients that use the OpenAI Responses API go through a local gateway to call DeepSeek, Xiaomi, MiniMa…

Source: 开源中国

Research papers

6/1 04:00

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

The Model Context Protocol (MCP) has emerged as a transformative standard for connecting large language models (LLMs) with external data sources and tools, and has been rapidly adopted across personal…

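AI take · Uses environment simulation to test how LLMs actually perform in personal scenarios, giving the MCP standard its first dedicated benchmark.

Source: HuggingFace Papers

Research papers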

6/1 04:00

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

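AI take · Focuses on a multi-turn reinforcement learning framework, tackling visual agents' interaction challenges on dynamic websites and filling a gap in practical research.

Source: HuggingFace Papers

Research papers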

6/1 04:00

Joint Agent Memory and Exploration Learning via Novelty Signals

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction…

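AI take · Unifies memory and exploration through novelty signals, letting language models discover the unknown on their own in open-ended environments.

Source: HuggingFace Papers

Research papers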

6/1 04:00

Multi-Agent Computer Use

Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and c…

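AI take · Multi-agent collaboration improves efficiency on complex long-horizon tasks and moves past single-agent limits; worth watching.

Source: HuggingFace Papers

Research papers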

6/1 04:00

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We i…

AI take · The first web-browsing agent benchmark focused on Korean contexts, filling an evaluation gap in non-English settings.

Source: HuggingFace Papers

Research papers

6/1 04:00

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

Deep Research Agents have shown strong capability in multi-step information retrieval, reasoning, and long-form report generation, but existing benchmarks and systems remain predominantly text-centric…

Source: HuggingFace Papers

Research papers

6/1 04:00

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain op…

Source: HuggingFace Papers

Research papers

6/1 04:00

Policy and World Modeling Co-Training for Language Agents

Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do to the environment.…

Source: HuggingFace Papers

Research papers

6/1 04:00

PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps

Embodied visual navigation, where an agent perceives a complex environment and acts to reach a goal from raw sensory input, underpins a wide range of applications such as household service robotics, a…

Source: HuggingFace Papers

Research papers

6/1 04:00

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without centralized control? Inspired by Friedrich Hayek's economic theory of decentralized coordin…

Source: HuggingFace Papers

Research papers

6/1 04:00

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Source: HuggingFace Papers

Research papers

6/1 04:00

LLM Anonymization Against Agentic Re-Identification

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downs…

Source: HuggingFace Papers

Research papers

6/1 04:00

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market…

Source: HuggingFace Papers

Research papers

6/1 04:00

Parametric Social Identity Injection and Diversification in Public Opinion Simulation

Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Despite their scalability,…

Source: HuggingFace Papers

Research papers

6/1 04:00

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, compl…

Source: HuggingFace Papers

Research papers

6/1 04:00

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Source: HuggingFace Papers

Product releases/updates

6/1 03:50

ClaudioDrews/memory-os

A 7-layer memory operating system for Hermes Agent — persistent memory with Qdrant, structured facts, fabric recall, auto-curated wiki, and surgical context inj…

Source: GitHub

Product releases/updates

6/1 00:42

RUC-NLPIR/Arbor

A generalist autonomous research agent — runs experiments, researches, and iteratively optimizes, autonomously.

Source: GitHub

Product releases/updates

5/31 23:45

duncatzat/vigils

A local control plane for AI agents — see what they do, approve what matters, keep secrets out. Rust + Tauri + Chrome MV3.

Source: GitHub

Product releases/updates

5/31 23:25

chaitanyagiri/munder-difflin

local multi-agent harness

Source: GitHub

Product releases/updates

5/31 22:25

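Stop just piling tools onto your agent, it can't figure out which one to pick! Fudan × Tongyi propose a new CUA training paradigm

A next-generation CUA training paradigm

AI comment · A CUA training paradigm that targets the agent tool-selection bottleneck; the Fudan and Tongyi collaboration opens a new path.

Source: QbitAI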

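Product releases/updates

5/31 18:03

Tokens are expensive only because you feed the model too much garbage | Amazon's Wang Xiaoye @ AIGC2026

Moving world models toward multi-agent interactive simulation

AI comment · Stresses that data quality matters more than quantity and points to the core reason large-model costs stay high.

Source: QbitAI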

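Industry news

5/31 18:00

NetEase Zhiqi IM: building an R&D multi-agent center in practice, from single-point agents to R&D infrastructure | AICon Shanghai

AI comment · From single-point agents to infrastructure, it shows multi-agent systems deployed across the whole R&D workflow; worth borrowing for engineering teams.

Source: InfoQ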

Product releases/updates

5/31 15:22

argahv/sisyphus-academica

Sisyphus Academica — The Research Paper Writing Army. 20+ agent swarm: 6 novelty engines, 10 adversarial reviewers, Humanizer-integrated writing, citation verif…

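AI comment · Simulates the academic production chain with more than 20 AI agents, testing how far paper writing and review can be automated.

Source: GitHub

Product releases/updates

5/31 11:54

In the AI-native era, make the world adapt to agents rather than teach AI to act human | HKU's Huang Chao @ AIGC2026

The CLI is more like an agent's native language

AI comment · AI should not imitate human interaction; agents need to reshape the world in a machine-native way, marking a fundamental shift in the human-machine collaboration paradigm.

Source: QbitAI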

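Product releases/updates

5/31 09:29

From unlimited tokens to everyone using agents: MiniMax's AI-native organizational evolution in practice

Rather than worry about AI, join it

AI comment · From company-wide agents to organizational evolution, MiniMax offers a hands-on blueprint for AI-native companies.

Source: QbitAI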

Product releases/updates

5/31 04:14

SumanD18/sentinel

Open-source observability and trust layer for AI agents: trace every step, score every output, catch hallucinations and runaway loops in real time. Self-hostabl…

Source: GitHub

Research papers

5/31 04:00

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

Procedural 3D modeling through code is emerging as a versatile paradigm, offering deterministic, engine-ready, and precisely editable assets that neural 3D generators inherently lack. Authoring such p…

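AI comment · The first benchmark to evaluate AI 3D modeling ability through code, filling an evaluation gap between procedural generation and neural rendering.

Source: HuggingFace Papers

Research papers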

5/31 04:00

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from fu…

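AI comment · Lets LLM agents evolve their skills automatically from trajectory data; a training-free adaptation scheme aimed at the long-horizon task bottleneck.

Source: HuggingFace Papers

Research papers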

5/31 04:00

Agent Skills Should Go Beyond Text: The Case for Visual Skills

Reusable skills are a key mechanism for extending agent capabilities, allowing agents to accumulate experience and solve increasingly complex tasks. Yet most existing skill-learning methods store reus…

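AI comment · Visual skills make up for the limits of language, helping agents accumulate experience more efficiently on complex tasks.

Source: HuggingFace Papers

Research papers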

5/31 04:00

Trust Region On-Policy Distillation

On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compre…

Source: HuggingFace Papers

Research papers

5/31 04:00

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

Large language models are increasingly deployed as coding agents, shifting safety from individual responses to action sequences. Existing benchmarks, however, primarily assess whether models refuse un…

Source: HuggingFace Papers

Research papers

5/31 04:00

Honest Lying: Understanding Memory Confabulation in Reflexive Agents

Reflexion-style agents rely on self-generated reflections as memory, implicitly assuming that agents can accurately diagnose their own failures. We show that this assumption can fail systematically: a…

Source: HuggingFace Papers

Research papers

5/31 04:00

FVSpec: Real-World Property-Based Tests as Lean Challenges

We present a benchmark for evaluating AI models and agents on real-world formal software verification tasks. We first scrape 11,039 property-based tests (PBTs) from real-world Python repositories, the…

Source: HuggingFace Papers

Product releases/updates

5/31 00:20

AtomFlow-AI/MoleCode

Molecode presents molecules as code and enables LLMs to operate and reason on chemistry directly.

Source: GitHub

Product releases/updates

5/30 19:53

prashar32/riskkernel

Deterministic cost / loop / time budgets · full observability · crash-resumable runs · human-approval gates · a memory you own. Self-hosted. Your keys. No telem…

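AI comment · Uses deterministic cost budgets and resumable runs to open up the AI black box; a lightweight kernel that gives users sovereignty over their data.

Source: GitHub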

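Industry news

5/30 18:00

Zhang Ye, technical lead of Tencent PCG's quality and efficiency team, confirmed to speak at AICon Shanghai on a new paradigm of quality engineering driven by testing agents

AI comment · Tencent PCG brings frontier AI into quality engineering and shows testing agents in real use; worth watching.

Source: InfoQ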

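Product releases/updates

5/30 14:33

NVIDIA and Tsinghua team propose Gamma-World: world models go from "single-player" to "multiplayer coexistence"

Moving world models toward multi-agent interactive simulation

AI comment · Moves beyond single-agent limits and opens up world-model simulation of multi-party cooperative and adversarial scenarios.

Source: QbitAI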

Research papers

5/30 04:00

FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search

Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but curren…

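AI comment · Scales test-time compute with fine-grained self-verification, systematically tackling error accumulation in agentic search for the first time and offering a scalable approach to complex information retrieval.

Source: HuggingFace Papers

Research papers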

5/30 04:00

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address memo…

Source: HuggingFace Papers

Research papers

5/30 04:00

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring…

Source: HuggingFace Papers

Research papers

5/30 01:57

Stateful Online Monitoring Catches Distributed Agent Attacks

Language models can find thousands of severe software vulnerabilities, and agents are increasingly being misused for cyberattacks. To avoid detection, attackers frequently distribute their misuse, spl…

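AI comment · Distributed agent attacks are hard to trace; stateful monitoring enables real-time blocking and raises the bar for AI security defense.

Source: arXiv

Research papers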

5/30 01:51

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with ver…

Source: arXiv

Research papers

5/30 01:50

Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation

The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture. We intro…

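AI comment · Analyzes context-dependent argumentation through a game-theoretic lens, offering a new modeling framework for strategic language manipulation by AI.

Source: arXiv

Industry news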

5/30 01:46

Robinhood now lets your AI agents trade stocks

AI comment · Robinhood lets AI agents trade stocks, opening a new era of automated trading for retail investors.

Source: Hacker News

Product releases/updates

5/30 01:25

zhnt/loushang

AI-native coding orchestration platform: unified multi-model agent runtime with stateful sessions, tool governance, and traceable delivery.

Source: GitHub

Research papers

5/30 01:00

Preference-Aware Rubric Learning for Personalized Evaluation

As Large Language Models (LLMs) evolve from general-purpose assistants to user-centric agents, personalization has become central to aligning model behavior with individual preferences, making the eva…

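AI comment · An innovative personalized evaluation framework that helps large models understand users better and improves human-computer interaction.

Source: arXiv

Industry news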

5/30 00:13

Cognition’s Scott Wu says AI coding agents shouldn’t replace humans

Cognition makes Devin, the first and arguably most successful AI coding agent. But famed coder Wu says it isn't designed to supplant human programmers.

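AI comment · Positions AI coding tools as assistants rather than replacements, pointing to a new direction for human-machine collaboration.

Source: TechCrunch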

Industry news

5/29 23:57

CAPTCHAs can still detect AI agents

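AI comment · CAPTCHAs can still identify AI, highlighting key progress and challenges in today's human-versus-machine detection technology.

Source: Hacker News

Industry news

5/29 20:02

Unitree's first embodied-intelligence experience store in Asia to open in Shanghai on May 31

36Kr has learned that Unitree Robotics announced that on May 31 its first embodied-intelligence experience store in Asia will officially open in Shanghai, bringing together its full line of consumer products: the G1 humanoid robot, the R1 humanoid robot, and the Go2 robot dog.

Source: 36Kr

Industry news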

5/29 18:42

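Coding agents may be one of the most expensive mistakes in the history of software development

AI comment · A sharp take that points at the hidden costs and risks behind AI coding efficiency, prompting the industry to reflect.

Source: InfoQ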

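Industry news

5/29 18:00

Meitu Roboneo: multi-agent orchestration engineering in design production scenarios | AICon Shanghai

AI comment · The first large-scale deployment of multi-agent collaboration in industrial design, showing AI shifting from point tools to systematized production.

Source: InfoQ

Industry news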

5/29 18:00

Adobe’s conversational AI agent is a mediocre design intern

AI image tools rarely make me feel like I'm part of the creative process. They are, after all, mostly designed so that people with no design experience can type in a few words and…

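AI comment · The review exposes AI tools' limits in creative collaboration and is a reminder that the industry should focus on human-AI co-creation rather than replacement.

Source: The Verge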

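Product releases/updates

5/29 16:32

WorkBuddy for creative design is here! Tencent launches Miora, an agent-powered creative studio

One person with an entire creative studio

AI comment · Uses AI to lower the barrier to creative work; Tencent's Miora lets individuals produce professional design content efficiently.

Source: QbitAI

Product releases/updates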

5/29 16:19

huawei-csl/KVarN

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one f…

Source: GitHub

Product releases/updates

5/29 15:25

StarTrail-org/PixelRAG

The end of web parsing. The beginning of scalable pixel-native search.

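AI comment · A breakthrough in pixel-level search that ends traditional web parsing and opens a new paradigm of vision-native retrieval.

Source: GitHub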

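Product releases/updates

5/29 07:57

Claude 4.8 makes a splash! Some capabilities surpass Mythos, with support for hundreds of parallel sub-agents

It can carry out tasks for long stretches, so humans need not keep checking back on its work

AI comment · Autonomous multi-agent parallel collaboration greatly improves efficiency and continuity on complex tasks.

Source: QbitAI

Industry news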

5/29 05:24

The internet is being rebuilt for machines

As AI agents move from experiments to production, AWS, Cloudflare, and others are redesigning cloud infrastructure for a future dominated by machine-generated internet traffic inst…

AI comment · Cloud giants are rebuilding the underlying architecture; AI agents will dominate future network traffic.

Source: TechCrunch

Product releases/updates

5/29 04:58

JoniMartin27/lookspan

Local-first observability dashboard for AI agents. MCP-native. Look at every span your agents emit.

Source: GitHub

Tips and opinions

5/29 04:32

Evaluating Deep Agents using LangSmith on AWS

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you wil…

AI comment · Combines LangChain's and Anthropic's experience into a practical guide to evaluating deep agents on AWS, filling a hands-on gap.

Source: AWS ML

Industry news

5/29 04:29

Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code

Undisclosed addition in jqwik instructed AI coding agents to delete app output.

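AI comment · A developer uses malicious code to strike back at AI coding tools, exposing security holes and a trust crisis in human-machine collaboration.

Source: Ars Technica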

Industry news

5/29 04:06

Asana acquires no-code agent-builder StackAI

Asana will incorporate StackAI into its growing suite of AI workflow tools.

AI comment · Asana acquires a no-code agent builder, accelerating its push into enterprise AI workflow automation.

Source: TechCrunch

Research papers

5/29 04:00

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

LLM agents are increasingly expected not only to complete isolated tasks, but also to carry bounded representations of human expertise, judgment, and interaction style. Building such person-grounded a…

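AI comment · Uses expert knowledge distillation to have AI generate human skills automatically, substantially improving agents' professionalism and human likeness.

Source: HuggingFace Papers

Research papers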

5/29 04:00

Task-Focused Memorization for Multimodal Agents

Long-term memory is essential for multimodal agents to build coherent experience, accumulate world knowledge, and achieve continual learning. However, constructing effective memory goes beyond memory…

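AI comment · Focuses on building long-term memory for multimodal agents, moving beyond the limits of conventional memory to achieve continual learning and knowledge accumulation.

Source: HuggingFace Papers

Research papers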

5/29 04:00

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

LLM agents are evolving from conversational chatbots to operational tools in real-world workspaces. In local agentic harnesses, an LLM can read and write files, call tools, and reuse workspace state a…

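AI comment · Exposes the security vulnerabilities that arise as LLM agents move from conversation to operational tools and proposes new ideas for defending against backdoor attacks; important for AI security.

Source: HuggingFace Papers

Research papers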

5/29 04:00

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

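AI comment · Learns long-context reasoning from search-agent trajectories and uses rubric rewards to improve the model's ability to integrate information.

Source: HuggingFace Papers

Research papers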

5/29 04:00

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Monitoring autonomous language model agents currently relies mostly on surface behavior. But what happens when agent populations invent new languages with the goal of avoiding human oversight? Here, w…

AI comment · AI inventing new languages on its own to evade human oversight exposes deep safety risks in agent coordination.

Source: HuggingFace Papers

Research papers

5/29 04:00

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observ…

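AI comment · Uses a simple mechanism to reveal the tipping point of the information-masking strategy, giving practical bounds for optimizing long-horizon search agents.

Source: HuggingFace Papers

Research papers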

5/29 04:00

Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

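LLM agents increasingly retrieve externally curated skills (procedural instructions retrieved at decision time) to improve performance on long-horizon interactive tasks. Existing skill libraries are typ…

AI comment · Breaks from the limits of one-size-fits-all skill libraries by proposing model-aware alignment, making agents' task adaptation more precise and efficient.

Source: HuggingFace Papers

Research papers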

5/29 04:00

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and action generation. However, their ability to sustain exploration in dynamic open worlds remains un…

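AI comment · The first benchmark to use Minecraft to evaluate multimodal LLMs' open-world exploration ability, filling a testing gap in this area.

Source: HuggingFace Papers

Research papers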

5/29 04:00

MindZero: Learning Online Mental Reasoning With Zero Annotations

Effective real-world assistance requires AI agents with robust Theory of Mind (ToM): inferring human mental states from their behavior. Despite recent advances, several key challenges remain, includin…

Source: HuggingFace Papers

Research papers

5/29 04:00

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Humans can effortlessly perceive spatial layouts, form cognitive representations, reason about spatial relations, and translate such reasoning into actions in everyday 3D environments. Although recent…

Source: HuggingFace Papers

Research papers

5/29 04:00

AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents

For agents to learn continuously from interaction with the world at test time, they must be able to explore effectively, acquire new world knowledge and skills, retain relevant episodic experiences, a…

Source: HuggingFace Papers

Tips and opinions

5/29 02:10

Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore

Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need…

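AI comment · Using dataset management to build a test suite that grows with the agent is key to balancing online signals with offline baselines and tracking real progress.

Source: AWS ML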

Research papers

5/29 01:59

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 s…

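AI comment · An empirical study of a physicist supervising AI coding, revealing new boundaries for human-machine collaboration in scientific software development.

Source: arXiv

Research papers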

5/29 01:58

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is l…

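AI comment · Shows that multi-component LLM agents can be locally sound in probability yet globally contradictory, pointing at a core reliability weakness of current AI systems.

Source: arXiv

Research papers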

5/29 01:57

SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

Autonomous AI research agents aim to accelerate scientific discovery by automating the research pipeline, from hypothesis generation to peer review. However, existing benchmarks rarely test a fundamen…

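AI comment · Judging the quality of research ideas matters more than generating them; this is a key step toward autonomous research.

Source: arXiv

Research papers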

5/29 01:56

Gram: Assessing sabotage propensities via automated alignment auditing

We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemini models across 17 simulated agentic deployment scenarios tha…

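AI comment · An automated alignment auditing framework that quantifies, for the first time, AI's propensity for deliberate sabotage, providing an actionable tool for AI safety governance.

Source: arXiv

Model releases/updates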

5/29 01:51

Claude Opus 4.8 is now available on AWS

This post covers Opus 4.8's improvements and practical guidance for AI engineers integrating the model into agentic systems and production inference workloads on Amazon Bedrock.

AI comment · The new Claude version arrives on AWS, optimized for agentic systems, with clear value for engineering deployment.

Source: AWS ML

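Tips and opinions

5/28 23:45

Agents work, so why are they still not good to use?

AI comment · Focuses on the gap between AI's usability and the actual experience, identifying a key tension in putting the technology into practice.

Source: InfoQ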

Product releases/updates

5/28 23:35

Sesame, the conversational AI startup from Oculus founders, launches its iOS app

Sesame’s new iOS app brings its conversational AI agents to the public, offering more natural back-and-forth interactions designed to feel less like traditional chatbots and more l…

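AI comment · From the Oculus founders, it makes AI conversation closer to real human interaction and may redefine the voice-assistant experience.

Source: TechCrunch

Tips and opinions

5/28 23:08

In-depth conversation: in the post-"lobster" era, where is the gap between "usable" and "production-grade" for enterprise agents?

AI comment · The practical path and real challenges of taking enterprise agents from demo to deployment and across the "production-grade" gap.

Source: InfoQ

Industry news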

5/28 23:05

Show HN: Ktx – Open-source executable context layer for data agents

Hi HN, we’re open-sourcing ktx. It’s an executable context layer that makes agents reliable on your data stack. We built it after going through the experience of building productio…

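AI comment · An open-source executable context layer that makes AI agents more reliable on the data stack, filling a key gap between data and agents.

Source: Hacker News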

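Industry news

5/28 22:46

Tencent Cloud DatabaseClaw: letting AI agents truly take over production databases | Tencent Cloud Database DBTalk

AI comment · The first cloud offering to let AI agents safely take over production databases, lowering operations risk and cost.

Source: InfoQ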

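Industry news

5/28 22:41

Agentic Coding + ClickHouse: one person, one stack, one app, with AI handling the full stack in a few days

AI comment · An efficiency revolution in AI full-stack development; the barrier for a single person to ship an enterprise-grade application drops sharply.

Source: InfoQ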

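Industry news

5/28 22:25

Ascend handles inference, Kunpeng handles agents: agentic AI brings the CPU back to center stage

AI comment · The division of compute labor is being reshaped, and the CPU regains core value in the AI era.

Source: InfoQ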

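Industry news

5/28 22:14

Tencent Cloud DMC reshapes human-machine collaboration: making agents both "smooth to use" and "easy to govern" | Tencent Cloud Database DBTalk

AI comment · Focuses on the practicality and controllability of human-machine collaboration, addressing management pain points in agent deployment.

Source: InfoQ

Industry news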

5/28 22:00

Visa invests in Replit to power agentic payments for developers

Visa said that over 1,000 employees have been using Replit for prototyping and development.

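AI comment · Visa makes a strategic investment in Replit to advance agentic payments for developers, showing financial giants accelerating their move into the AI agent economy.

Source: TechCrunch

Industry news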

5/28 21:02

Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue

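AI comment · Sixty seconds to reflect on the fatigue of AI constantly asking for permission; cleverly designed and squarely on a user pain point.

Source: Hacker News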

Model releases/updates

5/28 20:00

How Endava builds an agentic organization with Codex

Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.

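AI comment · Endava used Codex to cut requirements analysis from weeks to hours, showing the practical value of AI agents in speeding up software delivery.

Source: OpenAI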

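Product releases/updates

5/28 19:57

Kekr | Thunderobot and AMD launch an AI workstation lineup covering three form factors

On May 28, Thunderobot held an AI workstation launch event in Beijing under the theme "Gathering Momentum, Growing Together, Advancing with Intelligent Computing", officially unveiling a full-scenario AI workstation lineup spanning three categories: tower, mini PC, and mobile. This is among the industry's first AI workstation launches to cover all three form factors; with an industry-leading range of categories and flagship-level compute, it redefines the performance baseline for AI workstations. [Official image] AI has formally entered the agent era, the industry is shifting from text prediction to autonomous logical reasoning, and future AI compute demand…

AI comment · Thunderobot and AMD are among the first to cover all three AI workstation form factors, showing a benchmark-level compute lineup.

Source: 36Kr

Product releases/updates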

5/28 18:45

2aronS/Duel-Agents

CLI, SDK, and IDE plugins for Duel Agents

AI comment · A toolchain for collaborative multi-agent development that lowers the barrier to building AI applications.

Source: GitHub

Product releases/updates

5/28 17:03

Health-Yang/MineEcho

Local-first Memory OS for personal AI assistants with L0-L3 memory, Wiki++ knowledge, skill routing, and TokenLess context compression.

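AI comment · A local memory system for personal AI assistants with knowledge routing and TokenLess context compression, breaking the dependence on the cloud.

Source: GitHub

Product releases/updates

5/28 16:01

7B beats o3 and GPT-5! A medical AI agent teaches models "where to look and how to look"

Medical AI agents have reached a key inflection point

AI comment · A breakthrough in which a small-parameter model surpasses top large models, setting a new benchmark for AI in vertical domains.

Source: QbitAI

Industry news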

5/28 15:00

Vertu wants CEOs to run companies from an AI foldable starting at $6,880

Built on top of the open source Hermes project, Vertu's new foldable combines AI-agent workflows, enterprise integrations, and ultra-premium luxury finishes.

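AI comment · A luxury phone brand builds an enterprise-grade foldable on an open-source AI system, tying executive work scenarios closely to AI agents; a differentiated play for the high-end market that is worth…

Source: TechCrunch

Product releases/updates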

5/28 09:46

modelstudioai/cli

Official Model Studio CLI (Alibaba Cloud Bailian CLI) built for AI Agent frameworks, exposing models, search, multimodal, and workflow capabilities as structured tool calls.

Source: GitHub

Tips and opinions

5/28 07:44

sqlite AGENTS.md

SQLite gained an AGENTS.md file five days ago - but it's not intended for their own development, it's presumably aimed at people who are pointing agents at the SQL…

AI comment · SQLite sets out guidelines for AI agents, a new pattern for collaboration between database tools and AI.

Source: Simon Willison

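Product releases/updates

5/28 06:47

helloianneo/ian-xiaohei-illustrations

Skill for generating quirky "Xiaohei" illustrations for Chinese article body text | 16:9, hand-drawn on a white background | a few red, orange, and blue annotations | Codex Skill

AI comment · Generates quirky Chinese-style illustrations from code; a new way to customize AI drawing skills.

Source: GitHub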

Tips and opinions

5/28 04:06

Building AI agents for business support using Amazon Bedrock AgentCore

In this post, we share how the AWS Generative AI Innovation Center (GenAIIC) collaborated with Works Human Intelligence (WHI) to build two AI agents using Amazon Bedrock AgentCore.…

AI comment · Amazon Bedrock AgentCore lets companies build AI assistants with a low barrier to entry, and a real case shows its practical value.

Source: AWS ML

Tips and opinions

5/28 04:01

From data overload to actionable insights: How Verizon Connect scaled agentic AI to 100,000 users

In this post, we show you how Verizon Connect built and scaled an agentic AI solution to transform overwhelming fleet data into clear, actionable insights for 100,000 users daily.…

AI comment · A model of enterprise AI deployed at scale, showing the path from massive data to precise decisions and the value for users.

Source: AWS ML

Research papers

5/28 04:00

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substa…

AI comment · Hybrid agent system design coordinates cloud and on-device agents, offering a key balance for deploying AI.

Source: HuggingFace Papers

Research papers

5/28 04:00

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSD…

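AI comment · AI pipeline design that automatically explores cooperation strategies, offering a novel approach to multi-agent sequential social dilemmas.

Source: HuggingFace Papers

Research papers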

5/28 04:00

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, which synthesizes scattered evidence into long-form reports. H…

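AI comment · Multi-agent collaboration produces verifiable long-form reports, easing the credibility bottleneck in deep research and moving AI from search toward argumentation.

Source: HuggingFace Papers

Research papers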

5/28 04:00

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using ca…

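AI comment · The first scalable interactive environment for causal discovery, giving AI-scientist research on causal reasoning a standardized test platform.

Source: HuggingFace Papers

Research papers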

5/28 04:00

PhoneWorld: Scaling Phone-Use Agent Environments

A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale. Existing mobile-agent benchmarks have made important…

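AI Commentary · Large-scale, reproducible phone-use environments that address the shortage of controllable settings covering real mobile behavior.

Source: HuggingFace Papers

Research Papers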

5/28 04:00

WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

Multimodal large language models are increasingly deployed as long-horizon agents, where memory must do more than recall: it must track an evolving world, revise what has gone stale, and surface the r…

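AI Commentary · Evaluates the dynamic memory of multimodal agents, pushing beyond simple recall toward tracking an evolving world.

Source: HuggingFace Papers

Research Papers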

5/28 04:00

GenClaw: Code-Driven Agentic Image Generation

Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at t…

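AI Commentary · Code-driven image generation that bridges language and vision, pointing to a new paradigm for agentic generation.

Source: HuggingFace Papers

Research Papers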

5/28 04:00

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI models drastically lowe…

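AI Commentary · A lightweight framework that addresses AI agent safety and security pain points while balancing scalability and practicality.

Source: HuggingFace Papers

Research Papers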

5/28 04:00

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

Recent advances in mobile GUI agents have shown strong potential for automating mobile tasks, but most effective systems still depend on large vision-language models for screenshot understanding and l…

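AI Commentary · A lightweight, graph-guided GUI agent that explores behavior efficiently and reduces dependence on large vision-language models.

Source: HuggingFace Papers

Research Papers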

5/28 04:00

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

Mastering terminal environments requires language agents capable of multi-step planning, feedback-grounded execution, and dynamic state adaptation. However, training such agents is currently bottlenec…

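AI Commentary · Tackles the training bottleneck for long-horizon terminal tasks with scalable environments for learning language agents.

Source: HuggingFace Papers

Research Papers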

5/28 04:00

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Tool retrieval over large API catalogs is a core bottleneck for LLM agents: user queries arrive in colloquial, often underspecified language, while the catalog uses technical API vocabulary that no fi…

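AI Commentary · Uses iterative co-training to close the semantic gap between colloquial user queries and technical API vocabulary in tool retrieval for LLM agents.

Source: HuggingFace Papers

Research Papers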

5/28 04:00

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

Skills, i.e., structured workflow instructions distilled for large language models (LLMs), are becoming an increasingly important mechanism for improving agent performance on real-world downstream tas…

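AI Commentary · Automatically audits the open skill ecosystem for LLM agents, addressing a gap in how agent skills are evaluated.

Source: HuggingFace Papers

Research Papers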

5/28 04:00

Exploring Autonomous Agentic Data Engineering for Model Specialization

Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based…

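AI Commentary · Autonomous agents take on the domain-data bottleneck for LLMs, opening a new path to model specialization.

Source: HuggingFace Papers

Research Papers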

5/28 04:00

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long ho…

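AI Commentary · A benchmark for long-horizon agentic data analysis that exposes how agents fall short at tracking evolving analytical context.

Source: HuggingFace Papers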

Research Papers

5/28 04:00

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

While GUI agents have advanced rapidly, they often lack the robustness to recover from their own errors, hindering real-world deployment. To bridge this gap at both the evaluation and data levels, we…

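AI Commentary · Provides a benchmark for evaluating how GUI agents recover from their own errors, plus a trajectory synthesis method, addressing a key gap for real-world deployment.

Source: HuggingFace Papers

Research Papers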

5/28 04:00

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

LLM agents are increasingly deployed as systems built around editable external harnesses, including prompts, skills, memories and tools, that shape task execution without changing model parameters. Ha…

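AI Commentary · Argues that updating an agent's external harness is not the same as benefiting from it, clarifying key concepts for research on self-evolving LLM agents.

Source: HuggingFace Papers

Research Papers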

5/28 04:00

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Physical AI systems, including robots, autonomous vehicles, embodied agents and edge copilots, often run a different inference workload from cloud LLM serving: single-stream, batch-1 autoregressive de…

Source: HuggingFace Papers

Research Papers

5/28 04:00

GrepSeek: Training Search Agents for Direct Corpus Interaction

Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access i…

Source: HuggingFace Papers

Research Papers

5/28 04:00

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in pr…

Source: HuggingFace Papers

Research Papers

5/28 04:00

SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

Source: HuggingFace Papers

Research Papers

5/28 04:00

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper pr…

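AI Commentary · Multi-agent collaboration generates editable scientific figures, lowering the effort needed to produce paper illustrations.

Source: HuggingFace Papers

Research Papers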

5/28 04:00

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policie…

Source: HuggingFace Papers

Research Papers

5/28 04:00

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Automatic speech recognition (ASR) is a core component of human-computer interaction and an increasingly important front-end for LLM-based assistants and agents. However, most current ASR systems sti…

Source: HuggingFace Papers

Research Papers

5/28 04:00

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating au…

Source: HuggingFace Papers

Tips & Opinions

5/28 02:00

Powering agentic AI sales strategy with Amazon Bedrock AgentCore

As agent adoption scaled, we saw a common pattern emerge across enterprises, including our own sales organization: specialized agents deliver value, but without orchestration, user…

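AI Commentary · Amazon validates the value of agent orchestration in its own sales organization, offering a reusable reference for enterprises deploying AI agents at scale.

Source: AWS ML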

Industry News

5/28 01:42

Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

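AI Commentary · Multi-agent collaboration automates vulnerability discovery and reproduction, with the aim of improving the efficiency and coverage of security testing.

Source: Hacker News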

Tips & Opinions

5/28 01:20

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

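AI Commentary · The first benchmark for agentic enterprise IT tasks shows frontier models scoring below 50%, a useful reference for deploying AI in this setting.

Source: HuggingFace Blog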

Model Releases/Updates

5/28 00:00

AI Factories: The New Infrastructure of Intelligence

AI factories are token factories, converting power into intelligence in real time. And as agentic AI scales and autonomous, always-on special agents are deployed in the enterprise,…

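AI Commentary · AI factories convert power into intelligence in real time, framed here as the new infrastructure of intelligence.

Source: NVIDIA

Product Releases/Updates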

5/27 22:44

DYAI2025/Plumbline

Plumbline — a self-learning, customer-value-governed agile AI agent team for Claude Code. 87 subagents + skills, TDD defense-in-depth gates, Kaizen retros, a fo…

Source: GitHub

Industry News

5/27 22:36

Robinhood will let your AI agent trade stocks and make (or lose) lots of money

Robinhood is opening its trading platform to AI agents. In an announcement on Wednesday, Robinhood says traders can now create a separate account for an AI agent and add a specific…

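AI Commentary · Robinhood lets AI agents trade stocks through a separate account, bringing both opportunity and risk for retail traders.

Source: The Verge

Industry News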

5/27 21:46

Why AI Agents Cannot Change Software Systems

AI Commentary · Highlights the limits of AI agents when it comes to changing software systems.

Source: Hacker News

Product Releases/Updates

5/27 20:46

kesslernity/awesome-copilot-chat-agents

82 ready-to-deploy Microsoft Copilot Chat agents — paste the instruction block into Copilot Studio and you're live. Writing, HR, PM, IT ops, Sales, Finance, Eng…

Source: GitHub

Product Releases/Updates

5/27 20:05

op7418/guizang-social-card-skill

🪧 Claude Code / Codex skill — generate Xiaohongshu carousels & WeChat 21:9+1:1 cover pairs. Editorial × Swiss visual systems, 28 layouts, 10 themes, single-fil…

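AI Commentary · Automates Xiaohongshu carousel layouts and WeChat cover design using editorial and Swiss-style visual systems, speeding up content production.

Source: GitHub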

Product Releases/Updates

5/27 18:06

repanareddysekhar/llm-obs

Lightweight Python SDK for LLM inference logging and observability

Source: GitHub

Product Releases/Updates

5/27 16:25

yb2460/harness-anything

Harness Anything - AI agent control hub: WPS, MS Office, Zotero, Photoshop, 47 CLI commands, 27 academic skills, SVG-to-PPTX

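AI Commentary · Lets AI agents drive WPS, MS Office, and other desktop software from the command line, a route to office automation.

Source: GitHub

Model Releases/Updates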

5/27 15:00

Building self-improving tax agents with Codex

See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows.

AI Commentary · A self-improving tax agent built with Codex, showing automation and accuracy gains in a specialized domain.

Source: OpenAI

Product Releases/Updates

5/27 13:46

withkynam/vibecode-pro-max-kit

Your AI forgets. This remembers. Spec-driven coding harness for vibecoders, product owners, CEOs and real builders — self-improving context memory, 12 agents, 3…

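AI Commentary · Uses structured context memory to address AI forgetting, with 12 cooperating agents, aimed at builders who want efficiency.

Source: GitHub

Product Releases/Updates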

5/27 13:46

withkynam/vibecode-pro-max-kit

Your AI forgets. This remembers. Spec-driven coding harness for vibecoders, product owners, CEOs and real builders — self-improving context memory, 15 agents, 3…

AI Commentary · Builds a self-improving context memory system with 15 agents, aimed at builders focused on efficient coding.

Source: GitHub

Product Releases/Updates

5/27 10:28

WhitzardAgent/AgentGuard

AgentGuard: Zero-Trust Security Foundation for AI Agents

Source: GitHub

Product Releases/Updates

5/27 10:08

nexu-io/html-video

Programmatic video for coding agents — HTML to video on your laptop. Turn HTML, CSS & data into real MP4s with pluggable render engines, 21 templates, AI soundt…

Source: GitHub

Model Releases/Updates

5/27 08:00

Warp’s big bet on building open source with GPT-5.5

Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows.

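AI Commentary · Combines open-source collaboration with frontier models, showing how AI coding tools can coordinate agents across environments.

Source: OpenAI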

Model Releases/Updates

5/27 05:15

NVIDIA Vera CPU Is ‘Packing a Heavy-Hitting Punch’ Against Competition

The shift to agentic AI creates a new CPU requirement for the AI factory: fast cores, massive memory bandwidth and the ability to sustain high performance when all cores are active…

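AI Commentary · NVIDIA's new Vera CPU is optimized for AI factories, targeting fast cores and high memory bandwidth for agentic workloads.

Source: NVIDIA

Research Papers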

5/27 04:00

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Equipping large language models with explicit skills has emerged as a promising paradigm for enabling autonomous agents to solve complex tasks. Agent skills can be inherently divided into general skil…

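AI Commentary · Focuses on jointly internalizing and using skills to improve out-of-distribution generalization in agentic reinforcement learning.

Source: HuggingFace Papers

Research Papers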

5/27 04:00

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Large language model (LLM)-based agents have shown strong capabilities in using external tools to solve complex tasks. However, existing evaluations often overlook the temporal dimension of tool use,…

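AI Commentary · Evaluates asynchronous function calling in multi-task scenarios, covering the temporal dimension of tool use that existing evaluations often overlook.

Source: HuggingFace Papers

Research Papers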

5/27 04:00

OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents

Large language model (LLM) agents are increasingly used to assist with operations research (OR) modeling, yet existing OR-oriented benchmarks often reduce evaluation to one-shot translation from a sel…

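AI Commentary · A full-lifecycle benchmark for industrial optimization agents, going beyond evaluations that reduce the task to one-shot translation.

Source: HuggingFace Papers

Research Papers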

5/27 04:00

LACUNA: Safe Agents as Recursive Program Holes

LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. The runtime owns the loop, context, and control flow, and the…

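AI Commentary · Models agents as recursive program holes to keep them safe and controllable, bridging the split between the runtime and the code the model writes.

Source: HuggingFace Papers

Research Papers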

5/27 04:00

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection

Despite the rapid progress of multimodal large language models in building Graphical User Interface (GUI) agents, their real-world task completion is fundamentally bottlenecked by a lack of world know…

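AI Commentary · Causal internalization and density-aware exemplar reselection target the world-knowledge bottleneck in GUI agents' real-world task completion.

Source: HuggingFace Papers

Research Papers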

5/27 04:00

Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

If an AI agent makes decisions on a person's behalf, those decisions must align with its user. We introduce representational accuracy to measure how faithfully a system captures a person's interpretat…

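AI Commentary · Uses behavioral specification to measure how faithfully an AI system captures a user's interpretation, an actionable evaluation criterion for personalization.

Source: HuggingFace Papers

Research Papers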

5/27 04:00

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

As agent capabilities advance, existing benchmarks, such as τ²-Bench, are becoming increasingly saturated. Yet constructing new benchmark tasks remains complex, costly, and labor-intensive. Moreover,…

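AI Commentary · Aims to improve the coverage and difficulty of agent benchmarks as existing ones saturate, offering a new approach to evaluating agent capabilities.

Source: HuggingFace Papers

Industry News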

5/27 03:50

Millions of AI agents imperiled by critical vulnerability in open source package

"BadHost" was found in Starlette, a package with 325 million weekly downloads.

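AI Commentary · A vulnerability in an open-source package threatens millions of AI agents, exposing a major AI supply-chain security risk.

Source: Ars Technica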

Product Releases/Updates

5/27 03:16

shyftlabs/continuum

Continuum — the agent runtime by ShyftLabs. Build, orchestrate, ship.

AI Commentary · A runtime built for agents that covers building, orchestration, and shipping.

Source: GitHub

Tips & Opinions

5/27 01:57

Technical deep dive: AgentCore payments and innovation in agentic commerce

Amazon Bedrock AgentCore payments is now available in preview, it provides instant payments to paid external services with no manual billing setup per provider, stablecoin support…

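AI Commentary · The Amazon Bedrock AgentCore payments preview lowers the barrier for AI agents to use paid external services and simplifies billing setup.

Source: AWS ML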

Industry News

5/27 01:46

FBI agent explains how easy it is to ID people posting AI porn without consent

A creepy saved post on Instagram linked man to AI porn account, FBI says.

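AI Commentary · An FBI agent explains how easily people posting nonconsensual AI porn can be identified, a reminder of the privacy and enforcement issues involved.

Source: Ars Technica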

Tips & Opinions

5/27 01:41

Build highly scalable serverless LangGraph multi-agent systems in AWS with Amazon Bedrock AgentCore

In this post, we provide a solution to build highly scalable, serverless multi-agent generative AI systems on AWS using LangGraph Agents as orchestrators integrated with Amazon Bed…

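AI Commentary · Uses LangGraph to orchestrate multi-agent systems with serverless scaling on Amazon Bedrock AgentCore, lowering the barrier to deploying AI systems.

Source: AWS ML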

Tips & Opinions

5/27 01:39

Build high-performance generative AI systems with Strands Agents, NVIDIA NIM, and Amazon Bedrock AgentCore

In this post you'll learn how to build a multi-agent campaign review system that demonstrates parallel reasoning, context persistence, and traceable execution paths using an integr…

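AI Commentary · NVIDIA and Amazon technologies combine to demonstrate parallel reasoning and traceable execution in a multi-agent system, a reference architecture for high-performance generative AI.

Source: AWS ML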

Tips & Opinions

5/27 01:22

AgentWatch: Proactive AWS monitoring with ambient agents

In this post, we demonstrate the capabilities of AgentWatch through practical implementation. You will see how the solution performs infrastructure checks every 15 minutes, summari…

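AI Commentary · Ambient agents enable proactive monitoring, showing a practical shift in AI operations from reactive alerts to proactive checks.

Source: AWS ML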

Tips & Opinions

5/26 23:36

Microsoft Copilot Cowork Exfiltrates Files

The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this ca…

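AI Commentary · Microsoft's Copilot Cowork exposes a data exfiltration weakness, a warning that enterprises need to take agentic system defenses seriously.

Source: Simon Willison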

Industry News

5/26 22:54

Rethinking organizational design in the age of agentic AI

Amid rapidly growing adoption of enterprise-level AI agents, there’s a disconnect emerging between ambition and execution. Although 85% of organizations say they want to be agentic…

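AI Commentary · As enterprise AI agents are adopted rapidly, organizational design must evolve in step, or ambition and execution will diverge.

Source: MIT Tech Review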

Product Releases/Updates

5/26 20:45

fancyboi999/ai-engineering-from-scratch-zh

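The most complete learning path for agent engineers · Master AI engineering from scratch · 20 stages, 503 lessons · Full Chinese translation + companion site + animated explainer videos · A guide to becoming an AI agent engineer

Source: GitHub

Product Releases/Updates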

5/26 15:45

biao994/DocPaws

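Engineered RAG document assistant: knowledge base, PDF indexing, agent tool orchestration, scoped retrieval, citation tracing, and refusal thresholds. FastAPI + Vue3

AI Commentary · A template for production-grade RAG, covering engineering practice from retrieval refusal to tool orchestration.

Source: GitHub

Product Releases/Updates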

5/26 04:26

Tejas-TA/predikit

The missing bridge between your ML models and your AI agents.

Source: GitHub

Research Papers

5/26 04:00

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate within persistent social environments alongside other agents. We introduce a Moltbook-style s…

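AI Commentary · LLM agents' ability to protect private information is worrying in multi-agent settings, revealing a blind spot in AI safety evaluation.

Source: HuggingFace Papers

Research Papers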

5/26 04:00

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist model stacks. This raise…

AI Commentary · Online skill distillation aims to make multimodal AI agents more efficient by cutting inference-time computation.

Source: HuggingFace Papers

Research Papers

5/26 04:00

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-h…

Source: HuggingFace Papers

Research Papers

5/26 04:00

SIA: Self Improving AI with Harness & Weight Updates

Humans are the bottleneck in building and improving AI. Both the models and the agents that wrap them are written, tuned, and corrected by people. The long-horizon goal of an AI that can figure out ho…

Source: HuggingFace Papers

Product Releases/Updates

5/25 19:06

UditAkhourii/adhd

ADHD — a skill for coding agents. Tree-of-thought with pruning, built on the Claude & Codex Agent SDK. Fans out parallel divergent thoughts under different cogn…

AI Commentary · Tree-of-thought with pruning for coding agents, fanning out parallel divergent thoughts to handle complex tasks.

Source: GitHub

Product Releases/Updates

5/25 08:05

oleksiijko/pmb

Local-first persistent memory for AI coding agents (Claude Code, Cursor, Codex) via MCP. 94.5% LoCoMo recall@10, 70ms p50, multilingual, zero API keys.

AI Commentary · Local persistent memory for AI coding assistants with high recall and low latency, multilingual and requiring no API keys.

Source: GitHub

Tips & Opinions

5/25 08:00

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

AI Commentary · Getting AI agent terminology right matters; clear concepts are needed to follow how the technology evolves.

Source: HuggingFace Blog

Tips & Opinions

5/25 07:19

datasette-agent 0.1a4

Release: datasette-agent 0.1a4. Taking advantage of the new makeJumpSections() JavaScript plugin hook added in Datasette 1.0a30, datasette-agent now presents this "Start a new agen…

AI Commentary · The new release uses a Datasette plugin hook to improve how a new agent session is started.

Source: Simon Willison

Research Papers

5/25 04:00

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation model…

Source: HuggingFace Papers

Industry News

5/24 20:55

Constraint Decay: The Fragility of LLM Agents in Back End Code Generation

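AI Commentary · LLM agents are fragile at back-end code generation, with stability degrading under complex constraints.

Source: Hacker News

Product Releases/Updates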

5/24 00:16

ongridio/ongrid

An ops AI Agent that understands your infrastructure, finds the root cause, and fixes it — right from Slack, Telegram, Lark or DingTalk.

Source: GitHub

Product Releases/Updates

5/23 14:52

openhackai/OpenHack

Open Source Agentic Security Scanner

Source: GitHub

Product Releases/Updates

5/23 10:44

study8677/awesome-architecture

🧭 Architecture-first system design: 26 bilingual tutorials, 25 architecture templates, and 6 end-to-end cases covering distributed systems, AI-native systems,…

Source: GitHub

Product Releases/Updates

5/22 16:31

leestott/foundry-cicd

Enterprise-ready CI/CD reference for Microsoft Foundry AI agents, with parallel GitHub Actions and Azure DevOps pipelines, evaluation-driven quality gates, and…

AI Commentary · An enterprise CI/CD reference for AI agents with parallel pipelines and evaluation-driven quality gates.

Source: GitHub

Model Releases/Updates

5/22 08:00

OpenAI named a Leader in enterprise coding agents by Gartner

OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.

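AI Commentary · Gartner names OpenAI a Leader for enterprise AI coding agents, recognizing Codex for innovation and enterprise-scale deployment.

Source: OpenAI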

Tips & Opinions

5/22 06:22

Amazon Nova Act is now HIPAA eligible

In this post, you will learn what Nova Act offers, how HIPAA eligibility applies to agentic AI, and how to get started.

AI Commentary · Amazon Nova Act is now HIPAA eligible, lowering a compliance barrier for agentic AI in healthcare.

Source: AWS ML

Research Papers

5/22 04:00

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

Multi-agent LLM workflows route inference through specialized roles to lift end-task accuracy, but jointly training those roles with reinforcement learning is unstable in ways that are poorly understo…

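AI Commentary · Whether multi-agent RL improves LLM workflows depends on trade-offs among workflow design, scale, and policy sharing.

Source: HuggingFace Papers

Research Papers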

5/22 01:00

MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to sup…

AI Commentary · A lightweight agentic system optimized for small models, lowering the barrier to automated workflows.

Source: Microsoft Research

Model Releases/Updates

5/22 00:07

Launch HN: Runtime (YC P26) – Sandboxed coding agents for everyone on a team

Hey HN, We're Gus and Carlos from Runtime ( https://runtm.com ). We're building infra that lets your whole team (including non-engineers) ship with Claude Code, Codex, and other ag…

AI Commentary · Lets non-engineers use AI coding agents in sandboxes, lowering the barrier for whole-team collaboration.

Source: Hacker News

Product Releases/Updates

5/21 21:58

mims-harvard/AutoScientists

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

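AI Commentary · Self-organizing agent teams run long-duration scientific experiments, a step forward for autonomous AI collaboration in research.

Source: GitHub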

Product Releases/Updates

5/21 19:14

wangchuxiaoji-oss/doubao2api

Reverse-engineered Doubao (豆包) API → OpenAI-compatible REST service. Free multimodal chat, image/video/music generation, and file hosting for AI agents.

AI Commentary · Reverse-engineers the Doubao API into an OpenAI-compatible service offering free multimodal features for AI agents.

Source: GitHub

Product Releases/Updates

5/21 11:58

Eynzof/hermes-agent-cn-desktop

Hermes Agent CN desktop app, Windows-First, built with Tauri, Typescript and Rust. Isolated Hermes Agent core insides.

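AI Commentary · A Windows-first desktop app built with Tauri and Rust that isolates the agent core; the stack is worth a look for developers.

Source: GitHub

Product Releases/Updates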

5/21 11:58

Eynzof/Hermes-CN-Desktop

Hermes Agent CN desktop app, Windows-First, built with Tauri, Typescript and Rust. Isolated Hermes Agent core insides.

Source: GitHub

Product Releases/Updates

5/21 03:14

NanoFlow-io/engram

🧠 Hybrid long-term memory plugin for OpenClaw agents — SQLite+FTS5 for structured facts, LanceDB for semantic recall

AI Commentary · Combines structured and semantic memory to give agents a hybrid long-term memory.

Source: GitHub

Product Releases/Updates

5/21 02:52

zhongweiv/hermes-edu-skills

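Chinese education Agent Skill Pack: textbook-aligned study, exam review, photo-based Q&A, mistake review, parent-child study support, reading and writing, and teacher tools. Works directly with Hermes Agent and can be exported to OpenClaw/Codex/Cursor/Claude Code.

AI Commentary · An open-source agent skill pack for Chinese-language education that works with several mainstream AI agent platforms.

Source: GitHub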

Product Releases/Updates

5/21 02:29

qinshihu/itops-agent-platform

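China's first enterprise-grade multi-agent IT operations automation platform: an intelligent operations solution based on large language models. ITOps Agent Platform is an enterprise-grade full-stack operations automation platform that uses visual workflow orchestration to combine multiple AI agents into intelligent operations pipelines, automating server management, alert handling, fault diagnosis, log analysis, script management, and scheduled operations tasks,…

AI Commentary · Billed as China's first enterprise-grade multi-agent IT operations platform, it builds AI-driven automated operations pipelines.

Source: GitHub

Industry News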

5/21 01:31

Buckle up: Google is set to remake search with agentic AI in 2026

Google's AI search evolution is accelerating at I/O 2026.

AI Commentary · Google plans to remake search with agentic AI in 2026, moving AI from a tool toward a proactive assistant.

Source: Ars Technica

Product Releases/Updates

5/20 11:24

VibeBench/VibeSearchBench

🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifia…

AI Commentary · A search benchmark built on vague, multi-turn, proactive tasks that tests whether AI asks clarifying questions.

Source: GitHub

Industry News

5/20 02:11

Gemini 3.5 Flash might be fast enough for gen AI to make sense

Google says its more efficient Gemini 3.5 Flash is the key to your agentic AI future.

AI Commentary · Faster, more efficient inference could make real-time generative AI practical and speed up agent adoption.

Source: Ars Technica

Model Releases/Updates

5/20 01:45

I/O 2026: Welcome to the agentic Gemini era

The latest from Google I/O: See how we’re helping you get more done with Gemini.

AI Commentary · Google I/O showcases Gemini's agentic capabilities, marking a shift of AI from tool to autonomous actor.

Source: Google AI

Product Releases/Updates

5/19 22:01

elvisun/newsjack

The open-source skills that turn your agent into a full PR team.

来源：GitHub

Industry News

5/19 20:23

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments. I built Forge, an open-source reliability layer for self-hosted LLM tool-calling. What it does: - Adds domain-and-too…

AI Take · An open-source approach that lifts an 8B model from 53% to 99% on agentic tasks, showing how effective lightweight guardrails can be.

Source: Hacker News

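Product Launch/Update

5/19 19:59

ather-techie/rag-interview-questions

A comprehensive interview preparation guide covering all major RAG (Retrieval-Augmented Generation) architectures. 50 questions across 10 types, from Naive RAG…

AI Take · Essential reading for RAG architecture interviews: 50 questions across 10 types for systematically mastering the core techniques of retrieval-augmented generation.

Source: GitHub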

Product Launch/Update

5/19 19:59

ather-techie/rag-interview-questions

A comprehensive interview preparation guide covering all major RAG (Retrieval-Augmented Generation) architectures. 140 questions across 12 types, from Naive RAG…

Source: GitHub

Product Launch/Update

5/19 19:59

ather-techie/rag-interview-system

A complete collection of RAG interview questions, answers (286 questions & 18 RAG types), system design scenarios, architecture patterns, and production-ready c…

Source: GitHub

Product Launch/Update

5/19 17:44

langfuse/langfuse-workshop

End-to-end Langfuse workshop using a TypeScript Agent to teach the AI engineering loop: tracing, prompt management, monitoring, datasets, experiments, and evalu…

Source: GitHub

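Product Launch/Update

5/19 07:04

JSingletonAI/dejavu

Memory that follows you across every AI tool. No cloud storage. No account required. Set it up once, use it everywhere.

AI Take · Breaks down memory silos between AI tools: no cloud storage or account needed, and one setup lets you reuse memory across tools.

Source: GitHub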

Industry News

5/19 06:28

NVIDIA CEO Jensen Huang at Dell Technologies World: ‘Demand Is Going Parabolic, Utterly Parabolic’

Agentic AI inference at one-tenth the cost per token with NVIDIA Vera Rubin NVL72. Agent sandboxes run 50% faster on NVIDIA Vera than traditional CPUs — while enterprise data queri…

AI Take · Jensen Huang says AI demand is surging parabolically, and NVIDIA's new architecture cuts cost per token to one-tenth, a significant bellwether for the industry.

Source: NVIDIA

Model Release/Update

5/19 05:48

Vera Arrives: NVIDIA’s First CPU Built for Agents Lands at Top AI Labs

The first NVIDIA Vera CPUs arrived at three of the world's leading AI labs on Friday — Anthropic in San Francisco, OpenAI in Mission Bay, SpaceXAI in Palo Alto — followed by a deli…

AI Take · NVIDIA's first CPU built for AI agents ships directly to top labs and could reshape the AI compute landscape.

Source: NVIDIA

Industry News

5/18 23:40

Show HN: InsForge – Open-source Heroku for coding agents

Hi HN, I'm Hang, cofounder of InsForge (YC P26). InsForge is an open-source Heroku for AI coding agents: a backend platform designed for coding agents to deploy, operate, and debug…

AI Take · An open-source deployment platform that fills a market gap, giving coding agents Heroku-like automated operations and lowering the barrier to development.

Source: Hacker News

Model Release/Update

5/18 18:00

OpenAI and Dell partner to bring Codex to hybrid and on-premise enterprise environments

OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows.

AI Take · OpenAI and Dell team up so enterprises can deploy AI coding agents on-premises, balancing data security with productivity gains.

Source: OpenAI

Industry News

5/17 23:37

Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep

Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something…

Source: Hacker News

Product Launch/Update

5/17 22:18

Second-Inc/second

The factory for custom internal software, purpose-built for human2agent work.

Source: GitHub

Product Launch/Update

5/17 12:33

openthomas-com/openthomas

Cut the cost of your agent fleet without switching agents. Makes Claude Code Dynamic Workflows cheap: the planner stays on Opus, the hundreds of parallel subage…

Source: GitHub

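Product Launch/Update

5/17 01:44

sam-siavoshian/agent-notch

macOS computer-use agent in the notch. Long-press, talk, Claude drives the mouse.

AI Take · Puts an AI agent in the Mac notch: long-press and speak, and Claude drives the mouse directly, a highly inventive interaction model.

Source: GitHub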

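Product Launch/Update

5/16 05:32

DenisSergeevitch/agents-best-practices

Provider-neutral Agent Skill for Codex, Claude Code, and agentic harness design.

AI Take · Unifies agent skill conventions across platforms, lowering the barrier to development and promoting interoperability across the AI agent tooling ecosystem.

Source: GitHub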

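Product Launch/Update

5/15 17:19

husu/loom

An AI agent that writes API documentation. It supports writing API docs in a vibe-coding style and ships with a friendly documentation viewer and an API mock tool.

Source: GitHub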

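Product Launch/Update

5/15 16:50

fangwendongcs/Auto-agent-factory

A production-ready toolkit to accelerate and automate the end-to-end lifecycle of AI Agent development.

AI Take · A one-stop toolkit for AI agent development that lowers the barrier to automated enterprise deployment and speeds up industry adoption.

Source: GitHub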

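Product Launch/Update

5/15 15:08

agentic-in/elephant-agent

Personal-Model First Self Evolving AI Agent 🐘

AI Take · A self-evolving AI agent driven by a personal model, pioneering a new paradigm of autonomous agent iteration.

Source: GitHub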

Product Launch/Update

5/15 14:59

CONSTELLATION-ENGINE/constellation-engine

Most AI agents forget you the moment the tab closes. Constellation Engine gives them a hippocampus — a living star map with spreading activation, Hebbian writeb…

Source: GitHub

Product Launch/Update

5/15 13:00

intellicia-public/parastore

Draw a store, generate LLM personas, and watch them shop — an isometric 3D sandbox for synthetic-consumer experiments.

Source: GitHub

Product Launch/Update

5/14 19:09

Purewhiter/mobilegym

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Eva…

AI Take · Provides a verifiable, highly parallel simulation environment for mobile GUI agent research; running in the browser lowers the barrier to entry.

Source: GitHub

Model Release/Update

5/13 21:00

NVIDIA, Ineffable Intelligence Team Up to Build the Future of Reinforcement Learning Infrastructure

Reinforcement-learning agents — AI systems that learn by trial and error — can convert computation into new knowledge. That’s the focus of a new engineering-level collaboration bet…

AI Take · NVIDIA and Ineffable Intelligence are teaming up on reinforcement learning infrastructure at the engineering level, aiming to accelerate how agents learn by trial and error.

Source: NVIDIA

Model Release/Update

5/13 21:00

Hermes Unlocks Self-Improving AI Agents, Powered by NVIDIA RTX PCs and DGX Spark

Agentic AI is changing the way users get work done. Following the success of OpenClaw, the community is embracing new open source agentic frameworks. The latest is Hermes Agent, wh…

Source: NVIDIA

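Product Launch/Update

5/13 03:31

gi-dellav/zerostack

Minimal coding agent written in Rust, optimized for memory footprint and performance

AI Take · A minimal coding agent built in Rust, focused on memory footprint and performance, charting a path for lightweight AI tools.

Source: GitHub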

Industry News

5/12 23:45

Launch HN: Voker (YC S24) – Analytics for AI Agents

Hey HN, we're Alex and Tyler, co-founders of Voker.ai (https://voker.ai/), an agent analytics platform for AI product teams. Voker gives full visibility into what users are askin…

AI Take · Focuses on analyzing user behavior in AI agents, filling a gap in data insight for AI product teams.

Source: Hacker News

Model Release/Update

5/12 22:40

Co-Scientist: A multi-agent AI partner to accelerate research

Introducing Co-Scientist, a collaborative AI partner built with Gemini to help researchers accelerate scientific breakthroughs.

AI Take · Multi-agent collaboration tackles research bottlenecks, upgrading AI from tool to research partner.

Source: Google DeepMind

Product Launch/Update

5/12 21:07

johunsang/semble_rs

Fast, AI-agent-native code search in Rust — hybrid BM25 + semantic, Tree-sitter AST chunking, dependency & impact analysis. Drop-in replacement for grep/cat/rea…

AI Take · An AI-native code search tool written in Rust that combines BM25 with semantic search, positioned as a faster alternative to traditional grep.

Source: GitHub

Product Launch/Update

5/12 12:14

AzmxAI/azmx

AZMX AI — The sovereign agent platform.

AI Take · A sovereign AI agent platform that pitches a new paradigm for decentralized agents.

Source: GitHub

Product Launch/Update

5/12 06:55

secureagentics/Adrian

Runtime security monitoring and control for AI agents. Catches malicious tool use, prompt injection, and policy drift in real time, before the agent acts.

Source: GitHub

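Product Launch/Update

5/12 05:18

sparkplug604/praxis

Local-first RAG and agent skills framework for source-traceable agent memory.

AI Take · A local-first RAG framework that makes agent memory traceable to its sources, offering a new path for building trustworthy AI applications.

Source: GitHub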

Research Papers

5/12 01:19

SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instruc…

AI Take · AI agents execute tasks competently but fail to consistently advance the user's interests, exposing a core weakness in their social reasoning.

Source: Microsoft Research

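Product Launch/Update

5/11 19:19

juanjuandog/FinSight-AI

AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.

AI Take · Combines resilient workflows with vector retrieval to make the financial research process traceable end to end; the architecture is worth studying.

Source: GitHub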

Product Launch/Update

5/11 17:40

nexu-io/html-anything

✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · da…

AI Take · A local AI agent generates the HTML, covering 75 skills across 9 surfaces and pushing editing efficiency further.

Source: GitHub

Product Launch/Update

5/11 03:42

Kaelio/ktx-ai-data-agents-mcp-context-skills

ktx is an executable context layer for data and analytics agents 🐙 Allow Claude Code, Codex, and any AI agent to query data accurately through MCP with skills,…

Source: GitHub

Product Launch/Update

5/11 03:42

Kaelio/ktx-ai-data-agents-context

ktx is an executable context layer for data and analytics agents 🐙 Allow Claude Code, Codex, and any AI agent to query data accurately through MCP with skills,…

AI Take · Uses MCP connections, like octopus tentacles, to let AI agents query data accurately and lower the barrier to analytics.

Source: GitHub

Product Launch/Update

5/11 03:42

Kaelio/ktx

ktx is an executable context layer for data and analytics agents 🐙 Allow Claude Code, Codex, or any other AI agent to query data accurately and with full conte…

AI Take · Uses an MCP skills layer to let AI agents query data accurately, bridging the execution gap between code and analytics.

Source: GitHub

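Product Launch/Update

5/10 20:29

namphuongtran/awesome-ai-coding-agent-tools

A curated list of tools, libraries, MCP servers, and frameworks that power AI coding agents.

AI Take · A survey of the full toolchain ecosystem for AI coding agents, a practical resource guide for developers and researchers.

Source: GitHub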

Product Launch/Update

5/10 09:47

LichAmnesia/openseek

OpenSeek - 广度求索: open-source TUI coding agent with multi-provider routing, MCP, LSP, and Plan/Agent/YOLO modes.

AI Take · An open-source TUI coding agent that integrates multi-provider routing and MCP; its working modes are worth a look.

Source: GitHub

Product Launch/Update

5/10 02:10

beltromatti/get-it

Read it. See it. Get it. Built at GDG AI Hack Milan 2026 for "Learn Different" track.

Source: GitHub

Product Launch/Update

5/10 01:49

byte5ai/omadia

Self-hostable agentic OS — build, run & audit multi-agent AI teams from signed plugins. Bring your own LLM key, own all your data, EU/GDPR-ready.

Source: GitHub

Product Launch/Update

5/9 18:22

recomby-ai/recomby-geo

Open-source GEO AI-employee solution (MIT). GEO Skills package + curated lists of agents and office CLIs that make up the AI-employee stack.

AI Take · An open-source AI-employee solution for GEO that provides a complete skills package and toolchain, lowering the barrier to enterprise deployment.

Source: GitHub

Product Launch/Update

5/9 14:16

tophant-ai/promptbeat

Break your AI before they do.

AI Take · Tests AI weaknesses with adversarial attacks; security evaluation tools are becoming a necessity.

Source: GitHub

Product Launch/Update

5/9 12:46

ngaut/agent-git-service

Reimplement GitHub for Agents.

AI Take · Reimplements a GitHub-style service as dedicated code-collaboration infrastructure for AI agents.

Source: GitHub

Product Launch/Update

5/9 12:39

sno-ai/llmix

Production LLM call layer for AI agents and tools: keep OpenAI/Anthropic/AI SDK/LiteLLM, hot-swap models with MDA presets, and add cache, retries, circuit break…

AI Take · A unified call layer for AI agents with model hot-swapping and fault tolerance, aimed at improving stability in production.

Source: GitHub

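Product Launch/Update

5/9 02:41

finewood2008/centaur-loop

Centaur Loop (半人马环): an open-source workbench for AI agent feedback loops, human governance, and memory review / Human-governed AI feedback loop workbench.

AI Take · An open-source AI governance workbench that integrates human feedback loops with memory review, addressing the need for agents that stay controllable over long-running interaction.

Source: GitHub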

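Product Launch/Update

5/9 02:41

finewood2008/centaurloop

Centaur Loop (半人马环): a framework for the smallest unit of work for AI employees. It breaks complex roles down into looping workflows that AI can take over, humans govern, and feedback and memory continuously improve / The smallest work unit for building AI employees.

AI Take · An open-source AI governance tool that fills a gap in closed-loop agent management, combining human oversight with memory review; highly practical.

Source: GitHub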

Product Launch/Update

5/8 18:03

stormzhang/token-tracker

Track token usage across local AI agents (Claude Code, Codex) — Custom StatusLine, CLI Dashboard with cost analysis, rate limit monitoring, and session tracking

AI Take · A lightweight tool that addresses the pain of tracking token usage for local AI agents, covering both cost and rate-limit monitoring.

Source: GitHub

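Product Launch/Update

5/8 15:37

haydenbleasel/files-sdk

A unified storage SDK for object and blob backends. One small, honest API. Web-standards I/O.

AI Take · A unified object-storage interface that simplifies switching between backends and improves developer productivity; a practical cloud-native tool.

Source: GitHub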

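Product Launch/Update

5/8 14:57

volcengine/SearchCLI

Open CLI for integrating AI search, recommendation, and conversational retrieval into agent systems and business systems

AI Take · Integrates AI search, recommendation, and conversational retrieval into existing systems, simplifying development.

Source: GitHub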

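Product Launch/Update

5/8 07:22

zendev-sh/zenflow

Multi-agent orchestration & workflow engine. Declarative YAML workflows, LLM coordinator with hub-and-spoke mailboxes, race-safe delivery. One YAML file, one Go…

AI Take · Orchestrates multi-agent workflows in declarative YAML, with an LLM coordinator and race-safe message delivery, lowering the barrier to development.

Source: GitHub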

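Product Launch/Update

5/7 22:18

freestylefly/wesight

Open-source desktop AI agent workspace with one-click Claude Code, Codex, OpenClaw, Hermes Agent setup and custom LLM model routing.

AI Take · An open-source desktop AI workspace with one-click setup for multiple agents, lowering the barrier to agent development and encouraging custom tooling.

Source: GitHub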

Research Papers NEW

5/7 04:00

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL

Recent growth in reinforcement learning (RL) has surfaced a need for diverse, specialized training environments. Hand-curated environments with fixed task and reward difficulties become ineffective si…

Source: HuggingFace Papers

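Product Launch/Update

5/7 01:43

opensquilla/opensquilla

OpenSquilla — Token-Efficient AI Agent with same budget, higher intelligence density

AI Take · Aims for higher intelligence density on the same token budget; an open-source agent efficiency effort worth watching.

Source: GitHub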

Product Launch/Update

5/6 19:12

OpenOSINT/OpenOSINT

AI-powered OSINT agent with interactive REPL, MCP server, and CLI. 9 tools. Works with Claude, GPT-4, or local models. For authorized security research only.

Source: GitHub

Product Launch/Update

5/6 19:12

OpenOSINT/OpenOSINT

AI-powered OSINT agent with interactive REPL, MCP server, and CLI. 16 tools. Works with Claude, GPT-4, or local models. For authorized security research only.

AI Take · An open-source, AI-powered OSINT tool that combines an interactive command line with multi-model support for efficient intelligence analysis in security research.

Source: GitHub

Model Release/Update

5/6 18:43

AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields

Explore how AlphaEvolve's Gemini-powered algorithms are driving impact across business, infrastructure, and science.

AI Take · A Gemini-powered coding agent deployed across fields, marking AI's move from the lab to industrial-scale use.

Source: Google DeepMind

Product Launch/Update

5/6 05:08

NirDiamant/Agent_Memory_Techniques

Agent memory for LLMs: 30 runnable Jupyter notebooks covering conversation buffers, vector stores, knowledge graphs, episodic and semantic memory, MemGPT, Mem0,…

AI Take · Thirty runnable notebooks systematically walk through LLM memory mechanisms, from fundamentals to the frontier, with high hands-on value.

Source: GitHub

Industry News

5/6 05:00

Games people — and machines — play: Untangling strategic reasoning to advance AI

Assistant Professor Gabriele Farina mines the foundations of decision-making in complex multi-agent scenarios.

AI Take · Approaches machine strategic reasoning through game theory and multi-agent decision-making, pushing the boundaries of general AI capability.

Source: MIT News

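Product Launch/Update

5/6 01:55

agynio/platform

Agyn is an open-source Kubernetes-native runtime that moves AI agents like Claude Code and Codex from laptops to company infrastructure with the controls enterp…

AI Take · An open-source, Kubernetes-native runtime that lets enterprises host AI agents securely, filling the gap between personal tools and platform-level deployment.

Source: GitHub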

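Product Launch/Update

5/6 00:30

yuc16/PatentRadar

An automated patent-infringement competitive analysis system: enter a patent publication number and, within 1 hour, get a claim chart report that a lawyer can review (feature-by-feature comparison, evidence URLs, and suggested next steps). It is also packaged as a skill that any agent can call.

AI Take · Automates patent infringement analysis, producing a lawyer-reviewable report in 1 hour and speeding up IP due diligence.

Source: GitHub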

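Product Launch/Update

5/4 21:43

ybuild-ai/ai-game-art-pipeline-skill

Agent skill for turning AI images and videos into playable game art assets

AI Take · Focuses on automating the conversion of AI-generated images into game assets, lowering the barrier to game development.

Source: GitHub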

Product Launch/Update

5/4 17:14

jmerelnyc/Photo-agents

Autonomous self-evolving agents. Vision-grounded layered memory and self-written skills for LLM agents that operate your computer.

AI Take · Self-evolving agents with vision-grounded memory that learn to operate a computer, going beyond fixed instructions.

Source: GitHub

Product Launch/Update

5/3 19:04

shawn0728/OpenSearch-VL

🔍 OpenSearch-VL provides a fully open recipe for training strong multimodal deep search agents through high-quality data curation, diverse visual/search tools,…

Source: GitHub

Product Launch/Update

5/3 16:16

pingchesu/hermes-curator-evolver

Evidence-driven skill evolution for Hermes Agent — reports, dry-run proposals, candidate search, and guarded apply

Source: GitHub

Product Launch/Update

5/3 14:16

SepidehHosseinian/AIEngineeringRoadmap

A roadmap to AI Engineering excellence: Masterclass in Generative AI, RAG, and Agentic Systems with a focus on scalable and production-ready architectures. 🚀🤖

Source: GitHub

Product Launch/Update

5/3 00:46

placet-io/facio

A proactive AI agent for secure, traceable, human-in-the-loop task execution over long-running workflows.

Source: GitHub

Product Launch/Update

5/2 19:03

pouyahasanamreji/continuum

Shared memory + orchestration for your coding agents — one MCP server, persistent vector memory, agent registry

Source: GitHub

Product Launch/Update

5/1 08:10

raindrop-ai/workshop

Give your coding agent the power to write and run agent evals.

Source: GitHub

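Model Release/Update

4/28 23:58

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

AI Take · NVIDIA's new model handles documents, audio, and video in one system with long-context multimodal understanding, and is positioned to power the next generation of AI agent applications.

Source: HuggingFace Blog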

Industry News

4/27 20:35

Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

Scored 65.2% vs google's official 47.8%, and the existing top closed source model Junie CLI's 64.3%. Since there are a lot of reports of deliberate cheating on TerminalBench 2.0 la…

Source: Hacker News

Model Release/Update

4/24 08:00

DeepSeek-V4: a million-token context that agents can actually use

Source: HuggingFace Blog

Product Launch/Update

4/23 00:25

Show HN: Agent Vault – Open-source credential proxy and vault for agents

Hey HN! Today we're launching Agent Vault - an open source HTTP credential proxy and vault for AI agents. Repo is at https://github.com/Infisical/agent-vault , and there's an in-de…

Source: Hacker News

Research Papers

4/8 04:00

Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking

Web agents act through long interaction sequences, yet existing benchmarks evaluate only terminal success, discarding all process information and offering little guidance on improvement. In this work,…

Source: HuggingFace Papers

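Tips & Opinions

4/4 19:45

Components of A Coding Agent

How coding agents use tools, memory, and repo context to make LLMs work better in practice

AI Take · Breaks down three core components of a coding agent (tools, memory, and repo context), offering a practical framework for putting LLMs to work.

Source: Sebastian Raschka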

Industry News

1/19 22:00

Claude Code costs up to $200 a month. Goose does the same thing for free.

The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code autonomou…

AI Take · Claude Code is expensive and Goose is a free alternative; a price war among AI coding tools has begun.

Source: VentureBeat

Product Launch/Update

1/13 21:00

Salesforce rolls out new Slackbot AI agent as it battles Microsoft and Google in workplace AI

Salesforce on Tuesday launched an entirely rebuilt version of Slackbot, the company's workplace assistant, transforming it from a simple notification tool into what executives des…

AI Take · Salesforce upgrades Slackbot into an AI agent, intensifying enterprise AI competition with Microsoft and Google.

Source: VentureBeat

Model Release/Update

1/12 19:30

Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required

Anthropic released Cowork on Monday, a new AI agent capability that extends the power of its wildly successful Claude Code tool to non-technical users — and according to company in…

AI Take · An AI agent tool for non-technical users that removes the need to code and extends office automation to more use cases.

Source: VentureBeat

Tips & Opinions

11/28 08:00

Reward Hacking in Reinforcement Learning

Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completin…

AI Take · Reinforcement learning agents readily exploit loopholes, highlighting a core AI safety challenge that bears on reliability in real tasks.

Source: Lilian Weng

Tips & Opinions

6/23 08:00

LLM Powered Autonomous Agents

Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as ins…

AI Take · Shows how LLMs act as the core controller, moving autonomous agents from concept toward practical use, a milestone for AI applications.

Source: Lilian Weng