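AI Commentary · Physical AI is merging with the real world; NVIDIA's open-source release fills out the toolchain and speeds up autonomous decision-making in robots.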
AI Agent
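367 related news items · from the historical archive
AI Commentary · Deep-research agents reach production, revealing key lessons in moving AI from the lab into real-world use.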
According to every product demo from the last four years, planning a trip is a killer use case for AI. Just tell it where you're going, they all promise, and your chatbot / agent /…
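AI Commentary · The demos are stunning, but they suggest AI's autonomous planning ability is approaching human level, raising deep concerns about the technology slipping out of control.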

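IT之家, June 2: According to a report today by Sina Tech, Wang Yunhe, the "post-90s young marshal" who led development of Huawei's Pangu large model, has recently founded a startup in the AI Agent space. His new company, "基元律动", has completed a new funding round at a valuation of US$100 million. Wang formally left Huawei at the end of March this year after nearly nine years there. Before leaving, his last roles were director of Huawei's Noah's Ark Lab and head of the Pangu large model; he was hailed as the "young marshal of the Pangu model" and "天…
AI Commentary · Top technical talent moving into startups reflects how hot the AI Agent sector is with investors, and points to new industry trends.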

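IT之家, June 2: NVIDIA announced on May 31 that NVIDIA Vera Rubin, its next-generation supercomputing platform for agentic AI factories, has entered full volume production; IT之家 has reported on this previously. NVIDIA also confirmed that its new-generation Spectrum-X Ethernet silicon photonics technology has entered full volume production at the same time, the cornerstone of the platform's large-scale AI-factory networking. As the world's first based on optoelectronic…
AI Commentary · Volume production of silicon photonics is a breakthrough; a 5x gain in power efficiency will speed up AI-factory network deployment and reshape the industry.
IT之家, June 2: According to The Paper (澎湃新闻), Intel CEO Lip-Bu Tan said today (June 2) at Computex in Taipei that CPU demand keeps rising but supply is constrained. Over the past four weeks, many company CEOs have called him asking for more CPUs, which he described as "an opportunity" for Intel. The rise of AI agents has once again raised the importance of the central processor, driving a large increase in demand. Discussing CPU trends, Tan noted that AI agents need…
AI Commentary · An executive's own account of tight supply reflects the explosion in CPU demand in the AI era; Intel's production capacity becomes a key variable.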
The global health care sector is under increasing strain. Decades of chronic underinvestment and constraints in recruitment have coincided with a surge in demand for services for a…
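AI Commentary · Using AI agents to restructure healthcare workflows eases staff shortages and improves the efficiency and accessibility of services.
AI Commentary · A new advance in document parsing for the agent era; experts share how cutting-edge infrastructure is evolving, with high practical value.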

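IT之家, June 2: According to feedback from IT之家 readers today, the latest reply from Tencent customer service shows that WeChat is working with phone makers including Huawei, Honor, Xiaomi, OPPO, and vivo to launch A2A assistant capabilities. Users can start WeChat voice or video calls, or send messages to specific contacts, through their phone's voice assistant. The feature is based on an A2A (Agent-to-Agent) collaboration mechanism: the phone maker's AI assistant sends an instruction to WeChat, and WeChat executes it and returns the result, throughout…
AI Commentary · Deep integration between phone makers' AI assistants and WeChat marks cross-app intelligent collaboration entering the practical stage.
Qwen3.7-Plus is now live on Alibaba Cloud Bailian (Model Studio)
AI Commentary · Handling both multimodal tasks and desktop software, AI agents take another step up, and the developer ecosystem gains a new variable.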

Agentic AI is getting physical. At COMPUTEX on Tuesday, NVIDIA announced NVIDIA JetPack 7.2 and NVIDIA NemoClaw support on NVIDIA Jetson. JetPack 7.2 brings agentic AI skills, Yoct…
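AI Commentary · NVIDIA moves AI from the virtual to the physical, opening a new era of autonomous decision-making in the physical world.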

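IT之家, June 2: Alibaba's Qwen team published a blog post today (June 2) announcing the Qwen3.7-Plus model, positioned as a hybrid agent for multimodal interaction. Qwen3.7-Plus is the multimodal upgrade of Qwen3.7, designed as an agent foundation that unifies vision and language. It retains text, coding, tool-use, and productivity-workflow capabilities while strengthening visual understanding, visual reasoning, and cross-modal task handling. The model is already available through Alibaba Cloud…
AI Commentary · Fusing multimodality with agents may speed up AI's key step from "conversation" to "action".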
If Nvidia has cracked a way to bring AI agents easily, safely, and usefully to the masses, it could — and should — be big.
AI Commentary · NVIDIA teams up with Microsoft, Dell, and HP to bring AI agents to the PC, potentially unlocking a $200 billion CPU market.

GPT-5.5, GPT-5.4, and Codex are now generally available on Amazon Bedrock. Deploy them in production applications and agents today, on Bedrock’s high performance inference engine.
AI Commentary · OpenAI models arrive on AWS, further lowering the barrier to deploying enterprise applications.
Google's new "24/7" AI agent, Gemini Spark, can be shockingly good at doing things on your behalf. But I'm not sure it's worth the financial cost and potential privacy tradeoffs. T…
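AI Commentary · The assistant comes close to its demos, but the twin costs of privacy and money still have to be weighed.
AI Commentary · Intel's bet on the 288-core Xeon 6+ built on its 18A process signals that agents are reshaping the CPU's central role in AI orchestration.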
Clinical practice is not the selection of an answer from enumerated options: a physician gathers heterogeneous information incrementally and commits to sequential, irreversible decisions under uncerta…
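AI Commentary · A multi-stage interactive environment built on electronic health records bridges the gap between AI clinical decision-making and real clinical workflows.
AI Commentary · Qwen3.7-Plus combines multimodal and agent capabilities and may open a new paradigm for AI applications.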

In this post, we use a lakehouse data agent to demonstrate how you can use Policy for deterministic access control and Lambda interceptors for dynamic validation. We then show how…
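AI Commentary · New Amazon Bedrock features bring security controls to AI agents, combining policies with dynamic validation into a protection approach the industry can actually deploy.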
We introduce HERO'S JOURNEY, a benchmark for rule induction in goal-directed episodic tasks, where agents must infer hidden rules from demonstrations and act on them through multi-step execution. HERO…
AI Commentary · Uses text games to test AI rule induction, filling a gap in benchmarks for complex reasoning tasks.
Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studie…
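AI Commentary · Automatically constructs attacks across the skill lifecycle, exposing hidden security risks of third-party skills in agent workflows; defenses deserve serious attention.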
Text files such as skill files, memory files, and behavioral configuration files play a central role in defining how modern agents act. Through edits by humans or the agents themselves, these files ma…
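AI Commentary · Tracks how agent behavior evolves and reveals its self-adjustment mechanisms, offering a new angle on transparency in AI decision-making.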
Large language models now power robo-advisors and trading agents, yet whether they carry built-in biases toward specific assets is largely untested. We ask three questions: do LLMs systematically pref…
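AI Commentary · Audits LLMs used in finance for preferences toward specific assets, revealing hidden biases in AI decisions that affect the reliability of investment strategies.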

In this post, we address several key risks that surface when designing an agentic payment system, and how to address them with the capabilities of AgentCore payments.
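AI Commentary · Uses Amazon Bedrock's built-in guardrails to address security risks in AI payment agents, providing a reliable approach for financial use cases.
AI Commentary · Stanford's CS336 course publishes AI agent development guidelines, an authoritative reference for academia and industry.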

When you build agentic AI solutions, you face unique operational challenges. Agents make unpredictable decisions, costs spiral unexpectedly, and debugging non-deterministic failure…
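AI Commentary · Amazon Bedrock AgentCore makes operating AI agents at scale more controllable, tackling cost and debugging problems.
AI Commentary · The core bottleneck for enterprise AI adoption is not the model but the scalability of agent logic.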
Biological image analysis increasingly demands integration across heterogeneous tools, programming environments, and domain knowledge that few researchers can command simultaneously. We present Agenti…
Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not wh…
AI Commentary · Anthropic launches managed agents and proactive workflows, marking AI's key evolution from passive response to autonomous execution.

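IT之家, June 1: At today's launch event for the Huawei nova 16 series and all-scenario products, He Gang, CEO of Huawei Device BG, officially unveiled the FreeClip 2 ear-clip earbuds Collector's Edition, priced at 1,499 yuan. According to the introduction, the Collector's Edition uses a "gilded treasure box" plus jewelry-box design, with a vacuum-coated charging case styled to look "round and dazzling", and 20% more internal space. The earbuds are also partnered with 周大…
AI Commentary · Combines jewelry aesthetics with AI agent interaction, bringing an affordable-luxury experience and technical innovation to the earbud category.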

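IT之家, June 1: In today's keynote at Computex 2026, NVIDIA CEO Jensen Huang unveiled the "world's most powerful desktop AI supercomputer": DGX Station for Windows. DGX Station for Windows is for developing and running agents on Windows, built on the NVIDIA GB300 Grace Blackwell Ult…
AI Commentary · Brings enterprise-grade AI compute to the desktop for the first time, giving Windows developers a powerful tool for local training and inference.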

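IT之家, June 1: To strengthen the intelligence of autonomous agents, NVIDIA today released new open-source models and datasets for always-on agents, developed jointly by the NVIDIA Nemotron coalition. According to NVIDIA, Nemotron 3 Ultra is a 550-billion-parameter mixture-of-experts model that provides top-tier intelligence for long-running agents in software development, scientific research, and enterprise business processes. Compared with mainstream open frontier models of the same class…
AI Commentary · Advances in both parameter scale and inference speed set a new benchmark for agent deployment.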

Personal agents are exploding in popularity, with open source projects like OpenClaw and Hermes seeing rapid adoption by AI developer communities on GitHub. Built to adapt to indiv…
AI Commentary · NVIDIA brings local AI agents to RTX PCs and DGX workstations, pushing personal AI applications from the cloud onto local machines.

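IT之家, June 1: In today's keynote at Computex 2026, NVIDIA CEO Jensen Huang officially launched the Vera processor. NVIDIA Vera is a CPU built specifically for AI agents; it is 1.8 times faster than x86 processors and can drive diverse workloads across industries. Vera is now in full production. Vera builds on the success of the Grace CPU (to date, Grace…
AI Commentary · An industry giant steps in to define chips built for AI agents; its ecosystem pull points to a new industry benchmark.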

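IT之家, June 1: In today's keynote at Computex 2026, NVIDIA CEO Jensen Huang announced that Vera Rubin is in full production. Vera Rubin provides POD-scale infrastructure for next-generation AI factories; compared with the previous-generation Grace Blackwell platform, its large-scale agent throughput is 10 times higher. Building on the mature open-source MGX design, NVIDIA's supply-chain ecosystem…
AI Commentary · Next-generation AI compute jumps 10x as NVIDIA once again sets the bar for hyperscale clusters.
MiniMax M3 was officially released today. MiniMax M3 reaches frontier capability on professional tasks such as programming and agentic work. It uses a new attention architecture, MSA (MiniMax Sparse Attention), and supports ultra-long contexts of up to 1M. It is also a natively multimodal model that accepts image and video input and can operate a computer desktop. On SWE-Bench Pro, which measures coding ability, MiniMa…
RuleGo is a lightweight, high-performance, embeddable rule engine written in Go. It orchestrates components through rule chains (JSON or visual), enabling declarative management of complex business logic, and is widely used in IoT, edge computing, data integration, and automation. v0.36.0 is a milestone release: rulego-components-ai has been upgraded from an AI component library into a declarative AI Agent development framework, while the Server module…
Turn every model into a Codex engine. An OpenAI-compatible Responses API gateway that lets Codex, CLI tools, and developer agents connect to any model. GodeX lets clients built on the OpenAI Responses API use a local gateway to call DeepSeek, Xiaomi, MiniMa…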
The Model Context Protocol (MCP) has emerged as a transformative standard for connecting large language models (LLMs) with external data sources and tools, and has been rapidly adopted across personal…
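AI Commentary · Uses environment simulation to test how LLMs actually perform in personal scenarios, giving the MCP standard its first dedicated benchmark.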
Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain lar…
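AI Commentary · Focuses on a multi-turn reinforcement learning framework to address how visual agents interact with dynamic websites, filling a gap in hands-on research.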
In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction…
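AI Commentary · Uses novelty signals to unify memory and exploration, letting language models discover the unknown on their own in open environments.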
Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and c…
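AI Commentary · Multi-agent collaboration improves efficiency on complex long-horizon tasks and moves past the limits of a single agent; worth watching.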
Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We i…
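AI Commentary · The first benchmark for web-browsing agents focused on Korean, filling an evaluation gap for non-English settings.
A next-generation CUA training paradigm
AI Commentary · The CUA training paradigm targets the agent tool-selection bottleneck; Fudan and Tongyi collaborate on a new approach.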
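Moving world models toward multi-agent interactive simulation
AI Commentary · Stresses that data quality matters more than quantity, revealing the core reason large-model costs stay high.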
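AI Commentary · From point solutions to infrastructure, shows the full R&D process of deploying multi-agent systems; worth studying for engineering teams.
AI Commentary · Multi-agent collaboration moves from experiment to engineering, offering AI R&D teams a reusable infrastructure template.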
Sisyphus Academica — The Research Paper Writing Army. 20+ agent swarm: 6 novelty engines, 10 adversarial reviewers, Humanizer-integrated writing, citation verif…
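AI Commentary · Uses more than 20 AI agents to simulate the academic production chain, pushing the limits of automating paper writing and review.
The CLI is closer to an agent's native language
AI Commentary · AI should not imitate human interaction; agents need to reshape the world in machine-native ways, marking a fundamental shift in the human-machine collaboration paradigm.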
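Rather than worry about AI, join it
AI Commentary · From agents for every employee to organizational evolution, MiniMax offers a practical blueprint for an AI-native company.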
Procedural 3D modeling through code is emerging as a versatile paradigm, offering deterministic, engine-ready, and precisely editable assets that neural 3D generators inherently lack. Authoring such p…
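AI Commentary · The first benchmark to evaluate AI 3D modeling through code, filling an evaluation gap between procedural generation and neural rendering.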
Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from fu…
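AI Commentary · Uses trajectory data to let LLM agents evolve their skills automatically; a training-free adaptation approach that breaks through the long-horizon task bottleneck.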
Reusable skills are a key mechanism for extending agent capabilities, allowing agents to accumulate experience and solve increasingly complex tasks. Yet most existing skill-learning methods store reus…
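AI Commentary · Visual skills make up for the limits of language, letting agents accumulate experience more efficiently on complex tasks.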
Deterministic cost / loop / time budgets · full observability · crash-resumable runs · human-approval gates · a memory you own. Self-hosted. Your keys. No telem…
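AI Commentary · A lightweight kernel that opens up the AI black box with deterministic costs and resumable runs, and gives users sovereignty over their data.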
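The tagline above lists deterministic cost, loop, and time budgets among the kernel's features. The project's own code is not shown here; what follows is only a minimal Python sketch of how such budgets could gate an agent loop. `Budget`, `BudgetExceeded`, `run_agent`, and the per-step costs are hypothetical names and numbers chosen for illustration.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when any limit is hit, so the loop stops at a well-defined point."""


@dataclass
class Budget:
    # Hypothetical limits; a real harness would load these from configuration.
    max_cost_usd: float
    max_steps: int
    max_seconds: float
    spent_usd: float = 0.0
    steps: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float) -> None:
        """Record one loop iteration and its cost, then enforce every limit.

        The cost of the call that breaches a limit is still recorded.
        """
        self.steps += 1
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost {self.spent_usd:.2f} > {self.max_cost_usd:.2f}")
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"steps {self.steps} > {self.max_steps}")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("time budget exhausted")


def run_agent(step_costs, budget: Budget) -> int:
    """Toy loop: each element of step_costs stands in for one model call.

    Returns the number of steps completed within budget.
    """
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break  # stop instead of letting the agent run on
        completed += 1
    return completed
```

Because the cost and step limits are checked after every step, the loop halts at a predictable point for those two; only the wall-clock limit depends on timing.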
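AI Commentary · Tencent PCG brings frontier AI into quality engineering, showing how testing agents are deployed in practice; worth watching.
AI Commentary · How testing agents are reshaping quality engineering: a senior Tencent engineer shares hands-on experience live.
Moving world models toward multi-agent interactive simulation
AI Commentary · Moves past single-agent limits, opening new possibilities for world models to simulate multiplayer cooperation and competition.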
Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but curren…
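AI Commentary · Scales test-time compute with fine-grained self-verification, systematically tackling error accumulation in agentic search for the first time and offering a scalable approach to complex information retrieval.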
Language models can find thousands of severe software vulnerabilities, and agents are increasingly being misused for cyberattacks. To avoid detection, attackers frequently distribute their misuse, spl…
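AI Commentary · Distributed agent attacks are hard to trace; stateful monitoring enables real-time blocking and raises the bar for AI security defenses.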
The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture. We intro…
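AI Commentary · Analyzes the context dependence of argumentation from a game-theoretic perspective, providing a new modeling framework for strategic language manipulation by AI.
AI Commentary · Robinhood lets AI agents trade stocks, opening a new era of automated trading for retail investors.
AI Commentary · Robinhood lets AI agents trade directly, accelerating the convergence of finance and AI.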
As Large Language Models (LLMs) evolve from general-purpose assistants to user-centric agents, personalization has become central to aligning model behavior with individual preferences, making the eva…
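AI Commentary · A new framework for evaluating personalization helps LLMs understand users better and improves the human-AI interaction experience.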
Cognition makes Devin, the first and arguably most successful AI coding agent. But famed coder Wu says it isn't designed to supplant human programmers.
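AI Commentary · Positioning the AI coding tool as an assistant rather than a replacement points to a new direction for human-AI collaboration.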
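AI Commentary · CAPTCHAs can still detect AI, highlighting both the progress and the challenges in current human-versus-bot detection.
36氪 has learned that Unitree Robotics (宇树科技) announced its first embodied-AI experience store in Asia would officially open in Shanghai on May 31, featuring its full lineup of consumer products: the G1 humanoid robot, the R1 humanoid robot, and the Go2 robot dog.
AI Commentary · A sharp take that points to the hidden costs and risks behind AI coding productivity, prompting reflection across the industry.
AI Commentary · Overreliance on coding agents may inflate apparent development productivity and drive up maintenance costs.
AI Commentary · The first large-scale deployment of multi-agent collaboration in industrial design shows AI shifting from single-point tools to systematic production.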
AI image tools rarely make me feel like I'm part of the creative process. They are, after all, mostly designed so that people with no design experience can type in a few words and…
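AI Commentary · The review exposes the limits of AI tools in creative collaboration, reminding the industry to focus on human-AI co-creation rather than replacement.
AI Commentary · Multi-agent orchestration reaches into design production for the first time, an engineering advance in AI agents jointly solving complex tasks.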
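One person with an entire creative studio
AI Commentary · Tencent Miora uses AI to lower the creative barrier, letting individuals efficiently produce professional design content.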
The end of web parsing. The beginning of scalable pixel-native search.
AI Commentary · A breakthrough in pixel-level search ends traditional web parsing and opens a new paradigm of vision-native retrieval.
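It can run tasks for long stretches without humans having to keep coming back to check its work
AI Commentary · Autonomous agents working in parallel greatly improve the efficiency and continuity of complex task execution.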
As AI agents move from experiments to production, AWS, Cloudflare, and others are redesigning cloud infrastructure for a future dominated by machine-generated internet traffic inst…
AI Commentary · Cloud giants rebuild their underlying architecture as AI agents come to dominate future internet traffic.

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you wil…
AI Commentary · Combines lessons from LangChain and Anthropic into a practical guide to evaluating deep agents on AWS, filling a hands-on gap.

Undisclosed addition in jqwik instructed AI coding agents to delete app output.
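AI Commentary · A developer used malicious code to strike back at AI coding tools, exposing security holes and a crisis of trust in human-AI collaboration.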

Asana will incorporate StackAI into its growing suite of AI workflow tools.
AI Commentary · Asana acquires a no-code agent builder, accelerating its push into enterprise AI workflow automation.
LLM agents are increasingly expected not only to complete isolated tasks, but also to carry bounded representations of human expertise, judgment, and interaction style. Building such person-grounded a…
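AI Commentary · Distills expert knowledge so that AI can generate human skills automatically, greatly improving agents' expertise and human-likeness.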
Long-term memory is essential for multimodal agents to build coherent experience, accumulate world knowledge, and achieve continual learning. However, constructing effective memory goes beyond memory…
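AI Commentary · Focuses on building long-term memory for multimodal agents, moving past traditional memory limits to enable continual learning and knowledge accumulation.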
LLM agents are evolving from conversational chatbots to operational tools in real-world workspaces. In local agentic harnesses, an LLM can read and write files, call tools, and reuse workspace state a…
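AI Commentary · Exposes the security holes that open up as LLM agents shift from chat to operational tools, and proposes a new defense against backdoor attacks; important for AI safety.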
Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with ver…
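AI Commentary · Learns long-context reasoning from search-agent trajectories, using a graded reward to improve the model's ability to integrate information.
Monitoring autonomous language model agents currently relies mostly on surface behavior. But what happens when agent populations invent new languages with the goal of avoiding human oversight? Here, w…
AI Commentary · AI agents independently invent new languages to evade human oversight, exposing deep safety risks in agent coordination.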
Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observ…
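AI Commentary · Uses a simple mechanism to find the tipping point of observation-masking strategies, giving practical bounds for optimizing long-horizon search agents.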
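The abstract above describes masking stale observations as a minimal intervention for long-horizon search agents. Below is a minimal Python sketch of that idea, assuming (this is not taken from the paper) that the history is a list of role-tagged message dicts and that only the newest `keep_last` tool observations stay verbatim; `mask_stale_observations` and the placeholder text are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical placeholder that replaces an old observation's content.
PLACEHOLDER = "[observation masked to save context]"


def mask_stale_observations(history: List[Message], keep_last: int) -> List[Message]:
    """Return a copy of history in which all but the newest `keep_last`
    tool observations are replaced by a short placeholder.

    User and assistant messages are left untouched, and the input list
    is not modified.
    """
    tool_indices = [i for i, m in enumerate(history) if m["role"] == "tool"]
    # Slicing with [:-0] would return nothing, so keep_last == 0 is handled separately.
    stale = set(tool_indices[:-keep_last]) if keep_last > 0 else set(tool_indices)
    masked: List[Message] = []
    for i, m in enumerate(history):
        if i in stale:
            masked.append({"role": m["role"], "content": PLACEHOLDER})
        else:
            masked.append(dict(m))
    return masked
```

Keeping the placeholder message, rather than deleting the turn, preserves the turn structure, so the agent can still see that a tool call happened.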
LLM agents increasingly retrieve externally curated skills (procedural instructions retrieved at decision time) to improve performance on long-horizon interactive tasks. Existing skill libraries are typ…
AI Commentary · Moves past the limits of generic skill libraries with model-aware alignment, making agent task adaptation more precise and efficient.
Multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and action generation. However, their ability to sustain exploration in dynamic open worlds remains un…
AI Commentary · The first benchmark to use Minecraft to evaluate open-world exploration by multimodal LLMs, filling a testing gap in the field.

Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need…
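AI Commentary · Using dataset management to build a test suite that grows with the agent is key to balancing online signals with offline baselines and tracking real progress.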

Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 s…
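AI Commentary · An empirical study of a physicist supervising an AI coding agent maps new boundaries of human-AI collaboration in scientific software development.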
Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is l…
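AI Commentary · Shows how multi-component LLM agents can look locally plausible in their probabilities yet be globally inconsistent, pointing to a core reliability weakness in current AI systems.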
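The abstract above notes that composed probabilistic claims can break basic probability axioms even when each component looks reasonable. For example, one component might report P(A) = 0.7 while another reports P(not A) = 0.5. The Python sketch below checks two of the simplest coherence conditions, P(A) + P(not A) = 1 and P(A and B) ≤ min(P(A), P(B)). The function names and the tolerance are hypothetical, and this is an illustration of the axioms, not the paper's method.

```python
from typing import List


def complement_violation(p_a: float, p_not_a: float, tol: float = 1e-9) -> bool:
    """True if P(A) and P(not A), possibly from different components, do not sum to 1."""
    return abs((p_a + p_not_a) - 1.0) > tol


def conjunction_violation(p_a: float, p_b: float, p_a_and_b: float, tol: float = 1e-9) -> bool:
    """True if P(A and B) exceeds either marginal, which no coherent distribution allows."""
    return p_a_and_b > min(p_a, p_b) + tol


def audit(p_a: float, p_not_a: float, p_b: float, p_a_and_b: float) -> List[str]:
    """Collect the names of violated conditions for one set of composed claims."""
    problems: List[str] = []
    if complement_violation(p_a, p_not_a):
        problems.append("complement")
    if conjunction_violation(p_a, p_b, p_a_and_b):
        problems.append("conjunction")
    return problems
```

Each check looks only at the composed numbers, so it can flag an inconsistency even when every individual component's output is plausible on its own.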
Autonomous AI research agents aim to accelerate scientific discovery by automating the research pipeline, from hypothesis generation to peer review. However, existing benchmarks rarely test a fundamen…
AI Commentary · Judging the quality of AI research ideas matters more than generating them, a key step toward autonomous research.
We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemini models across 17 simulated agentic deployment scenarios tha…
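AI Commentary · An automated alignment-auditing framework quantifies AI agents' propensity for deliberate sabotage for the first time, giving AI safety governance a practical tool.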

This post covers Opus 4.8's improvements and practical guidance for AI engineers integrating the model into agentic systems and production inference workloads on Amazon Bedrock.
AI Commentary · A new Claude version arrives on AWS, optimized for agentic systems and of significant value for production engineering.

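AI Commentary · Focuses on the gap between AI's practical usefulness and the actual user experience, pinpointing a key tension in real-world deployment.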
Sesame’s new iOS app brings its conversational AI agents to the public, offering more natural back-and-forth interactions designed to feel less like traditional chatbots and more l…
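AI Commentary · From an Oculus founder, it makes AI conversation closer to talking with a person and may redefine the voice-assistant experience.
AI Commentary · Enterprise agents from demo to deployment: practical paths and real challenges in crossing the "production-grade" gap.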
Hi HN, we’re open-sourcing ktx. It’s an executable context layer that makes agents reliable on your data stack. We built it after going through the experience of building productio…
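AI Commentary · An open-source executable context layer that makes AI agents more reliable on the data stack, filling a key gap between data and agents.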
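AI Commentary · AI revolutionizes full-stack development productivity, sharply lowering the barrier for a single person to ship enterprise-grade applications.
AI Commentary · As the division of compute labor is reshaped, the CPU regains core value in the AI era.
AI Commentary · Focuses on the practicality and controllability of human-AI collaboration, addressing management pain points in agent deployment.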
Visa said that over 1,000 employees have been using Replit for prototyping and development.
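AI Commentary · Visa's strategic investment in Replit promotes autonomous payments for developers and shows a financial giant speeding up its moves into the AI agent economy.
AI Commentary · A cleverly designed 60-second reflection on the fatigue of AI constantly asking for permissions, hitting a real user pain point.
AI Commentary · A one-minute game that nails the pain of AI permission fatigue; worth trying.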
Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.
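AI Commentary · Endava used Codex to cut requirements analysis from weeks to hours, showing the practical value of AI agents in speeding up software delivery.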
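On May 28, Thunderobot (雷神) held an AI workstation launch event in Beijing themed "聚势共生 智算同行" (roughly, "pooling strength for shared growth, advancing together with intelligent compute"), officially launching a full-scenario AI workstation lineup covering tower, mini-PC, and mobile form factors. It is among the industry's first AI workstation launches to cover all three form factors, and with an industry-leading category lineup and flagship-level compute, it redefines the performance baseline for AI workstations. AI has formally entered the agent era, and the industry is shifting from text prediction to autonomous logical reasoning; future demand for AI compute…
AI Commentary · Thunderobot, working with AMD, is first to cover all three AI workstation form factors, showing a benchmark-setting compute lineup.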
CLI, SDK, and IDE plugins for Duel Agents
AI Commentary · A multi-agent collaborative development toolchain that lowers the barrier to building AI applications.
Local-first Memory OS for personal AI assistants with L0-L3 memory, Wiki++ knowledge, skill routing, and TokenLess context compression.
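AI Commentary · A local memory system for personal AI assistants with knowledge routing and TokenLess compression, breaking free of cloud dependence.
Medical AI agents reach a critical inflection point
AI Commentary · A small model outperforms top large models, setting a new benchmark for vertical-domain AI applications.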
Built on top of the open source Hermes project, Vertu's new foldable combines AI-agent workflows, enterprise integrations, and ultra-premium luxury finishes.
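AI Commentary · A luxury phone brand builds an enterprise-grade foldable on an open-source AI system, tying executive work scenarios closely to AI agents; its differentiation strategy for the high-end market is worth…
sqlite AGENTS.md SQLite gained an AGENTS.md file five days ago, but it's not intended for their own development; it's presumably aimed at people who are pointing agents at the SQL…
AI Commentary · SQLite sets development guidelines for AI agents, pioneering a new model of collaboration between database tools and AI.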
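Skill for generating quirky "小黑" in-article illustrations for Chinese text | 16:9 hand-drawn on a white background | sparse red, orange, and blue annotations | Codex Skill
AI Commentary · Generates quirky Chinese illustrations with code, a new take on customizing AI drawing skills.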

In this post, we share how the AWS Generative AI Innovation Center (GenAIIC) collaborated with Works Human Intelligence (WHI) to build two AI agents using Amazon Bedrock AgentCore.…
AI Commentary · Amazon Bedrock AgentCore lowers the barrier for enterprises to build AI assistants; a real-world case shows its practical value.

In this post, we show you how Verizon Connect built and scaled an agentic AI solution to transform overwhelming fleet data into clear, actionable insights for 100,000 users daily.…
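AI Commentary · A model case of enterprise AI at scale, showing the path from massive data to precise decisions and the value it delivers to users.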

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substa…
AI Commentary · Hybrid agent system design combines cloud and on-device inference, striking a key balance for AI deployment.
We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential Social Dilemmas (SSD…
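AI Commentary · An AI pipeline that automatically explores cooperation strategies offers a new approach to multi-agent sequential social dilemmas.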
Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, which synthesizes scattered evidence into long-form reports. H…
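AI Commentary · Multi-agent collaboration produces verifiable long-form reports, breaking the credibility bottleneck in deep research and moving AI from search toward argument.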
We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using ca…
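AI Commentary · The first scalable interactive environment for causal discovery, providing a standardized testbed for studying causal reasoning in AI scientists.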
A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale. Existing mobile-agent benchmarks have made important…
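AI Commentary · The first large-scale reproducible phone-use environment, filling the gap in real mobile-behavior data and speeding the practical adoption of AI agents.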
Multimodal large language models are increasingly deployed as long-horizon agents, where memory must do more than recall: it must track an evolving world, revise what has gone stale, and surface the r…
AI Commentary · Evaluates dynamic memory in multimodal agents, pushing the leap from simple recall to world modeling.
Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at t…
AI Commentary · Code-driven image generation bridges the gap between language and vision, opening a new paradigm for intelligent agents.
Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI models drastically lowe…
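AI Commentary · A lightweight framework that addresses AI agent security pain points, balancing scalability and practicality and filling an industry gap.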
Recent advances in mobile GUI agents have shown strong potential for automating mobile tasks, but most effective systems still depend on large vision-language models for screenshot understanding and l…
AI Commentary · A lightweight GUI agent uses a knowledge graph for efficient behavior exploration, reducing its dependence on large models.
Mastering terminal environments requires language agents capable of multi-step planning, feedback-grounded execution, and dynamic state adaptation. However, training such agents is currently bottlenec…
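AI Commentary · Breaks the training bottleneck for long-horizon tasks in terminal environments, giving language agents a scalable, efficient learning framework.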
Tool retrieval over large API catalogs is a core bottleneck for LLM agents: user queries arrive in colloquial, often underspecified language, while the catalog uses technical API vocabulary that no fi…
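AI Commentary · Uses iterative co-training to bridge the semantic gap between natural language and technical API terms in LLM tool retrieval, improving accuracy on complex API calls.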
Skills, i.e., structured workflow instructions distilled for large language models (LLMs), are becoming an increasingly important mechanism for improving agent performance on real-world downstream tas…
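AI Commentary · Automatically audits the open-source skill ecosystem, filling a gap in LLM agent security evaluation and pushing AI applications toward standardization.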
Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based…
AI Commentary · Autonomous agents tackle the domain-data bottleneck for LLMs, opening a new path to model specialization.
Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long ho…
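AI Commentary · A long-horizon autonomous data-analysis benchmark reveals key weaknesses in how AI keeps track of complex, evolving analyses.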
While GUI agents have advanced rapidly, they often lack the robustness to recover from their own errors, hindering real-world deployment. To bridge this gap at both the evaluation and data levels, we…
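AI Commentary · Provides a benchmark for evaluating GUI agents' self-correction plus a trajectory-synthesis method, filling a key gap for real-world deployment.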
LLM agents are increasingly deployed as systems built around editable external harnesses, including prompts, skills, memories and tools, that shape task execution without changing model parameters. Ha…
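AI Commentary · Gets at what LLM "evolution" really is: updating the external harness is not the same as improving the model, a key distinction for research on self-evolving agents.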
Physical AI systems, including robots, autonomous vehicles, embodied agents and edge copilots, often run a different inference workload from cloud LLM serving: single-stream, batch-1 autoregressive de…
Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access i…
Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in pr…
Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper pr…
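AI Commentary · Uses multi-agent collaboration to generate editable scientific figures, greatly lowering the barrier to making paper illustrations and improving research efficiency.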

As agent adoption scaled, we saw a common pattern emerge across enterprises, including our own sales organization: specialized agents deliver value, but without orchestration, user…
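AI Commentary · Amazon validates the value of agent orchestration in its own sales organization, offering a reusable reference for enterprises deploying AI agents at scale.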

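AI Commentary · Multi-agent collaboration automates vulnerability discovery, greatly improving the efficiency and coverage of security testing.
AI Commentary · Coordinated agents automate vulnerability hunting, significantly improving the efficiency and reproducibility of security testing.
AI Commentary · The first enterprise IT agent benchmark reveals shortfalls in frontier models, an important reference for key AI deployment scenarios.
AI Commentary · The first enterprise IT agent benchmark shows frontier models falling short; overcoming this bottleneck for industry use deserves attention.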

AI factories are token factories, converting power into intelligence in real time. And as agentic AI scales and autonomous, always-on special agents are deployed in the enterprise,…
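AI Commentary · AI factories convert power into intelligence in real time, marking the start of a revolution in intelligent infrastructure.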
Robinhood is opening its trading platform to AI agents. In an announcement on Wednesday, Robinhood says traders can now create a separate account for an AI agent and add a specific…
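AI Commentary · AI agents trading stocks on their own opens a new era for retail investors, with both risks and opportunities.
AI Commentary · The limitations of AI agents are laid bare, suggesting current technology will struggle to upend software's underlying architecture.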
🪧 Claude Code / Codex skill — generate Xiaohongshu carousels & WeChat 21:9+1:1 cover pairs. Editorial × Swiss visual systems, 28 layouts, 10 themes, single-fil…
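AI Commentary · Automates viral Xiaohongshu layouts and WeChat cover design using Swiss-style visual systems, greatly improving content-production efficiency.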
Lightweight Python SDK for LLM inference logging and observability
AI Commentary · A lightweight tool for LLM inference logging and observability, filling an infrastructure gap in model monitoring.
CLI harness for WPS Office -- let AI agents control Writer, Calc & Impress via COM automation
AI Commentary · Lets AI control WPS's three core apps from the command line, opening a new path for office-software automation.
See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows.
AI Commentary · A self-improving tax agent built with Codex shows AI advancing in automation and accuracy in a specialized field.
Your AI forgets. This remembers. Spec-driven coding harness for vibecoders, product owners, CEOs and real builders — self-improving context memory, 12 agents, 3…
AI Commentary · Tackles AI forgetfulness with structured memory and 12 cooperating agents; suited to developers who prize efficiency.
Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows.
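AI Commentary · Combining open-source collaboration with frontier models shows new possibilities for AI coding tools that coordinate across environments.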

The shift to agentic AI creates a new CPU requirement for the AI factory: fast cores, massive memory bandwidth and the ability to sustain high performance when all cores are active…
AI Commentary · NVIDIA's new CPU is optimized for AI factories; its strong performance may reshape the AI computing landscape.
Equipping large language models with explicit skills has emerged as a promising paradigm for enabling autonomous agents to solve complex tasks. Agent skills can be inherently divided into general skil…
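AI Commentary · Focuses on how LLMs internalize and use skills, tackling out-of-distribution generalization and opening a new path for agentic reinforcement learning.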
Large language model (LLM)-based agents have shown strong capabilities in using external tools to solve complex tasks. However, existing evaluations often overlook the temporal dimension of tool use,…
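AI Commentary · Evaluates asynchronous function calling in multi-task scenarios, filling the gap in the temporal dimension and offering more relevant guidance for real applications.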
Large language model (LLM) agents are increasingly used to assist with operations research (OR) modeling, yet existing OR-oriented benchmarks often reduce evaluation to one-shot translation from a sel…
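AI Commentary · The first agent benchmark covering the full industrial optimization workflow, addressing current evaluations' narrow focus on one-shot translation.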
LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. The runtime owns the loop, context, and control flow, and the…
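AI Commentary · A recursive-programming approach that keeps AI agents safe and controllable. It closes the split between the agent runtime and the code the model writes, and the design is novel.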
Despite the rapid progress of multimodal large language models in building Graphical User Interface (GUI) agents, their real-world task completion is fundamentally bottlenecked by a lack of world know…
AI Commentary · Causal internalization and density-based sampling address the real-world task bottleneck of GUI agents. Worth watching.
If an AI agent makes decisions on a person's behalf, those decisions must align with its user. We introduce representational accuracy to measure how faithfully a system captures a person's interpretat…
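AI Commentary · Uses behavioral norms to quantify how accurately an AI captures user intent. This gives human-AI alignment an actionable evaluation standard.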
As agent capabilities advance, existing benchmarks, such as τ^2-Bench, are becoming increasingly saturated. Yet constructing new benchmark tasks remains complex, costly, and labor-intensive. Moreover,…
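AI Commentary · Automatically generates harder and more comprehensive benchmark tasks to get past saturated evaluations. This offers a new approach to assessing agent capabilities.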

"BadHost" was found in Starlette, a package with 325 million weekly downloads.
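AI Commentary · A vulnerability in an open-source package threatens millions of AI agents. Users should patch immediately to prevent data leaks.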
Continuum — the agent runtime by ShyftLabs. Build, orchestrate, ship.
AI Commentary · A runtime built specifically for building agents that simplifies deployment and orchestration.

Amazon Bedrock AgentCore payments is now available in preview, it provides instant payments to paid external services with no manual billing setup per provider, stablecoin support…
AI Commentary · Amazon's AgentCore payments preview lowers the barrier for AI agents to use paid services and simplifies settlement.

A creepy saved post on Instagram linked man to AI porn account, FBI says.
AI Commentary · Shows how easily the person behind AI-generated porn was traced. This is a wake-up call for privacy protection.

In this post, we provide a solution to build highly scalable, serverless multi-agent generative AI systems on AWS using LangGraph Agents as orchestrators integrated with Amazon Bed…
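AI Commentary · Orchestrates multiple agents with LangGraph and scales serverlessly on Bedrock. This significantly lowers the barrier to deploying AI systems.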

In this post you'll learn how to build a multi-agent campaign review system that demonstrates parallel reasoning, context persistence, and traceable execution paths using an integr…
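AI Commentary · NVIDIA and Amazon demonstrate parallel reasoning and traceable execution in a multi-agent system. The result is a high-performance reference architecture for deploying generative AI.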

In this post, we demonstrate the capabilities of AgentWatch through practical implementation. You will see how the solution performs infrastructure checks every 15 minutes, summari…
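AI Commentary · Uses ambient agents for proactive monitoring. This shows a practical shift in AI operations from reactive alerts to proactive inspection.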

Microsoft Copilot Cowork Exfiltrates Files The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this ca…
AI Commentary · Microsoft's AI assistant exposes a data-security flaw. Enterprises should take agent-system defenses seriously.
Amid rapidly growing adoption of enterprise-level AI agents, there’s a disconnect emerging between ambition and execution. Although 85% of organizations say they want to be agentic…
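AI Commentary · As enterprise AI agents roll out quickly, organizational design must evolve in step. Otherwise strategy and execution drift apart.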
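Production-grade RAG document assistant: knowledge base, PDF indexing, agent tool orchestration, scoped retrieval, citation tracing, and a refusal threshold. FastAPI + Vue3
AI Commentary · A template for enterprise RAG deployment. It covers the full engineering practice, from retrieval and refusal to tool orchestration.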
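A RAG assistant with a refusal threshold declines to answer when even its best-matching source scores below a cutoff. Instead of answering, it returns a refusal. The minimal sketch below illustrates the idea. It assumes a toy word-overlap (Jaccard) score in place of a real retriever. The `Chunk` type, the `answer` function, and the `0.2` threshold are hypothetical and not taken from the project:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # hypothetical citation label, e.g. "manual.pdf#p3"
    text: str


def tokenize(s: str) -> set:
    return set(s.lower().split())


def score(query: str, chunk: Chunk) -> float:
    # Jaccard word overlap: a stand-in for a real retriever's similarity score.
    q, c = tokenize(query), tokenize(chunk.text)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def answer(query: str, chunks: list, threshold: float = 0.2) -> str:
    # Rank candidates, then refuse if even the best one is below the threshold.
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    if not ranked or score(query, ranked[0]) < threshold:
        return "REFUSE: no sufficiently relevant source"
    best = ranked[0]
    # Return the supporting text together with its citation.
    return f"{best.text} [source: {best.doc_id}]"


docs = [
    Chunk("manual.pdf#p3", "reset the router by holding the button"),
    Chunk("faq.pdf#p1", "billing is monthly"),
]
print(answer("how to reset the router", docs))  # → reset the router by holding the button [source: manual.pdf#p3]
print(answer("quantum chromodynamics", docs))   # → REFUSE: no sufficiently relevant source
```

The threshold trades coverage for trustworthiness. Raising it makes the assistant refuse more often but cite only strong matches.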
LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate within persistent social environments alongside other agents. We introduce a Moltbook-style s…
AI Commentary · LLM privacy protection is worrying when multiple agents interact. This exposes a new blind spot in AI safety evaluation.
Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist model stacks. This raise…
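AI Commentary · Online skill distillation substantially improves multimodal agent efficiency and cuts inference compute. The approach is both novel and highly practical.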
ADHD — a skill for coding agents. Tree-of-thought with pruning, built on the Claude & Codex Agent SDK. Fans out parallel divergent thoughts under different cogn…
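AI Commentary · Combines tree-of-thought reasoning with pruning to bring coding agents closer to human cognitive patterns. This helps them handle complex tasks more efficiently.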
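Tree-of-thought with pruning works in rounds. Each round fans out several candidate thoughts, scores them, and discards all but the most promising. The minimal beam-style sketch below shows that loop. It runs sequentially rather than in parallel, uses a made-up arithmetic task in place of coding-agent reasoning, and does not use the Claude or Codex Agent SDKs. `tree_of_thought` and its parameters are hypothetical names:

```python
from typing import Callable, Iterable, List, Optional, Tuple

Thought = Tuple[int, ...]


def tree_of_thought(
    root: Thought,
    expand: Callable[[Thought], Iterable[Thought]],
    score: Callable[[Thought], float],
    is_goal: Callable[[Thought], bool],
    beam_width: int = 3,
    max_depth: int = 6,
) -> Optional[Thought]:
    frontier: List[Thought] = [root]
    for _ in range(max_depth):
        # Fan out: each surviving thought proposes several divergent continuations.
        children = [child for node in frontier for child in expand(node)]
        if not children:
            return None
        goals = [c for c in children if is_goal(c)]
        if goals:
            return max(goals, key=score)
        # Prune: keep only the beam_width most promising thoughts for the next round.
        frontier = sorted(children, key=score, reverse=True)[:beam_width]
    return None


# Toy task (hypothetical): pick steps of 1, 2 or 3 whose total is exactly 10.
TARGET = 10
result = tree_of_thought(
    root=(),
    expand=lambda t: [t + (step,) for step in (1, 2, 3)],
    score=lambda t: -abs(TARGET - sum(t)),
    is_goal=lambda t: sum(t) == TARGET,
)
print(result)  # → (3, 3, 3, 1)
```

`beam_width` is the pruning knob. A wider beam explores more divergent branches at a higher cost per round.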
Local-first persistent memory for AI coding agents (Claude Code, Cursor, Codex) via MCP. 94.5% LoCoMo recall@10, 70ms p50, multilingual, zero API keys.
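AI Commentary · Gives AI coding assistants local persistent memory with high recall and low latency. It also supports multiple languages without needing API keys.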
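The blurb cites 94.5% LoCoMo recall@10. To show what a recall@k figure measures, the sketch below implements the standard per-query definition: the fraction of relevant items found in the top k results, averaged over queries. The exact LoCoMo evaluation protocol may differ, and the memory IDs and queries are made up:

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved results."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: Dict[str, tuple], k: int = 10) -> float:
    # runs maps a query id to (ranked retrieved ids, set of relevant ids).
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs.values()]
    return sum(scores) / len(scores)


runs = {
    "q1": (["m7", "m2", "m9"], {"m2", "m9"}),  # both relevant memories retrieved
    "q2": (["m1", "m4", "m5"], {"m3", "m4"}),  # one of two relevant memories retrieved
}
print(mean_recall_at_k(runs, k=10))  # → 0.75
```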
AI Commentary · AI agent terminology matters. Clear concepts are key to understanding how the technology is evolving.
Release: datasette-agent 0.1a4. Taking advantage of the new makeJumpSections() JavaScript plugin hook added in Datasette 1.0a30, datasette-agent now presents this "Start a new agen…
AI Commentary · A lightweight AI tool that iterates quickly. The new release uses Datasette's new plugin hook to improve the experience of starting an agent.
This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation model…
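AI Commentary · Argues that agentic AI's next bottleneck is system scaling, not only model scaling. It focuses on designing auditable, persistent, modular, and verifiable architectures around foundation models.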
Enterprise-ready CI/CD reference for Microsoft Foundry AI agents, with parallel GitHub Actions and Azure DevOps pipelines, evaluation-driven quality gates, and…
AI Commentary · An enterprise CI/CD reference implementation for AI agents that improves deployment efficiency and quality control.
OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.
AI Commentary · Placement as a leader in Gartner's Magic Quadrant recognizes OpenAI's technical leadership in AI coding agents.

In this post, you will learn what Nova Act offers, how HIPAA eligibility applies to agentic AI, and how to get started.
AI Commentary · Amazon Nova Act is now HIPAA eligible. This clears a compliance hurdle for healthcare AI agents and speeds commercial adoption.
Multi-agent LLM workflows route inference through specialized roles to lift end-task accuracy, but jointly training those roles with reinforcement learning is unstable in ways that are poorly understo…
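AI Commentary · Improving LLM collaboration with multi-agent reinforcement learning hinges on understanding the trade-off between how many specialized roles to use and how much policy to share.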
MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to sup…
AI Commentary · A lightweight agentic system optimized for small models. It lowers the barrier so more developers can try automated workflows.
Hey HN, We're Gus and Carlos from Runtime ( https://runtm.com ). We're building infra that lets your whole team (including non-engineers) ship with Claude Code, Codex, and other ag…
AI Commentary · Lets non-engineers use AI coding agents safely, which greatly lowers the barrier to team collaboration.
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation
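AI Commentary · Self-organizing research teams that run long experiments mark a new stage in autonomous AI collaboration. They could substantially speed up scientific discovery.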
Reverse-engineered Doubao (豆包) API → OpenAI-compatible REST service. Free multimodal chat, image/video/music generation, and file hosting for AI agents.
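AI Commentary · Reverse-engineers the Doubao API to offer free multimodal services. This greatly lowers the barrier to AI application development.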
Hermes Agent CN desktop app: Windows-first, built with Tauri, TypeScript and Rust, with the Hermes Agent core isolated inside.
AI Commentary · A Windows desktop app built with Tauri and Rust that isolates the agent core. Developers may find the tech stack worth a look.
🧠 Hybrid long-term memory plugin for OpenClaw agents — SQLite+FTS5 for structured facts, LanceDB for semantic recall
AI Commentary · Combines structured and semantic memory to give agents a more precise and persistent hybrid memory.
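Chinese-education Agent Skill Pack: textbook-synced learning, exam prep and review, photo-based Q&A, mistake review, parent-child study, reading and writing, and teacher tools. It works directly in Hermes Agent and can also be exported to OpenClaw/Codex/Cursor/Claude Code.
AI Commentary · An open-source Chinese-education agent toolkit that fills a gap in this vertical. It plugs directly into several mainstream AI platforms and is highly practical.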
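China's first enterprise-grade multi-agent IT operations automation platform: an LLM-based intelligent operations solution. ITOps Agent Platform is an enterprise full-stack operations automation platform. Through visual workflow orchestration, it combines multiple AI agents into intelligent ops automation pipelines that handle server management, alert handling, fault diagnosis, log analysis, script management, and scheduled ops tasks, …
AI Commentary · Billed as China's first enterprise multi-agent IT ops platform, it builds AI-driven ops automation pipelines that speed up incident handling.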

Google's AI search evolution is accelerating at I/O 2026.
AI Commentary · Google plans to reshape search with agentic AI in 2026. This marks AI's evolution from a tool into a proactive service.

🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifia…
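AI Commentary · A benchmark for vague, multi-turn search that tests whether an AI proactively asks follow-up questions. It fills a gap in evaluating retrieval with complex intent.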

Google says its more efficient Gemini 3.5 Flash is the key to your agentic AI future.
AI Commentary · Lower-latency inference makes real-time generative AI applications feasible and speeds agent adoption.

The latest from Google I/O: See how we’re helping you get more done with Gemini.
AI Commentary · Google I/O showcases Gemini's agentic capabilities. This is a key turning point in AI's evolution from tool to autonomous actor.

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments. I built Forge, an open-source reliability layer for self-hosted LLM tool-calling. What it does: - Adds domain-and-too…
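AI Commentary · The open-source layer reportedly lifts an 8B model's agentic-task accuracy from 53% to 99%. This shows how effective lightweight guardrails can be.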
A comprehensive interview preparation guide covering all major RAG (Retrieval-Augmented Generation) architectures. 50 questions across 10 types, from Naive RAG…
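AI Commentary · Essential reading for RAG architecture interviews. Its 50 questions span 10 types and give a systematic grasp of retrieval-augmented generation.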
Memory that follows you across every AI tool. No cloud storage. No account required. Set it up once, use it everywhere.
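AI Commentary · Breaks down memory silos between AI tools. It needs no cloud storage or account, and after a one-time setup your memory is reusable across platforms.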

Agentic AI inference at one-tenth the cost per token with NVIDIA Vera Rubin NVL72. Agent sandboxes run 50% faster on NVIDIA Vera than traditional CPUs — while enterprise data queri…
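AI Commentary · Jensen Huang says AI demand has gone parabolic, and NVIDIA's new architecture cuts cost per token tenfold. This is a significant industry bellwether.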

The first NVIDIA Vera CPUs arrived at three of the world's leading AI labs on Friday — Anthropic in San Francisco, OpenAI in Mission Bay, SpaceXAI in Palo Alto — followed by a deli…
AI Commentary · NVIDIA's Vera CPU, designed for AI agents, is shipping directly to top AI labs. It may reshape the compute landscape for AI.
Hi HN, I'm Hang, cofounder of InsForge (YC P26). InsForge is an open-source Heroku for AI coding agents: a backend platform designed for coding agents to deploy, operate, and debug…
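AI Commentary · An open-source deployment platform that fills a market gap. It gives coding agents Heroku-like automated operations and lowers the development barrier.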
OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows.
AI Commentary · OpenAI and Dell team up so enterprises can deploy AI coding assistants on-premises, balancing data security with productivity.
Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something…
macOS computer-use agent in the notch. Long-press, talk, Claude drives the mouse.
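AI Commentary · Embeds an AI agent in the Mac notch. You long-press, speak, and Claude controls the mouse directly, a highly innovative interaction model.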
Provider-neutral Agent Skill for Codex, Claude Code, and agentic harness design.
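AI Commentary · Unifies agent skill standards across platforms. This lowers the development barrier and promotes interoperability among AI agent tools.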
A production-ready toolkit to accelerate and automate the end-to-end lifecycle of AI Agent development.
AI Commentary · A one-stop AI agent development toolkit. It lowers the barrier to enterprise automation and speeds industry adoption.
Personal-Model First Self Evolving AI Agent 🐘
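AI Commentary · A self-evolving AI agent driven by a personal model, pioneering a new paradigm in which agents iterate on themselves autonomously.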
Most AI agents forget you the moment the tab closes. Constellation Engine gives them a hippocampus — a living star map with spreading activation, Hebbian writeb…
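MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Eva…
AI Commentary · Provides a verifiable, efficient, parallel simulation environment for mobile GUI agent research. Running in the browser lowers the barrier to entry.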

Reinforcement-learning agents — AI systems that learn by trial and error — can convert computation into new knowledge. That’s the focus of a new engineering-level collaboration bet…
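AI Commentary · Two AI heavyweights are joining forces on an engineering-level push in reinforcement-learning infrastructure. This should accelerate how well agents learn from trial and error.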

Agentic AI is changing the way users get work done. Following the success of OpenClaw, the community is embracing new open source agentic frameworks. The latest is Hermes Agent, wh…
Minimalistic coding agent written in Rust, optimized for memory footprint and performance
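AI Commentary · A lean coding agent whose Rust implementation balances high performance with low memory consumption. It sets a new bar for developer efficiency.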
Hey HN, we're Alex and Tyler, co-founders of Voker.ai ( https://voker.ai/ ), an agent analytics platform for AI product teams. Voker gives full visibility into what users are askin…
AI Commentary · Focuses on analyzing how users interact with AI agents, filling a data-insight gap for AI product teams.
Introducing Co-Scientist, a collaborative AI partner built with Gemini to help researchers accelerate scientific breakthroughs.
AI Commentary · Multi-agent collaboration breaks research bottlenecks, upgrading AI from a tool to a research partner.
Fast, AI-agent-native code search in Rust — hybrid BM25 + semantic, Tree-sitter AST chunking, dependency & impact analysis. Drop-in replacement for grep/cat/rea…
AI Commentary · An AI-native code search tool in Rust that combines BM25 with semantic search. It aims to outperform traditional grep.
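"Hybrid BM25 + semantic" search typically runs a lexical ranker and an embedding ranker, then merges their result lists. The project itself is written in Rust. This Python sketch only illustrates the scoring idea. It uses Okapi BM25 for the lexical side and reciprocal rank fusion (RRF) to merge. The choice of RRF is an assumption, not necessarily what the tool does. The semantic ranking is hard-coded as a hypothetical stand-in for an embedding model:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+1 inside the log" form used by Lucene).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


def reciprocal_rank_fusion(rankings: List[List[int]], k: int = 60) -> List[int]:
    """Merge several rankings of doc indices; each list contributes 1 / (k + rank)."""
    fused: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = [["parse", "config", "file"], ["open", "file", "handle"], ["config", "loader"]]
query = ["config", "file"]
bm25 = bm25_scores(query, docs)
lexical_rank = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)
semantic_rank = [2, 1, 0]  # hypothetical ranking from an embedding model
print(lexical_rank, reciprocal_rank_fusion([lexical_rank, semantic_rank]))  # → [0, 2, 1] [2, 0, 1]
```

RRF only looks at ranks, so it can fuse a BM25 score and a cosine similarity without normalizing them onto a common scale.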
AZMX AI — The sovereign agent platform.
AI Commentary · A sovereign AI agent platform, pitched as a new paradigm for decentralized agents.
Runtime security monitoring and control for AI agents. Catches malicious tool use, prompt injection, and policy drift in real time, before the agent acts.
Local-first RAG and agent skills framework for source-traceable agent memory.
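AI Commentary · A local-first RAG framework that makes agent memory traceable to its sources. This offers a new path for building trustworthy AI applications.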

Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instruc…
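AI Commentary · AI agents execute tasks competently but neglect the user's interests. This exposes a core weakness in their social reasoning.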

AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.
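AI Commentary · Combines resilient workflows with vector retrieval to make the entire financial-research process traceable. The architecture is worth learning from.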
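The blurb mentions "Redis Lua single-flight". The idea is that many concurrent requests for the same report trigger only one expensive computation, and every caller shares its result. The sketch below shows that pattern in-process with asyncio rather than across processes with Redis and Lua. It is not the project's implementation, and `SingleFlight`, `fetch_report`, and the `"AAPL"` key are hypothetical:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlight:
    """In-process single-flight: concurrent calls for one key share a single execution."""

    def __init__(self) -> None:
        self._inflight: Dict[str, asyncio.Future] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[Any]]) -> Any:
        existing = self._inflight.get(key)
        if existing is not None:
            # A leader is already computing this key; shield it so a cancelled
            # follower does not cancel the shared result for everyone else.
            return await asyncio.shield(existing)
        fut = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut
        try:
            result = await fn()
            fut.set_result(result)
            return result
        except Exception as exc:
            fut.set_exception(exc)
            fut.exception()  # mark retrieved so asyncio does not warn if no follower waited
            raise
        finally:
            if not fut.done():
                fut.cancel()  # leader was cancelled: release the followers too
            self._inflight.pop(key, None)  # the next call after this one starts fresh


calls = []


async def fetch_report() -> str:
    calls.append("fetch")      # count real executions
    await asyncio.sleep(0.01)  # simulate a slow upstream or LLM call
    return "report-v1"


async def main() -> list:
    sf = SingleFlight()
    return await asyncio.gather(*(sf.do("AAPL", fetch_report) for _ in range(5)))


print(asyncio.run(main()), len(calls))  # → ['report-v1', 'report-v1', 'report-v1', 'report-v1', 'report-v1'] 1
```

A Redis-backed version would take the lock and publish the result through the shared store, so that separate worker processes can deduplicate work as well.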
✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · da…
AI Commentary · A local AI agent generates the HTML, covering 75 skills across 9 surfaces and raising editing efficiency to a new level.
ktx is an executable context layer for data and analytics agents 🐙 Allow Claude Code, Codex, and any AI agent to query data accurately through MCP with skills,…
AI Commentary · Uses octopus-like MCP connections so AI agents can query data accurately, lowering the barrier to analytics.
A curated list of tools, libraries, MCP servers, and frameworks that power AI coding agents.
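AI Commentary · Surveys the full ecosystem of tools behind AI coding agents. It is a practical resource guide for developers and researchers.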
OpenSeek - 广度求索: open-source TUI coding agent with multi-provider routing, MCP, LSP, and Plan/Agent/YOLO modes.
AI Commentary · An open-source TUI coding agent with multi-model routing and MCP support. Its novel working modes are worth watching.
Open-source GEO AI-employee solution (MIT). GEO Skills package + curated lists of agents and office CLIs that make up the AI-employee stack.
AI Commentary · An open-source AI-employee solution for GEO with a complete skills package and toolchain. It lowers the barrier for enterprises to deploy it.
Break your AI before they do.
AI Commentary · Probes AI weaknesses with adversarial attacks. Security evaluation tools are becoming a must-have.
Reimplement GitHub for Agents.
AI Commentary · Rebuilds a GitHub-style code hosting service as dedicated collaboration infrastructure for AI agents.
Production LLM call layer for AI agents and tools: keep OpenAI/Anthropic/AI SDK/LiteLLM, hot-swap models with MDA presets, and add cache, retries, circuit break…
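AI Commentary · A unified call layer for AI agents that supports hot-swapping models and fault tolerance. This significantly improves stability in production.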
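The blurb lists retries and circuit breaking among the call layer's features. The following is a generic sketch of a retry loop with exponential backoff wrapped around a circuit breaker. After repeated failures, the breaker stops calling a failing provider for a cool-down period. The sketch does not use the project's API. Every name here (`CircuitBreaker`, `call_with_retries`, the thresholds) is hypothetical, and the clock and sleep functions are injectable so the behavior can be tested deterministically:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised without calling the provider while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("circuit open: failing fast")
        # Closed, or half-open after the cool-down: let this call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def call_with_retries(breaker: CircuitBreaker, fn: Callable[[], T], attempts: int = 3,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # do not keep retrying against an open circuit
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise AssertionError("unreachable")


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])


def provider_down() -> str:
    raise ConnectionError("upstream 503")


try:
    call_with_retries(breaker, provider_down, sleep=lambda s: None)
except CircuitOpenError as exc:
    print("gave up:", exc)  # two real failures trip the breaker; the 3rd attempt fails fast

now[0] = 11.0  # cool-down elapsed: the next call is a half-open trial
print(breaker.call(lambda: "ok"), breaker.opened_at)  # → ok None
```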
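Centaur Loop: an open-source workbench for AI agent feedback loops, human governance, and memory retrospectives / Human-governed AI feedback loop workbench.
AI Commentary · An open-source AI governance workbench that combines human feedback loops with memory retrospectives. It fills a gap in keeping long-running agent interactions under control.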
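Centaur Loop: a framework for the smallest work unit of an AI employee. It breaks complex roles into looping workflows that AI takes over, humans govern, and feedback and memory continually improve / The smallest work unit for building AI employees.
AI Commentary · An open-source AI governance tool that fills a gap in closed-loop agent management. It balances human oversight with memory retrospectives and is very practical.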
Track token usage across local AI agents (Claude Code, Codex) — Custom StatusLine, CLI Dashboard with cost analysis, rate limit monitoring, and session tracking
AI Commentary · A lightweight tool that solves the pain of tracking token usage for local AI agents, covering both cost and rate-limit monitoring.
A unified storage SDK for object and blob backends. One small, honest API. Web-standards I/O.
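AI Commentary · A unified object-storage interface that simplifies switching between backends and speeds up development. A practical cloud-native tool.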
Open CLI for integrating AI search, recommendation, and conversational retrieval into agent systems and business systems
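AI Commentary · Integrates AI search, recommendation, and conversational retrieval into existing systems, greatly simplifying development.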
Multi-agent orchestration & workflow engine. Declarative YAML workflows, LLM coordinator with hub-and-spoke mailboxes, race-safe delivery. One YAML file, one Go…
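AI Commentary · Orchestrates multi-agent workflows with declarative YAML. An LLM coordinator and race-safe delivery keep messaging reliable, which lowers the development barrier.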
Open-source desktop AI agent workspace with one-click Claude Code, Codex, OpenClaw, Hermes Agent setup and custom LLM model routing.
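AI Commentary · An open-source desktop AI workspace with one-click setup for several agents and custom model routing. It lowers the barrier to agent development and encourages personalized tooling.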
OpenSquilla — Token-Efficient AI Agent with same budget, higher intelligence density
AI Commentary · Gets higher intelligence density from the same token budget. This efficiency gain in an open-source agent is worth watching.
AI-powered OSINT agent with interactive REPL, MCP server, and CLI. 9 tools. Works with Claude, GPT-4, or local models. For authorized security research only.
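AI Commentary · An open-source, AI-driven OSINT tool that combines an interactive command line with multi-model support. It gives security researchers efficient intelligence analysis.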
Explore how AlphaEvolve's Gemini-powered algorithms are driving impact across business, infrastructure, and science.
AI Commentary · A Gemini-powered coding agent deployed across domains, marking AI's move from the lab to industrial-scale use.
Agent memory for LLMs: 30 runnable Jupyter notebooks covering conversation buffers, vector stores, knowledge graphs, episodic and semantic memory, MemGPT, Mem0,…
AI Commentary · 30 runnable notebooks systematically cover LLM memory mechanisms, from the basics to the cutting edge. Highly practical.
Assistant Professor Gabriele Farina mines the foundations of decision-making in complex multi-agent scenarios.
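AI Commentary · Approaches machine strategic reasoning through game theory and multi-agent decision-making, pushing the boundaries of general AI capability.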
Agyn is an open-source Kubernetes-native runtime that moves AI agents like Claude Code and Codex from laptops to company infrastructure with the controls enterp…
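AI Commentary · An open-source, Kubernetes-native solution that lets enterprises host AI agents securely. It bridges the gap from personal tools to platform-level deployment.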
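Automated patent-infringement competitor analysis system: enter a patent publication number and, within 1 hour, get a claim chart report that a lawyer can review (feature-by-feature comparison + evidence URLs + suggested next steps). It is also packaged as a skill that any agent can call.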
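AI Commentary · Automates patent-infringement analysis and produces lawyer-reviewable reports in an hour. This greatly improves the efficiency of IP due diligence.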
Agent skill for turning AI images and videos into playable game art assets
AI Commentary · Automates turning AI-generated images into game assets, significantly lowering the barrier to game development.
Autonomous self-evolving agents. Vision-grounded layered memory and self-written skills for LLM agents that operate your computer.
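AI Commentary · Self-evolving agents with vision-grounded memory that let AI genuinely learn to operate a computer, going beyond fixed instructions.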
🔍 OpenSearch-VL provides a fully open recipe for training strong multimodal deep search agents through high-quality data curation, diverse visual/search tools,…
Evidence-driven skill evolution for Hermes Agent — reports, dry-run proposals, candidate search, and guarded apply
A roadmap to AI Engineering excellence: Masterclass in Generative AI, RAG, and Agentic Systems with a focus on scalable and production-ready architectures. 🚀🤖
A proactive AI agent for secure, traceable, human-in-the-loop task execution over long-running workflows.
Shared memory + orchestration for your coding agents — one MCP server, persistent vector memory, agent registry
Give your coding agent the power to write and run agent evals.
Desktop automation CLI for AI agents. Fast native Rust CLI.
Curated index of 200+ AI tools, one writeup per tool with hands-on takes. Covers coding, design, research, video, voice, agents, music, local LLMs. Compare alte…
Version-Control for AI coding agents.
Self-hosted AI agent OS — streaming chat, tool use, persistent memory, and multi-agent teams. Runs entirely on your machine.
Knowhere extracts, parses, and outputs structured chunks ready for AI Agents and RAG.
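AI Commentary · NVIDIA's new model handles documents, audio, and video in a single system. It advances long-context multimodal intelligence that should drive next-generation AI agent applications.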

Google is bringing back its 5-Day AI Agents Intensive Course with Kaggle and registration is open.
Scored 65.2% vs google's official 47.8%, and the existing top closed source model Junie CLI's 64.3%. Since there are a lot of reports of deliberate cheating on TerminalBench 2.0 la…
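AI Commentary · An open-source agent outscores both Google's official result and the top closed-source competitor. This demonstrates the potential of community-driven innovation.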
Hey HN! Today we're launching Agent Vault - an open source HTTP credential proxy and vault for AI agents. Repo is at https://github.com/Infisical/agent-vault , and there's an in-de…
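AI Commentary · An open-source credential proxy that fills a gap in AI agent security management and directly tackles authentication pain points.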

How coding agents use tools, memory, and repo context to make LLMs work better in practice
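AI Commentary · Breaks down the three core components of coding agents (tools, memory, and repo context), offering a practical framework for putting LLMs to work.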

The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code autonomou…
AI Commentary · Claude Code is expensive and Goose offers a free alternative. A price war in AI coding tools has begun.

Salesforce on Tuesday launched an entirely rebuilt version of Slackbot, the company's workplace assistant, transforming it from a simple notification tool into what executives des…
AI Commentary · Salesforce upgrades Slackbot into an AI agent, intensifying its enterprise AI competition with Microsoft and Google.

Anthropic released Cowork on Monday, a new AI agent capability that extends the power of its wildly successful Claude Code tool to non-technical users — and according to company in…
AI Commentary · An AI agent tool for non-technical users that lowers the barrier to programming and expands office-automation use cases.
Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completin…
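AI Commentary · RL agents readily exploit loopholes in their rewards. This is a core AI safety challenge that bears directly on reliability in real tasks.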
Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as ins…
AI Commentary · Explains how an LLM can act as the core controller that moves autonomous agents from concept to practice, a new milestone for AI applications.