MarkTechPost AI | AI Agent

Meet OpenJarvis: A Local-First, On-Device Personal AI Agent Framework with Tools, Memory, and Learning

June 4, 2026, 06:23

Key Takeaways

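Researchers at Stanford University and Lambda Labs have published the research paper for OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device. Open-weight models configured through OpenJarvis trail the best cloud model by only 3.2 percentage points on average, with roughly 800× lower marginal API cost per query and roughly 4× lower latency under the study's benchmark protocol. The work builds on the team's earlier Intelligence Per Watt study, which found that local models can already handle 88.7% of single-turn chat and reasoning queries at interactive latency, and that intelligence efficiency improved 5.3× from 2023 to 2025. Model overview and access: OpenJarvis is not a single model but a framework that integrates ...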

On-Site AI-Compiled Text

Researchers at Stanford University and Lambda Labs have published the research paper for OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device. The open-weight models configured through OpenJarvis land within 3.2 percentage points of the best cloud model on average, at roughly 800× lower marginal API cost per query and roughly 4× lower latency under the paper's benchmark protocol. The work builds on the team's earlier Intelligence Per Watt study, which reported that local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency, with intelligence efficiency improving 5.3× from 2023 to 2025.

Model Overview & Access

OpenJarvis is not a single model. It is a framework that composes any supported model with a configurable agent stack, evaluated across 11 local models from four families.

Property | Value
License | Apache 2.0
Framework release | March 12, 2026
Paper | arXiv:2605.17172 (posted May 16, 2026)
Repository | github.com/open-jarvis/OpenJarvis
Stars / forks | ~5.4k / ~1.2k (June 2026)
Languages | Python (~83%), Rust (~9%), TypeScript (~7%)
Evaluated models | 11 local models across 4 families: Qwen3.5, Gemma4, Nemotron, Granite
Cloud baselines | Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Supported engines | Ollama, vLLM, SGLang, llama.cpp, Apple Foundation Models, Exo (among others)
Context window | Model-dependent
Installation | Single command; ~3 minutes on broadband
Hardware | Tested on 7 platforms, from Mac Mini M4 to NVIDIA DGX Spark

Architecture: Five Primitives and a Spec

OpenJarvis decomposes a personal AI system into five typed primitives, composed through a single declarative configuration object called a spec.

Intelligence: the model, weights, generation parameters, and quantization format.
Engine: the inference runtime (Ollama, vLLM, SGLang, etc.), batching, KV-cache settings, and hardware path.
Agents: the reasoning loop (ReAct or CodeAct), system prompts, tool-use policy, and turn limits.
Tools & Memory: external interfaces, retrieval backends, 25+ data connectors, and 32+ messaging channels, with native MCP support and interchangeable memory backends.
Learning: the optimizer that updates the spec from traces. This slot accepts LoRA, DSPy, GEPA, or LLM-guided spec search.

Each primitive is independently swappable, and a spec serializes all five into a TOML file. Two specs can share the same agent and tool configuration and differ only in model and engine, so the same behavior runs on a Mac Mini and a workstation without rewriting prompts.

LLM-guided spec search is the second contribution. It is a local–cloud collaboration: a frontier cloud model acts as a teacher at search time, reading traces, diagnosing failure clusters, and proposing edits across Intelligence, Engine, Agents, and Tools & Memory. An edit is accepted only if it improves the target failure cluster without causing meaningful regressions elsewhere; the research team calls this the gate (default tolerance 1%). The optimized spec then runs entirely on-device at inference time, with zero cloud calls. Because the teacher is used only at search time, its cost amortizes: at 100 queries per day, the amortized teacher cost falls below $0.001 per query within six months.

Prior work (GEPA, DSPy, LoRA) optimizes one primitive at a time, and prompt optimizers alone recover only about 5 pp of the cloud–local gap. LLM-guided spec search recovers 13–32 pp because it edits across primitives jointly, at 7–11× lower optimization cost than single-primitive baselines. The four-primitive move space contributes 5.5–16.5 pp, and the LLM proposer adds about 10 pp on average over an evolutionary search at the same move space.
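The gate described above can be illustrated with a minimal sketch. This is not the OpenJarvis implementation: the function name, the per-cluster score dictionaries, and the cluster names are hypothetical, and the "1%" tolerance is interpreted here as one percentage point of accuracy, which the article does not specify.

```python
def gate_accepts(
    before: dict[str, float],
    after: dict[str, float],
    target: str,
    tolerance_pp: float = 1.0,
) -> bool:
    """Decide whether a proposed spec edit passes the gate.

    `before` and `after` map failure-cluster names to accuracy (0-100).
    The edit passes only if the target cluster improves and no other
    cluster drops by more than `tolerance_pp` percentage points.
    """
    # The edit must actually help the cluster it was proposed for.
    if after[target] <= before[target]:
        return False
    # No other cluster may regress beyond the tolerance.
    return all(
        before[name] - after[name] <= tolerance_pp
        for name in before
        if name != target
    )


# Hypothetical per-cluster accuracies, for illustration only.
before = {"tool_calls": 60.0, "coding": 70.0, "research": 80.0}
good = {"tool_calls": 66.0, "coding": 69.5, "research": 80.0}  # small dip elsewhere
bad = {"tool_calls": 66.0, "coding": 67.0, "research": 80.0}   # 3 pp regression

print(gate_accepts(before, good, "tool_calls"))  # True
print(gate_accepts(before, bad, "tool_calls"))   # False
```

The asymmetry is the point of the design: an edit is judged on the cluster it targets, but every other cluster holds a veto, so gains in one area cannot be bought with losses in another.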
Paper: https://arxiv.org/pdf/2605.17172v1

Capabilities & Performance

OpenJarvis was evaluated across 8 benchmarks spanning 508 tasks: tool calling (ToolCall-15), agentic workflows (PinchBench), coding (LiveCodeBench), customer service (τ-Bench V2, τ²-Bench Telecom), general assistance (GAIA), and deep research (LiveResearchBench, DeepResearchBench).

The swap test: Replacing the intended cloud model with Qwen3.5-9B in existing frameworks (OpenClaw, Hermes Agent) drops accuracy by 25–39 pp. With the same model under an OpenJarvis spec, the residual drop shrinks to 5.6–16.5 pp, recovering 56–77% of the portability loss.

The accuracy frontier: The best single local model, Qwen3.5-122B, reaches 80.3% average accuracy versus Claude Opus 4.6 at 83.5%, a 3.2 pp gap. Local specs match or exceed cloud on 4 of 8 benchmarks: ToolCall-15, PinchBench, LiveCodeBench, and τ-Bench V2.

Cost and latency: Local configurations form the accuracy–efficiency frontier. Qwen3.5-122B delivers its 80.3% at roughly a thousandth of a cent per query, versus $0.009 per query for Claude Opus 4.6, an approximately 800× marginal API-cost advantage. End-to-end latency drops by roughly 4× on the agentic workloads, though the paper notes single-shot prompts can favor cloud serving.

Search gains: LLM-guided spec search improves the Qwen3.5-9B student to 100% on PinchBench, 83% on LiveCodeBench, and 91% on LiveResearchBench. Across the full eight-benchmark suite, average gains per student model range from 13.1 to 31.5 pp. The authors report that these gains survive their robustness checks (reward-weight variants, search-seed variance, and random restarts).

How to Use It

Installation is one command. On macOS, Linux, or WSL2:

    curl -fsSL https://open-jarvis.github.io/OpenJarvis/install.sh | bash

Windows users run an equivalent PowerShell script (irm … | iex).
The installer provisions uv, a Python virtual environment, Ollama, and a starter model in about three minutes on broadband. A desktop GUI ships as a .dmg, .exe, .deb, .rpm, or .AppImage from the releases page. After install, jarvis starts a chat session. Starter presets cover common workflows:

    jarvis init --preset morning-digest-mac   # daily briefing with TTS
    jarvis init --preset deep-research        # multi-hop research with citations
    jarvis init --preset code-assistant       # agent with code execution and shell access
    jarvis init --preset scheduled-monitor    # stateful agent on a schedule

The framework ships with eight built-in agents across three execution modes: on-demand, scheduled, and continuous. It connects to 25+ data sources (Gmail, Calendar, iMessage, Notion, Obsidian, Slack, GitHub, and others) and exposes agents over 32+ messaging channels (WhatsApp, Telegram, Discord, iMessage, Signal, and others). Skills can be imported from external catalogs (about 150 from Hermes Agent and about 13,700 community skills from OpenClaw), all following the agentskills.io specification. The command jarvis optimize skills --policy dspy refines them from local trace history.
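The headline accuracy and cost figures quoted in this article can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text, plus one labeled assumption: six months is taken as 180 days, which the article does not specify. The variable names are illustrative.

```python
# Figures quoted in the article.
CLOUD_ACCURACY = 83.5              # Claude Opus 4.6, average accuracy (%)
LOCAL_ACCURACY = 80.3              # Qwen3.5-122B, average accuracy (%)
CLOUD_COST_USD = 0.009             # marginal API cost per query
COST_RATIO = 800                   # "approximately 800x"
QUERIES_PER_DAY = 100
TEACHER_COST_CEILING_USD = 0.001   # amortized teacher cost per query

# Assumption (not from the article): six months is taken as 180 days.
DAYS_IN_SIX_MONTHS = 180

# Accuracy gap in percentage points.
gap_pp = round(CLOUD_ACCURACY - LOCAL_ACCURACY, 1)

# Local per-query cost implied by the 800x ratio, in cents.
local_cost_cents = CLOUD_COST_USD / COST_RATIO * 100

# Total one-time teacher spend implied by the amortization claim.
queries = QUERIES_PER_DAY * DAYS_IN_SIX_MONTHS
implied_teacher_budget_usd = queries * TEACHER_COST_CEILING_USD

print(f"accuracy gap: {gap_pp} pp")
print(f"implied local cost: {local_cost_cents:.6f} cents per query")
print(f"queries in six months: {queries}")
print(f"implied teacher budget: under ${implied_teacher_budget_usd:.2f}")
```

Under these inputs the gap is 3.2 pp, the implied local cost is about 0.0011 cents per query (consistent with "roughly a thousandth of a cent"), and the amortization claim corresponds to a one-time search spend of under about $18 across 18,000 queries.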

Original source: MarkTechPost AI


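Zhidx | AI Agent

StepFun's Step 3.7 Flash Tops the AA Leaderboard, Taking Agents from "Running Demos" to "Making Money"

This item focuses on "StepFun's Step 3.7 Flash Tops the AA Leaderboard, Taking Agents from 'Running Demos' to 'Making Money'". The original lede says: the focus is on extreme speed and high cost-effectiveness. From an AI-intelligence perspective, content like this is worth watching for the underlying technical progress, product rollout, industry competition, and subsequent market impact.

Just now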

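TMTPost | AI Agent

Exa Raises $250 Million to Build an Agent-Native "Google"

This item focuses on "Exa Raises $250 Million to Build an Agent-Native 'Google'". The original lede says: the AI era requires rebuilding search from the ground up. From an AI-intelligence perspective, content like this is worth watching for the underlying technical progress, product rollout, industry competition, and subsequent market impact.

Just now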

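TMTPost | AI Agent

After Blocking Its Own Yuanbao, WeChat AI Steps In Itself

This item focuses on "After Blocking Its Own Yuanbao, WeChat AI Steps In Itself". The original lede says: how to fit another AI operating system inside a chat box. From an AI-intelligence perspective, content like this is worth watching for the underlying technical progress, product rollout, industry competition, and subsequent market impact.

6 hours ago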

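Zhidx | AI Agent

Another Ten-Billion-Valuation Unicorn Is Born! AI Software-Monitoring Startup Lands New Funding, ARR Topped 600 Million Last Year

Compiled by Zhidx | Tian Zhongting
Editor | Cheng Qian

Zhidx, June 4: Last night, Israeli AI software-monitoring unicorn Coralogix completed a $200 million (about RMB 1.35 billion) Series F round at a post-money valuation of $1.6 billion (about RMB 10.8 billion). The funds will go mainly to three areas: R&D on AI agent capabilities, building out telemetry data infrastructure, and market expansion. Over the past year, Coralogix's revenue grew by more than 60%. Its annualized revenue passed $100 million more than a year ago, and it has more than 5,000 customers worldwide, including IBM, Tradeweb, and JFrog.

The round was co-led by Advent, the Canada Pension Plan Investment Board (CPPIB), and Greenfield, with Brighton Park Capital participating. On June 17, 2025, the company completed a $115 million (about RMB 780 million) Series E round at a post-money valuation of more than $1 billion (about RMB 6.8 billion), which made it a unicorn. Less than a year after that round, Coralogix has closed a new one, its largest single round since its founding. Coralogix's cumulative funding now totals $550 million (about RMB 3.7 billion).

[Figure: Coralogix's announcement of its $200 million funding round. Image source: Coralogix]

Coralogix was founded in Israel in 2014 by Ariel Assaraf and is headquartered in Boston, USA. The company focuses on monitoring software systems in the AI era. Its core business is providing enterprises with a next-generation operations monitoring system that replaces traditional monitoring software with AI agents, helping enterprises run their systems more intelligently and autonomously. Founder Ariel Assaraf graduated from the Open University of Israel with a degree in economics and mathematics and later earned a master's degree in neuroscience and machine learning. He worked in Israel's security services and then at companies including Verint before co-founding Coralogix in 2014, where he serves as CEO.

1. AI Agents Force a Change in Operations; Traditional Monitoring Software Shows Its Weaknesses

Traditional monitoring software relies mainly on dashboards, fixed alert rules, and manual analysis, and already struggles to cope with ...

6 hours ago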

Hugging Face Blog | AI Agent

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios

Published June 4, 2026
Authors (ServiceNow-AI): Tara Bogavelli, Gabrielle Gauthier Melancon, Katrina Stankiewicz, Nifemi Bamgbose, Fanny Riols, Hoang Nguyen, Raghav Mehndiratta, Lindsay Brin, Hari Subramani, Anil Madamala

Introduction

Voice agent failures are often highly domain-specific. A system that flawlessly processes alphanumeric confirmation codes in flight re-booking transactions might stumble when handli

6 hours ago

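36Kr | AI Agent

MiniMax Wants to Be First to Win the Ticket to AI Agents

This item focuses on "MiniMax Wants to Be First to Win the Ticket to AI Agents". The original lede says: why is everyone anxious about tokens? From an AI-intelligence perspective, content like this is worth watching for the underlying technical progress, product rollout, industry competition, and subsequent market impact.

10 hours ago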
