Meet OpenJarvis: A Local-First, On-Device Personal AI Agent Framework with Tools, Memory, and Learning

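Summary
Researchers at Stanford University and Lambda Labs have published a research paper on OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device. Open-weight models configured through OpenJarvis trail the best cloud model by only 3.2 percentage points on average. Under the research's benchmark protocol, they also cut marginal API cost per query by roughly 800× and latency by roughly 4×. The work builds on the team's earlier Intelligence Per Watt study. That study found that local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency, and that intelligence efficiency improved 5.3× from 2023 to 2025. Model overview and access: OpenJarvis is not a single model but a framework that integrates …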
Researchers at Stanford University and Lambda Labs have published the research paper for OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device. Under the paper's benchmark protocol, the open-weight models configured through OpenJarvis land within 3.2 percentage points of the best cloud model on average. They do so at roughly 800× lower marginal API cost per query and roughly 4× lower latency. The work builds on the team's earlier Intelligence Per Watt study, which reported two findings:
local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency;
intelligence efficiency improved 5.3× from 2023 to 2025.

Model Overview & Access

OpenJarvis is not a single model. It is a framework that composes any supported model with a configurable agent stack, and it was evaluated across 11 local models from four families.

License: Apache 2.0
Framework release: March 12, 2026
Paper: arXiv:2605.17172 (posted May 16, 2026)
Repository: github.com/open-jarvis/OpenJarvis
Stars / forks: ~5.4k / ~1.2k (June 2026)
Languages: Python (~83%), Rust (~9%), TypeScript (~7%)
Evaluated models: 11 local models across 4 families (Qwen3.5, Gemma4, Nemotron, Granite)
Cloud baselines: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro
Supported engines: Ollama, vLLM, SGLang, llama.cpp, Apple Foundation Models, Exo (among others)
Context window: Model-dependent
Installation: Single command; ~3 minutes on broadband
Hardware: Tested on 7 platforms, from Mac Mini M4 to NVIDIA DGX Spark

Architecture: Five Primitives and a Spec

OpenJarvis decomposes a personal AI system into five typed primitives. They are composed through a single declarative configuration object called a spec:

Intelligence: the model, weights, generation parameters, and quantization format.
Engine: the inference runtime (Ollama, vLLM, SGLang, etc.), batching, KV-cache settings, and hardware path.
Agents: the reasoning loop (ReAct or CodeAct), system prompts, tool-use policy, and turn limits.
Tools & Memory: external interfaces, retrieval backends, 25+ data connectors, and 32+ messaging channels, with native MCP support and interchangeable memory backends.
Learning: the optimizer that updates the spec from traces. This slot accepts LoRA, DSPy, GEPA, or LLM-guided spec search.

Each primitive is independently swappable, and a spec serializes all five into a TOML file. Two specs can share the same agent and tool configuration and differ only in model and engine. The same behavior therefore runs on a Mac Mini and on a workstation without rewriting prompts.

LLM-guided spec search is the paper's second contribution. It is a local–cloud collaboration. At search time, a frontier cloud model acts as a teacher: it reads traces, diagnoses failure clusters, and proposes edits across Intelligence, Engine, Agents, and Tools & Memory. An edit is accepted only if it improves the target failure cluster without causing meaningful regressions elsewhere. The research team calls this check the gate (default tolerance 1%). At inference time, the optimized spec runs entirely on-device, with zero cloud calls. Because the teacher is used only at search time, its cost is amortized over later queries. At 100 queries per day, the amortized teacher cost falls below $0.001 per query within six months.

Prior work (GEPA, DSPy, LoRA) optimizes one primitive at a time, and prompt optimizers alone recover only about 5 pp of the cloud–local gap. LLM-guided spec search recovers 13–32 pp because it edits across primitives jointly, and it does so at 7–11× lower optimization cost than single-primitive baselines. Two factors account for the gain:
the four-primitive move space contributes 5.5–16.5 pp;
the LLM proposer adds about 10 pp on average over an evolutionary search that uses the same move space.
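The gate can be read as a simple acceptance rule inside a search loop. The sketch below makes three assumptions:
per-cluster accuracies are on a 0.0 to 1.0 scale;
the 1% default tolerance means one absolute percentage point (the paper's exact definition may differ);
the greedy loop structure is a simplification for illustration.
The `propose` and `evaluate` callables are hypothetical stand-ins for the cloud teacher and the benchmark harness.

```python
from typing import Callable

Scores = dict[str, float]  # failure-cluster name -> accuracy in [0.0, 1.0]
Spec = dict[str, str]      # flattened spec, e.g. {"agents.loop": "react"} (hypothetical keys)


def gate(before: Scores, after: Scores, target: str, tolerance: float = 0.01) -> bool:
    """Accept an edit only if the target cluster improves and no other cluster
    regresses by more than `tolerance` (assumed: absolute accuracy, 0.01 = 1 pp)."""
    if after[target] <= before[target]:
        return False
    return all(after[c] >= before[c] - tolerance for c in before if c != target)


def spec_search(
    spec: Spec,
    propose: Callable[[Spec, Scores], tuple[Spec, str]],
    evaluate: Callable[[Spec], Scores],
    rounds: int,
) -> tuple[Spec, Scores]:
    """Greedy gated search: keep a proposed edit only when it passes the gate."""
    scores = evaluate(spec)
    for _ in range(rounds):
        candidate, target = propose(spec, scores)  # teacher picks an edit and its target cluster
        candidate_scores = evaluate(candidate)
        if gate(scores, candidate_scores, target):
            spec, scores = candidate, candidate_scores
    return spec, scores


if __name__ == "__main__":
    # Toy stand-ins for the benchmark harness and the cloud teacher.
    def evaluate(spec: Spec) -> Scores:
        react = spec["agents.loop"] == "react"
        return {"tool_calls": 0.80 if react else 0.60, "formatting": 0.795 if react else 0.80}

    def propose(spec: Spec, scores: Scores) -> tuple[Spec, str]:
        return {**spec, "agents.loop": "react"}, "tool_calls"

    final_spec, final_scores = spec_search({"agents.loop": "codeact"}, propose, evaluate, rounds=3)
    print(final_spec, final_scores)
```

Requiring a strict improvement on the target cluster keeps the loop from accepting no-op edits. The tolerance still lets small, possibly noisy regressions in other clusters through.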
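The cost figures can be sanity-checked with back-of-the-envelope arithmetic. Dividing the article's $0.009-per-query Claude Opus 4.6 figure by the reported ~800× ratio gives $0.00001125 per query, about 0.0011 cents. That is consistent with "roughly a thousandth of a cent". For the teacher, the sketch assumes a single one-off search run and treats six months as 182 days. Under those assumptions, the "below $0.001 per query" claim implies a total search cost under about $18.20. Both assumptions are ours, not the paper's.

```python
"""Back-of-the-envelope cost checks for figures quoted in the article."""

CLOUD_COST_PER_QUERY_USD = 0.009  # Claude Opus 4.6, as quoted in the article
REPORTED_COST_RATIO = 800         # "approximately 800x" marginal API-cost advantage


def implied_local_cost(cloud_cost: float, ratio: float) -> float:
    """Marginal local cost per query implied by a cloud cost and a cost ratio."""
    return cloud_cost / ratio


def amortized_teacher_cost(search_cost: float, queries_per_day: int, days: int) -> float:
    """Spread a one-off teacher (search-time) cost over all queries served so far."""
    return search_cost / (queries_per_day * days)


def max_search_cost(threshold: float, queries_per_day: int, days: int) -> float:
    """Largest one-off search cost whose amortized share is at most `threshold`."""
    return threshold * queries_per_day * days


if __name__ == "__main__":
    local = implied_local_cost(CLOUD_COST_PER_QUERY_USD, REPORTED_COST_RATIO)
    print(f"implied local cost: ${local:.8f} per query ({local * 100:.4f} cents)")
    # Assumption: six months ~ 182 days at 100 queries per day, one search run.
    print(f"max one-off search cost: ${max_search_cost(0.001, 100, 182):.2f}")
```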
https://arxiv.org/pdf/2605.17172v1

Capabilities & Performance

OpenJarvis was evaluated on 8 benchmarks spanning 508 tasks:
tool calling (ToolCall-15);
agentic workflows (PinchBench);
coding (LiveCodeBench);
customer service (τ-Bench V2, τ²-Bench Telecom);
general assistance (GAIA);
deep research (LiveResearchBench, DeepResearchBench).

The swap test: Replacing the intended cloud model with Qwen3.5-9B in existing frameworks (OpenClaw, Hermes Agent) drops accuracy by 25–39 pp. With the same model under an OpenJarvis spec, the residual drop shrinks to 5.6–16.5 pp, recovering 56–77% of the portability loss.

The accuracy frontier: The best single local model, Qwen3.5-122B, reaches 80.3% average accuracy versus 83.5% for Claude Opus 4.6, a 3.2 pp gap. Local specs match or exceed cloud on 4 of the 8 benchmarks: ToolCall-15, PinchBench, LiveCodeBench, and τ-Bench V2.

Cost and latency: Local configurations form the accuracy–efficiency frontier. Qwen3.5-122B delivers its 80.3% at roughly a thousandth of a cent per query, versus $0.009 per query for Claude Opus 4.6. That is an approximately 800× marginal API-cost advantage. End-to-end latency drops by roughly 4× on the agentic workloads, though the paper notes that single-shot prompts can favor cloud serving.

Search gains: LLM-guided spec search improves the Qwen3.5-9B student to 100% on PinchBench, 83% on LiveCodeBench, and 91% on LiveResearchBench. Across the full eight-benchmark suite, average gains per student model range from 13.1 to 31.5 pp. The authors report that these gains hold up under their robustness checks (reward-weight variants, search-seed variance, and random restarts).

How to Use It

Installation takes a single command. On macOS, Linux, or WSL2:

curl -fsSL https://open-jarvis.github.io/OpenJarvis/install.sh | bash

Windows users run an equivalent PowerShell script (irm … | iex).
The installer provisions uv, a Python virtual environment, Ollama, and a starter model in about three minutes on broadband. A desktop GUI ships as a .dmg, .exe, .deb, .rpm, or .AppImage from the releases page. After installation, running jarvis starts a chat session. Starter presets cover common workflows:

jarvis init --preset morning-digest-mac   # daily briefing with TTS
jarvis init --preset deep-research        # multi-hop research with citations
jarvis init --preset code-assistant       # agent with code execution and shell access
jarvis init --preset scheduled-monitor    # stateful agent on a schedule

The framework ships with eight built-in agents across three execution modes: on-demand, scheduled, and continuous. It connects to 25+ data sources (Gmail, Calendar, iMessage, Notion, Obsidian, Slack, GitHub, and others). It exposes agents over 32+ messaging channels (WhatsApp, Telegram, Discord, iMessage, Signal, and others). Skills can be imported from external catalogs, all following the agentskills.io specification:
about 150 from Hermes Agent;
about 13,700 community skills from OpenClaw.
The command jarvis optimize skills --policy dspy refines them from local trace history.