Hugging Face Blog · Generative AI

Designing the hf CLI as an agent-optimized way to work with the Hub

June 4, 2026, 00:00

Key points

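hf is the official command-line interface to the Hugging Face Hub. Anything you can do on the Hub from the Python SDK, you can now do from your terminal: download and upload models, datasets and Spaces; create and manage repos, branches, tags and pull requests; run Jobs on HF infrastructure; manage Buckets, Collections, webhooks and Inference Endpoints. The hf CLI has been built primarily for our users over the years, but it is now increasingly used by coding agents such as Claude Code, Codex and Cursor. So we rebuilt it to serve both audiences at once. This blog post summarizes what we did.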


Published June 4, 2026
Célina Hanouti (celinah), Lucain Pouget (Wauplin)

hf is the official command-line entrypoint to the Hugging Face Hub. Anything you can do on the Hub from the Python SDK, you can do from your terminal: download and upload models, datasets and Spaces; create and manage repos, branches, tags and pull requests; run Jobs on HF infrastructure; manage Buckets, Collections, webhooks and Inference Endpoints.

The hf CLI has been primarily built for our users over the years. But it's now increasingly used by coding agents: Claude Code, Codex, Cursor and more. So we rebuilt it to make it work for both audiences at once. This blog post summarizes what we did, and how we benchmarked it. We found that on complex, multi-step tasks the no-CLI baseline (an agent hand-rolling curl or the Python SDK) uses up to 6× as many tokens as the hf CLI.

AI agent traffic on the Hub

We started tracking agent usage of the Hub in April 2026. The hf CLI (and the huggingface_hub Python SDK it's built on) detects when a coding agent is driving it by reading the environment variables agents set: CLAUDECODE/CLAUDE_CODE for Claude Code, CODEX_SANDBOX for Codex, plus Cursor, Gemini, Pi, and the universal AI_AGENT. That single signal does two jobs: it shapes the CLI's output (more on that below) and it tags each Hub request with an agent/<name> user-agent, so we can attribute traffic to the agent driving it.

The two largest by distinct users are Claude Code and Codex, well ahead of everything else, and they're the two agents we benchmark later in this article.

The bars count distinct users per agent; request volume is the sub-label. Claude Code alone is ~40k users and nearly 49M requests, with Codex close behind.
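The environment-variable detection described above can be sketched roughly as follows. This is a minimal illustration, not the huggingface_hub implementation: the variable names CLAUDECODE, CLAUDE_CODE, CODEX_SANDBOX and AI_AGENT come from the article, while the function names, the agent labels, the lookup order, the `; ` separator, and the idea that AI_AGENT carries the agent's name are all assumptions.

```python
import os
from typing import Mapping, Optional

# Variable names are taken from the article; the labels on the right and the
# order in which they are checked are assumptions made for this sketch.
_AGENT_ENV_VARS = (
    ("CLAUDECODE", "claude-code"),
    ("CLAUDE_CODE", "claude-code"),
    ("CODEX_SANDBOX", "codex"),
)


def detect_agent(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a label for the coding agent driving this process, or None."""
    env = os.environ if env is None else env
    for var, label in _AGENT_ENV_VARS:
        if env.get(var):
            return label
    # Assumption: the universal AI_AGENT variable holds the agent's name.
    generic = (env.get("AI_AGENT") or "").strip()
    return generic.lower() if generic else None


def user_agent(base: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Append an agent/<name> token to a base user-agent string (hypothetical format)."""
    agent = detect_agent(env)
    return f"{base}; agent/{agent}" if agent else base
```

Passing the environment as a plain mapping keeps the sketch easy to exercise without touching the real process environment, for example `user_agent("hf/1.9.0", {"CODEX_SANDBOX": "1"})`.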
These are early numbers (we only began attributing agent traffic in April 2026), but the scale is already significant, and we expect it to keep growing as coding agents become a standard way to work with the Hub.

Built for humans and agents

Humans and coding agents expect different outputs for the same hf commands. A human wants rich terminal output: ANSI color, padded tables truncated to fit the screen, a green ✅ on success, ✔ for booleans, progress bars, prose hints. An agent wants the inverse: no ANSI, nothing truncated, every value in full since an agent can handle far denser output than a human, kept compact and structured to stay light on tokens. It also can't answer a CLI prompt and will happily re-run a command after a timeout. The rest of this section is how hf gives each side what it needs.

We introduced agent-mode output in hf v1.9.0 and have been migrating the rest of the CLI to it gradually in the following releases.

One command, multiple renderings

When hf auto-detects agent use (via the environment variables mentioned above), it renders the same command differently. It optimizes output format for humans or agents without passing a flag:

    # human (default in a terminal): aligned table, truncated to fit, with a hint
    > hf models ls --author Qwen --sort downloads --limit 3
    ID                       CREATED_AT DOWNLOADS LIBRARY_NAME LIKES PIPELINE_TAG    PRIVATE TAGS
    ------------------------ ---------- --------- ------------ ----- --------------- ------- -------------------------
    Qwen/Qwen3-0.6B          2025-04-27 21156913  transformers 1285  text-generation         transformers, safetens...
    Qwen/Qwen2.5-1.5B-Ins... 2024-09-17 15143953  transformers 725   text-generation         transformers, safetens...
    Qwen/Qwen3-4B            2025-04-27 14808352  transformers 625   text-generation         transformers, safetens...
    Hint: Use `--no-truncate` or `--format json` to display full values.
    # agent (auto-detected): TSV, full ids + ISO timestamps + every tag, nothing truncated
    $ hf models ls --author Qwen --sort downloads --limit 3
    id	created_at	downloads	library_name	likes	pipeline_tag	private	tags
    Qwen/Qwen3-0.6B	2025-04-27T03:40:08+00:00	21156913	transformers	1285	text-generation	False	['transformers', 'safetensors', 'qwen3', 'text-generation', 'conversational', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-0.6B-Base', 'base_model:finetune:Qwen/Qwen3-0.6B-Base', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']
    Qwen/Qwen2.5-1.5B-Instruct	2024-09-17T14:10:29+00:00	15143953	transformers	725	text-generation	False	['transformers', 'safetensors', 'qwen2', 'text-generation', 'chat', 'conversational', 'en', 'arxiv:2407.10671', 'base_model:Qwen/Qwen2.5-1.5B', 'base_model:finetune:Qwen/Qwen2.5-1.5B', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']
    Qwen/Qwen3-4B	2025-04-27T03:41:29+00:00	14808352	transformers	625	text-generation	False	['transformers', 'safetensors', 'text-generation', 'arxiv:2309.00071', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-4B-Base', 'base_model:finetune:Qwen/Qwen3-4B-Base', 'license:apache-2.0', 'endpoints_compatible', 'deploy:azure', 'region:us']

A human gets an aligned table, truncated to fit the terminal, plus a hint on how to see more, with color cues for status (a green ✓ on success, red on error). An agent gets the complete record as TSV: full repo ids, full ISO timestamps, every tag, no ANSI codes, nothing truncated, clean to parse and light on tokens.

In practice, we've implemented logging methods like .table(...), .result(...), .json(), etc., which take raw data as input and handle the formatting. In addition to human and agent modes, we've introduced --json and --quiet options to make it easier to pipe commands together.
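To make the two renderings concrete, here is a minimal sketch of one formatter that takes raw rows and emits either a padded, truncated table or full TSV. It is an illustration only and not the hf CLI's code: the function names (`render`, `render_table`, `render_tsv`), the 24-character cell limit, and the `mode` strings are assumptions chosen to mirror the output shown above.

```python
from typing import Dict, List, Sequence


def render_tsv(rows: Sequence[Dict[str, object]], columns: Sequence[str]) -> str:
    """Agent-style output: lowercase header, tab-separated, nothing truncated."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(row.get(col, "")) for col in columns))
    return "\n".join(lines)


def _clip(value: str, width: int) -> str:
    """Cut a value down to `width` characters, marking the cut with '...'."""
    if len(value) <= width:
        return value
    return value[: width - 3] + "..."


def render_table(
    rows: Sequence[Dict[str, object]], columns: Sequence[str], max_width: int = 24
) -> str:
    """Human-style output: uppercase header, padded columns, clipped cells."""
    cells: List[List[str]] = [
        [_clip(str(row.get(col, "")), max_width) for col in columns] for row in rows
    ]
    header = [col.upper() for col in columns]
    widths = [
        max([len(header[i])] + [len(r[i]) for r in cells]) for i in range(len(columns))
    ]

    def fmt(values: Sequence[str]) -> str:
        return " ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    lines = [fmt(header), " ".join("-" * w for w in widths)]
    lines += [fmt(r) for r in cells]
    return "\n".join(lines)


def render(rows: Sequence[Dict[str, object]], columns: Sequence[str], mode: str) -> str:
    """Pick a rendering from the same raw data; `mode` is 'human' or 'agent'."""
    if mode == "agent":
        return render_tsv(rows, columns)
    if mode == "human":
        return render_table(rows, columns)
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the formatting decision in one place means each command only supplies raw data, which matches the role the article describes for its logging methods.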
The default mode is automatically chosen based on context, but users can always force the format of their choice with --format human | agent | json | quiet.

Next-command hints

CLI commands rarely run in isolation: one step usually implies the next (git add, then git commit). Many hf commands now end with a hint: the exact next command to run, pre-filled with the IDs you just used, so a user or agent can chain straight to the next step instead of working it out from scratch. Start a Job in the background and it points you to its logs; create a Space and it points you to its boot status:

    $ hf jobs run --detach python:3.12 python train.py
    ✓ Job started
      id: 6f3a1c2e9b
      url: https://huggingface.co/jobs/celinah/6f3a1c2e9b
    Hint: Use `hf jobs logs 6f3a1c2e9b` to fetch the logs.

For a human that's a convenience. For an agent it's a rail: the next action is named, parameterized with the right ids, and ready to run, so the agent spends fewer steps working out what to do. Errors behave the same way, naming the fix instead of just failing:

    Error: Not logged in. Run `hf auth login` first.

Hints, warnings and errors all go to stderr while data goes to stdout, so none of this guidance pollutes the output the agent is parsing.

Non-blocking and safe to retry

hf never sits on an interactive prompt waiting for a key an agent can't press. A destructive command still asks a human to confirm, but in agent mode it fails fast with the fix in the message (Use --yes to skip confirmation.), and -y/--yes skips it. And because agents retry on timeouts and lost context, operations are built to be safe to repeat: hf repos create --exist-ok is a no-op if the repo already exists, and re-running an upload re-commits cleanly.
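Two of the behaviours above, keeping hints off stdout and failing fast instead of prompting, can be sketched in a few lines. This is a hedged illustration rather than the CLI's actual code: `confirm`, `emit` and `ConfirmationRequired` are hypothetical names, and only the message text "Use --yes to skip confirmation." is taken from the article.

```python
import sys
from typing import Callable, Optional, TextIO


class ConfirmationRequired(Exception):
    """Raised when a destructive action needs --yes and nobody can be prompted."""


def confirm(
    message: str,
    *,
    yes: bool,
    agent_mode: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the action may proceed.

    With yes=True the prompt is skipped. In agent mode the function never
    blocks on input: it raises with the fix spelled out in the message.
    """
    if yes:
        return True
    if agent_mode:
        raise ConfirmationRequired(f"{message} Use --yes to skip confirmation.")
    return ask(f"{message} [y/N] ").strip().lower() in {"y", "yes"}


def emit(
    data: str,
    hint: str = "",
    *,
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> None:
    """Write data to stdout and guidance to stderr so parsers only see data."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(data, file=out)
    if hint:
        print(f"Hint: {hint}", file=err)
```

Because the hint travels on a separate stream, a caller that pipes stdout into a parser never has to strip guidance text out of the data.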
Separately, the commands that move real data take a --dry-run that shows exactly what they'll transfer before they run, which proves handy for humans and agents alike, since neither has to commit to a long download or blind sync:

    # agent mode: a destructive command without --yes refuses, with the fix in the message
    $ hf repos delete my-org/old-model
    Error: You are about to permanently delete model 'my-org/old-model'. Proceed? Use --yes to skip confirmation.

    # commands that move data take --dry-run to preview the transfer first
    $ hf download deepseek-ai/DeepSeek-V4-Pro config.json --dry-run
    [dry-run] Will download 1 files (out of 1) totalling 1.8K.
    file         size
    config.json  1.8K

Discoverable, p
