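Hugging Face Blog · Generative AI

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

June 4, 2026, 18:57

Key Takeaways

Over the past two years, NVIDIA's content safety stack has grown from an English-focused classifier into a family of specialized models, progressively extending to new modalities, languages, and inference modes. Nemotron 3 Content Safety, released in March 2026, was the first to combine multimodal and multilingual capabilities in a single 4B-parameter model. Today we are releasing Nemotron 3.5 Content Safety, which fills in the last piece of the puzzle: a single model that handles multimodal input in a unified way.

Site AI-compiled version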

Published June 4, 2026

Authors: Varun Singh, Isabel Hulseman, Anuj Doshi, Shyamala Prayaga (NVIDIA)

The last two years have seen NVIDIA's content safety stack grow from a focused English text classifier into a family of specialized models, each extending coverage to new modalities, languages, and inference modes. Nemotron 3 Content Safety, released in March 2026, combined multimodal and multilingual capabilities for the first time in a single 4B-parameter model. Today, we are releasing Nemotron 3.5 Content Safety, which completes that arc: a single model that unifies multimodal input, multilingual reach, custom enterprise policy enforcement, and auditable reasoning into one inference call.

This post covers what changes in 3.5, the design decisions behind each new capability, and how to integrate the model into production safety pipelines.

What's New in Nemotron 3.5 Content Safety

1. Unified Multimodal Evaluation

Nemotron 3 introduced image understanding; Nemotron 3.5 deepens the multimodal integration. The model takes a user prompt, an optional image, and an optional assistant response in a single context window and produces one coherent safety verdict over the combined input. Evaluating all three together, rather than scoring each independently, closes a well-known gap in multimodal safety: policy violations that only emerge from the interaction between text and image, or between request and response, are now caught in a single pass.

2. Global Language Coverage

Nemotron 3.5 maintains the 12-language explicit training coverage of its predecessors (English, French, Spanish, German, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Portuguese, and Italian) while also inheriting strong zero-shot generalization across approximately 140 languages from the Gemma 3 base model. Deployments in markets where training data is sparse (e.g., Southeast Asian languages, Scandinavian languages, less-resourced African languages) therefore benefit from base-model multilingual transfer without requiring separate fine-tuning.

3. Custom Policy Enforcement

This is the most significant architectural addition in 3.5 relative to Nemotron 3. Production deployments rarely operate under a single universal safety taxonomy. A healthcare platform has a different risk profile than a financial services chatbot, a developer tools IDE, or a children's education app. Nemotron 3.5 accepts a custom policy specification alongside the input. The model reasons over that policy when producing its verdict rather than deferring entirely to the built-in taxonomy. This extends the work first introduced in Nemotron Content Safety Reasoning 4B to the full multimodal, multilingual setting.

4. Reasoning Traces (THINK Mode)

Every safety verdict in Nemotron 3.5 can be accompanied by an auditable reasoning trace via an optional THINK mode. When enabled, the model outputs its step-by-step reasoning before delivering a final safe / unsafe label and, optionally, the violated categories.

    <think>
    The user prompt asks for guidance on acquiring a controlled substance without a prescription. The assistant response provides specific sourcing steps and references an online marketplace. This interaction violates the Criminal Planning/Confessions and Controlled Substances categories. The image (a pharmacy exterior) provides locational context but does not alter the verdict.
    </think>
    User Safety: unsafe
    Response Safety: unsafe
    Safety Categories: Criminal Planning/Confessions, Controlled Substances

When latency is the primary constraint, THINK mode can be disabled to return to the same low-latency binary verdict available in Nemotron 3.

5. Safety Dataset

With Nemotron 3.5, we are releasing our safety dataset. This is an important milestone, since most open-source safety models do not release their training or evaluation sets. The problem is worse in the multimodal space, where artifacts such as images or videos are often derived from resources with restrictive licensing terms. The Nemotron 3.5 Content Safety Dataset is multimodal and multilingual, and it includes the safety reasoning traces that were used to train the model. These reasoning traces were generated in a two-step process to make them concise, similar to the Nemotron Content Safety Reasoning 4B model.

Model Architecture

Nemotron 3.5 Content Safety is built on Google Gemma 3 4B IT (4B parameters), providing a 128K context window, strong vision-language reasoning, and broad multilingual coverage. NVIDIA fine-tunes this base with a LoRA adapter that installs targeted safety classification behavior while keeping the model compact enough for real-time deployment on GPUs with 8 GB or more of VRAM.

The inference interface supports three output modes:

Mode 1: Low-latency binary verdict

    User Safety: safe
    Response Safety: unsafe

Mode 2: Binary verdict with categories

    User Safety: safe
    Response Safety: unsafe
    Safety Categories: Violence, Criminal Planning/Confessions

Mode 3: THINK mode (reasoning + verdict)

    <think>
    [step-by-step reasoning trace]
    </think>
    User Safety: unsafe
    Response Safety: unsafe
    Safety Categories: [categories]

The safety taxonomy follows the Aegis 2.0 framework: 13 core categories aligned with the MLCommons safety taxonomy, plus 10 fine-grained subcategories. This alignment allows direct comparison with other open and closed guard systems benchmarked on Aegis-taxonomy datasets.
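The three output modes share the same labeled fields, so a pipeline can handle them with one parser. The following is a minimal sketch in Python, assuming the model returns plain text shaped like the examples above; the `SafetyVerdict` class and `parse_verdict` function are hypothetical names introduced here for illustration and are not part of any NVIDIA API.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SafetyVerdict:
    """Structured view of one model response (hypothetical helper type)."""
    user_safety: Optional[str] = None
    response_safety: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    reasoning: Optional[str] = None


# Optional reasoning trace emitted in THINK mode.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# Each labeled field runs until the next label or the end of the text.
_FIELD_RE = re.compile(
    r"(User Safety|Response Safety|Safety Categories):\s*(.*?)\s*"
    r"(?=User Safety:|Response Safety:|Safety Categories:|$)",
    re.DOTALL,
)


def parse_verdict(raw: str) -> SafetyVerdict:
    """Parse Mode 1, Mode 2, or Mode 3 output into a SafetyVerdict."""
    verdict = SafetyVerdict()

    # Split off the reasoning trace first so that label-like text
    # inside the trace is never mistaken for the final verdict.
    match = _THINK_RE.search(raw)
    if match:
        verdict.reasoning = match.group(1).strip()
        raw = raw[match.end():]

    for name, value in _FIELD_RE.findall(raw):
        if name == "User Safety":
            verdict.user_safety = value.lower()
        elif name == "Response Safety":
            verdict.response_safety = value.lower()
        else:
            # Categories are comma-separated; names may contain "/".
            verdict.categories = [c.strip() for c in value.split(",") if c.strip()]
    return verdict


if __name__ == "__main__":
    sample = (
        "<think>The response gives sourcing steps.</think> "
        "User Safety: unsafe Response Safety: unsafe "
        "Safety Categories: Criminal Planning/Confessions, Controlled Substances"
    )
    print(parse_verdict(sample))
```

Fields that a mode does not emit are left as `None` or an empty list, so callers can treat a missing `reasoning` value as "THINK mode was off" rather than as an error.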
Reasoning

Reasoning is a supercharger for content safety classification because it provides the context, customization, and accountability that production AI systems require, especially in enterprise and regulated environments.

Enables Custom and Contextual Policy Enforcement

Reasoning allows a content safety model to dynamically interpret and enforce custom, domain-specific policies defined in natural language at inference time. This is necessary because production deployments rarely operate under a single, universal safety taxonomy. A financial services chatbot has a different risk profile than a children's education app, which may have a lower tolerance for profanity. This capability supports:

- Category Suppression: Disabling irrelevant categories, such as preventing a "violence" category trigger when a DevOps tool handles the phrase "terminate a process".
- Custom Category Injection: Defining proprietary risk categories specific to an organization's regulatory or product policies.

Provides Auditable and Documented Justification

The reasoning traces show the model's step-by-step logic before it delivers a final safe or unsafe verdict. This documented justification serves several purposes:

- Compliance and Audit Logging: Regulated industries often require documented justifications for content moderation decisions.
- Human Review: Reviewers can audit why a verdict was reached to identify systematic model errors.
- Policy Iteration: The traces reveal how the model interprets edge cases, allowing teams to iteratively refine and improve custom policy language.

Latency

While reasoning can introduce latency, the Nemotron model addresses this by condensing reasoning chains into concise summaries, which limits output tokens and improves efficiency. This is done in a two-step process similar to the one used for the predecessor model, Nemotron-Content-Safety-Reasoning-4B.

In the first step, we use larger, more powerful models such as Qwen 397B to generate chain-of-thought reasoning traces from the provided prompts, images, and responses. We also provide the ground-truth labels of the samples so that misclassifications do not find their way into the reasoning traces. In step 2, we make these reasoning traces more concise by using another large model such as Qwen 80B. We specifically instruct this model to rephrase the original traces (from step 1) so that they fit in no more than 3 sent
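Returning to the custom policy enforcement described in the Reasoning section: category suppression and custom category injection both amount to editing the natural-language policy that accompanies the input. The sketch below shows one possible way to assemble such a policy string in Python. The post does not specify the policy format the model expects, so the wording of the policy text, the `build_custom_policy` function, and the example custom category are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def build_custom_policy(
    base_categories: Iterable[str],
    suppressed: Iterable[str],
    custom: Dict[str, str],
) -> str:
    """Assemble a natural-language policy (hypothetical format).

    base_categories: built-in category names to start from.
    suppressed: category names to drop (category suppression).
    custom: extra category name -> description (category injection).
    """
    suppressed_set = {name.lower() for name in suppressed}
    lines = ["Evaluate the content against the following categories only:"]

    # Category suppression: skip built-in categories the deployment disables.
    for name in base_categories:
        if name.lower() not in suppressed_set:
            lines.append(f"- {name}")

    # Custom category injection: append organization-specific categories.
    for name, description in custom.items():
        lines.append(f"- {name}: {description}")

    return "\n".join(lines)


if __name__ == "__main__":
    # A DevOps assistant that should not flag "terminate a process".
    policy = build_custom_policy(
        base_categories=[
            "Violence",
            "Criminal Planning/Confessions",
            "Controlled Substances",
        ],
        suppressed=["Violence"],
        custom={
            "Credential Exposure": "Content that reveals passwords or API keys.",
        },
    )
    print(policy)
```

Keeping the policy as generated text rather than hand-edited prose makes it straightforward to version and to refine as reasoning traces reveal how the model interprets edge cases.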

Original source: Hugging Face Blog

