Constitutional AI

Published 2026-05-27 | Updated 2026-05-30 | Paper Reading

Paper Reading Notes

Constitutional AI

Title: Constitutional AI: Harmlessness from AI Feedback
Authors: Anthropic
Year: 2022
Link: Cai

Main

Framework:
Motivation:
- Scaling Supervision: leverage AI to help humans supervise AI more efficiently
  - AI supervision may be more efficient than collecting human feedback
  - AI systems can already perform some tasks at or beyond human level
  - Visualization:
- A Harmless but Non-Evasive (Still Helpful) Assistant
- Simplicity and Transparency
The Constitutional AI Approach: human supervision will come entirely from a set of principles
- (Supervised Stage) Critique → Revision → Supervised Learning
- (RL Stage) AI Comparison Evaluations → Preference Model → Reinforcement Learning
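
The supervised stage above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate` is a hypothetical callable standing in for the helpful model, the prompt layout is an assumed format, and the two entries in `PRINCIPLES` are illustrative wording rather than quotes from the actual constitution. Each revision step samples one principle, asks for a critique, then asks for a revision; the final revision becomes the supervised fine-tuning target.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical (critique_request, revision_request) pairs; wording is illustrative only.
PRINCIPLES: List[Tuple[str, str]] = [
    ("Identify ways the response is harmful.",
     "Rewrite the response to remove harmful content."),
    ("Point out anything unethical or dangerous in the response.",
     "Revise the response to be safe while staying helpful."),
]


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],  # assumed model interface: text in, text out
    principles: List[Tuple[str, str]] = PRINCIPLES,
    n_revisions: int = 2,
    seed: int = 0,
) -> List[str]:
    """Return [initial response, revision 1, ..., revision n]."""
    rng = random.Random(seed)
    response = generate(f"Human: {prompt}\n\nAssistant:")
    history = [response]
    for _ in range(n_revisions):
        # One principle is sampled at random for each revision step.
        critique_request, revision_request = rng.choice(principles)
        context = f"Human: {prompt}\n\nAssistant: {response}"
        critique = generate(
            f"{context}\n\nCritique Request: {critique_request}\n\nCritique:"
        )
        response = generate(
            f"{context}\n\nCritique Request: {critique_request}\n\n"
            f"Critique: {critique}\n\n"
            f"Revision Request: {revision_request}\n\nRevision:"
        )
        history.append(response)
    return history


def build_sl_dataset(prompts, generate, **kwargs):
    """Pair each prompt with its final revision for supervised fine-tuning."""
    return [(p, critique_and_revise(p, generate, **kwargs)[-1]) for p in prompts]
```

Returning the full history rather than only the last revision makes it easy to compare responses across revision counts, as in the "Numbers of Revisions" result below.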
Models and Data: Helpful Model (H), Helpful & Harmless Model (HH)
Constitutional AI: Critiques, Revisions, and Supervised Learning:
- Method:
- Results:
  - H-RLHF
  - Number of Revisions
  - w/o critique
Constitutional AI: Reinforcement Learning from AI Feedback
- Method:
- Results:
- Optimization: Constitutional Principles, Ensembling, Preference Labels (Soft vs. Hard vs. Clamped)
- Harmlessness vs. Evasiveness
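
To make the preference-label variants in the RL stage concrete, here is a small sketch. It assumes the feedback model exposes log-probabilities for the two answer options, here called `logp_a` and `logp_b` (hypothetical names). The soft label is the normalized probability of option A, the hard label rounds it to 0 or 1, and the clamped label restricts it to the 40-60% range that the paper reports using with chain-of-thought feedback. Averaging probabilities across principles in `ensemble` is one plausible reading of "ensembling", not a confirmed detail.

```python
import math
from typing import Sequence


def soft_label(logp_a: float, logp_b: float) -> float:
    """Normalized probability that response A is preferred."""
    m = max(logp_a, logp_b)  # subtract the max for numerical stability
    ea, eb = math.exp(logp_a - m), math.exp(logp_b - m)
    return ea / (ea + eb)


def hard_label(p_a: float) -> float:
    """Round the soft label to a 0/1 target."""
    return 1.0 if p_a >= 0.5 else 0.0


def clamped_label(p_a: float, low: float = 0.4, high: float = 0.6) -> float:
    """Clip overconfident probabilities into [low, high]."""
    return min(max(p_a, low), high)


def ensemble(probs: Sequence[float]) -> float:
    """Average the preference probability over several principles (assumed scheme)."""
    return sum(probs) / len(probs)
```

For example, log-probabilities `log(3)` and `log(1)` give a soft label of 0.75, a hard label of 1.0, and a clamped label of 0.6.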

Author: 鱼幼薇

Article link: https://youweiyu.github.io/2026/05/27/Constitutional_AI/

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit 小鱼日记 when reposting!
