Model Safety

Model safety refers to the set of practices, principles, and safeguards designed to ensure that an artificial intelligence (AI) or machine learning model operates reliably, ethically, and without causing unintended harm. It involves preventing harmful outputs, misuse, bias, or unsafe behavior, as well as ensuring robustness, interpretability, and compliance with relevant regulations and values throughout the model’s design, training, deployment, and monitoring phases.
  1. Longer AI Reasoning Makes Models Easier to Jailbreak, Study Finds

    Longer AI Reasoning Makes Models More Vulnerable to Jailbreaks, Researchers Warn. A new joint study by Anthropic, Stanford University, and the University of Oxford challenges one of the core assumptions in modern AI safety: that extending a model’s reasoning time makes it harder to exploit...