Benchmark Guide

Understanding Model Benchmarks

A simple guide to the metrics used to evaluate abliterated AI models on Abliz.

UGI

How well a model handles sensitive topics without refusing

W/10

Willingness score - how far before the model refuses

NatInt

Natural Intelligence - general knowledge and reasoning

Writing

Creative writing ability, style, and output quality

UGI - Uncensored General Intelligence
Measures how well a model handles sensitive or controversial topics

What it measures:

UGI tests a model's knowledge and willingness to engage with topics that many AI models refuse to discuss. A higher UGI score means the model is more capable of providing information without unnecessary restrictions.

Hazardous

Knowledge of sensitive topics that typical AI models avoid discussing

Entertainment

Knowledge of adult or controversial entertainment and media content

SocPol

Knowledge of sensitive socio-political topics and current events

W/10 - Willingness Score

Measures how far a model can be pushed before it refuses to answer. Scores range from 0 to 10, where 10 means the model almost never refuses.

W/10-Direct: Does the model outright refuse to respond?
W/10-Adherence: Does the model follow instructions without deviating?

Quick Reference
Metric          What it means                               Higher is better?
UGI             Handles sensitive topics without refusing   Yes
W/10            Willingness to follow instructions          Yes
NatInt          General intelligence and reasoning          Yes
Writing         Creative writing quality                    Yes
Political Lean  Political alignment (-100% to 100%)         Neutral
Error metrics   Cooking, GeoGuesser, Weight, etc.           No (lower is better)
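
When comparing two models by hand, the main thing to keep straight is which direction counts as better for each metric. The sketch below encodes the directions from the Quick Reference as a lookup table. The names `Direction`, `METRIC_DIRECTION`, and `better_score` are hypothetical helpers written for illustration and are not part of Abliz, and the scores in the usage lines are made-up placeholder values.

```python
from enum import Enum


class Direction(Enum):
    HIGHER = "higher is better"
    LOWER = "lower is better"
    NEUTRAL = "no better direction"


# Directions copied from the Quick Reference. The keys are illustrative
# labels (an assumption), not identifiers from any Abliz API.
METRIC_DIRECTION = {
    "UGI": Direction.HIGHER,
    "W/10": Direction.HIGHER,
    "NatInt": Direction.HIGHER,
    "Writing": Direction.HIGHER,
    "Political Lean": Direction.NEUTRAL,
    "Error metrics": Direction.LOWER,
}


def better_score(metric, a, b):
    """Return the preferable of two scores for a metric.

    Returns None when the scores are tied or when the metric is neutral,
    meaning neither direction counts as better.
    """
    direction = METRIC_DIRECTION[metric]
    if direction is Direction.NEUTRAL or a == b:
        return None
    if direction is Direction.HIGHER:
        return max(a, b)
    return min(a, b)


# Placeholder scores, for illustration only.
print(better_score("UGI", 40.0, 55.0))           # higher wins
print(better_score("Error metrics", 3.0, 7.0))   # lower wins
print(better_score("Political Lean", -20, 10))   # neutral: no winner
```

Political Lean is treated as not comparable here because the Quick Reference marks it as neutral rather than as a score to maximize or minimize.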