Understanding Model Benchmarks
A simple guide to the metrics used to evaluate abliterated AI models on Abliz.
UGI - How well a model handles sensitive topics without refusing
W/10 - Willingness score: how far a model can be pushed before it refuses
NatInt - Natural Intelligence: general knowledge and reasoning
Writing - Creative writing ability, style, and output quality
UGI - Uncensored General Intelligence
What it measures:
UGI tests a model's knowledge of, and willingness to engage with, topics that many AI models refuse to discuss. A higher UGI score means the model knows more about these topics and is more willing to answer questions about them instead of refusing.
UGI is broken down into three categories:
Hazardous
Knowledge of sensitive topics that typical AI models avoid discussing
Entertainment
Knowledge of adult or controversial entertainment and media content
SocPol
Knowledge of sensitive socio-political topics and current events
W/10 - Willingness Score
Measures how far a model can be pushed before it refuses to answer. It is scored from 0 to 10, where 10 means the model almost never refuses.
| Metric | What it means | Higher is better? |
|---|---|---|
| UGI | Handles sensitive topics without refusing | Yes |
| W/10 | How far the model can be pushed before refusing (0-10) | Yes |
| NatInt | General intelligence and reasoning | Yes |
| Writing | Creative writing quality | Yes |
| Political Lean | Political alignment, from -100% to 100% | Neither (0% is centered) |
| Error metrics | Error on specific tasks such as Cooking, GeoGuesser, and Weight | No (lower is better) |
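When comparing models, the "Higher is better?" column matters: most scores should be maximized, error metrics minimized, and Political Lean has no better direction. The sketch below encodes that column and picks the stronger model per metric. It is an illustration only: the metric keys, the `better_model` helper, the model names, and all scores are hypothetical and are not real Abliz results.

```python
from enum import Enum
from typing import Dict, Optional


class Direction(Enum):
    HIGHER = "higher is better"
    LOWER = "lower is better"
    NEITHER = "no preferred direction"


# Directions follow the table above; these metric keys are hypothetical names.
METRIC_DIRECTION = {
    "UGI": Direction.HIGHER,
    "W/10": Direction.HIGHER,
    "NatInt": Direction.HIGHER,
    "Writing": Direction.HIGHER,
    "Political Lean": Direction.NEITHER,
    "Cooking": Direction.LOWER,  # an error metric, so lower is better
}


def better_model(metric: str, scores: Dict[str, float]) -> Optional[str]:
    """Return the model with the best score on `metric`,
    or None when the metric has no preferred direction."""
    direction = METRIC_DIRECTION[metric]
    if direction is Direction.NEITHER:
        return None
    pick = max if direction is Direction.HIGHER else min
    # Iterating a dict yields its keys (model names); compare by their scores.
    return pick(scores, key=scores.get)


# Made-up scores for two hypothetical models (not real benchmark results).
example = {
    "UGI": {"model-a": 42.0, "model-b": 38.5},
    "Cooking": {"model-a": 0.31, "model-b": 0.27},
    "Political Lean": {"model-a": -12.0, "model-b": 5.0},
}

for metric, per_model in example.items():
    print(metric, "->", better_model(metric, per_model))
```

With these made-up numbers, `model-a` wins on UGI, `model-b` wins on the Cooking error metric, and Political Lean returns `None` because neither direction is better.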