Ethics & Bias - Gegentic

This feature is experimental and may not be available on every plan.

Each night, Gegentic samples a portion of the previous day’s traces for every project and scores them against six LLM-judge criteria: toxicity, demographic bias, sentiment consistency, fairness, misinformation, and privacy leakage. Evaluation runs asynchronously and never blocks live traffic.

What’s on the report

Overall Ethics Score — a 0–100 gauge; scores below 60 on any evaluator flag the trace
Traces Sampled — how many traces were evaluated, out of the total for that period
Flagged Traces — count of flagged traces, broken out by high-severity findings
Agents Evaluated — how many agents in the project were covered
30-Day Trend — a rolling chart with reference lines at 80 (good) and 60 (risk)
Evaluator Breakdown — scores per evaluator (toxicity, bias, etc.), clickable to filter the flagged-traces table
Agent Ethics Scores — per-agent comparison; agents scoring below 60 are called out for review

Methodology

Sampling runs daily at 03:00 UTC against roughly 17% of the prior day’s traces
Six LLM-judge evaluators score each sampled trace from 0–100
A score below 60 on any evaluator flags the trace for review
Evaluation is fully asynchronous — it never adds latency to or blocks production traffic

Review Queue Notification Channels

​What’s on the report

​Methodology

What’s on the report

Methodology