How the benchmark works: Two files are uploaded to an LLM — (1) an assessment framework defining 31 evaluation criteria, and (2) a catalogue of 15 moral theories written in comparable depth. The LLM is prompted to evaluate every theory against every criterion and produce a scored ranking. The process is then repeated with different models to test for consistency.
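In code terms the protocol is simple. Here is a minimal sketch; the file names, prompt wording, and the `call_llm` stand-in are my illustrative assumptions, not the benchmark's actual artifacts:

```python
# Minimal sketch of the evaluation protocol. File names, prompt text, and
# call_llm() are placeholders, not the benchmark's real artifacts.
import json

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a provider-specific chat call (xAI, Anthropic, etc.)."""
    raise NotImplementedError

def run_benchmark(model: str) -> dict:
    framework = open("assessment_framework_v1.md").read()  # the 31 criteria
    catalogue = open("theory_catalogue_v1.md").read()      # the 15 theories
    prompt = (
        "Evaluate every theory in the catalogue against every criterion in "
        "the framework. Return JSON: {theory: {criterion: score}}.\n\n"
        f"FRAMEWORK:\n{framework}\n\nCATALOGUE:\n{catalogue}"
    )
    return json.loads(call_llm(model, prompt))

# The same two files are re-run on different models to test for consistency.
results = {m: run_benchmark(m) for m in ("grok-3", "claude-opus-4.6")}
```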
Anonymization: SFOM was submitted without any author name or attribution attached, as can be verified by inspecting the theory catalogue files directly. It appears as just another numbered theory ("Subjective-Frame Objective Morality Model") alongside the other 14. This eliminates the sycophancy factor: the model has no reason to favor it over any other entry.
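For illustration only (these field names are my guess at a shape, not the actual file format), each catalogue entry carries a number and the theory text, and nothing that identifies an author:

```python
# Hypothetical shape of one catalogue entry: a number and the theory text,
# with no author or attribution field anywhere.
entry = {
    "number": 7,  # placeholder position; the real ordering is in the files
    "name": "Subjective-Frame Objective Morality Model",
    "summary": "Condensed theory text, comparable in depth to the other 14.",
}
```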
Why this matters (and its limits): LLM-based evaluation is an imperfect but useful signal. These models can't "do philosophy" the way a domain expert can; they can't truly judge originality or depth of argument. But they can assess structural properties: internal consistency, scope of applicability, how well a theory addresses known edge cases, and compatibility with empirical findings. When the same theory wins across different models (Grok 3, Claude Opus 4.6), different conditions (normal, steelmanned), and different time periods (2025, 2026), all without the theory being updated, that's a meaningful signal, even if it's not a substitute for peer review.
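One way to make "wins across different models" precise (my illustration, not part of the benchmark itself) is a rank correlation between two runs' orderings. A sketch with placeholder rankings:

```python
# Cross-run consistency check: Spearman rank correlation between two models'
# theory orderings. The rankings below are placeholders, not real results.
from scipy.stats import spearmanr

theories = ["SFOM", "Utilitarianism", "Kantian Deontology"]  # ...all 15
grok_rank = {"SFOM": 1, "Utilitarianism": 2, "Kantian Deontology": 3}
opus_rank = {"SFOM": 1, "Kantian Deontology": 2, "Utilitarianism": 3}

rho, p = spearmanr([grok_rank[t] for t in theories],
                   [opus_rank[t] for t in theories])
print(f"Spearman rho = {rho:.2f}")  # 1.0 means identical orderings
```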
Three evaluation runs:
Grok 3 Normal (Feb 2025): Base theory summaries assessed by Grok 3.
Grok 3 Steelman (Feb 2025): LLM-steelmanned versions of each theory, except SFOM, which was used as the control (its text stayed identical).
Claude Opus 4.6 (Mar 2026): Same v1 files, one year later, different model with Extended Thinking enabled. No theory updates.
All theories were assessed on equal footing using the same v1 benchmark framework; SFOM received no special treatment and competed under the same rules as the other 14.
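For concreteness, here is a sketch of how per-criterion scores could roll up into the kind of ranking each run produces; the 0-10 scale and the dict shapes are my assumptions, not the framework's actual scoring rules:

```python
# Aggregate per-criterion scores into a ranked list. Every theory is summed
# over the same 31 criteria, which is what "equal footing" means here.
def rank_theories(scores: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """scores maps theory -> criterion -> score (assumed 0-10)."""
    totals = {theory: sum(by_criterion.values())
              for theory, by_criterion in scores.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```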
Important context: The SFOM entry in the benchmark is only a condensed summary of my full moral theory, last updated in this system in Feb 2025. The actual theory is significantly more detailed and has improved dramatically since then. I plan to submit more comprehensive versions of SFOM and the competing theories, improve the assessment framework, expand the criteria, add more LLMs, and ultimately automate the entire pipeline so anyone can reproduce and extend these results. Even in its current state, as a year-old summary competing against 14 other theories, SFOM still wins consistently.