LLM Ensemble Achieves Expert-Level Accuracy in Content Categorization

A new ensemble framework for content categorization that integrates multiple large language models (LLMs) outperforms even the strongest single models by up to 65% in F1-score. The study, conducted by Ariel Kamen of RingCentral Inc. and Yakov Kamen of Relevad Corporation, introduces an ensemble large language model (eLLM) framework that addresses common weaknesses of individual systems, including inconsistency, hallucination, category inflation, and misclassification. The eLLM approach formalizes the ensemble process through a mathematical model of collective decision-making and establishes principled aggregation criteria.

Using the Interactive Advertising Bureau (IAB) hierarchical taxonomy, the researchers evaluated ten state-of-the-art LLMs under identical zero-shot conditions on a human-annotated corpus of 8,660 samples. Results show that individual models plateau in performance because semantically rich text is compressed into sparse categorical representations, while eLLM improves both robustness and accuracy. With a diverse consortium of models, eLLM achieves near human-expert-level performance, offering a scalable and reliable solution for taxonomy-based classification that may significantly reduce dependence on human expert labeling.

The study's key contributions include formalizing an ensemble LLM framework for taxonomy-based categorization, providing empirical evidence of substantial improvements over the best single model, and discussing the conditions under which eLLM approaches human-expert performance. "Ensemble-based categorization is a practical and robust alternative to single-model LLM classification, particularly when adherence to a fixed taxonomy and reliability under distributional shift are required," the authors state. "This work builds toward a new theory of collaborative AI, where orchestrated ensembles transform unstructured chaos into ordered, expert-level insight."
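
To make the idea of aggregating several models' outputs concrete, here is a minimal Python sketch of one simple aggregation rule: a per-category vote across models, restricted to a fixed taxonomy. This is an illustrative assumption, not the authors' aggregation criteria, which are not detailed here. The function name `aggregate_labels`, the `min_agreement` threshold, and the sample category names are all hypothetical.

```python
from collections import Counter


def aggregate_labels(model_outputs, taxonomy, min_agreement=0.5):
    """Combine per-model category lists by simple voting.

    Hypothetical sketch: keeps a category when at least `min_agreement`
    of the models proposed it. Not the aggregation rule from the study.
    """
    if not model_outputs:
        return []

    votes = Counter()
    for labels in model_outputs:
        # Discard labels outside the fixed taxonomy (a guard against
        # hallucinated categories) and count each model once per category.
        votes.update({label for label in labels if label in taxonomy})

    threshold = min_agreement * len(model_outputs)
    kept = [label for label, count in votes.items() if count >= threshold]

    # Most-supported categories first; ties broken alphabetically.
    return sorted(kept, key=lambda label: (-votes[label], label))


# Illustrative inputs: four models labeling the same text.
taxonomy = {"Automotive", "Travel", "Sports"}
outputs = [
    ["Automotive", "Travel"],
    ["Automotive"],
    ["Automotive", "Made Up Category"],  # out-of-taxonomy label is ignored
    ["Travel", "Sports"],
]
print(aggregate_labels(outputs, taxonomy))  # ['Automotive', 'Travel']
```

In this sketch, raising `min_agreement` trades recall for precision: a stricter threshold suppresses categories proposed by only one or two models, which is one plausible way an ensemble could curb category inflation.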