Archive

Archive for 2025

Approximate statistical tests for comparing binary classifier error rates using H2OML

Motivation

You have just trained a gradient boosting machine (GBM) and a random forest (RF) classifier on your data using Stata’s new h2oml command suite. Your GBM model achieves 87% accuracy on the testing data, and your RF model, 85%. It looks as if GBM is the preferred classifier, right? Not so fast.

Why accuracy alone isn’t enough

Accuracy, area under the curve, and root mean squared error are popular metrics, but they provide only point estimates. These numbers reflect how well a model performed on one specific testing sample, but they don’t account for the variability that can arise from sample to sample. In other words, they don’t answer this key question: Will the difference in performance between these methods hold at the population level, or could it have occurred by chance only in this particular testing dataset? Read more…

Stata 19 is released!