Scikit-learn ought to have robust model persistence

The recommended approach to persisting models created with scikit-learn is to pickle the Python model object or to export it to the ONNX format.

When a model is saved with pickle, there is no way of ensuring compatibility with newer (or older) versions of scikit-learn. You are essentially bound to use the model in the exact same environment as the one in which it was created.

Saving a model to the ONNX format is a one-way process. There is no way to re-create the model object from the saved file, so you can only use the saved model within onnxruntime.

In my experience, when a model is trained and saved (pickled) with scikit-learn version 1.4, loading (unpickling) the same model with scikit-learn version 1.6 produces a page full of warnings. That is not reasonable behaviour for a widely used software package.

The recommended approach to persisting models seems very limiting, considering that scikit-learn models are basically defined by simple numerical parameters.

I cannot think of a good reason why each model class can't have a persistence function, e.g., a function that returns a Python dict of model parameters that can be pickled or saved in JSON format. Conversely, every model class could have a load/initialiser function that re-creates the object from such a dict.
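As a rough sketch of what I mean, here is a toy example in plain Python. It does not use scikit-learn at all: the class TinyLinearModel and its to_dict/from_dict methods are hypothetical names made up for illustration, and a real estimator would need more care (arrays, nested estimators, and so on).

```python
import json


class TinyLinearModel:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept  # hyperparameter
        self.coef_ = None                   # learned slope
        self.intercept_ = 0.0               # learned offset

    def fit(self, xs, ys):
        # Closed-form least squares for a single feature.
        n = len(xs)
        mean_x = sum(xs) / n if self.fit_intercept else 0.0
        mean_y = sum(ys) / n if self.fit_intercept else 0.0
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        self.coef_ = sxy / sxx
        self.intercept_ = mean_y - self.coef_ * mean_x
        return self

    def predict(self, xs):
        return [self.coef_ * x + self.intercept_ for x in xs]

    def to_dict(self):
        # Only plain numbers, strings and booleans, so the result is JSON-safe.
        return {
            "model": "TinyLinearModel",
            "params": {"fit_intercept": self.fit_intercept},
            "fitted": {"coef": self.coef_, "intercept": self.intercept_},
        }

    @classmethod
    def from_dict(cls, state):
        if state.get("model") != "TinyLinearModel":
            raise ValueError("dict does not describe a TinyLinearModel")
        obj = cls(**state["params"])
        obj.coef_ = state["fitted"]["coef"]
        obj.intercept_ = state["fitted"]["intercept"]
        return obj


# Round trip through JSON text instead of pickle.
model = TinyLinearModel().fit([0, 1, 2, 3], [1, 3, 5, 7])
payload = json.dumps(model.to_dict())
restored = TinyLinearModel.from_dict(json.loads(payload))
```

The saved payload contains only names and numbers, so it can be read back without importing the code that originally created the model.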

Such a persistence function should of course be designed in a robust, future-proof and backward-compatible way, so that models can be saved and loaded across multiple versions of scikit-learn (within reasonable limits, e.g., allowing for breaking changes with major releases).
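One possible way to get there, sketched in plain Python under my own assumptions, is to stamp every saved dict with a format version and upgrade old dicts step by step on load. Nothing here is scikit-learn API: the field name format_version, the upgrade function and the renamed key are all invented for the example.

```python
FORMAT_VERSION = 2  # hypothetical current version of the saved-dict layout


def _v1_to_v2(state):
    # Made-up change for illustration: version 1 stored the coefficients
    # under "weights", version 2 stores them under "coef".
    state = dict(state)  # copy so the caller's dict is left untouched
    state["coef"] = state.pop("weights")
    state["format_version"] = 2
    return state


# One migration per old version, each upgrading by exactly one step.
MIGRATIONS = {1: _v1_to_v2}


def upgrade(state):
    """Return a dict in the current layout, migrating old layouts as needed."""
    version = state.get("format_version")
    if not isinstance(version, int) or version < 1:
        raise ValueError("missing or invalid format_version")
    if version > FORMAT_VERSION:
        # Saved by a newer release than the one doing the loading.
        raise ValueError(f"format_version {version} is newer than supported")
    while version < FORMAT_VERSION:
        state = MIGRATIONS[version](state)
        version = state["format_version"]
    return state


old_state = {"format_version": 1, "weights": [0.5, -1.0], "intercept": 0.1}
new_state = upgrade(old_state)
```

A major release could then drop the oldest migrations, which matches the "within reasonable limits" caveat above.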

Apparently, I am not the only one rooting for this. However, according to a maintainer, based on previous discussions, "the general agreement was that it is out of the scope of the scikit-learn project for the sake of keeping maintenance costs under control."

Apr 11, 2025