# MLBox

**Repository Path**: kchen032/MLBox

## Basic Information

- **Project Name**: MLBox
- **Description**: MLBox is a powerful Automated Machine Learning python library.
- **Primary Language**: Python
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-11-22
- **Last Updated**: 2024-08-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

.. image:: docs/logos/logo.png

|Documentation Status| |PyPI version| |Build Status| |Windows Build Status| |GitHub Issues| |codecov| |License|

-----------------------

**MLBox is a powerful Automated Machine Learning python library.** It provides the following features:

* Fast reading and distributed data preprocessing/cleaning/formatting
* Highly robust feature selection and leak detection
* Accurate hyper-parameter optimization in high-dimensional space
* State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, ...)
* Prediction with model interpretation

**For more details**, please refer to the `official documentation `__

--------------------------

Getting started: 30 seconds to MLBox
====================================

The main MLBox package contains three sub-packages: **preprocessing**, **optimisation** and **prediction**. They are aimed, respectively, at reading and preprocessing data, at testing or optimising a wide range of learners, and at predicting the target on a test dataset.

**Here are a few lines to import MLBox:**

.. code-block:: python

    # Import only the classes used in this walkthrough
    from mlbox.preprocessing import Reader, Drift_thresholder
    from mlbox.optimisation import Optimiser
    from mlbox.prediction import Predictor

**Then, all you need to provide is:**

* the list of paths to your train and test datasets
* the name of the target you are trying to predict (classification or regression)

.. code-block:: python

    paths = [".csv", ".csv", ..., ".csv"]  # to modify
    target_name = ""  # to modify

**Now, let MLBox do the job!**

... to read and preprocess your files:

.. code-block:: python

    data = Reader(sep=",").train_test_split(paths, target_name)  # reading
    data = Drift_thresholder().fit_transform(data)  # deleting non-stable variables

... to evaluate models (here with the default configuration):

.. code-block:: python

    Optimiser().evaluate(None, data)

... or to test and optimise the whole Pipeline [**OPTIONAL**]:

* missing data encoder, aka 'ne'
* categorical variables encoder, aka 'ce'
* feature selector, aka 'fs'
* meta-features stacker, aka 'stck'
* final estimator, aka 'est'

**NB**: please have a look at all the possibilities available to configure the Pipeline (steps, parameters and values...).

.. code-block:: python

    # Keys follow "<step>__<parameter>"; "search" sets how values are drawn from "space"
    space = {
        'ne__numerical_strategy': {"space": [0, 'mean']},
        'ce__strategy': {"space": ["label_encoding", "random_projection", "entity_embedding"]},
        'fs__strategy': {"space": ["variance", "rf_feature_importance"]},
        'fs__threshold': {"search": "choice", "space": [0.1, 0.2, 0.3]},
        'est__strategy': {"space": ["XGBoost"]},
        'est__max_depth': {"search": "choice", "space": [5, 6]},
        'est__subsample': {"search": "uniform", "space": [0.6, 0.9]}
    }

    opt = Optimiser()  # the optimiser must be created before calling optimise
    best = opt.optimise(space, data, max_evals=5)

... and finally, to predict on the test set with the best parameters (or ``None`` for the default configuration):

.. code-block:: python

    Predictor().fit_predict(best, data)

**That's all!** Have a look at the "save" folder, where you will find:

* your predictions
* feature importances
* drift coefficients of your variables (0.5 = very stable, 1 = not stable at all)

--------------------------

How to Contribute
=================

MLBox has been developed and used by many active community members. Your help is very valuable in making it better for everyone.
- Check out `call for contributions `__ to see what can be improved, or open an issue if you would like something added.
- Contribute to the `tests `__ to make it more reliable.
- Contribute to the `documentation `__ to make it clearer for everyone.
- Contribute to the `examples `__ to share your experience with other users.
- Open an `issue `__ if you run into problems during development.

For more details, please refer to `CONTRIBUTING `__.

.. |Documentation Status| image:: https://readthedocs.org/projects/mlbox/badge/?version=latest
   :target: http://mlbox.readthedocs.io/en/latest/
.. |PyPI version| image:: https://badge.fury.io/py/mlbox.svg
   :target: https://pypi.python.org/pypi/mlbox
.. |Build Status| image:: https://travis-ci.org/AxeldeRomblay/MLBox.svg?branch=master
   :target: https://travis-ci.org/AxeldeRomblay/MLBox
.. |Windows Build Status| image:: https://ci.appveyor.com/api/projects/status/5ypa8vaed6kpmli8?svg=true
   :target: https://ci.appveyor.com/project/AxeldeRomblay/mlbox
.. |GitHub Issues| image:: https://img.shields.io/github/issues/AxeldeRomblay/MLBox.svg
   :target: https://github.com/AxeldeRomblay/MLBox/issues
.. |codecov| image:: https://codecov.io/gh/AxeldeRomblay/MLBox/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/AxeldeRomblay/MLBox
.. |License| image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg
   :target: https://github.com/AxeldeRomblay/MLBox/blob/master/LICENSE
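
Search space format (sketch)
============================

The ``space`` dictionary passed to ``opt.optimise`` in *Getting started* maps keys of the form ``<step>__<parameter>`` to a dict holding a ``"space"`` list and an optional ``"search"`` strategy. The example below is a hypothetical, dependency-free helper. ``sample_configuration`` and ``KNOWN_STEPS`` are illustrative names and not part of the MLBox API. It draws one configuration from such a dictionary, which can help when checking a space by hand. It assumes three rules:

* ``"choice"`` picks one listed value.
* ``"uniform"`` draws a float between the smallest and largest listed values.
* A missing ``"search"`` key means ``"choice"``.

MLBox's own ``Optimiser`` is what actually interprets the space during ``optimise``. This sketch only mirrors the format.

.. code-block:: python

    import random

    # Step prefixes of the Pipeline steps listed in "Getting started"
    KNOWN_STEPS = {"ne", "ce", "fs", "stck", "est"}


    def sample_configuration(space, rng=None):
        """Draw one configuration from a search space in the README's format.

        Hypothetical helper, not part of MLBox. Assumes that a missing
        "search" key means "choice".
        """
        if rng is None:
            rng = random.Random()
        config = {}
        for key, spec in space.items():
            # Keys follow the "<step>__<parameter>" pattern, e.g. "fs__threshold"
            step, sep, param = key.partition("__")
            if not sep or not param or step not in KNOWN_STEPS:
                raise ValueError(f"{key!r} is not of the form '<step>__<parameter>'")
            if "space" not in spec:
                raise ValueError(f"{key!r} has no 'space' entry")
            values = spec["space"]
            search = spec.get("search", "choice")  # assumed default
            if search == "choice":
                # Pick one of the listed candidate values
                config[key] = rng.choice(values)
            elif search == "uniform":
                # Draw a float between the smallest and largest listed bounds
                config[key] = rng.uniform(min(values), max(values))
            else:
                raise ValueError(f"unsupported search {search!r} for {key!r}")
        return config


    space = {
        "ce__strategy": {"space": ["label_encoding", "entity_embedding"]},
        "fs__threshold": {"search": "choice", "space": [0.1, 0.2, 0.3]},
        "est__subsample": {"search": "uniform", "space": [0.6, 0.9]},
    }
    print(sample_configuration(space, random.Random(0)))

To get a quick feel for the configurations a space covers, call the helper several times with differently seeded ``random.Random`` instances. A key with an unknown step prefix, a missing ``"space"`` entry or an unsupported ``"search"`` value raises ``ValueError``.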