Version 0.4.2#
October 21, 2018
Changelog#
Bug fixes#
Fix a bug in imblearn.over_sampling.SMOTENC in which the median of the standard deviation was used instead of half of the median of the standard deviation. By Guillaume Lemaitre in #491.
Raise an error when passing a target which is not supported, i.e. a regression or multilabel target. Imbalanced-learn does not support this case. By Guillaume Lemaitre in #490.
Fix a bug in imblearn.over_sampling.SMOTENC in which sparse matrices were densified during inverse_transform. By Guillaume Lemaitre in #495.
Fix a bug in imblearn.over_sampling.SMOTENC in which ties were broken incorrectly during sampling. By Guillaume Lemaitre in #497.
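As a rough illustration of the quantity named in the first fix above, the sketch below computes half of the median of the per-feature standard deviations with the standard library only. The function name, its arguments, and the choice of the population standard deviation are assumptions made for this example; it does not reproduce the SMOTENC implementation, and how the value enters the distance computation is not shown.

```python
import statistics


def half_median_std(rows, continuous_columns):
    """Return half of the median of the per-column standard deviations.

    `rows` is a list of equal-length numeric sequences and
    `continuous_columns` lists the column indices treated as continuous.
    Both names are hypothetical and only serve this illustration.
    """
    stds = []
    for col in continuous_columns:
        column = [row[col] for row in rows]
        # Population standard deviation (an assumption of this sketch).
        stds.append(statistics.pstdev(column))
    return statistics.median(stds) / 2


# Two continuous columns (0 and 1) and one categorical column (2).
rows = [
    (1.0, 10.0, 0),
    (3.0, 10.0, 1),
    (5.0, 40.0, 0),
    (7.0, 40.0, 1),
]
value = half_median_std(rows, continuous_columns=[0, 1])
print(value)
```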
Version 0.4#
October 12, 2018
Warning
Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.
Highlights#
This release brings a set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.
As new features, two new modules, imblearn.keras and
imblearn.tensorflow, have been added, in which imbalanced-learn samplers
can be used to generate balanced mini-batches.
The module imblearn.ensemble has been consolidated with new classifiers:
imblearn.ensemble.BalancedRandomForestClassifier,
imblearn.ensemble.EasyEnsembleClassifier,
imblearn.ensemble.RUSBoostClassifier.
Support for strings has been added in
imblearn.over_sampling.RandomOverSampler and
imblearn.under_sampling.RandomUnderSampler. In addition, a new class
imblearn.over_sampling.SMOTENC allows generating samples from data
sets containing both continuous and categorical features.
imblearn.over_sampling.SMOTE has been simplified and broken down
into 2 additional classes:
imblearn.over_sampling.SVMSMOTE and
imblearn.over_sampling.BorderlineSMOTE.
There are also some changes regarding the API:
the parameter sampling_strategy has been introduced to replace the
ratio parameter. In addition, the return_indices argument has been
deprecated and all samplers will expose a sample_indices_ attribute whenever this is
possible.
Changelog#
API#
Replace the parameter ratio by sampling_strategy. #411 by Guillaume Lemaitre.
Enable the use of a float with binary classification for sampling_strategy. #411 by Guillaume Lemaitre.
Enable the use of a list for the cleaning methods to specify the classes to sample. #411 by Guillaume Lemaitre.
Replace fit_sample by fit_resample. An alias is still available for backward compatibility. In addition, sample has been removed to avoid resampling on a different set of data. #462 by Guillaume Lemaitre.
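To make the float form of sampling_strategy concrete, the sketch below works out the per-class sample counts that a given ratio implies for a binary target, reading the float as the number of minority samples divided by the number of majority samples after resampling. The helper name and its kind argument are hypothetical and exist only for this illustration; it is plain Python arithmetic, not the library's code.

```python
from collections import Counter


def target_counts_from_ratio(y, ratio, kind):
    """Return the class counts implied by a float ratio on a binary target.

    `ratio` is read as n_minority / n_majority after resampling.
    `kind` is "over-sampling" or "under-sampling" (hypothetical argument).
    """
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("a float ratio is only meaningful for binary targets")
    (majority, n_maj), (minority, n_min) = counts.most_common(2)

    if kind == "over-sampling":
        # Grow the minority class until n_min_after / n_maj == ratio.
        n_target = int(n_maj * ratio)
        if n_target < n_min:
            raise ValueError("ratio would remove minority samples")
        return {majority: n_maj, minority: n_target}

    if kind == "under-sampling":
        # Shrink the majority class until n_min / n_maj_after == ratio.
        n_target = int(n_min / ratio)
        if n_target > n_maj:
            raise ValueError("ratio would add majority samples")
        return {majority: n_target, minority: n_min}

    raise ValueError("kind must be 'over-sampling' or 'under-sampling'")


y = [0] * 90 + [1] * 10
over = target_counts_from_ratio(y, 0.5, "over-sampling")
under = target_counts_from_ratio(y, 0.5, "under-sampling")
print(over, under)
```

With 90 majority and 10 minority samples, a ratio of 0.5 means growing the minority class to 45 samples when over-sampling, or shrinking the majority class to 20 samples when under-sampling.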
New features#
Add keras and tensorflow modules to create balanced mini-batch generators. #409 by Guillaume Lemaitre.
Add imblearn.ensemble.EasyEnsembleClassifier, which creates a bag of AdaBoost classifiers trained on balanced bootstrap samples. #455 by Guillaume Lemaitre.
Add imblearn.ensemble.BalancedRandomForestClassifier, which balances each bootstrap sample provided to each tree of the forest. #459 by Guillaume Lemaitre.
Add imblearn.ensemble.RUSBoostClassifier, which applies a random under-sampling stage before each boosting iteration of AdaBoost. #469 by Guillaume Lemaitre.
Add imblearn.over_sampling.SMOTENC, which generates synthetic samples on data sets with heterogeneous data types (continuous and categorical features). #412 by Denis Dudnik and Guillaume Lemaitre.
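The idea behind a balanced mini-batch generator can be sketched without any deep-learning framework: under-sample every class to the size of the smallest one, shuffle the retained indices, then slice them into batches. The function below is a hypothetical, standard-library-only illustration of that idea and is not the code shipped in the keras or tensorflow modules. Note that the retained subset is balanced as a whole, while an individual batch is balanced only on average.

```python
import random
from collections import defaultdict


def balanced_batches(y, batch_size, seed=0):
    """Yield lists of sample indices drawn from a class-balanced subset.

    Hypothetical helper: every class is randomly under-sampled to the
    size of the smallest class before batching.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    # Keep the same number of samples from every class.
    n_per_class = min(len(indices) for indices in by_class.values())
    selected = []
    for label in sorted(by_class):
        selected.extend(rng.sample(by_class[label], n_per_class))

    # Shuffle so that classes are mixed across batches.
    rng.shuffle(selected)
    for start in range(0, len(selected), batch_size):
        yield selected[start:start + batch_size]


y = ["a"] * 8 + ["b"] * 2
batches = list(balanced_batches(y, batch_size=2, seed=0))
print(batches)
```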
Enhancement#
Add a documentation node to create a balanced random forest from a balanced bagging classifier. #372 by Guillaume Lemaitre.
Document the metrics to evaluate models on imbalanced datasets. #367 by Guillaume Lemaitre.
Add support for one-vs-all encoded target to support keras. #409 by Guillaume Lemaitre.
Add specific classes for borderline and SVM SMOTE: BorderlineSMOTE and SVMSMOTE. #440 by Guillaume Lemaitre.
Allow imblearn.over_sampling.RandomOverSampler to return indices using the parameter return_indices. #439 by Hugo Gascon and Guillaume Lemaitre.
Allow imblearn.under_sampling.RandomUnderSampler and imblearn.over_sampling.RandomOverSampler to sample object arrays containing strings. #451 by Guillaume Lemaitre.
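Random over-sampling only duplicates existing rows, so it never needs to do arithmetic on the features. That is why it can work on string data and can report which original rows were picked. The sketch below shows this with plain Python lists; the function is hypothetical and only mirrors the behavior described above, not the library's implementation.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate minority rows at random until all classes have equal size.

    Returns the resampled data, the resampled labels, and the indices of
    the original rows that make up the result (hypothetical helper).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    n_target = max(counts.values())

    # Start from every original row, then append random duplicates.
    indices = list(range(len(y)))
    for label in sorted(counts):
        members = [i for i, lab in enumerate(y) if lab == label]
        n_extra = n_target - counts[label]
        indices.extend(rng.choice(members) for _ in range(n_extra))

    X_res = [X[i] for i in indices]
    y_res = [y[i] for i in indices]
    return X_res, y_res, indices


# The features are strings; they are copied, never transformed.
X = ["red", "green", "blue", "red", "pink"]
y = ["common", "common", "common", "common", "rare"]
X_res, y_res, indices = random_over_sample(X, y, seed=0)
print(Counter(y_res))
```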
Bug fixes#
Fix bug in metrics.classification_report_imbalanced for which y_pred and y_true were inverted. #394 by Ole Silvig (klizter).
Fix bug in ADASYN to consider only samples from the current class when generating new samples. #354 by Guillaume Lemaitre.
Fix bug to allow for sorted behavior of the sampling_strategy dictionary and thus obtain deterministic results when using the same random state. #447 by Guillaume Lemaitre.
Force cloning of scikit-learn estimators passed as attributes to samplers. #446 by Guillaume Lemaitre.
Fix bug which was not preserving the dtype of X and y when generating samples. #450 by Guillaume Lemaitre.
Add the option to pass a Memory object to make_pipeline, as in the pipeline.Pipeline class. #458 by Christos Aridas.
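One plausible reading of the sampling_strategy ordering fix is that a single random generator is consumed class by class, so the order in which classes are visited changes the draws. The hypothetical sketch below visits the classes in sorted order, which makes the result depend only on the seed and not on how the dictionary was built. It is an illustration of the principle, not the patch itself.

```python
import random


def draw_per_class(sampling_strategy, indices_by_class, seed):
    """Draw the requested number of indices per class, in sorted key order.

    `sampling_strategy` maps a class label to the number of samples to
    draw; all names here are hypothetical.
    """
    rng = random.Random(seed)
    selected = {}
    # Sorting the keys fixes the order in which the generator is consumed.
    for label in sorted(sampling_strategy):
        selected[label] = rng.sample(
            indices_by_class[label], sampling_strategy[label]
        )
    return selected


indices_by_class = {0: list(range(0, 50)), 1: list(range(50, 80))}

# Same content, different insertion order.
first = draw_per_class({0: 5, 1: 5}, indices_by_class, seed=42)
second = draw_per_class({1: 5, 0: 5}, indices_by_class, seed=42)
print(first == second)
```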
Maintenance#
Remove parameters deprecated in 0.2. #331 by Guillaume Lemaitre.
Make some modules private. #452 by Guillaume Lemaitre.
Upgrade requirements to scikit-learn 0.20. #379 by Guillaume Lemaitre.
Catch deprecation warning in testing. #441 by Guillaume Lemaitre.
Refactor and impose pytest style tests. #470 by Guillaume Lemaitre.
Documentation#
Remove some docstrings which are not necessary. #454 by Guillaume Lemaitre.
Fix the documentation of the sampling_strategy parameter when used as a float. #480 by Guillaume Lemaitre.
Deprecation#
Deprecate ratio in favor of sampling_strategy. #411 by Guillaume Lemaitre.
Deprecate the use of a dict for cleaning methods. A list should be used. #411 by Guillaume Lemaitre.
Deprecate random_state in imblearn.under_sampling.NearMiss, imblearn.under_sampling.EditedNearestNeighbours, imblearn.under_sampling.RepeatedEditedNearestNeighbours, imblearn.under_sampling.AllKNN, imblearn.under_sampling.NeighbourhoodCleaningRule, imblearn.under_sampling.InstanceHardnessThreshold, imblearn.under_sampling.CondensedNearestNeighbours.
Deprecate kind, out_step, svm_estimator, m_neighbors in imblearn.over_sampling.SMOTE. Users should use imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE. #440 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.EasyEnsemble in favor of the meta-estimator imblearn.ensemble.EasyEnsembleClassifier, which follows the exact algorithm described in the literature. #455 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.BalanceCascade. #472 by Guillaume Lemaitre.
Deprecate return_indices in all samplers. Instead, an attribute sample_indices_ is created whenever the sampler is selecting a subset of the original samples. #474 by Guillaume Lemaitre.
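The move from a return_indices flag to a sample_indices_ attribute can be pictured with a toy sampler: fit_resample always returns just the resampled data, and the indices of the retained rows are stored on the object afterwards. The class below is a hypothetical stand-in written with the standard library; only the method name fit_resample and the attribute name sample_indices_ come from the entries above.

```python
import random
from collections import Counter


class TinyRandomUnderSampler:
    """Toy under-sampler that records which original rows it kept."""

    def __init__(self, seed=0):
        self.seed = seed

    def fit_resample(self, X, y):
        rng = random.Random(self.seed)
        # Every class is reduced to the size of the smallest class.
        n_target = min(Counter(y).values())
        kept = []
        for label in sorted(set(y)):
            members = [i for i, lab in enumerate(y) if lab == label]
            kept.extend(rng.sample(members, n_target))
        # Stored as an attribute instead of being an optional return value.
        self.sample_indices_ = sorted(kept)
        X_res = [X[i] for i in self.sample_indices_]
        y_res = [y[i] for i in self.sample_indices_]
        return X_res, y_res


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = [0, 0, 0, 0, 1, 1]
sampler = TinyRandomUnderSampler(seed=0)
X_res, y_res = sampler.fit_resample(X, y)
print(sampler.sample_indices_)
```

Keeping the return value fixed at two items means calling code does not need to change shape depending on a flag, and the indices remain available for samplers that select a subset of the original rows.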