
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/api/plot_sampling_strategy_usage.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_api_plot_sampling_strategy_usage.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_api_plot_sampling_strategy_usage.py:


====================================================
How to use ``sampling_strategy`` in imbalanced-learn
====================================================

This example shows how the ``sampling_strategy`` parameter is used by the
different families of samplers (i.e. over-sampling, under-sampling, or
cleaning methods).

.. GENERATED FROM PYTHON SOURCE LINES 11-15

.. code-block:: default


    # Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 16-21

.. code-block:: default

    print(__doc__)
    import seaborn as sns

    sns.set_context("poster")








.. GENERATED FROM PYTHON SOURCE LINES 22-26

Create an imbalanced dataset
----------------------------

First, we will create an imbalanced data set from the iris data set.

.. GENERATED FROM PYTHON SOURCE LINES 28-37

.. code-block:: default

    from sklearn.datasets import load_iris

    from imblearn.datasets import make_imbalance

    iris = load_iris(as_frame=True)

    sampling_strategy = {0: 10, 1: 20, 2: 47}
    X, y = make_imbalance(iris.data, iris.target, sampling_strategy=sampling_strategy)








.. GENERATED FROM PYTHON SOURCE LINES 38-48

.. code-block:: default

    import matplotlib.pyplot as plt

    fig, axs = plt.subplots(ncols=2, figsize=(10, 5))
    autopct = "%.2f"
    iris.target.value_counts().plot.pie(autopct=autopct, ax=axs[0])
    axs[0].set_title("Original")
    y.value_counts().plot.pie(autopct=autopct, ax=axs[1])
    axs[1].set_title("Imbalanced")
    fig.tight_layout()




.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_001.png
   :alt: Original, Imbalanced
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 49-60

Using ``sampling_strategy`` in resampling algorithms
====================================================

`sampling_strategy` as a `float`
--------------------------------

`sampling_strategy` can be given as a `float`. For **under-sampling
methods**, it corresponds to the ratio :math:`\alpha_{us}` defined by
:math:`N_{m} = \alpha_{us} \times N_{rM}` where :math:`N_{rM}` and
:math:`N_{m}` are the number of samples in the majority class after
resampling and the number of samples in the minority class, respectively.

.. GENERATED FROM PYTHON SOURCE LINES 62-69

.. code-block:: default

    import numpy as np

    # select only 2 classes since the ratio only makes sense in the binary case
    binary_mask = np.bitwise_or(y == 0, y == 2)
    binary_y = y[binary_mask]
    binary_X = X[binary_mask]








.. GENERATED FROM PYTHON SOURCE LINES 70-78

.. code-block:: default

    from imblearn.under_sampling import RandomUnderSampler

    sampling_strategy = 0.8
    rus = RandomUnderSampler(sampling_strategy=sampling_strategy)
    X_res, y_res = rus.fit_resample(binary_X, binary_y)
    ax = y_res.value_counts().plot.pie(autopct=autopct)
    _ = ax.set_title("Under-sampling")




.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_002.png
   :alt: Under-sampling
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_002.png
   :class: sphx-glr-single-img
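
As a rough check of the arithmetic behind this figure, the sketch below
recomputes the class counts expected after under-sampling with a ratio of 0.8.
It relies only on the standard library: the helper ``expected_counts_under`` is
hypothetical, the counts are those of the two-class subset built above, and
truncating the target to an integer is an assumption about how the ratio is
rounded.

.. code-block:: python

    from collections import Counter


    def expected_counts_under(counts, alpha):
        """Return per-class counts after under-sampling a two-class problem.

        The minority class is left untouched; the majority class is reduced
        so that minority / majority is approximately ``alpha``.
        """
        minority = min(counts, key=counts.get)
        n_minority = counts[minority]
        return {
            label: n if label == minority else int(n_minority / alpha)
            for label, n in counts.items()
        }


    # Class counts of the binary subset used in this example (classes 0 and 2).
    binary_counts = Counter({0: 10, 2: 47})
    under_counts = expected_counts_under(binary_counts, alpha=0.8)
    print(under_counts)

Under these assumptions the majority class shrinks from 47 to 12 samples, so
the minority-to-majority ratio ends up at 10 / 12, roughly 0.83.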





.. GENERATED FROM PYTHON SOURCE LINES 79-84

For **over-sampling methods**, it corresponds to the ratio
:math:`\alpha_{os}` defined by :math:`N_{rm} = \alpha_{os} \times N_{M}`
where :math:`N_{rm}` and :math:`N_{M}` are the number of samples in the
minority class after resampling and the number of samples in the majority
class, respectively.

.. GENERATED FROM PYTHON SOURCE LINES 86-93

.. code-block:: default

    from imblearn.over_sampling import RandomOverSampler

    ros = RandomOverSampler(sampling_strategy=sampling_strategy)
    X_res, y_res = ros.fit_resample(binary_X, binary_y)
    ax = y_res.value_counts().plot.pie(autopct=autopct)
    _ = ax.set_title("Over-sampling")




.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_003.png
   :alt: Over-sampling
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_003.png
   :class: sphx-glr-single-img
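
The same kind of check can be sketched for over-sampling. The helper
``expected_counts_over`` is hypothetical and uses only built-in types; it
assumes the minority class is grown to the integer part of
:math:`\alpha_{os} \times N_{M}` while the majority class is left unchanged.

.. code-block:: python

    def expected_counts_over(counts, alpha):
        """Return per-class counts after over-sampling a two-class problem.

        The majority class is left untouched; the minority class is grown
        so that minority / majority is approximately ``alpha``.
        """
        majority = max(counts, key=counts.get)
        n_majority = counts[majority]
        return {
            label: n if label == majority else int(n_majority * alpha)
            for label, n in counts.items()
        }


    # Class counts of the binary subset used in this example (classes 0 and 2).
    binary_counts = {0: 10, 2: 47}
    over_counts = expected_counts_over(binary_counts, alpha=0.8)
    print(over_counts)

Under these assumptions the minority class grows from 10 to 37 samples, so the
ratio ends up at 37 / 47, roughly 0.79.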





.. GENERATED FROM PYTHON SOURCE LINES 94-102

`sampling_strategy` as a `str`
-------------------------------

`sampling_strategy` can be given as a string that specifies the classes
targeted by the resampling. With under- and over-sampling, the number of
samples in the targeted classes will be equalized.

Note that we are using multiple classes from now on.

.. GENERATED FROM PYTHON SOURCE LINES 104-118

.. code-block:: default

    sampling_strategy = "not minority"

    fig, axs = plt.subplots(ncols=2, figsize=(10, 5))
    rus = RandomUnderSampler(sampling_strategy=sampling_strategy)
    X_res, y_res = rus.fit_resample(X, y)
    y_res.value_counts().plot.pie(autopct=autopct, ax=axs[0])
    axs[0].set_title("Under-sampling")

    sampling_strategy = "not majority"
    ros = RandomOverSampler(sampling_strategy=sampling_strategy)
    X_res, y_res = ros.fit_resample(X, y)
    y_res.value_counts().plot.pie(autopct=autopct, ax=axs[1])
    _ = axs[1].set_title("Over-sampling")




.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_004.png
   :alt: Under-sampling, Over-sampling
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_004.png
   :class: sphx-glr-single-img
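
To make the effect of the two strings used above concrete, the sketch below
works on plain class counts. Both helpers (``targeted_classes`` and
``equalized_counts``) are hypothetical and only cover the strings
``"not minority"``, ``"not majority"`` and ``"all"``; the sketch assumes that
under-sampling equalizes targeted classes to the minority size and
over-sampling to the majority size.

.. code-block:: python

    def targeted_classes(counts, strategy):
        """Return the set of classes selected by a string strategy."""
        minority = min(counts, key=counts.get)
        majority = max(counts, key=counts.get)
        if strategy == "not minority":
            return set(counts) - {minority}
        if strategy == "not majority":
            return set(counts) - {majority}
        if strategy == "all":
            return set(counts)
        raise ValueError(f"Unsupported strategy in this sketch: {strategy!r}")


    def equalized_counts(counts, strategy, kind):
        """Return counts after equalizing the targeted classes.

        ``kind="under"`` shrinks targeted classes to the minority size,
        ``kind="over"`` grows them to the majority size.
        """
        if kind == "under":
            target = min(counts.values())
        else:
            target = max(counts.values())
        selected = targeted_classes(counts, strategy)
        return {
            label: target if label in selected else n
            for label, n in counts.items()
        }


    # Class counts of the imbalanced iris data set built in this example.
    counts = {0: 10, 1: 20, 2: 47}
    under = equalized_counts(counts, "not minority", kind="under")
    over = equalized_counts(counts, "not majority", kind="over")
    print(under, over)

With these counts, every class ends up with 10 samples after under-sampling
and with 47 samples after over-sampling, which matches the evenly split pie
charts.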





.. GENERATED FROM PYTHON SOURCE LINES 119-121

With **cleaning methods**, the number of samples in each class will not be
equalized, even for the targeted classes.

.. GENERATED FROM PYTHON SOURCE LINES 123-131

.. code-block:: default

    from imblearn.under_sampling import TomekLinks

    sampling_strategy = "not minority"
    tl = TomekLinks(sampling_strategy=sampling_strategy)
    X_res, y_res = tl.fit_resample(X, y)
    ax = y_res.value_counts().plot.pie(autopct=autopct)
    _ = ax.set_title("Cleaning")




.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_005.png
   :alt: Cleaning
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_005.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 132-139

`sampling_strategy` as a `dict`
-------------------------------

When `sampling_strategy` is a `dict`, the keys correspond to the targeted
classes. The values correspond to the desired number of samples for each
targeted class. This works for both **under- and over-sampling**
algorithms but not for the **cleaning algorithms**, which take a `list` instead.

.. GENERATED FROM PYTHON SOURCE LINES 141-155

.. code-block:: default

    fig, axs = plt.subplots(ncols=2, figsize=(10, 5))

    sampling_strategy = {0: 10, 1: 15, 2: 20}
    rus = RandomUnderSampler(sampling_strategy=sampling_strategy)
    X_res, y_res = rus.fit_resample(X, y)
    y_res.value_counts().plot.pie(autopct=autopct, ax=axs[0])
    axs[0].set_title("Under-sampling")

    sampling_strategy = {0: 25, 1: 35, 2: 47}
    ros = RandomOverSampler(sampling_strategy=sampling_strategy)
    X_res, y_res = ros.fit_resample(X, y)
    y_res.value_counts().plot.pie(autopct=autopct, ax=axs[1])
    _ = axs[1].set_title("Over-sampling")




.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_006.png
   :alt: Under-sampling, Over-sampling
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_006.png
   :class: sphx-glr-single-img
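
A ``dict`` only makes sense if its values are compatible with the kind of
resampling: under-sampling cannot add samples and over-sampling cannot remove
them. The sketch below spells out that consistency check on plain class
counts. The helper ``check_dict_strategy`` is hypothetical and is not part of
imbalanced-learn; how the library itself reports such inconsistencies is not
shown here.

.. code-block:: python

    def check_dict_strategy(counts, strategy, kind):
        """Return True if ``strategy`` is consistent with ``kind``.

        Raise ValueError when a class is unknown, when under-sampling asks
        for more samples than available, or when over-sampling asks for
        fewer samples than already present.
        """
        for label, n_target in strategy.items():
            if label not in counts:
                raise ValueError(f"Class {label!r} is not present in the data.")
            n_original = counts[label]
            if kind == "under" and n_target > n_original:
                raise ValueError(
                    f"Class {label!r}: cannot under-sample {n_original} "
                    f"samples to {n_target}."
                )
            if kind == "over" and n_target < n_original:
                raise ValueError(
                    f"Class {label!r}: cannot over-sample {n_original} "
                    f"samples to {n_target}."
                )
        return True


    # Class counts of the imbalanced iris data set built in this example.
    counts = {0: 10, 1: 20, 2: 47}
    under_ok = check_dict_strategy(counts, {0: 10, 1: 15, 2: 20}, kind="under")
    over_ok = check_dict_strategy(counts, {0: 25, 1: 35, 2: 47}, kind="over")
    print(under_ok, over_ok)

Both dictionaries used in this section pass the check, whereas swapping them
(for instance asking an under-sampler for 25 samples of class 0) would be
rejected by this sketch.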





.. GENERATED FROM PYTHON SOURCE LINES 156-162

`sampling_strategy` as a `list`
-------------------------------

When `sampling_strategy` is a `list`, it contains the targeted classes. A
list is accepted only by **cleaning methods**; other samplers raise an
error.

.. GENERATED FROM PYTHON SOURCE LINES 164-170

.. code-block:: default

    sampling_strategy = [0, 1, 2]
    tl = TomekLinks(sampling_strategy=sampling_strategy)
    X_res, y_res = tl.fit_resample(X, y)
    ax = y_res.value_counts().plot.pie(autopct=autopct)
    _ = ax.set_title("Cleaning")




.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_007.png
   :alt: Cleaning
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_007.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 171-177

`sampling_strategy` as a callable
---------------------------------

When `sampling_strategy` is a callable, it must be a function that takes `y`
and returns a `dict`. The keys correspond to the targeted classes and the
values to the desired number of samples for each class.

.. GENERATED FROM PYTHON SOURCE LINES 180-195

.. code-block:: default

    def ratio_multiplier(y):
        from collections import Counter

        multiplier = {1: 0.7, 2: 0.95}
        target_stats = Counter(y)
        for key, value in target_stats.items():
            if key in multiplier:
                target_stats[key] = int(value * multiplier[key])
        return target_stats


    X_res, y_res = RandomUnderSampler(sampling_strategy=ratio_multiplier).fit_resample(X, y)
    ax = y_res.value_counts().plot.pie(autopct=autopct)
    ax.set_title("Under-sampling")
    plt.show()



.. image-sg:: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_008.png
   :alt: Under-sampling
   :srcset: /auto_examples/api/images/sphx_glr_plot_sampling_strategy_usage_008.png
   :class: sphx-glr-single-img
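
As a second illustration of the callable form, the sketch below caps every
class at the median class size. The function ``cap_at_median`` is a
hypothetical example, not something provided by imbalanced-learn, and it is
exercised here on a plain list of labels that mimics the class counts of this
example.

.. code-block:: python

    from collections import Counter
    from statistics import median


    def cap_at_median(y):
        """Return a dict mapping each class to min(count, median count)."""
        counts = Counter(y)
        cap = int(median(counts.values()))
        return {label: min(n, cap) for label, n in counts.items()}


    # Labels with the same class counts as the imbalanced iris data set.
    labels = [0] * 10 + [1] * 20 + [2] * 47
    strategy = cap_at_median(labels)
    print(strategy)

Because no value exceeds the original count of its class, a function like this
could be passed as ``sampling_strategy=cap_at_median`` to an under-sampler in
the same way as ``ratio_multiplier`` above.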






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  1.024 seconds)


.. _sphx_glr_download_auto_examples_api_plot_sampling_strategy_usage.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example




    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_sampling_strategy_usage.py <plot_sampling_strategy_usage.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_sampling_strategy_usage.ipynb <plot_sampling_strategy_usage.ipynb>`


.. include:: plot_sampling_strategy_usage.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
