
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/under-sampling/plot_illustration_tomek_links.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_under-sampling_plot_illustration_tomek_links.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_under-sampling_plot_illustration_tomek_links.py:


==============================================
Illustration of the definition of a Tomek link
==============================================

This example illustrates what a Tomek link is.

.. GENERATED FROM PYTHON SOURCE LINES 8-12

.. code-block:: default


    # Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 13-20

.. code-block:: default

    print(__doc__)

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set_context("poster")








.. GENERATED FROM PYTHON SOURCE LINES 21-22

This helper function applies the same styling (despined axes, fixed limits,
axis labels, and legend position) to every plot.

.. GENERATED FROM PYTHON SOURCE LINES 24-35

.. code-block:: default



    def make_plot_despine(ax):
        sns.despine(ax=ax, offset=10)
        ax.set_xlim([0, 3])
        ax.set_ylim([0, 3])
        ax.set_xlabel(r"$X_1$")
        ax.set_ylabel(r"$X_2$")
        ax.legend(loc="lower right")









.. GENERATED FROM PYTHON SOURCE LINES 36-38

We will generate some toy data that illustrates how
:class:`~imblearn.under_sampling.TomekLinks` is used to clean a dataset.

.. GENERATED FROM PYTHON SOURCE LINES 40-54

.. code-block:: default

    import numpy as np

    rng = np.random.RandomState(18)

    X_minority = np.transpose(
        [[1.1, 1.3, 1.15, 0.8, 0.55, 2.1], [1.0, 1.5, 1.7, 2.5, 0.55, 1.9]]
    )
    X_majority = np.transpose(
        [
            [2.1, 2.12, 2.13, 2.14, 2.2, 2.3, 2.5, 2.45],
            [1.5, 2.1, 2.7, 0.9, 1.0, 1.4, 2.4, 2.9],
        ]
    )








.. GENERATED FROM PYTHON SOURCE LINES 55-57

In the figure below, the two samples highlighted in green form a Tomek link:
they belong to different classes and are each other's nearest neighbor.
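
The two conditions can also be checked numerically. The following is a minimal
sketch using only NumPy; ``find_tomek_links`` is a hypothetical helper written
for this illustration and is not part of imbalanced-learn. It assumes Euclidean
distance and stacks the minority samples before the majority samples, so the
row indices it reports refer to that stacked array.

.. code-block:: python

    import numpy as np


    def find_tomek_links(X, y):
        """Return pairs (i, j), i < j, of mutual nearest neighbors whose
        labels differ. Hypothetical helper, for illustration only."""
        # Pairwise Euclidean distances; a sample is never its own neighbor.
        diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
        dist = np.sqrt((diff**2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)
        nearest = dist.argmin(axis=1)

        links = []
        for i, j in enumerate(nearest):
            # Keep the pair once (i < j), and only if the relation is mutual
            # and the two samples belong to different classes.
            if i < j and nearest[j] == i and y[i] != y[j]:
                links.append((i, int(j)))
        return links


    # Same toy data as in this example.
    X_minority = np.transpose(
        [[1.1, 1.3, 1.15, 0.8, 0.55, 2.1], [1.0, 1.5, 1.7, 2.5, 0.55, 1.9]]
    )
    X_majority = np.transpose(
        [
            [2.1, 2.12, 2.13, 2.14, 2.2, 2.3, 2.5, 2.45],
            [1.5, 2.1, 2.7, 0.9, 1.0, 1.4, 2.4, 2.9],
        ]
    )
    X = np.vstack((X_minority, X_majority))
    y = np.array([0] * len(X_minority) + [1] * len(X_majority))

    links = find_tomek_links(X, y)
    print(links)  # [(5, 7)]

Row 5 is the last minority sample and row 7 is the second majority sample,
which are exactly the two points highlighted in the plot. Other pairs in the
data are mutual nearest neighbors as well, but they share a class and therefore
do not qualify.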

.. GENERATED FROM PYTHON SOURCE LINES 57-86

.. code-block:: default


    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(
        X_minority[:, 0],
        X_minority[:, 1],
        label="Minority class",
        s=200,
        marker="_",
    )
    ax.scatter(
        X_majority[:, 0],
        X_majority[:, 1],
        label="Majority class",
        s=200,
        marker="+",
    )

    # highlight the samples of interest
    ax.scatter(
        [X_minority[-1, 0], X_majority[1, 0]],
        [X_minority[-1, 1], X_majority[1, 1]],
        label="Tomek link",
        s=200,
        alpha=0.3,
    )
    make_plot_despine(ax)
    fig.suptitle("Illustration of a Tomek link")
    fig.tight_layout()




.. image-sg:: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_001.png
   :alt: Illustration of a Tomek link
   :srcset: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 87-91

We can run the :class:`~imblearn.under_sampling.TomekLinks` sampler to
remove the corresponding samples. With ``sampling_strategy='auto'``, only the
sample from the majority class is removed. With ``sampling_strategy='all'``,
both samples are removed.
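
Before plotting, the effect of the two strategies can be seen from the class
counts alone. The snippet below is a small sketch that rebuilds the toy data
and counts the samples left in each class after resampling; the ``counts``
dictionary is introduced here only for illustration.

.. code-block:: python

    import numpy as np
    from imblearn.under_sampling import TomekLinks

    # Same toy data as in this example: 6 minority and 8 majority samples.
    X_minority = np.transpose(
        [[1.1, 1.3, 1.15, 0.8, 0.55, 2.1], [1.0, 1.5, 1.7, 2.5, 0.55, 1.9]]
    )
    X_majority = np.transpose(
        [
            [2.1, 2.12, 2.13, 2.14, 2.2, 2.3, 2.5, 2.45],
            [1.5, 2.1, 2.7, 0.9, 1.0, 1.4, 2.4, 2.9],
        ]
    )
    X = np.vstack((X_minority, X_majority))
    y = np.array([0] * len(X_minority) + [1] * len(X_majority))

    # Number of samples per class, [minority, majority], for each strategy.
    counts = {}
    for strategy in ("auto", "all"):
        _, y_res = TomekLinks(sampling_strategy=strategy).fit_resample(X, y)
        counts[strategy] = np.bincount(y_res).tolist()

    print(counts)

With this data a single Tomek link exists, so ``'auto'`` should leave the
minority class untouched and drop one majority sample, while ``'all'`` should
drop one sample from each class.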

.. GENERATED FROM PYTHON SOURCE LINES 93-136

.. code-block:: default

    from imblearn.under_sampling import TomekLinks

    fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))

    samplers = {
        "Removing only majority samples": TomekLinks(sampling_strategy="auto"),
        "Removing all samples": TomekLinks(sampling_strategy="all"),
    }

    for ax, (title, sampler) in zip(axs, samplers.items()):
        X_res, y_res = sampler.fit_resample(
            np.vstack((X_minority, X_majority)),
            np.array([0] * X_minority.shape[0] + [1] * X_majority.shape[0]),
        )
        ax.scatter(
            X_res[y_res == 0][:, 0],
            X_res[y_res == 0][:, 1],
            label="Minority class",
            s=200,
            marker="_",
        )
        ax.scatter(
            X_res[y_res == 1][:, 0],
            X_res[y_res == 1][:, 1],
            label="Majority class",
            s=200,
            marker="+",
        )

        # highlight the samples of interest
        ax.scatter(
            [X_minority[-1, 0], X_majority[1, 0]],
            [X_minority[-1, 1], X_majority[1, 1]],
            label="Tomek link",
            s=200,
            alpha=0.3,
        )

        ax.set_title(title)
        make_plot_despine(ax)
    fig.tight_layout()

    plt.show()



.. image-sg:: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_002.png
   :alt: Removing only majority samples, Removing all samples
   :srcset: /auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_002.png
   :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.188 seconds)


.. _sphx_glr_download_auto_examples_under-sampling_plot_illustration_tomek_links.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example




    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_illustration_tomek_links.py <plot_illustration_tomek_links.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_illustration_tomek_links.ipynb <plot_illustration_tomek_links.ipynb>`


.. include:: plot_illustration_tomek_links.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
