
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_validation_curve.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_validation_curve.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_validation_curve.py:


==========================
Plotting Validation Curves
==========================

In this example, we examine the impact of the `k_neighbors` parameter of
:class:`~imblearn.over_sampling.SMOTE`. The plot shows the validation scores
of a SMOTE-CART classifier (SMOTE over-sampling followed by a decision tree)
for different values of `k_neighbors`.
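
As a rough illustration of what `k_neighbors` controls, the sketch below
re-implements the core SMOTE interpolation step with NumPy and scikit-learn's
:class:`~sklearn.neighbors.NearestNeighbors`: pick a minority sample, pick one
of its `k_neighbors` nearest minority neighbors, and place a synthetic sample
at a random position on the segment between them. The helper `smote_sample`
is hypothetical and simplified; it is not imbalanced-learn's implementation.

.. code-block:: python

    import numpy as np
    from sklearn.neighbors import NearestNeighbors


    def smote_sample(X_minority, k_neighbors, rng):
        """Hypothetical helper: create one synthetic sample, SMOTE-style."""
        # Ask for k_neighbors + 1 neighbors because each point is its own
        # nearest neighbor when querying the training set.
        nn = NearestNeighbors(n_neighbors=k_neighbors + 1).fit(X_minority)
        i = rng.integers(len(X_minority))
        _, neighbors = nn.kneighbors(X_minority[i : i + 1])
        # Drop the point itself (first column) and pick one of its k neighbors.
        j = rng.choice(neighbors[0, 1:])
        # Interpolate at a random position between the sample and its neighbor.
        gap = rng.uniform()
        return X_minority[i] + gap * (X_minority[j] - X_minority[i])


    rng = np.random.default_rng(0)
    X_min = rng.normal(size=(20, 2))  # toy minority-class samples
    x_new = smote_sample(X_min, k_neighbors=5, rng=rng)

Roughly speaking, a larger `k_neighbors` allows synthetic samples to be drawn
toward more distant minority samples, which is the trade-off the validation
curve explores.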

.. GENERATED FROM PYTHON SOURCE LINES 11-16

.. code-block:: default


    # Authors: Christos Aridas
    #          Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 17-26

.. code-block:: default

    print(__doc__)

    import seaborn as sns

    sns.set_context("poster")


    RANDOM_STATE = 42








.. GENERATED FROM PYTHON SOURCE LINES 27-28

Let's first generate a dataset with imbalanced class distribution.

.. GENERATED FROM PYTHON SOURCE LINES 30-45

.. code-block:: default

    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_classes=2,
        class_sep=2,
        weights=[0.1, 0.9],
        n_informative=10,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=4,
        n_samples=5000,
        random_state=RANDOM_STATE,
    )
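
A quick way to confirm the imbalance is to count the labels. The snippet
below regenerates the same dataset (with `RANDOM_STATE = 42` defined inline)
so that it can run on its own.

.. code-block:: python

    from collections import Counter

    from sklearn.datasets import make_classification

    RANDOM_STATE = 42
    X, y = make_classification(
        n_classes=2,
        class_sep=2,
        weights=[0.1, 0.9],
        n_informative=10,
        n_redundant=1,
        flip_y=0,
        n_features=20,
        n_clusters_per_class=4,
        n_samples=5000,
        random_state=RANDOM_STATE,
    )
    counts = Counter(y)
    # Majority-to-minority ratio; weights=[0.1, 0.9] targets roughly 9:1.
    imbalance_ratio = counts[1] / counts[0]
    print(counts, imbalance_ratio)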








.. GENERATED FROM PYTHON SOURCE LINES 46-50

We use the :class:`~imblearn.over_sampling.SMOTE` over-sampler followed by a
:class:`~sklearn.tree.DecisionTreeClassifier`. The aim is to find which value
of `k_neighbors` is best suited to the dataset we generated.

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: default


    from sklearn.tree import DecisionTreeClassifier








.. GENERATED FROM PYTHON SOURCE LINES 54-61

.. code-block:: default

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    model = make_pipeline(
        SMOTE(random_state=RANDOM_STATE), DecisionTreeClassifier(random_state=RANDOM_STATE)
    )
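
`make_pipeline` names each step after its lowercased class name, which is why
the SMOTE parameter is addressed as `smote__k_neighbors` when calling
`validation_curve` (step name, double underscore, parameter name).
imbalanced-learn's `make_pipeline` follows scikit-learn's naming convention;
the sketch uses scikit-learn's own :func:`~sklearn.pipeline.make_pipeline`
with a :class:`~sklearn.preprocessing.StandardScaler` standing in for SMOTE,
so that it runs without imbalanced-learn installed.

.. code-block:: python

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    # StandardScaler stands in for SMOTE here; only the step naming matters.
    pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=42))
    step_names = list(pipe.named_steps)
    # Nested parameters are exposed as "<step name>__<parameter name>".
    params = pipe.get_params()
    pipe.set_params(decisiontreeclassifier__max_depth=3)
    print(step_names)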








.. GENERATED FROM PYTHON SOURCE LINES 62-66

We can use :func:`~sklearn.model_selection.validation_curve` to inspect the
impact of varying the parameter `k_neighbors`. Since accuracy can be
misleading when the classes are imbalanced, we evaluate the generalization
performance during cross-validation with Cohen's kappa, wrapped into a scorer
with :func:`~sklearn.metrics.make_scorer`.
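
Cohen's kappa compares the observed agreement :math:`p_o` between predictions
and true labels with the agreement :math:`p_e` expected by chance from the
class frequencies, :math:`\kappa = (p_o - p_e) / (1 - p_e)`. The toy example
below (made-up labels with a 1:9 ratio, for illustration only) shows why this
matters on imbalanced data. The helper `kappa_from_confusion` is a
hypothetical re-derivation, checked against
:func:`~sklearn.metrics.cohen_kappa_score`.

.. code-block:: python

    import numpy as np
    from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix


    def kappa_from_confusion(cm):
        """Hypothetical helper: Cohen's kappa from a confusion matrix."""
        n = cm.sum()
        p_o = np.trace(cm) / n  # observed agreement
        # Chance agreement: product of true and predicted class frequencies.
        p_e = cm.sum(axis=1) @ cm.sum(axis=0) / n**2
        return (p_o - p_e) / (1 - p_e)


    # Toy labels: 10 minority (0) and 90 majority (1) samples.
    y_true = np.array([0] * 10 + [1] * 90)

    # Classifier A always predicts the majority class.
    y_majority = np.ones_like(y_true)

    # Classifier B finds 5 of the 10 minority samples but mislabels 5 majority ones.
    y_mixed = y_true.copy()
    y_mixed[:5] = 1
    y_mixed[10:15] = 0

    results = {}
    for name, y_pred in [("majority", y_majority), ("mixed", y_mixed)]:
        cm = confusion_matrix(y_true, y_pred)
        results[name] = (
            accuracy_score(y_true, y_pred),
            kappa_from_confusion(cm),
            cohen_kappa_score(y_true, y_pred),
        )
        print(name, results[name])

Both toy classifiers reach 90% accuracy, but only the one that actually
detects some minority samples gets a positive kappa; the majority-class
predictor scores 0.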

.. GENERATED FROM PYTHON SOURCE LINES 68-83

.. code-block:: default

    from sklearn.metrics import cohen_kappa_score, make_scorer
    from sklearn.model_selection import validation_curve

    scorer = make_scorer(cohen_kappa_score)
    param_range = range(1, 11)
    train_scores, test_scores = validation_curve(
        model,
        X,
        y,
        param_name="smote__k_neighbors",
        param_range=param_range,
        cv=3,
        scoring=scorer,
    )
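
With an integer `cv` and a classifier (here the pipeline, whose last step is
a classifier), scikit-learn splits the data with
:class:`~sklearn.model_selection.StratifiedKFold`, so each fold keeps roughly
the original class ratio and no fold is left with too few minority samples.
The sketch below checks the minority fraction of each test fold on toy labels
with the same 1:9 ratio (the labels are illustrative, not the dataset above).

.. code-block:: python

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    # Toy labels with a 1:9 ratio; the features do not affect the split.
    y_toy = np.array([0] * 100 + [1] * 900)
    X_toy = np.zeros((len(y_toy), 1))

    cv = StratifiedKFold(n_splits=3)
    # Fraction of minority samples (class 0) in each test fold.
    minority_fractions = [
        np.mean(y_toy[test_idx] == 0) for _, test_idx in cv.split(X_toy, y_toy)
    ]
    print(minority_fractions)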








.. GENERATED FROM PYTHON SOURCE LINES 84-89

.. code-block:: default

    train_scores_mean = train_scores.mean(axis=1)
    train_scores_std = train_scores.std(axis=1)
    test_scores_mean = test_scores.mean(axis=1)
    test_scores_std = test_scores.std(axis=1)
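
`validation_curve` returns arrays with one row per value in `param_range` and
one column per cross-validation fold, so `mean(axis=1)` and `std(axis=1)`
aggregate over the folds. The sketch below checks these shapes on a small
scikit-learn-only problem; the plain
:class:`~sklearn.tree.DecisionTreeClassifier` and its `max_depth` parameter
are stand-ins chosen for illustration, not part of the original example.

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import validation_curve
    from sklearn.tree import DecisionTreeClassifier

    X_small, y_small = make_classification(
        n_samples=300, weights=[0.2, 0.8], random_state=0
    )
    depths = [1, 2, 4, 8]
    train_small, test_small = validation_curve(
        DecisionTreeClassifier(random_state=0),
        X_small,
        y_small,
        param_name="max_depth",
        param_range=depths,
        cv=3,
        scoring="balanced_accuracy",
    )
    # Rows: parameter values; columns: CV folds.
    test_small_mean = test_small.mean(axis=1)
    print(test_small.shape, test_small_mean.shape)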








.. GENERATED FROM PYTHON SOURCE LINES 90-92

We can now plot the results of the cross-validation for the different
parameter values that we tried.

.. GENERATED FROM PYTHON SOURCE LINES 94-124

.. code-block:: default

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(7, 7))
    ax.plot(param_range, test_scores_mean, label="SMOTE")
    ax.fill_between(
        param_range,
        test_scores_mean + test_scores_std,
        test_scores_mean - test_scores_std,
        alpha=0.2,
    )
    idx_max = test_scores_mean.argmax()
    ax.scatter(
        param_range[idx_max],
        test_scores_mean[idx_max],
        label=r"Cohen Kappa: ${:.2f}\pm{:.2f}$".format(
            test_scores_mean[idx_max], test_scores_std[idx_max]
        ),
    )

    fig.suptitle("Validation Curve with SMOTE-CART")
    ax.set_xlabel("Number of neighbors")
    ax.set_ylabel("Cohen's kappa")

    # tidy up the axes: offset spines, fixed limits and a legend
    sns.despine(ax=ax, offset=10)
    ax.set_xlim([1, 10])
    ax.set_ylim([0.4, 0.8])
    ax.legend(loc="lower right", fontsize=16)
    plt.tight_layout()
    plt.show()



.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_validation_curve_001.png
   :alt: Validation Curve with SMOTE-CART
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_validation_curve_001.png
   :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  3.544 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_validation_curve.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example




    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_validation_curve.py <plot_validation_curve.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_validation_curve.ipynb <plot_validation_curve.ipynb>`


.. include:: plot_validation_curve.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
