
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/serialization_and_wrappers.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_serialization_and_wrappers.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_serialization_and_wrappers.py:


Serialization of un-picklable objects
=====================================

This example highlights the options for customizing the serialization process
used by joblib.

.. GENERATED FROM PYTHON SOURCE LINES 10-23

.. code-block:: default


    # Code source: Thomas Moreau
    # License: BSD 3 clause

    import sys
    import time
    import traceback
    from joblib.externals.loky import set_loky_pickler
    from joblib import parallel_config
    from joblib import Parallel, delayed
    from joblib import wrap_non_picklable_objects









.. GENERATED FROM PYTHON SOURCE LINES 24-30

First, define functions which cannot be pickled with the standard ``pickle``
protocol. They cannot be serialized with ``pickle`` because they are defined
in the ``__main__`` module. They can however be serialized with
``cloudpickle``. By default, ``loky`` uses ``cloudpickle`` to serialize the
objects that are sent to the workers.
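
As a rough illustration of why the module matters (a minimal sketch using only
the standard library, not part of the original example): ``pickle`` stores a
function as a reference to its module and name rather than as code, so the
receiving process must be able to look that name up again. Here
``json.dumps`` stands in for any importable function, and a lambda stands in
for a function that has no importable name.

.. code-block:: python

    import json
    import pickle

    # An importable function is pickled by reference: the payload only holds
    # the module name and the function name, not the function's code.
    payload = pickle.dumps(json.dumps)
    by_reference = b"json" in payload and b"dumps" in payload

    # A lambda has no importable name, so the standard pickler rejects it.
    try:
        pickle.dumps(lambda i: 2 * i)
        lambda_picklable = True
    except (pickle.PicklingError, AttributeError):
        lambda_picklable = False

    print(by_reference, lambda_picklable)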


.. GENERATED FROM PYTHON SOURCE LINES 30-38

.. code-block:: default


    def func_async(i, *args):
        return 2 * i


    print(Parallel(n_jobs=2)(delayed(func_async)(21) for _ in range(1))[0])






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    42




.. GENERATED FROM PYTHON SOURCE LINES 39-43

For most use-cases, using ``cloudpickle`` is efficient enough. However, this
solution can be very slow to serialize large Python objects, such as a
``dict`` or a ``list``, compared to the standard ``pickle`` serialization.


.. GENERATED FROM PYTHON SOURCE LINES 43-58

.. code-block:: default


    def func_async(i, *args):
        return 2 * i


    # We have to pass an extra argument with a large list (or another large python
    # object).
    large_list = list(range(1000000))

    t_start = time.time()
    Parallel(n_jobs=2)(delayed(func_async)(21, large_list) for _ in range(1))
    print("With loky backend and cloudpickle serialization: {:.3f}s"
          .format(time.time() - t_start))






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    With loky backend and cloudpickle serialization: 0.041s




.. GENERATED FROM PYTHON SOURCE LINES 59-63

If you are on a UNIX system, it is possible to fall back to the old
``multiprocessing`` backend, which can pickle interactively defined functions
with the default ``pickle`` module, which is faster for such large objects.


.. GENERATED FROM PYTHON SOURCE LINES 63-77

.. code-block:: default


    import multiprocessing as mp
    if mp.get_start_method() != "spawn":
        def func_async(i, *args):
            return 2 * i

        with parallel_config('multiprocessing'):
            t_start = time.time()
            Parallel(n_jobs=2)(
                delayed(func_async)(21, large_list) for _ in range(1))
            print("With multiprocessing backend and pickle serialization: {:.3f}s"
                  .format(time.time() - t_start))






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    With multiprocessing backend and pickle serialization: 0.098s




.. GENERATED FROM PYTHON SOURCE LINES 78-92

However, using ``fork`` to start new processes can violate the POSIX
specification and can interact badly with compiled extensions that use
OpenMP. Also, it is not possible to start processes with ``fork`` on Windows,
where only ``spawn`` is available. The ``loky`` backend has been developed to
mitigate these issues.

To have fast pickling with ``loky``, it is possible to rely on ``pickle`` to
serialize all communications between the main process and the workers with
the ``loky`` backend. This can be done by setting the environment variable
``LOKY_PICKLER=pickle`` before the script is launched. Here we use an
internal programmatic switch ``loky.set_loky_pickler`` for demonstration
purposes, but it has the same effect as setting ``LOKY_PICKLER``. Note that
this switch should not be used in practice, as it has side effects on the
workers.
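
Setting the variable before launch can be sketched as follows. This is only an
illustration under stated assumptions: the child process is a stand-in for the
real script and merely echoes the variable it receives, so the snippet runs
without joblib installed. In a shell, the equivalent is to prefix the command
with ``LOKY_PICKLER=pickle``.

.. code-block:: python

    import os
    import subprocess
    import sys

    # Copy the current environment and select the stdlib pickler for loky.
    env = dict(os.environ, LOKY_PICKLER="pickle")

    # Stand-in for the real script: the child only echoes the variable it sees.
    child_code = "import os; print(os.environ['LOKY_PICKLER'])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    seen_by_child = result.stdout.strip()
    print(seen_by_child)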


.. GENERATED FROM PYTHON SOURCE LINES 92-103

.. code-block:: default


    # Now set the `loky_pickler` to use the pickle serialization from stdlib. Here,
    # we do not pass the desired function ``func_async`` as it is not picklable
    # but it is replaced by ``id`` for demonstration purposes.

    set_loky_pickler('pickle')
    t_start = time.time()
    Parallel(n_jobs=2)(delayed(id)(large_list) for _ in range(1))
    print("With pickle serialization: {:.3f}s".format(time.time() - t_start))






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    With pickle serialization: 0.041s




.. GENERATED FROM PYTHON SOURCE LINES 104-108

However, functions and objects defined in ``__main__`` can no longer be
serialized with ``pickle``, so it is not possible to call ``func_async``
using this pickler.


.. GENERATED FROM PYTHON SOURCE LINES 108-119

.. code-block:: default


    def func_async(i, *args):
        return 2 * i


    try:
        Parallel(n_jobs=2)(delayed(func_async)(21, large_list) for _ in range(1))
    except Exception:
        traceback.print_exc(file=sys.stdout)






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    joblib.externals.loky.process_executor._RemoteTraceback: 
    """
    Traceback (most recent call last):
      File "/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 426, in _process_worker
        call_item = call_queue.get(block=True, timeout=timeout)
      File "/home/tom/.local/miniconda/lib/python3.10/multiprocessing/queues.py", line 122, in get
        return _ForkingPickler.loads(res)
    AttributeError: Can't get attribute 'func_async' on <module 'joblib.externals.loky.backend.popen_loky_posix' from '/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/externals/loky/backend/popen_loky_posix.py'>
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/tom/Work/prog/joblib/examples/serialization_and_wrappers.py", line 114, in <module>
        Parallel(n_jobs=2)(delayed(func_async)(21, large_list) for _ in range(1))
      File "/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 1952, in __call__
        return output if self.return_generator else list(output)
      File "/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 1595, in _get_outputs
        yield from self._retrieve()
      File "/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 1699, in _retrieve
        self._raise_error_fast()
      File "/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
        error_job.get_result(self.timeout)
      File "/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 736, in get_result
        return self._return_or_raise()
      File "/home/tom/.local/miniconda/lib/python3.10/site-packages/joblib/parallel.py", line 754, in _return_or_raise
        raise self._result
    joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
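
The ``AttributeError`` shown in the remote traceback can be reproduced within
a single process. The following is a minimal sketch using only the standard
library: ``fake_main`` is a hypothetical module name that plays the role of
``__main__``, and deleting the attribute simulates a worker process in which
the function was never defined.

.. code-block:: python

    import pickle
    import sys
    import types

    # Build a throwaway module (hypothetical name) holding the function.
    mod = types.ModuleType("fake_main")
    exec("def func_async(i, *args):\n    return 2 * i", mod.__dict__)
    sys.modules["fake_main"] = mod

    # Pickling succeeds: only the reference "fake_main.func_async" is stored.
    payload = pickle.dumps(mod.func_async)

    # Simulate a worker whose copy of the module lacks the function.
    del mod.func_async

    error = None
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        error = str(exc)
    finally:
        # Clean up the throwaway module.
        del sys.modules["fake_main"]

    print(error)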




.. GENERATED FROM PYTHON SOURCE LINES 120-130

To combine fast pickling, safe process creation and serialization of
interactive functions, ``joblib`` provides a wrapper function
:func:`~joblib.wrap_non_picklable_objects` to wrap the non-picklable function
and indicate to the serialization process that this specific function should
be serialized using ``cloudpickle``. This changes the serialization behavior
only for this function and keeps using ``pickle`` for all other objects. The
drawback of this solution is that it modifies the object. This should not
cause many issues with functions but can have side effects with object
instances.


.. GENERATED FROM PYTHON SOURCE LINES 130-143

.. code-block:: default


    @delayed
    @wrap_non_picklable_objects
    def func_async_wrapped(i, *args):
        return 2 * i


    t_start = time.time()
    Parallel(n_jobs=2)(func_async_wrapped(21, large_list) for _ in range(1))
    print("With pickle from stdlib and wrapper: {:.3f}s"
          .format(time.time() - t_start))






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    With pickle from stdlib and wrapper: 0.181s
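
The effect of the wrapper can also be observed without starting any worker.
This is a minimal sketch, assuming joblib is installed; the lambda is a
stand-in for any function that the standard pickler cannot serialize by
itself.

.. code-block:: python

    import pickle
    import types

    from joblib import wrap_non_picklable_objects

    # A lambda cannot be serialized by the standard pickler on its own.
    wrapped = wrap_non_picklable_objects(lambda i: 2 * i)

    # The wrapper is no longer a plain function object: this is the
    # modification of the object mentioned above.
    is_plain_function = isinstance(wrapped, types.FunctionType)

    # The standard pickler can now round-trip the wrapped function.
    restored = pickle.loads(pickle.dumps(wrapped))
    result = restored(21)
    print(is_plain_function, result)

Because the wrapped object is not an instance of the original type, code that
relies on type checks may behave differently after wrapping.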




.. GENERATED FROM PYTHON SOURCE LINES 144-150

The same wrapper can also be used for non-picklable classes. Note that the
side effects of ``wrap_non_picklable_objects`` on objects can break magic
methods such as ``__add__`` and can interfere with the ``isinstance`` and
``issubclass`` functions. Some improvements will be considered if use-cases
are reported.


.. GENERATED FROM PYTHON SOURCE LINES 150-154

.. code-block:: default


    # Reset the loky_pickler to avoid side effects on other examples in
    # sphinx-gallery.
    set_loky_pickler()








.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.420 seconds)


.. _sphx_glr_download_auto_examples_serialization_and_wrappers.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example




    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: serialization_and_wrappers.py <serialization_and_wrappers.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: serialization_and_wrappers.ipynb <serialization_and_wrappers.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
