.. toctree::

Loss - Functions
===============================

Source code in ``EpyNN/epynn/commons/loss.py``.

See `Appendix - Notations`_ for mathematical conventions.

.. _Appendix - Notations: glossary.html#notations

In the context of error backpropagation, a loss function is any differentiable function used to evaluate differences between true values *Y* and predicted values *A*:

* The loss function itself is used to compute the *cost*.
* The derivative of the loss function is used to compute the *losses* for each sample and for each output probability from the output layer.

Note that:

* The training of a Neural Network is driven by the derivative of the loss function: the goal is to minimize the *losses* for each sample and for each output probability.
* The *cost* is computed from the loss function, not from its derivative. It is evaluated for each sample and for each output probability, and most frequently averaged over the output probabilities of each sample. A single scalar can be obtained by averaging the per-sample costs.
* The *cost* quantifies the magnitude of the difference between true values *Y* and predicted values *A*.
* The *loss* also qualifies the direction of the difference between true values *Y* and predicted values *A*.
* Loss functions can be modified or implemented from :py:mod:`epynn.commons.loss`.

Mean Squared Error
-------------------------------

.. literalinclude:: ../epynn/commons/loss.py
    :language: python
    :pyobject: MSE
    :lines: 1-2,15-25

Given *M* the number of training examples and *U* the number of output units, the MSE function can be defined as:

.. math::

    \begin{alignat*}{2}
    f:\mathcal{M}_{M,U}(\mathbb{R}), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,1}(\mathbb{R}_{+}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && \frac{1}{U} * \sum\limits_{u = 1}^U (y_{mu}-a_{mu})^2
    \end{alignat*}

The derivative of the MSE function with respect to *A* can be defined as:

.. math::

    \begin{alignat*}{2}
    f':\mathcal{M}_{M,U}(\mathbb{R}), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,U}(\mathbb{R}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && - \frac{2}{U} * (y_{mu}-a_{mu})
    \end{alignat*}

.. image:: _static/loss/MSE.svg

Note that the output of the MSE function is always positive and increases with the difference between true values *Y* and predicted values *A*. By contrast, the output of its derivative may be positive or negative and therefore carries the direction of the difference between true values *Y* and predicted values *A*.
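For illustration, a minimal NumPy sketch consistent with the two definitions above may look as follows. The names ``mse`` and ``dmse`` are illustrative and do not reflect the EpyNN API; refer to the ``MSE`` function in :py:mod:`epynn.commons.loss` for the actual implementation.

.. code-block:: python

    import numpy as np

    def mse(Y, A):
        """Per-sample cost: mean squared difference over the U output units."""
        return np.mean((Y - A) ** 2, axis=1)    # shape (M,)

    def dmse(Y, A):
        """Element-wise derivative with respect to A."""
        U = Y.shape[1]
        return -2 / U * (Y - A)                 # shape (M, U)

    Y = np.array([[1.0, 0.0]])
    A = np.array([[0.8, 0.3]])

    print(mse(Y, A))     # [0.065]
    print(dmse(Y, A))    # [[-0.2  0.3]]

The sign of each entry of the derivative indicates whether the corresponding prediction should increase (negative value) or decrease (positive value) in order to reduce the cost.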
Mean Absolute Error
-------------------------------

.. literalinclude:: ../epynn/commons/loss.py
    :language: python
    :pyobject: MAE
    :lines: 1-2,15-25

Given *M* the number of training examples and *U* the number of output units, the MAE function can be defined as:

.. math::

    \begin{alignat*}{2}
    f:\mathcal{M}_{M,U}(\mathbb{R}), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,1}(\mathbb{R}_{+}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && \frac{1}{U} * \sum\limits_{u = 1}^U |y_{mu}-a_{mu}|
    \end{alignat*}

The derivative of the MAE function with respect to *A* can be defined as:

.. math::

    \begin{alignat*}{2}
    f':\mathcal{M}_{M,U}(\mathbb{R}), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,U}(\mathbb{R}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && - \frac{1}{U} * \frac{y_{mu}-a_{mu}}{|y_{mu}-a_{mu}|}
    \end{alignat*}

.. image:: _static/loss/MAE.svg

Note that the output of the MAE function is always positive and increases with the difference between true values *Y* and predicted values *A*. By contrast, the output of its derivative takes only the two values *1/U* and *-1/U*, whose sign carries the direction of the difference between true values *Y* and predicted values *A*.

Mean Squared Logarithmic Error
---------------------------------------

.. literalinclude:: ../epynn/commons/loss.py
    :language: python
    :pyobject: MSLE
    :lines: 1-2,15-25

Given *M* the number of training examples and *U* the number of output units, the MSLE function can be defined as:

.. math::

    \begin{alignat*}{2}
    f:\mathcal{M}_{M,U}(]-1, +\infty[), \mathcal{M}_{M,U}(]-1, +\infty[) &\to && \mathcal{M}_{M,1}(\mathbb{R}_{+}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && \frac{1}{U} * \sum\limits_{u = 1}^U (\ln(y_{mu}+1) - \ln(a_{mu}+1))^2
    \end{alignat*}

The derivative of the MSLE function with respect to *A* can be defined as:

.. math::

    \begin{alignat*}{2}
    f':\mathcal{M}_{M,U}(]-1, +\infty[), \mathcal{M}_{M,U}(]-1, +\infty[) &\to && \mathcal{M}_{M,U}(\mathbb{R}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && - \frac{2}{U} * \frac{\ln(y_{mu}+1)-\ln(a_{mu}+1)}{a_{mu}+1}
    \end{alignat*}

.. image:: _static/loss/MSLE.svg

Note that the output of the MSLE function is always positive and increases with the difference between true values *Y* and predicted values *A*. By contrast, the output of its derivative may be positive or negative and therefore carries the direction of the difference between true values *Y* and predicted values *A*.
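As above, a minimal NumPy sketch consistent with the MAE and MSLE definitions could read as follows. The names are illustrative and do not reflect the EpyNN API. Note that ``np.sign`` returns 0 where *y = a*, which is one possible convention for the point where the MAE derivative is undefined.

.. code-block:: python

    import numpy as np

    def mae(Y, A):
        """Per-sample cost: mean absolute difference over the U output units."""
        return np.mean(np.abs(Y - A), axis=1)

    def dmae(Y, A):
        """Element-wise derivative with respect to A: -sign(Y - A) / U."""
        U = Y.shape[1]
        return -np.sign(Y - A) / U

    def msle(Y, A):
        """Per-sample cost: mean squared difference between log-scaled values."""
        return np.mean((np.log1p(Y) - np.log1p(A)) ** 2, axis=1)

    def dmsle(Y, A):
        """Element-wise derivative with respect to A."""
        U = Y.shape[1]
        return -2 / U * (np.log1p(Y) - np.log1p(A)) / (A + 1)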
Binary Cross-Entropy
-------------------------------

.. literalinclude:: ../epynn/commons/loss.py
    :language: python
    :pyobject: BCE
    :lines: 1-2,15-25

Given *M* the number of training examples and *U* the number of output units, the BCE function can be defined as:

.. math::

    \begin{alignat*}{2}
    f:\mathcal{M}_{M,U}(]0, 1[), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,1}(\mathbb{R}_{+}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && - \frac{1}{U} * \sum\limits_{u = 1}^U (y_{mu} * \ln(a_{mu}) + (1-y_{mu}) * \ln(1-a_{mu}))
    \end{alignat*}

The derivative of the BCE function with respect to *A* can be defined as:

.. math::

    \begin{alignat*}{2}
    f':\mathcal{M}_{M,U}(]0, 1[), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,U}(\mathbb{R}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && \frac{1}{U} * \frac{a_{mu}-y_{mu}}{a_{mu} - a_{mu}^2}
    \end{alignat*}

The BCE function is one of the *categorical loss functions*: it is relevant for classification problems, where the true values *Y* belong to the set *{0, 1}* and the predicted values *A* are probabilities in the open interval *]0, 1[*. If this requirement on *Y* is not satisfied, the output of the BCE function is not guaranteed to be positive. If it is satisfied, then the output of the BCE function is always positive and increases with the difference between true values *Y* and predicted values *A*. By contrast, the output of its derivative may be positive or negative and therefore carries the direction of the difference between true values *Y* and predicted values *A*.

.. image:: _static/loss/BCE.svg

Categorical Cross-Entropy
-------------------------------

.. literalinclude:: ../epynn/commons/loss.py
    :language: python
    :pyobject: CCE
    :lines: 1-2,15-25

Given *M* the number of training examples and *U* the number of output units, the CCE function can be defined as:

.. math::

    \begin{alignat*}{2}
    f:\mathcal{M}_{M,U}(\mathbb{R}_{+}^{*}), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,1}(\mathbb{R}_{+}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && - \sum\limits_{u = 1}^U y_{mu} * \ln(a_{mu})
    \end{alignat*}

The derivative of the CCE function with respect to *A* can be defined as:

.. math::

    \begin{alignat*}{2}
    f':\mathcal{M}_{M,U}(\mathbb{R}_{+}^{*}), \mathcal{M}_{M,U}(\mathbb{R}) &\to && \mathcal{M}_{M,U}(\mathbb{R}) \\
    A = \mathop{(a_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}}, Y = \mathop{(y_{mu})}_{\substack{1 \le m \le M \\ 1 \le u \le U}} &\to && - \frac{y_{mu}}{a_{mu}}
    \end{alignat*}

The CCE function is also one of the *categorical loss functions*: it is relevant for classification problems, where the true values *Y* belong to the set *{0, 1}* and are typically one-hot encoded. If this requirement on *Y* is not satisfied, the output of the CCE function is not guaranteed to be positive. If it is satisfied, then the output of the CCE function is always positive and increases with the difference between true values *Y* and predicted values *A*. Its derivative is zero where *y = 0* and negative where *y = 1*, growing in magnitude as the corresponding predicted probability decreases.

.. image:: _static/loss/CCE.svg
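For completeness, a minimal NumPy sketch consistent with the BCE and CCE definitions is shown below. The ``epsilon`` clipping is only an illustrative guard against taking the logarithm of zero or dividing by zero, and does not necessarily match how :py:mod:`epynn.commons.loss` handles boundary values; the function names are likewise illustrative.

.. code-block:: python

    import numpy as np

    epsilon = 1e-12  # illustrative guard against log(0) and division by zero

    def bce(Y, A):
        """Per-sample cost; A is expected to hold probabilities in ]0, 1[."""
        A = np.clip(A, epsilon, 1 - epsilon)
        return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A), axis=1)

    def dbce(Y, A):
        """Element-wise derivative with respect to A."""
        A = np.clip(A, epsilon, 1 - epsilon)
        U = Y.shape[1]
        return (A - Y) / (U * (A - A ** 2))

    def cce(Y, A):
        """Per-sample cost; Y is expected to be one-hot encoded."""
        A = np.clip(A, epsilon, None)
        return -np.sum(Y * np.log(A), axis=1)

    def dcce(Y, A):
        """Element-wise derivative with respect to A."""
        A = np.clip(A, epsilon, None)
        return -Y / A

Clipping the predicted probabilities away from 0 and 1 also keeps the derivatives finite near the boundaries of the domain.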