.. EpyNN documentation master file, created by
   sphinx-quickstart on Tue Jul 6 18:46:11 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. toctree::

Architecture Layers - Model
===============================

EpyNN was made modular: architecture layers may be added or modified easily by following a few rules.

* ``Layer`` is the **parent** layer (:py:class:`epynn.commons.models.Layer`) from which all other layers inherit.
* Class definitions for custom layers must inherit from ``Layer`` and comply with the method scheme shown for the ``Template`` layer (:py:class:`epynn.template.models.Template`). A minimal sketch of such a custom layer is shown further below.

Base Layer
------------------------------

Source code in ``EpyNN/epynn/commons/models.py``.

.. image:: _static/other/base-01.svg

Within the `EpyNN Model`_, any architecture layer may be accessed as ``model.layers[i]``, with the explicit alias ``model.embedding`` for the embedding layer. Layer instance attributes described below can be inspected from, for instance, ``model.layers[-1].p['W']``, which returns the weight associated with the last layer of the network, predictably a `Dense`_ layer.

All layers must inherit from the **Base Layer**.

.. autoclass:: epynn.commons.models.Layer

    .. automethod:: __init__

    .. automethod:: update_shapes

Template Layer
------------------------------

Source files in ``EpyNN/epynn/template/``.

An architecture layer may be defined as a finite set of functions (1).

.. math::

    layer = \{ f_1,...,f_n | n \in \mathbb{N}^* \} \tag{1}

In Python, this translates into a class which contains class methods. When designing a custom layer, or when modifying one natively provided with EpyNN, one should be careful **not to deviate from the organization shown below**.

To avoid having to re-code the whole scheme, child layer classes must at least contain the methods shown below. If a method is not relevant to the layer, as exemplified below, it **should simply be made a dummy method**.

See the `EpyNN Model`_ training algorithm for details about the training procedure.

Finally, EpyNN was written so that users do not face hundreds of lines of code within a single class definition, which may negatively impact its educational value for people less experienced in programming. As such, methods are natively thin wrappers around simple functions located within a small number of files in the layer's directory. Note that custom layers or further developments do not need to stick to this rule.

.. autoclass:: epynn.template.models.Template
    :show-inheritance:

    .. automethod:: __init__

Shapes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.compute_shapes

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_compute_shapes

Within a *Template* pass-through layer, shapes of interest include:

* Input *X* of shape *(m, ...)* with *m* equal to the number of samples. The number of input dimensions is unknown.
* The number of features *n* per sample can still be determined formally: it is equal to the size of the input *X* divided by the number of samples *m*.

Initialize parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.initialize_parameters

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_initialize_parameters

A pass-through layer is not a *trainable* layer. It has no *trainable* parameters such as weight *W* or bias *b*. Therefore, there are no parameters to initialize.
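Below is a minimal sketch of a custom pass-through layer complying with the scheme described above. It is illustrative only and not taken from the EpyNN source: the class name ``PassThrough`` is hypothetical, the ``d`` (dimensions) and ``fs`` (forward shapes) dictionaries are assumed to be created by the Base Layer's ``__init__``, and the method signatures should be checked against the ``Template`` class documented above before reuse.

.. code-block:: python

    from epynn.commons.models import Layer


    class PassThrough(Layer):
        """Minimal custom pass-through layer (illustrative sketch only)."""

        def __init__(self):
            super().__init__()    # Assumed to set up the Base Layer dictionaries (d, fs, p, ...).

        def compute_shapes(self, A):
            X = A                                   # Input of layer k = output of layer k-1.
            self.fs['X'] = X.shape                  # (m, ...) - m samples, unknown dimensions.
            self.d['m'] = X.shape[0]                # Number of samples.
            self.d['n'] = X.size // self.d['m']     # Features per sample = size of X divided by m.

        def initialize_parameters(self):
            pass    # No trainable parameters (W, b): dummy method.

        def forward(self, A):
            self.compute_shapes(A)
            X = A       # (1) Input of current layer k.
            A = X       # (2) Output equals input: pass-through.
            # The native Template layer also records shapes through the Base Layer's
            # update_shapes() method; this is omitted here for brevity.
            return A

        def backward(self, dA):
            dX = dA     # Gradient of the loss is passed backward unchanged.
            return dX

        def compute_gradients(self):
            pass    # No parameters, hence no parameter gradients: dummy method.

        def update_parameters(self, hPars):
            pass    # Nothing to update: dummy method.

Such a layer could then be inserted in the list of layers passed to the `EpyNN Model`_, like any layer natively provided with EpyNN.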
Forward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.forward

.. literalinclude:: ./../epynn/template/forward.py
    :pyobject: template_forward

The forward propagation function in a *Template* pass-through layer *k* includes:

* (1): Input *X* of current layer *k* is equal to the output *A* of previous layer *k-1*.
* (2): Output *A* of current layer *k* is equal to its input *X*.

Note that:

* A pass-through layer is, by definition, a layer that does nothing except forwarding the input from the previous layer to the next layer.

Mathematically, the forward propagation is a function which takes a matrix *X* of any dimension as input and returns another matrix *A* of any dimension as output, such as:

.. math::

    \begin{align}
        A = f(X)
    \end{align}

.. math::

    \begin{align}
        where~f~is~defined~as: & \\
        f:\mathcal{M}_{m,d_1...d_n}(\mathbb{R}) & \to \mathcal{M}_{m,d_1...d_{n'}}(\mathbb{R}) \\
        X & \to f(X) \\
        with~n,~n' \in \mathbb{N}^* &
    \end{align}

Backward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.backward

.. literalinclude:: ./../epynn/template/backward.py
    :pyobject: template_backward

The backward propagation function in a *Template* pass-through layer *k* includes:

* (1): *dA*, the gradient of the loss with respect to the output of forward propagation *A* for current layer *k*. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer *k+1*.
* (2): The gradient of the loss *dX* with respect to the input of forward propagation *X* for current layer *k* is equal to *dA*.

Note that:

* A pass-through layer is, as said above, a layer that does nothing except forwarding the input from the previous layer to the next layer. With respect to backward propagation, the layer receives an input from the **next** layer and sends it **backward** to the **previous** layer.

Mathematically, the backward propagation is a function which takes a matrix *dA* of any dimension as input and returns another matrix *dX* of any dimension as output, such as:

.. math::

    \begin{align}
        \delta^{k} = \frac{\partial \mathcal{L}}{\partial X^{k}} = f\left(\frac{\partial \mathcal{L}}{\partial A^{k}}\right)
    \end{align}

.. math::

    \begin{align}
        where~f~is~defined~as: & \\
        f:\mathcal{M}_{m,d_1...d_{n'}}(\mathbb{R}) & \to \mathcal{M}_{m,d_1...d_n}(\mathbb{R}) \\
        dA & \to f(dA) \\
        with~n',~n \in \mathbb{N}^* &
    \end{align}

Note that:

* The parameter *dA* of *f()* is the partial derivative of the loss with respect to the output *A* of layer *k*.
* The output of *f(dA)* is the partial derivative of the loss with respect to the input *X* of layer *k*.
* The shape of the output of forward propagation *A* is identical to the shape of the input of backward propagation *dA*.
* The shape of the output of backward propagation *dX* is identical to the shape of the input of forward propagation *X*.
* The expression *partial derivative of the loss with respect to* is equivalent to *gradient of the loss with respect to*.

Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.compute_gradients

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_compute_gradients

A pass-through layer is not a *trainable* layer. It has no *trainable* parameters such as weight *W* or bias *b*. Therefore, there are no parameter gradients to compute.
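The *Forward* and *Backward* steps of the pass-through layer, together with the shape identities listed above, can be reproduced outside of EpyNN in a few lines of plain NumPy. This is an illustrative sketch only (the function names are hypothetical), not the EpyNN source shown in the ``literalinclude`` blocks above.

.. code-block:: python

    import numpy as np

    def pass_through_forward(X):
        """Sketch of the pass-through forward step."""
        A = X          # (2) Output A equals input X.
        return A

    def pass_through_backward(dA):
        """Sketch of the pass-through backward step."""
        dX = dA        # (2) Gradient dX equals gradient dA.
        return dX

    X = np.random.standard_normal((8, 4, 5))   # Hypothetical input of shape (m, ...).
    A = pass_through_forward(X)

    dA = np.ones_like(A)                       # Hypothetical gradient received from layer k+1.
    dX = pass_through_backward(dA)

    # Shape identities stated above.
    assert dA.shape == A.shape     # Input of backward matches output of forward.
    assert dX.shape == X.shape     # Output of backward matches input of forward.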
Update parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.update_parameters

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_update_parameters

A pass-through layer is not a *trainable* layer. It has no *trainable* parameters such as weight *W* or bias *b*. Therefore, there are no parameters to update.

**Note that:** For trainable layers, this function is always identical regardless of the layer architecture. Therefore, it is not explicitly documented in the corresponding documentation pages. See :py:func:`epynn.dense.parameters.dense_update_parameters()` for an explicit example.

.. _layer_se_hPars:

Layer Hyperparameters
------------------------------

.. autoclass:: epynn.settings.se_hPars

.. literalinclude:: ../epynn/settings.py
    :language: python
    :start-after: HYPERPARAMETERS SETTINGS

**Schedule learning rate**

* ``learning_rate``: Greater values mean faster learning, **but** at the risk of the **vanishing or exploding** gradient problem and/or of being trapped in a **local minimum**. If too low, the network may never converge. Optimal values usually lie in the - wide - range between 1 and 10\ :sup:`-6`, but they strongly depend on the network architecture and the data.
* ``schedule``: Learning rate scheduler. See :py:mod:`epynn.commons.schedule`. Three schedulers are natively provided with EpyNN, with the corresponding schedules shown on the plot below. The principle of decay-based learning rate scheduling is to help the Gradient Descent (GD) procedure reach a minimum by fine-tuning the learning rate along the training epochs. It may also allow for greater initial learning rates - to converge faster - while the decay may prevent the vanishing or exploding gradient problem. A sketch of such a schedule is given at the end of this page.
* ``decay_k``: The decay rate at which the learning rate decreases, according to the decay function.
* ``cycle_epochs``: When using a *cycling learning rate*, the number of epochs per cycle. The schedule then repeats *n* times over the total number of training epochs.
* ``cycle_descent``: When using a *cycling learning rate*, the initial learning rate at the beginning of each new cycle is diminished by this descent coefficient.

.. image:: schedule.png

**Tune activation function**

* ``ELU_alpha``: Coefficient used in the ELU activation function.
* ``LRELU_alpha``: Coefficient used in the Leaky ReLU activation function.
* ``softmax_temperature``: Temperature factor for the softmax activation function. When used in the output layer, higher values diminish confidence in the prediction, which may help prevent exploding or vanishing gradients and, pictorially, smooths the output probability distribution.

.. _Dense: Dense.rst
.. _EpyNN model: EpyNN_Model.rst
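To make the scheduling hyperparameters above concrete, the sketch below computes one learning rate per epoch from ``learning_rate``, ``decay_k``, ``cycle_epochs`` and ``cycle_descent``, using an exponential decay restarted at each cycle. It illustrates the principle only; the function name is hypothetical, and the schedulers actually shipped with EpyNN are defined in :py:mod:`epynn.commons.schedule`.

.. code-block:: python

    import numpy as np

    def exp_decay_schedule(epochs, learning_rate, decay_k, cycle_epochs, cycle_descent):
        """Return a list with one learning rate per epoch (illustrative sketch)."""
        lrs = []
        for epoch in range(epochs):
            cycle, step = divmod(epoch, cycle_epochs)
            # Each new cycle restarts from an initial rate diminished by cycle_descent.
            lr_start = learning_rate * (cycle_descent ** cycle)
            # Within a cycle, the rate decays exponentially at rate decay_k.
            lrs.append(lr_start * np.exp(-decay_k * step))
        return lrs

    lrs = exp_decay_schedule(epochs=100, learning_rate=0.1,
                             decay_k=0.05, cycle_epochs=25, cycle_descent=0.5)

    print(lrs[0])    # 0.1  - start of the first cycle.
    print(lrs[25])   # 0.05 - start of the second cycle, diminished by cycle_descent.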