Fully Connected (Dense)
===============================

Source files in ``EpyNN/epynn/dense/``.

See `Appendix - Notations`_ for mathematical conventions.

.. _Appendix - Notations: glossary.html#notations

Layer architecture
------------------------------

.. image:: _static/Dense/Dense-01.svg
   :alt: Dense

A fully-connected or *Dense* layer is an object containing a number of *units*, provided with functions for parameter *initialization* and non-linear *activation* of inputs.

.. autoclass:: epynn.dense.models.Dense
   :show-inheritance:

Shapes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.dense.models.Dense.compute_shapes

.. literalinclude:: ./../epynn/dense/parameters.py
   :pyobject: dense_compute_shapes
   :language: python

Within a *Dense* layer, shapes of interest include:

* Input *X* of shape *(m, n)*, with *m* the number of samples and *n* the number of features per sample.
* Weight *W* of shape *(n, u)*, with *n* the number of features per sample and *u* the number of units in the current layer *k*.
* Bias *b* of shape *(1, u)*, with *u* the number of units in the layer.

Note that:

* The shapes of parameters *W* and *b* are independent of the number of samples *m*.
* The number of features *n* per sample may also be described as the number of units in the previous layer *k-1*, although this definition is less general.

.. image:: _static/Dense/Dense1-01.svg

Forward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.dense.models.Dense.forward

.. literalinclude:: ./../epynn/dense/forward.py
   :pyobject: dense_forward
   :language: python

The forward propagation function in a *Dense* layer *k* includes:

* (1): Input *X* in the current layer *k* is equal to the output *A* of the previous layer *k-1*.
* (2): *Z* is computed as the dot product of *X* and *W*, to which the bias *b* is added.
* (3): Output *A* is computed by applying a non-linear *activation* function to *Z*.

Note that:

* *Z* may be referred to as the *(biased) weighted sum of inputs by parameters* or as the *linear activation product*.
* *A* may be referred to as the *non-linear activation product* or simply as the output of *Dense* layer *k*.

.. image:: _static/Dense/Dense2-01.svg

.. math::

    \begin{alignat*}{2}
    & x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\
    \\
    & z^{k}_{mu} &&= x^{k}_{mn} \cdot W^{k}_{nu} \\
    &            &&+ b^{k}_{u} \tag{2} \\
    \\
    & a^{k}_{mu} &&= a_{act}(z^{k}_{mu}) \tag{3}
    \end{alignat*}

Backward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.dense.models.Dense.backward

.. literalinclude:: ./../epynn/dense/backward.py
   :pyobject: dense_backward
   :language: python

The backward propagation function in a *Dense* layer *k* includes:

* (1): *dA* is the gradient of the loss with respect to the output *A* of forward propagation for the current layer *k*. It is equal to the gradient of the loss with respect to the input of forward propagation for the next layer *k+1*.
* (2): *dZ* is the gradient of the loss with respect to *Z*. It is computed by element-wise multiplication of *dA* with the derivative of the non-linear *activation* function evaluated at *Z*.
* (3): The gradient of the loss *dX* with respect to the input of forward propagation *X* for the current layer *k* is computed as the dot product of *dZ* and the transpose of *W*.

Note that:

* The expression *gradient of the loss with respect to* is equivalent to *partial derivative of the loss with respect to*.
* The variable *dA* is often referred to as the *error term* for layer *k+1*, and *dX* as the error term for layer *k*.
* In contrast to the forward pass, parameters here weight *dZ*, which has shape *(m, u)*. Therefore, the transpose of *W*, with shape *(u, n)*, is used to compute the dot product.

.. image:: _static/Dense/Dense3-01.svg

.. math::

    \begin{alignat*}{2}
    & \delta^{\kp}_{mu} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mu}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mu}} \tag{1} \\
    \\
    & \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} &&= \delta^{\kp}_{mu} \\
    & &&* a_{act}'(z^{k}_{mu}) \tag{2} \\
    \\
    & \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \cdot W^{k~{\intercal}}_{nu} \tag{3}
    \end{alignat*}

Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.dense.models.Dense.compute_gradients

.. literalinclude:: ./../epynn/dense/parameters.py
   :pyobject: dense_compute_gradients
   :language: python

The function to compute parameter gradients in a *Dense* layer *k* includes:

* (1.1): *dW* is the gradient of the loss with respect to *W*. It is computed as the dot product of the transpose of *X* and *dZ*.
* (1.2): *db* is the gradient of the loss with respect to *b*. It is computed by summing *dZ* along the axis corresponding to the number of samples *m*.

Note that:

* The transpose of *X*, with shape *(n, m)*, is used for the dot product with *dZ* of shape *(m, u)*.

.. math::

    \begin{alignat*}{2}
    & \frac{\partial \mathcal{L}}{\partial W^{k}_{nu}} &&= x^{k~{\intercal}}_{mn} \cdot \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.1} \\
    \\
    & \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{m = 1}^M \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.2}
    \end{alignat*}
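Taken together, the *Shapes*, *Forward*, *Backward* and *Gradients* steps above can be reproduced in a few lines of plain NumPy. The sketch below is illustrative only and independent from the EpyNN source: the sigmoid activation, the random inputs and all variable names are arbitrary choices made for this example.

.. code-block:: python

    import numpy as np


    def sigmoid(z):
        """Example non-linear activation a_act."""
        return 1 / (1 + np.exp(-z))


    def sigmoid_prime(z):
        """Derivative a_act' of the activation."""
        s = sigmoid(z)
        return s * (1 - s)


    m, n, u = 4, 3, 2                       # m samples, n features, u units

    rng = np.random.default_rng(0)

    X = rng.standard_normal((m, n))         # Input X of shape (m, n)
    W = rng.standard_normal((n, u))         # Weight W of shape (n, u)
    b = np.zeros((1, u))                    # Bias b of shape (1, u)

    # Forward: (2) linear activation product, (3) non-linear activation.
    Z = np.dot(X, W) + b                    # (m, u)
    A = sigmoid(Z)                          # (m, u)

    # Backward: dA would come from layer k+1; random here for illustration.
    dA = rng.standard_normal((m, u))

    dZ = dA * sigmoid_prime(Z)              # (2): element-wise product, (m, u)
    dX = np.dot(dZ, W.T)                    # (3): error term for layer k, (m, n)

    # Gradients: (1.1) and (1.2).
    dW = np.dot(X.T, dZ)                    # (n, u), same shape as W
    db = np.sum(dZ, axis=0, keepdims=True)  # (1, u), same shape as b

    assert dX.shape == X.shape and dW.shape == W.shape and db.shape == b.shape

Note how the parameter gradients *dW* and *db* always match the shapes of *W* and *b*, regardless of the number of samples *m*, as stated in the *Shapes* subsection.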
Live examples
------------------------------

The Dense layer is used as an output layer in each of the `Network training examples`_ provided with EpyNN.

Pure Feed-Forward Neural Networks within these examples can be accessed directly from:

* `Dummy Boolean - Basics with Perceptron`_
* `Dummy string - Feed-Forward (FF)`_
* `Protein Modification - Feed-Forward (FF)`_
* `Dummy time - Feed-Forward (FF)`_
* `Author and music - Feed-Forward (FF)`_
* `Dummy image - Feed-Forward (FF)`_
* `MNIST Database - Feed-Forward (FF)`_

.. _Network training examples: run_examples.html
.. _Dummy Boolean - Basics with Perceptron: epynnlive/dummy_boolean/train.html#Perceptron---Single-layer-Neural-Network
.. _Dummy string - Feed-Forward (FF): epynnlive/dummy_string/train.html#Feed-Forward-(FF)
.. _Protein Modification - Feed-Forward (FF): epynnlive/ptm_protein/train.html#Feed-Forward-(FF)
.. _Dummy time - Feed-Forward (FF): epynnlive/dummy_time/train.html#Feed-Forward-(FF)
.. _Author and music - Feed-Forward (FF): epynnlive/author_music/train.html#Feed-Forward-(FF)
.. _Dummy image - Feed-Forward (FF): epynnlive/dummy_image/train.html#Feed-Forward-(FF)
.. _MNIST Database - Feed-Forward (FF): epynnlive/captcha_mnist/train.html#Feed-Forward-(FF)
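For a dependency-free complement to the live examples, the sketch below trains a single *Dense* unit by gradient descent on a toy logical OR problem, using the forward pass and parameter gradients derived above. This is a didactic sketch, not EpyNN code: the dataset, learning rate, number of epochs and the use of a sigmoid output with binary cross-entropy are assumptions made for the illustration.

.. code-block:: python

    import numpy as np


    def sigmoid(z):
        return 1 / (1 + np.exp(-z))


    # Toy binary task: m=4 samples, n=2 features, u=1 unit (logical OR).
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    Y = np.array([[0.], [1.], [1.], [1.]])

    rng = np.random.default_rng(1)
    W = rng.standard_normal((2, 1)) * 0.1   # (n, u)
    b = np.zeros((1, 1))                    # (1, u)

    lr = 0.1                                # learning rate

    for epoch in range(1000):
        # Forward pass, equations (1)-(3).
        Z = np.dot(X, W) + b
        A = sigmoid(Z)

        # For a sigmoid output with binary cross-entropy loss,
        # dL/dZ simplifies to (A - Y).
        dZ = A - Y

        # Parameter gradients, equations (1.1) and (1.2).
        dW = np.dot(X.T, dZ)
        db = np.sum(dZ, axis=0, keepdims=True)

        # Gradient-descent update.
        W -= lr * dW
        b -= lr * db

    # Predictions approach [[0.], [1.], [1.], [1.]].
    print(np.round(sigmoid(np.dot(X, W) + b), 2))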