Fully Connected (Dense)
===============================
Source files in ``EpyNN/epynn/dense/``.
See `Appendix - Notations`_ for mathematical conventions.
.. _Appendix - Notations: glossary.html#notations
Layer architecture
------------------------------
.. image:: _static/Dense/Dense-01.svg
:alt: Dense
A fully-connected or *Dense* layer is an object containing a number of *units*, provided with functions for parameter *initialization* and non-linear *activation* of inputs.
.. autoclass:: epynn.dense.models.Dense
:show-inheritance:
Shapes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.dense.models.Dense.compute_shapes
.. literalinclude:: ./../epynn/dense/parameters.py
:pyobject: dense_compute_shapes
:language: python
Within a *Dense* layer, shapes of interest include:
* Input *X* of shape *(m, n)* with *m* equal to the number of samples and *n* the number of features per sample.
* Weight *W* of shape *(n, u)* with *n* the number of features per sample and *u* the number of units in the current layer *k*.
* Bias *b* of shape *(1, u)* with *u* the number of units in the layer.
Note that:
* The shapes of parameters *W* and *b* are independent of the number of samples *m*.
* In this context, the number of features *n* per sample can also be read as the number of units in the previous layer *k-1*, although this phrasing is less general.
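These shape conventions can be illustrated with a minimal NumPy sketch; the array sizes below are hypothetical examples, not values taken from EpyNN:

```python
import numpy as np

m, n, u = 8, 4, 3  # samples, features per sample, units (hypothetical sizes)

X = np.random.randn(m, n)   # input: one row per sample
W = np.random.randn(n, u)   # weight: shape independent of m
b = np.zeros((1, u))        # bias: one value per unit

# Shapes match the conventions listed above
print(X.shape, W.shape, b.shape)
```

Because *W* and *b* do not depend on *m*, the same parameters apply to any batch size.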
.. image:: _static/Dense/Dense1-01.svg
Forward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.dense.models.Dense.forward
.. literalinclude:: ./../epynn/dense/forward.py
:pyobject: dense_forward
:language: python
The forward propagation function in a *Dense* layer *k* includes:
* (1): Input *X* in current layer *k* is equal to the output *A* of previous layer *k-1*.
* (2): *Z* is computed as the dot product of *X* and *W*, to which the bias *b* is added.
* (3): Output *A* is computed by applying a non-linear *activation* function on *Z*.
Note that:
* *Z* may be referred to as the *(biased) weighted sum of inputs by parameters* or as the *linear activation product*.
* *A* may be referred to as the *non-linear activation product* or simply the output of *Dense* layer *k*.
.. image:: _static/Dense/Dense2-01.svg
.. math::
\begin{alignat*}{2}
& x^{k}_{mn} &&= a^{\km}_{mn} \tag{1} \\
\\
& z^{k}_{mu} &&= x^{k}_{mn} \cdot W^{k}_{nu} \\
& &&+ b^{k}_{u} \tag{2} \\
& a^{k}_{mu} &&= a_{act}(z^{k}_{mu}) \tag{3}
\end{alignat*}
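Equations (1)-(3) can be sketched in plain NumPy; this is a simplified illustration, not EpyNN's implementation, and the activation function and array sizes are arbitrary choices:

```python
import numpy as np

def dense_forward_sketch(X, W, b, activate=np.tanh):
    """Sketch of the Dense forward pass: equations (2) and (3)."""
    Z = np.dot(X, W) + b   # (2) linear activation product, shape (m, u)
    A = activate(Z)        # (3) non-linear activation product, shape (m, u)
    return A, Z

# Toy input: m=2 samples, n=4 features, u=3 units
X = np.ones((2, 4))
W = np.full((4, 3), 0.1)
b = np.zeros((1, 3))
A, Z = dense_forward_sketch(X, W, b)
# Every entry of Z is 4 * 0.1 = 0.4, so A = tanh(0.4) everywhere
```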
Backward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.dense.models.Dense.backward
.. literalinclude:: ./../epynn/dense/backward.py
:pyobject: dense_backward
:language: python
The backward propagation function in a *Dense* layer *k* includes:
* (1): *dA* is the gradient of the loss with respect to the output of forward propagation *A* for current layer *k*. It is equal to the gradient of the loss with respect to the input of forward propagation for the next layer *k+1*.
* (2): *dZ* is the gradient of the loss with respect to *Z*. It is computed by applying element-wise multiplication between *dA* and the derivative of the non-linear *activation* function applied on *Z*.
* (3): The gradient of the loss *dX* with respect to the input of forward propagation *X* for current layer *k* is computed by applying a dot product operation between *dZ* and the transpose of *W*.
Note that:
* The expression *gradient of the loss with respect to* is equivalent to *partial derivative of the loss with respect to*.
* The variable *dA* is often referred to as the *error term* for layer *k+1*, and *dX* as the error term for layer *k*.
* In contrast to the forward pass, the parameters here weight *dZ*, which has shape *(m, u)*. Therefore, we use the transpose of *W*, with shape *(u, n)*, to compute the dot product.
.. image:: _static/Dense/Dense3-01.svg
.. math::
\begin{alignat*}{2}
& \delta^{\kp}_{mu} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mu}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mu}} \tag{1} \\
\\
& \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} &&= \delta^{\kp}_{mu} \\
& &&* a_{act}'(z^{k}_{mu}) \tag{2} \\
& \delta^{k}_{mn} &&= \frac{\partial \mathcal{L}}{\partial x^{k}_{mn}} = \frac{\partial \mathcal{L}}{\partial a^{\km}_{mn}} = \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \cdot W^{k~{\intercal}}_{nu} \tag{3} \\
\end{alignat*}
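The backward equations (2) and (3) can likewise be sketched in NumPy; this is an illustrative sketch, not EpyNN's implementation, and the tanh derivative and toy shapes are assumptions:

```python
import numpy as np

def dense_backward_sketch(dA, Z, W, activate_prime):
    """Sketch of the Dense backward pass: equations (2) and (3)."""
    dZ = dA * activate_prime(Z)   # (2) element-wise product, shape (m, u)
    dX = np.dot(dZ, W.T)          # (3) error term for layer k, shape (m, n)
    return dX, dZ

tanh_prime = lambda z: 1.0 - np.tanh(z) ** 2  # derivative of tanh activation

dA = np.ones((2, 3))     # error term received from layer k+1
Z = np.zeros((2, 3))     # linear product saved during the forward pass
W = np.full((4, 3), 0.1)
dX, dZ = dense_backward_sketch(dA, Z, W, tanh_prime)
# tanh'(0) = 1, so dZ is all ones; each entry of dX is 3 * 0.1 = 0.3
```

Note how *dX* recovers the input shape *(m, n)*, which is why the transpose of *W* is required.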
Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.dense.models.Dense.compute_gradients
.. literalinclude:: ./../epynn/dense/parameters.py
:pyobject: dense_compute_gradients
:language: python
The function to compute parameter gradients in a *Dense* layer *k* includes:
* (1.1): *dW* is the gradient of the loss with respect to *W*. It is computed by applying a dot product operation between the transpose of *X* and *dZ*.
* (1.2): *db* is the gradient of the loss with respect to *b*. It is computed by summing *dZ* along the axis corresponding to the number of samples *m*.
Note that:
* We use the transpose of *X* with shape *(n, m)* for the dot product operation with *dZ* of shape *(m, u)*.
.. math::
\begin{alignat*}{2}
& \frac{\partial \mathcal{L}}{\partial W^{k}_{nu}} &&= x^{k~{\intercal}}_{mn} \cdot \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.1} \\
& \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{m = 1}^M \frac{\partial \mathcal{L}}{\partial z^{k}_{mu}} \tag{1.2}
\end{alignat*}
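Equations (1.1) and (1.2) can be condensed into a short NumPy sketch; again a simplified illustration rather than EpyNN's implementation, with arbitrary toy values:

```python
import numpy as np

def dense_compute_gradients_sketch(X, dZ):
    """Sketch of the Dense parameter gradients: equations (1.1) and (1.2)."""
    dW = np.dot(X.T, dZ)                # (1.1) shape (n, u), same as W
    db = dZ.sum(axis=0, keepdims=True)  # (1.2) shape (1, u), same as b
    return dW, db

# Toy values: m=2 samples, n=4 features, u=3 units
X = np.ones((2, 4))
dZ = np.full((2, 3), 0.5)
dW, db = dense_compute_gradients_sketch(X, dZ)
# Each entry of dW and db sums 0.5 over m=2 samples, giving 1.0
```

The `keepdims=True` keeps *db* with shape *(1, u)*, matching the bias shape described in the Shapes section.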
Live examples
------------------------------
The Dense layer is used as an output layer in every one of the `Network training examples`_ provided with EpyNN.
Pure Feed-Forward Neural Networks within these examples can be accessed directly from:
* `Dummy Boolean - Basics with Perceptron`_
* `Dummy string - Feed-Forward (FF)`_
* `Protein Modification - Feed-Forward (FF)`_
* `Dummy time - Feed-Forward (FF)`_
* `Author and music - Feed-Forward (FF)`_
* `Dummy image - Feed-Forward (FF)`_
* `MNIST Database - Feed-Forward (FF)`_
.. _Network training examples: run_examples.html
.. _Dummy Boolean - Basics with Perceptron: epynnlive/dummy_boolean/train.html#Perceptron---Single-layer-Neural-Network
.. _Dummy string - Feed-Forward (FF): epynnlive/dummy_string/train.html#Feed-Forward-(FF)
.. _Protein Modification - Feed-Forward (FF): epynnlive/ptm_protein/train.html#Feed-Forward-(FF)
.. _Dummy time - Feed-Forward (FF): epynnlive/dummy_time/train.html#Feed-Forward-(FF)
.. _Author and music - Feed-Forward (FF): epynnlive/author_music/train.html#Feed-Forward-(FF)
.. _Dummy image - Feed-Forward (FF): epynnlive/dummy_image/train.html#Feed-Forward-(FF)
.. _MNIST Database - Feed-Forward (FF): epynnlive/captcha_mnist/train.html#Feed-Forward-(FF)