.. EpyNN documentation master file, created by
   sphinx-quickstart on Tue Jul 6 18:46:11 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. toctree::

Convolution 2D (CNN)
===================================

Source files in ``EpyNN/epynn/convolution/``.

See `Appendix - Notations`_ for mathematical conventions.

.. _Appendix - Notations: glossary.html#notations

Layer architecture
-----------------------------------

.. image:: _static/Convolution/convolution0-01.svg
   :alt: Convolution

A *Convolution* layer is an object containing a number of *units* - often referred to as *filters* - and provided with functions for parameters *initialization* and non-linear *activation* of inputs.

Importantly, the *Convolution* layer takes *filter_size* and *strides* arguments upon instantiation. They define, respectively, the dimensions of the window used for the convolution operation and the steps by which the window is moved between each operation.

.. autoclass:: epynn.convolution.models.Convolution
   :show-inheritance:

Shapes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.compute_shapes

.. literalinclude:: ./../epynn/convolution/parameters.py
   :pyobject: convolution_compute_shapes
   :language: python

Within a *Convolution* layer, shapes of interest include:

* Input *X* of shape *(m, h, w, d)* with *m* the number of samples, *h* the height of features, *w* the width of features and *d* the depth of features.
* Output height *oh*, defined from the ratio of the difference between *h* and filter height *fh* over the stride height *sh*. The ratio is rounded down to the nearest integer, to which one is added.
* Output width *ow*, defined from the ratio of the difference between *w* and filter width *fw* over the stride width *sw*. The ratio is rounded down to the nearest integer, to which one is added.
* Weight *W* of shape *(fh, fw, d, u)* with *fh* the filter height, *fw* the filter width, *d* the depth of features and *u* the number of units - or filters - in the current layer *k*.
* Bias *b* of shape *(1, u)* with *u* the number of units - or filters - in the layer.

Note that:

* The input of a *Convolution* layer is often an image or a feature map. Therefore, we would more generally say that *h* is the image height, *w* the image width and *d* the image depth.
* The shapes of parameters *W* and *b* are independent of the number of samples *m*.

.. image:: _static/Convolution/convolution1-01.svg

Forward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.forward

.. literalinclude:: ./../epynn/convolution/forward.py
   :pyobject: convolution_forward

.. image:: _static/Convolution/convolution2-01.svg

The forward propagation function in a *Convolution* layer *k* includes:

* (1): Input *X* in current layer *k* with shape *(m, h, w, d)* is equal to the output *A* of previous layer *k-1*.
* (2): *Xb* is an array of blocks with shape *(oh, ow, m, fh, fw, d)* made by iterative slicing of *X* with respect to *fh*, *fw* and *sh*, *sw*.
* (3): Given *Xb* with shape *(oh, ow, m, fh, fw, d)*, the operation moves axis 2 to position 0, yielding *Xb* with shape *(m, oh, ow, fh, fw, d)*.
* (4): The operation adds a new axis in position 6 of the array *Xb*, yielding a shape of *(m, oh, ow, fh, fw, d, 1)*, ready to broadcast against the units dimension *u* of *W*.
* (5): The linear activation block product *Zb* is computed by multiplying the input blocks *Xb* by the weight *W* with shape *(fh, fw, d, u)*; broadcasting yields *Zb* with shape *(m, oh, ow, fh, fw, d, u)*.
* (6): The linear activation product *Z* with shape *(m, oh, ow, u)* is computed by summing the block product *Zb* with shape *(m, oh, ow, fh, fw, d, u)* over the block dimensions, on axes 5, 4 and 3.
* (7): Computation of *Z* is completed by the addition of the bias *b*.
* (8): Output *A* is computed by applying a non-linear *activation* function on *Z*.
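As an illustration, the steps above can be sketched with plain NumPy. This is a minimal sketch, not EpyNN's actual ``convolution_forward``: the function name and the default strides are assumptions made here, and the non-linear activation of step (8) is left to the caller, so the sketch returns *Z*.

```python
import numpy as np

def convolution_forward_sketch(X, W, b, strides=(1, 1)):
    """Sketch of the stride-groups forward pass.

    X : (m, h, w, d) input, W : (fh, fw, d, u) filters, b : (1, u) bias.
    Returns the linear activation Z of shape (m, oh, ow, u).
    """
    m, h, w, d = X.shape
    fh, fw, _, u = W.shape
    sh, sw = strides

    # Output dimensions: floor of the ratio, plus one (see Shapes above).
    oh = (h - fh) // sh + 1
    ow = (w - fw) // sw + 1

    # (2) Build blocks by iterative slicing -> (oh, ow, m, fh, fw, d)
    Xb = np.array([[X[:, i*sh:i*sh+fh, j*sw:j*sw+fw, :]
                    for j in range(ow)]
                   for i in range(oh)])

    # (3) Move the samples axis first -> (m, oh, ow, fh, fw, d)
    Xb = np.moveaxis(Xb, 2, 0)

    # (4) Trailing axis to broadcast against the units axis of W
    Xb = np.expand_dims(Xb, axis=6)       # (m, oh, ow, fh, fw, d, 1)

    # (5) Element-wise product broadcasts to (m, oh, ow, fh, fw, d, u)
    Zb = Xb * W

    # (6) Sum over the block dimensions fh, fw and d
    Z = Zb.sum(axis=(5, 4, 3))            # (m, oh, ow, u)

    # (7) Add the bias
    return Z + b
```

With all-ones input of shape *(1, 4, 4, 1)* and a single all-ones *2x2* filter, each output value is the sum of a *2x2* window, i.e. 4, and *Z* has shape *(1, 3, 3, 1)*.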
Note that:

* This implementation is not the naive implementation of a *Convolution* layer, in which the process is fully iterative, with one nested loop level per input dimension (four levels in total). Because the naive implementation is prohibitively slow, the version depicted here is a stride-groups optimization that takes advantage of NumPy arrays.

.. math::

    \begin{alignat*}{2}
        & x^{k}_{mhwd} &&= a^{\km}_{mhwd} \tag{1} \\
        & \_\_Xb^{k} &&= blocks(X^{k}) \tag{2} \\
        & \_Xb^{k} &&= moveaxis(\_\_Xb^{k}) \tag{3} \\
        & Xb^{k} &&= expand\_dims(\_Xb^{k}) \tag{4} \\
        & Zb^{k} &&= Xb^{k} * W^{k}_{f_{h}f_{w}du} \tag{5} \\
        & \_z^{k}_{mo_{h}o_{w}u} &&= \sum_{f_{h} = 1}^{Fh} \sum_{f_{w} = 1}^{Fw} \sum_{d = 1}^{D} Zb^{k}_{mo_{h}o_{w}f_{h}f_{w}du} \tag{6} \\
        & z^{k}_{mo_{h}o_{w}u} &&= \_z^{k}_{mo_{h}o_{w}u} + b^{k}_{111u} \tag{7} \\
        & a^{k}_{mo_{h}o_{w}u} &&= a_{act}(z^{k}_{mo_{h}o_{w}u}) \tag{8}
    \end{alignat*}

.. math::

    \begin{alignat*}{2}
        & where~blocks~is~defined~as: \\
        &~~~~~~~~~~~~~~~~~~~~~~~~blocks:\mathcal{M}_{M,H,W,D}(\mathbb{R}) &&\to \mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X = \mathop{(x_{mhwd})}_{\substack{1 \le m \le M \\ 1 \le h \le H \\ 1 \le w \le W \\ 1 \le d \le D}} &&\to Y = \mathop{(y_{o_{h}o_{w}mf_hf_wd})}_{\substack{1 \le o_h \le Oh \\ 1 \le o_w \le Ow \\ 1 \le m \le M \\ 1 \le f_h \le Fh \\ 1 \le f_w \le Fw \\ 1 \le d \le D}} \\
        &~~~~~~~~~~~~~~~~~~~~~~~with~Fh, Fw, Sh, Sw \in &&~\mathbb{N}^* \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{h} \in &&~\{1,..,H-Fh~|~h \pmod{Sh} = 0\} \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{w} \in &&~\{1,..,W-Fw~|~w \pmod{Sw} = 0\} \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Y = && X[:, h:h+Fh, w:w+Fw, :] \\
        \\
        & where~moveaxis~is~defined~as: \\
        &~~~~~~~~moveaxis:\mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y \\
        \\
        & 
where~expand\_dims~is~defined~as: \\
        &~~~~expand\_dims:\mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,Fh,Fw,D,1}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y
    \end{alignat*}

Backward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.backward

.. literalinclude:: ./../epynn/convolution/backward.py
   :pyobject: convolution_backward

.. image:: _static/Convolution/convolution3-01.svg

The backward propagation function in a *Convolution* layer *k* includes:

* (1): *dA* with shape *(m, oh, ow, u)* is the gradient of the loss with respect to the output of forward propagation *A* for current layer *k*. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer *k+1*.
* (2): *dZ* is the gradient of the loss with respect to *Z*. It is computed by element-wise multiplication between *dA* and the derivative of the non-linear *activation* function applied on *Z*.
* (3): Three new axes are added before the last axis of *dZ* with shape *(m, oh, ow, u)*, yielding *dZb* with shape *(m, oh, ow, 1, 1, 1, u)*, which reintroduces the dimensions corresponding to *Xb* with shape *(m, oh, ow, fh, fw, d, 1)*.
* (4): The gradient of the loss *dX* with respect to the input of forward propagation *X* for current layer *k* is initialized as an array of zeros with the same shape as *X*.
* (5hw): For each row *h* with respect to *oh* and each column *w* with respect to *ow*, the gradient of the loss *dXb* for one block, with respect to the corresponding block of the input of forward propagation, is equal to the product between the corresponding block in *dZb* and the weight *W*.
* (6hw): For each row *h* with respect to *oh* and each column *w* with respect to *ow*, the corresponding window coordinates in *X* are computed: *hs*, *he* for height start and height end; *ws*, *we* for width start and width end. The gradient of the loss *dX* with respect to the current window in *X* is incremented by the sum of *dXb* over the last axis (axis 4), which corresponds to the units dimension *u*.
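The backward steps above can likewise be sketched with plain NumPy. This is a minimal sketch rather than EpyNN's actual ``convolution_backward``: the function name and default strides are assumptions, and the activation derivative of step (2) is assumed to have been applied already, so the sketch starts from *dZ*.

```python
import numpy as np

def convolution_backward_sketch(dZ, X, W, strides=(1, 1)):
    """Sketch of the backward pass (steps 3 through 6hw).

    dZ : (m, oh, ow, u) gradient w.r.t. Z, X : (m, h, w, d) forward input,
    W : (fh, fw, d, u) filters. Returns dX with the same shape as X.
    """
    fh, fw, _, u = W.shape
    sh, sw = strides
    m, oh, ow, _ = dZ.shape

    # (3) Reintroduce the block dimensions -> (m, oh, ow, 1, 1, 1, u)
    dZb = dZ[:, :, :, np.newaxis, np.newaxis, np.newaxis, :]

    # (4) Zero-initialized gradient w.r.t. the input
    dX = np.zeros_like(X, dtype=float)

    for h in range(oh):
        hs, he = h * sh, h * sh + fh      # height start / end
        for w in range(ow):
            ws, we = w * sw, w * sw + fw  # width start / end
            # (5hw) Per-block gradient broadcasts to (m, fh, fw, d, u)
            dXb = dZb[:, h, w] * W
            # (6hw) Windows may overlap, hence '+=' rather than '='
            dX[:, hs:he, ws:we, :] += np.sum(dXb, axis=4)

    return dX
```

With all-ones *dZ* and a single all-ones *2x2* filter on a *3x3* input, each entry of *dX* counts the number of overlapping windows covering that position: 1 in the corners, 2 on the edges and 4 in the center.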
Note that:

* A window represents an ensemble of coordinates, and one value with given coordinates may be part of more than one window. This is why the operation *dX[:, hs:he, ws:we, :] += np.sum(dXb, axis=4)* is used instead of *dX[:, hs:he, ws:we, :] = np.sum(dXb, axis=4)*.

.. math::

    \begin{alignat*}{2}
        & \delta^{\kp}_{mo_{h}o_{w}u} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mo_{h}o_{w}u}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mo_{h}o_{w}u}} \tag{1} \\
        & \frac{\partial \mathcal{L}}{\partial z^{k}_{mo_{h}o_{w}u}} &&= \delta^{\kp}_{mo_{h}o_{w}u} * a_{act}'(z^{k}_{mo_{h}o_{w}u}) \tag{2} \\
        & \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} &&= expand\_dims(\frac{\partial \mathcal{L}}{\partial z^{k}_{mo_{h}o_{w}u}}) \tag{3} \\
        & \frac{\partial \mathcal{L}}{\partial X^{k}} &&= [\frac{\partial \mathcal{L}}{\partial x^{k}_{mhwd}}] \in \{0\}^{M \times H \times W \times D} \tag{4} \\
        \\
        & \frac{\partial \mathcal{L}}{\partial Xb^{k}} &&= \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} * W^{k}_{f_{h}f_{w}du} \tag{5hw} \\
        \\
        & \Delta\frac{\partial \mathcal{L}}{\partial X^{k}_{mf_hf_wd}} &&= \sum_{u = 1}^{U} \frac{\partial \mathcal{L}}{\partial Xb^{k}_{mf_hf_wdu}} \tag{6hw}
    \end{alignat*}

.. math::

    \begin{alignat*}{2}
        & where~expand\_dims~is~defined~as: \\
        &~~~~expand\_dims:\mathcal{M}_{M,Oh,Ow,U}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,1,1,1,U}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y
    \end{alignat*}

Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.compute_gradients

.. literalinclude:: ./../epynn/convolution/parameters.py
   :pyobject: convolution_compute_gradients

The function to compute parameter gradients in a *Convolution* layer *k* includes:

* (1.1): *dW* is the gradient of the loss with respect to *W*. It is computed by element-wise multiplication between *Xb* and *dZb*, followed by summing over axes 2, 1 and 0.
  Axes correspond to the output width *ow*, output height *oh* and number of samples *m* dimensions, respectively.
* (1.2): *db* is the gradient of the loss with respect to *b*. It is computed by summing *dZb* along axes 2, 1 and 0, followed by a squeeze which removes the remaining axes of length one.

.. math::

    \begin{alignat*}{2}
        & \frac{\partial \mathcal{L}}{\partial W^{k}_{f_{h}f_{w}du}} &&= \sum_{o_{w} = 1}^{Ow} \sum_{o_{h} = 1}^{Oh} \sum_{m = 1}^{M} xb^{k}_{mo_{h}o_{w}f_{h}f_{w}du} * \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} \tag{1.1} \\
        & \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{o_{w} = 1}^{Ow} \sum_{o_{h} = 1}^{Oh} \sum_{m = 1}^{M} \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} \tag{1.2}
    \end{alignat*}

Live examples
-----------------------------------

* `Dummy image - Convolutional Neural Network (CNN)`_
* `MNIST Database - Convolutional Neural Network (CNN)`_

You may also like to browse all `Network training examples`_ provided with EpyNN.

.. _Network training examples: run_examples.html
.. _Dummy image - Convolutional Neural Network (CNN): epynnlive/dummy_image/train.html#Convolutional-Neural-Network-(CNN)
.. _MNIST Database - Convolutional Neural Network (CNN): epynnlive/captcha_mnist/train.html#Convolutional-Neural-Network-(CNN)
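To close the walkthrough, the parameter gradients described in the Gradients section can also be sketched with plain NumPy. This is a minimal sketch under assumed inputs, not EpyNN's actual ``convolution_compute_gradients``; the function name is hypothetical.

```python
import numpy as np

def convolution_compute_gradients_sketch(Xb, dZb):
    """Sketch of the parameter gradients (steps 1.1 and 1.2).

    Xb : (m, oh, ow, fh, fw, d, 1) input blocks from the forward pass,
    dZb : (m, oh, ow, 1, 1, 1, u) gradient w.r.t. Z with block axes restored.
    Returns dW of shape (fh, fw, d, u) and db of shape (u,).
    """
    # (1.1) The element-wise product broadcasts to (m, oh, ow, fh, fw, d, u);
    # summing over ow (axis 2), oh (axis 1) and m (axis 0) leaves (fh, fw, d, u).
    dW = np.sum(Xb * dZb, axis=(2, 1, 0))

    # (1.2) Sum dZb over the same axes, then squeeze the length-one block axes.
    db = np.squeeze(np.sum(dZb, axis=(2, 1, 0)))

    return dW, db
```

Note that both gradients sum over the samples dimension *m*, so they accumulate contributions from the whole batch, consistent with the parameter shapes being independent of *m*.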