Convolution 2D (CNN)
===================================
Source files in ``EpyNN/epynn/convolution/``.
See `Appendix - Notations`_ for mathematical conventions.
.. _Appendix - Notations: glossary.html#notations
Layer architecture
-----------------------------------
.. image:: _static/Convolution/convolution0-01.svg
:alt: Convolution
A *Convolution* layer is an object containing a number of *units* - often referred to as *filters* - and provided with functions for parameter *initialization* and non-linear *activation* of inputs. Importantly, the *Convolution* layer takes *filter_size* and *strides* arguments upon instantiation. These define, respectively, the dimensions of the window used for the convolution operation and the steps by which the window is moved between operations.
.. autoclass:: epynn.convolution.models.Convolution
:show-inheritance:
Shapes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.convolution.models.Convolution.compute_shapes
.. literalinclude:: ./../epynn/convolution/parameters.py
:pyobject: convolution_compute_shapes
:language: python
Within a *Convolution* layer, shapes of interest include:
* Input *X* of shape *(m, h, w, d)* with *m* equal to the number of samples, *h* the height of features, *w* the width of features and *d* the depth of features.
* Output height *oh* defined from the difference between *h* and filter height *fh* divided by the stride height *sh*: the ratio is rounded down to the nearest integer, then incremented by one, i.e. *oh = floor((h - fh) / sh) + 1*.
* Output width *ow* defined likewise from the difference between *w* and filter width *fw* divided by the stride width *sw*: *ow = floor((w - fw) / sw) + 1*.
* Weight *W* of shape *(fh, fw, d, u)* with *fh* the filter height, *fw* the filter width, *d* the depth of features and *u* the number of units - or filters - in the current layer *k*.
* Bias *b* of shape *(1, u)* with *u* the number of units - or filters - in the layer.
Note that:
* The input of a *Convolution* layer is often an image or a map of features. Therefore, we would more generally say that *h* is the image height, *w* the image width and *d* the image depth.
* The shapes of parameters *W* and *b* are independent of the number of samples *m*.
.. image:: _static/Convolution/convolution1-01.svg
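As a quick sanity check, the output dimensions *oh* and *ow* can be computed as sketched below; the input and filter dimensions here are hypothetical, not EpyNN defaults.

```python
import math

# Hypothetical dimensions: 28x28 input, 3x3 filter, stride 2
h, w = 28, 28      # input height and width
fh, fw = 3, 3      # filter height and width
sh, sw = 2, 2      # stride height and width

# Ratio rounded down to the nearest integer, plus one
oh = math.floor((h - fh) / sh) + 1
ow = math.floor((w - fw) / sw) + 1
print(oh, ow)  # 13 13
```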
Forward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.convolution.models.Convolution.forward
.. literalinclude:: ./../epynn/convolution/forward.py
:pyobject: convolution_forward
.. image:: _static/Convolution/convolution2-01.svg
The forward propagation function in a *Convolution* layer *k* includes:
* (1): Input *X* in current layer *k* with shape *(m, h, w, d)* is equal to the output *A* of previous layer *k-1*.
* (2): *Xb* is an array of blocks with shape *(oh, ow, m, fh, fw, d)* made by iterative slicing of *X* with respect to *fh*, *fw* and *sh*, *sw*.
* (3): Given *Xb* with shape *(oh, ow, m, fh, fw, d)*, the operation moves axis 2 to position 0, yielding *Xb* with shape *(m, oh, ow, fh, fw, d)*.
* (4): The operation adds a new axis at position 6 of the array *Xb*, yielding a shape of *(m, oh, ow, fh, fw, d, 1)*.
* (5): The linear activation block product *Zb* with shape *(m, oh, ow, fh, fw, d, u)* is computed by multiplying the input blocks *Xb* with shape *(m, oh, ow, fh, fw, d, 1)* by the weight *W* with shape *(fh, fw, d, u)*, the trailing axis of *Xb* broadcasting over the *u* units.
* (6): The linear activation product *Z* with shape *(m, oh, ow, u)* is computed by summing the block product *Zb* with shape *(m, oh, ow, fh, fw, d, u)* over the block dimensions on axes 5, 4 and 3 - corresponding to *d*, *fw* and *fh*, respectively.
* (7): Computation of *Z* is completed by the addition of the bias *b*.
* (8): Output *A* is computed by applying a non-linear *activation* function on *Z*.
Note that:
* This implementation is not the naive implementation of a *Convolution* layer, in which the process is fully iterative with a nested loop over each of the four input dimensions. Because the naive implementation is prohibitively slow, we depict here a stride-groups optimized version that takes advantage of NumPy array operations.
.. math::
\begin{alignat*}{2}
& x^{k}_{mhwd} &&= a^{\km}_{mhwd} \tag{1} \\
& \_\_Xb^{k} &&= blocks(X^{k}) \tag{2} \\
& \_Xb^{k} &&= moveaxis(\_\_Xb^{k}) \tag{3} \\
& Xb^{k} &&= expand\_dims(\_Xb^{k}) \tag{4} \\
    & Zb^{k} &&= Xb^{k} * W^{k}_{f_{h}f_{w}du} \tag{5} \\
    & \_z^{k}_{mo_{h}o_{w}u} &&= \sum_{f_{h} = 1}^{Fh} \sum_{f_{w} = 1}^{Fw} \sum_{d = 1}^{D} Zb^{k}_{mo_{h}o_{w}f_{h}f_{w}du} \tag{6} \\
    & z^{k}_{mo_{h}o_{w}u} &&= \_z^{k}_{mo_{h}o_{w}u} + b^{k}_{111u} \tag{7} \\
    & a^{k}_{mo_{h}o_{w}u} &&= a_{act}(z^{k}_{mo_{h}o_{w}u}) \tag{8}
\end{alignat*}
.. math::
\begin{alignat*}{2}
& where~blocks~is~defined~as: \\
&~~~~~~~~~~~~~~~~~~~~~~~~blocks:\mathcal{M}_{M,H,W,D}(\mathbb{R}) &&\to \mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) \\
&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X = \mathop{(x_{mhwd})}_{\substack{1 \le m \le M \\ 1 \le h \le H \\ 1 \le w \le W \\ 1 \le d \le D}} &&\to Y = \mathop{(y_{o_{h}o_{w}mf_hf_wd})}_{\substack{1 \le o_h \le Oh \\ 1 \le o_w \le Ow \\ 1 \le m \le M \\ 1 \le f_h \le Fh \\ 1 \le f_w \le Fw \\ 1 \le d \le D}} \\
&~~~~~~~~~~~~~~~~~~~~~~~with~Fh, Fw, Sh, Sw \in &&~\mathbb{N^*_+} \\
&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{h} \in &&~\{1,..,H-Fh~|~h \pmod{Sh} = 0\} \\
    &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{w} \in &&~\{1,..,W-Fw~|~w \pmod{Sw} = 0\} \\
    &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Y = && X[:, h:h+Fh, w:w+Fw, :] \\
\\
& where~moveaxis~is~defined~as: \\
&~~~~~~~~moveaxis:\mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) \\
&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y \\
\\
& where~expand\_dims~is~defined~as: \\
&~~~~expand\_dims:\mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,Fh,Fw,D,1}(\mathbb{R}) \\
&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y
\end{alignat*}
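The stride-groups forward pass described above can be sketched in plain NumPy. All shapes and values here are hypothetical, and ReLU stands in for the layer's configurable *activation* function.

```python
import numpy as np

# Hypothetical shapes (not EpyNN defaults)
m, h, w, d = 2, 8, 8, 3        # samples, input height, width, depth
fh, fw = 3, 3                  # filter size
sh, sw = 2, 2                  # strides
u = 4                          # number of units (filters)

rng = np.random.default_rng(0)
X = rng.standard_normal((m, h, w, d))       # input
W = rng.standard_normal((fh, fw, d, u))     # weight
b = rng.standard_normal((1, u))             # bias

# (2) Blocks: iterative slicing of X -> shape (oh, ow, m, fh, fw, d)
Xb = np.array([[X[:, hs:hs + fh, ws:ws + fw, :]
                for ws in range(0, w - fw + 1, sw)]
               for hs in range(0, h - fh + 1, sh)])

oh, ow = Xb.shape[0], Xb.shape[1]
assert oh == (h - fh) // sh + 1 and ow == (w - fw) // sw + 1

# (3) Move the samples axis first -> (m, oh, ow, fh, fw, d)
Xb = np.moveaxis(Xb, 2, 0)

# (4) Add a trailing axis -> (m, oh, ow, fh, fw, d, 1)
Xb = np.expand_dims(Xb, axis=6)

# (5) Block product; broadcasting against W yields (m, oh, ow, fh, fw, d, u)
Zb = Xb * W

# (6) Sum over the block axes d, fw, fh -> (m, oh, ow, u)
Z = np.sum(Zb, axis=(5, 4, 3))

# (7) Add the bias
Z += b

# (8) Non-linear activation, here ReLU as a stand-in
A = np.maximum(0, Z)
```

Summing the broadcast block product is mathematically identical to the naive sliding-window convolution, but replaces the four-level Python loop with vectorized array operations.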
Backward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.convolution.models.Convolution.backward
.. literalinclude:: ./../epynn/convolution/backward.py
:pyobject: convolution_backward
.. image:: _static/Convolution/convolution3-01.svg
The backward propagation function in a *Convolution* layer *k* includes:
* (1): *dA* with shape *(m, oh, ow, u)* is the gradient of the loss with respect to the output of forward propagation *A* for the current layer *k*. It is equal to the gradient of the loss with respect to the input of forward propagation for the next layer *k+1*.
* (2): *dZ* is the gradient of the loss with respect to *Z*. It is computed by applying element-wise multiplication between *dA* and the derivative of the non-linear *activation* function applied on *Z*.
* (3): Three new axes are added before the last axis of *dZ* with shape *(m, oh, ow, u)*, yielding *dZb* with shape *(m, oh, ow, 1, 1, 1, u)*, which broadcasts against the block dimensions of *Xb* with shape *(m, oh, ow, fh, fw, d, u)*.
* (4): The gradient of the loss *dX* with respect to the input of forward propagation *X* for current layer *k* is initialized as a zeros array of the same shape as *X*.
* (5hw): For each row *h* with respect to *oh* and each column *w* with respect to *ow*, the gradient of the loss *dXb* with respect to one input block of forward propagation is equal to the product between the corresponding block in *dZb* and the weight *W*.
* (6hw): For each row *h* with respect to *oh* and for each column *w* with respect to *ow*, the corresponding window coordinates in *X* are computed: *hs*, *he* for height start and height end; *ws*, *we* for width start and width end. The gradient of the loss *dX* with respect to the current window in *X* is the sum of *dXb* over the last axis 4.
Note that:
* One window represents an ensemble of coordinates. One value with given coordinates may be part of more than one window. This is why the operation *dX[:, hs:he, ws:we, :] += np.sum(dXb, axis=4)* is used instead of *dX[:, hs:he, ws:we, :] = np.sum(dXb, axis=4)*.
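The need for accumulation can be illustrated on a 1D analogue with hypothetical values: with filter width 2 and stride 1, each interior position belongs to two windows and must receive both contributions.

```python
import numpy as np

# 1D analogue: 4 input positions, filter width 2, stride 1.
# Windows cover [0:2], [1:3], [2:4] - positions 1 and 2 each belong to two.
dXb_per_window = [np.ones(2), np.ones(2), np.ones(2)]  # block gradients, all ones

dX = np.zeros(4)
for start, dXb in zip(range(3), dXb_per_window):
    dX[start:start + 2] += dXb  # accumulate, do not overwrite

print(dX)  # [1. 2. 2. 1.]
```

Using ``=`` instead of ``+=`` would silently discard the contribution of all but the last overlapping window.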
.. math::
\begin{alignat*}{2}
& \delta^{\kp}_{mo_{h}o_{w}u} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mo_{h}o_{w}u}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mo_{h}o_{w}u}} \tag{1} \\
    & \frac{\partial \mathcal{L}}{\partial z^{k}_{mo_{h}o_{w}u}} &&= \delta^{\kp}_{mo_{h}o_{w}u} * a_{act}'(z^{k}_{mo_{h}o_{w}u}) \tag{2} \\
    & \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} &&= expand\_dims(\frac{\partial \mathcal{L}}{\partial z^{k}_{mo_{h}o_{w}u}}) \tag{3} \\
& \frac{\partial \mathcal{L}}{\partial X^{k}} &&= [\frac{\partial \mathcal{L}}{\partial x^{k}_{mhwd}}] \in \{0\}^{M \times H \times W \times D} \tag{4} \\
\\
& \frac{\partial \mathcal{L}}{\partial Xb^{k}} &&= \frac{\partial \mathcal{L}}{\partial Zb^{k}_{mo_{h}o_{w}111u}} * W^{k}_{f_{h}f_{w}du} \tag{5hw} \\
\\
& \Delta\frac{\partial \mathcal{L}}{\partial X^{k}_{mf_hf_wd}} &&= \sum_{u = 1}^{U} \frac{\partial \mathcal{L}}{\partial Xb^{k}_{mf_hf_wdu}} \tag{6hw}
\end{alignat*}
.. math::
\begin{alignat*}{2}
& where~expand\_dims~is~defined~as: \\
&~~~~expand\_dims:\mathcal{M}_{M,Oh,Ow,U}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,1,1,1,U}(\mathbb{R}) \\
&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y
\end{alignat*}
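The backward pass can be sketched in plain NumPy as follows; all shapes and values are hypothetical, and ReLU stands in for the activation, so its derivative is the step function.

```python
import numpy as np

# Hypothetical shapes (not EpyNN defaults)
m, h, w, d = 2, 8, 8, 3        # samples, input height, width, depth
fh, fw, sh, sw = 3, 3, 2, 2    # filter size and strides
u = 4                          # number of units (filters)
oh = (h - fh) // sh + 1
ow = (w - fw) // sw + 1

rng = np.random.default_rng(1)
dA = rng.standard_normal((m, oh, ow, u))  # gradient from next layer k+1
Z = rng.standard_normal((m, oh, ow, u))   # cached linear product
W = rng.standard_normal((fh, fw, d, u))   # cached weight

# (2) dZ = dA * activation'(Z); step function for ReLU
dZ = dA * (Z > 0)

# (3) Reintroduce block axes -> (m, oh, ow, 1, 1, 1, u)
dZb = dZ[:, :, :, np.newaxis, np.newaxis, np.newaxis, :]

# (4) dX initialized as a zeros array with the shape of X
dX = np.zeros((m, h, w, d))

for jh in range(oh):
    for jw in range(ow):
        # (5hw) Block gradient: (m, 1, 1, 1, u) * (fh, fw, d, u)
        #       broadcasts to (m, fh, fw, d, u)
        dXb = dZb[:, jh, jw] * W
        # (6hw) Window coordinates in X; accumulate the sum over units (axis 4)
        hs, he = jh * sh, jh * sh + fh
        ws, we = jw * sw, jw * sw + fw
        dX[:, hs:he, ws:we, :] += np.sum(dXb, axis=4)
```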
Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. automethod:: epynn.convolution.models.Convolution.compute_gradients
.. literalinclude:: ./../epynn/convolution/parameters.py
:pyobject: convolution_compute_gradients
The function to compute parameter gradients in a *Convolution* layer *k* includes:
* (1.1): *dW* is the gradient of the loss with respect to *W*. It is computed by element-wise multiplication between *Xb* and *dZb* followed by summing over axes 2, 1 and 0. Axes correspond to the output width *ow*, output height *oh* and number of samples *m* dimensions, respectively.
* (1.2): *db* is the gradient of the loss with respect to *b*. It is computed by summing *dZb* along axes 2, 1 and 0 followed by a squeeze which removes axes of length one.
.. math::
\begin{alignat*}{2}
& \frac{\partial \mathcal{L}}{\partial W^{k}_{f_{h}f_{w}du}} &&= \sum_{o_{w} = 1}^{Ow} \sum_{o_{h} = 1}^{Oh} \sum_{m = 1}^{M} xb^{k}_{mo_{h}o_{w}f_{h}f_{w}du} * \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} \tag{1.1} \\
& \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{o_{w} = 1}^{Ow} \sum_{o_{h} = 1}^{Oh} \sum_{m = 1}^{M} \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} \tag{1.2}
\end{alignat*}
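The two reductions above can be sketched in plain NumPy; the shapes below are hypothetical, chosen to be consistent with the forward pass description.

```python
import numpy as np

# Hypothetical shapes (not EpyNN defaults)
m, oh, ow = 2, 3, 3            # samples, output height, output width
fh, fw, d, u = 3, 3, 3, 4      # filter size, depth, number of units

rng = np.random.default_rng(2)
Xb = rng.standard_normal((m, oh, ow, fh, fw, d, 1))   # cached input blocks
dZb = rng.standard_normal((m, oh, ow, 1, 1, 1, u))    # gradient with block axes

# (1.1) dW: element-wise product of Xb and dZb, summed over ow, oh, m
dW = np.sum(Xb * dZb, axis=(2, 1, 0))        # shape (fh, fw, d, u)

# (1.2) db: dZb summed over ow, oh, m, then squeezed to drop length-one axes
db = np.sum(dZb, axis=(2, 1, 0)).squeeze()   # shape (u,)
```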
Live examples
-----------------------------------
* `Dummy image - Convolutional Neural Network (CNN)`_
* `MNIST Database - Convolutional Neural Network (CNN)`_
You may also like to browse all `Network training examples`_ provided with EpyNN.
.. _Network training examples: run_examples.html
.. _Dummy image - Convolutional Neural Network (CNN): epynnlive/dummy_image/train.html#Convolutional-Neural-Network-(CNN)
.. _MNIST Database - Convolutional Neural Network (CNN): epynnlive/captcha_mnist/train.html#Convolutional-Neural-Network-(CNN)