.. EpyNN documentation master file, created by
   sphinx-quickstart on Tue Jul 6 18:46:11 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. toctree::

Convolution 2D (CNN)
===================================

Source files in ``EpyNN/epynn/convolution/``.

See `Appendix - Notations`_ for mathematical conventions.

.. _Appendix - Notations: glossary.html#notations

Layer architecture
-----------------------------------

.. image:: _static/Convolution/convolution0-01.svg
   :alt: Convolution

A *Convolution* layer is an object containing a number of *units* - often referred to as *filters* - and provided with functions for parameters *initialization* and non-linear *activation* of inputs.

Importantly, the *Convolution* layer takes *filter_size* and *strides* arguments upon instantiation. They define, respectively, the dimensions of the window used for the convolution operation and the steps by which the window is moved between each operation.

.. autoclass:: epynn.convolution.models.Convolution
   :show-inheritance:

Shapes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.compute_shapes

.. literalinclude:: ./../epynn/convolution/parameters.py
   :pyobject: convolution_compute_shapes
   :language: python

Within a *Convolution* layer, shapes of interest include:

* Input *X* of shape *(m, h, w, d)* with *m* the number of samples, *h* the height of features, *w* the width of features and *d* the depth of features.
* Output height *oh*, defined from the ratio of the difference between *h* and filter height *fh* over the stride height *sh*. The ratio is rounded down to the nearest integer, to which one is added.
* Output width *ow*, defined from the ratio of the difference between *w* and filter width *fw* over the stride width *sw*. The ratio is rounded down to the nearest integer, to which one is added.
* Weight *W* of shape *(fh, fw, d, u)* with *fh* the filter height, *fw* the filter width, *d* the depth of features and *u* the number of units - or filters - in the current layer *k*.
* Bias *b* of shape *(1, u)* with *u* the number of units - or filters - in the layer.

Note that:

* The input of a *Convolution* layer is often an image or a feature map. Therefore, we would more generally say that *h* is the image height, *w* the image width and *d* the image depth.
* The shapes of parameters *W* and *b* are independent of the number of samples *m*.

.. image:: _static/Convolution/convolution1-01.svg

Forward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.forward

.. literalinclude:: ./../epynn/convolution/forward.py
   :pyobject: convolution_forward

.. image:: _static/Convolution/convolution2-01.svg

The forward propagation function in a *Convolution* layer *k* includes:

* (1): Input *X* in current layer *k* with shape *(m, h, w, d)* is equal to the output *A* of previous layer *k-1*.
* (2): *Xb* is an array of blocks with shape *(oh, ow, m, fh, fw, d)* made by iterative slicing of *X* with respect to *fh*, *fw* and *sh*, *sw*.
* (3): Given *Xb* with shape *(oh, ow, m, fh, fw, d)*, the operation moves axis 2 to position 0, yielding *Xb* with shape *(m, oh, ow, fh, fw, d)*.
* (4): The operation adds a new axis in position 6 of the array *Xb*, yielding a shape of *(m, oh, ow, fh, fw, d, 1)*, ready to broadcast against the units dimension *u* of *W*.
* (5): The linear activation block product *Zb* is computed by multiplying the input blocks *Xb* by the weight *W* with shape *(fh, fw, d, u)*; broadcasting yields *Zb* with shape *(m, oh, ow, fh, fw, d, u)*.
* (6): The linear activation product *Z* with shape *(m, oh, ow, u)* is computed by summing the block product *Zb* with shape *(m, oh, ow, fh, fw, d, u)* over the block dimensions, on axes 5, 4 and 3.
* (7): Computation of *Z* is completed by the addition of the bias *b*.
* (8): Output *A* is computed by applying a non-linear *activation* function on *Z*.
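As an illustration, the steps above can be sketched with plain NumPy. This is a minimal sketch, not EpyNN's actual ``convolution_forward``: the function name and the default strides are assumptions made here, and the non-linear activation of step (8) is left to the caller, so the sketch returns *Z*.

```python
import numpy as np

def convolution_forward_sketch(X, W, b, strides=(1, 1)):
    """Sketch of the stride-groups forward pass.

    X : (m, h, w, d) input, W : (fh, fw, d, u) filters, b : (1, u) bias.
    Returns the linear activation Z of shape (m, oh, ow, u).
    """
    m, h, w, d = X.shape
    fh, fw, _, u = W.shape
    sh, sw = strides

    # Output dimensions: floor of the ratio, plus one (see Shapes above).
    oh = (h - fh) // sh + 1
    ow = (w - fw) // sw + 1

    # (2) Build blocks by iterative slicing -> (oh, ow, m, fh, fw, d)
    Xb = np.array([[X[:, i*sh:i*sh+fh, j*sw:j*sw+fw, :]
                    for j in range(ow)]
                   for i in range(oh)])

    # (3) Move the samples axis first -> (m, oh, ow, fh, fw, d)
    Xb = np.moveaxis(Xb, 2, 0)

    # (4) Trailing axis to broadcast against the units axis of W
    Xb = np.expand_dims(Xb, axis=6)       # (m, oh, ow, fh, fw, d, 1)

    # (5) Element-wise product broadcasts to (m, oh, ow, fh, fw, d, u)
    Zb = Xb * W

    # (6) Sum over the block dimensions fh, fw and d
    Z = Zb.sum(axis=(5, 4, 3))            # (m, oh, ow, u)

    # (7) Add the bias
    return Z + b
```

With all-ones input of shape *(1, 4, 4, 1)* and a single all-ones *2x2* filter, each output value is the sum of a *2x2* window, i.e. 4, and *Z* has shape *(1, 3, 3, 1)*.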
Note that:

* This implementation is not the naive implementation of a *Convolution* layer, in which the process is fully iterative, with one nested loop level per input dimension (four levels in total). Because the naive implementation is prohibitively slow, the version depicted here is a stride-groups optimization that takes advantage of NumPy arrays.

.. math::

    \begin{alignat*}{2}
        & x^{k}_{mhwd} &&= a^{\km}_{mhwd} \tag{1} \\
        & \_\_Xb^{k} &&= blocks(X^{k}) \tag{2} \\
        & \_Xb^{k} &&= moveaxis(\_\_Xb^{k}) \tag{3} \\
        & Xb^{k} &&= expand\_dims(\_Xb^{k}) \tag{4} \\
        & Zb^{k} &&= Xb^{k} * W^{k}_{f_{h}f_{w}du} \tag{5} \\
        & \_z^{k}_{mo_{h}o_{w}u} &&= \sum_{f_{h} = 1}^{Fh} \sum_{f_{w} = 1}^{Fw} \sum_{d = 1}^{D} Zb^{k}_{mo_{h}o_{w}f_{h}f_{w}du} \tag{6} \\
        & z^{k}_{mo_{h}o_{w}u} &&= \_z^{k}_{mo_{h}o_{w}u} + b^{k}_{111u} \tag{7} \\
        & a^{k}_{mo_{h}o_{w}u} &&= a_{act}(z^{k}_{mo_{h}o_{w}u}) \tag{8}
    \end{alignat*}

.. math::

    \begin{alignat*}{2}
        & where~blocks~is~defined~as: \\
        &~~~~~~~~~~~~~~~~~~~~~~~~blocks:\mathcal{M}_{M,H,W,D}(\mathbb{R}) &&\to \mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X = \mathop{(x_{mhwd})}_{\substack{1 \le m \le M \\ 1 \le h \le H \\ 1 \le w \le W \\ 1 \le d \le D}} &&\to Y = \mathop{(y_{o_{h}o_{w}mf_hf_wd})}_{\substack{1 \le o_h \le Oh \\ 1 \le o_w \le Ow \\ 1 \le m \le M \\ 1 \le f_h \le Fh \\ 1 \le f_w \le Fw \\ 1 \le d \le D}} \\
        &~~~~~~~~~~~~~~~~~~~~~~~with~Fh, Fw, Sh, Sw \in &&~\mathbb{N}^* \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{h} \in &&~\{1,..,H-Fh~|~h \pmod{Sh} = 0\} \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{w} \in &&~\{1,..,W-Fw~|~w \pmod{Sw} = 0\} \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Y = && X[:, h:h+Fh, w:w+Fw, :] \\
        \\
        & where~moveaxis~is~defined~as: \\
        &~~~~~~~~moveaxis:\mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y \\
        \\
        & 
where~expand\_dims~is~defined~as: \\
        &~~~~expand\_dims:\mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,Fh,Fw,D,1}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y
    \end{alignat*}

Backward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.backward

.. literalinclude:: ./../epynn/convolution/backward.py
   :pyobject: convolution_backward

.. image:: _static/Convolution/convolution3-01.svg

The backward propagation function in a *Convolution* layer *k* includes:

* (1): *dA* with shape *(m, oh, ow, u)* is the gradient of the loss with respect to the output of forward propagation *A* for current layer *k*. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer *k+1*.
* (2): *dZ* is the gradient of the loss with respect to *Z*. It is computed by element-wise multiplication between *dA* and the derivative of the non-linear *activation* function applied on *Z*.
* (3): Three new axes are added before the last axis of *dZ* with shape *(m, oh, ow, u)*, yielding *dZb* with shape *(m, oh, ow, 1, 1, 1, u)*, which reintroduces the dimensions corresponding to *Xb* with shape *(m, oh, ow, fh, fw, d, 1)*.
* (4): The gradient of the loss *dX* with respect to the input of forward propagation *X* for current layer *k* is initialized as an array of zeros with the same shape as *X*.
* (5hw): For each row *h* with respect to *oh* and each column *w* with respect to *ow*, the gradient of the loss *dXb* for one block, with respect to the corresponding block of the input of forward propagation, is equal to the product between the corresponding block in *dZb* and the weight *W*.
* (6hw): For each row *h* with respect to *oh* and each column *w* with respect to *ow*, the corresponding window coordinates in *X* are computed: *hs*, *he* for height start and height end; *ws*, *we* for width start and width end. The gradient of the loss *dX* with respect to the current window in *X* is incremented by the sum of *dXb* over the last axis (axis 4), which corresponds to the units dimension *u*.
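The backward steps above can likewise be sketched with plain NumPy. This is a minimal sketch rather than EpyNN's actual ``convolution_backward``: the function name and default strides are assumptions, and the activation derivative of step (2) is assumed to have been applied already, so the sketch starts from *dZ*.

```python
import numpy as np

def convolution_backward_sketch(dZ, X, W, strides=(1, 1)):
    """Sketch of the backward pass (steps 3 through 6hw).

    dZ : (m, oh, ow, u) gradient w.r.t. Z, X : (m, h, w, d) forward input,
    W : (fh, fw, d, u) filters. Returns dX with the same shape as X.
    """
    fh, fw, _, u = W.shape
    sh, sw = strides
    m, oh, ow, _ = dZ.shape

    # (3) Reintroduce the block dimensions -> (m, oh, ow, 1, 1, 1, u)
    dZb = dZ[:, :, :, np.newaxis, np.newaxis, np.newaxis, :]

    # (4) Zero-initialized gradient w.r.t. the input
    dX = np.zeros_like(X, dtype=float)

    for h in range(oh):
        hs, he = h * sh, h * sh + fh      # height start / end
        for w in range(ow):
            ws, we = w * sw, w * sw + fw  # width start / end
            # (5hw) Per-block gradient broadcasts to (m, fh, fw, d, u)
            dXb = dZb[:, h, w] * W
            # (6hw) Windows may overlap, hence '+=' rather than '='
            dX[:, hs:he, ws:we, :] += np.sum(dXb, axis=4)

    return dX
```

With all-ones *dZ* and a single all-ones *2x2* filter on a *3x3* input, each entry of *dX* counts the number of overlapping windows covering that position: 1 in the corners, 2 on the edges and 4 in the center.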
Note that:

* A window represents an ensemble of coordinates, and one value with given coordinates may be part of more than one window. This is why the operation *dX[:, hs:he, ws:we, :] += np.sum(dXb, axis=4)* is used instead of *dX[:, hs:he, ws:we, :] = np.sum(dXb, axis=4)*.

.. math::

    \begin{alignat*}{2}
        & \delta^{\kp}_{mo_{h}o_{w}u} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mo_{h}o_{w}u}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mo_{h}o_{w}u}} \tag{1} \\
        & \frac{\partial \mathcal{L}}{\partial z^{k}_{mo_{h}o_{w}u}} &&= \delta^{\kp}_{mo_{h}o_{w}u} * a_{act}'(z^{k}_{mo_{h}o_{w}u}) \tag{2} \\
        & \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} &&= expand\_dims(\frac{\partial \mathcal{L}}{\partial z^{k}_{mo_{h}o_{w}u}}) \tag{3} \\
        & \frac{\partial \mathcal{L}}{\partial X^{k}} &&= [\frac{\partial \mathcal{L}}{\partial x^{k}_{mhwd}}] \in \{0\}^{M \times H \times W \times D} \tag{4} \\
        \\
        & \frac{\partial \mathcal{L}}{\partial Xb^{k}} &&= \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} * W^{k}_{f_{h}f_{w}du} \tag{5hw} \\
        \\
        & \Delta\frac{\partial \mathcal{L}}{\partial X^{k}_{mf_hf_wd}} &&= \sum_{u = 1}^{U} \frac{\partial \mathcal{L}}{\partial Xb^{k}_{mf_hf_wdu}} \tag{6hw}
    \end{alignat*}

.. math::

    \begin{alignat*}{2}
        & where~expand\_dims~is~defined~as: \\
        &~~~~expand\_dims:\mathcal{M}_{M,Oh,Ow,U}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,1,1,1,U}(\mathbb{R}) \\
        &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y
    \end{alignat*}

Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.convolution.models.Convolution.compute_gradients

.. literalinclude:: ./../epynn/convolution/parameters.py
   :pyobject: convolution_compute_gradients

The function to compute parameter gradients in a *Convolution* layer *k* includes:

* (1.1): *dW* is the gradient of the loss with respect to *W*. It is computed by element-wise multiplication between *Xb* and *dZb*, followed by summing over axes 2, 1 and 0.
  Axes correspond to the output width *ow*, output height *oh* and number of samples *m* dimensions, respectively.
* (1.2): *db* is the gradient of the loss with respect to *b*. It is computed by summing *dZb* along axes 2, 1 and 0, followed by a squeeze which removes the remaining axes of length one.

.. math::

    \begin{alignat*}{2}
        & \frac{\partial \mathcal{L}}{\partial W^{k}_{f_{h}f_{w}du}} &&= \sum_{o_{w} = 1}^{Ow} \sum_{o_{h} = 1}^{Oh} \sum_{m = 1}^{M} xb^{k}_{mo_{h}o_{w}f_{h}f_{w}du} * \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} \tag{1.1} \\
        & \frac{\partial \mathcal{L}}{\partial b^{k}_{u}} &&= \sum_{o_{w} = 1}^{Ow} \sum_{o_{h} = 1}^{Oh} \sum_{m = 1}^{M} \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}111u}} \tag{1.2}
    \end{alignat*}

Live examples
-----------------------------------

* `Dummy image - Convolutional Neural Network (CNN)`_
* `MNIST Database - Convolutional Neural Network (CNN)`_

You may also like to browse all `Network training examples`_ provided with EpyNN.

.. _Network training examples: run_examples.html
.. _Dummy image - Convolutional Neural Network (CNN): epynnlive/dummy_image/train.html#Convolutional-Neural-Network-(CNN)
.. _MNIST Database - Convolutional Neural Network (CNN): epynnlive/captcha_mnist/train.html#Convolutional-Neural-Network-(CNN)
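To close the walkthrough, the parameter gradients described in the Gradients section can also be sketched with plain NumPy. This is a minimal sketch under assumed inputs, not EpyNN's actual ``convolution_compute_gradients``; the function name is hypothetical.

```python
import numpy as np

def convolution_compute_gradients_sketch(Xb, dZb):
    """Sketch of the parameter gradients (steps 1.1 and 1.2).

    Xb : (m, oh, ow, fh, fw, d, 1) input blocks from the forward pass,
    dZb : (m, oh, ow, 1, 1, 1, u) gradient w.r.t. Z with block axes restored.
    Returns dW of shape (fh, fw, d, u) and db of shape (u,).
    """
    # (1.1) The element-wise product broadcasts to (m, oh, ow, fh, fw, d, u);
    # summing over ow (axis 2), oh (axis 1) and m (axis 0) leaves (fh, fw, d, u).
    dW = np.sum(Xb * dZb, axis=(2, 1, 0))

    # (1.2) Sum dZb over the same axes, then squeeze the length-one block axes.
    db = np.squeeze(np.sum(dZb, axis=(2, 1, 0)))

    return dW, db
```

Note that both gradients sum over the samples dimension *m*, so they accumulate contributions from the whole batch, consistent with the parameter shapes being independent of *m*.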