.. EpyNN documentation master file, created by
   sphinx-quickstart on Tue Jul 6 18:46:11 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. toctree::

Architecture Layers - Model
===============================

EpyNN was made modular: architecture layers may be added or modified easily by following a few rules.

* ``Layer`` is the **parent** layer (:py:class:`epynn.commons.models.Layer`) from which all other layers inherit.
* Class definitions for custom layers must inherit from ``Layer`` and comply with the method scheme shown for the ``Template`` layer (:py:class:`epynn.template.models.Template`). A minimal sketch of such a custom layer is shown further below.

Base Layer
------------------------------

Source code in ``EpyNN/epynn/commons/models.py``.

.. image:: _static/other/base-01.svg

Within the `EpyNN Model`_, any architecture layer may be accessed as ``model.layers[i]``, with the explicit alias ``model.embedding`` for the embedding layer. Layer instance attributes described below can be inspected from, for instance, ``model.layers[-1].p['W']``, which returns the weight associated with the last layer of the network, predictably a `Dense`_ layer.

All layers must inherit from the **Base Layer**.

.. autoclass:: epynn.commons.models.Layer

    .. automethod:: __init__

    .. automethod:: update_shapes

Template Layer
------------------------------

Source files in ``EpyNN/epynn/template/``.

An architecture layer may be defined as a finite set of functions (1).

.. math::

    layer = \{ f_1,...,f_n | n \in \mathbb{N}^* \} \tag{1}

In Python, this translates into a class which contains class methods. When designing a custom layer, or when modifying one natively provided with EpyNN, one should be careful **not to deviate from the organization shown below**.

To avoid having to re-code the whole scheme, child layer classes must at least contain the methods shown below. If a method is not relevant to the layer, as exemplified below, it **should simply be made a dummy method**.

See the `EpyNN Model`_ training algorithm for details about the training procedure.

Finally, EpyNN was written so that users do not face hundreds of lines of code within a single class definition, which may negatively impact its educational value for people less experienced in programming. As such, methods are natively thin wrappers around simple functions located within a small number of files in the layer's directory. Note that custom layers or further developments do not need to stick to this rule.

.. autoclass:: epynn.template.models.Template
    :show-inheritance:

    .. automethod:: __init__

Shapes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.compute_shapes

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_compute_shapes

Within a *Template* pass-through layer, shapes of interest include:

* Input *X* of shape *(m, ...)* with *m* equal to the number of samples. The number of input dimensions is unknown.
* The number of features *n* per sample can still be determined formally: it is equal to the size of the input *X* divided by the number of samples *m*.

Initialize parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.initialize_parameters

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_initialize_parameters

A pass-through layer is not a *trainable* layer. It has no *trainable* parameters such as weight *W* or bias *b*. Therefore, there are no parameters to initialize.
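Below is a minimal sketch of a custom pass-through layer complying with the scheme described above. It is illustrative only and not taken from the EpyNN source: the class name ``PassThrough`` is hypothetical, the ``d`` (dimensions) and ``fs`` (forward shapes) dictionaries are assumed to be created by the Base Layer's ``__init__``, and the method signatures should be checked against the ``Template`` class documented above before reuse.

.. code-block:: python

    from epynn.commons.models import Layer


    class PassThrough(Layer):
        """Minimal custom pass-through layer (illustrative sketch only)."""

        def __init__(self):
            super().__init__()    # Assumed to set up the Base Layer dictionaries (d, fs, p, ...).

        def compute_shapes(self, A):
            X = A                                   # Input of layer k = output of layer k-1.
            self.fs['X'] = X.shape                  # (m, ...) - m samples, unknown dimensions.
            self.d['m'] = X.shape[0]                # Number of samples.
            self.d['n'] = X.size // self.d['m']     # Features per sample = size of X divided by m.

        def initialize_parameters(self):
            pass    # No trainable parameters (W, b): dummy method.

        def forward(self, A):
            self.compute_shapes(A)
            X = A       # (1) Input of current layer k.
            A = X       # (2) Output equals input: pass-through.
            # The native Template layer also records shapes through the Base Layer's
            # update_shapes() method; this is omitted here for brevity.
            return A

        def backward(self, dA):
            dX = dA     # Gradient of the loss is passed backward unchanged.
            return dX

        def compute_gradients(self):
            pass    # No parameters, hence no parameter gradients: dummy method.

        def update_parameters(self, hPars):
            pass    # Nothing to update: dummy method.

Such a layer could then be inserted in the list of layers passed to the `EpyNN Model`_, like any layer natively provided with EpyNN.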
Forward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.forward

.. literalinclude:: ./../epynn/template/forward.py
    :pyobject: template_forward

The forward propagation function in a *Template* pass-through layer *k* includes:

* (1): Input *X* of current layer *k* is equal to the output *A* of previous layer *k-1*.
* (2): Output *A* of current layer *k* is equal to its input *X*.

Note that:

* A pass-through layer is, by definition, a layer that does nothing except forwarding the input from the previous layer to the next layer.

Mathematically, the forward propagation is a function which takes a matrix *X* of any dimension as input and returns another matrix *A* of any dimension as output, such as:

.. math::

    \begin{align}
        A = f(X)
    \end{align}

.. math::

    \begin{align}
        where~f~is~defined~as: & \\
        f:\mathcal{M}_{m,d_1...d_n}(\mathbb{R}) & \to \mathcal{M}_{m,d_1...d_{n'}}(\mathbb{R}) \\
        X & \to f(X) \\
        with~n,~n' \in \mathbb{N}^* &
    \end{align}

Backward
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.backward

.. literalinclude:: ./../epynn/template/backward.py
    :pyobject: template_backward

The backward propagation function in a *Template* pass-through layer *k* includes:

* (1): *dA*, the gradient of the loss with respect to the output of forward propagation *A* for current layer *k*. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer *k+1*.
* (2): The gradient of the loss *dX* with respect to the input of forward propagation *X* for current layer *k* is equal to *dA*.

Note that:

* A pass-through layer is, as said above, a layer that does nothing except forwarding the input from the previous layer to the next layer. With respect to backward propagation, the layer receives an input from the **next** layer and sends it **backward** to the **previous** layer.

Mathematically, the backward propagation is a function which takes a matrix *dA* of any dimension as input and returns another matrix *dX* of any dimension as output, such as:

.. math::

    \begin{align}
        \delta^{k} = \frac{\partial \mathcal{L}}{\partial X^{k}} = f\left(\frac{\partial \mathcal{L}}{\partial A^{k}}\right)
    \end{align}

.. math::

    \begin{align}
        where~f~is~defined~as: & \\
        f:\mathcal{M}_{m,d_1...d_{n'}}(\mathbb{R}) & \to \mathcal{M}_{m,d_1...d_n}(\mathbb{R}) \\
        dA & \to f(dA) \\
        with~n',~n \in \mathbb{N}^* &
    \end{align}

Note that:

* The parameter *dA* of *f()* is the partial derivative of the loss with respect to the output *A* of layer *k*.
* The output of *f(dA)* is the partial derivative of the loss with respect to the input *X* of layer *k*.
* The shape of the output of forward propagation *A* is identical to the shape of the input of backward propagation *dA*.
* The shape of the output of backward propagation *dX* is identical to the shape of the input of forward propagation *X*.
* The expression *partial derivative of the loss with respect to* is equivalent to *gradient of the loss with respect to*.

Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.compute_gradients

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_compute_gradients

A pass-through layer is not a *trainable* layer. It has no *trainable* parameters such as weight *W* or bias *b*. Therefore, there are no parameter gradients to compute.
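The *Forward* and *Backward* steps of the pass-through layer, together with the shape identities listed above, can be reproduced outside of EpyNN in a few lines of plain NumPy. This is an illustrative sketch only (the function names are hypothetical), not the EpyNN source shown in the ``literalinclude`` blocks above.

.. code-block:: python

    import numpy as np

    def pass_through_forward(X):
        """Sketch of the pass-through forward step."""
        A = X          # (2) Output A equals input X.
        return A

    def pass_through_backward(dA):
        """Sketch of the pass-through backward step."""
        dX = dA        # (2) Gradient dX equals gradient dA.
        return dX

    X = np.random.standard_normal((8, 4, 5))   # Hypothetical input of shape (m, ...).
    A = pass_through_forward(X)

    dA = np.ones_like(A)                       # Hypothetical gradient received from layer k+1.
    dX = pass_through_backward(dA)

    # Shape identities stated above.
    assert dA.shape == A.shape     # Input of backward matches output of forward.
    assert dX.shape == X.shape     # Output of backward matches input of forward.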
Update parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automethod:: epynn.template.models.Template.update_parameters

.. literalinclude:: ./../epynn/template/parameters.py
    :pyobject: template_update_parameters

A pass-through layer is not a *trainable* layer. It has no *trainable* parameters such as weight *W* or bias *b*. Therefore, there are no parameters to update.

**Note that:** For trainable layers, this function is always identical regardless of the layer architecture. Therefore, it is not explicitly documented in the corresponding documentation pages. See :py:func:`epynn.dense.parameters.dense_update_parameters()` for an explicit example.

.. _layer_se_hPars:

Layer Hyperparameters
------------------------------

.. autoclass:: epynn.settings.se_hPars

.. literalinclude:: ../epynn/settings.py
    :language: python
    :start-after: HYPERPARAMETERS SETTINGS

**Schedule learning rate**

* ``learning_rate``: Greater values mean faster learning, **but** at the risk of the **vanishing or exploding** gradient problem and/or of being trapped in a **local minimum**. If too low, the network may never converge. Optimal values usually lie in the - wide - range between 1 and 10\ :sup:`-6`, but they strongly depend on the network architecture and the data.
* ``schedule``: Learning rate scheduler. See :py:mod:`epynn.commons.schedule`. Three schedulers are natively provided with EpyNN, with the corresponding schedules shown on the plot below. The principle of decay-based learning rate scheduling is to help the Gradient Descent (GD) procedure reach a minimum by fine-tuning the learning rate along the training epochs. It may also allow for greater initial learning rates - to converge faster - while the decay may prevent the vanishing or exploding gradient problem. A sketch of such a schedule is given at the end of this page.
* ``decay_k``: The decay rate at which the learning rate decreases, according to the decay function.
* ``cycle_epochs``: When using a *cycling learning rate*, the number of epochs per cycle. The schedule then repeats *n* times over the total number of training epochs.
* ``cycle_descent``: When using a *cycling learning rate*, the initial learning rate at the beginning of each new cycle is diminished by this descent coefficient.

.. image:: schedule.png

**Tune activation function**

* ``ELU_alpha``: Coefficient used in the ELU activation function.
* ``LRELU_alpha``: Coefficient used in the Leaky ReLU activation function.
* ``softmax_temperature``: Temperature factor for the softmax activation function. When used in the output layer, higher values diminish confidence in the prediction, which may help prevent exploding or vanishing gradients and, pictorially, smooths the output probability distribution.

.. _Dense: Dense.rst
.. _EpyNN model: EpyNN_Model.rst
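To make the scheduling hyperparameters above concrete, the sketch below computes one learning rate per epoch from ``learning_rate``, ``decay_k``, ``cycle_epochs`` and ``cycle_descent``, using an exponential decay restarted at each cycle. It illustrates the principle only; the function name is hypothetical, and the schedulers actually shipped with EpyNN are defined in :py:mod:`epynn.commons.schedule`.

.. code-block:: python

    import numpy as np

    def exp_decay_schedule(epochs, learning_rate, decay_k, cycle_epochs, cycle_descent):
        """Return a list with one learning rate per epoch (illustrative sketch)."""
        lrs = []
        for epoch in range(epochs):
            cycle, step = divmod(epoch, cycle_epochs)
            # Each new cycle restarts from an initial rate diminished by cycle_descent.
            lr_start = learning_rate * (cycle_descent ** cycle)
            # Within a cycle, the rate decays exponentially at rate decay_k.
            lrs.append(lr_start * np.exp(-decay_k * step))
        return lrs

    lrs = exp_decay_schedule(epochs=100, learning_rate=0.1,
                             decay_k=0.05, cycle_epochs=25, cycle_descent=0.5)

    print(lrs[0])    # 0.1  - start of the first cycle.
    print(lrs[25])   # 0.05 - start of the second cycle, diminished by cycle_descent.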