Pooling (CNN)

Source files in EpyNN/epynn/pooling/.

See Appendix - Notations for mathematical conventions.

Layer architecture

Pooling

A Pooling layer can be seen as a data compression layer. It is provided with pool_size and strides arguments upon instantiation. They define, respectively, the dimensions of the window used for the pooling operation and the steps by which the window is moved between operations.

class epynn.pooling.models.Pooling(pool_size=(2, 2), strides=None, pool=<function amax>)[source]

Bases: epynn.commons.models.Layer

Definition of a pooling layer prototype.

Parameters
  • pool_size (int or tuple[int], optional) – Height and width for pooling window, defaults to (2, 2).

  • strides (int or tuple[int], optional) – Height and width to shift the pooling window by, defaults to None which equals pool_size.

  • pool (function, optional) – Pooling function applied over each window, defaults to np.amax(). Use one of min (np.amin) or max (np.amax) pooling.
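
To make the pool argument concrete, here is a minimal NumPy sketch (illustrative values, not EpyNN code) showing how np.amax and np.amin behave on a single pooling window:

```python
import numpy as np

# A single 2x2 pooling window from one sample, one channel (illustrative values).
block = np.array([[1.0, 3.0],
                  [2.0, 4.0]])

# Max pooling keeps the largest value in the window.
print(np.amax(block))    # 4.0

# Min pooling keeps the smallest value in the window.
print(np.amin(block))    # 1.0
```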

Shapes

Pooling.compute_shapes(A)[source]

Wrapper for epynn.pooling.parameters.pooling_compute_shapes().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

def pooling_compute_shapes(layer, A):
    """Compute forward shapes and dimensions from input for layer.
    """
    X = A    # Input of current layer

    layer.fs['X'] = X.shape    # (m, h, w, d)

    layer.d['m'] = layer.fs['X'][0]    # Number of samples      (m)
    layer.d['h'] = layer.fs['X'][1]    # Height of features map (h)
    layer.d['w'] = layer.fs['X'][2]    # Width of features map  (w)
    layer.d['d'] = layer.fs['X'][3]    # Depth of features map  (d)

    # Output height (oh) and width (ow)
    layer.d['oh'] = math.floor((layer.d['h']-layer.d['ph']) / layer.d['sh']) + 1
    layer.d['ow'] = math.floor((layer.d['w']-layer.d['pw']) / layer.d['sw']) + 1

    return None

Within a Pooling layer, shapes of interest include:

  • Input X of shape (m, h, w, d) with m equal to the number of samples, h the height of features, w the width of features and d the depth of features.

  • Output height oh, equal to the difference between h and pooling height ph divided by the stride height sh. The float ratio is rounded down to the nearest integer, to which one is added.

  • Output width ow, equal to the difference between w and pooling width pw divided by the stride width sw. The float ratio is rounded down to the nearest integer, to which one is added.
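
These output dimensions can be checked with a short sketch, assuming a hypothetical 28x28 feature map pooled with a 2x2 window and strides of 2:

```python
import math

# Hypothetical dimensions: 28x28 feature map, 2x2 pooling window, strides of 2.
h, w = 28, 28    # Input height (h) and width (w)
ph, pw = 2, 2    # Pooling window height (ph) and width (pw)
sh, sw = 2, 2    # Stride height (sh) and width (sw)

# Same formulas as in pooling_compute_shapes()
oh = math.floor((h - ph) / sh) + 1    # Output height
ow = math.floor((w - pw) / sw) + 1    # Output width

print(oh, ow)    # 14 14
```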

_images/pool1-01.svg

Forward

Pooling.forward(A)[source]

Wrapper for epynn.pooling.forward.pooling_forward().

Parameters

A (numpy.ndarray) – Output of forward propagation from previous layer.

Returns

Output of forward propagation for current layer.

Return type

numpy.ndarray

def pooling_forward(layer, A):
    """Forward propagate signal to next layer.
    """
    # (1) Initialize cache
    X = initialize_forward(layer, A)

    # (2) Slice input w.r.t. pool size (ph, pw) and strides (sh, sw)
    Xb = np.array([[X[ :, h:h + layer.d['ph'], w:w + layer.d['pw'], :]
                    # Inner loop
                    # (m, h, w, d) ->
                    # (ow, m, h, pw, d)
                    for w in range(layer.d['w'] - layer.d['pw'] + 1)
                    if w % layer.d['sw'] == 0]
                # Outer loop
                # (ow, m, h, pw, d) ->
                # (oh, ow, m, ph, pw, d)
                for h in range(layer.d['h'] - layer.d['ph'] + 1)
                if h % layer.d['sh'] == 0])

    # (3) Bring back m along axis 0
    Xb = layer.fc['Xb'] = np.moveaxis(Xb, 2, 0)
    # (oh, ow, m, ph, pw, d) ->
    # (m, oh, ow, ph, pw, d)

    # (4) Apply pooling operation on blocks
    Z = layer.pool(Xb, axis=(4, 3))
    # (m, oh, ow, ph, pw, d) - Xb
    # (m, oh, ow, ph, d)     - layer.pool(Xb, axis=4)
    # (m, oh, ow, d)         - layer.pool(Xb, axis=(4, 3))

    A = layer.fc['A'] = layer.fc['Z'] = Z

    return A    # To next layer
_images/pool2-01.svg

The forward propagation function in a Pooling layer k includes:

  • (1): Input X in current layer k with shape (m, h, w, d) is equal to the output A of previous layer k-1.

  • (2): Xb is an array of blocks with shape (oh, ow, m, ph, pw, d) made by iterative slicing of X with respect to ph, pw and sh, sw.

  • (3): Given Xb with shape (oh, ow, m, ph, pw, d) the operation moves the axis 2 to the position 0, yielding Xb with shape (m, oh, ow, ph, pw, d).

  • (4): The layer output Z with shape (m, oh, ow, d) is computed by pooling each input block within Xb with shape (m, oh, ow, ph, pw, d) over the blocks dimension on axis 4 and 3.

Note that:

  • This is not the vanilla implementation of a Pooling layer. The naive implementation is fully iterative, whereas the version depicted here uses a stride-groups optimization that takes advantage of vectorized NumPy operations.

  • To preserve code homogeneity across layers, the output of the Pooling layer is A which is equal to Z.

  • Steps 1-3 are identical between the Convolution and Pooling layers, because both layers process the input X by blocks using window sizes and strides.
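
Steps (1) to (4) can be reproduced outside EpyNN with plain NumPy; a minimal sketch with hypothetical dimensions:

```python
import numpy as np

np.random.seed(1)

m, h, w, d = 2, 4, 4, 3    # Samples, height, width, depth (hypothetical)
ph, pw = 2, 2              # Pooling window size
sh, sw = 2, 2              # Strides

# (1) Input of the layer
X = np.random.rand(m, h, w, d)

# (2) Slice input into blocks, as in pooling_forward()
Xb = np.array([[X[:, i:i + ph, j:j + pw, :]
                for j in range(w - pw + 1) if j % sw == 0]
               for i in range(h - ph + 1) if i % sh == 0])

# (3) Bring m back along axis 0: (oh, ow, m, ph, pw, d) -> (m, oh, ow, ph, pw, d)
Xb = np.moveaxis(Xb, 2, 0)

# (4) Max pooling over the block axes
Z = np.amax(Xb, axis=(4, 3))

print(Z.shape)    # (2, 2, 2, 3)
```

Each output unit holds the maximum of its 2x2 input window; for instance, Z[0, 0, 0, 0] equals X[0, :2, :2, 0].max().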

\[\begin{split}\begin{alignat*}{2} & x^{k}_{mhwd} &&= a^{\km}_{mhwd} \tag{1} \\ & \_Xb^{k} &&= blocks(X^{k}) \tag{2} \\ & Xb^{k} &&= moveaxis(\_Xb^{k}) \tag{3} \\ & Z^{k} &&= pool(Xb^{k}) \tag{4} \end{alignat*}\end{split}\]
\[\begin{split}\begin{alignat*}{2} & where~blocks~is~defined~as: \\ &~~~~~~~~~~~~~~~~~~~~~~~blocks:\mathcal{M}_{M,H,W,D}(\mathbb{R}) &&\to \mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X = \mathop{(x_{mhwd})}_{\substack{1 \le m \le M \\ 1 \le h \le H \\ 1 \le w \le W \\ 1 \le d \le D}} &&\to Y = \mathop{(y_{o_{h}o_{w}mf_hf_wd})}_{\substack{1 \le o_h \le Oh \\ 1 \le o_w \le Ow \\ 1 \le m \le M \\ 1 \le f_h \le Fh \\ 1 \le f_w \le Fw \\ 1 \le d \le D}} \\ &~~~~~~~~~~~~~~~~~~~~~~with~Fh, Fw, Sh, Sw \in &&~\mathbb{N^*_+} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{h} \in &&~\{1,..,H-Fh~|~h \pmod{Sh} = 0\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\forall{w} \in &&~\{1,..,W-Fw~|~w \pmod{Sw} = 0\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Y = && X[:, h:h+Fh, w:w+Fw, :] \\ \\ & where~moveaxis~is~defined~as: \\ &~~~~~~~moveaxis:\mathcal{M}_{Oh,Ow,M,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y \\ \\ & where~pool~is~defined~as: \\ &~~~~~~~~~~~~~~~pool:\mathcal{M}_{M,Oh,Ow,Fh,Fw,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,D}(\mathbb{R}) \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X = \mathop{(x_{mo_{h}o_{w}f_hf_wd})}_{\substack{1 \le m \le M \\ 1 \le o_h \le Oh \\ 1 \le o_w \le Ow \\ 1 \le f_h \le Fh \\ 1 \le f_w \le Fw \\ 1 \le d \le D}} &&\to Y = \mathop{(y_{mo_{h}o_{w}d})}_{\substack{1 \le m \le M \\ 1 \le o_h \le Oh \\ 1 \le o_w \le Ow \\ 1 \le d \le D}} \\ \\ &~~~~~~~~~~~~~~~~~~~~\forall{m, o_h, o_w, d} \in \{1,..,M\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\times \{1,..,Oh\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\times \{1,..,Ow\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\times \{1,..,D\}&& \\ \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~y_{mo_{h}o_{w}d} &&= \max\limits_{f_{h} = 1}^{Fh}\max\limits_{f_{w} = 1}^{Fw} x_{mo_{h}o_{w}f_{h}f_{w}d} \end{alignat*}\end{split}\]

Backward

Pooling.backward(dX)[source]

Wrapper for epynn.pooling.backward.pooling_backward().

Parameters

dX (numpy.ndarray) – Output of backward propagation from next layer.

Returns

Output of backward propagation for current layer.

Return type

numpy.ndarray

def pooling_backward(layer, dX):
    """Backward propagate error gradients to previous layer.
    """
    # (1) Initialize cache
    dA = initialize_backward(layer, dX)    # (m, oh, ow, d)

    # (2) Restore pooling block axes
    dZ = dA
    dZ = np.expand_dims(dZ, axis=3)
    dZ = np.expand_dims(dZ, axis=3)
    # (m, oh, ow, d)         ->
    # (m, oh, ow, 1, d)      ->
    # (m, oh, ow, 1, 1, d)

    # (3) Initialize backward output dL/dX
    dX = np.zeros_like(layer.fc['X'])      # (m, h, w, d)

    # Iterate over forward output height
    for oh in range(layer.d['oh']):

        hs = oh * layer.d['sh']
        he = hs + layer.d['ph']

        # Iterate over forward output width
        for ow in range(layer.d['ow']):

            ws = ow * layer.d['sw']
            we = ws + layer.d['pw']

            # (4hw) Retrieve input block
            Xb = layer.fc['Xb'][:, oh, ow, :, :, :]
            # (m, oh, ow, ph, pw, d)  - Xb (array of blocks)
            # (m, ph, pw, d)          - Xb (single block)

            # (5hw) Retrieve pooled value and restore block shape
            Zb = layer.fc['Z'][:, oh:oh+1, ow:ow+1, :]
            Zb = np.repeat(Zb, layer.d['ph'], axis=1)
            Zb = np.repeat(Zb, layer.d['pw'], axis=2)
            # (m, oh, ow, d)    - Z
            # (m,  1,  1, d)    - Zb -> np.repeat(Zb, ph, axis=1)
            # (m, ph,  1, d)         -> np.repeat(Zb, pw, axis=2)
            # (m, ph, pw, d)

            # (6hw) Match pooled value in Zb against Xb
            mask = (Zb == Xb)

            # (7hw) Retrieve gradient w.r.t Z and restore block shape
            dZb = dZ[:, oh, ow, :]
            dZb = np.repeat(dZb, layer.d['ph'], 1)
            dZb = np.repeat(dZb, layer.d['pw'], 2)
            # (m, oh, ow,  1,  1, d) - dZ
            #         (m,  1,  1, d) - dZb -> np.repeat(dZb, ph, axis=1)
            #         (m, ph,  1, d)       -> np.repeat(dZb, pw, axis=2)
            #         (m, ph, pw, d)

            # (8hw) Keep dXb values for coordinates where Zb = Xb (mask)
            dXb = dZb * mask

            # (9hw) Gradient of the loss w.r.t Xb
            dX[:, hs:he, ws:we, :] += dXb
            # (m, ph, pw, d) - dX[:, hs:he, ws:we, :]
            # (m, ph, pw, d) - dXb

    layer.bc['dX'] = dX

    return dX
_images/pool3-01.svg

The backward propagation function in a Pooling layer k includes:

  • (1): dA with shape (m, oh, ow, d) is the gradient of the loss with respect to the output of forward propagation A for current layer k. It is equal to the gradient of the loss with respect to the input of forward propagation for next layer k+1.

  • (2): Two new axes are added before the last axis of dZ with shape (m, oh, ow, d) to reintroduce the dimensions corresponding to Xb with shape (m, oh, ow, ph, pw, d).

  • (3): The gradient of the loss dX with respect to the input of forward propagation X for current layer k is initialized as a zero array of the same shape as X.

  • (4hw): For each row h with respect to oh and for each column w with respect to ow, the input block Xb of coordinates [:, h, w, :, :, :] is retrieved.

  • (5hw): For each row h with respect to oh and for each column w with respect to ow, values pooled from Xb are retrieved from Z to yield Zb. Repeat operations are applied on Zb in order to reconstruct the shape of Xb.

  • (6hw): The mask is an array of the same shape as Xb and Zb. The conditional expression (Zb == Xb) returns True (1) at coordinates in Xb whose value equals the pooled value stored in Zb.

  • (7hw): dZb is retrieved from dZ with respect to h and w. Repeat operations are applied on dZb in order to reconstruct the shape of mask.

  • (8hw): dXb is the product of dZb by mask. Hence, dXb is zero at all coordinates except those corresponding to the pooled value.

  • (9hw): dXb is the gradient of the loss with respect to Xb and is added to dX with respect to the current window coordinates.

Note that:

  • One window represents a set of coordinates. One value with given coordinates may be part of more than one window. This is why the operation dX[:, hs:he, ws:we, :] += dXb is used instead of dX[:, hs:he, ws:we, :] = dXb.
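
The gradient routing through the mask can be illustrated on a single block (illustrative values, not EpyNN code):

```python
import numpy as np

# One 2x2 input block for one sample, one channel (illustrative values).
Xb = np.array([[1.0, 3.0],
               [2.0, 4.0]])

# Forward max pooling produced 4.0 for this block; restore the block
# shape as done with np.repeat() in pooling_backward().
Zb = np.full((2, 2), np.amax(Xb))

# The mask is True only where the pooled value came from.
mask = (Zb == Xb)

# Suppose the upstream gradient for this pooled unit is 0.5:
# it is routed entirely to the argmax position.
dZb = np.full((2, 2), 0.5)
dXb = dZb * mask

print(dXb)
# [[0.  0. ]
#  [0.  0.5]]
```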

\[\begin{split}\begin{alignat*}{2} & \delta^{\kp}_{mo_{h}o_{w}d} &&= \frac{\partial \mathcal{L}}{\partial a^{k}_{mo_{h}o_{w}d}} = \frac{\partial \mathcal{L}}{\partial x^{\kp}_{mo_{h}o_{w}d}} \tag{1} \\ & \frac{\partial \mathcal{L}}{\partial zb^{k}_{mo_{h}o_{w}11d}} &&= expand\_dims(\frac{\partial \mathcal{L}}{\partial a^{k}_{mo_{h}o_{w}d}}) \tag{2} \\ & \frac{\partial \mathcal{L}}{\partial X^{k}} &&= [\frac{\partial \mathcal{L}}{\partial x^{k}_{mhwd}}] \in \{0\}^{M \times H \times W \times D} \tag{3} \\ \\ & xb^{k<o_{h}o_{w}>}_{mp_{h}p_{w}d} &&= Xb^{k}_{mo_{h}o_{w}p_{h}p_{w}d}[:, h, w, :, :, :] \tag{4hw} \\ \\ & \_zb^{k<o_{h}o_{w}>}_{m11d} &&= z^{k}_{mo_{h}o_{w}d}[:, h:h+1, w:w+1, :] \tag{5.1hw} \\ & Zb^{k} &&= repeat(\_Zb) \tag{5.2hw} \\ \\ & M &&= \begin{cases} 1, & Xb = Zb, \\ 0, & Xb \ne Zb \end{cases} \tag{6hw} \\ \\ & \frac{\partial \mathcal{L}}{\partial \_zb^{k<o_{h}o_{w}>}_{m11d}} &&= \frac{\partial \mathcal{L}}{\partial z^{k}_{mo_{h}o_{w}11d}}[:, h, w, :, :, :] \tag{7.1hw} \\ & \frac{\partial \mathcal{L}}{\partial Zb^{k<o_{h}o_{w}>}} &&= repeat(\frac{\partial \mathcal{L}}{\partial \_Zb^{k<o_{h}o_{w}>}}) \tag{7.2hw} \\ \\ & \frac{\partial \mathcal{L}}{\partial Xb^{k<o_{h}o_{w}>}} &&= M * \frac{\partial \mathcal{L}}{\partial Zb^{k<o_{h}o_{w}>}} \tag{8hw} \\ \\ & \Delta\frac{\partial \mathcal{L}}{\partial X^{k<h_s:h_e,w_s:w_e>}_{mp_hp_wd}} &&= \frac{\partial \mathcal{L}}{\partial Xb^{k<o_{h}o_{w}>}_{mp_hp_wd}} \tag{9hw} \\ \end{alignat*}\end{split}\]
\[\begin{split}\begin{alignat*}{2} & where~expand\_dims~is~defined~as: \\ &~~~~~~expand\_dims:\mathcal{M}_{M,Oh,Ow,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Oh,Ow,1,1,D}(\mathbb{R}) \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X &&\to Y \\ \\ & where~repeat~is~defined~as: \\ &~~~~~~~~~~~~~~~~~~~~repeat:\mathcal{M}_{M,1,1,D}(\mathbb{R}) &&\to \mathcal{M}_{M,Ph,Pw,D}(\mathbb{R}) \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~X = \mathop{(x_{m11d})}_{\substack{1 \le m \le M \\ 1 \le d \le D}} &&\to Y = \mathop{(y_{mp_{h}p_{w}d})}_{\substack{1 \le m \le M \\ 1 \le p_h \le Ph \\ 1 \le p_w \le Pw \\ 1 \le d \le D}} \\ &~~~~~~~~~~~~~~~\forall{m, p_h, p_w, d} \in \{1,..,M\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\times \{1,..,Ph\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\times \{1,..,Pw\} \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\times \{1,..,D\} \\ \\ &~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~y_{mp_{h}p_{w}d} &&= x_{m11d} \end{alignat*}\end{split}\]

Gradients

Pooling.compute_gradients()[source]

Wrapper for epynn.pooling.parameters.pooling_compute_gradients(). Dummy method; there are no gradients to compute in this layer.

The Pooling layer is not a trainable layer. It has no trainable parameters such as weight W or bias b. Therefore, there are no parameter gradients to compute.