Pooling!? What Does That Imply?

Pooling In Convnets

In a Convolutional Neural Network there are two main processes: the first is convolution, which I covered in an earlier post, and the second is pooling. If you are familiar with CNNs, you'll probably notice that I've resisted the urge to use the term 'max pooling'. That's because pooling isn't based exclusively on maximums.

Pooling is the process in a Convolutional Neural Network by which feature maps are downsampled. In plain terms, the spatial size (height and width) of the features/images produced by a layer is reduced, which allows for better generalisation of the features it extracts.

Pooling works by sliding a window of a predefined size (the kernel, for example 2x2) over an image's pixels and, for each patch the window covers, returning either the maximum value in the patch (max pooling) or the average of all the pixels in the patch (average pooling) as a single pixel of its own.
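A minimal sketch of max pooling in plain Python, assuming a single-channel image stored as a list of rows, a 2x2 kernel, a stride of 2 and no padding. The function name `max_pool2d` and the sample pixel values are illustrative, not taken from any particular library.

```python
from typing import List

Image = List[List[float]]  # rows of pixel values


def max_pool2d(image: Image, kernel: int = 2, stride: int = 2) -> Image:
    """Max pooling over a 2D grid of pixels (no padding).

    Each output pixel is the maximum of one kernel x kernel patch, and the
    window moves `stride` pixels at a time. Rows/columns at the bottom or
    right edge that cannot fill a whole window are dropped.
    """
    out_h = (len(image) - kernel) // stride + 1
    out_w = (len(image[0]) - kernel) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Gather every pixel in the current patch.
            patch = [
                image[r][c]
                for r in range(top, top + kernel)
                for c in range(left, left + kernel)
            ]
            row.append(max(patch))  # keep only the strongest pixel
        pooled.append(row)
    return pooled


image = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]
print(max_pool2d(image))  # [[6, 2], [7, 9]]
```

Each of the four 2x2 patches collapses to its largest value, so the 4x4 input becomes a 2x2 output.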

Downsampling

So where does the downsampling/size reduction come in? If an image is pooled in non-overlapping 2x2 patches (the window moves two pixels at a time), the result has half the width and half the height of the original, and therefore a quarter of its pixels. For instance, an image with a resolution of 100 x 100 pixels becomes 50 x 50 pixels when pooled with a 2x2 kernel.
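For the no-padding case, the output size along each dimension follows the standard formula below, where $n_{\text{in}}$ is the input size, $k$ the kernel size and $s$ the stride (symbols introduced here for illustration). With $k = s = 2$ it reproduces the 100 to 50 example above.

```latex
n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} - k}{s} \right\rfloor + 1,
\qquad
\left\lfloor \frac{100 - 2}{2} \right\rfloor + 1 = 49 + 1 = 50
```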

What's quite interesting about pooling is that the overall structure of the image is preserved. Nothing is cropped; pooling simply highlights the most prominent pixels in an image/feature map (max pooling) or produces a general representation of all its pixels (average pooling).

Consider the cat image below. It originally has dimensions of 250 x 440 pixels; when max pooled with a 2x2 kernel it becomes 125 x 220 pixels, retaining its general features, although some pixelation begins to set in as the resolution drops.

When this is repeated a couple more times, the image becomes heavily pixelated, but notice that the general structure of the cat is still preserved.
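To see how quickly resolution falls when pooling is repeated, the sketch below tracks the size of the 250 x 440 cat image through three 2x2, stride-2 passes. It assumes no padding, so odd sizes are rounded down (62.5 becomes 62), which matches the default `ceil_mode=False` of PyTorch's `nn.MaxPool2d`. The helper names `pooled_size` and `pooling_schedule` are hypothetical.

```python
from typing import List, Tuple


def pooled_size(n: int, kernel: int = 2, stride: int = 2) -> int:
    # Output length along one dimension with no padding; a window that
    # does not fit is dropped, so the division is floored.
    return (n - kernel) // stride + 1


def pooling_schedule(height: int, width: int, steps: int) -> List[Tuple[int, int]]:
    """Return the (height, width) after each of `steps` 2x2 pooling passes."""
    sizes = []
    for _ in range(steps):
        height, width = pooled_size(height), pooled_size(width)
        sizes.append((height, width))
    return sizes


print(pooling_schedule(250, 440, 3))  # [(125, 220), (62, 110), (31, 55)]
```

After three passes the image holds roughly 1/64 of its original pixels, which is why the pixelation becomes so pronounced.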

The Lesser Known Sibling

Average pooling behaves similarly, but the downsampled image it produces is different. Average pooling tends to return a darker, smoother downsampled representation, while max pooling returns a brighter one. This is to be expected: max pooling returns the maximum (brightest) pixel in each patch, whereas average pooling returns the average of the pixels in it.
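The brightness difference can be checked directly. The sketch below assumes the same 2x2, stride-2, no-padding setup and a made-up 4x4 grid of brightness values; `pool2d` is a hypothetical helper that applies any reduction (here `max` or a mean) to each patch.

```python
from typing import Callable, List

Image = List[List[float]]


def pool2d(image: Image, reduce: Callable[[List[float]], float],
           kernel: int = 2, stride: int = 2) -> Image:
    """Generic 2D pooling: apply `reduce` to every kernel x kernel patch."""
    out_h = (len(image) - kernel) // stride + 1
    out_w = (len(image[0]) - kernel) // stride + 1
    return [
        [
            reduce([
                image[r][c]
                for r in range(i * stride, i * stride + kernel)
                for c in range(j * stride, j * stride + kernel)
            ])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


image = [
    [10, 200, 30, 40],
    [50, 60, 70, 250],
    [0, 0, 90, 90],
    [0, 100, 90, 90],
]
print(pool2d(image, max))   # [[200, 250], [100, 90]]
print(pool2d(image, mean))  # [[80.0, 97.5], [25.0, 90.0]]
```

Every max-pooled pixel is at least as bright as the average of the same patch, and the two agree only when the patch is uniform (the bottom-right patch here), which is why max pooling looks brighter overall.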

Pooling Feature Maps

In a Convolutional Neural Network, the feature maps produced by convolution are then pooled, returning a downsampled (smaller) representation that still closely approximates the pixels of the original feature map.
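A convolution layer usually outputs several feature maps (one per filter), and pooling is applied to each of them independently: the number of maps stays the same while each one shrinks. The sketch below assumes two made-up 4x4 feature maps standing in for the outputs of two filters; `pool_channels` is a hypothetical helper name.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: height x width


def max_pool2d(fmap: FeatureMap, kernel: int = 2, stride: int = 2) -> FeatureMap:
    # Max over each kernel x kernel window, moving `stride` pixels at a time.
    out_h = (len(fmap) - kernel) // stride + 1
    out_w = (len(fmap[0]) - kernel) // stride + 1
    return [
        [
            max(
                fmap[r][c]
                for r in range(i * stride, i * stride + kernel)
                for c in range(j * stride, j * stride + kernel)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def pool_channels(channels: List[FeatureMap]) -> List[FeatureMap]:
    # Each feature map is pooled on its own; channels are never mixed.
    return [max_pool2d(c) for c in channels]


feature_maps = [
    [[0, 1, 0, 0], [2, 0, 0, 3], [0, 0, 5, 0], [0, 4, 0, 0]],
    [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
]
pooled = pool_channels(feature_maps)
print(pooled)  # [[[2, 3], [4, 5]], [[1, 1], [0, 0]]]
```

Two 4x4 maps go in and two 2x2 maps come out; the strongest activation in each patch of each map survives.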

As seen in the image above, max pooling returns a brighter downsampled representation of the detected pixels.
