[Deep Learning Beginner] Maximum Pooling

by happynaraepapa 2025. 2. 28. 13:21

source: https://www.kaggle.com/code/ryanholbrook/maximum-pooling

Introduction
In Lesson 2 we began our discussion of how the base in a convnet performs feature extraction. We learned that the first two operations in this process occur in a Conv2D layer with ReLU activation.
In short: the previous lesson covered how a convnet extracts features, and showed that the first two steps happen in a Conv2D layer with ReLU activation.

In this lesson, we'll look at the third (and final) operation in this sequence: condense with maximum pooling, which in Keras is done by a MaxPool2D layer.

This lesson covers the third and last step of that extraction process: the condense step, done with maximum pooling through the Keras MaxPool2D layer.

Condense with Maximum Pooling
Adding the condensing step to the model we had before gives us this:
Let's add the condensing step to the existing model.
# Python code
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3), # activation is None
    layers.MaxPool2D(pool_size=2),
    # More layers follow
])

A MaxPool2D layer is much like a Conv2D layer, except that it uses a simple maximum function instead of a kernel, with the pool_size parameter analogous to kernel_size. A MaxPool2D layer doesn't have any trainable weights like a convolutional layer does in its kernel, however.
A MaxPool2D layer resembles a Conv2D layer, but it applies a plain maximum instead of a kernel. Its pool_size parameter plays the role of kernel_size, and unlike a convolutional layer it has no trainable weights.
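
To make the "no trainable weights" point concrete, here is a small arithmetic sketch in plain Python rather than Keras. The helper names are hypothetical, and the input channel count of 3 (an RGB image) is an assumption, since the model above does not specify its input.

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


def max_pool_param_count(pool_size):
    """Max pooling only takes a maximum over each window, so it learns nothing."""
    return 0


# in_channels=3 is an assumption for illustration (RGB input).
conv_params = conv2d_param_count(kernel_size=3, in_channels=3, filters=64)
pool_params = max_pool_param_count(pool_size=2)
print(conv_params, pool_params)  # 1792 0
```

The convolution's count grows with kernel size, input channels, and filters, while the pooling count stays at zero whatever pool_size is.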

Let's take another look at the extraction figure from the last lesson. Remember that MaxPool2D is the Condense step.
Look again at the feature extraction figure from the last lesson, keeping in mind that MaxPool2D is the Condense step.

An example of the feature extraction process.
Notice that after applying the ReLU function (Detect) the feature map ends up with a lot of "dead space," that is, large areas containing only 0's (the black areas in the image). Having to carry these 0 activations through the entire network would increase the size of the model without adding much useful information. Instead, we would like to condense the feature map to retain only the most useful part -- the feature itself.
In this example, once ReLU (the Detect step) is applied, the feature map is full of "dead space": regions whose output is 0. Carrying those zeros through the whole network takes up room without contributing meaningful information.
Condensing therefore means squeezing a feature map that contains dead space down to just the meaningful part.

This in fact is what maximum pooling does. Max pooling takes a patch of activations in the original feature map and replaces them with the maximum activation in that patch.

That is what max pooling does: it takes a patch of activations from the original feature map and replaces it with the largest activation in that patch.

Maximum pooling replaces a patch with the maximum value in that patch.
The figure shows each patch replaced by its maximum activation.
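
The patch-by-patch replacement described above can be sketched in a few lines of plain Python. This is an illustration only, not how Keras implements it: max_pool_2d is a hypothetical helper, the window does not overlap (stride equals pool_size), and the 4x4 feature map is made-up data.

```python
def max_pool_2d(feature_map, pool_size=2):
    """Non-overlapping max pooling over a 2D list (stride equals pool_size)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows, pool_size):
        pooled_row = []
        for c in range(0, cols, pool_size):
            # Collect the activations in this patch, clipping at the border.
            patch = [
                feature_map[i][j]
                for i in range(r, min(r + pool_size, rows))
                for j in range(c, min(c + pool_size, cols))
            ]
            pooled_row.append(max(patch))
        pooled.append(pooled_row)
    return pooled


# Made-up 4x4 feature map, mostly zeros as after ReLU.
feature_map = [
    [0, 0, 1, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 5],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[3, 1], [2, 5]]
```

Each 2x2 patch collapses to its single largest value, so the 4x4 map becomes a 2x2 map.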

When applied after the ReLU activation, it has the effect of "intensifying" features. The pooling step increases the proportion of active pixels to zero pixels.
Applied after ReLU, the pooling step "intensifies" features: it raises the proportion of active pixels relative to zero pixels.
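
A quick way to check the "proportion of active pixels" claim is to measure it before and after pooling. The sketch below uses made-up data and hypothetical helper names, and assumes the map's sides are divisible by the pool size.

```python
def max_pool_2d(fmap, k=2):
    """Non-overlapping k x k max pooling; assumes sides divisible by k."""
    return [
        [max(fmap[i][j] for i in range(r, r + k) for j in range(c, c + k))
         for c in range(0, len(fmap[0]), k)]
        for r in range(0, len(fmap), k)
    ]


def active_fraction(fmap):
    """Share of pixels with a positive activation."""
    values = [v for row in fmap for v in row]
    return sum(1 for v in values if v > 0) / len(values)


# Made-up post-ReLU feature map: 2 active pixels out of 16.
detected = [
    [0, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
condensed = max_pool_2d(detected)

print(active_fraction(detected))   # 0.125
print(active_fraction(condensed))  # 0.5
```

Both active pixels survive pooling, but the map shrinks from 16 pixels to 4, so the active share rises.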

Example - Apply Maximum Pooling
(Maximum pooling applied to an example)

Let's add the "condense" step to the feature extraction we did in the example in Lesson 2. This next hidden cell will take us back to where we left off.
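Let's attach the condense step to the feature extraction example from before. (The collapsed cell is the code from the earlier lesson; see the original notebook.)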

Poolstep
This time we apply the pooling step with tf.nn.pool, a function in tf.nn. It is a plain Python function that does much the same job as the MaxPool2D layer.

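# Python code
# image_detect comes from the hidden Lesson 2 cell (see the original notebook)
import matplotlib.pyplot as plt
import tensorflow as tf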
image_condense = tf.nn.pool(
    input=image_detect, # image in the Detect step above
    window_shape=(2, 2),
    pooling_type='MAX',
    # we'll see what these do in the next lesson!
    strides=(2, 2),
    padding='SAME',
)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.show()
====
Pretty cool! Hopefully you can see how the pooling step was able to intensify the feature by condensing the image around the most active pixels.
In the result, note how the pooling step strengthens the active features and condenses the image around them.

Translation Invariance
(Tentative Korean rendering: 이동불변, "shift invariance"?)
We called the zero-pixels "unimportant". Does this mean they carry no information at all? In fact, the zero-pixels carry positional information. The blank space still positions the feature within the image. When MaxPool2D removes some of these pixels, it removes some of the positional information in the feature map. This gives a convnet a property called translation invariance.

This means that a convnet with maximum pooling will tend not to distinguish features by their location in the image. ("Translation" is the mathematical word for changing the position of something without rotating it or changing its shape or size.)

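Earlier we called the zero-pixels "unimportant". Do they really carry nothing useful? In fact, although their value is 0, they still record where the feature sits in the image.
When MaxPool2D removes some of these pixels, part of that positional information disappears from the feature map. The resulting property is called translation invariance (how best to render this term in Korean still needs some thought).
It means that a convnet with maximum pooling tends not to tell features apart by where they are in the image.
(Here "translation" means moving something to a new position without rotating it or changing its shape or size.)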
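
The word "tend" matters here, and a small plain-Python sketch shows why. It pools a map with a single active pixel at three positions; the helpers are hypothetical and the data is made up. A shift that stays inside one pooling window leaves the output unchanged, while a shift that crosses into another window does not.

```python
def max_pool_2d(fmap, k=2):
    """Non-overlapping k x k max pooling; assumes sides divisible by k."""
    return [
        [max(fmap[i][j] for i in range(r, r + k) for j in range(c, c + k))
         for c in range(0, len(fmap[0]), k)]
        for r in range(0, len(fmap), k)
    ]


def single_dot(size, row, col):
    """A size x size map of zeros with one active pixel."""
    fmap = [[0] * size for _ in range(size)]
    fmap[row][col] = 1
    return fmap


original = max_pool_2d(single_dot(4, 0, 0))
shifted_within_window = max_pool_2d(single_dot(4, 1, 1))
shifted_across_windows = max_pool_2d(single_dot(4, 2, 2))

print(original == shifted_within_window)   # True
print(original == shifted_across_windows)  # False
```

So a single pooling layer ignores only small shifts; larger ones still change the output.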

Watch what happens when we repeatedly apply maximum pooling to the following feature map.
Let's see what happens when maximum pooling is applied repeatedly to the feature map below.

Pooling tends to destroy positional information.
As noted above, pooling tends to destroy positional information.

The two dots in the original image became indistinguishable after repeated pooling. In other words, pooling destroyed some of their positional information. Since the network can no longer distinguish between them in the feature maps, it can't distinguish them in the original image either: it has become invariant to that difference in position.
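The two nearby dots in the image turned into a single, indistinguishable dot as max pooling was repeated. Put differently, the dots lost part of their individual positional information. The network can then no longer tell them apart in the feature maps, so it cannot tell that they differ in the original image either: the output stays the same despite the change in input position, which is translation invariance.
# Translation note: for now I render "translation invariant" as 불변성 (invariance) and its counterpart, "translation equivariant", as 가변성 (variability).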

In fact, pooling only creates translation invariance in a network over small distances, as with the two dots in the image. Features that begin far apart will remain distinct after pooling; only some of the positional information was lost, but not all of it.
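In practice, repeated pooling makes a network invariant only over small distances: two nearby dots lose their distinction, while two dots that start far apart keep their difference in position. Only part of the positional information is lost.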
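
The near-versus-far behaviour can be reproduced with a toy experiment. The sketch below is plain Python with hypothetical helpers and made-up dot positions on a 16x16 map; it counts how many active pixels remain after each of three rounds of 2x2 pooling.

```python
def max_pool_2d(fmap, k=2):
    """Non-overlapping k x k max pooling; assumes sides divisible by k."""
    return [
        [max(fmap[i][j] for i in range(r, r + k) for j in range(c, c + k))
         for c in range(0, len(fmap[0]), k)]
        for r in range(0, len(fmap), k)
    ]


def dots(size, positions):
    """A size x size map of zeros with an active pixel at each position."""
    fmap = [[0] * size for _ in range(size)]
    for row, col in positions:
        fmap[row][col] = 1
    return fmap


def count_active(fmap):
    return sum(1 for row in fmap for v in row if v > 0)


def pool_repeatedly(fmap, times):
    """Active-pixel count before pooling and after each pooling round."""
    counts = [count_active(fmap)]
    for _ in range(times):
        fmap = max_pool_2d(fmap)
        counts.append(count_active(fmap))
    return counts


near = pool_repeatedly(dots(16, [(2, 2), (5, 5)]), times=3)
far = pool_repeatedly(dots(16, [(2, 2), (13, 13)]), times=3)
print(near)  # [2, 2, 2, 1]
print(far)   # [2, 2, 2, 2]
```

The nearby pair merges into one pixel on the third round, while the distant pair is still two separate pixels.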

But only over small distances. Two dots far apart stay separated.
The invariance holds only over small distances: two dots that start far apart remain separate after pooling.

This invariance to small differences in the positions of features is a nice property for an image classifier to have. Just because of differences in perspective or framing, the same kind of feature might be positioned in various parts of the original image, but we would still like for the classifier to recognize that they are the same. Because this invariance is built into the network, we can get away with using much less data for training: we no longer have to teach it to ignore that difference. This gives convolutional networks a big efficiency advantage over a network with only dense layers. (You'll see another way to get invariance for free in Lesson 6 with Data Augmentation!)
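This invariance to small shifts in position is a very useful property for an image classifier.

Because of perspective or framing, features of the same kind can land in different positions. Built-in invariance lets the classifier treat them as the same without learning each position separately, so it can get by with less training data. (Data Augmentation, covered later, is another way to get invariance.)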
