[Deep Learning Basics] Custom Convnet


by happynaraepapa 2025. 3. 7. 13:53

sources :
Introduction
Now that you've learned the fundamentals of convolutional classifiers, you're ready to move on to more advanced topics.

In this lesson, you'll learn a trick that can give a boost to your image classifiers: it's called data augmentation.

In this lesson we will learn a trick that can upgrade the image classifiers we have been building. The trick is data augmentation.

augmentation: an increase or enlargement
data augmentation: enlarging a dataset with transformed copies of the data it already contains
The Usefulness of Fake Data
The best way to improve the performance of a machine learning model is to train it on more data. The more examples the model has to learn from, the better it will be able to recognize which differences in images matter and which do not. More data helps the model to generalize better.

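One of the best ways to improve the performance of a machine learning model is simply to train it on more data. With more examples to learn from, the model gets better at recognizing images and telling them apart, so more data generally helps the model generalize.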

One easy way of getting more data is to use the data you already have. If we can transform the images in our dataset in ways that preserve the class, we can teach our classifier to ignore those kinds of transformations. For instance, whether a car is facing left or right in a photo doesn't change the fact that it is a Car and not a Truck. So, if we augment our training data with flipped images, our classifier will learn that "left or right" is a difference it should ignore.

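So where do we get more data? One of the easiest ways is to reuse the data we already have. If we copy images from the dataset and transform the copies in ways that keep their class unchanged, we can train the model to ignore those transformations and still pick the correct class.

For example, in the car-or-truck problem we are working on, flipping a photo horizontally or vertically does not change whether it shows a car or a truck. So we can add flipped photos to augment the data available for training.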

And that's the whole idea behind data augmentation: add in some extra fake data that looks reasonably like the real data and your classifier will improve.
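The flipping idea above can be sketched in a few lines of plain Python. This is only an illustration: the tiny nested-list "image", its pixel values, and the helper names `flip_left_right` and `augment_with_flips` are made up here and are not part of the course code.

```python
# A tiny 3x4 grayscale "image" as nested lists (made-up values, for illustration only).
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
label = "Car"


def flip_left_right(img):
    """Mirror an image horizontally by reversing every row."""
    return [row[::-1] for row in img]


def augment_with_flips(examples):
    """Return the original (image, label) pairs plus a flipped copy of each."""
    augmented = []
    for img, lbl in examples:
        augmented.append((img, lbl))
        # The flipped copy keeps the same label: flipping preserves the class.
        augmented.append((flip_left_right(img), lbl))
    return augmented


dataset = augment_with_flips([(image, label)])
print(len(dataset))  # twice as many examples as we started with
```

The important detail is that the label travels with the transformed image unchanged, which is what "preserving the class" means in practice.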

Using Data Augmentation
Typically, many kinds of transformation are used when augmenting a dataset. These might include rotating the image, adjusting the color or contrast, warping the image, or many other things, usually applied in combination. Here is a sample of the different ways a single image might be transformed.

Image data is usually augmented with a variety of methods: rotation, color and contrast adjustment, shifting the image, and so on, often several at once. The figure below shows examples of such transformations.

Sixteen transformations of a single image of a car.
Data augmentation is usually done online, meaning as the images are being fed into the network for training. Recall that training is usually done on mini-batches of data. This is what a batch of 16 images might look like when data augmentation is used.

Here "online" means the augmentation happens while the images are being fed to the network for training, not over a network connection. Training runs on mini-batches, and below is one batch of 16 images after augmentation.

A batch of 16 images with various random transformations applied.
Each time an image is used during training, a new random transformation is applied. This way, the model is always seeing something a little different than what it's seen before. This extra variance in the training data is what helps the model on new data.

During training, a new random transformation is applied to each image, so the model always receives input slightly different from what it saw before. Training data varied in this way improves the model's performance on new data.
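To make the "new random transformation each time" point concrete, here is a minimal pure-Python sketch of online augmentation. It only mimics the idea: the placeholder images, the 50% flip probability, and the names `random_flip` and `augmented_batches` are assumptions for illustration, not how Keras implements it.

```python
import random


def flip_left_right(img):
    """Mirror an image (nested lists) horizontally."""
    return [row[::-1] for row in img]


def random_flip(img, rng):
    """Flip the image with probability 0.5, otherwise return it unchanged."""
    return flip_left_right(img) if rng.random() < 0.5 else img


def augmented_batches(images, batch_size, rng):
    """Yield mini-batches, drawing a fresh random transformation for every image."""
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        yield [random_flip(img, rng) for img in batch]


rng = random.Random(0)
images = [[[i, i + 1]] for i in range(4)]  # four 1x2 placeholder images

# Each epoch re-reads the same stored images but may see different versions of them.
epochs = [list(augmented_batches(images, batch_size=2, rng=rng)) for _ in range(3)]
```

Note that the stored dataset is never modified. The transformed copies exist only for the batch in which they are used, which is why this style of augmentation costs no extra storage.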

It's important to remember though that not every transformation will be useful on a given problem. Most importantly, whatever transformations you use should not mix up the classes. If you were training a digit recognizer, for instance, rotating images would mix up '9's and '6's. In the end, the best approach for finding good augmentations is the same as with most ML problems: try it and see!

Not every transformation helps model performance, and no transformation should mix up the classes. For example, when training a digit recognizer, a rotated '9' would be confused with a '6'. In the end, finding good augmentations comes down to the same thing as most machine learning problems: try it and see!
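The '6' versus '9' problem is easy to check by hand. The sketch below uses two made-up 5x3 text glyphs (not real digit images) and shows that a 180-degree rotation turns one class into the other, while a horizontal flip does not.

```python
# Hypothetical 5x3 glyphs: '#' is ink, '.' is background.
SIX = ["###", "#..", "###", "#.#", "###"]
NINE = ["###", "#.#", "###", "..#", "###"]


def rotate_180(glyph):
    """Rotate by 180 degrees: reverse the row order, then reverse each row."""
    return [row[::-1] for row in reversed(glyph)]


def flip_left_right(glyph):
    """Mirror horizontally by reversing each row."""
    return [row[::-1] for row in glyph]


print(rotate_180(SIX) == NINE)       # True: the rotated '6' is exactly the '9' glyph
print(flip_left_right(SIX) == NINE)  # False: a mirrored '6' is not a '9'
```

A rotated '6' that keeps the label "6" would be teaching the model something false, so rotation is a poor augmentation for this task even though it is harmless for cars and trucks.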

Example - Training with Data Augmentation
Keras lets you augment your data in two ways. The first way is to include it in the data pipeline with a utility like ImageDataGenerator. The second way is to include it in the model definition by using Keras's preprocessing layers. This is the approach that we'll take. The primary advantage for us is that the image transformations will be computed on the GPU instead of the CPU, potentially speeding up training.

Keras offers two ways to augment data. One is to use a utility such as ImageDataGenerator in the data pipeline; the other is to build preprocessing layers into the model definition. We will use the latter. Its biggest advantage is that the image transformations can run on the GPU, which may speed up training.

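In this exercise, we'll learn how to improve the classifier from Lesson 1 through data augmentation. This next hidden cell (not reproduced here) sets up the data pipeline and provides the ds_train and ds_valid datasets used below.

The example below shows how data augmentation can improve the image classifier we built earlier. The next step sets up the data pipeline.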

Step 2 - Define Model
To illustrate the effect of augmentation, we'll just add a couple of simple transformations to the model from Tutorial 1.

To see the effect of augmentation, we add a few simple transformations to the model from Tutorial 1 and compare.

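import tensorflow as tf  # needed for tf.keras.models.load_model below
from tensorflow import keras
from tensorflow.keras import layers
# these are a new feature in TF 2.2
# (newer TensorFlow releases expose the same layers directly,
#  e.g. layers.RandomFlip and layers.RandomContrast)
from tensorflow.keras.layers.experimental import preprocessing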

pretrained_base = tf.keras.models.load_model(
    '../input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
pretrained_base.trainable = False

model = keras.Sequential([
    # Preprocessing
    preprocessing.RandomFlip('horizontal'), # flip left-to-right
    preprocessing.RandomContrast(0.5), # contrast change by up to 50%
    # Base
    pretrained_base,
    # Head
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
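To get a feel for what the contrast layer does to pixel values, here is a rough pure-Python sketch. It assumes the behavior described in the Keras documentation for RandomContrast: a factor is drawn from [1 - 0.5, 1 + 0.5] and each value x in a channel becomes (x - mean) * factor + mean. The helper names and pixel values are made up, and details such as clipping to the valid pixel range are left out.

```python
import random


def adjust_contrast(pixels, factor):
    """Scale each pixel's distance from the channel mean by `factor`."""
    mean = sum(pixels) / len(pixels)
    return [(p - mean) * factor + mean for p in pixels]


def random_contrast(pixels, strength, rng):
    """Draw a factor from [1 - strength, 1 + strength] and apply it."""
    factor = rng.uniform(1.0 - strength, 1.0 + strength)
    return adjust_contrast(pixels, factor), factor


pixels = [0.2, 0.4, 0.6, 0.8]  # one flattened channel, made-up values

# factor < 1 pulls values toward the mean (lower contrast),
# factor > 1 pushes them away from it (higher contrast).
low_contrast = adjust_contrast(pixels, 0.5)
high_contrast = adjust_contrast(pixels, 1.5)

augmented, factor = random_contrast(pixels, strength=0.5, rng=random.Random(0))
```

Because the mean brightness is preserved and only the spread changes, a car still looks like a car afterwards, which is what makes this a class-preserving transformation for this dataset.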
Step 3 - Train and Evaluate
And now we'll start the training!

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=30,
    verbose=0,
)

# Plot the learning curves recorded during training
import pandas as pd

history_frame = pd.DataFrame(history.history)

history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot();
The training and validation curves in the model from Tutorial 1 diverged fairly quickly, suggesting that it could benefit from some regularization. The learning curves for this model were able to stay closer together, and we achieved some modest improvement in validation loss and accuracy. This suggests that the dataset did indeed benefit from the augmentation.

Your Turn
Move on to the Exercise to apply data augmentation to the custom convnet you built in Lesson 5. This will be your best model ever!

Have questions or comments? Visit the course discussion forum to chat with other learners.

Features extracted from an image of a car, from simple to refined.
Convolutional Blocks
A convnet refines the features it extracts from an image, from simple to complex, by passing them through long chains of convolutional blocks that perform this extraction.

Extraction as a sequence of blocks.
These convolutional blocks are stacks of Conv2D and MaxPool2D layers, whose role in feature extraction we learned about in the last few lessons.

A kind of extraction block: convolution, ReLU, pooling.
Each block represents a round of extraction, and by composing these blocks the convnet can combine and recombine the features produced, growing them and shaping them to better fit the problem at hand. The deep structure of modern convnets is what allows this sophisticated feature engineering and has been largely responsible for their superior performance.
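One round of extraction (convolution, ReLU, pooling) can be written out by hand on a tiny single-channel image. This is a simplified sketch, not the Keras implementation: it uses one hand-picked kernel instead of learned filters, no bias, and "valid" padding. The image values and the edge-detecting kernel are made up for illustration.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def extraction_block(image, kernel):
    """One block: filter (conv), detect (ReLU), condense (max pool)."""
    return max_pool(relu(conv2d(image, kernel)))


# 5x5 image with a dark-to-bright vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

features = extraction_block(image, kernel)
print(features)  # [[2, 0], [2, 0]]: the edge survives, flat regions vanish
```

Stacking several such blocks, each working on the output of the previous one, is what lets a deep convnet build refined features out of simple ones.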

Example - Design a Convnet
Let's see how to define a deep convolutional network capable of engineering complex features. In this example, we'll create a Keras Sequential model and then train it on our Cars dataset.

Step 1 - Load Data
This hidden cell loads the data.

Step 2 - Define Model
Here is a diagram of the model we'll use:

Diagram of a convolutional model.
Now we'll define the model. See how our model consists of three blocks of Conv2D and MaxPool2D layers (the base) followed by a head of Dense layers. We can translate this diagram more or less directly into a Keras Sequential model just by filling in the appropriate parameters.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([

    # First Convolutional Block
    layers.Conv2D(filters=32, kernel_size=5, activation="relu", padding='same',
                  # give the input dimensions in the first layer
                  # [height, width, color channels(RGB)]
                  input_shape=[128, 128, 3]),
    layers.MaxPool2D(),

    # Second Convolutional Block
    layers.Conv2D(filters=64, kernel_size=3, activation="relu", padding='same'),
    layers.MaxPool2D(),

    # Third Convolutional Block
    layers.Conv2D(filters=128, kernel_size=3, activation="relu", padding='same'),
    layers.MaxPool2D(),

    # Classifier Head
    layers.Flatten(),
    layers.Dense(units=6, activation="relu"),
    layers.Dense(units=1, activation="sigmoid"),
])
model.summary()
Notice in this definition how the number of filters doubles block by block: 32, 64, 128. This is a common pattern. Since the MaxPool2D layer reduces the size of the feature maps, we can afford to increase the number of feature maps we create.
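The trade-off between fewer pixels and more filters can be checked with a little arithmetic. The sketch below walks through the model defined above, assuming the defaults it relies on: padding='same' keeps height and width through each Conv2D, and MaxPool2D() with its default 2x2 window halves both. The helper name `conv_params` is made up for this illustration.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a Conv2D layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


height, width, channels = 128, 128, 3   # input_shape from the model above
blocks = [(32, 5), (64, 3), (128, 3)]   # (filters, kernel_size) per block

total_params = 0
report = []
for filters, kernel_size in blocks:
    params = conv_params(kernel_size, channels, filters)
    total_params += params
    # padding='same' keeps height and width, so the conv output is H x W x filters.
    conv_activations = height * width * filters
    # MaxPool2D() with the default 2x2 window halves height and width.
    height, width, channels = height // 2, width // 2, filters
    report.append((filters, params, conv_activations, (height, width, channels)))

flattened = height * width * channels   # input size of the Dense head
dense_hidden = (flattened + 1) * 6      # Dense(units=6)
dense_output = (6 + 1) * 1              # Dense(units=1)
total_params += dense_hidden + dense_output

for row in report:
    print(row)
print(flattened, total_params)
```

Each pooling step cuts the number of positions by four while the filter count only doubles, so the number of activations per block shrinks (524288, 262144, 131072) even as the features become richer. The total should agree with the parameter count that model.summary() reports.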

Step 3 - Train
We can train this model just like the model from Lesson 1: compile it with an optimizer along with a loss and metric appropriate for binary classification.

import tensorflow as tf  # needed for tf.keras.optimizers below

model.compile(
    optimizer=tf.keras.optimizers.Adam(epsilon=0.01),
    loss='binary_crossentropy',
    metrics=['binary_accuracy']
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=40,
    verbose=0,
)

# Plot the learning curves recorded during training
import pandas as pd

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot();
This model is much smaller than the VGG16 model from Lesson 1: only 3 convolutional layers versus the 13 of VGG16 (whose name counts 16 weight layers, including its 3 dense layers). It was nevertheless able to fit this dataset fairly well. We might still be able to improve this simple model by adding more convolutional layers, hoping to create features better adapted to the dataset. This is what we'll try in the exercises.

Conclusion
In this tutorial, you saw how to build a custom convnet composed of many convolutional blocks and capable of complex feature engineering.

Your Turn
In the exercises, you'll create a convnet that performs as well on this problem as VGG16 does -- without pretraining! Try it now!

