Week 7-8 Study Notes

Understanding Image Data: Preprocessing and Classification Techniques

Image data is a core resource in AI and computer vision.
An image is more than a collection of pixels: it has properties such as resolution, channels, and color space, and handling these correctly has a major impact on model performance. Image classification, in turn, can be carried out in several ways depending on the characteristics of the data. This post covers the basic components of image data, preprocessing techniques, and the main classification settings.


1. ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์˜ ๊ธฐ๋ณธ ์„ฑ๋ถ„

ํ•ด์ƒ๋„ (Resolution)

ํ”ฝ์…€ (Pixel)

์ฑ„๋„ (Channel)

์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์˜ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ


2. ์ด๋ฏธ์ง€ ์ „์ฒ˜๋ฆฌ์™€ EDA (Exploratory Data Analysis)

EDA (ํƒ์ƒ‰์  ๋ฐ์ดํ„ฐ ๋ถ„์„)

์ด๋ฏธ์ง€๋ฅผ ๋ถ„์„ํ•  ๋•Œ ๋‹ค์Œ ์‚ฌํ•ญ๋“ค์„ ๊ณ ๋ คํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์ „์ฒ˜๋ฆฌ๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€์˜ ์˜๋ฏธ ์žˆ๋Š” feature์™€ representation์„ ์ถ”์ถœํ•˜๋ฉด, ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ๊ณผ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


3. Color Space Conversion

A color space is a mathematical model for representing color digitally.
Common examples include RGB, HSV, Lab, YCbCr, and Grayscale.
A conversion example using OpenCV:

import cv2

# ์›๋ณธ BGR ์ด๋ฏธ์ง€ ๋กœ๋“œ
img = cv2.imread('image.jpg')

# BGR -> HSV
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# BGR -> LAB
img_lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)

# BGR -> YCrCb
img_ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

# BGR -> Grayscale
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

๋˜ํ•œ, ํžˆ์Šคํ† ๊ทธ๋žจ ํ‰ํ™œํ™”(histogram equalization) ๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€์˜ contrast๋ฅผ ๊ฐœ์„ ํ•˜๊ณ  ๋””ํ…Œ์ผ์„ ๋ถ€๊ฐ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


4. Geometric Transform & Data Augmentation

Geometric Transform

Techniques that transform an image's shape, size, or position.

Example (rotation):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
row, col = img.shape[:2]
# Rotate 90 degrees around the image center
matrix = cv2.getRotationMatrix2D((col / 2, row / 2), 90, 1)
new_img = cv2.warpAffine(img, matrix, (col, row))

Data Augmentation

A technique that increases data diversity to improve a model's robustness and generalization performance.

Example (Albumentations):

import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.RandomCrop(height=224, width=224)
])

5. Normalization & Batch Normalization

Normalization

Scaling pixel values to a fixed range, which improves the convergence speed and stability of training.

Example (PyTorch):

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                         std=[0.229, 0.224, 0.225])
])

Batch Normalization

Normalizes inputs per mini-batch, reducing internal covariate shift and allowing higher learning rates with more stable training.


6. ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜

์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๋Š” ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์ „์— ์ •์˜๋œ ํด๋ž˜์Šค์— ํ• ๋‹นํ•˜๋Š” ์ž‘์—…์ž…๋‹ˆ๋‹ค.
๋ถ„๋ฅ˜ ๋ฌธ์ œ๋Š” ์—ฌ๋Ÿฌ ๋ฐฉ์‹์œผ๋กœ ๋‚˜๋‰ฉ๋‹ˆ๋‹ค.

6-1. Binary Classification

6-2. Multi-Class Classification

6-3. Multi-Label Classification

6-4. Coarse-Grained vs. Fine-Grained Classification

6-5. N-Shot Classification
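The practical difference between multi-class and multi-label classification shows up in how the model's output is interpreted: softmax picks exactly one class, while independent sigmoids allow any subset of labels. A small sketch with hypothetical logits for four classes:

```python
import torch

# Hypothetical model output (logits) for one image and 4 classes
logits = torch.tensor([[2.0, 0.5, -1.0, 1.5]])

# Multi-class: softmax over classes, exactly one prediction
probs = torch.softmax(logits, dim=1)
pred_class = probs.argmax(dim=1)   # tensor([0])

# Multi-label: an independent sigmoid per class, thresholded separately
probs_ml = torch.sigmoid(logits)
pred_labels = probs_ml > 0.5       # tensor([[True, True, False, True]])
```

This is also why multi-class models are usually trained with CrossEntropyLoss while multi-label models use BCEWithLogitsLoss.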


๊ฒฐ๋ก 

์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋Š” ํ•ด์ƒ๋„, ํ”ฝ์…€, ์ฑ„๋„, ์ƒ‰๊ณต๊ฐ„ ๋“ฑ ๋‹ค์–‘ํ•œ ์„ฑ๋ถ„์œผ๋กœ ๊ตฌ์„ฑ๋˜๋ฉฐ, ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์—์„œ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๊ณผ ๊ณ„์‚ฐ ํšจ์œจ์„ฑ์— ํฐ ์˜ํ–ฅ์„ ๋ฏธ์นฉ๋‹ˆ๋‹ค.


Inductive Bias์™€ Representation: ๋ชจ๋ธ์˜ ์„ ์ž…๊ฒฌ์ด ๋งŒ๋“œ๋Š” ๋ฐ์ดํ„ฐ ํ‘œํ˜„

๋จธ์‹  ๋Ÿฌ๋‹ ๋ชจ๋ธ์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹จ์ˆœํžˆ ์•”๊ธฐํ•˜์ง€ ์•Š๊ณ , ์ผ๋ฐ˜ํ™”๋œ ํŒจํ„ด์„ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด ํŠน์ • ๊ฐ€์ •์„ ๋‚ดํฌํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ inductive bias(๊ท€๋‚ฉ์  ํŽธํ–ฅ) ๋ผ๊ณ  ํ•˜๋ฉฐ, ์ด bias๊ฐ€ ๋ชจ๋ธ์ด ๋ฐ์ดํ„ฐ๋ฅผ ์–ด๋–ป๊ฒŒ ํ•ด์„ํ•˜๊ณ  ํ‘œํ˜„ํ•˜๋Š”์ง€ ๊ฒฐ์ •ํ•˜๋Š” ํ•ต์‹ฌ ์š”์†Œ์ž…๋‹ˆ๋‹ค.

1. Inductive Bias๋ž€?

Inductive bias๋Š” ๋ชจ๋ธ์ด ํ•™์Šตํ•  ๋•Œ ๋ฐ์ดํ„ฐ์—์„œ ํŠน์ • ํŒจํ„ด์ด๋‚˜ ๊ด€๊ณ„๋ฅผ ์ฐพ์œผ๋ ค๋Š” ๊ฒฝํ–ฅ, ํ˜น์€ ๋ฐ์ดํ„ฐ๋ฅผ ํŠน์ • ๋ฐฉ์‹์œผ๋กœ ํ•ด์„ํ•˜๋ ค๋Š” ๊ฐ€์ •์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
์˜ˆ๋ฅผ ๋“ค์–ด:

๋งŒ์•ฝ ๋ชจ๋ธ์— inductive bias๊ฐ€ ์—†๋‹ค๋ฉด, ๋ชจ๋ธ์€ ๋‹จ์ˆœํžˆ ๋ฐ์ดํ„ฐ๋ฅผ ์•”๊ธฐ(overfitting)ํ•˜๊ฒŒ ๋˜์–ด ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์ด ํฌ๊ฒŒ ๋–จ์–ด์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ bias๋ฅผ ์ดํ•ดํ•˜๋ฉด, ๋ฌธ์ œ์˜ ํŠน์„ฑ์— ๋งž๋Š” ๋ชจ๋ธ์„ ์„ ํƒํ•˜๊ณ  ์„ค๊ณ„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

2. What Is a Representation?

A representation is how data is transformed and handled inside a model.

In deep learning, models learn representations automatically, without manual feature engineering, but those representations are strongly shaped by the model's inductive bias.

3. Representation์€ Inductive Bias์˜ ์‚ฐ๋ฌผ์ด๋‹ค

๊ฒฐ๊ตญ, ๋ชจ๋ธ์ด ๋ฐ์ดํ„ฐ๋ฅผ ์–ด๋–ป๊ฒŒ ํ‘œํ˜„ํ•˜๋Š”์ง€๋Š” ๊ทธ ๋ชจ๋ธ์ด ๊ฐ€์ง€๋Š” inductive bias์— ์˜ํ•ด ๊ฒฐ์ •๋ฉ๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ, representation์€ ๊ฒฐ๊ตญ ๋ชจ๋ธ์˜ inductive bias์˜ ์‚ฐ๋ฌผ์ž…๋‹ˆ๋‹ค. ์ด ๋ง์€, ๋ชจ๋ธ์ด ๋ฐ์ดํ„ฐ์—์„œ ์–ด๋–ค ํŒจํ„ด์„ ํ•™์Šตํ• ์ง€, ๊ทธ๋ฆฌ๊ณ  ํ•™์Šตํ•œ ํŒจํ„ด์„ ์–ด๋–ค ๋ฐฉ์‹์œผ๋กœ ํ‘œํ˜„ํ• ์ง€๋Š” ๋ชจ๋ธ์ด ๋‚ดํฌํ•˜๊ณ  ์žˆ๋Š” ์„ ์ž…๊ฒฌ์— ๋”ฐ๋ผ ๊ฒฐ์ •๋œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค.


CNN, ViT, ๊ทธ๋ฆฌ๊ณ  Hybrid ๋ชจ๋ธ: Computer Vision์˜ ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„

์ปดํ“จํ„ฐ ๋น„์ „ ๋ถ„์•ผ์—์„œ๋Š” ์˜ค๋žœ ๊ธฐ๊ฐ„ ๋™์•ˆ Convolutional Neural Network (CNN)์ด ํ‘œ์ค€ ๋ชจ๋ธ๋กœ ์ž๋ฆฌ ์žก์•„์™”์Šต๋‹ˆ๋‹ค. ์ตœ๊ทผ์—๋Š” Transformer ๊ธฐ๋ฐ˜์˜ Vision Transformer (ViT)๊ฐ€ ์ฃผ๋ชฉ๋ฐ›์œผ๋ฉฐ, CNN์˜ ํ•œ๊ณ„๋ฅผ ๋ณด์™„ํ•˜๋Š” Hybrid ๋ชจ๋ธ๋“ค๋„ ํ™œ๋ฐœํ•˜๊ฒŒ ์—ฐ๊ตฌ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” ๊ฐ ๋ชจ๋ธ์˜ ๊ธฐ๋ณธ ๊ฐœ๋…๊ณผ ํŠน์ง•, ๊ทธ๋ฆฌ๊ณ  ์„œ๋กœ์˜ ์žฅ์ ์„ ๊ฒฐํ•ฉํ•œ Hybrid ๋ชจ๋ธ์˜ ์˜ˆ์‹œ๋ฅผ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.


1. Convolutional Neural Network (CNN)

CNNs have delivered strong performance in image processing over a long period of research, and they excel at learning local patterns.


2. ConvNeXt

ConvNeXt is a model that keeps the strengths of classic CNNs while adopting modern deep learning techniques.


3. Vision Transformer (ViT)

ViT is the representative model that applies the Transformer architecture to computer vision.


4. CNN vs. ViT

Aspect | CNN | ViT
Local pattern learning | Strong (convolution-based) | Relatively weak
Computational efficiency | High | High computational cost
Variable-size inputs | Requires fixed size | Supports variable sizes
Long-range dependencies | Limited | Effective (self-attention)
Data efficiency | Works well with small datasets | Needs large-scale data

5. Hybrid ๋ชจ๋ธ: CNN๊ณผ ViT์˜ ์žฅ์ ์„ ๊ฒฐํ•ฉ

Hybrid ๋ชจ๋ธ์€ CNN๊ณผ ViT์˜ ์žฅ์ ์„ ๋ชจ๋‘ ํ™œ์šฉํ•˜์—ฌ, ์„œ๋กœ์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๊ฐœ๋ฐœ๋ฉ๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์ธ ์˜ˆ๊ฐ€ CoAtNet์ž…๋‹ˆ๋‹ค.

CoAtNet

Pasted image 20250312143755.png

๊ทธ ์™ธ์—๋„ ConViT, CvT, LocalViT ๋“ฑ ๋‹ค์–‘ํ•œ hybrid ๋ชจ๋ธ๋“ค์ด ๊ฐœ๋ฐœ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.


๊ฒฐ๋ก 


Transfer Learning, Self-Supervised Learning, Multimodal Learning, and Foundation Models

In recent AI work, improving performance is no longer just a matter of refining model architecture; how data is used and how different sources of information are combined have become central questions. In particular, Transfer Learning, Self-Supervised Learning, Multimodal Learning, and Foundation Models are drawing attention as new paradigms in AI development. This post covers the concept and advantages of each, and how they are used in real applications.


1. Transfer Learning

Transfer Learning applies a model pretrained on one domain or task to a different domain.
For example, a model trained on ImageNet can be applied to entirely different domains such as medical imaging or satellite photos.

์ฃผ์š” ํŠน์ง•

์ฃผ์˜์‚ฌํ•ญ


2. Self-Supervised Learning

In Self-Supervised Learning, a model generates labels from the data itself and learns from those labels.
This avoids the time and cost of manual labeling and can produce more general representations.

Examples of Pretext Tasks

Representations learned through such pretext tasks can then serve as the pretraining stage for Transfer Learning, improving performance on a variety of downstream tasks.


3. Multimodal Learning

Multimodal Learning combines information obtained by representing data in multiple modalities (e.g., vision, audio, text).

Common Use Cases

Advantages


4. Foundation Models

A Foundation Model is a model pretrained on large-scale data that can be adapted to a wide range of downstream tasks.

Characteristics

Representative Examples

Foundation Models underpin strong performance across many areas of AI, and they can be fine-tuned for diverse tasks with relatively little additional data.


๊ฒฐ๋ก 

ํ˜„๋Œ€ AI๋Š” ์‚ฌ์ „ ํ•™์Šต๋œ ๋ชจ๋ธ์˜ ์žฌํ™œ์šฉ๊ณผ ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ์ž๋™์œผ๋กœ ์–ป์€ ํ‘œํ˜„(representation)์„ ๋ฐ”ํƒ•์œผ๋กœ ๋น ๋ฅด๊ณ  ํšจ์œจ์ ์ธ ํ•™์Šต์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ํŒจ๋Ÿฌ๋‹ค์ž„๋“ค์„ ์ดํ•ดํ•˜๊ณ  ์ ์ ˆํžˆ ํ™œ์šฉํ•˜๋ฉด, ๋”์šฑ ํšจ์œจ์ ์ด๊ณ  ๋ฒ”์šฉ์ ์ธ AI ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


Computer Vision: The Image Classification Training Process

Image classification assigns a given image to one of a set of predefined classes. This post walks through the full process of training an image classification model, from building the dataset to defining, training, and evaluating the model.


1. ๋ฐ์ดํ„ฐ์…‹ ๊ตฌ์ถ•๊ณผ DataLoader

Dataset ํด๋ž˜์Šค

๋ฐ์ดํ„ฐ์…‹ ํด๋ž˜์Šค๋Š” ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์™€ ๊ทธ์— ํ•ด๋‹นํ•˜๋Š” ๋ผ๋ฒจ์„ ๋กœ๋“œํ•˜๊ณ , ์ „์ฒ˜๋ฆฌ ๋ฐ ๋ฐฐ์น˜ ์ฒ˜๋ฆฌ๋ฅผ ์‰ฝ๊ฒŒ ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ค๋‹ˆ๋‹ค. Dataset ํด๋ž˜์Šค๋Š” ๋‘ ๊ฐ€์ง€ ํ•„์ˆ˜ ๋ฉ”์„œ๋“œ๋ฅผ ํฌํ•จํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

DataLoader

A DataLoader serves data from a Dataset in batches, which helps optimize memory usage and speeds up training.

from torch.utils.data import DataLoader

train_dataset = CustomImageDataset("train_annotations.csv", "./train_images")
test_dataset = CustomImageDataset("test_annotations.csv", "./test_images")

train_dataloader = DataLoader(
    train_dataset, 
    batch_size=64,
    shuffle=True,
    num_workers=0,
    drop_last=True)

test_dataloader = DataLoader(
    test_dataset, 
    batch_size=64,
    shuffle=False,
    num_workers=0,
    drop_last=False)

2. ๋ชจ๋ธ ์ •์˜

๋ชจ๋ธ์€ ์ง์ ‘ ๊ตฌ์„ฑํ•  ์ˆ˜๋„ ์žˆ๊ณ , torchvision์ด๋‚˜ timm๊ณผ ๊ฐ™์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ๊ฐ„๋‹จํ•œ CNN ๋ชจ๋ธ์˜ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 input channels, 6 output channels, kernel size 5
        self.pool = nn.MaxPool2d(2, 2)    # 2x2 max pooling
        self.conv2 = nn.Conv2d(6, 16, 5)  # second convolution layer
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)      # 10-class output

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

3. ์†์‹ค ํ•จ์ˆ˜ (Loss Function)

๋ชจ๋ธ์˜ ์˜ˆ์ธก ๊ฐ’๊ณผ ์‹ค์ œ ๋ผ๋ฒจ ์‚ฌ์ด์˜ ์ฐจ์ด๋ฅผ ์ธก์ •ํ•˜๋Š” ์†์‹ค ํ•จ์ˆ˜๋Š” ํ•™์Šต์˜ ํ•ต์‹ฌ ์š”์†Œ์ž…๋‹ˆ๋‹ค. ๋‹ค์ค‘ ํด๋ž˜์Šค ๋ถ„๋ฅ˜ ๋ฌธ์ œ์—๋Š” ๋ณดํ†ต CrossEntropyLoss๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

import torch.nn as nn

loss_fn = nn.CrossEntropyLoss(
    weight=None,         # per-class weights, for imbalanced data
    ignore_index=-100,   # label value to ignore
    reduction='mean',    # return the mean loss
    label_smoothing=0.0  # label smoothing (helps prevent overfitting)
)

# Usage: loss = loss_fn(predictions, labels)

๊ธฐํƒ€ ์†์‹ค ํ•จ์ˆ˜๋กœ๋Š” NLLLoss, BCELoss, BCEWithLogitsLoss, F1 Loss, Focal Loss ๋“ฑ์ด ์žˆ์œผ๋ฉฐ, ๋ฌธ์ œ์— ๋งž๊ฒŒ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.


4. Optimizer

The optimizer updates model parameters in the direction that minimizes the loss. SGD and Adam are the most common choices.

import torch.optim as optim

optimizer = optim.Adam(
    net.parameters(),
    lr=0.001,
    betas=(0.9, 0.999),
    weight_decay=0.01)

5. Learning Rate Scheduler

The learning rate controls the size of each parameter update. A typical strategy is to start with a high learning rate for fast progress and gradually decay it to stabilize convergence. Representative schedulers include StepLR, ExponentialLR, CosineAnnealingLR, and ReduceLROnPlateau.


6. Training & Validation Process

Training Loop

Model training proceeds in the following steps:

  1. Move the data and model to the GPU or CPU
  2. Switch the model to training mode
  3. For each batch: run predictions, compute the loss, backpropagate, update the optimizer, and step the learning rate scheduler
  4. Accumulate the per-batch losses to compute the average loss per epoch

from tqdm import tqdm

def train_epoch(model, train_loader, optimizer, loss_fn, scheduler, device):
    model.train()
    total_loss = 0.0
    progress_bar = tqdm(train_loader, desc="Training", leave=False)
    
    for images, targets in progress_bar:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, targets)
        loss.backward()
        optimizer.step()
        scheduler.step()
        total_loss += loss.item()
        progress_bar.set_postfix(loss=loss.item())
    
    return total_loss / len(train_loader)

Validation Loop

For evaluation, switch the model to evaluation mode and run without gradient computation.

def validate(model, val_loader, loss_fn, device):
    model.eval()
    total_loss = 0.0
    progress_bar = tqdm(val_loader, desc='Validating', leave=False)
    
    with torch.no_grad():
        for images, targets in progress_bar:
            images, targets = images.to(device), targets.to(device)
            outputs = model(images)
            loss = loss_fn(outputs, targets)
            total_loss += loss.item()
            progress_bar.set_postfix(loss=loss.item())
            
    return total_loss / len(val_loader)

์ „์ฒด Training Process

def train(model, train_loader, val_loader, optimizer, loss_fn, scheduler, epochs, device):
    for epoch in range(epochs):
        train_loss = train_epoch(model, train_loader, optimizer, loss_fn, scheduler, device)
        val_loss = validate(model, val_loader, loss_fn, device)
        print(f"Epoch {epoch+1}: Train Loss = {train_loss:.4f}, Val Loss = {val_loss:.4f}")
        # ๋ชจ๋ธ ์ €์žฅ, scheduler ์—…๋ฐ์ดํŠธ ๋“ฑ ์ถ”๊ฐ€ ์ž‘์—… ์ˆ˜ํ–‰

๊ฒฐ๋ก 

์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ๋ชจ๋ธ ํ•™์Šต ํ”„๋กœ์„ธ์Šค๋Š” ๋ฐ์ดํ„ฐ์…‹ ๊ตฌ์ถ•๊ณผ ์ „์ฒ˜๋ฆฌ, DataLoader๋ฅผ ํ†ตํ•œ ๋ฐฐ์น˜ ์ฒ˜๋ฆฌ, ๋ชจ๋ธ ์ •์˜, ์†์‹ค ํ•จ์ˆ˜์™€ optimizer, ๊ทธ๋ฆฌ๊ณ  learning rate scheduler ์„ค์ • ๋ฐ training/validation loop๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.
์ด๋Ÿฌํ•œ ์ „์ฒด ํŒŒ์ดํ”„๋ผ์ธ์„ ์ฒด๊ณ„์ ์œผ๋กœ ๊ตฌ์„ฑํ•˜๋ฉด, ํšจ๊ณผ์ ์œผ๋กœ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ค๊ณ  ํ‰๊ฐ€ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋‚˜์•„๊ฐ€ ์„ฑ๋Šฅ ๊ฐœ์„ ์— ํฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค.


๋ชจ๋ธ ํ•™์Šต ์†๋„ ํ–ฅ์ƒ์„ ์œ„ํ•œ ํšจ์œจ์ ์ธ ๊ธฐ๋ฒ•๋“ค

๋ชจ๋ธ ํ•™์Šต ์†๋„๊ฐ€ ์ง€๋‚˜์น˜๊ฒŒ ๋Š๋ฆด ๋•Œ๋Š” ์—ฌ๋Ÿฌ ํšจ์œจ์ ์ธ ๋ฐฉ๋ฒ•๋“ค์„ ์ ์šฉํ•˜์—ฌ ํ•™์Šต ์‹œ๊ฐ„์„ ๋‹จ์ถ•ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ํฌ์ŠคํŒ…์—์„œ๋Š” ๋ฐ์ดํ„ฐ ์บ์‹ฑ, ๊ทธ๋ผ๋””์–ธํŠธ ๋ˆ„์ , ํ˜ผํ•ฉ ์ •๋ฐ€๋„ ํ•™์Šต, pseudo labeling, ๊ทธ๋ฆฌ๊ณ  ์ƒ์„ฑ ๋ชจ๋ธ์„ ํ™œ์šฉํ•œ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ๊ธฐ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.


1. Data Caching

Data I/O can become the bottleneck in training. Loading large data such as images from disk every epoch adds significant overhead.
Solution:

import os

import cv2
import numpy as np
import pandas as pd
from torch.utils.data import Dataset

# Strategy 1: decode each image once and cache it to disk as a .npy file
def data_caching(root_dir: str, info_df: pd.DataFrame):
    for idx, row in info_df.iterrows():
        image_path = os.path.join(root_dir, row['image_path'])
        image = cv2.imread(image_path, cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        npy_path = image_path.replace('.jpg', '.npy')
        np.save(npy_path, image)

data_caching('base_directory', info_df)

# Strategy 2: cache decoded images in memory on first access
class CustomImageDataset(Dataset):
    def __init__(self, info_df, root_dir, transform=None):
        self.info_df = info_df
        self.root_dir = root_dir
        self.transform = transform
        self.images = [None] * len(info_df)  # in-memory cache

    def __len__(self):
        return len(self.info_df)

    def __getitem__(self, index):
        if self.images[index] is None:
            img_path = os.path.join(self.root_dir, self.info_df.iloc[index]['image_path'])
            image = cv2.imread(img_path, cv2.IMREAD_COLOR)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            self.images[index] = image
        else:
            image = self.images[index]
        label = self.info_df.iloc[index]['label']
        if self.transform:
            image = self.transform(image)
        return image, label

์ฃผ์˜: ์ „์ฒด ๋ฐ์ดํ„ฐ์…‹์ด ๋ฉ”๋ชจ๋ฆฌ ์šฉ๋Ÿ‰์„ ์ดˆ๊ณผํ•˜์ง€ ์•Š๋„๋ก ์ฃผ์˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.


2. Gradient Accumulation

When memory constraints force a small batch size, the gradient estimates can become unstable.
Solution:

accumulation_steps = 10
optimizer.zero_grad()  # initialize outside the loop

for i, (images, targets) in enumerate(progress_bar):
    images, targets = images.to(device), targets.to(device)
    outputs = model(images)
    loss = loss_fn(outputs, targets)
    loss = loss / accumulation_steps  # average over the accumulated steps
    loss.backward()
    total_loss += loss.item()

    if (i + 1) % accumulation_steps == 0 or (i + 1) == len(train_loader):
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()  # step only when parameters are actually updated

3. Mixed Precision Training

Training in float32 is precise but costly in memory and compute.
Solution:

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for images, targets in train_loader:
    optimizer.zero_grad()
    images, targets = images.to(device), targets.to(device)
    
    with autocast():
        outputs = model(images)
        loss = loss_fn(outputs, targets)
    
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

4. Pseudo Labeling

For large unlabeled datasets, manual labeling is expensive and slow.
The pseudo labeling procedure:

  1. ๋ชจ๋ธ ํ•™์Šต: labeled ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•ด ์ดˆ๊ธฐ ๋ชจ๋ธ ํ•™์Šต
  2. ์˜ˆ์ธก: ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์ด์šฉํ•ด unlabeled ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก ์ˆ˜ํ–‰
  3. ์‹ ๋ขฐ๋„ ๊ธฐ์ค€ ์„ ํƒ: ๋†’์€ ์‹ ๋ขฐ๋„์˜ ์˜ˆ์ธก ๊ฒฐ๊ณผ๋ฅผ pseudo label๋กœ ์„ ํƒ
  4. ์žฌํ•™์Šต: labeled ๋ฐ์ดํ„ฐ์™€ pseudo labeled ๋ฐ์ดํ„ฐ๋ฅผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•ด ๋ชจ๋ธ์„ ์žฌํ•™์Šต

์ฃผ์˜: validation set์€ pseudo labeling์— ํฌํ•จ์‹œํ‚ค์ง€ ์•Š์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค.


5. Data Augmentation with Generative Models

Generative models (text-to-image, image-to-image) can be used to create additional training data.

However, a verification step is needed to check that the generated data resembles real data.


๊ฒฐ๋ก 

๋ชจ๋ธ ํ•™์Šต ์†๋„๊ฐ€ ๋Š๋ฆด ๋•Œ๋Š” ๋‹ค์–‘ํ•œ ํšจ์œจ์ ์ธ ๊ธฐ๋ฒ•๋“ค์„ ์ ์šฉํ•ด ์ „์ฒด ํŒŒ์ดํ”„๋ผ์ธ์˜ ์†๋„๋ฅผ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๊ธฐ๋ฒ•๋“ค์„ ์ ์ ˆํžˆ ์กฐํ•ฉํ•˜๋ฉด, ํ•™์Šต ์†๋„๋Š” ๋ฌผ๋ก  ์ตœ์ข… ๋ชจ๋ธ ์„ฑ๋Šฅ๊นŒ์ง€ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, ์‹ค๋ฌด์—์„œ ๋งค์šฐ ์œ ์šฉํ•˜๊ฒŒ ํ™œ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

---์•„๋ž˜๋Š” ๋ชจ๋ธ ํ•™์Šต ์†๋„ ํ–ฅ์ƒ์„ ์œ„ํ•œ ํšจ์œจ์ ์ธ ๊ธฐ๋ฒ•๋“ค์„ ์ •๋ฆฌํ•œ ์ตœ์ข… ๋ธ”๋กœ๊ทธ ํฌ์ŠคํŠธ ์ดˆ์•ˆ์ž…๋‹ˆ๋‹ค.


๋ชจ๋ธ ํ•™์Šต ์†๋„ ํ–ฅ์ƒ์„ ์œ„ํ•œ ํšจ์œจ์ ์ธ ๊ธฐ๋ฒ•๋“ค

๋ชจ๋ธ ํ•™์Šต ์†๋„๊ฐ€ ์ง€๋‚˜์น˜๊ฒŒ ๋Š๋ฆด ๋•Œ๋Š” ์—ฌ๋Ÿฌ ํšจ์œจ์ ์ธ ๋ฐฉ๋ฒ•๋“ค์„ ์ ์šฉํ•˜์—ฌ ํ•™์Šต ์‹œ๊ฐ„์„ ๋‹จ์ถ•ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” ๋ฐ์ดํ„ฐ ์บ์‹ฑ, ๊ทธ๋ผ๋””์–ธํŠธ ๋ˆ„์ (Gradient Accumulation), ํ˜ผํ•ฉ ์ •๋ฐ€๋„ ํ•™์Šต(Mixed Precision Training), Pseudo Labeling, ๊ทธ๋ฆฌ๊ณ  ์ƒ์„ฑ ๋ชจ๋ธ์„ ํ™œ์šฉํ•œ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ๊ธฐ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.


1. Data Caching

๋ฐ์ดํ„ฐ I/O๊ฐ€ ํ•™์Šต ์†๋„์˜ ๋ณ‘๋ชฉ์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ ์ด๋ฏธ์ง€์™€ ๊ฐ™์ด ์šฉ๋Ÿ‰์ด ํฐ ๋ฐ์ดํ„ฐ๋ฅผ ๋งค epoch๋งˆ๋‹ค ๋””์Šคํฌ์—์„œ ๋ถˆ๋Ÿฌ์˜ค๋ฉด ํฐ ์˜ค๋ฒ„ํ—ค๋“œ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.
ํ•ด๊ฒฐ๋ฒ•:

def data_caching(root_dir: str, info_df: pd.DataFrame):
    for idx, row in info_df.iterrows():
        image_path = os.path.join(root_dir, row['image_path'])
        image = cv2.imread(image_path, cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        npy_path = image_path.replace('.jpg', '.npy')
        np.save(npy_path, image)

data_caching('base_directory', info_df)
class CustomImageDataset(Dataset):
    def __init__(self, info_df, root_dir, transform=None):
        self.info_df = info_df
        self.root_dir = root_dir
        self.transform = transform
        self.images = [None] * len(info_df)  # ์บ์‹œ์šฉ ๋ฆฌ์ŠคํŠธ

    def __len__(self):
        return len(self.info_df)

    def __getitem__(self, index):
        if self.images[index] is None:
            img_path = os.path.join(self.root_dir, self.info_df.iloc[index]['image_path'])
            image = cv2.imread(img_path, cv2.IMREAD_COLOR)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            self.images[index] = image
        else:
            image = self.images[index]
        label = self.info_df.iloc[index]['label']
        if self.transform:
            image = self.transform(image)
        return image, label

์ฃผ์˜: ์ „์ฒด ๋ฐ์ดํ„ฐ์…‹์ด ๋ฉ”๋ชจ๋ฆฌ ์šฉ๋Ÿ‰์„ ์ดˆ๊ณผํ•˜์ง€ ์•Š๋„๋ก ์ฃผ์˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.


2. Gradient Accumulation

๋ฉ”๋ชจ๋ฆฌ ์ œ์•ฝ์œผ๋กœ ์ธํ•ด ์ž‘์€ batch size๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ, gradient ์ถ”์ •์ด ๋ถˆ์•ˆ์ •ํ•ด์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ํ•ด๊ฒฐ๋ฒ•:

accumulation_steps = 10
optimizer.zero_grad()  # loop ๋ฐ”๊นฅ์—์„œ ์ดˆ๊ธฐํ™”

for i, (images, targets) in enumerate(progress_bar):
    images, targets = images.to(device), targets.to(device)
    outputs = model(images)
    loss = loss_fn(outputs, targets)
    loss = loss / accumulation_steps  # ํ‰๊ท ํ™”
    loss.backward()
    total_loss += loss.item()

    if (i + 1) % accumulation_steps == 0 or (i + 1) == len(train_loader):
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()  # ์—…๋ฐ์ดํŠธ ์‹œ์—๋งŒ step

3. Mixed Precision Training

ํ•™์Šต ์‹œ float32๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ •๋ฐ€๋„๋Š” ๋†’์ง€๋งŒ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰๊ณผ ์—ฐ์‚ฐ ๋น„์šฉ์ด ์ปค์ง‘๋‹ˆ๋‹ค.
ํ•ด๊ฒฐ๋ฒ•:

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for images, targets in train_loader:
    optimizer.zero_grad()
    images, targets = images.to(device), targets.to(device)
    
    with autocast():
        outputs = model(images)
        loss = loss_fn(outputs, targets)
    
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

4. Pseudo Labeling

๋Œ€๊ทœ๋ชจ unlabeled ๋ฐ์ดํ„ฐ์…‹์˜ ๊ฒฝ์šฐ, ์ง์ ‘ ๋ผ๋ฒจ๋งํ•˜๋Š” ๋น„์šฉ๊ณผ ์‹œ๊ฐ„์ด ๋งŽ์ด ๋“ญ๋‹ˆ๋‹ค.
Pseudo Labeling ๋ฐฉ๋ฒ•:

  1. ๋ชจ๋ธ ํ•™์Šต: labeled ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•ด ์ดˆ๊ธฐ ๋ชจ๋ธ์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.
  2. ์˜ˆ์ธก: ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์ด์šฉํ•ด unlabeled ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  3. ์‹ ๋ขฐ๋„ ๊ธฐ์ค€ ์„ ํƒ: ๋†’์€ ์‹ ๋ขฐ๋„์˜ ์˜ˆ์ธก ๊ฒฐ๊ณผ๋ฅผ pseudo label๋กœ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.
  4. ์žฌํ•™์Šต: labeled ๋ฐ์ดํ„ฐ์™€ pseudo labeled ๋ฐ์ดํ„ฐ๋ฅผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•ด ๋ชจ๋ธ์„ ์žฌํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

์ฃผ์˜: Validation set์€ pseudo labeling ๊ณผ์ •์— ํฌํ•จ์‹œํ‚ค์ง€ ์•Š์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค.


5. Generative Models๋ฅผ ํ™œ์šฉํ•œ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•

์ƒ์„ฑ ๋ชจ๋ธ(text-to-image, image-to-image)์„ ํ™œ์šฉํ•˜๋ฉด ํ•™์Šต์— ์‚ฌ์šฉํ•  ์ถ”๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹จ, ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ์˜ ์‹ ๋ขฐ์„ฑ์„ ๊ฒ€์ฆํ•˜๋Š” ๊ณผ์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.


๊ฒฐ๋ก 

๋ชจ๋ธ ํ•™์Šต ์†๋„๊ฐ€ ๋Š๋ฆด ๋•Œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํšจ์œจ์ ์ธ ๊ธฐ๋ฒ•๋“ค์„ ์ ์ ˆํžˆ ์กฐํ•ฉํ•˜์—ฌ ์ „์ฒด ํ•™์Šต ํŒŒ์ดํ”„๋ผ์ธ์˜ ์†๋„๋ฅผ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๊ธฐ๋ฒ•๋“ค์„ ํ†ตํ•ด ํ•™์Šต ์†๋„์™€ ์ตœ์ข… ๋ชจ๋ธ ์„ฑ๋Šฅ์„ ๋™์‹œ์— ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์‹ค๋ฌด์—์„œ ํšจ์œจ์ ์ธ ํ•™์Šต ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌ์ถ•์— ํฐ ๋„์›€์ด ๋  ๊ฒƒ์ž…๋‹ˆ๋‹ค.


Confusion Matrix์™€ Ensemble ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•œ ๋ถ„๋ฅ˜ ์„ฑ๋Šฅ ํ‰๊ฐ€

๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํ‰๊ฐ€ํ•  ๋•Œ, confusion matrix๋Š” ์˜ˆ์ธก ๊ฒฐ๊ณผ์™€ ์‹ค์ œ ๋ผ๋ฒจ์„ ํ•œ๋ˆˆ์— ๋น„๊ตํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ•๋ ฅํ•œ ๋„๊ตฌ์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด accuracy, precision, recall, ๊ทธ๋ฆฌ๊ณ  F1-score์™€ ๊ฐ™์€ ์—ฌ๋Ÿฌ ์„ฑ๋Šฅ ์ง€ํ‘œ๋ฅผ ์‚ฐ์ถœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, ์—ฌ๋Ÿฌ ๋ชจ๋ธ์˜ ์˜ˆ์ธก์„ ๊ฒฐํ•ฉํ•˜๋Š” ensemble ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•˜๋ฉด ์„ฑ๋Šฅ์„ ๋”์šฑ ๋†’์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ํฌ์ŠคํŒ…์—์„œ๋Š” confusion matrix์˜ ๊ตฌ์„ฑ๊ณผ ๊ฐ ์ง€ํ‘œ์˜ ์˜๋ฏธ, ๊ทธ๋ฆฌ๊ณ  ensemble ๊ธฐ๋ฒ•๋“ค์— ๋Œ€ํ•ด ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.


1. Confusion Matrix๋ž€?

Pasted image 20250312145017.png
Confusion Matrix๋Š” ๋ชจ๋ธ์˜ ์˜ˆ์ธก ๊ฒฐ๊ณผ์™€ ์‹ค์ œ ๋ผ๋ฒจ์„ ๋น„๊ตํ•˜์—ฌ ๋‹ค์Œ ๋„ค ๊ฐ€์ง€ ํ•ญ๋ชฉ์œผ๋กœ ๋ถ„๋ฅ˜ํ•œ ํ–‰๋ ฌ์ž…๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋‹ค์–‘ํ•œ ์„ฑ๋Šฅ ์ง€ํ‘œ๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ฃผ์š” ์„ฑ๋Šฅ ์ง€ํ‘œ

์„ฑ๋Šฅ ์ง€ํ‘œ ์„ ํƒ์˜ ์ค‘์š”์„ฑ

๋ฌธ์ œ์˜ ์ •์˜์— ๋”ฐ๋ผ ์–ด๋–ค ์„ฑ๋Šฅ ์ง€ํ‘œ๋ฅผ ์šฐ์„ ์‹œํ•ด์•ผ ํ•˜๋Š”์ง€๊ฐ€ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด:

์–ด๋–ค ์ง€ํ‘œ๋ฅผ ์‚ฌ์šฉํ• ์ง€ ๊ฒฐ์ •ํ•  ๋•Œ, ์ƒํ™ฉ์— ๋งž๋Š” threshold๋ฅผ ์„ ํƒํ•˜์—ฌ ๋น„์šฉ์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.


2. Ensemble ๊ธฐ๋ฒ•

Ensemble ๊ธฐ๋ฒ•์€ ๋™์ผํ•œ ํƒœ์Šคํฌ์— ๋Œ€ํ•ด ์—ฌ๋Ÿฌ ๋ชจ๋ธ์˜ ์˜ˆ์ธก์„ ๊ฒฐํ•ฉํ•˜์—ฌ, ๊ฐœ๋ณ„ ๋ชจ๋ธ์˜ ์•ฝ์ ์„ ๋ณด์™„ํ•˜๊ณ  ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

์ฃผ์š” Ensemble ๊ธฐ๋ฒ•


๊ฒฐ๋ก