Week 9-12 Study Notes

Object Detection Evaluation: Understanding Performance and Speed Metrics

Object detection is a compound task: it must determine both the location and the category of every object in an image at the same time. Because it is harder than plain classification, it plays a central role in many fields such as autonomous driving, OCR, medical diagnosis, and CCTV monitoring. This post covers the metrics used to evaluate object detection models, for both accuracy and speed.


1. Evaluating Detection Performance: mAP (mean Average Precision)

Pasted image 20250312140513.png

1-1. Precision and Recall Review

Precision is the fraction of predictions that are correct, TP / (TP + FP), and recall is the fraction of ground-truth objects that are found, TP / (TP + FN). Plotting precision against recall as the confidence threshold varies gives the Precision-Recall (PR) curve.
The area under the PR curve is the **Average Precision (AP)**, and averaging AP over all classes gives mAP. The higher the mAP, the better the model detects objects.

1-2. Evaluating Bounding Boxes: IoU (Intersection over Union)

Object detection represents an object's location with a bounding box (bbox). IoU is the metric that measures how well a predicted bbox agrees with the ground-truth bbox:

$$\mathrm{IoU} = \frac{\text{area}(B_{\text{pred}} \cap B_{\text{gt}})}{\text{area}(B_{\text{pred}} \cup B_{\text{gt}})}$$
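A minimal sketch of the formula in Python, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (the box format is an assumption of this sketch; libraries use different conventions):

```python
def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format (assumed), with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height clamp to 0 when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 pixels horizontally: 50 / 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # about 0.333
```

Subtracting the intersection once from the sum of the two areas avoids counting the overlap twice in the union.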

1-3. How mAP Is Computed

  1. For each class, go through the predicted bboxes from highest to lowest confidence and label each one a True Positive (TP) if it matches a not-yet-matched ground-truth box with IoU at or above the threshold (e.g., 0.5), and otherwise a False Positive (FP).
  2. With the predictions sorted by confidence score in descending order, compute cumulative precision and recall down the list to build the Precision-Recall curve.
  3. Compute each class's AP (Average Precision) from its curve; the mean of the per-class APs is mAP.
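Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced, for each class, a list of (confidence, is_tp) pairs plus the number of ground-truth boxes. AP here uses all-point interpolation (precision made non-increasing, then summed over recall increments), as in PASCAL VOC 2010 and later; COCO instead samples 101 recall points and also averages over IoU thresholds 0.50 to 0.95, so its numbers differ.

```python
def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (confidence, is_tp) pairs (assumed output of step 1)
    num_gt: number of ground-truth boxes of this class
    """
    if num_gt == 0:
        return 0.0
    # Step 2: sort by confidence, highest first, and accumulate TP/FP counts
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        tp += int(is_tp)
        fp += int(not is_tp)
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # All-point interpolation: precision at recall r = max precision at any recall >= r
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Step 3: area under the interpolated curve = sum of precision * recall increment
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict class_name -> (detections, num_gt). mAP = mean of per-class AP."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)


# Made-up example: AP(cat) = 5/6, AP(dog) = 2/3, so mAP is about 0.75
per_class = {
    "cat": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "dog": ([(0.95, True), (0.6, True)], 3),
}
print(mean_average_precision(per_class))
```

Note that the dog class never reaches recall 1.0 (one of its three objects is missed), and the area under its curve is capped accordingly.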

2. Evaluating Detection Speed

Beyond accuracy, speed metrics matter a great deal whenever real-time processing is required.

2-1. FPS (Frames Per Second)
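FPS is the number of images (frames) a model processes per second, so it can be measured by timing inference over many inputs. A minimal sketch; fake_detector is a hypothetical stand-in for a real model's inference call, and a few warm-up runs are excluded because the first calls are often slower (for GPU models you would also need to synchronize the device before reading the clock):

```python
import time


def measure_fps(infer, inputs, warmup=3):
    """Frames per second of `infer` over `inputs`; warm-up calls are not timed."""
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


def fake_detector(image):
    # Hypothetical placeholder for model inference: sleeps at least 10 ms per frame
    time.sleep(0.01)
    return []


frames = [None] * 20
print(f"{measure_fps(fake_detector, frames):.1f} FPS")  # at most about 100
```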

2-2. FLOPs (Floating Point Operations)
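FLOPs count the floating point operations needed for one forward pass, which makes them a hardware-independent measure of model cost (unlike FPS). For a standard convolution layer, each output element needs C_in x K x K multiply-adds. The sketch below counts one multiply-add as 2 FLOPs and ignores the bias; be aware that many papers and tools report multiply-accumulates (MACs) and call them FLOPs, which halves the number.

```python
def conv2d_flops(c_in, c_out, kernel, h_out, w_out, count_mac_as=2):
    """FLOPs of a Conv2d layer, bias ignored.

    Each of the c_out * h_out * w_out outputs needs c_in * kernel * kernel multiply-adds.
    """
    macs = c_in * kernel * kernel * c_out * h_out * w_out
    return macs * count_mac_as


# A 3x3 conv, 64 -> 64 channels, producing a 56x56 output map
flops = conv2d_flops(64, 64, 3, 56, 56)
print(f"{flops / 1e9:.2f} GFLOPs")  # 0.23 GFLOPs
```

Summing this over every layer gives a model's total FLOPs.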


Conclusion

Object detection models are evaluated on two fronts: accuracy (mAP) and speed (FPS, FLOPs).

Taking these metrics together lets you choose, and then optimize, an object detection model that fits a real-world service.


The Evolution of Object Detection Models and the Structure of 2-Stage Detectors

Object detection determines both the location and the category of objects in an image at the same time. Because it is harder than simple classification, it plays a central role in fields such as autonomous driving, OCR, medical diagnosis, and CCTV. This post looks at how 2-stage detectors evolved; these models split object recognition into two steps, much like the way humans perceive objects.


1. How 2-Stage Detectors Evolved

A 2-stage detector imitates how people recognize an object: first locate where it is, then decide what it is. In practice, the first stage proposes candidate regions and the second stage classifies each region and refines its box. This design was developed through the R-CNN family of models below.

1-1. R-CNN (Region-based Convolutional Neural Network)

1-2. SPPNet (Spatial Pyramid Pooling Network)

1-3. Fast R-CNN

1-4. Faster R-CNN


2. The Object Detection Framework

A 2-stage detector has a hierarchical structure made of a Backbone (feature extraction), a Neck (feature refinement and fusion), and a Head (classification and box regression).
Pasted image 20250312140932.png


Conclusion

Object detection models started with R-CNN and evolved through SPPNet, Fast R-CNN, and Faster R-CNN.

These 2-stage detectors are organized into a Backbone, a Neck, and a Head, a structure that lets them detect objects of varied sizes and shapes effectively.


The Evolution of Object Detection Models: From IoU Thresholds to Transformers

Early models handled an object's location and class separately, but over time a series of new techniques was introduced to improve efficiency and accuracy. This post traces the progression from Fast R-CNN through Cascade R-CNN, Deformable Convolutional Networks, DETR, and the Swin Transformer, and also looks at how the IoU threshold setting affects detection performance.


1. Fast R-CNN and the Effect of the IoU Threshold

Fast R-CNN addressed the shortcomings of R-CNN and was designed to be trained end-to-end.

The IoU threshold decides which proposals are treated as positive (foreground) samples during training. Experimental results suggest that simply raising this threshold does not always improve performance; an appropriate threshold has to be chosen for the characteristics of the data and the training conditions.
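In R-CNN-style training, a proposal is labeled foreground when its best IoU with any ground-truth box reaches the threshold (Fast R-CNN uses 0.5). The sketch below uses made-up boxes in an assumed (x1, y1, x2, y2) format to show how raising the threshold turns loosely localized proposals into background: fewer positives remain, but those that do are better localized.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2), an assumption of this sketch
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gts, iou_thr):
    """1 (foreground) or 0 (background) per proposal, from its best IoU with any GT box."""
    return [1 if max(iou(p, g) for g in gts) >= iou_thr else 0 for p in proposals]


gts = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10),   # IoU 1.0
             (1, 1, 11, 11),   # IoU 81/119, about 0.68
             (2, 0, 12, 10),   # IoU 80/120, about 0.67
             (4, 4, 14, 14)]   # IoU 36/164, about 0.22
for thr in (0.5, 0.7):
    print(thr, label_proposals(proposals, gts, thr))
# 0.5 -> [1, 1, 1, 0]
# 0.7 -> [1, 0, 0, 0]
```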


2. Cascade R-CNN

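Cascade R-CNN was designed to deal systematically with this sensitivity to the IoU threshold: it chains several detection heads trained with increasing IoU thresholds, and each head refines the boxes produced by the one before it.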
Pasted image 20250312141835.png


3. Deformable Convolutional Networks

Pasted image 20250312141843.png
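Standard convolution filters sample on a fixed grid, which limits them when an image undergoes geometric transformations such as tilt, viewpoint change, or pose change. Deformable convolution adds a learned 2D offset to each sampling location so the receptive field can adapt to the object's shape.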
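A minimal sketch of the idea for one output location of a 3x3 deformable convolution on a single-channel input: each kernel tap samples the input at p0 + pn + Δpn, where pn is the regular grid offset and Δpn a learned fractional offset, and bilinear interpolation reads values at non-integer positions. In a real layer the offsets are predicted by an extra convolution over the input; here they are hand-picked, made-up values.

```python
import math


def bilinear(img, y, x):
    """Sample a 2D list at fractional (y, x); positions outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    # Weighted sum of the four neighbouring pixels
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * img[yy][xx]
    return val


def deformable_conv_at(img, weights, p0, offsets):
    """Output at location p0 = (y, x) of a 3x3 deformable conv.

    weights: 3x3 kernel; offsets: 9 (dy, dx) pairs, one per kernel tap.
    """
    out = 0.0
    k = 0
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            off_y, off_x = offsets[k]
            # Regular grid position plus the (normally learned) offset
            out += weights[i][j] * bilinear(img, p0[0] + dy + off_y, p0[1] + dx + off_x)
            k += 1
    return out


img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # 5x5 ramp
box_filter = [[1 / 9] * 3 for _ in range(3)]
zero = [(0.0, 0.0)] * 9
shifted = [(0.0, 0.5)] * 9  # every tap moved half a pixel to the right
print(deformable_conv_at(img, box_filter, (2, 2), zero))     # about 12.0, same as a normal conv
print(deformable_conv_at(img, box_filter, (2, 2), shifted))  # about 12.5
```

With all offsets at zero the layer reduces to an ordinary convolution, which is why deformable layers can be dropped into an existing backbone.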


4. DETR (DEtection TRansformer)

Pasted image 20250312141850.png
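DETR was one of the first models to apply the Transformer to object detection as an end-to-end detector: it predicts a fixed set of boxes directly and matches them one-to-one to the ground truth, which removes hand-designed components such as anchors and NMS post-processing.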
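DETR's one-to-one assignment is a bipartite matching that minimizes the total cost between predictions and ground-truth objects; the paper solves it with the Hungarian algorithm and a cost combining class probability, L1 box distance, and generalized IoU. The sketch below keeps only the L1 box cost and brute-forces all assignments with itertools.permutations, which is fine for a handful of boxes but is not a substitute for the Hungarian algorithm. Boxes are assumed to be normalized (cx, cy, w, h).

```python
from itertools import permutations


def l1_cost(pred, gt):
    # L1 distance between two (cx, cy, w, h) boxes
    return sum(abs(p - g) for p, g in zip(pred, gt))


def bipartite_match(preds, gts):
    """Return {gt_index: pred_index} minimizing total L1 cost (needs len(preds) >= len(gts))."""
    best_cost, best_assign = float("inf"), None
    # Try every way of choosing a distinct prediction for each ground truth
    for assign in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return dict(enumerate(best_assign))


# 4 prediction slots, 2 ground-truth objects; unmatched slots are trained toward "no object"
preds = [(0.2, 0.2, 0.1, 0.1), (0.7, 0.7, 0.3, 0.3), (0.21, 0.19, 0.1, 0.1), (0.5, 0.5, 0.9, 0.9)]
gts = [(0.7, 0.72, 0.3, 0.3), (0.2, 0.2, 0.1, 0.1)]
print(bipartite_match(preds, gts))  # {0: 1, 1: 0}
```

Because each ground truth gets exactly one prediction, the near-duplicate slot 2 is pushed toward "no object" during training instead of being suppressed by NMS afterwards.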


5. Swin Transformer

Pasted image 20250312141855.png
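Self-attention over all image tokens has a cost that grows quadratically with image size. The Swin Transformer overcomes this with a CNN-like hierarchical structure and self-attention computed within local windows (shifted between consecutive layers), which keeps the cost linear in image size.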
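The Swin Transformer paper gives the cost of global multi-head self-attention (MSA) and window-based self-attention (W-MSA) on an h x w feature map with C channels and M x M windows as Ω(MSA) = 4hwC² + 2(hw)²C and Ω(W-MSA) = 4hwC² + 2M²hwC. The first grows quadratically with the number of tokens hw, the second linearly. A quick comparison using the first-stage sizes of Swin-T on a 224x224 input (56x56 tokens, C = 96, M = 7):

```python
def msa_flops(h, w, c):
    # Global self-attention: every token attends to all h*w tokens
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def wmsa_flops(h, w, c, m):
    # Window self-attention: each token attends only within its M x M window
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


h = w = 56
c, m = 96, 7
print(f"MSA:   {msa_flops(h, w, c) / 1e9:.2f} G")       # 2.00 G
print(f"W-MSA: {wmsa_flops(h, w, c, m) / 1e9:.2f} G")   # 0.15 G
```

Doubling the input side length multiplies hw by 4, so the (hw)² term of MSA grows 16x while W-MSA grows only 4x.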


Conclusion

Object detection models have kept evolving, from Fast R-CNN through Cascade R-CNN, Deformable Convolutional Networks, DETR, and the Swin Transformer.


The Neck Module: The Bridge Between Backbone and Head

A 2-stage detector consists of three main parts: Backbone, Neck, and Head.

Earlier object detection models fed a single feature map, taken from the Backbone's last layer, into the RPN (Region Proposal Network) to detect objects. Because objects come in many sizes, it became necessary to use feature maps from several stages so as to obtain rich semantics and fine spatial detail at the same time. This post surveys the various Neck designs and how they evolved.


1. Feature Pyramid Network (FPN)

FPN is the representative Neck design. It takes the multi-level feature maps produced as the image passes through the CNN and combines the semantic information of the upper levels (low resolution, meaning, patterns) with the detailed information of the lower levels (high resolution, structure) through a top-down pathway with lateral connections.
Pasted image 20250312142309.png
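A toy sketch of the top-down pathway, assuming single-channel feature maps stored as nested lists so it runs without a deep learning framework: each coarser map is upsampled 2x by nearest neighbour and added to the next finer map after a lateral projection (a 1x1 conv, reduced here to a hypothetical scalar weight). Real FPNs use multi-channel tensors, learned 1x1 lateral convs, and a 3x3 conv after each merge.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(features, lateral_weight=1.0):
    """features: maps from finest to coarsest, each half the size of the previous.

    P_top = lateral(C_top); P_i = lateral(C_i) + upsample(P_{i+1}). Returned finest first.
    """
    lateral = [[[lateral_weight * v for v in row] for row in f] for f in features]
    outputs = [lateral[-1]]  # start from the coarsest, most semantic level
    for lat in reversed(lateral[:-1]):
        outputs.append(add(lat, upsample2x(outputs[-1])))
    return list(reversed(outputs))


c3 = [[1.0] * 4 for _ in range(4)]  # 4x4, fine detail
c4 = [[2.0] * 2 for _ in range(2)]  # 2x2
c5 = [[3.0]]                        # 1x1, coarse semantics
p3, p4, p5 = fpn_top_down([c3, c4, c5])
print(p5, p4, p3[0])  # [[3.0]] [[5.0, 5.0], [5.0, 5.0]] [6.0, 6.0, 6.0, 6.0]
```

The finest output p3 now carries information from every level above it, which is what lets small objects benefit from high-level semantics.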

Advantages:

Limitations:


2. Path Aggregation Network (PANet)

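PANet was introduced to overcome a limitation of FPN: low-level detail has to travel a long path up through the backbone before it reaches the top levels. PANet adds an extra bottom-up path after FPN's top-down path so that fine localization information reaches the upper levels through a shorter route.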
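Using the same toy representation (single-channel maps as nested lists), the extra bottom-up path can be sketched as N2 = P2 and N_{i+1} = downsample(N_i) + P_{i+1}. PANet downsamples with a stride-2 3x3 convolution; here that is replaced by 2x2 average pooling, an assumption made only to keep the example framework-free.

```python
def downsample2x(fmap):
    """2x2 average pooling with stride 2 (stand-in for PANet's stride-2 3x3 conv)."""
    return [
        [(fmap[2 * r][2 * c] + fmap[2 * r][2 * c + 1]
          + fmap[2 * r + 1][2 * c] + fmap[2 * r + 1][2 * c + 1]) / 4
         for c in range(len(fmap[0]) // 2)]
        for r in range(len(fmap) // 2)
    ]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def panet_bottom_up(fpn_outputs):
    """fpn_outputs: FPN maps from finest to coarsest (P2, P3, ...). Returns N2, N3, ..."""
    outs = [fpn_outputs[0]]  # N2 = P2
    for p in fpn_outputs[1:]:
        # Short path: finer-level detail flows upward in a few steps
        outs.append(add(downsample2x(outs[-1]), p))
    return outs


p2 = [[4.0] * 4 for _ in range(4)]
p3 = [[1.0] * 2 for _ in range(2)]
p4 = [[0.5]]
n2, n3, n4 = panet_bottom_up([p2, p3, p4])
print(n3, n4)  # [[5.0, 5.0], [5.0, 5.0]] [[5.5]]
```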

Advantages:


3. DetectoRS: Recursive Feature Pyramid

DetectoRS takes PANet's idea a step further: its Recursive Feature Pyramid feeds the FPN outputs back into the backbone, repeating the exchange of information between upper- and lower-level feature maps to learn richer representations.
Pasted image 20250312142425.png

Advantages:


4. Bi-directional Feature Pyramid (BiFPN)

Pasted image 20250312142432.png
BiFPN is the structure introduced with EfficientDet. Like PANet it uses both top-down and bottom-up paths, but it removes unnecessary connections (such as nodes with only one input) to stay simpler, and it fuses features with learnable weights so that the more important inputs contribute more.
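EfficientDet's "fast normalized fusion" combines the inputs of a BiFPN node as O = Σ_i w_i · I_i / (ε + Σ_j w_j), where a ReLU keeps each learnable weight non-negative and ε is a small constant (0.0001 in the paper). A minimal sketch on plain Python lists, with made-up weights standing in for learned ones:

```python
def fast_normalized_fusion(inputs, raw_weights, eps=1e-4):
    """Fuse same-shaped feature vectors with non-negative, normalized weights."""
    w = [max(0.0, rw) for rw in raw_weights]  # ReLU keeps each weight >= 0
    norm = eps + sum(w)
    return [sum(wi * feat[k] for wi, feat in zip(w, inputs)) / norm
            for k in range(len(inputs[0]))]


top_down = [1.0, 2.0, 3.0]   # e.g. upsampled coarser feature
lateral = [3.0, 2.0, 1.0]    # e.g. same-level backbone feature
print(fast_normalized_fusion([top_down, lateral], [3.0, 1.0]))
# close to [1.5, 2.0, 2.5]: the first input gets about 75% of the weight
print(fast_normalized_fusion([top_down, lateral], [2.0, -1.0]))
# close to [1.0, 2.0, 3.0]: the negative weight is clipped to 0
```

The paper uses this normalization instead of a softmax over the weights because it behaves similarly while being cheaper on GPUs.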

Advantages:


5. NAS-FPN

NAS-FPN uses Neural Architecture Search (NAS) to automatically discover an effective Feature Pyramid Network structure.


6. Augmented Feature Pyramid Network (AugFPN)

AugFPN combines several techniques to overcome the limitations of FPN.

Advantages:


Conclusion

In a 2-stage detector, the Neck bridges the Backbone and the Head; by effectively combining features of different scales, it substantially improves detection performance.


The Evolution of Object Detection Models and Recent Trends

Early 2-stage detectors handled localization and classification separately and achieved high accuracy, but they were slow. 1-stage detectors, by contrast, process the whole image in a single pass, which greatly improved real-time performance. This post covers the representative 1-stage detectors and how recent techniques are being applied to object detection.


1. The Arrival of 1-Stage Detectors

1-1. The YOLO Series

YOLO (You Only Look Once) is the representative example of a 1-stage detector.
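In the original YOLO (v1), the image is divided into an S x S grid and the cell containing an object's center is responsible for detecting it. Each cell predicts B boxes (x, y, w, h, confidence) and C class probabilities, so the output is an S x S x (B·5 + C) tensor: 7 x 7 x 30 for PASCAL VOC with S = 7, B = 2, C = 20. A small sketch of both calculations, assuming pixel boxes in (x1, y1, x2, y2) format:

```python
def responsible_cell(box, img_w, img_h, s=7):
    """Grid cell (row, col) containing the center of box = (x1, y1, x2, y2) in pixels."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    col = min(int(cx / img_w * s), s - 1)  # clamp centers lying exactly on the far edge
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def yolo_v1_output_shape(s=7, b=2, c=20):
    # Each cell: B boxes x (x, y, w, h, confidence) + C class probabilities
    return (s, s, b * 5 + c)


print(responsible_cell((100, 200, 300, 400), img_w=448, img_h=448))  # (4, 3)
print(yolo_v1_output_shape())  # (7, 7, 30)
```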

1-2. SSD (Single Shot Multibox Detector)

Pasted image 20250312142744.png
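SSD predicts from several feature maps of decreasing resolution, and each map gets default (anchor) boxes of its own scale. The paper spaces the scales linearly, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) for k = 1..m with s_min = 0.2 and s_max = 0.9, and derives box sizes from an aspect ratio a as w = s_k·√a, h = s_k/√a. A sketch of that formula (scales are fractions of the input size; the paper's extra ratio-1 box is omitted here):

```python
import math


def ssd_scales(m, s_min=0.2, s_max=0.9):
    # Linearly spaced scales, one per feature map (k = 1..m)
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_sizes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    # (width, height) of each default box; width / height equals the aspect ratio
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


scales = ssd_scales(6)
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(default_box_sizes(scales[0]))
```

Small scales go to the high-resolution early maps and large scales to the coarse late maps, so each map specializes in one object size range.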

1-3. RetinaNet
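RetinaNet's key contribution is the focal loss, which tackles the extreme foreground/background imbalance of dense 1-stage detectors by down-weighting easy examples: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class; the paper's defaults are γ = 2 and α = 0.25. A sketch for a single binary prediction, compared with plain cross-entropy:

```python
import math


def focal_loss(p, is_positive, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction; p is the predicted foreground probability."""
    p_t = p if is_positive else 1 - p
    alpha_t = alpha if is_positive else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, is_positive):
    p_t = p if is_positive else 1 - p
    return -math.log(p_t)


# An easy background anchor (p = 0.1) vs. a hard one (p = 0.9)
for p in (0.1, 0.9):
    print(p, round(cross_entropy(p, False), 4), round(focal_loss(p, False), 6))
```

Relative to cross-entropy, the easy example's loss shrinks by a factor of about 133 (1 / (0.75 · 0.1²)), while the hard one's shrinks by only about 1.6, so the many easy background anchors no longer dominate training.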


2. Recent Techniques and Anchor-Free Approaches

2-1. M2Det

Pasted image 20250312142839.png

2-2. Anchor-Free Approaches
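Anchor-free detectors drop predefined anchor boxes. In FCOS, for example, every location that falls inside a ground-truth box regresses its distances to the four box sides (l, t, r, b), and a "centerness" score, sqrt(min(l, r)/max(l, r) · min(t, b)/max(t, b)), down-weights locations far from the box center. A sketch of those targets for a single location, taking FCOS as the representative (other anchor-free methods such as CornerNet or CenterNet encode boxes differently); boxes are assumed to be (x1, y1, x2, y2):

```python
import math


def fcos_targets(x, y, box):
    """Targets (l, t, r, b) and centerness for location (x, y) and box (x1, y1, x2, y2).

    Returns None when the location lies outside the box (a negative sample).
    """
    l, t = x - box[0], y - box[1]
    r, b = box[2] - x, box[3] - y
    if min(l, t, r, b) <= 0:
        return None
    centerness = math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
    return (l, t, r, b), centerness


box = (0, 0, 100, 50)
print(fcos_targets(50, 25, box))   # ((50, 25, 50, 25), 1.0): exactly at the center
print(fcos_targets(10, 25, box))   # centerness sqrt(10/90), about 0.33
print(fcos_targets(150, 25, box))  # None: outside the box
```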

2-3. DETR (Detection Transformer)

2-4. Swin Transformer


Conclusion

By processing the whole image in one pass, 1-stage detectors largely remove the computational cost and speed penalty that the region proposal step imposes on 2-stage detectors.


MMDetection: Using a PyTorch-Based Deep Learning Library for Object Detection

Libraries such as MMDetection and Detectron2 let you build and train object detection models easily, starting from ready-made configuration files. This post covers MMDetection's basic components, how to modify its configuration, and how to register a custom backbone model.


1. MMDetection Basics

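MMDetection is an open-source, PyTorch-based library that supports many modern object detection algorithms (Faster R-CNN, Mask R-CNN, RetinaNet, and more). The code below uses the MMDetection 2.x API with mmcv 1.x; in MMDetection 3.x, Config moved to mmengine and training goes through mmengine's Runner instead of train_detector.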

1-1. Main Imports

from mmcv import Config
from mmdet.datasets import build_dataset, build_dataloader, replace_ImageToTensor
from mmdet.models import build_detector
from mmdet.apis import train_detector  # note: the module is 'apis', not 'apls'
from mmdet.utils import get_device

1-2. Working with Configuration Files

MMDetection works by inheriting a pre-written configuration file and modifying only the parts you need. For example, a trash detection model based on Faster R-CNN can be configured as follows:

# Load the base configuration file
cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py')

# Dataset root path
route = './dataset/'

# Replace the classes (10 trash categories)
classes = ("General Trash", "Paper", "Paper pack", "Metal", "Glass", 
           "Plastic", "Styrofoam", "Plastic bag", "Battery", "Clothing")
cfg.model.roi_head.bbox_head.num_classes = len(classes)

# Training set
cfg.data.train.classes = classes
cfg.data.train.img_prefix = route
cfg.data.train.ann_file = route + 'train.json'
cfg.data.train.pipeline[2]['img_scale'] = (512, 512)   # resize size (pipeline[2] is the Resize step)

# Validation set
cfg.data.val.classes = classes
cfg.data.val.img_prefix = route
cfg.data.val.ann_file = route + 'val.json'
cfg.data.val.pipeline[1]['img_scale'] = (512, 512)   # pipeline[1] is MultiScaleFlipAug

# Test set
cfg.data.test.classes = classes
cfg.data.test.img_prefix = route
cfg.data.test.ann_file = route + 'test.json'
cfg.data.test.pipeline[1]['img_scale'] = (512, 512)

# Other training settings
cfg.data.samples_per_gpu = 4   # batch size per GPU
cfg.seed = 2020
cfg.gpu_ids = [0]
cfg.work_dir = './work_dirs/faster_rcnn_r50_fpn_1x_trash'   # logs and checkpoints go here
cfg.optimizer_config.grad_clip = dict(max_norm=35, norm_type=2)   # gradient clipping
cfg.checkpoint_config = dict(max_keep_ckpts=3, interval=1)   # save every epoch, keep the last 3
cfg.device = get_device()

1-3. Dataset, Model, and Training

# Build the training dataset
datasets = [build_dataset(cfg.data.train)]

# Build the model
model = build_detector(cfg.model)
model.init_weights()  # initialize weights (loads the pretrained backbone given by init_cfg)

# Train (validate=True also evaluates on cfg.data.val during training)
train_detector(model, datasets[0], cfg, distributed=False, validate=True)

2. Registering a Custom Backbone

Besides the many backbones it ships with, MMDetection lets you register and use a custom backbone you implemented yourself.

2-1. Example Custom Backbone Code

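import torch.nn as nn
from ..builder import BACKBONES  # MMDetection 2.x backbone registry (relative import inside mmdet/models/backbones/)

@BACKBONES.register_module()
class MyModel(nn.Module):
    def __init__(self, args):
        super(MyModel, self).__init__()
        self.args = args  # arguments passed from the config's backbone dict
        # Define the layers you need; a single stride-2 conv serves as a placeholder
        self.conv = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        # Implement the forward pass; MMDetection expects a tuple of feature maps
        feat = self.conv(x)
        return (feat,)  # return a tuple, one entry per output level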

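Save this file as mmdetection/mmdet/models/backbones/mymodel.py. The registration decorator only runs when the module is imported, so also add the line from .mymodel import MyModel to mmdet/models/backbones/__init__.py (or import the module yourself before building the model).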

2-2. Applying the Custom Backbone in the Configuration

cfg.model.backbone = dict(
    type='MyModel',
    args='arg1'  # the argument(s) MyModel.__init__ expects
)

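After this change, build and train the model exactly as before to get an object detection model that uses the custom backbone. Note that the neck in faster_rcnn_r50_fpn_1x_coco.py is configured for ResNet-50's four outputs (in_channels=[256, 512, 1024, 2048]), so cfg.model.neck.in_channels must be changed to match the channels your backbone actually returns.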


Conclusion

MMDetection lets you build and train modern object detection models simply by modifying a prepared configuration file.


Building Object Detection Models with Detectron2

Detectron2 is a PyTorch-based deep learning library developed by Facebook AI Research that makes it easy to perform computer vision tasks such as object detection and segmentation. Object detection models that would be complex to implement yourself can be used by modifying a single prepared configuration file, and customization is straightforward. This post covers Detectron2's basic usage along with custom data augmentation, dataset registration, training, and registering a custom backbone.


1. Detectron2 Basic Setup

1-1. Imports and Logger Setup

import os
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import register_coco_instances  # lives in detectron2.data.datasets
import detectron2.data.transforms as T
from detectron2.evaluation import COCOEvaluator
from detectron2.data import build_detection_train_loader, build_detection_test_loader

2. Working with Configuration Files

Load a prepared configuration file and modify only what you need.

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file('COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml'))

# Dataset settings: the names registered with register_coco_instances below
cfg.DATASETS.TRAIN = ("coco_trash_train",)
cfg.DATASETS.TEST = ("coco_trash_val",)

# Training settings
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url('COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml')
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.MAX_ITER = 3000
cfg.SOLVER.STEPS = (1000, 1500)   # iterations at which the LR is multiplied by GAMMA
cfg.SOLVER.GAMMA = 0.05
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 10   # 10 trash classes
cfg.TEST.EVAL_PERIOD = 500   # evaluate on DATASETS.TEST every 500 iterations

3. Registering the Dataset and Setting Metadata

Register the datasets using COCO-format annotation files and the image directory.

# Register the training dataset: (name, metadata, annotation JSON, image root)
register_coco_instances('coco_trash_train', {}, '/home/data/train.json', '/home/data')
# Register the validation dataset
register_coco_instances('coco_trash_val', {}, '/home/data/val.json', '/home/data')

# Set metadata (optional; the class names must match the categories in the JSON files)
classes = ["General Trash", "Paper", "Paper pack", "Metal", "Glass", 
           "Plastic", "Styrofoam", "Plastic bag", "Battery", "Clothing"]
MetadataCatalog.get('coco_trash_train').set(thing_classes=classes)
MetadataCatalog.get('coco_trash_val').set(thing_classes=classes)

4. Defining an Augmentation Mapper

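Unlike MMDetection, where many augmentations can be declared directly in the config pipeline, Detectron2's default data mapper only applies the few augmentations configured in cfg.INPUT (resizing and flipping). Additional preprocessing and augmentation are therefore defined in a custom mapper.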

import copy
import torch
from detectron2.data import detection_utils as utils

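def MyMapper(dataset_dict):
    # Copy the dict first, because the mapper modifies it
    dataset_dict = copy.deepcopy(dataset_dict)
    image = utils.read_image(dataset_dict['file_name'], format='BGR')
    
    # Augmentations: random vertical flip, brightness, and contrast
    transform_list = [
        T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
        T.RandomBrightness(0.8, 1.8),
        T.RandomContrast(0.6, 1.3)
    ]
    
    # Apply the augmentations and keep the transforms so the boxes can follow the image
    image, transforms = T.apply_transform_gens(transform_list, image)
    # HWC image -> CHW float32 tensor, the layout Detectron2 models expect
    dataset_dict['image'] = torch.as_tensor(image.transpose(2, 0, 1).astype('float32'))
    
    # Apply the same transforms to the annotations, skipping crowd regions
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop('annotations')
        if obj.get('iscrowd', 0) == 0
    ]
    
    # Convert to an Instances object and drop boxes that became empty
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict['instances'] = utils.filter_empty_instances(instances)
    
    return dataset_dict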

5. Defining the Trainer Class and Training

Define a custom trainer that plugs the augmentation mapper into the training data loader and sets up the evaluator.

class MyTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg, sampler=None):
        # Route every training sample through the custom mapper
        return build_detection_train_loader(cfg, mapper=MyMapper, sampler=sampler)
    
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # COCO-style evaluation, run every cfg.TEST.EVAL_PERIOD iterations
        if output_folder is None:
            os.makedirs('./output_eval', exist_ok=True)
            output_folder = './output_eval'
        return COCOEvaluator(dataset_name, distributed=False, output_dir=output_folder)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = MyTrainer(cfg)
trainer.resume_or_load(resume=False)  # start from cfg.MODEL.WEIGHTS, not the last checkpoint
trainer.train()

6. Registering a Custom Backbone

If the model you want is not available, you can register a custom backbone.

from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec, build_model
import torch.nn as nn

@BACKBONE_REGISTRY.register()
class MyBackbone(Backbone):
    def __init__(self, cfg, input_shape: ShapeSpec):
        super(MyBackbone, self).__init__()
        # Define the layers you need (e.g., CNN layers); one stride-16 conv as a placeholder
        self.conv = nn.Conv2d(input_shape.channels, 64, kernel_size=16, stride=16)
        self._out_features = ["res5"]  # names of the output features
        
    def forward(self, x):
        # Forward pass: return the feature maps as a dict {name: tensor}
        x = self.conv(x)
        return {"res5": x}
        
    def output_shape(self):
        # Detectron2 calls output_shape() to read each feature's channels and stride,
        # so the method must have exactly this name
        return {"res5": ShapeSpec(channels=64, stride=16)}

# Modify the configuration to use the custom backbone
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file('COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml'))
cfg.MODEL.BACKBONE.NAME = 'MyBackbone'
# The FPN config reads features p2-p6; point the RPN and ROI heads at the single "res5" map
cfg.MODEL.RPN.IN_FEATURES = ["res5"]
cfg.MODEL.ROI_HEADS.IN_FEATURES = ["res5"]
cfg.MODEL.ANCHOR_GENERATOR.SIZES = [[32, 64, 128, 256, 512]]  # all anchor sizes on one level
# set cfg.MODEL.DEVICE = 'cpu' when no GPU is available
model = build_model(cfg)

Conclusion

Detectron2 is a powerful library that lets you build and train object detection models easily, just by modifying a prepared configuration file.