Week 18 Study Notes

Generative AI


What is a Large Language Model (LLM)?

๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ(LLM, Large Language Model)์€ ์ˆ˜์‹ญ์–ต ๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๊ตฌ์„ฑ๋˜์–ด ๋ฐฉ๋Œ€ํ•œ ์‚ฌ์ „ ํ•™์Šต ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ›ˆ๋ จ๋œ ๋ฒ”์šฉ ์–ธ์–ด ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์ธ ์˜ˆ๋กœ๋Š” OpenAI์˜ ChatGPT, Meta์˜ LLaMA, MISTRAL AI์˜ Mistral, Upstage์˜ Solar ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๊ณผ๊ฑฐ์—๋Š” GPT-1, GPT-2, BERT์™€ ๊ฐ™์€ pretrained language model์ด ๊ฐ ํƒœ์Šคํฌ์— ๋งž๊ฒŒ fine-tuning์„ ๊ฑฐ์ณ ์‚ฌ์šฉ๋˜์—ˆ์œผ๋‚˜, LLM์€ ํ•˜๋‚˜์˜ ๋ชจ๋ธ๋กœ ๋‹ค์–‘ํ•œ ํƒœ์Šคํฌ๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์—์„œ ํฐ ๋ณ€ํ™”๋ฅผ ๊ฐ€์ ธ์™”์Šต๋‹ˆ๋‹ค.


LLM์˜ ๋™์ž‘ ์›๋ฆฌ

LLM์€ Zero-shot learning๊ณผ Few-shot learning์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค.


What is a Prompt?

LLM์—๊ฒŒ ์›ํ•˜๋Š” ์ž‘์—…๊ณผ ์‹ค์ œ ์ž…๋ ฅ๊ฐ’์„ ์ œ๊ณตํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ, ๋ชจ๋ธ์˜ ํƒœ์Šคํฌ ์ˆ˜ํ–‰ ๋ฐ ์ถœ๋ ฅ๋ฌธ์„ ์ œ์–ดํ•ฉ๋‹ˆ๋‹ค.
Prompt๋Š” ๋‹ค์Œ ์„ธ ๊ฐ€์ง€ ์š”์†Œ๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.

Example
[Instruction] The examples are sentiment analyses of movie reviews. Following the examples, analyze the sentiment of the review below.
[Demonstration]
Review: A movie that kept me laughing through the entire running time
Sentiment: Positive
[Input]
Review: It was too long and boring.
Sentiment:


LLM์˜ ์•„ํ‚คํ…์ฒ˜

๋Œ€๋ถ€๋ถ„์˜ LLM์€ Transformer ๊ตฌ์กฐ๋ฅผ ๋ณ€ํ˜•ํ•œ ๋‘ ๊ฐ€์ง€ ๋ชจ๋ธ ๊ตฌ์กฐ๋ฅผ ์ฑ„ํƒํ•ฉ๋‹ˆ๋‹ค.

1. Encoder-Decoder architecture


2. Decoder-only architecture



Building a Corpus

LLM์˜ ์‚ฌ์ „ ํ•™์Šต์„ ์œ„ํ•ด์„œ๋Š” ๋Œ€๊ทœ๋ชจ ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์ธ Corpus๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
์›์‹œ ๋ฐ์ดํ„ฐ์—๋Š” ์š•์„ค, ํ˜์˜ค ํ‘œํ˜„, ์ค‘๋ณต ๋ฐ์ดํ„ฐ, ๊ฐœ์ธ ์ •๋ณด ๋“ฑ ํ•™์Šต์— ๋ถˆํ•„์š”ํ•œ ๋‚ด์šฉ์ด ํฌํ•จ๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, ์ •์ œ ๊ณผ์ •์„ ํ†ตํ•ด ๊นจ๋—ํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ตฌ์ถ•ํ•ฉ๋‹ˆ๋‹ค.
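Such a cleaning pipeline can be sketched with simple rules. The example below is a toy stand-in for real filters (the blocklist contents and the email-masking rule are illustrative): it deduplicates documents, drops blocked ones, and masks email addresses as a crude form of PII removal.

```python
import re

def clean_corpus(docs, blocklist=("badword",)):
    """Toy corpus-cleaning pass: exact dedup, blocklist filtering,
    and masking of email addresses as a stand-in for PII removal."""
    seen, cleaned = set(), []
    email = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    for doc in docs:
        doc = doc.strip()
        if not doc or doc in seen:
            continue                     # drop empty and duplicate documents
        if any(word in doc.lower() for word in blocklist):
            continue                     # drop documents containing blocked terms
        doc = email.sub("[EMAIL]", doc)  # mask personal information
        seen.add(doc)
        cleaned.append(doc)
    return cleaned

docs = ["Hello world.", "Hello world.", "Contact me at a@b.com", "this badword text"]
print(clean_corpus(docs))
```

Production pipelines use far more sophisticated steps (near-duplicate detection, quality classifiers, language identification), but the shape is the same: a sequence of filters over raw documents.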


Instruction Tuning

LLM์˜ ์‘๋‹ต ํ’ˆ์งˆ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด instruction tuning ๊ณผ์ •์„ ๊ฑฐ์นฉ๋‹ˆ๋‹ค. ์ด๋Š” ์‚ฌ์šฉ์ž์˜ ๋‹ค์–‘ํ•œ ์ž…๋ ฅ์— ๋Œ€ํ•ด ์•ˆ์ „ํ•˜๊ณ  ๋„์›€์ด ๋˜๋Š” ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•˜๋„๋ก fine-tuningํ•˜๋Š” ๊ณผ์ •์œผ๋กœ, ์„ธ ๊ฐ€์ง€ ๋‹จ๊ณ„๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.

  1. Supervised Fine-Tuning (SFT):
    • Supervised training on prompts paired with demonstrations.
  2. Reward Modeling:
    • A reward model scores the LLM's answers for helpfulness (usefulness with respect to the question's intent) and safety.
  3. Reinforcement Learning from Human Feedback (RLHF):
    • Using algorithms such as PPO, the model is trained to reinforce answers that receive high reward scores.

Instruction tuning carried out this way makes the model follow user instructions more faithfully and reduces the frequency of hallucinations (fabricated information).
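As a concrete illustration of the reward-modeling stage, reward models are commonly trained with a pairwise ranking loss that pushes the preferred answer's score above the rejected one's. This is a minimal sketch of that loss only, not the full pipeline; the scores would come from the reward model:

```python
import math

def reward_ranking_loss(r_chosen, r_rejected):
    """Pairwise ranking loss for reward-model training:
    loss = -log(sigmoid(r_chosen - r_rejected)).
    Minimized when the preferred answer scores well above the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The score gap drives the loss toward zero.
print(reward_ranking_loss(2.0, -1.0))   # small: preference already satisfied
print(reward_ranking_loss(-1.0, 2.0))   # large: preference violated
```

During RLHF, the trained reward model then supplies the scalar reward that the policy (the LLM) is optimized against.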


๊ธฐ์กด์˜ ์–ธ์–ด ๋ชจ๋ธ ํ•™์Šต ๋ฐฉ๋ฒ•๋ก ๊ณผ In-Context Learning

๊ณผ๊ฑฐ์—๋Š” target task์— ๋งž์ถฐ ๋ชจ๋ธ ์ „์ฒด ๋˜๋Š” ์ผ๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” fine-tuning ๋ฐฉ๋ฒ•์ด ์ฃผ๋กœ ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

In-Context Learning (ICL), by contrast, performs a new task by supplying a few examples in the prompt; the model's weights are not updated. However, research has shown that models perform much the same even when the demonstration labels are randomized, so ICL has limitations in terms of reliability.


PEFT (Parameter-Efficient Fine-Tuning)

๋ชจ๋ธ ์ „์ฒด๋ฅผ ์—…๋ฐ์ดํŠธํ•  ๋•Œ ๋ฐœ์ƒํ•˜๋Š” ๋ฌธ์ œ(๊ธฐ์–ต ์ƒ์‹ค, ๋ง‰๋Œ€ํ•œ ์ž์› ์†Œ๋ชจ ๋“ฑ)๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด PEFT ๋ฐฉ๋ฒ•์ด ๊ณ ์•ˆ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. PEFT๋Š” ์ „์ฒด ํŒŒ๋ผ๋ฏธํ„ฐ ์ค‘ ์ผ๋ถ€๋งŒ ์—…๋ฐ์ดํŠธํ•˜๋Š” ์ ‘๊ทผ ๋ฐฉ์‹์œผ๋กœ, ๋Œ€ํ‘œ์ ์ธ 4๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค.

1. Adapter Tuning


Adapter Tuning pseudocode:

def transformer_block_with_adapter(x):
	residual = x
	x = SelfAttention(x)
	x = FFN(x)  # adapter (small bottleneck FFN)
	x = LN(x + residual)
	residual = x
	x = FFN(x)  # transformer FFN
	x = FFN(x)  # adapter (small bottleneck FFN)
	x = LN(x + residual)
	return x

2. Prefix Tuning


Prefix Tuning pseudocode:

def transformer_block_for_prefix_tuning(x):
	# soft_prompt: trainable prefix vectors; only these are updated during training
	soft_prompt = FFN(soft_prompt)
	x = concat([soft_prompt, x], dim=seq)
	return transformer_block(x)

3. Prompt Tuning

Prompt Tuning pseudocode:

def soft_prompted_model(input_ids):
	x = Embed(input_ids)
	# prepend trainable soft-prompt embeddings along the sequence dimension
	x = concat([soft_prompt, x], dim=seq)
	return model(x)

4. Low-Rank Adaptation (LoRA)


LoRA pseudocode:

def lora_linear(x):
	h = x @ W             # regular linear layer (frozen pretrained weight)
	h += x @ W_A @ W_B    # trainable low-rank update (rank r)
	return scale * h
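The pseudocode above can be made concrete with NumPy. This is a toy sketch, not any library's implementation: W is the frozen pretrained weight, and W_B is zero-initialized (the usual LoRA convention) so the adapted layer starts out exactly equal to the pretrained one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2                    # hidden size and LoRA rank (toy values)

W = rng.normal(size=(d, d))     # frozen pretrained weight
W_A = rng.normal(size=(d, r))   # trainable down-projection
W_B = np.zeros((r, d))          # trainable up-projection, zero-initialized
scale = 1.0                     # alpha / r scaling factor

def lora_linear(x):
    h = x @ W                   # frozen path
    h += (x @ W_A) @ W_B        # low-rank update, rank r
    return scale * h

x = rng.normal(size=(3, d))
# With W_B = 0, the layer reproduces the frozen pretrained layer exactly.
print(np.allclose(lora_linear(x), x @ W))
```

Only W_A and W_B (r·2d values here) would receive gradients; W stays untouched, which is what makes LoRA cheap to train and easy to merge back afterwards.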

LLM์˜ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•

LLM์ด ๋‹ค์–‘ํ•œ ํƒœ์Šคํฌ๋ฅผ ์–ผ๋งˆ๋‚˜ ์ž˜ ์ˆ˜ํ–‰ํ•˜๋Š”์ง€๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•œ ์—ฌ๋Ÿฌ ๋ฐ์ดํ„ฐ์…‹๊ณผ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ฃผ์š” ํ‰๊ฐ€ ๋ฐ์ดํ„ฐ์…‹

Evaluation frameworks


Conclusion

๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ(LLM)์€ ๋ฐฉ๋Œ€ํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ์™€ ๋‹ค์–‘ํ•œ ํ•™์Šต ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ํ•˜๋‚˜์˜ ๋ชจ๋ธ๋กœ ์—ฌ๋Ÿฌ ํƒœ์Šคํฌ๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ํ˜์‹ ์ ์ธ ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค.


A Summary of GANs and Other Image Generation Models

์ด ํฌ์ŠคํŠธ์—์„œ๋Š” ๊ธฐ๋ณธ์ ์ธ GAN(Generative Adversarial Networks)๋ถ€ํ„ฐ ์ตœ์‹  ์ƒ์„ฑ ๋ชจ๋ธ์— ์ด๋ฅด๊ธฐ๊นŒ์ง€, ๋‹ค์–‘ํ•œ ์ด๋ฏธ์ง€ ์ƒ์„ฑ ๊ธฐ๋ฒ•๊ณผ ๊ทธ ํ•™์Šต ๋ฐฉ๋ฒ•, ๊ทธ๋ฆฌ๊ณ  ํ‰๊ฐ€ ์ง€ํ‘œ์— ๋Œ€ํ•ด ์‚ดํŽด๋ด…๋‹ˆ๋‹ค. ๊ฐ ๋ชจ๋ธ์˜ ํŠน์ง•๊ณผ ์†์‹ค ํ•จ์ˆ˜, ๊ทธ๋ฆฌ๊ณ  paired/unpaired ๋ฐ์ดํ„ฐ์˜ ๊ฐœ๋…๊นŒ์ง€ ์ž์„ธํžˆ ์ •๋ฆฌํ•ด ๋ณด์•˜์Šต๋‹ˆ๋‹ค.


1. GAN (Generative Adversarial Networks)



2. Conditional GAN (cGAN)


3. Pix2Pix


4. CycleGAN



5. Paired Image vs. Unpaired Image


6. StarGAN



7. ProgressiveGAN


8. StyleGAN



9. AutoEncoder-family Models

9-1. AutoEncoder


9-2. Variational AutoEncoder (VAE)

9-3. VQ-VAE (Vector Quantized-Variational AutoEncoder)


10. Diffusion Models

10-1. DDPM (Denoising Diffusion Probabilistic Models)


10-2. DDIM (Denoising Diffusion Implicit Models)

10-3. Classifier Guidance & Classifier-free Guidance



11. Latent Diffusion Model (LDM)



12. Stable Diffusion


Stable Diffusion์˜ ํ•™์Šต ๊ณผ์ •


  1. ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ์ธ์ฝ”๋”ฉ:
    • ์ด๋ฏธ์ง€์™€ ํ…์ŠคํŠธ๋ฅผ ๊ฐ๊ฐ์˜ encoder๋ฅผ ํ†ตํ•ด latent space๋กœ ๋ณ€ํ™˜
  2. ๋…ธ์ด์ฆˆ ์ถ”๊ฐ€:
    • image latent์— noise scheduler๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ randomํ•œ timestep๋งŒํผ ๋…ธ์ด์ฆˆ ์ถ”๊ฐ€
  3. U-Net ํ•™์Šต:
    • noisy latent, token embedding, time step์„ ์ž…๋ ฅ๋ฐ›์•„ U-Net์ด ๋…ธ์ด์ฆˆ๋ฅผ ์˜ˆ์ธก
    • ์˜ˆ์ธก๋œ ๋…ธ์ด์ฆˆ์™€ ์‹ค์ œ ๋…ธ์ด์ฆˆ ๊ฐ„์˜ ์ฐจ์ด๋ฅผ MSE loss๋กœ ๊ณ„์‚ฐํ•˜์—ฌ ํ•™์Šต
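The noise-addition and loss steps above can be sketched numerically with NumPy. The schedule values are illustrative, and a zero predictor stands in for the U-Net; the point is the forward-noising formula and the MSE objective.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule (illustrative)
alpha_bar = np.cumprod(1.0 - betas)    # cumulative product: how much signal remains at step t

def add_noise(latent, t):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=latent.shape)
    noisy = np.sqrt(alpha_bar[t]) * latent + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps

latent = rng.normal(size=(4, 4))       # toy image latent
t = int(rng.integers(0, T))            # random timestep, as in step 2 above
noisy, eps = add_noise(latent, t)

pred_eps = np.zeros_like(eps)          # stand-in for the U-Net's noise prediction
loss = np.mean((pred_eps - eps) ** 2)  # training objective: MSE against the true noise
print(t, loss)
```

At small t the noisy latent is close to the original; by t = T the cumulative alpha_bar is near zero and the latent is almost pure noise, which is what lets inference start from random noise.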

Stable Diffusion์˜ Inference

Stable Diffusion 2 and XL


13. Evaluation Metrics for Image Generation Models

13-1. Inception Score

13-2. FID (Fréchet Inception Distance) Score

13-3. CLIP Score


Conclusion

๋ณธ ํฌ์ŠคํŠธ์—์„œ๋Š” GAN์˜ ๊ธฐ๋ณธ ๊ฐœ๋…๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜์—ฌ, ์กฐ๊ฑด๋ถ€ GAN, Pix2Pix, CycleGAN, StarGAN, ProgressiveGAN, StyleGAN๊ณผ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ์ƒ์„ฑ ๋ชจ๋ธ๊ณผ AutoEncoder ๊ณ„์—ด, Diffusion Model, ๊ทธ๋ฆฌ๊ณ  ์ตœ์‹  Stable Diffusion๊นŒ์ง€ ํญ๋„“๊ฒŒ ๋‹ค๋ฃจ์—ˆ์Šต๋‹ˆ๋‹ค.
๋˜ํ•œ, ๊ฐ ๋ชจ๋ธ์˜ ํ•™์Šต ์ „๋žต(์˜ˆ: cycle consistency, classifier guidance, latent diffusion)๊ณผ ํ‰๊ฐ€ ์ง€ํ‘œ(Inception, FID, CLIP score)์— ๋Œ€ํ•ด์„œ๋„ ์ •๋ฆฌํ•˜์—ฌ, ์ด๋ฏธ์ง€ ์ƒ์„ฑ ๋ชจ๋ธ์˜ ์ „์ฒด์ ์ธ ํ๋ฆ„๊ณผ ์ตœ์‹  ๊ธฐ์ˆ  ๋™ํ–ฅ์„ ํ•œ๋ˆˆ์— ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜์˜€์Šต๋‹ˆ๋‹ค.