Eldor’s AI Lab
🚀 Eldor’s AI Lab: learn artificial intelligence in depth and hands-on! 🔹 AI and ML theory 🔹 Code and practical exercises 🔹 Programming tips 🔹 Research papers and the latest news 💡 Want to learn AI? Let's go!
Subscribers: 376
24 hours: no data
7 days: +1
30 days: +3
Subscriber growth by month
| Month | Subscriber growth | Channels |
| June '26 | +3 | 0 |
| May '26 | +13 | 0 |
| April '26 | +19 | 0 |
| March '26 | +26 | 0 |
| February '26 | +12 | 0 |
| January '26 | +21 | 0 |
| December '25 | +21 | 0 |
| November '25 | +19 | 0 |
| October '25 | +149 | 2 |
| September '25 | 0 | 1 |
| August '25 | +13 | 1 |
| July '25 | 0 | 1 |
| June '25 | +165 | 1 |
| May '25 | 0 | 0 |
| April '25 | 0 | 4 |
| March '25 | 0 | 0 |
| February '25 | +2 | 4 |
| Date | Subscriber growth | Mentions | Channels | |
| 14 June | 0 | | | |
| 13 June | 0 | | | |
| 12 June | +2 | | | |
| 11 June | 0 | | | |
| 10 June | 0 | | | |
| 09 June | 0 | | | |
| 08 June | 0 | | | |
| 07 June | 0 | | | |
| 06 June | 0 | | | |
| 05 June | 0 | | | |
| 04 June | 0 | | | |
| 03 June | 0 | | | |
| 02 June | +1 | | | |
| 01 June | 0 | | | |
Channel posts
📌 Lesson 8.2: Activation Functions, the neural network's "decision"
🎯 Deep Learning Mathematics - @EldorML
Question: If I stack 100 layers, does the model get more powerful?
Answer: Without activations, NO. Let's see why.
🔹 Core idea
Build 2 layers using only W·x + b:
• z1 = W1·x + b1
• y = W2·z1 + b2
= (W2·W1)·x + (W2·b1 + b2)
= W_new·x + b_new
💥 100 layers collapse into a single linear map! Depth adds no power.
The fix is a nonlinear function between the layers: h = f(z1) ← activation!
Now the layers no longer collapse. The model can learn curves, XOR, images, and text.
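The collapse described above can be checked numerically. This is a minimal NumPy sketch; the layer sizes (4 → 5 → 3) and the random weights are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs -> 5 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=4)

# Two linear layers applied one after the other, with no activation.
y_stacked = W2 @ (W1 @ x + b1) + b2

# The same map written as a single linear layer.
W_new = W2 @ W1
b_new = W2 @ b1 + b2
y_single = W_new @ x + b_new

# With a ReLU in between, the single-layer rewrite is no longer valid in general.
y_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2

print("stacked == single:", np.allclose(y_stacked, y_single))
print("relu    == single:", np.allclose(y_relu, y_single))
```

The first comparison holds for any choice of weights, which is exactly why depth without a nonlinearity adds no expressive power.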
🔹 1. Sigmoid (1990): the first popular activation
σ(x) = 1 / (1 + e^(-x)) → output in (0, 1)
✅ Can be read as a probability
❌ Vanishing gradient: max derivative = 0.25
10 layers: 0.25^10 ≈ 0.00000095
💀 Almost no gradient reaches the first layer!
This is one reason deep networks did not train well in the 1990s.
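The 0.25 bound and the 10-layer product can be reproduced with a few lines of standard-library Python. This is a sketch of the best case only: it assumes every layer sits exactly at x = 0, where the sigmoid derivative is largest, and it ignores the weight matrices.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

max_grad = sigmoid_grad(0.0)      # the derivative peaks at x = 0
upper_bound = max_grad ** 10      # best case after 10 sigmoid layers

print(max_grad)
print(upper_bound)
```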
🔹 2. Tanh: an improved Sigmoid
tanh(x) → output in (-1, 1), centered at zero
Better than Sigmoid, but the vanishing gradient problem remains.
💡 Still used inside RNNs/LSTMs today.
🔹 3. Softmax: for multiple classes
Sigmoid handles 2 classes. For 10 classes (0-9), use Softmax:
Softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ)
Logits → probabilities, sum = 1.00
💡 As an activation it is normally used only in the output layer.
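A small NumPy sketch of the Softmax formula follows. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick that is not part of the formula itself; the example logits are made up.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Shifting by the max does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
probs = softmax(logits)

print(probs, probs.sum())
```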
🔹 4. ReLU (2012): a REVOLUTION
ReLU(x) = max(0, x)
In 2012 AlexNet won ImageNet. ReLU was one of its key ingredients.
✅ Derivative = 1 (on the positive side) → vanishing gradient is much less severe
✅ Very fast (just a check for x > 0)
✅ Sparsity: roughly half of the neurons are "asleep"
❌ Dying ReLU: large negative bias → neuron always outputs 0 → gradient 0 → dead ☠️
The fix is Leaky ReLU:
x ≤ 0 → 0.01·x (small gradient, the neuron does not die)
🔹 5. GELU (2016): the Transformer era
ReLU makes a hard decision: x ≤ 0 → 0.
GELU is soft and probability-based:
GELU(x) = x · Φ(x), where Φ is the standard normal CDF
x = -2: ReLU → 0, GELU → -0.046
x = 2: ReLU → 2, GELU → 1.95
🔥 BERT, GPT-2, GPT-3, ViT all use GELU.
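The two sample values can be verified with the exact definition GELU(x) = x · Φ(x). The sketch below uses only the standard library and writes Φ via the error function, Φ(x) = 0.5 · (1 + erf(x / √2)).

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def gelu(x: float) -> float:
    # Exact GELU: x times the standard normal CDF.
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi

for x in (-2.0, 2.0):
    print(f"x = {x:+.0f}: ReLU = {relu(x):.3f}, GELU = {gelu(x):.3f}")
```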
🔹 6. Swish/SiLU (2017) and Mish (2019)
SiLU(x) = x · σ(x)
Mish(x) = x · tanh(ln(1 + e^x))
Very similar to GELU; they differ by a small coefficient (GELU(x) ≈ x · σ(1.702·x)).
SiLU → EfficientNet, MobileNetV3, YOLOv5/v8, Stable Diffusion
Mish → YOLOv4
💡 GELU vs Swish vs Mish: the difference is very small and depends on context.
🎯 Which one for which task?
CNN (images) → ReLU or SiLU
Transformer (BERT, GPT, ViT) → GELU
Mobile / Diffusion → SiLU
YOLO → SiLU
RNN/LSTM → Tanh
Binary (output layer) → Sigmoid
Multi-class (output layer) → Softmax
💡 Rule of thumb: start with ReLU, then try GELU/SiLU.
⚠️ Important: no activation is "problem-free". Each one mitigates certain weaknesses, but at its own cost (slower computation, more memory).
🤝 YouTube: 🎥 Link
🖥️ Colab: 📂 Link
📘 All lessons: Link
🚨 The videos were recorded live. The mathematical explanations may contain mistakes. Apologies in advance 🙏
@EldorML
| 2 | Video message | 178 |
| 3 | 📌 Lesson 8.1: Forward and Backward Pass, how a neural network "thinks" and "learns"
🎯 Deep Learning Mathematics - @EldorML
Question: CNN, ViT, Diffusion, GNN, Transformer: what ties them together?
Answer: the Forward + Backward Pass. It is the heart of all of them.
🔹 Core idea
A child learns to tell an apple from an orange:
- Forward: sees the fruit → says "apple"
- Backward: the mother says "no, orange" → the child understands the mistake
A neural network does exactly this. Instead of the "child" there are weights. Instead of the "mother's answer" there is the loss.
🔹 1. Forward Pass: prediction
2-layer network, x = [1, 2], target = 5:
z1 = W1·x + b1 → [0.2, 1.9, 1.3]
h = ReLU(z1) → [0.2, 1.9, 1.3]
y = W2·h + b2 → 0.5
L = (y - 5)² → 20.25
The model said 0.5, the answer was 5. Loss = 20.25 💥
🔹 2. Computation Graph
Every operation is recorded in a graph:
x → [W1·x+b1] → [ReLU] → [W2·h+b2] → y → L
The backward pass walks this graph in reverse.
💡 PyTorch and TensorFlow both work on this principle. You write the forward pass; the framework computes the backward pass automatically (autograd).
🔹 3. Backward Pass: the Chain Rule
Question: "If I change W1 a little, how much does the loss change?"
dL/dW1 = dL/dy · dy/dh · dh/dz1 · dz1/dW1
Layer by layer, going backward:
dL/dy = 2(y-5) = -9
dL/dh = dL/dy · W2 = [-3.6, -2.7, 4.5]
dL/dz1 = dL/dh · 1 = [-3.6, -2.7, 4.5] (all ReLU inputs are positive)
dL/dW1 = dL/dz1 · xᵀ → 3×2 matrix
🔹 4. Gradient Descent: the update
W_new = W_old - η · dL/dW
With η = 0.01:
W1 = [[ 0.5, -0.2],       [[ 0.536, -0.128],
      [ 0.3,  0.8],   →    [ 0.327,  0.854],
      [-0.1,  0.6]]        [-0.145,  0.510]]
The parameters moved in the direction that reduces the loss 📉
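The whole worked example (forward pass, chain rule, one gradient-descent step) can be replayed in NumPy. The lesson gives x, the target, W1 and η but not b1, W2 or b2; the values used below are assumptions, back-solved so that the intermediate numbers match the ones in the text.

```python
import numpy as np

# Given in the lesson.
x = np.array([1.0, 2.0])
target = 5.0
W1 = np.array([[0.5, -0.2], [0.3, 0.8], [-0.1, 0.6]])
lr = 0.01

# ASSUMPTION: not stated in the lesson, chosen to reproduce its numbers.
b1 = np.array([0.1, 0.0, 0.2])
W2 = np.array([0.4, 0.3, -0.5])
b2 = 0.5

# Forward pass.
z1 = W1 @ x + b1                  # [0.2, 1.9, 1.3]
h = np.maximum(0.0, z1)           # ReLU
y = W2 @ h + b2                   # 0.5
loss = (y - target) ** 2          # 20.25

# Backward pass (chain rule, from the loss back to W1).
dL_dy = 2.0 * (y - target)        # -9
dL_dh = dL_dy * W2                # [-3.6, -2.7, 4.5]
dL_dz1 = dL_dh * (z1 > 0)         # ReLU derivative is 1 where z1 > 0
dL_dW1 = np.outer(dL_dz1, x)      # 3x2 matrix

# Gradient descent step.
W1_new = W1 - lr * dL_dW1
print(np.round(W1_new, 3))
```

In a framework such as PyTorch the backward section would be replaced by a single call to autograd; writing it out by hand here only makes the chain rule visible.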
🔹 5. The full loop
Forward → Loss → Backward → Update
↓
Repeat 1000 times
↓
Model is ready ✅
🎯 Summary
- Forward: prediction (input → output)
- Loss: measures the error
- Backward: gradients via the chain rule
- Gradient Descent: updates the parameters
- Autograd: PyTorch does this automatically
💡 CNN, ViT, Diffusion, GNN, Transformer: all of them learn through this mechanism. Only the operations inside differ. The same principle applies in GPT-4 and in your 2-layer network!
@EldorML | 316 |
| 4 | 📌 Lesson 7.5: Efficient Attention, the Transformer's O(n²) problem
🎯 Deep Learning Mathematics - @EldorML
Question: How do ChatGPT, Claude, and Llama support contexts of up to 1M tokens?
Answer: Efficient Attention variants.
🔹 The problem: O(n²)
n = 512 → 262 thousand
n = 8192 → 67 million
n = 100K → 10 billion 💥
QK^T is an n×n matrix. It blows up as n grows.
🔹 1. Sparse Attention: fewer pairs
A token does not need to attend to every other token.
- Sliding Window: attend to the nearest w tokens
- Longformer: local + global tokens (65K)
- BigBird: window + global + random (100K)
Complexity: O(n·w), linear in n
🔹 2. Linear Attention: a mathematical trick
Trick: (QK^T)V = Q(K^T V)
K^T V → d × d matrix (small!)
Complexity: O(n · d²)
The softmax gets in the way → kernel method (Performer):
softmax(q·k) ≈ phi(q)·phi(k)
At n = 100K: standard 10 billion → Performer 26 million
Speedup: 380x 🚀
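The reordering trick (QK^T)V = Q(K^T V) is plain matrix associativity, which the sketch below checks in NumPy. It deliberately leaves out the softmax, since the identity only holds without it (or after a kernel feature map such as the Performer's phi has replaced it). The sizes n and d are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 16   # hypothetical sequence length and head dimension

Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

# Quadratic order: materializes the full n x n matrix first.
scores = Q @ K.T                 # shape (n, n)
out_quadratic = scores @ V

# Linear order: only a d x d matrix is ever built.
KV = K.T @ V                     # shape (d, d)
out_linear = Q @ KV

print(scores.shape, KV.shape)
print(np.allclose(out_quadratic, out_linear, atol=1e-6))
```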
🔹 3. FlashAttention: at the GPU level
It does not change O(n²), but it runs 2-4x faster!
The secret: the GPU has 2 kinds of memory
HBM (40 GB, slow)
SRAM (20 MB, roughly 10x faster)
Standard: everything goes through HBM (slow)
Flash: computed block by block in SRAM → only the result is written to HBM
Result: 10-20x less memory, 2-4x faster
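The block-by-block idea relies on an "online softmax": running maxima and sums are kept per query so that the full n×n score matrix never has to exist at once. The NumPy sketch below illustrates only that arithmetic. It is not the real FlashAttention kernel, which also tiles the queries and runs fused on the GPU; the block size and tensor shapes are assumptions.

```python
import numpy as np

def standard_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[1])          # full n x n matrix
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def tiled_attention(Q, K, V, block=64):
    n, d = Q.shape
    out = np.zeros((n, V.shape[1]))
    row_max = np.full(n, -np.inf)    # running max score per query
    row_sum = np.zeros(n)            # running softmax denominator per query
    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        scores = Q @ Kb.T / np.sqrt(d)              # only n x block at a time
        new_max = np.maximum(row_max, scores.max(axis=1))
        scale = np.exp(row_max - new_max)           # rescale earlier partial results
        p = np.exp(scores - new_max[:, None])
        row_sum = row_sum * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ Vb
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(0)
Q = rng.normal(size=(128, 16))
K = rng.normal(size=(256, 16))
V = rng.normal(size=(256, 16))

out_ref = standard_attention(Q, K, V)
out_tiled = tiled_attention(Q, K, V, block=64)
print(np.allclose(out_ref, out_tiled))
```

Both functions return the same result, which is the point: the tiled version is exact attention, only computed in a different order.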
🔹 4. Additional techniques
- Gradient Checkpointing: 4x less memory (+30% time)
- Mixed Precision (BF16): 2x less memory, 2x faster
- GQA: used in Llama models (e.g. Llama 2 70B, Llama 3)
🎯 Summary
- O(n²): a hard barrier for long text
- Sparse → Longformer/BigBird (fewer pairs)
- Linear → Performer (mathematical rewrite)
- FlashAttention → a 2-4x speedup for free (still exact attention, no approximation)
- GQA + BF16 + Checkpointing → common across LLMs
💡 Modern LLMs such as Llama 3 combine several techniques: GQA + FlashAttention + BF16 + KV-cache. GPT-4 and Claude are closed models, so their exact recipes are not public. Now you understand how 128K and 1M token contexts are made to work!
@EldorML | 286 |
| 5 | 📌 Lesson 7.4: Graph Neural Networks (GNN), data in graph form
🎯 Deep Learning Mathematics - @EldorML
In the previous lesson we talked about Diffusion Models and generating images from noise.
Now the question:
❓ What if the data is neither an image nor text, but a graph?
❓ How do Facebook's "People You May Know", Google Maps traffic, and AlphaFold work?
Answer: with graph-based learning, and Graph Neural Networks in particular.
🔹 1. The core question
CNN: for images (regular grid)
Transformer: for text (sequence)
GNN: for graphs (irregular structure)
Examples:
• Social network: people (nodes) + friendships (edges)
• Molecule: atoms + bonds
• Road map: cities + roads
• Recommendation: user-product
"A GNN is a generalized version of a CNN: it works with 'neighboring nodes' instead of 'neighboring pixels'."
🔹 2. Adjacency Matrix: the graph as numbers
We express who is connected to whom with a matrix:
          Ali  Vali  Soli  Rustam
Ali     [  0    1     1     0  ]
Vali    [  1    0     0     1  ]
Soli    [  1    0     0     1  ]
Rustam  [  0    1     1     0  ]
🟢 Zero diagonal: no node is connected to itself
🟢 Symmetric: because the graph is undirected
We add self-loops:
A_tilde = A + I
Reason: during aggregation a node should also keep its own features.
🔹 3. Message Passing: the heart of a GNN
Three steps:
1) MESSAGE: every node sends a "message" to its neighbors
2) AGGREGATE: every node combines the messages it received (sum/mean/max)
3) UPDATE: a neural network computes the new features
Real-life analogy: how a rumor spreads
Start: only Ali knows
Step 1: Ali → Vali and Soli know too
Step 2: Vali, Soli → Rustam knows too
💡 Key takeaway: K rounds of message passing means every node receives information from neighbors up to K hops away.
🔹 4. The GCN formula
H^(k+1) = sigma( A_hat · H^(k) · W^(k) )
where:
A_hat = D^(-1/2) · A_tilde · D^(-1/2) (D is the degree matrix of A_tilde)
Step by step:
• A_tilde · H: sum over neighbors (automatic aggregation)
• H · W: linear transform (the analogue of a CNN filter)
• Multiplying by D^(-1/2): normalization
• sigma: ReLU or SiLU
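One GCN layer for the four-person graph can be sketched in NumPy as follows. The adjacency matrix is the one from the lesson; the node features H and the identity weight matrix W are assumptions made up for the example, and sigma is taken to be ReLU.

```python
import numpy as np

# Adjacency matrix from the lesson, in the order Ali, Vali, Soli, Rustam.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    deg = A_tilde.sum(axis=1)                   # degrees, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # D^(-1/2) · A_tilde · D^(-1/2), written with broadcasting.
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_layer(A_hat: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, A_hat @ H @ W)       # sigma = ReLU

A_hat = normalized_adjacency(A)

# ASSUMPTION: toy 2-dimensional features and an identity weight matrix.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
W = np.eye(2)

H_next = gcn_layer(A_hat, H, W)
print(np.round(H_next, 3))
```

Every node in this graph has degree 3 once the self-loop is counted, so A_hat is simply (A + I) / 3 and each output row is the average of a node's own features and those of its two neighbors.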
🔹 5. Why normalize?
Problem: some nodes have 1000+ neighbors (a celebrity), others have 5.
With a plain sum:
Celebrity → large values
Ordinary person → small values
This is unfair: popular nodes dominate.
Fix: divide by the degree:
h_i_new = sum_j( h_j / sqrt(d_i · d_j) )
Now everyone's information is on the same scale.
🔹 6. K layers = K hops
1 layer → direct neighbors
2 layers → neighbors of neighbors
K layers → K hops away
⚠️ But with 5+ layers you hit the over-smoothing problem:
all nodes end up with the same representation.
Start:            After 10 layers:
Ali  = [1, 0]     Ali  = [0.4, 0.4]
Vali = [0, 1]     Vali = [0.4, 0.4]
Soli = [1, 1]     Soli = [0.4, 0.4]
→ THEY ARE ALL THE SAME!
In practice 2-3 layers usually work best.
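Over-smoothing can be seen by applying the normalized aggregation ten times to the same four-node graph. This sketch keeps only the A_hat · H step (no weights, no activation), and Rustam's starting feature is an assumption because the lesson lists only three nodes, so the exact limit differs from the illustrative [0.4, 0.4].

```python
import numpy as np

# Same graph as in the adjacency-matrix section: Ali, Vali, Soli, Rustam.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

A_tilde = A + np.eye(4)
d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

# ASSUMPTION: Rustam's initial feature [0, 0] is made up for the example.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])

for layer in range(10):
    H = A_hat @ H            # aggregation only

# Largest difference between any two nodes, per feature dimension.
spread = H.max(axis=0) - H.min(axis=0)
print(np.round(H, 4))
print(spread)
```

After ten rounds every row is numerically the same vector, so a classifier on top can no longer tell the nodes apart.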
🔹 7. Types of GNN tasks
Node-level: a prediction for every node
Example: is this a spam account? which group does it belong to?
Edge-level: will an edge exist?
Example: friend recommendation (link prediction)
Graph-level: a prediction for the whole graph
Example: is the molecule toxic?
🎯 Final summary
• Graph = nodes + edges (expressed with an adjacency matrix)
• Message passing: message → aggregate → update
• GCN formula: H' = sigma(A_hat · H · W), i.e. neighbor sum + linear + ReLU
• Normalization: divide by degree (so popular nodes do not dominate)
• 2-3 layers usually work best; 5+ layers lead to over-smoothing
• A GNN works on graphs of any size and does not depend on node ordering (permutation equivariant per node, invariant at graph level)
💡 AlphaFold (proteins), Google Maps (traffic), Pinterest (recommendations), Facebook ("People You May Know"): all rely on graph-based models of this kind. We use GNNs every day without seeing them.
@EldorML | 267 |
| 6 | Links to all courses in the group:
Python course: https://medium.com/@mr.eldorabdukhamidov/intensiv-python-kursi-8aac613fca5c
AI agent course: https://medium.com/@mr.eldorabdukhamidov/ai-agentlar-qurish-bepul-onlayn-kurs-e1ad0a2246b9
ML course: https://medium.com/@mr.eldorabdukhamidov/machine-learning-ml-to-liq-kurs-tarkibi-79c0c5c35da2
DL Math course: https://medium.com/@mr.eldorabdukhamidov/deep-learning-matematikasi-intensiv-kurs-rejasi-3a04e0f12453 | 0 |
| 7 | If you have any suggestions or requests, leave them in the comments. I will try to adapt the lessons accordingly! | 0 |
| 8 | Hello, friends. Are the video lessons clear and useful for you? | 0 |
| 9 | 📌 Lesson 7.3: Diffusion Models, generating images from noise
🎯 Deep Learning Mathematics - @EldorML
In the previous lesson we talked about ViT and patch embedding.
Now the question:
❓ Can a realistic image be generated from pure noise?
❓ How do Stable Diffusion and DALL-E work?
Answer: Yes, and the secret is the "diffusion" process.
🔹 1. The core idea
GAN: "invents" the image
VAE: compresses the image and reconstructs it
Diffusion: "builds" the image by removing noise
"If we learn how to destroy an image, we can also learn how to restore it."
🔹 2. Forward Process: adding noise
We gradually add Gaussian noise to the image over T = 1000 steps:
x_0 → x_1 → x_2 → ... → x_T
(x_0: image, x_1: slight noise, x_2: more noise, x_T: pure noise)
Reparameterization formula:
x_t = √ᾱ_t · x_0 + √(1-ᾱ_t) · ε
where:
- ᾱ_t: the product of the α values (up to step t)
- ε ~ N(0, I): pure Gaussian noise
🟢 The forward process is NOT TRAINED: it is a fixed mathematical formula.
🔹 3. Reverse Process: restoring the image
Starting from pure noise, we remove a little noise at every step:
x_T → x_{T-1} → ... → x_1 → x_0
(x_T: noise, x_0: clean image)
Problem: there is no closed-form expression (the posterior is intractable)
Fix: a neural network (U-Net) predicts the noise
🔹 4. Score Matching: the deeper idea
Score function = gradient of log p(x)
It is a "compass that points toward real images"
For DDPM (a diffusion model) it has been shown that:
score = -ε / √(1-ᾱ_t)
So predicting the noise == computing the score
The two are MATHEMATICALLY EQUIVALENT!
🔹 5. DDPM Loss: a simple MSE
The complicated variational lower bound (VLB) simplifies to:
L = || ε - ε_θ(x_t, t) ||²
This is an ordinary MSE. That is all!
Training algorithm:
1. Take an image from the dataset: x_0
2. Random step: t ~ Uniform(1, T)
3. Random noise: ε ~ N(0, I)
4. Compute x_t (formula above)
5. Loss = ||ε - ε_θ(x_t, t)||²
6. Gradient descent
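Steps 1-5 of the training algorithm can be sketched in NumPy. Several things here are assumptions rather than part of the lesson: the linear β schedule from 1e-4 to 0.02 (a common DDPM choice), the 8×8 toy "image", and `eps_model`, a hypothetical stand-in for the U-Net that always predicts zero noise. The loss is averaged over pixels instead of summed, which only rescales it.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000

# ASSUMPTION: linear beta schedule; the lesson does not specify one.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)            # alpha_bar[t-1] is the product up to step t

def q_sample(x0, t, eps):
    """Jump straight from x_0 to x_t using the reparameterization formula (t is 1-based)."""
    a = alpha_bar[t - 1]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def eps_model(x_t, t):
    """Hypothetical placeholder for the U-Net: predicts zero noise everywhere."""
    return np.zeros_like(x_t)

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))          # 1. toy "image"
t = int(rng.integers(1, T + 1))                   # 2. random step in [1, T]
eps = rng.normal(size=x0.shape)                   # 3. random noise
x_t = q_sample(x0, t, eps)                        # 4. noisy image at step t
loss = np.mean((eps - eps_model(x_t, t)) ** 2)    # 5. MSE between true and predicted noise
# 6. A real setup would backpropagate this loss through the U-Net and update its weights.

print(t, loss)
```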
🔹 6. U-Net: the noise-predicting network
Input: noisy image + step number (t)
Output: predicted noise
Encoder (downsampling)
x_t → [64] → [128] → [256] → [512]
↓
Bottleneck
↓
Decoder (upsampling)
[512] → [256] → [128] → [64] → ε_pred
Skip connections at every level: fine details are not lost.
Sinusoidal time embedding: the model knows which step it is at.
🔹 7. Sampling: slow but high quality
Training: 1 forward pass per step
Sampling: 1000 forward passes
Diffusion sampling is roughly 1000x slower than a GAN's single pass, but the quality is considerably higher.
Newer methods (DDIM) bring this number down to 20-50.
🎯 Final summary
- Forward process → a mathematical formula, not trained
- Reverse process → learned by the U-Net
- DDPM loss → a simple MSE
- Score matching = noise prediction (mathematically equivalent)
- U-Net + skip connections → fine details are preserved
- Time embedding → one model handles all 1000 denoising steps
💡 Stable Diffusion, DALL-E 2, and Imagen all build on DDPM-style diffusion, and Midjourney is widely believed to as well!
@EldorML | 0 |
| 10 | 📌 Lesson 7.2: Vision Transformers (ViT), turning images into tokens
🎯 Deep Learning Mathematics - @EldorML
In the previous lesson we talked about ResNet and skip connections.
Now the question:
❓ Is the Transformer only for text?
❓ Can we feed an image to a Transformer as well?
Answer: Yes, but first the image has to be turned into "words".
🔹 1. The problem: tokenizing an image
If every pixel were a token:
224×224 = 50176 tokens
Attention costs O(n²) → 50176² ≈ 2.5 billion pairwise scores per attention layer.
That is impractical.
🔹 2. The fix: Patch Embedding
Split the image into P×P patches:
Patch size: 16×16
Number of patches: (224×224) / (16×16) = 196
50176 pixels → only 196 tokens! ✅
Each patch:
1. Is flattened: 16×16×3 = 768 elements
2. Linear projection: 768 → D-dimensional embedding
3. A position embedding is added
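The three patch steps map directly onto array reshapes. In this NumPy sketch the image is random data, D = 768 is an assumed embedding size, and random matrices stand in for the learned projection and position embedding; the [CLS] token is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 224, 224, 3
P = 16           # patch size
D = 768          # ASSUMPTION: embedding dimension

image = rng.normal(size=(H, W, C))

# 1. Cut into non-overlapping P x P patches and flatten each one.
patches = image.reshape(H // P, P, W // P, P, C)
patches = patches.transpose(0, 2, 1, 3, 4)        # (14, 14, 16, 16, 3)
patches = patches.reshape(-1, P * P * C)          # (196, 768)

# 2. Linear projection (random weights stand in for learned ones).
W_proj = rng.normal(scale=0.02, size=(P * P * C, D))
tokens = patches @ W_proj                         # (196, D)

# 3. Add a position embedding, one learnable vector per patch.
pos_embed = rng.normal(scale=0.02, size=(tokens.shape[0], D))
tokens = tokens + pos_embed

print(patches.shape, tokens.shape)
```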
🔹 3. CLS Token
A [CLS] token is prepended to the Transformer input.
• It does not belong to any patch
• It interacts with all patches through attention
• At the end it holds a "summary" representation of the whole image
• Only [CLS] is used for classification
🔹 4. Why is a Position Embedding needed?
Self-attention has no built-in notion of order (it is permutation equivariant):
[p1][p2][p3] and [p3][p1][p2] look the same to it!
The position embedding adds "I am at position i" information to every token.
ViT uses a learnable position embedding.
🔹 5. Inductive Bias: CNN vs ViT
Inductive bias: the assumptions an architecture makes about the data in advance.
CNN assumptions:
• Locality → works only with neighboring pixels
• Translation equivariance → the same filter is applied everywhere
ViT assumptions:
• NO locality → every patch sees all patches
• NO translation equivariance → the position embedding is learned
• Global receptive field → available immediately ✅
Comparison:
CNN:
Locality ✅ (built in)
Translation eq. ✅ (built in)
Global context ❌ (slow)
Little data ✅ good
Lots of data ✅ good
ViT:
Locality ❌ (learned)
Translation eq. ❌ (learned)
Global context ✅ (immediate)
Little data ❌ needs a lot of data
Lots of data ✅✅ better than CNN
In practice:
• Little data (< 1M) → CNN preferred
• Lots of data (> 10M) → ViT preferred
🔹 6. The full ViT pipeline
Input image (224×224×3)
↓
Split into patches → 196 patches of 16×16×3
↓
Flatten + Linear → 196×768
↓
CLS token → 197×768
↓
Position embedding → 197×768
↓
Transformer Encoder × 12
↓
CLS token → 768
↓
MLP Head → 1000 classes
🎯 Final summary
• Patch embedding → the image becomes a sequence of tokens
• CLS token → a summary representation of the whole image
• Position embedding → tells the model where each patch sits
• CNN → has inductive bias, good with little data
• ViT → global attention, good with lots of data
💡 DINOv2 and SAM are built on ViT, and newer diffusion models (e.g. Stable Diffusion 3) use Transformer backbones too!
@EldorML | 0 |
| 11 | 📌 Lesson 7.1: ResNet and Skip Connections, a fix for the deep-network problem
🎯 Deep Learning Mathematics - @EldorML
In the previous lesson we talked about Batch Normalization.
Now the question:
❓ Why does a 56-layer network perform worse than a 20-layer one?
❓ Why is a deeper network not always better?
Answer: the degradation problem, an optimization difficulty closely tied to how gradients flow through many layers (vanishing gradient).
🔹 1. The problem: Vanishing Gradient
In backpropagation the gradient is computed with the chain rule:
∂L/∂w₁ = ∂L/∂hₙ · ∂hₙ/∂hₙ₋₁ · ... · ∂h₁/∂w₁
Every layer multiplies the incoming gradient by its own local derivative.
If that factor is < 1 at every layer:
0.9¹⁰ ≈ 0.35
0.9⁵⁰ ≈ 0.005
0.9¹⁰⁰ ≈ 0.00003 ← almost zero!
As a result:
• The first layers barely learn
• The deep network performs worse than the shallow one
🔹 2. Residual Learning: F(x) + x
Plain layer:
h(x) = F(x) ← learns the full mapping
ResNet layer:
h(x) = F(x) + x ← learns only the "residual"
Why is this easier?
• Plain network: learning h(x) = x → hard
• ResNet: learning F(x) = 0 → easy!
Plain:
x → [Conv→BN→ReLU] → F(x)
ResNet:
x ───────────────┐
x → [F layers] → (+) → ReLU
🔹 3. The mathematics of Identity Mapping
One block:
y = F(x, {Wᵢ}) + x
For many blocks:
x_L = x_l + Σ F(xᵢ) (i from l to L-1)
That is, any deeper layer x_L equals any shallower layer x_l plus the sum of the residuals in between.
Gradient formula:
∂L/∂x_l = ∂L/∂x_L · (1 + ∂/∂x_l · ΣF(xᵢ))
💡 There is a "1" in the formula!
• Plain network: the gradient passes only through the layers → it can vanish
• ResNet: 1 + ... → the gradient always has a direct path and is unlikely to vanish ✅
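The effect of the extra "1" can be illustrated with a scalar toy model. For a plain chain the end-to-end gradient is a product of per-layer factors (0.9 each, as in the example above); for a residual chain each block contributes (1 + dF/dx). The size of dF/dx below is an assumption picked only to show the contrast.

```python
import numpy as np

depth = 100
local_grad = 0.9      # per-layer factor from the lesson's example

# Plain chain: the product of 100 factors below 1 collapses toward zero.
plain = local_grad ** depth

# Residual chain: each block y = x + F(x) contributes (1 + dF/dx).
# ASSUMPTION: dF/dx is a small random number for every block.
rng = np.random.default_rng(0)
dF = rng.normal(scale=0.01, size=depth)
residual = np.prod(1.0 + dF)

print(plain)
print(residual)
```

The plain product is about 0.00003, while the residual product stays on the order of 1 because every factor is centered on 1 rather than on a value below it.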
🔹 4. Skip Connection architecture
Basic Block (ResNet-18, 34):
x ┐
↓
Conv(3×3) → BN → ReLU
↓
Conv(3×3) → BN
↓
(+) ← x
↓
ReLU
Bottleneck Block (ResNet-50, 101, 152):
x ┐
↓
Conv(1×1) → BN → ReLU ← channels reduced
↓
Conv(3×3) → BN → ReLU ← main computation
↓
Conv(1×1) → BN ← channels restored
↓
(+) ← x
↓
ReLU
The first 1×1 convolution reduces the number of channels before the 3×3 convolution → saves computation.
When the dimensions differ, a projection is used:
y = F(x) + Wₛ·x ← here Wₛ is a 1×1 conv
🔹 5. Results
Plain network:
20 layers → ✅ good
56 layers → ❌ gets worse
152 layers → ❌❌ much worse
ResNet:
20 layers → ✅ good
56 layers → ✅ still good
152 layers → ✅ best (ImageNet 2015 🏆)
ResNet-152 achieved the best ImageNet result in 2015.
🎯 Final summary
• Degradation → a deeper network performs worse than a shallower one
• Skip connection → F(x) + x opens a direct path for the gradient
• Identity mapping → learning F(x)=0 is easy → extra layers do no harm
• The ResNet idea → used in nearly all modern architectures today
@EldorML | 0 |
| 12 | 📌 Lesson 6.7: Batch Normalization, speeding up training in deep networks
🎯 Deep Learning Mathematics - @EldorML
In the previous lesson we talked about overfitting and generalization.
Now the question:
❓ Why do deep networks become unstable during training?
❓ Why does training break when we make the learning rate large?
Answer: Internal Covariate Shift.
🔹 1. What is Internal Covariate Shift?
Take a factory as an example:
• Good case → raw material arrives the same every day, and the worker keeps a steady pace
• Bad case → raw material is different every day, and the worker keeps re-adapting
The same thing happens in a network:
• Every layer takes its input from the previous layer
• As the previous layer changes during training, the next layer's input changes too
• The next layer keeps adapting to "new conditions" → training slows down
This is Internal Covariate Shift.
🔹 2. Batch Normalization: the fix
Idea: normalize the input of every layer, i.e. bring it to mean=0, std=1.
Using Batch = [1.0, 2.0, 3.0, 4.0] as an example:
Step 1, mean:
μ = (1.0 + 2.0 + 3.0 + 4.0) / 4 = 2.5
Step 2, variance:
σ² = ((1−2.5)² + (2−2.5)² + (3−2.5)² + (4−2.5)²) / 4
= (2.25 + 0.25 + 0.25 + 2.25) / 4 = 1.25
Step 3, normalization:
x̂ᵢ = (xᵢ − μ) / √(σ² + ε)
x̂₁ = (1.0 − 2.5) / √1.25 = −1.34
x̂₂ = (2.0 − 2.5) / √1.25 = −0.45
x̂₃ = (3.0 − 2.5) / √1.25 = +0.45
x̂₄ = (4.0 − 2.5) / √1.25 = +1.34
Result: mean ≈ 0, std ≈ 1 ✅
Step 4, scale and shift: yᵢ = γ · x̂ᵢ + β
💡 Why are γ and β needed?
With normalization alone, the layer input is always forced to mean=0, std=1.
But some layers may need a different distribution.
γ and β are learnable parameters, so the model can choose the distribution it needs.
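The four steps can be reproduced on the same batch with a short NumPy function. This is a training-mode sketch only (no running statistics); the default ε = 1e-5 and the γ = 2, β = 1 values in the second call are assumptions for illustration.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean(axis=0)                       # step 1: batch mean
    var = x.var(axis=0)                       # step 2: biased variance (divide by N)
    x_hat = (x - mu) / np.sqrt(var + eps)     # step 3: normalize
    return gamma * x_hat + beta               # step 4: scale and shift

batch = np.array([1.0, 2.0, 3.0, 4.0])

x_hat = batch_norm(batch)                     # gamma = 1, beta = 0
y = batch_norm(batch, gamma=2.0, beta=1.0)    # hypothetical learned values

print(np.round(x_hat, 2))
print(np.round(y, 2))
```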
🔹 3. Training vs Inference
Problem: what do we do at inference when there is no batch?
Fix: running statistics:
μ_run ← (1−α)·μ_run + α·μ_batch
σ²_run ← (1−α)·σ²_run + α·σ²_batch
Training: batch statistics are used and the running statistics are updated
Inference: the running statistics are used and stay fixed
In PyTorch:
model.train() → batch stats, running stats are updated
model.eval() → running stats, unchanged
⚠️ Common mistake: forgetting model.eval():
BatchNorm then misbehaves at inference → unstable results.
🔹 4. Advantages of Batch Norm
• A larger LR can be used → faster training
• Less dependence on initialization
• Regularization effect: slightly reduces overfitting
• Less gradient vanishing
🔹 5. Where should it go?
Original: Linear → BN → Activation
Also used: Linear → Activation → BN
In PyTorch:
nn.Linear(in_features, out_features)
nn.BatchNorm1d(out_features)
nn.ReLU()
🎯 Final summary:
• Internal Covariate Shift → layer inputs keep shifting during training
• Batch Norm → brings each batch to mean=0, std=1
• γ, β → the model learns the distribution it needs
• model.eval() → uses the running statistics
@EldorML | 0 |
| 13 | 📌 Lesson 6.6: Overfitting and Generalization, what does the model learn?
🎯 Deep Learning Mathematics - @EldorML
In the previous lesson we talked about regularization.
Now the question:
❓ Why does the model do well on the training data but poorly on new data?
❓ What should the model actually learn?
Answer: the model should learn the underlying pattern, not the distracting noise.
🔹 1. Overfitting / Underfitting
Exam analogy:
• You did not study at all → Underfitting
• You understood the topic → Just right ✅
• You only memorized the answers → Overfitting
In terms of loss:
• Underfitting: Train loss↑ Val loss↑
• Just right: Train loss↓ Val loss↓ (close)
• Overfitting: Train loss↓↓ Val loss↑
Polynomial example:
• Degree 1 → too simple → underfitting
• Degree 4 → optimal → just right ✅
• Degree 20 → too complex → overfitting
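The polynomial example can be tried on synthetic data. Everything about the data here is an assumption (a noisy sin(2x) curve, 30 training and 30 validation points); only the three degrees come from the lesson. The sketch prints train and validation MSE so the pattern can be inspected, without claiming specific numbers.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2.0 * x)        # ASSUMPTION: hypothetical ground-truth pattern

# ASSUMPTION: small synthetic train and validation sets with Gaussian noise.
x_train = np.sort(rng.uniform(-1.5, 1.5, size=30))
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_val = np.sort(rng.uniform(-1.5, 1.5, size=30))
y_val = true_fn(x_val) + rng.normal(scale=0.2, size=x_val.size)

results = {}
for degree in (1, 4, 20):
    poly = Polynomial.fit(x_train, y_train, degree)      # least-squares fit
    train_mse = float(np.mean((poly(x_train) - y_train) ** 2))
    val_mse = float(np.mean((poly(x_val) - y_val) ** 2))
    results[degree] = (train_mse, val_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, val MSE = {val_mse:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree family contains the lower ones. Whether validation error rises at degree 20 depends on the noise draw, which is why the gap between the two columns is the thing to watch.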
🔹 2. The geometry of generalization
Two kinds of minimum on the loss surface:
Sharp minimum:
Loss
|  \        /
|   \      /
|    \    /    ← steep walls
|     \  /
|      \/
Flat minimum:
Loss
|  \         /
|   \       /
|    \_____/   ← wide, flat bottom
Why is a flat minimum better?
If the train and test distributions differ slightly → the parameters are effectively shifted.
• Sharp → small shift → loss rises quickly → poor test results
• Flat → small shift → loss barely changes → good test results
How do you reach a flat minimum?
• Small batch size → tends toward flat minima
• Weight Decay / L2 → penalizes large parameters
• Dropout → increases robustness
💡 A smaller batch (32–256) is usually recommended, even if it is not faster.
🔹 3. Bias-Variance Tradeoff
Error = Bias² + Variance + Irreducible Noise
Bias → the model's systematic error → a sign of underfitting
Variance → the model's sensitivity to the training data → a sign of overfitting
How they change with complexity:
Bias: ████████ → ░░░░░░░░ (decreases as complexity grows)
Variance: ░░░░░░░░ → ████████ (increases as complexity grows)
⚠️ The "Double Descent" phenomenon in deep learning:
In very large models the total error goes down again, but this is not yet fully understood.
🔹 4. Train / Val / Test Split
| Set | Purpose | Size |
| Train | Training the model | 70–80% |
| Val | Hyperparameter tuning | 10–15% |
| Test | Final evaluation | 10–15% |
⚠️ Golden rule: look at the test set only once.
If you change the model based on the test result, it is no longer a true test.
Early Stopping:
Stop training once the validation loss starts to rise.
Epoch: 1    2    3    4    5    6    7 ...
Train: 0.9  0.7  0.5  0.4  0.3  0.2  0.15
Val:   0.95 0.8  0.65 0.6  0.58 0.60 0.65
                           ↑
                       best model
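Early stopping on the validation curve above can be written as a short loop. The `patience` value of 2 (how many non-improving epochs to tolerate before stopping) is an assumption; the lesson only says to stop once validation loss rises.

```python
val_loss = [0.95, 0.8, 0.65, 0.6, 0.58, 0.60, 0.65]   # values from the table above
patience = 2      # ASSUMPTION: stop after 2 epochs without improvement

best_loss = float("inf")
best_epoch = 0
bad_epochs = 0
stopped_at = None

for epoch, loss in enumerate(val_loss, start=1):
    if loss < best_loss:
        # New best: remember it (a real loop would also save a checkpoint here).
        best_loss, best_epoch, bad_epochs = loss, epoch, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            stopped_at = epoch
            break

print(f"best epoch: {best_epoch}, best val loss: {best_loss}, stopped at: {stopped_at}")
```

Training stops at epoch 7, but the model that gets kept is the checkpoint from epoch 5.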
🔹 5. What to do in which case?
• Train↓ Val↑ → overfitting → regularization, dropout, more data
• Train↑ Val↑ → underfitting → bigger model, more epochs
• Train≈Val, both↑ → high bias: add capacity or better features (more data alone rarely helps)
• Train≈Val, both↓ → ideal ✅
🎯 Final summary
• Overfitting → memorizing the training data
• Flat minimum → better generalization
• Bias²+Variance → the two reducible components of the error
• Train/Val/Test → each has its own job
A good model is not the one with the lowest training loss,
but the one with the lowest loss on unseen data. ✅
@EldorML | 0 |
