Continuous Learning_Startup & Investment


We journey together through the captivating realms of entrepreneurship, investment, life, and technology. This is my chronicle of exploration, where I capture and share the lessons that shape our world. Join us and let's never stop learning!

Part of a note I received from a fund focused on early-stage investing...
"Software multiples have come down a bit in the past quarter to a median of 5.4x forward revenue, with 8x for the top quartile." (Image source: BVP Cloud Index as of 10/3/23)
In other words, software companies are now valued at about 5.4x their next-12-months revenue, and even the very top companies have come down to around 8x. Better to forget the days, not long ago, when multiples ran above 100x.
Investment has been falling steadily since Q4 2021. Last quarter (Q3 2023), $73B was invested, the lowest amount since Q4 2019, and 10,095 companies were funded, the fewest since Q3 2020.
Getting from seed to Series A has become extremely hard. To raise an A round now, you need roughly KRW 50 million or more in monthly revenue and either be starting to turn a profit... or be close to it.
---
This is the current situation for early-stage companies in the US. For larger companies, extend the line from here and it is easy to see how hard things are.
So...
1) Don't run your business assuming more funding will keep coming. You must grow profitably on the cash you already have; then funding will follow.
2) Give up hoping that the days of high valuations will return.
3) If you raised at a high valuation in the good times and absolutely need more capital, be prepared to cut your valuation by half or more.
4) But there are always exceptions. If you turn out to be one, be grateful; just don't assume you will be.
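A quick sketch of what these multiples mean in dollars. The $10M next-12-months revenue figure is a hypothetical example, not from the note, and a single revenue multiple is a simplification of how companies are actually valued.

```python
def implied_valuation(forward_revenue_musd: float, multiple: float) -> float:
    """Valuation (in $M) implied by a forward-revenue multiple, rounded to 0.1 $M."""
    return round(forward_revenue_musd * multiple, 1)


revenue = 10.0  # hypothetical: $10M of next-12-months revenue
for label, multiple in (("median", 5.4), ("top quartile", 8.0), ("peak-era 100x", 100.0)):
    print(f"{label}: ${implied_valuation(revenue, multiple)}M")
```

The same company that might once have been priced near $1B at 100x lands around $54M at the median multiple, which is why the note warns about down rounds.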

[What is the essence of multi-head attention?]
My main job, model optimization and compression, requires a deep understanding of the fundamental principles behind how and why AI models work. When building AI chips, too, this understanding of fundamentals is what gives you the grounds and the weapons to build a differentiated chip. Since I rarely have occasion to read AI textbooks, I sometimes get curious about how other people explain the basic concepts and how they teach them, and I found it surprisingly hard to find a lecture that explains the Transformer well (even though the Transformer has become so important!). I sometimes think I would like to build my own lecture someday, but I truly don't have the time, which is a pity.
Of all its components, multi-head attention is, I think, the one whose behavior makes many people tilt their heads when they first study the Transformer. Why was multi-head attention built this way? Why does it work like this? There can be many such questions. When I study a principle, I consider that I have understood it to my own standard once I can think, "Ah, I would have built it this way too." Otherwise, problems from not having truly understood it always surface later, whether in research or in development. It would be great to lecture on the entire Transformer, but today I will explain just multi-head attention in my own way (you can also take this as an example of how our team understands AI principles).
1. The wonder of deep learning starts with random initialization
The essence of multi-head attention starts here.
What is truly remarkable when we train models like ResNet or BERT is that even if we start from arbitrary random numbers (assuming some distribution such as a Gaussian, of course), the final accuracy after training does not change much. The more you think about it, the more surprising this is, because if you connect the starting point and the end point and measure something like a physical distance between them, the weights do not travel enormously far from where they started. It is also surprising that, although many different solutions can exist, so many of those final results end up performing about the same. There has been a great deal of research on this, and the Lottery Ticket Hypothesis in particular is a very powerful theory that can explain it.
2. Is a random start fine for attention too?
For a feed-forward network (i.e., plain linear layers), even when all weights start out random, training is likely to reach similarly good results. The attention weights, however, are not so forgiving: the final result varies considerably depending on the random initialization. So how do we deal with that? Just as we diversify in the stock market, we create several random initializations and raise the probability that a good one emerges among them. Having 10 heads, then, is like buying 10 different stocks and waiting for one or two good ones to emerge (much like investing in an ETF). There is a lot of related research: for example, many studies prune at the granularity of heads, and many show that the characteristics of the weights differ greatly from head to head.
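The "diversify across heads" argument in section 2 is a best-of-N probability. Assuming, purely for illustration, that each head independently ends up "good" with probability p (both the independence and the value p = 0.1 are assumptions, not measured figures), the chance that at least one of h heads is good is 1 - (1 - p)^h:

```python
def prob_at_least_one_good(p_good: float, num_heads: int) -> float:
    """Chance that at least one of num_heads independent heads turns out good."""
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("p_good must be in [0, 1]")
    return 1.0 - (1.0 - p_good) ** num_heads


# Hypothetical p_good = 0.1: more heads means more "lottery tickets".
for h in (1, 2, 4, 8, 16):
    print(f"{h:>2} heads -> {prob_at_least_one_good(0.1, h):.3f}")
```

Under these assumptions, going from 1 to 16 heads raises the chance from 10% to roughly 81%; section 4 explains why adding heads is still not free.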
๊ทธ๋งŒํผ attention weight์˜ ๊ฒฝ์šฐ์—๋Š” โ€˜๊ธˆ์ˆ˜์ €โ€™์™€ โ€˜ํ™์ˆ˜์ €โ€™์˜ ์ฐจ์ด๊ฐ€ ํฝ๋‹ˆ๋‹ค (์ข‹์€ ์˜ˆ์ธ์ง€๋Š” ๋ชจ๋ฅด๊ฒ ์œผ๋‚˜ ๊ตณ์ด ๋น„์œ ๋ฅผ ํ•˜์ž๋ฉด์š”โ€ฆ) 3. Attention๊ฒฐ๊ณผ๋ฌผ์€ summation์„ ํ•˜์ง€ ์•Š๊ณ  concatenation์„ ํ•œ๋‹ค.. ์™œ? header์˜ ๊ฐœ์ˆ˜๊ฐ€ ๋Š˜์–ด๋‚ ์ˆ˜๋ก ๊ฐ header๊ฐ€ ๋งŒ๋“ค์–ด๋‚ด๋Š” dimension์€ ์ค„์ž…๋‹ˆ๋‹ค. ์™œ๋ƒํ•˜๋ฉด header์˜ ๊ฒฐ๊ณผ๋ฅผ ๋‚˜์ค‘์—๋Š” ๊ฒฐ๊ตญ concatenationํ•˜๊ณ  ์ „์ฒด output dimention์€ ์œ ์ง€๋ฅผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์ด ๋•Œ (๊ผญ ํ•ญ์ƒ ๊ทธ๋Ÿฐ๊ฒƒ์€ ์•„๋‹ˆ์ง€๋งŒ) ๋งŒ์•ฝ ์–ด๋–ค output๋“ค์„ ํ•ฉ์น  ๋•Œ, ๊ฐ๊ฐ์˜ ๋…€์„๋“ค์ด ๋Œ€๋™์†Œ์ดํ•˜๊ฑฐ๋‚˜ ์„ฑ๋Šฅ์ด ๋น„์Šท๋น„์Šทํ•˜๋ฉด summation์„ ํ•˜๊ณ , ๋งŒ์ผ ์ฐจ์ด๊ฐ€ ํฌ๋‹ค๋ฉด concatenation์„ ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋”ฅ๋Ÿฌ๋‹์— ๋งŽ์Šต๋‹ˆ๋‹ค. ๊ตณ์ด ๋˜ ๋น„์œ ๋ฅผ ํ•˜์ž๋ฉด, ์•ˆ์ข‹์€ ๋…€์„๋“ค๊ณผ ์ข‹์€ ๋…€์„๋“ค์„ ํ•œ๋ฒˆ์— ๋‹ค ์„ž์–ด๋ฒ„๋ฆฌ์ง€ ๋ง๊ณ , ๋งค์šฐ ๋›ฐ์–ด๋‚œ ์ธ์žฌ๊ฐ€ ์žˆ๋‹ค๊ณ  ํ•˜๋ฉด ๋…๋ฆฝ์ ์œผ๋กœ ์‚ด๋ ค๋‘๋Š” ๋ฐฉ์‹ ์ด๋ผ๊ณ  ์ƒ๊ฐํ• ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฆ‰, ํƒœ์ƒ์ ์œผ๋กœ header๋“ค์€ ํ™•๋ฅ ์ ์œผ๋กœ ๋งค์šฐ ์ข‹๊ฑฐ๋‚˜ ๋งค์šฐ ๋‚˜์  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์ด ์—ฌ๊ธฐ์ €๊ธฐ structure ๊ตฌ์กฐ ๊ณ ๋ฏผ์— ๋…น์•„์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. 4. Header ๊ฐœ์ˆ˜๋Š” ๊ฒฐ๊ตญ trade-off์˜ ์‚ฐ๋ฌผ Header ๊ฐœ์ˆ˜๊ฐ€ ๋งŽ์•„์ง€๋ฉด ์ข‹์€ header๋ฅผ ์ฐพ์„ ํ™•๋ฅ ์ด ๋†’์•„์ง‘๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด ํ•˜๋‚˜์˜ header๊ฐ€ ๊ธฐ์—ฌํ•˜๋Š” dimension์€ ์ž‘์•„์ง‘๋‹ˆ๋‹ค. ์ด๋ ‡๊ธฐ ๋–„๋ฌธ์— header์˜ ๊ฐœ์ˆ˜๋ฅผ ์ ์  ๋Š˜์ด๋‹ค๋ณด๋ฉด ์„ฑ๋Šฅ์ด ์˜ฌ๋ผ๊ฐ”๋‹ค๊ฐ€ ๋‹ค์‹œ ๋‚ด๋ ค๊ฐ€๋Š” ํ˜„์ƒ์„ ๋ณด์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ header์˜ ๊ฐœ์ˆ˜๊ฐ€ ์–ผ๋งˆ๋‚˜ ํ•„์š”ํ•œ์ง€๋Š” ์ „์ ์œผ๋กœ empiricalํ•œ ๊ฒฐ๊ณผ๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๊ณ , ํ†ต์ƒ์ ์œผ๋กœ ๋ชจ๋ธ์ด ์ปค์งˆ์ˆ˜๋ก ์ตœ์ ์˜ header ๊ฐœ์ˆ˜๋Š” ์ฆ๊ฐ€๋ฅผ ํ•ฉ๋‹ˆ๋‹ค. ์ฐธ๊ณ ๋กœ header์˜ (์ด์ œ๊นŒ์ง€ ๋ง์”€๋“œ๋ฆฐ random initialization ๊ด€์ ์—์„œ) ๋…ํŠนํ•œ ํ˜„์ƒ๋“ค ๋•Œ๋ฌธ์— ์ตœ์ ํ™”๋‚˜ ๊ฒฝ๋Ÿ‰ํ™”๋ฅผ ํ•  ๋•Œ๋„ header ๊ด€๋ จ ๋งŒํผ์€ ํŠน๋ณ„ํžˆ ๋‹ค๋ฅธ ๋ฐฉ์‹๋“ค์„ ์ทจํ•ด์•ผํ•  ๋•Œ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. 
A method called GQA (grouped-query attention) has recently become popular (it is used in Llama 2 70B; PaLM uses multi-query attention, the single-group extreme of the same idea). What it tells us is that K and V turn out not to need that many heads, while Q does need many. In other words, the large gap between born-lucky and born-unlucky heads can be read as arising more in the Q weights. I suspect the original Transformer authors went through similar trial and error: "Heads end up with wildly different final performance depending on their random initialization? Then how can we spread the risk? Ah, let's increase the number of heads and concatenate the results." That is the logic.
For those who already know this well, I may have written at length about the obvious, but I had not found anyone explaining it this way, so I wanted to share my own interpretation and also show how our team works through new algorithms. We try to build deep insight into deep learning by running many experiments, experiencing things firsthand, and broadly understanding related research to keep a wide perspective, and I believe this way of understanding is what makes new research possible.
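The GQA idea above (fewer K/V heads than Q heads) can be sketched by letting each group of query heads share one K/V head. This is a minimal NumPy sketch assuming num_q_heads is divisible by num_kv_heads, no masking, and the output projection W_O omitted for brevity; names and shapes are illustrative. Setting num_kv_heads equal to num_q_heads recovers ordinary multi-head attention, and num_kv_heads = 1 gives multi-query attention.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_query_attention(x, w_q, w_k, w_v, num_q_heads, num_kv_heads):
    """Grouped-query self-attention for one sequence (output projection omitted).

    x: (seq_len, d_model); w_q: (d_model, d_model);
    w_k, w_v: (d_model, num_kv_heads * d_head) with d_head = d_model // num_q_heads.
    """
    seq_len, d_model = x.shape
    if d_model % num_q_heads or num_q_heads % num_kv_heads:
        raise ValueError("need d_model % num_q_heads == 0 and num_q_heads % num_kv_heads == 0")
    d_head = d_model // num_q_heads
    group_size = num_q_heads // num_kv_heads  # query heads sharing one K/V head

    q = (x @ w_q).reshape(seq_len, num_q_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_kv_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_kv_heads, d_head).transpose(1, 0, 2)

    # Repeat K/V heads so query heads g*group_size .. (g+1)*group_size - 1 all use K/V head g.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores, axis=-1) @ v  # (num_q_heads, seq_len, d_head)
    return heads.transpose(1, 0, 2).reshape(seq_len, d_model)
```

Only the K/V projections shrink (to num_kv_heads * d_head columns), which is where the smaller KV cache comes from, while every query head keeps its own Q weights.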

https://youtu.be/IPDAFffVsv0
There is no guarantee that past success will repeat itself; the future you will face is hard to predict from past data.
Clear vision -> execution -> failure/learning -> try again. Without a clear vision, you are like a drunk.
To solve problems no one has seen before, you have to use all available knowledge regardless of field. Instead of dividing knowledge into subjects or disciplines, focus on achieving the goal.
Computers will dominate science and engineering (AI...):
- Economic: cheaper than people
- Speed: far faster
- Accuracy
- Reliability (predictability)
Machines neither collect pensions nor get into personal quarrels. Where humans excel is something you will work out from here on.
Making a good life takes effort. Many people believe they will be happy once they achieve their goals. But what I have learned over a long time of studying and talking with people is that people come to understand and complete themselves not at the moment they reach a goal, but through the hardships and conflicts along the way. Someday you will grow old and have to accept the person you have become. Imagine how you want to be living when you are older, and start moving toward that future now. That message is the core of this class. Of course, I am not always right. But many of the people I have met found exceptional value by struggling through difficulty.
When I was young, I often watched movies. One Saturday I spent the whole day laughing at a movie with a friend. Afterward my friend said, "That wasn't really such a funny movie, was it?" and I agreed. A movie with many funny scenes is not necessarily a funny movie. Life is the same.
A life with happy moments is not necessarily a happy life. A truly happy life is not that monotonous. Joy alone does not make a good life; not every morning can be happy. A good life arrives quietly. If you truly want to build a happy life, set a goal and work to achieve it, instead of drifting aimlessly like drunken sailors. Socrates said that the unexamined life is not worth living.

How Do Nvidia Allocations Work?
They have an allocation they give per customer. But for example, Azure saying "hey we would like 10,000 H100s all to be used by Inflection" is different from Azure saying "hey we would like 10,000 H100s for Azure's cloud". Nvidia cares about who the end customer is, so clouds might be able to get an extra allocation for a specific end customer if Nvidia is excited about that customer. Nvidia also wants to know who that end customer is, as much as possible, and they prefer customers with nice brand names or startups with strong pedigrees.
Yes, this seems to be the case. NVIDIA likes to guarantee GPU access to rising AI companies (many of which they have a close relationship with). See Inflection, an AI company they invested in, testing a huge H100 cluster on CoreWeave, which they also invested in. – Private cloud exec
It's a unique situation in that Nvidia is giving large allocations to private clouds: CoreWeave has more H100s than GCP. Nvidia would prefer not to give large allocations to companies that are attempting to compete directly with them (AWS Inferentia and Trainium, Google TPUs, Azure Project Athena).

A pretty good comparison of the various services :)
Which GPU cloud should I use?
If you need a huge number of A100s/H100s: talk to Oracle, FluidStack, Lambda Labs, maybe a few others. Capacity is very low for large quantities, though, especially of H100s, based on a couple of cloud founders/execs I've talked with.
If you need a couple of A100s or H100s: Runpod, perhaps Tensordock or Latitude.
If you need 1x H100: Runpod (Fluidstack and Lambda have been out of on-demand capacity for quite a while).
If you need cheap 3090s, 4090s, or A6000s: Tensordock.
If you need Stable Diffusion inference only: Salad.
If you need a wide variety of GPUs: Runpod or Tensordock.
If you want to play around with templates / general hobbyist use: Runpod.
The large clouds generally have worse pricing and more complicated setups than the above. If you're tied to one of the big clouds (AWS, Azure, GCP), then you don't have a choice, so use that.

https://www.theinformation.com/articles/metas-free-ai-isnt-cheap-to-use-companies-say?rc=3jxaoo (paywalled article)
The article reports that open-source models are, far more often than not, no cheaper to run than commercial ones. The chart cited in the article shows that running even a 7B LLaMA model costs more than GPT-3.5 Turbo. For anyone following recent trends in LLM compression and optimization, this may be unsurprising: there are techniques that lower cost the more traffic is concentrated (like a wholesale business), so the more you run customized open-source models, the higher your server costs, whereas running a tuning service on top of a shared backbone model (for example, batching same-shaped LoRA weights together to serve requests) is inevitably cheaper. This trend is also inevitable because LLMs are bound to become ever more tightly tied to cloud services, and it is why having Naver Cloud lead AI was a choice well suited to it. Naver Cloud has deployed a range of optimization/compression techniques that let it hold its own on cost against OpenAI, Google, and Microsoft, and over the mid to long term it is making good progress on an AI chip intended to give it better service competitiveness than any big tech company.
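The "batch same-shaped LoRA adapters on one backbone" idea above can be sketched in NumPy: the base weight is applied once for the whole batch, and each request adds its own low-rank update selected by an adapter index. This is a minimal sketch assuming every adapter has the same rank r; the function and argument names are hypothetical, and real serving systems use specialized GPU kernels rather than einsum.

```python
import numpy as np


def batched_lora_forward(x, w_base, lora_a, lora_b, adapter_ids, scale=1.0):
    """One linear layer serving many LoRA adapters in a single batch.

    x: (batch, d_in), one row per request
    w_base: (d_in, d_out), shared backbone weight
    lora_a: (num_adapters, d_in, r); lora_b: (num_adapters, r, d_out), same rank r for all
    adapter_ids: (batch,) integer index of the adapter each request uses
    """
    base = x @ w_base                          # one shared matmul for every request
    a = lora_a[adapter_ids]                    # (batch, d_in, r), gathered per request
    b = lora_b[adapter_ids]                    # (batch, r, d_out)
    down = np.einsum("bi,bir->br", x, a)       # project each request down to rank r
    delta = np.einsum("br,bro->bo", down, b)   # and back up with its own adapter
    return base + scale * delta
```

A batch whose requests use adapters [0, 2, 1, 2, 0] still needs only one pass through the shared backbone weight, which is the wholesale effect the post describes.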

ChatGPT has some pushback for you (see below). It's quite good!
The scenario you're describing is an incredibly optimistic one that assumes exponential growth across multiple dimensions of computing technology: chip performance, AI model capabilities, and capital expenditure for capacity scaling. In reality, while rapid advancements are certainly occurring, there are a number of physical, economic, and social constraints that could inhibit such a straightforward, exponential trajectory. Here's a breakdown of challenges and limitations in each of the areas you've mentioned:
1. Chip Performance: The idea that chip performance would continue to double every year is a significant acceleration of Moore's Law, which historically observed that the number of transistors on a microchip doubles about every two years. However, we're reaching physical limitations of silicon technology, with features approaching the scale of a few nanometers. Quantum tunneling, heat dissipation, and energy efficiency are all significant challenges. While new materials and paradigms (like quantum computing or 3D stacking) are in development, they also come with their own sets of challenges that might not allow for a simple 2x improvement year over year.
2. Model Improvements: AI models are indeed improving rapidly, but there's more to consider than raw performance. Larger models require more data, more power, and better algorithms to be effective. Improvements in algorithms could potentially hit a point of diminishing returns, and there are concerns about the environmental impact of training extremely large models. Additionally, large models can become increasingly difficult to manage, debug, and ensure they do not perpetuate biases or make erroneous decisions.
3. Capital Expenditure: A 10x scale in capital expenditure assumes not only that companies have the economic capacity and willingness to invest at such levels but also that the supply chains and raw materials necessary for such a scale-up are available. This also presumes that there will be sufficient market demand to justify this level of investment.
Now, let's address your main question: what won't be possible, even with such improvements? Here are some potential limitations:
- Physical Laws: Computing is still bound by the laws of physics. Even if computers become vastly more powerful, there will still be things that are simply impossible, like violating the speed of light or the second law of thermodynamics.
- Complexity and Emergent Phenomena: Many systems, like weather, human brains, or economies, are incredibly complex. Even with vastly improved models, the sheer number of variables and interactions can make prediction and understanding difficult.
- Social and Ethical Constraints: Just because something can be done technically does not mean it should be. Ethical considerations regarding surveillance, privacy, and the use of AI in decision-making will likely place limits on how technology is applied.
- Economic Considerations: Not every advancement will be economically feasible or provide a return on investment. Additionally, there could be resistance from industries that are disrupted by these advancements, potentially slowing adoption.
- Human Adaptability: The rate at which society can adapt to new technologies is limited. There are educational, cultural, and institutional barriers that often delay the implementation of even the most promising technologies.
In summary, while your scenario posits a world in which the capabilities of computing technology grow at an unprecedented rate, it's important to consider the multifaceted challenges that accompany such growth. The future is likely to be one of incredible technological advancement, but the path there will probably be more uneven and complex than the simple multiplication of current capabilities.

Here's some AI "math" from today's baseline:
- Chips improve by 2x next year
- Models improve by 2x next year
- Capex scales capacity by 10x next year
This means that whatever you see today will be ~40x better in a year and another ~20-40x better a year after that. So we can expect an ~800-1,600x improvement by 2026. What *won't* be possible is probably the better question.
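The tweet's arithmetic is a straight multiplication of independent factors. A short sketch that reproduces it; the factors are the tweet's own assumptions, and the pushback above is precisely that they may not hold or compound this neatly.

```python
from math import prod


def compound(*factors: float) -> float:
    """Combined improvement when independent multipliers stack."""
    return prod(factors)


year_1 = compound(2, 2, 10)   # chips x models x capex -> 40x
by_2026_low = year_1 * 20     # another ~20x in year two
by_2026_high = year_1 * 40    # or another ~40x in year two
print(year_1, by_2026_low, by_2026_high)
```

If any single factor underdelivers (say chips at 1.4x instead of 2x), the whole product shrinks proportionally, which is why multiplicative forecasts are so sensitive.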

Today, Satoshi Spain's machines are handling AI workloads for startups, universities and individual developers in Europe. "You can still make money from your mining rig," Satoshi Spain founder Alejandro Ibáñez de Pedro said. "It's mining 2.0." Around 20% of the freed-up chips can be repurposed to train AI models, said Vipul Ved Prakash, the CEO of alternative cloud provider Together. Together is using former crypto miners for GPUs and other hardware to build virtual server farms. It has leased thousands of GPUs to help power Together's cloud service for AI developers. In May, it raised $20 million in seed funding. Demi Guo, the founder of an AI applications startup called Mellis AI, couldn't get the data processing pricing or capacity she needed at the leading cloud providers, so she is using repurposed hardware available through Together's cloud. Saurabh Vij has built an AI business around chips formerly used for mining. His startup, Monster, has access to more than 30,000 chips, most of them once used for mining, that its customers can lease.

Impressions from a recent conversation with a friend who was on the core deep learning team at Google Brain and DeepMind (Sejong Lee, 이세종)
These are the parts that struck me in our last session.
Model optimization such as quantization, or more efficient model architectures, can temporarily cut the compute (GPU) cost of training and inference, but in the long run the absolute demand for compute will grow rapidly
- Three researchers at DeepMind were able to pull off AlphaFold not because they were smarter, but because the compute available per person was overwhelmingly greater than at other companies and labs
- As compute becomes ubiquitous and cheaper, AlphaFold-level innovation will unfold in every industry and domain
- Furthermore, compute limits have kept us tied to basic text processing; going forward, demand for processing complex, heavy data such as video will grow exponentially
Hedge funds are already generating huge returns by actively using models specialized for stock price prediction
- They calculate what a model costs to train and what its ROI will be, given how long its competitive edge lasts and the revenue it is expected to bring in, and train several models in parallel
- Beyond that, every company has legacy code and data, and the way the data is stored and the shape of the code generate enormous fixed costs every year. Companies are often seen improving overall profitability by updating and migrating code with the help of AI models
"Throw people at it" blitzscaling may not work in the AI era
- Traditionally, personnel was a company's biggest expense, but at Google compute costs have recently overtaken developer costs
- OpenAI, too, is a company of about 400 people, yet its compute cost is 4x its per-person developer cost
- Going forward, small but solid teams aligned across the whole operation, from product to sales, may be the most competitive
Generalist models of many types, large and small, will coexist with specialized SOTA models
- A "Model of Experts" setup, in which the model suited to a user's request is recommended automatically and handles the task
- Models will sit at the intersection of two axes: one is intelligence (e.g., 80 IQ to 150 IQ) and the other is the vertical (e.g., coding, law, medicine)
- An orchestration layer that helps search, recommend, and connect models will emerge as a core technology
Source: Sejong Lee's Facebook https://www.facebook.com/Saejlee

A new solution to the high-end chip shortage. Read in @WSJ about how Together worked with large former crypto mining farms to repurpose their best GPUs and acquire new GPUs to train AI models, all with a specialized training stack, for a fraction of the price. https://www.wsj.com/articles/crypto-miners-seek-a-new-life-in-ai-boom-after-an-implosion-in-mining-92a181fd