
A 52-Week Curriculum to Become a Data Scientist in 2022

김유니콘 2021. 11. 25. 21:38

This article was originally published on Medium on 2020.12.24.
The original is available at A Complete 52 Week Course to Become a Data Scientist in 2021.
The translation itself doesn't matter much; this post is mainly a collection of links that can help you become a data scientist. All the reference materials are in English!


A Complete 52 Week Course to Become a Data Scientist in 2022

Learn something every week for 52 weeks!

towardsdatascience.com

 

“Plenty of people want to eat for free, but few are willing to go out and hunt for themselves.”

Introduction

If you want to become a data scientist but haven't acted on it yet, now is the time to start.

Last year I decided to learn something new about data science every week for 52 weeks, and I think it was one of the best decisions I've ever made. You'd be surprised how much you can gain in a single year!

So I want to share a 52-week curriculum that you can adopt as your New Year's resolution for 2021 (2022 at the time of translation). It's a tough, grinding schedule, but it will be worth it.

 

You've probably noticed that this guide doesn't start with machine learning. There's a reason for that. If you're curious why machine learning doesn't come first, see my other article: link

A few things to know before you start:

  • This course won't turn you into a complete, fully equipped data scientist. Rather, it covers what I consider the most important foundational skills for a data scientist.
  • It assumes you already know differential calculus. We all learned that in high school, right?
  • There's no deep learning in this curriculum. Deep learning would take 52 weeks on its own, and cramming it in here wouldn't do it justice.

Now let's get started!




 

Table of Contents

  1. Probability and Statistics (Week 1 to Week 6)
  2. Mathematics (Week 7 to Week 12)
  3. SQL (Week 13 to Week 21)
  4. Python and Programming (Week 22 to Week 28)
  5. Pandas (Week 29 to Week 33)
  6. Data Visualization (Week 34 to Week 35)
  7. Data Exploration and Preprocessing (Week 36 to Week 39)
  8. Machine Learning (Week 40 to Week 51)
  9. Data Science Project (Week 52)
 
 
 
 

Probability and Statistics

Why Probability and Statistics?

Data science and machine learning are essentially a modern version of statistics. Learning statistics first will shorten the time it takes to learn machine learning concepts and algorithms. The first few weeks may feel like you're learning nothing tangible, but it will all pay off later.

Week 1: Descriptive Statistics

Week 2: Probability

Week 3: Combinations and Permutations

Week 4: Normal Distribution and Sampling Distributions

Week 5: Confidence Intervals

Week 6: Hypothesis Testing

 







Mathematics

Why Mathematics?

As with statistics, many data science concepts are built on mathematical concepts.

To understand cost functions, you need to know at least differential calculus. To understand hypothesis testing, you need to understand integrals. And to give one more example, linear algebra is essential for understanding deep learning, recommendation systems, and principal component analysis.
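To make the link between cost functions and derivatives concrete, here is a minimal sketch. Treat it as an illustration, not part of the original curriculum: the model (a line through the origin, y = w * x), the mean squared error cost, and the names mse_cost, mse_gradient, and fit_slope are all assumptions chosen for the example. The derivative of the cost tells us which way to nudge w so that the cost goes down.

```python
def mse_cost(w, xs, ys):
    """Mean squared error of the hypothetical model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of mse_cost with respect to w.

    d/dw mean((w*x - y)^2) = mean(2 * x * (w*x - y))
    """
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Plain gradient descent: repeatedly step against the derivative."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    ys = [2, 4, 6, 8]  # generated with a true slope of 2
    w = fit_slope(xs, ys)
    print(round(w, 4), round(mse_cost(w, xs, ys), 6))
```

The learning rate and step count are arbitrary values that happen to converge on this toy data; real problems usually need them tuned.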

Week 7: Vectors and Spaces

Week 8: Dot Product and Matrix Transformations pt. 1

Week 9: Matrix Transformations pt. 2

Week 10: Eigenvalues and Eigenvectors

  • Eigenvalues and Eigenvectors
  • Review anything you fell behind on over the past few weeks!

Week 11: Integrals pt. 1

Week 12: Integrals pt. 2

 

 

 

 

 

SQL

Why SQL?

SQL is arguably the most important skill in any data-related profession, whether you're a data scientist, data engineer, data analyst, or business analyst.

SQL lets you extract specific data from a database so that you can analyze, visualize, and model it. Strong SQL skills therefore take your analysis, visualization, and modeling to the next level, because you'll be able to extract and manipulate data in many different ways.

I stumbled upon Mode's curriculum for SQL, and it's excellent. Get familiar with using SQL in Mode first, then work through the topics below.

Week 13: Basic SQL

Week 14: LOGICAL and COMPARISON Operators

Week 15: AGGREGATES

Week 16: DISTINCT, CASE WHEN

Week 17: JOINS, UNIONS

Week 18: Subqueries, Common Table Expressions

Week 19: String Manipulations

Week 20: Date-time manipulation

Week 21: Window Functions

  • Window Functions (ROW_NUMBER(), RANK(), DENSE_RANK(), LAG, LEAD, SUM, COUNT, AVG)
  • For more window functions, see here.
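As a rough illustration of what several of the functions listed above do, here is a small sketch that runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are made up for this example, and it assumes the SQLite bundled with your Python is version 3.25 or newer (the first release with window functions).

```python
import sqlite3


def rank_orders(rows):
    """Load (customer, order_day, amount) rows and apply a few window functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, order_day INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT customer, order_day, amount,
               -- position of each order within its customer's history
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_day) AS order_no,
               -- ties share a rank; RANK leaves gaps, DENSE_RANK does not
               RANK()       OVER (ORDER BY amount DESC) AS amount_rank,
               DENSE_RANK() OVER (ORDER BY amount DESC) AS amount_dense_rank,
               -- the same customer's previous order amount (NULL for the first)
               LAG(amount)  OVER (PARTITION BY customer ORDER BY order_day) AS prev_amount,
               -- running total per customer
               SUM(amount)  OVER (PARTITION BY customer ORDER BY order_day) AS running_total
        FROM orders
        ORDER BY customer, order_day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


SAMPLE_ORDERS = [("ann", 1, 50), ("ann", 2, 30), ("bob", 1, 50), ("bob", 3, 20)]

if __name__ == "__main__":
    for row in rank_orders(SAMPLE_ORDERS):
        print(row)
```

Unlike GROUP BY, a window function keeps every input row and simply attaches the computed value to it, which is why each order still appears once in the result.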
 

 

 

 

 

 

 

 

Python and Programming

Why Python?

I started with Python, and I'll probably use it for the rest of my life. Python has a rich open-source ecosystem and is intuitive to learn. You can learn R alongside it if you like, but I can't offer much help with R.

Week 22: Python Fundamentals

Week 23: List, Tuples, Functions, Conditional Statements, Comparisons

Week 24: Dictionaries, Loops, Comments

Week 25: Try/Except, Reading & Writing files, Classes and Objects

Week 26: Recursion

Week 27: Binary Trees

Week 28: APIs, Anaconda

 

 

 

 

 

Pandas

Why Pandas?

Pandas is arguably the most important Python library to know. It's used for data manipulation and analysis.

Week 29: Getting and Knowing Your Data

Week 30: Filtering, Sorting

Week 31: Grouping

Week 32: Apply

Week 33: Merge

 





Data Visualization

Why Data Visualization?

The ability to visualize data and insights is extremely important because it's the easiest way to communicate a lot of tangled information at once. As a data scientist, you'll always be selling yourself and your ideas, whether you're pitching a new project or convincing others that your model should go to production, and data visualization is a very useful tool for that.

There are many visualization libraries out there, but I'll focus on two: Matplotlib and Plotly.

Week 34: Data Visualization with Matplotlib

Week 35: Data Visualization with Plotly

 






Data Exploration and Preprocessing

Why Is It Needed?

“Garbage in, garbage out”

The models you build are only as good as the data you feed them. To understand the state of your data (whether it's good or bad), you need to explore and preprocess it. So for the next four weeks, I'll provide some great resources to help you understand data exploration and preprocessing.

Week 36: Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) can be difficult because there's no set way to do it, but that's also what makes it fun. In general, you will want to:

  • Compute descriptive statistics (e.g., central tendency)
  • Perform univariate analysis (distributions and spread)
  • Perform multivariate analysis (scatterplots, correlation matrix, predictive power score, etc.)
  • Check for missing data and outliers

For a beginner's guide to EDA, check here!
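The checklist above could be sketched roughly as follows, using only Python's standard library so that nothing has to be installed. This is an illustration under assumptions rather than a recommended workflow: the helper names (describe, iqr_outliers, pearson), the use of None for missing values, and the 1.5 × IQR outlier rule are all choices made for the example. In practice you would more likely reach for Pandas, which comes up earlier in the curriculum.

```python
import math
import statistics


def describe(values):
    """Descriptive statistics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
    }


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR outside the first and third quartiles."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation between two columns, skipping rows with a missing value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean(x for x, _ in pairs)
    mean_y = statistics.mean(y for _, y in pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    ages = [22, 25, None, 31, 28, 95, 27]  # made-up sample column
    print(describe(ages))
    print(iqr_outliers(ages))
    print(pearson([1, 2, None, 4], [2, 4, 6, 8]))
```

Each helper maps onto one bullet: describe covers central tendency, spread, and the missing-value count, iqr_outliers flags suspicious values in a single variable, and pearson is the building block of a correlation matrix.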

Week 37: Data Preparation: Imputation and Normalization

Week 38: Feature Engineering, Feature Selection

Week 39: Imbalanced Datasets

 





Machine Learning

Why Machine Learning?

Everything you've learned leads up to this! Machine learning is interesting and exciting, and it's a skill every data scientist is expected to have. It's true that modeling takes up only a small share of a data scientist's time, but that doesn't make it any less important.

You may notice that I left out algorithms such as k-nearest neighbors, Gradient Boost, and CatBoost. That was intentional. If you can understand the machine learning concepts below, you'll be well equipped to pick up other machine learning algorithms.

Week 40: Machine Learning Fundamentals

Week 41: Linear Regression

Week 42: Logistic Regression

Week 43: Regularization

Week 44: Decision Trees

Week 45: Naïve Bayes

Week 46: Support Vector Machines

Week 47: Clustering

Week 48: ์ฃผ์„ฑ๋ถ„ ๋ถ„์„(Principal Component Analysis)

Week 49: Bootstrap Sampling, Bagging, Boosting

Week 50: Random Forests and Other Boosted Trees

Week 51: Model Evaluation Metrics

 
 
 
 
 

Week 52: Data Science Project

Once you're comfortable with the material above, it's time to start your own data science project! Just in case, here are three ideas to inspire you. Feel free to use them or not.

Idea 1: SQL Case Study

Go to the case

The goal of this case is to determine the cause of a drop in user engagement at a social network called Yammer. Before looking at the data, it's worth reading a brief overview of what Yammer does here. You'll be working with four tables.

The link above provides the details of the problem, the data, and the questions you need to answer.

If you're curious how I approached it, check here!

You'll develop the following skills:

  • SQL
  • Data analysis
  • Data visualization (if you choose to visualize your insights)

Idea 2: Trustpilot Webscraper

Learning to scrape data from the web is simple yet very useful, especially when collecting data for personal projects. Scraping customer reviews from a website is valuable to a company because it lets you understand review trends (for better or worse) and, through natural language processing, see what customers are saying.

First, get familiar with how Trustpilot is organized and choose a business whose reviews you want to analyze. Then take a look at how to scrape Trustpilot reviews.

You'll develop the following skills:

  • Writing Python scripts
  • Data Wrangling
  • BeautifulSoup/Selenium (webscraping libraries)
  • Data analysis
  • Deriving insights from reviews through natural language processing

Idea 3: Titanic Machine Learning Competition

In my opinion, showcasing your code through competitions is one of the best ways to show that you're ready for a data science job. Kaggle hosts a variety of competitions in which you build a model to optimize a given metric. One of them is the Titanic Machine Learning Competition.

If you want some inspiration and guidance, check out this step-by-step walkthrough of a solution.

You'll develop the following skills:

  • Data Exploration and Cleaning with Pandas
  • Feature Engineering
  • Machine Learning Modelling
 







Thanks for Reading!

I hope this was helpful! If you work through all of it, you'll understand the fundamentals of statistics, mathematics, SQL, Python/Pandas, and several machine learning algorithms. I hope this article motivates you to keep studying, because there's still plenty left to learn: more advanced models (e.g., CatBoost), deep learning, experimental design, Bayesian modeling, cloud architecture, and more.

If you liked this article and want to see more content, follow me on Medium! As always, I wish you the best in your data science endeavors.