
[Python] How Much Time Did I Pour into Netflix in 2021? (Subtitle: Analyzing My Personal Netflix Viewing Data)

김유니콘 (Kim Unicorn) 2022. 1. 18. 00:52

Source: Netflix official page

๋„ทํ”Œ๋ฆญ์Šค ๊ตฌ๋…ํ•œ์ง€ ์–ด์–ธ 2๋…„์ฐจ
๋„ทํ”Œ์€ ๊ณต๊ธฐ์™€ ๊ฐ™์€ ์กด์žฌ..

But then a thought suddenly struck me.
❓ How much time am I spending on Netflix?

As an MBTI power J (a hardcore planner), I had to know, along with:


❓ How many series did I watch in total?
❓ Which Netflix title took up the most of my time in 2021?
❓ On which day of the week did I watch the most?
❓ What time of day do I usually watch?
.
.
.

...and so on. To answer all these questions, I started analyzing my own Netflix data.

START!




Step 1. Download your Netflix data

Here's a handy tip most people still don't know:
Netflix is very generous about giving subscribers their own data.
Go to Netflix's page for downloading your personal information and request your data, and you can download it within a few days.


Click the red request button. Netflix then sends you an email, and once you confirm it, the request is done.

You can't get the data right away;
it seems to take about 1 to 3 days.
There's also a download deadline, so don't request it and then forget about it
(says someone who forgot and barely downloaded it on the last day).

Netflix says it can take up to 30 days.

Once you download the .zip file,
you have data that shows every detail of your Netflix account. ✺◟(∗❛ัᴗ❛ั∗)◞✺

ํŒŒ์ผ์„ ์—ด์–ด๋ณด๋ฉด ๋ฐ์ดํ„ฐ๊ฐ€ ๋ฌด์ง€๋ง‰์ง€ํ•˜๊ฒŒ ๋งŽ์€๋ฐ,
๊ทธ ์ค‘ ํ‘œ์ง€.pdf (ํ˜น์€ cover.pdf)๋ฅผ ์—ด์–ด๋ณด์ž.

It kindly explains which files are included and what data each one holds.


P.S. Hiding The Concubine (후궁) from your history after watching it won't help. Netflix shows exactly how many minutes and seconds you watched it, and when you hid it from your viewing history.
Truly the world's number-one data company.


Step 2. Load it into Jupyter

Of all those files, I'll analyze ViewingActivity.csv.
Since I'll analyze the data with Python and pandas,
I load it into a Jupyter Notebook.

If the notebook and ViewingActivity.csv are in different locations, don't forget to put the file's path inside the ' ' quotes~

```python
import pandas as pd

df = pd.read_csv('ViewingActivity.csv')
```




Run df.shape once to check that df loaded properly:

It has 13,981 rows and 10 columns in total.





Shall we preview it with df.head()?

It loaded just fine (∗❛⌄❛∗)


Step 3. Drop the data we don't need

Dropping unneeded columns leaves a much smaller dataset to analyze.
First, which columns are unnecessary?
I decided that Attributes, Supplemental Video Type, Device Type, Bookmark, Latest Bookmark, and Country weren't needed.
Let's drop them.

```python
# axis=1 tells pandas to drop columns rather than rows
df = df.drop(['Attributes', 'Supplemental Video Type', 'Device Type', 'Bookmark', 'Latest Bookmark', 'Country'], axis=1)
df.head(1)
```

์ž ์ด๋ ‡๊ฒŒ ์•„์ฃผ ์ž˜ ์‚ญ์ œ๋œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

+ ์ž, ์ด์ œ 2021๋…„๋„ ๋ฐ์ดํ„ฐ๋งŒ ๋ฝ‘์•„๋ณผ๊นŒ? Start Time์— '2021-'์ด ํฌํ•จ๋œ ๋ฐ์ดํ„ฐ๋งŒ ์ถ”์ถœํ•˜๋ฉด ๋˜๋‹ˆ๊นŒ,

```python
# Keep only rows whose Start Time string contains '2021-'
df = df[df['Start Time'].str.contains('2021-')]
```
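One caveat about this filter is that it matches the raw UTC strings, before the KST conversion in Step 4. As a result, sessions started between 00:00 and 08:59 KST on 1 January 2021 are left out, while sessions from the first nine hours of 1 January 2022 (KST) are kept. If the year boundary should follow Korean time, one option is to filter after converting to KST. Here is a minimal sketch on three hypothetical rows:

```python
import pandas as pd

# Hypothetical rows in the export's UTC "YYYY-MM-DD HH:MM:SS" format
sample = pd.DataFrame({
    "Start Time": [
        "2020-12-31 16:00:00",  # 2021-01-01 01:00 in KST
        "2021-06-15 12:00:00",  # 2021-06-15 21:00 in KST
        "2021-12-31 18:00:00",  # 2022-01-01 03:00 in KST
    ]
})

# Parse as UTC, convert to Korea Standard Time, then filter on the KST calendar year
kst = pd.to_datetime(sample["Start Time"], utc=True).dt.tz_convert("Asia/Seoul")
sample_2021 = sample[kst.dt.year == 2021]

# The string filter used above, for comparison
by_string = sample[sample["Start Time"].str.contains("2021-")]

print(sample_2021["Start Time"].tolist())  # ['2020-12-31 16:00:00', '2021-06-15 12:00:00']
print(by_string["Start Time"].tolist())    # ['2021-06-15 12:00:00', '2021-12-31 18:00:00']
```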

 

Step 4. Convert strings to datetime and timedelta

So what data types are these columns stored as? (Check with df.dtypes.)


They're all object (that is, strings).
We can't do time arithmetic on them, so let's convert the types.

To do:

1. Convert Start Time to datetime, so we can do time arithmetic on it
2. Convert Start Time, which is recorded in UTC, to KST (Korea Standard Time)
3. Convert Duration to timedelta, so we can do time arithmetic on it

Step by step:
1. Convert Start Time to datetime (pd.to_datetime)

```python
# Parse the strings as timezone-aware UTC datetimes
df['Start Time'] = pd.to_datetime(df['Start Time'], utc=True)
df.dtypes
```

The type has changed,


2. Convert Start Time from UTC to KST (Korea Standard Time) (DatetimeIndex.tz_convert)
Time zone names such as 'Asia/Seoul' come from the IANA time zone database.

```python
# Make Start Time the DataFrame's index
df = df.set_index('Start Time')
# Convert the index from UTC to KST
df.index = df.index.tz_convert('Asia/Seoul')
# Turn Start Time back into a regular column
df = df.reset_index()
# Double-check
df.head(1)
```

Converted nicely.
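As an alternative to the set_index / reset_index round trip, you can convert a timezone-aware column directly through the .dt accessor. This also leaves Start Time in its original column position. Here is a sketch on a hypothetical two-row stand-in for df:

```python
import pandas as pd

# Hypothetical stand-in for df; a real export has more columns and rows
df = pd.DataFrame({
    "Profile Name": ["oreoeo", "oreoeo"],
    "Start Time": ["2021-03-01 15:30:00", "2021-05-20 02:00:00"],
})

# Parse as UTC-aware datetimes, then convert the column itself to KST
df["Start Time"] = pd.to_datetime(df["Start Time"], utc=True).dt.tz_convert("Asia/Seoul")

print(df["Start Time"].iloc[0])  # 2021-03-02 00:30:00+09:00
print(df.columns.tolist())       # ['Profile Name', 'Start Time']
```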

3. Convert Duration to timedelta, so we can do time arithmetic on it (pd.to_timedelta)

```python
# Parse the Duration strings into timedelta values
df['Duration'] = pd.to_timedelta(df['Duration'])
df.dtypes
```

The data types are converted properly!

Step 5. Extract only my profile's data (optional)

Who watches Netflix alone these days?
An account is usually shared by four people, so this part of the data needs sorting out too!
For over two years I've been in a sharing group with people from school, and nobody has dropped out yet... (who knew a Netflix bond would last this long?)
So I need to pull out only my own profile's data, right?

Make a DataFrame of the rows whose Profile Name contains oreoeo!

```python
# Keep only my profile's rows (my profile is named oreoeo, so the DataFrame gets the same name)
# regex=False treats 'oreoeo' as a literal string rather than a regular expression
oreoeo = df[df['Profile Name'].str.contains('oreoeo', regex=False)]
```
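Because str.contains matches substrings, another profile whose name merely contains oreoeo (say, a hypothetical oreoeo2) would be picked up too. When you know the exact profile name, an equality test is stricter. Here is a sketch with made-up profile names:

```python
import pandas as pd

# Hypothetical profiles sharing one account
df = pd.DataFrame({"Profile Name": ["oreoeo", "oreoeo2", "friend"]})

by_substring = df[df["Profile Name"].str.contains("oreoeo", regex=False)]  # matches oreoeo and oreoeo2
by_exact = df[df["Profile Name"] == "oreoeo"]                              # matches oreoeo only

print(len(by_substring), len(by_exact))  # 2 1
```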

 

Step 6. Trailers and views under 3 minutes don't really count as watching, so let's drop them!

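```python
# Keep only sessions longer than 3 minutes; .copy() avoids SettingWithCopyWarning when adding columns later
oreoeo = oreoeo[oreoeo['Duration'] > pd.Timedelta(minutes=3)].copy()
```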
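Here is a quick sanity check of the cutoff on hypothetical durations, assumed to be HH:MM:SS strings as pd.to_timedelta parses them. The strict > also drops a session of exactly three minutes. Note that the filter goes by length only, so a trailer longer than three minutes would still be kept.

```python
import pandas as pd

# Hypothetical viewing rows; Duration strings assumed to look like "HH:MM:SS"
views = pd.DataFrame({
    "Title": ["trailer", "short peek", "episode"],
    "Duration": pd.to_timedelta(["00:00:45", "00:03:00", "00:42:10"]),
})

# Keep only sessions strictly longer than three minutes
watched = views[views["Duration"] > pd.Timedelta(minutes=3)]
print(watched["Title"].tolist())  # ['episode']
```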




๊ทธ๋Ÿฐ๋ฐ Title์ด ๋‚˜๋Š” ์†”๋กœ: ์‹œ์ฆŒ 1: 22ํ™” ์ด๋Ÿฐ ์‹์œผ๋กœ ํšŒ์ฐจ๊นŒ์ง€ ๋“ค์–ด์žˆ๋‹ค.
์ด๋Ÿฌ๋ฉด ๋˜ ๊ณค๋ž€ํ•˜์ง€.
์ œ๋ชฉ / ์‹œ์ฆŒ / ํšŒ์ฐจ ๋‚˜๋ˆ ์ฃผ๊ธฐ ์œ„ํ•œ ์ž‘์—…์„ ํ•ด๋ณด์ž.

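```python
# Split "title: season: episode" on ": " once and reuse the result for both columns
parts = oreoeo["Title"].str.split(": ", expand=True)
oreoeo["제목"] = parts[0]  # title
oreoeo["시즌"] = parts[1]  # season (missing when the title has no ": ", e.g. most movies)
```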

See how the title and season are now split into new columns? ⤴
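The plan above mentions episodes too, but the code keeps only title and season. Passing n=2 to str.split splits at most twice, so one call yields title, season, and episode. Titles without ": " (typically movies) get missing values in the extra columns. Here is a sketch on sample titles, where 회차 (episode) is a hypothetical column name. Titles that themselves contain ": " would still be split in the wrong place:

```python
import pandas as pd

titles = pd.Series(["나는 솔로: 시즌 1: 22화", "빈센조: 시즌 1: 1화", "Movie Title"])

# Split at most twice on ": " -> up to three columns
parts = titles.str.split(": ", n=2, expand=True)
parts.columns = ["제목", "시즌", "회차"]  # title, season, episode (hypothetical names)

print(parts.loc[0].tolist())          # ['나는 솔로', '시즌 1', '22화']
print(parts["시즌"].isna().tolist())  # [False, False, True]
```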




์ž, ์ด์ œ 2021๋…„๋„ Oreoeo ๊ณ„์ •์ด ๋ณธ ๋„ทํ”Œ๋ฆญ์Šค ์‹œ์ฒญ ๋ฐ์ดํ„ฐ ์ค€๋น„ ์™„๋ฃŒ๊ตฌ์š”



Step 7. Now let's start answering the questions



Q. So how much Netflix did I watch in 2021??

```python
# Total viewing time across all remaining sessions
oreoeo['Duration'].sum()
```

A: Well.... 21 days, 1 hour, 32 minutes, and 6 seconds in total




That's like watching nothing but Netflix, without even sleeping, for almost a month...
♥️ I love you, Netflix ♥️
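To put that total into more familiar units, dividing one Timedelta by another gives a plain float. Here is a sketch using the 2021 total above, hardcoded rather than recomputed from the data:

```python
import pandas as pd

# The 2021 total reported above, hardcoded for illustration
total = pd.Timedelta(days=21, hours=1, minutes=32, seconds=6)

hours = total / pd.Timedelta(hours=1)                      # total hours as a float
minutes_per_day = total / pd.Timedelta(minutes=1) / 365    # average minutes per day of the year

print(round(hours, 1))            # 505.5
print(round(minutes_per_day, 1))  # 83.1
```

That works out to roughly 505 hours, or a little over 83 minutes a day on average.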




Q. ์ด ๋ช‡ ํŽธ์˜ ์‹œ๋ฆฌ์ฆˆ๋ฅผ ๋ดค๋Š”๋ฐ?

์ด๊ฒŒ ์ง€๊ธˆ ์ œ๋ชฉ์˜ Unique๊ฐ’๋งŒ ์„ธ๋ฉด, ๋‚ด๊ฐ€ ๊ฐ€์‹ญ๊ฑธ ์‹œ์ฆŒ1 ์‹œ์ฆŒ2 ์‹œ์ฆŒ3 ์„ ํ•˜๋‚˜๋กœ ์น˜๋‹ค๋ณด๋‹ˆ,
์ œ๋ชฉ + ์‹œ์ฆŒ ํ•ฉ์นœ ์ปฌ๋Ÿผ์„ ๊ตฌํ•  ํ•„์š”๋„ ์žˆ์„ ๊ฒƒ ๊ฐ™๋‹จ ์ƒ๊ฐ์ด ๋“ค์—ˆ๋‹ค.

๊ทธ๋ž˜์„œ ์ œ๋ชฉ+์‹œ์ฆŒ ์ปฌ๋Ÿผ ํ•˜๋‚˜ ๋” ๋งŒ๋“ค์–ด์ฃผ๊ณ ์š”

```python
cols = ['제목', '시즌']
# Join title and season into one string, e.g. "가십걸:시즌 1" (a missing season becomes "None")
oreoeo['제목 및 시즌'] = oreoeo[cols].apply(lambda row: ':'.join(row.values.astype(str)), axis=1)
oreoeo.head()
```

 

```python
# Summary of the title+season column; the "unique" row is the number of distinct values
oreoeo['제목 및 시즌'].describe()
```



99 unique values in total

A: In 2021 I watched 99 titles in total (dramas/movies/documentaries, etc., with each season counted separately). Oh yeah!
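You can also take the same two counts without building a joined string column. nunique on the title alone merges seasons (the Gossip Girl problem above), while drop_duplicates on both columns counts distinct (title, season) pairs. Here is a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: two Gossip Girl seasons (one watched twice) and one other show
views = pd.DataFrame({
    "제목": ["가십걸", "가십걸", "가십걸", "빈센조"],
    "시즌": ["시즌 1", "시즌 2", "시즌 1", None],
})

titles_only = views["제목"].nunique()                          # seasons merged
title_season = len(views[["제목", "시즌"]].drop_duplicates())  # each season counted separately

print(titles_only, title_season)  # 2 3
```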





Q. Which day of the week, and around what time, do I watch the most?

First, create a weekday column and an hour column:

```python
oreoeo = oreoeo.copy()  # an explicit copy prevents SettingWithCopyWarning
oreoeo['weekday'] = oreoeo['Start Time'].dt.weekday  # day of week (0 = Monday)
oreoeo['hour'] = oreoeo['Start Time'].dt.hour  # hour of day, in KST
```



- Let's find out which days I watch the most

```python
# pandas draws its graphs with matplotlib, so import it
import matplotlib.pyplot as plt

# Weekday chart: an ordered categorical keeps all seven days (0-6) in order
oreoeo['weekday'] = pd.Categorical(oreoeo['weekday'], categories=[0, 1, 2, 3, 4, 5, 6], ordered=True)
oreoeo_by_day = oreoeo['weekday'].value_counts()  # number of viewing sessions per day
oreoeo_by_day = oreoeo_by_day.sort_index()
oreoeo_by_day.plot(kind='bar', figsize=(20, 10), title='Netflix watching Day')
```

(0 is Monday and 6 is Sunday; that's how pandas' dt.weekday numbers the days, the same as Python's datetime.weekday().)

A: I watch the least on Thursday and the most on Sunday. But really, I watch every day anyway. What a Netflix maniac....
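Note that value_counts counts viewing sessions, not time spent. To rank days by total watch time instead, one option is to sum Duration per weekday and label the days by name with dt.day_name(). Here is a sketch on hypothetical rows (1 January 2021 was a Friday and 3 January a Sunday):

```python
import pandas as pd

# Hypothetical sessions, already in KST
views = pd.DataFrame({
    "Start Time": pd.to_datetime(
        ["2021-01-01 23:00", "2021-01-03 01:00", "2021-01-03 14:00"]
    ).tz_localize("Asia/Seoul"),
    "Duration": pd.to_timedelta(["00:30:00", "01:10:00", "00:20:00"]),
})

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Total watch time per weekday in hours; days with no viewing are filled with 0
hours_by_day = (
    views.groupby(views["Start Time"].dt.day_name())["Duration"].sum()
    .reindex(order, fill_value=pd.Timedelta(0))
    / pd.Timedelta(hours=1)
)

print(hours_by_day["Friday"], hours_by_day["Sunday"])  # 0.5 1.5
```

Calling hours_by_day.plot(kind='bar') would then draw the same kind of chart, weighted by hours watched.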

 

 

- Let's find out which hours I usually watch

```python
# Hour-of-day chart: an ordered categorical keeps all 24 hours in order
oreoeo['hour'] = pd.Categorical(oreoeo['hour'], categories=list(range(24)), ordered=True)
oreoeo_by_hour = oreoeo['hour'].value_counts()  # number of sessions started in each hour
oreoeo_by_hour = oreoeo_by_hour.sort_index()
oreoeo_by_hour.plot(kind='bar', figsize=(20, 10), title='Netflix watching by Hour')
```

A: Most of my sessions start at 1 a.m., and the fewest start at 8 a.m..
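A slightly shorter way to get all 24 hours in order, including hours with no viewing at all, is to reindex the counts instead of converting to a categorical. Here is a sketch on hypothetical start hours:

```python
import pandas as pd

hours = pd.Series([1, 1, 2, 23, 1])  # hypothetical session start hours (KST)

# Sessions per hour for 0-23 in order, with unseen hours filled as 0
by_hour = hours.value_counts().reindex(range(24), fill_value=0)

print(by_hour.idxmax(), by_hour[1], by_hour[8])  # 1 3 0
```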




Q. What if I want viewing time per title?

```python
# Total viewing time per (title, season), starting from the filtered oreoeo DataFrame
oreoeo_duration_by_title = oreoeo.groupby(['제목', '시즌']).agg({'Duration': 'sum'})
oreoeo_duration_by_title = oreoeo_duration_by_title.reset_index()  # move the group keys back into columns
oreoeo_duration_by_title
```

This gives a neat table!


A: I probably watched D.P. for 10 hours because... after watching it myself, I showed it to my mom on my profile too.
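One caveat about this table is that groupby drops rows with a missing key by default. Movies have no season (their 시즌 value is missing after the split), so they silently fall out. Passing dropna=False keeps them. Here is a sketch on hypothetical rows, where 영화 A stands in for any movie:

```python
import pandas as pd

views = pd.DataFrame({
    "제목": ["빈센조", "빈센조", "영화 A"],  # "영화 A" is a hypothetical movie
    "시즌": ["시즌 1", "시즌 1", None],
    "Duration": pd.to_timedelta(["01:00:00", "00:30:00", "02:00:00"]),
})

# Default: the movie row (missing season) is dropped from the groups
default = views.groupby(["제목", "시즌"]).agg({"Duration": "sum"}).reset_index()
# dropna=False: the movie gets its own group with a missing season
keep_na = views.groupby(["제목", "시즌"], dropna=False).agg({"Duration": "sum"}).reset_index()

print(len(default), len(keep_na))  # 1 2
```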

 

 

 

 

- Let's graph this too

Timedelta values are awkward to plot directly as bar lengths. (Remember that we stored Duration as a timedelta?)
So they need converting back to an integer number of seconds. Unfortunately, the only method I found pulls out just the days or just the seconds as an int,
so I decided to get the days and seconds separately, convert the days to seconds, and add the two.
For example, if Vincenzo's viewing time is 1 days 00:01:40,
td.seconds returns only 60 + 40 = 100 seconds as an int,
and td.days returns just 1,
so, since 1 day = 86,400 seconds,
I compute td.days * 86400 + td.seconds (here 1 × 86,400 + 100 = 86,500 seconds).

```python
# Whole days in each total viewing time
oreoeo_duration_by_title['Duration_d'] = oreoeo_duration_by_title['Duration'].dt.days
# Seconds left over after the whole days
oreoeo_duration_by_title['Duration_s'] = oreoeo_duration_by_title['Duration'].dt.seconds
# Total seconds watched = days * 86400 + seconds
oreoeo_duration_by_title['Duration_sum'] = (oreoeo_duration_by_title['Duration_d'] * 86400) + oreoeo_duration_by_title['Duration_s']

# Add the combined title+season column here as well, for the bar labels
cols = ['제목', '시즌']
oreoeo_duration_by_title['제목 및 시즌'] = oreoeo_duration_by_title[cols].apply(
    lambda row: ':'.join(row.values.astype(str)), axis=1
)
```
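For reference, pandas does provide a direct conversion. Timedelta.total_seconds(), and .dt.total_seconds() on a Series, return the full length in seconds as a float, days included. Here is a sketch reproducing the Vincenzo example from the text:

```python
import pandas as pd

td = pd.Timedelta("1 days 00:01:40")  # the Vincenzo example from the text
print(td.days, td.seconds)            # 1 100
print(td.days * 86400 + td.seconds)   # 86500
print(td.total_seconds())             # 86500.0

# The Series version, converted to whole seconds
durations = pd.Series(pd.to_timedelta(["1 days 00:01:40", "00:42:10"]))
print(durations.dt.total_seconds().astype(int).tolist())  # [86500, 2530]
```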





์ด์ œ ์ด๊ฑธ ๊ทธ๋ž˜ํ”„ํ™” ์‹œ์ผœ๋ณด๋ฉด

```python
import matplotlib.pyplot as plt

# Use a font that can render Korean labels (AppleGothic ships with macOS)
plt.rcParams['font.family'] = 'AppleGothic'

# barh draws the first row at the bottom, so sorting ascending puts the most-watched title on top
oreoeo_duration_by_title = oreoeo_duration_by_title.sort_values(by=['Duration_sum'], ascending=True)

# Horizontal bars (barh); a tall figsize because there are so many titles; red at 40% opacity
oreoeo_duration_by_title.plot.barh(x='제목 및 시즌', y='Duration_sum', figsize=(20, 30), fontsize=10, color='r', alpha=0.4)
```

You get this densely packed graph,

and since it's hard on the eyes, let's save it as a PDF:

plt.savefig('2021 Netflix์‹œ์ฒญ๋ฐ์ดํ„ฐ.pdf')


Ta-da!
The graph file I made now shows up in the folder where the Jupyter notebook lives, poof.
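Two hedged notes on the plotting cells above. First, AppleGothic ships with macOS. On Windows the usual Korean font is Malgun Gothic, and on Linux a font such as NanumGothic usually has to be installed first (the font names here are assumptions about your system). Second, Jupyter's inline backend closes a figure once its cell finishes, so calling plt.savefig in a later cell can write an empty file. Saving through the plot's own figure in the same cell avoids that. Here is a sketch on hypothetical data, with a hypothetical output file name:

```python
import os
import platform
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Pick a Korean-capable font by OS (assumed font names; install one if it is missing)
font = {"Darwin": "AppleGothic", "Windows": "Malgun Gothic"}.get(platform.system(), "NanumGothic")
plt.rcParams["font.family"] = font
plt.rcParams["axes.unicode_minus"] = False  # use a plain hyphen for minus signs, which some Korean fonts lack

# Hypothetical per-title totals in seconds
data = pd.DataFrame({"title": ["Title A", "Title B"], "Duration_sum": [3600, 7200]})
ax = data.sort_values("Duration_sum").plot.barh(x="title", y="Duration_sum", figsize=(6, 3))

# Save from the figure that owns this plot, in the same cell that drew it
out_path = os.path.join(tempfile.gettempdir(), "netflix_2021_by_title.pdf")
fig = ax.get_figure()
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)
```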

(Everything else omitted.)


My number-one Netflix time thief of 2021 is


Drumroll...




Vincenzo! 🎉🎉🎉🎉


I wondered why its viewing time was so long,
and it turns out I'd binge-watched it twice: once in March and once in May.



Personally, I think Vincenzo is the textbook example of a single-male-lead drama.
I dedicate this honor to the number-one Korean dark hero, Vincenzo Cassano,
and to actor Song Joong-ki, whom I adore to bits...👑

Source: coldsunset



+ This honor also goes to my top pick, actor Kim Sung-cheol, who overflows with wistful charm as Ji-woong in Our Beloved Summer (그 해 우리는)...🤍

His rising "Tae-ho ↗" still rings in my ears




์ž ์ด์ œ ์งˆ๋ฌธ์„ ์ •๋ฆฌํ•ด์„œ ๋‹ต์„ ํ•ด๋ณด์ž
โ“ ๋‚˜๋Š” ๋„ทํ”Œ๋ฆญ์Šค ๋ณด๋Š”๋ฐ ์–ผ๋งˆ๋‚˜ ์‹œ๊ฐ„์„ ์“ฐ๊ณ ์žˆ๋‚˜? --> ์ˆœ์ˆ˜ํ•˜๊ฒŒ ์‹œ์ฒญ ์‹œ๊ฐ„๋งŒ 21์ผ 1์‹œ๊ฐ„ 32๋ถ„ 06์ดˆ
โ“ ๋‚˜๋Š” ์ด ๋ช‡ํŽธ์˜ ์‹œ๋ฆฌ์ฆˆ๋ฅผ ๋ดค์„๊นŒ? --> 99ํŽธ
โ“ 2021๋…„ ๊ฐ€์žฅ ๋งŽ์€ ์‹œ๊ฐ„์„ ํ• ์• ํ•œ ๋„ทํ”Œ ์ฝ˜ํ…์ธ ๋Š”? --> ๋นˆ์„ผ์กฐ
โ“ ๋ฌด์Šจ ์š”์ผ์— ๊ฐ€์žฅ ๋งŽ์ด ๋ดค์„๊นŒ? --> ์ผ์š”์ผ
โ“ ์ฃผ๋กœ ๋ณด๋Š” ์‹œ๊ฐ„๋Œ€๋Š” ์–ธ์ œ์ธ๊ฐ€? --> ์ƒˆ๋ฒฝ ํ•œ ๋‘ ์‹œ





In 2021 I also watched a fair share on Wavve, Tving, Watcha, YouTube, and live TV,
so my total content-watching time was probably even longer than this.





Actually, these days I watch Tving and Wavve more than Netflix.
Jirisan, Work Later, Drink Now (술도녀), 청와대, and The Red Sleeve (옷소매) are all on Wavve or Tving.

It's such a shame The Red Sleeve didn't come to Netflix.
I think it would have been a perfect hit with the global audience that loves historical dramas ㅠ


Anyway,
Tving / Wavve, could you give me my data too? ㅠ




In 2022 I plan to keep filling my life with plenty of great content.
Who'll be number one in 2022~~~~


I love you, Netflix
I want to work at Netflix

The End







* Reference: https://support.dataquest.io/en/articles/366