Building on the data-processing practice from the previous post,
this time I practiced processing the data needed to visualize COVID-19 deaths by country over time.
It is definitely not easy doing this on my own without a lecture to follow.
The code could certainly be written more cleanly, but this is my current level, so I'm posting it as-is for now.
import pandas as pd
import os
path = ('../COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports/')
df = pd.read_csv(path + '01-22-2020.csv', encoding = 'utf-8-sig')
print(df.shape)
df.head()
(38, 6)
| | Province/State | Country/Region | Last Update | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 0 | Anhui | Mainland China | 1/22/2020 17:00 | 1.0 | NaN | NaN |
| 1 | Beijing | Mainland China | 1/22/2020 17:00 | 14.0 | NaN | NaN |
| 2 | Chongqing | Mainland China | 1/22/2020 17:00 | 6.0 | NaN | NaN |
| 3 | Fujian | Mainland China | 1/22/2020 17:00 | 1.0 | NaN | NaN |
| 4 | Gansu | Mainland China | 1/22/2020 17:00 | NaN | NaN | NaN |
df = pd.read_csv(path + '05-21-2020.csv', encoding = 'utf-8-sig')
df.head()
| | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45001.0 | Abbeville | South Carolina | US | 2020-05-22 02:36:51 | 34.223334 | -82.461707 | 36 | 0 | 0 | 36 | Abbeville, South Carolina, US |
| 1 | 22001.0 | Acadia | Louisiana | US | 2020-05-22 02:36:51 | 30.295065 | -92.414197 | 269 | 15 | 0 | 254 | Acadia, Louisiana, US |
| 2 | 51001.0 | Accomack | Virginia | US | 2020-05-22 02:36:51 | 37.767072 | -75.632346 | 709 | 11 | 0 | 698 | Accomack, Virginia, US |
| 3 | 16001.0 | Ada | Idaho | US | 2020-05-22 02:36:51 | 43.452658 | -116.241552 | 792 | 23 | 0 | 769 | Ada, Idaho, US |
| 4 | 19001.0 | Adair | Iowa | US | 2020-05-22 02:36:51 | 41.330756 | -94.471059 | 6 | 0 | 0 | 6 | Adair, Iowa, US |
- Comparing the January and May csv files, the data format changes between them.
- All I actually need is the country (Country_Region or Country/Region), the death count, and the date — a minimal sketch of normalizing the two column layouts follows this list.
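Since only those columns matter, something like the helper below could normalize the two layouts into one. This is just an illustrative sketch (the name get_country_deaths is made up, not part of the code further down):

def get_country_deaths(df):
    # early daily reports use 'Country/Region', later ones 'Country_Region'
    col = 'Country/Region' if 'Country/Region' in df.columns else 'Country_Region'
    out = df[[col, 'Deaths']].copy()
    out.columns = ['Country/Region', 'Deaths']  # unify the column name
    return out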
df[['Country_Region', 'Deaths']].groupby('Country_Region').sum().isnull()
| Country_Region | Deaths |
|---|---|
| Afghanistan | False |
| Albania | False |
| Algeria | False |
| Andorra | False |
| Angola | False |
| ... | ... |
| West Bank and Gaza | False |
| Western Sahara | False |
| Yemen | False |
| Zambia | False |
| Zimbabwe | False |
188 rows × 1 columns
Loading every file in the csse_covid_19_daily_reports folder
- Read in all of the data files in the csse_covid_19_daily_reports folder
- From each csv, take only the country column and the death counts
- Unify the country column name as Country/Region
- Drop rows whose Deaths value is null
- Standardize country names to one standard name per country (an illustrative look at the mapping file follows this list)
dir : 'COVID-19-master/csse_covid_19_data/country_convert.json'
- Group by Country/Region and sum the values
- Each time a new file is read, merge it into the accumulated dataframe
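For context, country_convert.json maps the various raw country labels that appear in the daily reports onto one standard country name. The actual file isn't listed here; conceptually it is a flat mapping shaped roughly like this (illustrative entries only):

{
    "Mainland China": "China",
    "Korea, South": "South Korea"
}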
import json
import pandas as pd

with open('../COVID-19-master/csse_covid_19_data/country_convert.json', 'r', encoding = 'utf-8-sig') as json_file:
    json_data = json.load(json_file)
    # print(json_data)

def change_country_name(row):
    # if the raw country name appears in the mapping, replace it with the standard name
    if row['Country/Region'] in json_data:
        row['Country/Region'] = json_data[row['Country/Region']]
    return row['Country/Region']

def read_df(file):
    path = '../COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports/'
    df = pd.read_csv(path + file, encoding = 'utf-8-sig')
    try:
        df = df[['Country/Region', 'Deaths']]     # early file format
    except KeyError:
        df = df[['Country_Region', 'Deaths']]     # later file format
        df.columns = ['Country/Region', 'Deaths']
    df = df.dropna(subset = ['Deaths'])
    df['Country/Region'] = df.apply(change_country_name, axis = 1)  # axis = 1: apply row by row
    df = df.groupby('Country/Region').sum()
    # print(df.head())
    return df
file_list = os.listdir(path)
csv_list = list()
first_doc = True

for file in file_list:
    if file.split('.')[1] == 'csv':
        csv_list.append(file)

for file in csv_list:
    doc = read_df(file)
    file_date = file.split('.')[0].lstrip('0').replace('-', '/')  # '01-22-2020' -> '1/22/2020'
    doc.columns = [file_date]
    # doc.columns[0] = file_date
    # Writing it that way raises 'Index does not support mutable operations':
    # a pandas Index object does not allow assigning to individual positions,
    # so the whole set of labels has to be replaced at once with a list.
    if first_doc:
        final_doc = doc
        first_doc = False
    else:
        final_doc = pd.merge(final_doc, doc, how = 'outer', on = 'Country/Region')

final_doc = final_doc.fillna(0)
final_doc.head()
| Country/Region | 1/22/2020 | 1/23/2020 | 1/24/2020 | 1/25/2020 | 1/26/2020 | 1/27/2020 | 1/28/2020 | 1/29/2020 | 1/30/2020 | 1/31/2020 | ... | 6/08/2020 | 6/09/2020 | 6/10/2020 | 6/11/2020 | 6/12/2020 | 6/13/2020 | 6/14/2020 | 6/15/2020 | 6/16/2020 | 6/17/2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China | 17.0 | 18.0 | 26.0 | 42.0 | 56.0 | 82.0 | 131.0 | 133.0 | 171.0 | 213.0 | ... | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 |
| Australia | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 |
| Cambodia | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Canada | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 7910.0 | 7970.0 | 8038.0 | 8071.0 | 8125.0 | 8183.0 | 8218.0 | 8228.0 | 8271.0 | 8312.0 |
| Finland | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 323.0 | 324.0 | 324.0 | 325.0 | 325.0 | 325.0 | 326.0 | 326.0 | 326.0 | 326.0 |
5 rows × 148 columns
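As the comments in the loop above note, a pandas Index rejects item assignment. A tiny standalone illustration (made-up column names):

import pandas as pd

tmp = pd.DataFrame({'a': [1], 'b': [2]})
# tmp.columns[0] = 'x'         # TypeError: Index does not support mutable operations
tmp.columns = ['x', 'b']       # assigning a full list of labels at once works
labels = tmp.columns.tolist()  # or convert to a list, edit it, and assign it back
labels[0] = 'y'
tmp.columns = labels
print(tmp.columns.tolist())    # ['y', 'b']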
# final_doc.shape
final_doc.astype('int64')
# note: astype returns a new DataFrame, so final_doc itself still holds floats
# unless the result is assigned back (final_doc = final_doc.astype('int64'))
| Country/Region | 1/22/2020 | 1/23/2020 | 1/24/2020 | 1/25/2020 | 1/26/2020 | 1/27/2020 | 1/28/2020 | 1/29/2020 | 1/30/2020 | 1/31/2020 | ... | 6/08/2020 | 6/09/2020 | 6/10/2020 | 6/11/2020 | 6/12/2020 | 6/13/2020 | 6/14/2020 | 6/15/2020 | 6/16/2020 | 6/17/2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China | 17 | 18 | 26 | 42 | 56 | 82 | 131 | 133 | 171 | 213 | ... | 4638 | 4638 | 4638 | 4638 | 4638 | 4638 | 4638 | 4638 | 4638 | 4638 |
| Australia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 102 | 102 | 102 | 102 | 102 | 102 | 102 | 102 | 102 | 102 |
| Cambodia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Canada | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 7910 | 7970 | 8038 | 8071 | 8125 | 8183 | 8218 | 8228 | 8271 | 8312 |
| Finland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 323 | 324 | 324 | 325 | 325 | 325 | 326 | 326 | 326 | 326 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sao Tome and Principe | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| Yemen | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 112 | 127 | 129 | 136 | 139 | 160 | 164 | 208 | 214 | 244 |
| Comoros | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 |
| Tajikistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 48 | 48 | 48 | 49 | 49 | 50 | 50 | 50 | 50 | 51 |
| Lesotho | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
187 rows × 148 columns
Matching Country/Region to iso2 codes so each country's flag image can be loaded
There is one catch: Namibia's iso2 value is 'NA'. It is not 'NA' because the value is missing; the actual iso2 code is the string 'NA'. Reading the file with default options would turn that 'NA' into a missing value, so two options are needed (a small demonstration follows below).
- keep_default_na = False : don't use pandas' default set of NA strings; only the values given in na_values are treated as missing
- na_values = '' : specify which values to treat as missing (here, only empty/blank fields)
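A quick self-contained check of those two options (made-up inline csv, not the real lookup table):

from io import StringIO
import pandas as pd

csv_text = "iso2,Country_Region\nNA,Namibia\nKR,South Korea"

print(pd.read_csv(StringIO(csv_text))['iso2'].isnull().sum())
# 1 -> the string 'NA' was parsed as a missing value

print(pd.read_csv(StringIO(csv_text), keep_default_na = False, na_values = '')['iso2'].isnull().sum())
# 0 -> 'NA' stays a normal string; only blank fields count as missing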
country_info = pd.read_csv("../COVID-19-master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv", encoding='utf-8-sig', keep_default_na = False, na_values='')
country_info[country_info['Country_Region'] == 'Namibia']
| | Unnamed: 0 | Unnamed: 0.1 | UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | Combined_Key |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 115 | 115 | 115 | 516.0 | NA | NAM | 516.0 | NaN | NaN | NaN | Namibia | -22.9576 | 18.4904 | Namibia |
I only need the country column and its corresponding iso2 value.
country_info = country_info[['Country_Region', 'iso2']]
country_info.columns = ['Country/Region', 'iso2']
# drop_duplicates removes duplicate values in the Country/Region column, keeping the last row for each country
country_info = country_info.drop_duplicates(subset = 'Country/Region', keep = 'last')
country_info.head()
| | Country/Region | iso2 |
|---|---|---|
| 0 | Botswana | BW |
| 1 | Burundi | BI |
| 2 | Sierra Leone | SL |
| 3 | Afghanistan | AF |
| 4 | Albania | AL |
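A quick illustration of what keep = 'last' does here (made-up rows; the real lookup table has one row per US county, province, etc., so countries appear many times):

import pandas as pd

demo = pd.DataFrame({'Country/Region': ['US', 'US', 'Korea, South'],
                     'iso2':           ['US', 'US', 'KR']})
print(demo.drop_duplicates(subset = 'Country/Region', keep = 'last'))
# each country keeps only its last row, so the result is one row per country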
Now merge country_info into the dataframe that holds the merged daily death data.
final_graph_doc = pd.merge(final_doc, country_info, how = 'left', on = 'Country/Region')
# After matching countries with iso2, check the rows whose iso2 is still missing.
# Either fill in the iso2 values by hand if they are needed, or drop them if not.
# I'm going to drop them.
final_graph_doc[final_graph_doc['iso2'].isnull()]
| | Country/Region | 1/22/2020 | 1/23/2020 | 1/24/2020 | 1/25/2020 | 1/26/2020 | 1/27/2020 | 1/28/2020 | 1/29/2020 | 1/30/2020 | ... | 6/09/2020 | 6/10/2020 | 6/11/2020 | 6/12/2020 | 6/13/2020 | 6/14/2020 | 6/15/2020 | 6/16/2020 | 6/17/2020 | iso2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | Others | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 109 | Cruise Ship | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 173 | Diamond Princess | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 | NaN |
| 178 | MS Zaandam | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | NaN |
| 182 | Sao Tome and Principe | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | NaN |
| 183 | Yemen | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 127.0 | 129.0 | 136.0 | 139.0 | 160.0 | 164.0 | 208.0 | 214.0 | 244.0 | NaN |
| 184 | Comoros | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0 | NaN |
| 185 | Tajikistan | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 48.0 | 48.0 | 49.0 | 49.0 | 50.0 | 50.0 | 50.0 | 50.0 | 51.0 | NaN |
| 186 | Lesotho | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
9 rows × 150 columns
final_graph_doc.dropna(subset = ['iso2']).isnull().sum()
# this shows the null count for iso2 drops to 0
Country/Region 0
1/22/2020 0
1/23/2020 0
1/24/2020 0
1/25/2020 0
..
6/14/2020 0
6/15/2020 0
6/16/2020 0
6/17/2020 0
iso2 0
Length: 150, dtype: int64
final_graph_doc = final_graph_doc.dropna(subset = ['iso2'])
print(final_graph_doc.isnull().sum())
final_graph_doc.head()
Country/Region 0
1/22/2020 0
1/23/2020 0
1/24/2020 0
1/25/2020 0
..
6/14/2020 0
6/15/2020 0
6/16/2020 0
6/17/2020 0
iso2 0
Length: 150, dtype: int64
| | Country/Region | 1/22/2020 | 1/23/2020 | 1/24/2020 | 1/25/2020 | 1/26/2020 | 1/27/2020 | 1/28/2020 | 1/29/2020 | 1/30/2020 | ... | 6/09/2020 | 6/10/2020 | 6/11/2020 | 6/12/2020 | 6/13/2020 | 6/14/2020 | 6/15/2020 | 6/16/2020 | 6/17/2020 | iso2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 17.0 | 18.0 | 26.0 | 42.0 | 56.0 | 82.0 | 131.0 | 133.0 | 171.0 | ... | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | CN |
| 1 | Australia | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | AU |
| 2 | Cambodia | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | KH |
| 3 | Canada | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 7970.0 | 8038.0 | 8071.0 | 8125.0 | 8183.0 | 8218.0 | 8228.0 | 8271.0 | 8312.0 | CA |
| 4 | Finland | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 324.0 | 324.0 | 325.0 | 325.0 | 325.0 | 326.0 | 326.0 | 326.0 | 326.0 | FI |
5 rows × 150 columns
To fetch each country's flag image, build a URL from its iso2 code.
flag_url = 'https://www.countryflags.io/' + final_graph_doc['iso2'] + '/flat/64.png'
final_graph_doc['iso2'] = flag_url
final_graph_doc.head()
| | Country/Region | 1/22/2020 | 1/23/2020 | 1/24/2020 | 1/25/2020 | 1/26/2020 | 1/27/2020 | 1/28/2020 | 1/29/2020 | 1/30/2020 | ... | 6/09/2020 | 6/10/2020 | 6/11/2020 | 6/12/2020 | 6/13/2020 | 6/14/2020 | 6/15/2020 | 6/16/2020 | 6/17/2020 | iso2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 17.0 | 18.0 | 26.0 | 42.0 | 56.0 | 82.0 | 131.0 | 133.0 | 171.0 | ... | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | https://www.countryflags.io/CN/flat/64.png |
| 1 | Australia | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | https://www.countryflags.io/AU/flat/64.png |
| 2 | Cambodia | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | https://www.countryflags.io/KH/flat/64.png |
| 3 | Canada | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 7970.0 | 8038.0 | 8071.0 | 8125.0 | 8183.0 | 8218.0 | 8228.0 | 8271.0 | 8312.0 | https://www.countryflags.io/CA/flat/64.png |
| 4 | Finland | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 324.0 | 324.0 | 325.0 | 325.0 | 325.0 | 326.0 | 326.0 | 326.0 | 326.0 | https://www.countryflags.io/FI/flat/64.png |
5 rows × 150 columns
To draw the graph, the table columns need to be ordered country / iso2 / daily death data...
Let's move the iso2 column right next to Country/Region.
cols = final_graph_doc.columns.tolist()  # tolist() stores the column labels as a plain Python list
cols.remove('iso2')
cols.insert(1, 'iso2')
cols
['Country/Region', 'iso2', '1/22/2020', '1/23/2020', '1/24/2020', '1/25/2020', '1/26/2020', '1/27/2020', '1/28/2020', '1/29/2020', '1/30/2020', '1/31/2020', '2/01/2020', '2/02/2020', '2/03/2020', '2/04/2020', '2/05/2020', '2/06/2020', '2/07/2020', '2/08/2020', '2/09/2020', '2/10/2020', '2/11/2020', '2/12/2020', '2/13/2020', '2/14/2020', '2/15/2020', '2/16/2020', '2/17/2020', '2/18/2020', '2/19/2020', '2/20/2020', '2/21/2020', '2/22/2020', '2/23/2020', '2/24/2020', '2/25/2020', '2/26/2020', '2/27/2020', '2/28/2020', '2/29/2020', '3/01/2020', '3/02/2020', '3/03/2020', '3/04/2020', '3/05/2020', '3/06/2020', '3/07/2020', '3/08/2020', '3/09/2020', '3/10/2020', '3/11/2020', '3/12/2020', '3/13/2020', '3/14/2020', '3/15/2020', '3/16/2020', '3/17/2020', '3/18/2020', '3/19/2020', '3/20/2020', '3/21/2020', '3/22/2020', '3/23/2020', '3/24/2020', '3/25/2020', '3/26/2020', '3/27/2020', '3/28/2020', '3/29/2020', '3/30/2020', '3/31/2020', '4/01/2020', '4/02/2020', '4/03/2020', '4/04/2020', '4/05/2020', '4/06/2020', '4/07/2020', '4/08/2020', '4/09/2020', '4/10/2020', '4/11/2020', '4/12/2020', '4/13/2020', '4/14/2020', '4/15/2020', '4/16/2020', '4/17/2020', '4/18/2020', '4/19/2020', '4/20/2020', '4/21/2020', '4/22/2020', '4/23/2020', '4/24/2020', '4/25/2020', '4/26/2020', '4/27/2020', '4/28/2020', '4/29/2020', '4/30/2020', '5/01/2020', '5/02/2020', '5/03/2020', '5/04/2020', '5/05/2020', '5/06/2020', '5/07/2020', '5/08/2020', '5/09/2020', '5/10/2020', '5/11/2020', '5/12/2020', '5/13/2020', '5/14/2020', '5/15/2020', '5/16/2020', '5/17/2020', '5/18/2020', '5/19/2020', '5/20/2020', '5/21/2020', '5/22/2020', '5/23/2020', '5/24/2020', '5/25/2020', '5/26/2020', '5/27/2020', '5/28/2020', '5/29/2020', '5/30/2020', '5/31/2020', '6/01/2020', '6/02/2020', '6/03/2020', '6/04/2020', '6/05/2020', '6/06/2020', '6/07/2020', '6/08/2020', '6/09/2020', '6/10/2020', '6/11/2020', '6/12/2020', '6/13/2020', '6/14/2020', '6/15/2020', '6/16/2020', '6/17/2020']
cols[1]
'iso2'
# Reassign final_graph_doc with the reordered columns
final_graph_doc = final_graph_doc[cols]
final_graph_doc.head()
# Rename the column: iso2 -> Country_Flag
# print(cols[1])
# cols[1] = 'Country_Flag'          # cols is a plain list, so assigning by index works
# final_graph_doc.columns = cols    # a DataFrame's columns must be replaced with the full list at once
# final_graph_doc.head()
| | Country/Region | iso2 | 1/22/2020 | 1/23/2020 | 1/24/2020 | 1/25/2020 | 1/26/2020 | 1/27/2020 | 1/28/2020 | 1/29/2020 | ... | 6/08/2020 | 6/09/2020 | 6/10/2020 | 6/11/2020 | 6/12/2020 | 6/13/2020 | 6/14/2020 | 6/15/2020 | 6/16/2020 | 6/17/2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | https://www.countryflags.io/CN/flat/64.png | 17.0 | 18.0 | 26.0 | 42.0 | 56.0 | 82.0 | 131.0 | 133.0 | ... | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 |
| 1 | Australia | https://www.countryflags.io/AU/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 |
| 2 | Cambodia | https://www.countryflags.io/KH/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Canada | https://www.countryflags.io/CA/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 7910.0 | 7970.0 | 8038.0 | 8071.0 | 8125.0 | 8183.0 | 8218.0 | 8228.0 | 8271.0 | 8312.0 |
| 4 | Finland | https://www.countryflags.io/FI/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 323.0 | 324.0 | 324.0 | 325.0 | 325.0 | 325.0 | 326.0 | 326.0 | 326.0 | 326.0 |
5 rows × 150 columns
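As an aside, the same reordering could also be sketched with pop/insert instead of rebuilding the column list (equivalent result, assuming iso2 is still an ordinary column):

flag_col = final_graph_doc.pop('iso2')       # remove the column and keep its values
final_graph_doc.insert(1, 'iso2', flag_col)  # re-insert it right after Country/Region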
cols[1] = 'Country_Flag'
final_graph_doc.columns = cols  # assign the plain list; wrapping it as [cols] would turn the columns into a MultiIndex
final_graph_doc.head()
| | Country/Region | Country_Flag | 1/22/2020 | 1/23/2020 | 1/24/2020 | 1/25/2020 | 1/26/2020 | 1/27/2020 | 1/28/2020 | 1/29/2020 | ... | 6/08/2020 | 6/09/2020 | 6/10/2020 | 6/11/2020 | 6/12/2020 | 6/13/2020 | 6/14/2020 | 6/15/2020 | 6/16/2020 | 6/17/2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | https://www.countryflags.io/CN/flat/64.png | 17.0 | 18.0 | 26.0 | 42.0 | 56.0 | 82.0 | 131.0 | 133.0 | ... | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 | 4638.0 |
| 1 | Australia | https://www.countryflags.io/AU/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 | 102.0 |
| 2 | Cambodia | https://www.countryflags.io/KH/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Canada | https://www.countryflags.io/CA/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 7910.0 | 7970.0 | 8038.0 | 8071.0 | 8125.0 | 8183.0 | 8218.0 | 8228.0 | 8271.0 | 8312.0 |
| 4 | Finland | https://www.countryflags.io/FI/flat/64.png | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 323.0 | 324.0 | 324.0 | 325.0 | 325.0 | 325.0 | 326.0 | 326.0 | 326.0 | 326.0 |
5 rows × 150 columns
Export the finished data to a csv file
final_graph_doc.to_csv('./final_graph_doc.csv')
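As an optional sanity check, the exported file can be read back (index_col = 0 restores the saved index column):

check = pd.read_csv('./final_graph_doc.csv', index_col = 0, encoding = 'utf-8-sig')
print(check.shape)
check.head()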