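Background
Climate change has long been a contested topic in the United States; the current president, Donald Trump, once called it a "hoax created by China" (meant to hold back US manufacturing). The data set we provide comes from the three most authoritative institutions recording surface temperature: HadCrut in the UK, and NASA (National Aeronautics and Space Administration) and NOAA (National Oceanic and Atmospheric Administration) in the US. Whether climate change is pseudoscience or an objective reality, you will reach your own conclusion as you explore the data.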
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import seaborn as sns
import time
import warnings
warnings.filterwarnings('ignore')
global_temp_country = pd.read_csv('../input/GlobalLandTemperatures/GlobalLandTemperaturesByCountry.csv')
1) Plot the average temperature of each country
In [2]:
# Remove duplicated countries (colonies are not counted as countries) and countries with no temperature data
global_temp_country_clear = global_temp_country[~global_temp_country['Country'].isin(
    ['Denmark', 'Antarctica', 'France', 'Europe', 'Netherlands',
     'United Kingdom', 'Africa', 'South America'])]
global_temp_country_clear = global_temp_country_clear.replace(
    ['Denmark (Europe)', 'France (Europe)', 'Netherlands (Europe)', 'United Kingdom (Europe)'],
    ['Denmark', 'France', 'Netherlands', 'United Kingdom'])

# Average temperature for each country
countries = np.unique(global_temp_country_clear['Country'])
mean_temp = []
for country in countries:
    mean_temp.append(global_temp_country_clear[global_temp_country_clear['Country'] ==
                                               country]['AverageTemperature'].mean())
data = [ dict(
        type = 'choropleth',
        locations = countries,
        z = mean_temp,
        locationmode = 'country names',
        text = countries,
        marker = dict(
            line = dict(color = 'rgb(0,0,0)', width = 1)),
        colorbar = dict(autotick = True, tickprefix = '',
                        title = '# Average\nTemperature,\n°C')
    )
]

layout = dict(
    title = 'Average land temperature in countries',
    geo = dict(
        showframe = False,
        showocean = True,
        oceancolor = 'rgb(0,255,255)',
        projection = dict(
            type = 'orthographic',
            rotation = dict(
                lon = 60,
                lat = 10),
        ),
        lonaxis = dict(
            showgrid = True,
            gridcolor = 'rgb(102, 102, 102)'
        ),
        lataxis = dict(
            showgrid = True,
            gridcolor = 'rgb(102, 102, 102)'
        )
    ),
)

fig = dict(data=data, layout=layout)
py.iplot(fig, validate=False, filename='worldmap')
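Russia and Canada both have low average temperatures; the lowest is in Greenland (clearly visible on the map); the hottest countries are, unsurprisingly, in Africa, in the equatorial region.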
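The per-country averaging loop in the cell above can also be written as a single pandas `groupby`. The sketch below is not the author's code: it uses a tiny made-up frame (the values are illustrative, not taken from the data set) that reuses the `Country` and `AverageTemperature` column names. The names `toy`, `countries_toy` and `mean_temp_toy` are placeholders.

```python
import numpy as np
import pandas as pd

# Toy stand-in for GlobalLandTemperaturesByCountry.csv (values are made up).
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "B"],
    "AverageTemperature": [10.0, 14.0, -2.0, np.nan, 0.0],
})

# One mean per country; missing readings (NaN) are skipped by default,
# which matches what Series.mean() does inside the original loop.
mean_by_country = toy.groupby("Country")["AverageTemperature"].mean()

# groupby sorts its keys, so the order matches np.unique(toy["Country"]).
countries_toy = mean_by_country.index.to_numpy()
mean_temp_toy = mean_by_country.to_numpy()
```

On the full data set this avoids filtering the whole frame once per country, which is where the loop spends most of its time.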
2) Sort the countries by average temperature and plot a horizontal bar chart.
In [3]:
mean_temp_bar, countries_bar = (list(x)
                                for x in zip(*sorted(zip(mean_temp, countries),
                                                     reverse = True)))
sns.set(font_scale=0.9)
f, ax = plt.subplots(figsize=(4.5, 50))
colors_cw = sns.color_palette('coolwarm', len(countries))
# Pass x and y by keyword: recent seaborn releases no longer accept them positionally
sns.barplot(x=mean_temp_bar, y=countries_bar, palette = colors_cw[::-1])
Text = ax.set(xlabel='Average temperature', title='Average land temperature in countries')
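The first line of the cell above packs a sort into one expression. As a small illustration on made-up values (the lists `temps` and `names` are placeholders, not the real data), it pairs each temperature with its country, sorts the pairs hottest first, and unzips them back into two parallel lists:

```python
# Toy values, made up for illustration.
temps = [5.0, 25.0, -10.0]
names = ["B", "C", "A"]

# zip(temps, names)      -> (5.0, "B"), (25.0, "C"), (-10.0, "A")
# sorted(..., reverse)   -> pairs ordered by temperature, highest first
# zip(*pairs)            -> one tuple of temperatures, one tuple of names
temps_sorted, names_sorted = (
    list(x) for x in zip(*sorted(zip(temps, names), reverse=True))
)

print(temps_sorted)   # [25.0, 5.0, -10.0]
print(names_sorted)   # ['C', 'B', 'A']
```

One caveat with this idiom: comparisons against NaN are always false, so a country whose mean is NaN would not land in a predictable position.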
3) Is there global warming?
First we read in "GlobalTemperatures.csv" (which contains the Earth's monthly temperatures) and plot it.
In [5]:
global_temp = pd.read_csv("../input/GlobalLandTemperatures/GlobalTemperatures.csv")

# Extract the year from the date
years = np.unique(global_temp['dt'].apply(lambda x: x[:4]))
mean_temp_world = []
mean_temp_world_uncertainty = []

for year in years:
    mean_temp_world.append(global_temp[global_temp['dt'].apply(
        lambda x: x[:4]) == year]['LandAverageTemperature'].mean())
    mean_temp_world_uncertainty.append(global_temp[global_temp['dt'].apply(
        lambda x: x[:4]) == year]['LandAverageTemperatureUncertainty'].mean())

trace0 = go.Scatter(
    x = years,
    y = np.array(mean_temp_world) + np.array(mean_temp_world_uncertainty),
    fill= None,
    mode='lines',
    name='Uncertainty top',
    line=dict(
        color='rgb(0, 255, 255)',
    )
)
trace1 = go.Scatter(
    x = years,
    y = np.array(mean_temp_world) - np.array(mean_temp_world_uncertainty),
    fill='tonexty',
    mode='lines',
    name='Uncertainty bot',
    line=dict(
        color='rgb(0, 255, 255)',
    )
)
trace2 = go.Scatter(
    x = years,
    y = mean_temp_world,
    name='Average Temperature',
    line=dict(
        color='rgb(199, 121, 093)',
    )
)
data = [trace0, trace1, trace2]

layout = go.Layout(
    xaxis=dict(title='year'),
    yaxis=dict(title='Average Temperature, °C'),
    title='Average land temperature in world',
    showlegend = False)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
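The chart shows that the globe is warming: average land temperature has reached its peak in the past 30 years, and the fastest rise also happened in those 30 years! This is worrying. I hope humanity finds a way to use clean energy and cut carbon dioxide emissions, otherwise we are done for. The chart also shows the confidence interval, which indicates that temperature measurements have become more precise in recent years.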
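One way to put a number on "the fastest rise happened in the last 30 years" is to fit a straight line to the annual means over different windows and compare the slopes. The sketch below is only an illustration under stated assumptions: the series is synthetic (not the real data), and `trend_per_decade` is a hypothetical helper, not something from the notebook. With the real data, the `years` strings would first need converting to numbers.

```python
import numpy as np

def trend_per_decade(years, temps):
    """Least-squares slope of temps against years, in degrees C per decade."""
    slope, _intercept = np.polyfit(
        np.asarray(years, dtype=float), np.asarray(temps, dtype=float), 1
    )
    return slope * 10.0

# Synthetic series: flat until 1980, then warming by 0.02 degrees C per year.
years_toy = np.arange(1950, 2011)
temps_toy = np.where(years_toy < 1980, 8.0, 8.0 + 0.02 * (years_toy - 1980))

is_late = years_toy >= 1980
early = trend_per_decade(years_toy[~is_late], temps_toy[~is_late])
late = trend_per_decade(years_toy[is_late], temps_toy[is_late])

print(f"before 1980: {early:.3f} degrees C per decade")
print(f"from 1980:   {late:.3f} degrees C per decade")
```

A straight-line fit is a deliberately crude summary; it is used here only because it turns "faster" into two comparable numbers.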
Let's look at the yearly temperature change in a few countries: one country per continent, plus Greenland as the coldest place in this data.
In [6]:
continent = ['Russia', 'United States', 'Niger', 'Greenland', 'Australia', 'Bolivia']
mean_temp_year_country = [ [0] * len(years[70:]) for i in range(len(continent))]

j = 0
for country in continent:
    all_temp_country = global_temp_country_clear[global_temp_country_clear['Country'] == country]
    i = 0
    for year in years[70:]:
        mean_temp_year_country[j][i] = all_temp_country[all_temp_country['dt'].apply(
            lambda x: x[:4]) == year]['AverageTemperature'].mean()
        i +=1
    j += 1

traces = []
colors = ['rgb(0, 255, 255)', 'rgb(255, 0, 255)', 'rgb(0, 0, 0)',
          'rgb(255, 0, 0)', 'rgb(0, 255, 0)', 'rgb(0, 0, 255)']
for i in range(len(continent)):
    traces.append(go.Scatter(
        x=years[70:],
        y=mean_temp_year_country[i],
        mode='lines',
        name=continent[i],
        line=dict(color=colors[i]),
    ))

layout = go.Layout(
    xaxis=dict(title='year'),
    yaxis=dict(title='Average Temperature, °C'),
    title='Average land temperature on the continents',)

fig = go.Figure(data=traces, layout=layout)
py.iplot(fig)
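Since 1980 we can see a steady rise in the countries' average annual temperatures, and the change is especially sharp in the cold countries. The breaks in the lines are due to missing observations in those years.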
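The nested loop in the cell above fills a year-by-country table one entry at a time. As a hedged alternative (not the author's code), the same table can be sketched with one `groupby` followed by `unstack`. The frame below is made up and only reuses the `dt`, `Country` and `AverageTemperature` column names. A year with no usable readings comes out as NaN, which corresponds to the breaks in the plotted lines.

```python
import numpy as np
import pandas as pd

# Toy monthly readings (made up); country A has no usable reading in 1991.
toy = pd.DataFrame({
    "dt": ["1990-01-01", "1990-02-01", "1991-01-01",
           "1990-01-01", "1991-01-01", "1991-02-01"],
    "Country": ["A", "A", "A", "B", "B", "B"],
    "AverageTemperature": [1.0, 3.0, np.nan, 20.0, 21.0, 23.0],
})

# The year is the first four characters of the date string.
toy["year"] = toy["dt"].str[:4]

# Mean per (year, country), then spread countries out into columns.
yearly = (
    toy.groupby(["year", "Country"])["AverageTemperature"]
       .mean()
       .unstack("Country")
)
print(yearly)
```

Each column of `yearly` could then be passed as the `y` of one `go.Scatter` trace, with `yearly.index` as `x`.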
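4) Animated map
I built the visualization with plotly in Jupyter, but when I uploaded the report to Kesci (科赛), I found that Stream() does not work with plotly.offline, so the animation cannot be shown here.
We create a map showing how each country's average temperature changed over 10-year periods.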
In [7]:
# Extract the year from the date
years =