Python Data Visualization: Category Comparison Charts

Posted by: 美男子玩编程 | Date: 2022-11-23 | Source: 工程师

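Before reading this post, please take a look at the earlier, related article:

Python Data Visualization: How to Choose the Right Chart?

Depending on which aspect of the data they emphasize, charts fall into six broad types: category comparison charts, data relationship charts, data distribution charts, time series charts, part-to-whole charts, and geospatial charts (some charts belong to two or more types).

This post covers how to draw category comparison charts.

Category comparison charts usually involve two kinds of data, numerical and categorical. They mainly include column charts, bar charts, radar charts, and word clouds, and are typically used to compare the magnitude of data across categories, as shown below: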

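Column Charts

A column chart is a statistical chart that encodes a variable as the length of a rectangle. It is used to show how data changes over a period of time or to compare individual items.

In a column chart, a categorical or ordinal variable is mapped to position along the horizontal axis, and a numerical variable is mapped to the height of the rectangle. Two important parameters control a column chart: "series overlap" and "gap width".

"Gap width" controls the column width within a single data series;
"Series overlap" controls the distance between different data series.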
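To make these two parameters concrete, here is a minimal sketch that turns them into the bar width and per-series offsets you could pass to `plt.bar`. It assumes the spreadsheet-style convention in which both values are expressed as a fraction of one bar's width and each category occupies one unit on the x axis. The function name `bar_layout` is hypothetical and is not part of Matplotlib.

```python
def bar_layout(n_series, gap_width=1.0, overlap=0.0):
    """Return (bar_width, offsets) for one category slot of width 1.

    gap_width: empty space between neighbouring categories,
               as a fraction of one bar's width (assumed convention).
    overlap:   how much adjacent series overlap, as a fraction of one
               bar's width (0 = touching, negative = spaced apart).
    """
    # Width of one cluster of bars, measured in bar widths
    cluster = n_series - (n_series - 1) * overlap
    bar_width = 1.0 / (cluster + gap_width)
    # Distance between the centres of two adjacent series
    step = bar_width * (1.0 - overlap)
    # Offset of each series centre relative to the category centre
    offsets = [(i - (n_series - 1) / 2) * step for i in range(n_series)]
    return bar_width, offsets


width, offsets = bar_layout(n_series=2, gap_width=1.0, overlap=0.0)
print(width, offsets)
```

Under these assumptions a larger `gap_width` makes the columns narrower, and a positive `overlap` pulls the series closer together. Series `i` would then be drawn with something like `plt.bar(x + offsets[i], y_i, width=width)`.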

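The figure below shows the common column chart types: single-series column chart, multi-series column chart, stacked column chart, and percent stacked column chart.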

1.1 Single-series column chart

The following example shows how to draw a single-series column chart:

import pandas as pd
import matplotlib.pyplot as plt

mydata = pd.DataFrame({'Cut': ["Fair", "Good", "Very Good", "Premium", "Ideal"],
                       'Price': [4300, 3800, 3950, 4700, 3500]})
# Sort the categories by price, highest first
Sort_data = mydata.sort_values(by='Price', ascending=False)

fig = plt.figure(figsize=(6, 7), dpi=70)
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1)
plt.grid(axis="y", c=(217/256, 217/256, 217/256))  # draw horizontal grid lines

ax = plt.gca()  # get the current axes
ax.set_axisbelow(True)  # keep the grid lines beneath the bars
ax.spines['top'].set_color('none')  # hide the top spine
ax.spines['right'].set_color('none')  # hide the right spine
ax.spines['left'].set_color('none')  # hide the left spine

plt.bar(Sort_data['Cut'], Sort_data['Price'],
        width=0.6, align="center", label="Cut")
plt.ylim(0, 6000)  # set the y-axis range
plt.xlabel('Cut')
plt.ylabel('Price')
plt.show()

The result is shown below:

1.2 Multi-series column chart

The following example shows how to draw a multi-series column chart:

import numpy as np
import matplotlib.pyplot as plt

# df is assumed to be a DataFrame loaded beforehand with the columns
# "Catergory", "1996" and "1997"; the source does not show this step.
x_label = np.array(df["Catergory"])
x = np.arange(len(x_label))
y1 = np.array(df["1996"])
y2 = np.array(df["1997"])

fig = plt.figure(figsize=(5, 5))
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1)  # size and position of the plotting area

# The 1996 bars sit at x and the 1997 bars are shifted right by one bar width.
# label is the legend entry picked up by plt.legend below.
plt.bar(x, y1, width=0.3, color='#00AFBB', label='1996', edgecolor='k',
        linewidth=0.25)
plt.bar(x+0.3, y2, width=0.3, color='#FC4E07', label='1997',
        edgecolor='k', linewidth=0.25)
plt.xticks(x+0.15, x_label, size=12)  # centre the tick labels between the two bars

# loc places the legend (axes coordinates are allowed), ncol is the number
# of legend columns (default 1), frameon toggles the legend frame
plt.legend(loc=(1, 0.5), ncol=1, frameon=False)
plt.yticks(size=12)  # font size of the y tick labels
plt.grid(axis="y", c=(217/256, 217/256, 217/256))  # draw horizontal grid lines
# plt.xlabel("Quarter", labelpad=10, size=18)  # x-axis label; labelpad is its distance from the axis
# plt.ylabel("Amount", labelpad=10, size=18)   # y-axis label; labelpad is its distance from the axis

ax = plt.gca()
ax.set_axisbelow(True)  # keep the grid lines beneath the bars
ax.spines['top'].set_color('none')  # hide the top spine
ax.spines['right'].set_color('none')  # hide the right spine
ax.spines['left'].set_color('none')  # hide the left spine
plt.show()

The result is shown below:

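1.3 Stacked column chart

The following example shows how to draw a stacked column chart:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

# df is assumed to be a numeric DataFrame loaded beforehand: each row is one
# data series (its row label becomes the legend entry) and each column is one
# category on the x axis. The source does not show this step.
Sum_df = df.apply(lambda x: x.sum(), axis=0).sort_values(ascending=False)
df = df.loc[:, Sum_df.index]  # order the columns by total, largest first
meanRow_df = df.apply(lambda x: x.mean(), axis=1)
Sing_df = meanRow_df.sort_values(ascending=False).index  # series ordered by row mean
n_row, n_col = df.shape
# x_label = np.array(df.columns)
x_value = np.arange(n_col)
# One color per series. plt.get_cmap is used because cm.get_cmap
# was removed in Matplotlib 3.9.
cmap = plt.get_cmap('YlOrRd_r', n_row)
color = [colors.rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]
bottom_y = np.zeros(n_col)  # running baseline of each stack

fig = plt.figure(figsize=(5, 5))
# plt.subplots_adjust(left=0.1, right=0.9, top=0.7, bottom=0.1)
for i in range(n_row):
    label = Sing_df[i]
    plt.bar(x_value, df.loc[label, :], bottom=bottom_y, width=0.5,
            color=color[i], label=label, edgecolor='k', linewidth=0.25)
    bottom_y = bottom_y + df.loc[label, :].values

plt.xticks(x_value, df.columns, size=10)  # x tick labels
# plt.tick_params(axis="x", width=5)
plt.legend(loc=(1, 0.3), ncol=1, frameon=False)
plt.grid(axis="y", c=(166/256, 166/256, 166/256))
ax = plt.gca()  # get the current axes
ax.spines['top'].set_color('none')  # hide the top spine
ax.spines['right'].set_color('none')  # hide the right spine
ax.spines['left'].set_color('none')  # hide the left spine
plt.show()

The result is shown below: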
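The key step in a stacked column chart is the running baseline: each series is drawn on top of the sum of the series before it, which is what `bottom_y` tracks in the code above. The sketch below isolates that bookkeeping in plain Python. The helper name `stack_bottoms` and the sample values are illustrative only.

```python
def stack_bottoms(series):
    """For each series (a list of bar heights), return the baseline it starts from.

    Also returns the final stack height of every category.
    """
    n_categories = len(series[0])
    running = [0.0] * n_categories
    bottoms = []
    for heights in series:
        bottoms.append(list(running))  # this series starts where the stack currently ends
        running = [b + h for b, h in zip(running, heights)]
    return bottoms, running


series = [[3, 1, 2], [1, 4, 1], [2, 2, 2]]  # illustrative values only
bottoms, totals = stack_bottoms(series)
print(bottoms)
print(totals)
```

Each entry of `bottoms` plays the role of the `bottom=` argument of `plt.bar` for the corresponding series.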

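1.4 Percent stacked column chart

The following example shows how to draw a percent stacked column chart:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

# df is assumed to be a numeric DataFrame loaded beforehand: each row is one
# data series and each column is one category on the x axis.
# The source does not show this step.
SumCol_df = df.apply(lambda x: x.sum(), axis=0)  # column totals
df = df.apply(lambda x: x/SumCol_df, axis=1)  # turn every column into shares of its total
meanRow_df = df.apply(lambda x: x.mean(), axis=1)

# Shares of the series with the largest mean, sorted from high to low
Per_df = df.loc[meanRow_df.idxmax(), :].sort_values(ascending=False)

# Series ordered by mean share
Sing_df = meanRow_df.sort_values(ascending=False).index

df = df.loc[:, Per_df.index]  # order the columns by the share of that largest series
n_row, n_col = df.shape
x_value = np.arange(n_col)
# plt.get_cmap is used because cm.get_cmap was removed in Matplotlib 3.9
cmap = plt.get_cmap('YlOrRd_r', n_row)
color = [colors.rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]
bottom_y = np.zeros(n_col)  # running baseline of each stack

fig = plt.figure(figsize=(5, 5))
# plt.subplots_adjust(left=0.1, right=0.9, top=0.7, bottom=0.1)
for i in range(n_row):
    label = Sing_df[i]
    plt.bar(x_value, df.loc[label, :], bottom=bottom_y, width=0.5,
            color=color[i], label=label, edgecolor='k', linewidth=0.25)
    bottom_y = bottom_y + df.loc[label, :].values

plt.xticks(x_value, df.columns, size=10)  # x tick labels
# Show the y ticks as percentages
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100)
                           for x in plt.gca().get_yticks()])
plt.legend(loc=(1, 0.3), ncol=1, frameon=False)
plt.grid(axis="y", c=(166/256, 166/256, 166/256))
ax = plt.gca()  # get the current axes
ax.spines['top'].set_color('none')  # hide the top spine
ax.spines['right'].set_color('none')  # hide the right spine
ax.spines['left'].set_color('none')  # hide the left spine
plt.show()

The result is shown below: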
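What separates the percent stacked chart from the plain stacked chart is a single normalization step: every column is divided by its own total so that each stack adds up to 100%. A minimal plain-Python sketch of that step follows. The helper names and sample numbers are made up for illustration, and a column whose total is zero would raise `ZeroDivisionError`.

```python
def to_column_shares(rows):
    """Divide every value by the total of its column.

    rows: one list per series, all of the same length.
    """
    totals = [sum(col) for col in zip(*rows)]
    return [[v / t for v, t in zip(row, totals)] for row in rows]


def percent_label(share):
    """Format a share such as 0.25 the way the tick labels above do."""
    return '{:.0f}%'.format(share * 100)


rows = [[30, 10], [50, 30], [20, 60]]  # illustrative values only
shares = to_column_shares(rows)
print(shares)
print(percent_label(0.25))
```

After this step the stacking logic is unchanged. Only the y axis needs percentage labels.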

1.5 Variable-width column chart

Sometimes a column chart needs to express two dimensions of the data at once: besides the height of each column representing one value of an item (Y axis), we also want the width of the column to represent another value of the same item (X axis), so that the two dimensions can be compared directly. A variable-width column chart can be used for this, as shown in the figure below:

A variable-width column chart is a variation of the regular column chart. It uses column height to reflect one value and column width to reflect another, and is mostly used in market research, dimensional analysis, and similar fields. The code for the figure above is as follows:

# -*- coding: utf-8 -*-
# %%
import pandas as pd
from plotnine import *

mydata = pd.DataFrame(dict(Name=['A', 'B', 'C', 'D', 'E'],
                           Scale=[35, 30, 20, 10, 5],
                           ARPU=[56, 37, 63, 57, 59]))
# Right edge (maximum x) of each rectangle: the running total of Scale
mydata['xmax'] = mydata['Scale'].cumsum()
# Left edge (minimum x) of each rectangle
mydata['xmin'] = mydata['xmax'] - mydata['Scale']
# x position of the text labels: the midpoint of each rectangle
mydata['label'] = mydata['xmax'] - mydata['Scale'] / 2

base_plot = (ggplot(mydata) +
             geom_rect(aes(xmin='xmin', xmax='xmax', ymin=0, ymax='ARPU', fill='Name'), colour="black", size=0.25) +
             geom_text(aes(x='label', y='ARPU+3', label='ARPU'), size=14, color="black") +
             geom_text(aes(x='label', y=-4, label='Name'), size=14, color="black") +
             scale_fill_hue(s=0.90, l=0.65, h=0.0417, color_space='husl') +
             ylab("ARPU") +
             xlab("scale") +
             ylim(-5, 80) +
             theme(
                 # panel_background=element_rect(fill="white"),
                 # panel_grid_major=element_line(colour="grey", size=.25, linetype="dotted"),
                 # panel_grid_minor=element_line(colour="grey", size=.25, linetype="dotted"),
                 text=element_text(size=15),
                 legend_position="none",
                 aspect_ratio=1.15,
                 figure_size=(5, 5),
                 dpi=100))
print(base_plot)
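The `xmin`, `xmax` and `label` columns in the code above are nothing more than running totals of the widths. The sketch below reproduces them in plain Python for the same `Scale` values so the geometry is easy to check by hand. The helper name `rect_edges` is hypothetical.

```python
from itertools import accumulate


def rect_edges(widths):
    """Return (left edges, right edges, centres) for rectangles laid side by side."""
    xmax = list(accumulate(widths))                      # running total = right edge
    xmin = [right - w for right, w in zip(xmax, widths)]  # right edge minus own width
    centers = [(left + right) / 2 for left, right in zip(xmin, xmax)]
    return xmin, xmax, centers


xmin, xmax, centers = rect_edges([35, 30, 20, 10, 5])
print(xmin)     # [0, 35, 65, 85, 95]
print(xmax)     # [35, 65, 85, 95, 100]
print(centers)  # [17.5, 50.0, 75.0, 90.0, 97.5]
```

Because the widths sum to 100, the x axis can be read directly as a cumulative share.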

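Bar Charts

A bar chart is similar to a column chart and can convey almost the same amount of information.

In a bar chart, a categorical or ordinal variable is mapped to position along the vertical axis, and a numerical variable is mapped to the width of the rectangle. Because the bars run horizontally, a bar chart puts more emphasis than a column chart on comparing the sizes of items. When item names are long or there are many items, a bar chart is cleaner and easier to read, as shown in the figure below: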

2.1 Single-series bar chart

The following example shows how to draw a single-series bar chart:

import pandas as pd
from plotnine import *

df = pd.read_csv('Stackedbar_Data.csv')
df = df.sort_values(by='Pensions', ascending=True)
# Make Country an ordered categorical so the bars keep the sorted order
df['Country'] = pd.Categorical(df['Country'], categories=df['Country'], ordered=True)
df
# %%
base_plot = (ggplot(df, aes('Country', 'Pensions')) +
             # "#00AFBB"
             geom_bar(stat="identity", color="black", width=0.6, fill="#FC4E07", size=0.25) +
             # scale_fill_manual(values=brewer.pal(9,"YlOrRd")[c(6:2)])+
             coord_flip() +  # flip the axes to turn columns into horizontal bars
             theme(
                 axis_title=element_text(size=15, face="plain", color="black"),
                 axis_text=element_text(size=12, face="plain", color="black"),
                 legend_title=element_text(size=13, face="plain", color="black"),
                 legend_position="right",
                 aspect_ratio=1.15,
                 figure_size=(6.5, 6.5),
                 dpi=50))
print(base_plot)

The result is shown below:

2.2 Multi-series bar chart

The following example shows how to draw a multi-series bar chart:

import pandas as pd
from plotnine import *

df = pd.read_csv('Stackedbar_Data.csv')
df = df.iloc[:, [0, 2, 1]]  # reorder the columns
df = df.sort_values(by='Pensions', ascending=True)
mydata = pd.melt(df, id_vars='Country')  # long format: one row per (Country, variable)
mydata['Country'] = pd.Categorical(mydata['Country'], categories=df['Country'], ordered=True)

base_plot = (ggplot(mydata, aes('Country', 'value', fill='variable')) +
             # position_dodge places the series side by side
             geom_bar(stat="identity", color="black", position=position_dodge(), width=0.7, size=0.25) +
             scale_fill_manual(values=("#00AFBB", "#FC4E07", "#E7B800")) +
             coord_flip() +
             theme(
                 axis_title=element_text(size=15, face="plain", color="black"),
                 axis_text=element_text(size=12, face="plain", color="black"),
                 legend_title=element_text(size=14, face="plain", color="black"),
                 legend_background=element_blank(),
                 legend_position=(0.8, 0.2),
                 aspect_ratio=1.15,
                 figure_size=(6.5, 6.5),
                 dpi=50))
print(base_plot)

The result is shown below:

2.3 Stacked bar chart

The following example shows how to draw a stacked bar chart:

import pandas as pd
from plotnine import *

df = pd.read_csv('Stackedbar_Data.csv')
# Column totals, used to order the stacked series
Sum_df = df.iloc[:, 1:].apply(lambda x: x.sum(), axis=0).sort_values(ascending=True)
# Row means, used to order the countries
meanRow_df = df.iloc[:, 1:].apply(lambda x: x.mean(), axis=1)
Sing_df = df['Country'][meanRow_df.sort_values(ascending=True).index]
mydata = pd.melt(df, id_vars='Country')
mydata['variable'] = pd.Categorical(mydata['variable'], categories=Sum_df.index, ordered=True)
mydata['Country'] = pd.Categorical(mydata['Country'], categories=Sing_df, ordered=True)

base_plot = (ggplot(mydata, aes('Country', 'value', fill='variable')) +
             geom_bar(stat="identity", color="black", position='stack', width=0.65, size=0.25) +
             scale_fill_brewer(palette="YlOrRd") +
             coord_flip() +
             theme(
                 axis_title=element_text(size=18, face="plain", color="black"),
                 axis_text=element_text(size=16, face="plain", color="black"),
                 legend_title=element_text(size=18, face="plain", color="black"),
                 legend_text=element_text(size=16, face="plain", color="black"),
                 legend_background=element_blank(),
                 legend_position='right',
                 aspect_ratio=1.15,
                 figure_size=(6.5, 6.5),
                 dpi=50))
print(base_plot)

The result is shown below:

2.4 Percent stacked bar chart

The following example shows how to draw a percent stacked bar chart:

import pandas as pd
from plotnine import *

df = pd.read_csv('Stackedbar_Data.csv')
SumCol_df = df.iloc[:, 1:].apply(lambda x: x.sum(), axis=1)  # total per country (row)
df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: x/SumCol_df, axis=0)  # convert to shares of that total
# Mean share of each series, used to order the stacked series
meanRow_df = df.iloc[:, 1:].apply(lambda x: x.mean(), axis=0).sort_values(ascending=True)
# Order the countries by their share of the series with the largest mean
Per_df = df.loc[:, meanRow_df.idxmax()].sort_values(ascending=True)
Sing_df = df['Country'][Per_df.index]
mydata = pd.melt(df, id_vars='Country')
mydata['Country'] = pd.Categorical(mydata['Country'], categories=Sing_df, ordered=True)
mydata['variable'] = pd.Categorical(mydata['variable'], categories=meanRow_df.index, ordered=True)

base_plot = (ggplot(mydata, aes(x='Country', y='value', fill='variable'))
             + geom_bar(stat="identity", color="black",
                        position='fill', width=0.7, size=0.25)
             + scale_fill_brewer(palette="GnBu")
             + coord_flip()
             + theme(
                 # text=element_text(size=15, face="plain", color="black"),
                 axis_title=element_text(size=18, face="plain", color="black"),
                 axis_text=element_text(size=16, face="plain", color="black"),
                 legend_title=element_text(size=18, face="plain", color="black"),
                 legend_text=element_text(size=16, face="plain", color="black"),
                 aspect_ratio=1.15,
                 figure_size=(6.5, 6.5),
                 dpi=50))
print(base_plot)

The result is shown below:

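Radar Charts

A radar chart, also called a spider chart, polar chart, or star plot, is a way to compare multiple quantitative variables. It can be used to see which variables have similar values or whether any variable contains outliers. It also shows which variables in a dataset score high or low, which makes it well suited to displaying performance, as shown in the figure below:

In a radar chart each variable has its own axis starting from the center. All axes are arranged radially at equal angles from one another and share the same scale. The grid lines between the axes usually serve only as a guide. Each variable's value is plotted on its own axis, and all the variables in a dataset are connected to form a polygon.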
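The geometry described above comes down to two small steps: spread one axis per variable evenly around the circle, then repeat the first point at the end so the outline closes into a polygon. Here is a minimal sketch in plain Python. The helper name `radar_points` is illustrative.

```python
from math import pi


def radar_points(values):
    """Return (angles, closed_values) for a radar polygon.

    One angle per variable, evenly spaced over the full circle; the first
    point is appended again so the polygon closes.
    """
    n = len(values)
    angles = [2 * pi * i / n for i in range(n)]
    return angles + angles[:1], list(values) + list(values[:1])


angles, closed = radar_points([38.0, 29, 8, 7, 28])
print(len(angles), len(closed))
```

The two returned lists can be passed directly to the `plot` and `fill` methods of a polar Matplotlib axes.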

Radar charts have some major drawbacks:

Using several polygons in one radar chart makes it hard to read and rather cluttered. In particular, if the polygons are filled with color, the top polygon covers the ones beneath it;
Too many variables produce too many axes, which also makes the chart complicated and hard to read. Radar charts therefore have to be kept simple, which limits the number of variables that can be used;
Radar charts are not very effective for comparing values across variables. Even with the spider-web grid as a guide, comparing values is harder than on a single straight axis.

The following example shows how to draw a radar chart:

# -*- coding: utf-8 -*-
# %%
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from math import pi
from matplotlib.pyplot import figure, show, rc
plt.rcParams["patch.force_edgecolor"] = True

df = pd.DataFrame(dict(categories=['var1', 'var2', 'var3', 'var4', 'var5'],
                       group_A=[38.0, 29, 8, 7, 28],
                       group_B=[1.5, 10, 39, 31, 15]))
N = df.shape[0]
# One axis per variable, evenly spaced around the circle
angles = [n / float(N) * 2 * pi for n in range(N)]
angles += angles[:1]  # repeat the first angle to close the polygon

fig = figure(figsize=(4, 4), dpi=90)
ax = fig.add_axes([0.1, 0.1, 0.6, 0.6], polar=True)
ax.set_theta_offset(pi / 2)  # put the first axis at the top
ax.set_theta_direction(-1)  # go clockwise
ax.set_rlabel_position(0)
plt.xticks(angles[:-1], df['categories'], color="black", size=12)
plt.ylim(0, 45)
plt.yticks(np.arange(10, 50, 10), color="black", size=12,
           verticalalignment='center', horizontalalignment='right')
plt.grid(which='major', axis="x", linestyle='-',
         linewidth=0.5, color='gray', alpha=0.5)
plt.grid(which='major', axis="y", linestyle='-',
         linewidth=0.5, color='gray', alpha=0.5)

# Group A: filled polygon plus outline with markers
values = df['group_A'].values.flatten().tolist()
values += values[:1]
ax.fill(angles, values, '#7FBC41', alpha=0.3)
ax.plot(angles, values, marker='o', markerfacecolor='#7FBC41',
        markersize=8, color='k', linewidth=0.25, label="group A")

# Group B
values = df['group_B'].values.flatten().tolist()
values += values[:1]
ax.fill(angles, values, '#C51B7D', alpha=0.3)
ax.plot(angles, values, marker='o', markerfacecolor='#C51B7D',
        markersize=8, color='k', linewidth=0.25, label="group B")
plt.legend(loc="center", bbox_to_anchor=(1.25, 0, 0, 1))
plt.show()

The result is shown below:

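Word Clouds

A word cloud shows how often different words occur in a given text by making the size of each word proportional to its frequency. This filters out a large amount of text, so a reader can grasp the gist of the text at a glance.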
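The size rule can be illustrated without any plotting library. The sketch below counts words with the standard library and scales each count linearly so that the most frequent word gets `max_size`. It only demonstrates the proportionality idea: the tokenizer is deliberately naive, the function name is hypothetical, and real word cloud libraries apply their own, more elaborate scaling.

```python
from collections import Counter
import re


def word_sizes(text, max_size=60):
    """Map each word to a font size proportional to its frequency."""
    words = re.findall(r"[a-z']+", text.lower())  # naive English tokenizer
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: max_size * count / top for word, count in counts.items()}


sizes = word_sizes("data chart data plot data chart")
print(sizes)
```

In this sample "data" occurs three times, "chart" twice and "plot" once, so their sizes come out in the ratio 3 : 2 : 1.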

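A word cloud packs all the words together into a cloud-like pattern, but the words can also be arranged in any layout: horizontal lines, vertical columns, or other shapes. It can also be used to display words that have been assigned metadata, as shown in the figure below:

Word clouds are commonly used on websites and blogs to describe keywords or tags, and can also be used to compare two different texts.

Although word clouds are simple and easy to understand, they have some major drawbacks:

Longer words attract more attention;
Words whose letters have many ascenders or descenders may receive more attention;
They lack analytical precision and are often used mainly for visual appeal.

The following example shows how to draw a word cloud: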

# -*- coding: utf-8 -*-
# %%
import chardet
import jieba
import numpy as np
from PIL import Image
import os
from os import path
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure, show, rc
# %%
# ---------- English: square word cloud on a white background ----------
# Directory of the current file (falls back to the working directory)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()
# Read the text
text = open(path.join(d, 'WordCloud.txt')).read()
# Build the word cloud
# wc = WordCloud(scale=2, max_font_size=100)
wc = WordCloud(font_path=None,  # font path; not needed for English, required for Chinese or the text will not render
               width=400,  # default width
               height=400,  # default height
               margin=2,  # margin
               ranks_only=None,
               prefer_horizontal=0.9,
               mask=None,  # background shape; set it to draw the cloud inside an image
               scale=2,
               color_func=None,
               max_words=100,  # maximum number of words shown
               min_font_size=4,  # smallest font size
               stopwords=None,  # stop words; set them when refining the word cloud
               random_state=None,
               background_color='white',  # background color: a name such as white or a hex value
               max_font_size=None,  # largest font size
               font_step=1,
               mode='RGB',
               relative_scaling='auto',
               regexp=None,
               collocations=True,
               colormap='Reds',  # matplotlib colormap; change the name to change the overall style
               normalize_plurals=True,
               contour_width=0,
               contour_color='black',
               repeat=False)
wc.generate_from_text(text)
# Show the image
fig = figure(figsize=(4, 4), dpi=100)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.tight_layout()
# fig.savefig("词云图1.pdf")
plt.show()
# %%
# ---------- Chinese: round word cloud on a black background ----------
text = open(path.join(d, 'WordCloud_Chinese.txt'), 'rb').read()
text_charInfo = chardet.detect(text)
print(text_charInfo)
# Result:
# {'encoding': 'UTF-8-SIG', 'confidence': 1.0, 'language': ''}
# Note: the source opens the file as GB2312 although chardet reports UTF-8-SIG;
# use the encoding that matches your own file.
text = open(path.join(d, r'WordCloud_Chinese.txt'),
            encoding='GB2312', errors='ignore').read()

# Word frequencies in descending order; adjust the stop words as needed
process_word = WordCloud.process_text(wc, text)
sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
print(sort[:50])  # the 50 most frequent words
# Note: as written in the source, += appends the segmented text to the raw text
text += ' '.join(jieba.cut(text, cut_all=False))  # cut_all=False selects accurate mode
# Chinese font
font_path = 'SourceHanSansCN-Regular.otf'  # Source Han Sans
# Read the background image
background_Image = np.array(Image.open(path.join(d, "WordCloud_Image.jpg")))
# Extract the colors of the background image
img_colors = ImageColorGenerator(background_Image)
# Chinese stop words
stopwords = set('')
stopwords.update(['但是', '一个', '自己', '因此', '没有', '很多', '可以', '这个', '虽然', '因为', '这样', '已经', '现在',
                  '一些', '比如', '不是', '当然', '可能', '如果', '就是', '同时', '比如', '这些', '必须', '由于', '而且', '并且', '他们'])

wc = WordCloud(
    font_path=font_path,  # required for Chinese
    # width=400,  # default width
    # height=400,  # default height
    margin=2,  # page margin
    mask=background_Image,
    scale=2,
    max_words=200,  # maximum number of words
    min_font_size=4,
    stopwords=stopwords,
    random_state=42,
    background_color='black',  # background color
    # background_color='#C3481A',  # background color
    colormap='RdYlGn_r',  # matplotlib colormap; change the name to change the overall style
    max_font_size=100,
)
wc.generate(text)
# Word frequencies in descending order; adjust the stop words as needed
process_word = WordCloud.process_text(wc, text)
sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
print(sort[:50])  # the 50 most frequent words
# Recolor with the colors of the background image; leave commented out to keep the colormap
# wc.recolor(color_func=img_colors)
# Save the image
# wc.to_file('浪潮之巅basic.png')
# Show the image
fig = figure(figsize=(4, 4), dpi=100)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.tight_layout()
# fig.savefig("词云图2.pdf")
plt.show()

The result is shown below:

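Cleveland Dot Plots

Any discussion of the Cleveland dot plot should also cover the lollipop chart and the dumbbell chart, as shown in the figure below:

Visually, the lollipop chart, the Cleveland dot plot, and the dumbbell chart look very similar, because lollipop and dumbbell charts are essentially variants of the Cleveland dot plot.

A lollipop chart conveys the same information as a column or bar chart, but replaces the rectangles with lines. This saves space and puts the focus on the data points, which makes the chart look cleaner. Compared with column and bar charts, lollipop charts are better suited to larger numbers of data points;
A Cleveland dot plot, also called a sliding-bead scatter plot, is very similar to a lollipop chart but has no connecting lines. It emphasizes the ranking of the data and the gaps between values;
A dumbbell chart can be seen as a multi-series Cleveland dot plot in which a straight line connects the data points of the two series. It is mainly used to (1) show the relative positions of two data points over the same period (increase or decrease) and (2) compare the values of two categories.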
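Since a dumbbell chart is all about the gap between two series, it can help to compute that gap explicitly before plotting. The sketch below derives the two ends of each connecting segment and the gap between them, then orders the rows by gap. Sorting by gap is just one option assumed here (rows can equally be ordered by one of the two series), and the helper name and sample numbers are made up for illustration.

```python
def dumbbell_rows(data):
    """data maps a category name to a (first, second) pair of values.

    Returns one dict per category with the segment ends and the gap,
    ordered from the smallest gap to the largest.
    """
    rows = [
        {"name": name, "start": min(a, b), "end": max(a, b), "gap": abs(a - b)}
        for name, (a, b) in data.items()
    ]
    return sorted(rows, key=lambda row: row["gap"])


rows = dumbbell_rows({"A": (10, 14), "B": (8, 7), "C": (5, 12)})  # illustrative values
for row in rows:
    print(row)
```

Each row's `start` and `end` are the two dots of one dumbbell, and `gap` is the length of the line between them.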

5.1 Lollipop chart

The following example shows how to draw a lollipop chart:

import numpy as np
import pandas as pd
from plotnine import *

df = pd.read_csv('DotPlots_Data.csv')
df['sum'] = df.iloc[:, 1:3].apply(np.sum, axis=1)  # row total of the two value columns
df = df.sort_values(by='sum', ascending=True)
# Make City an ordered categorical so the rows keep the sorted order
df['City'] = pd.Categorical(df['City'], categories=df['City'], ordered=True)
df
# %%
base_plot = (ggplot(df, aes('sum', 'City')) +
             # the stick: a line from 0 to the value
             geom_segment(aes(x=0, xend='sum', y='City', yend='City')) +
             # the head: a point at the value
             geom_point(shape='o', size=3, colour="black", fill="#FC4E07") +
             theme(
                 axis_title=element_text(size=12, face="plain", color="black"),
                 axis_text=element_text(size=10, face="plain", color="black"),
                 # legend_title=element_text(size=14, face="plain", color="black"),
                 aspect_ratio=1.25,
                 figure_size=(4, 4),
                 dpi=100))
print(base_plot)

The result is shown below:

5.2 Cleveland dot plot

The following example shows how to draw a Cleveland dot plot:

# df is the sorted DataFrame prepared in section 5.1
base_plot = (ggplot(df, aes('sum', 'City')) +
             geom_point(shape='o', size=3, colour="black", fill="#FC4E07") +
             theme(
                 axis_title=element_text(size=12, face="plain", color="black"),
                 axis_text=element_text(size=10, face="plain", color="black"),
                 # legend_title=element_text(size=14, face="plain", color="black"),
                 aspect_ratio=1.25,
                 figure_size=(4, 4),
                 dpi=100))
print(base_plot)

The result is shown below:

5.3 Dumbbell chart

The following example shows how to draw a dumbbell chart:

import pandas as pd
from plotnine import *

df = pd.read_csv('DotPlots_Data.csv')
df = df.sort_values(by='Female', ascending=True)
df['City'] = pd.Categorical(df['City'], categories=df['City'], ordered=True)
mydata = pd.melt(df, id_vars='City')  # long format: one row per (City, variable)

base_plot = (ggplot(mydata, aes('value', 'City', fill='variable')) +
             # one line per city connecting its two points
             geom_line(aes(group='City')) +
             geom_point(shape='o', size=3, colour="black") +
             scale_fill_manual(values=("#00AFBB", "#FC4E07", "#36BED9")) +
             theme(
                 axis_title=element_text(size=13, face="plain", color="black"),
                 axis_text=element_text(size=10, face="plain", color="black"),
                 legend_title=element_text(size=12, face="plain", color="black"),
                 legend_text=element_text(size=10, face="plain", color="black"),
                 legend_background=element_blank(),
                 legend_position=(0.75, 0.2),
                 aspect_ratio=1.25,
                 figure_size=(4, 4),
                 dpi=100))
print(base_plot)

The result is shown below:

Slope Charts

As the name suggests, a slope chart shows changes as slopes. It is much like a dumbbell chart, but shows more clearly how the data changed between two points in time, that is, whether a value went up or down.

The following example shows how to draw a slope chart:

from plotnine import *

# df is assumed to be a DataFrame loaded beforehand with the columns '1970',
# '1979' and 'class'; left_label and right_label are assumed to hold the text
# labels for each side. The source does not show how they are built.
base_plot = (ggplot(df) +
             # connecting lines
             geom_segment(aes(x=1, xend=2, y='1970', yend='1979', color='class'), size=.75, show_legend=False) +
             # vertical line for 1970
             geom_vline(xintercept=1, linetype="solid", size=.1) +
             # vertical line for 1979
             geom_vline(xintercept=2, linetype="solid", size=.1) +
             # data points for 1970
             geom_point(aes(x=1, y='1970'), size=3, shape='o', fill="grey", color="black") +
             # data points for 1979
             geom_point(aes(x=2, y='1979'), size=3, shape='o', fill="grey", color="black") +
             scale_color_manual(labels=("Up", "Down"), values=("#A6D854", "#FC4E07")) +
             xlim(.5, 2.5))
# Add the text labels; the year headings sit just above the largest value
y_top = 1.02 * df[['1970', '1979']].to_numpy().max()
base_plot = (base_plot + geom_text(label=left_label, y=df['1970'], x=0.95, size=10, ha='right')
             + geom_text(label=right_label, y=df['1979'], x=2.05, size=10, ha='left')
             + geom_text(label="1970", x=1, y=y_top, size=12)
             + geom_text(label="1979", x=2, y=y_top, size=12)
             + theme_void()
             + theme(
                 aspect_ratio=1.5,
                 figure_size=(5, 6),
                 dpi=100))
print(base_plot)

The result is shown below:
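The slope chart code colors each line by a `class` column, but the source does not show how that column is built. One plausible rule, assumed here purely for illustration, is to compare the two years and label each row "Up" or "Down". The helper name and sample records are hypothetical.

```python
def slope_class(before, after):
    """Label a change as 'Up' (no decrease) or 'Down' (decrease).

    Treating an unchanged value as 'Up' is an arbitrary choice for this sketch.
    """
    return "Up" if after >= before else "Down"


# (name, value in the first year, value in the second year); illustrative values
records = [("X", 10.0, 12.5), ("Y", 8.0, 6.0)]
classes = {name: slope_class(before, after) for name, before, after in records}
print(classes)
```

With a pandas DataFrame the same rule could be applied to the two year columns to produce the column that the `color` aesthetic maps to.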

Radial Column Charts

A radial column chart, also called a circular column chart or star chart, draws bars on a grid of concentric circles, as shown in the figure below:

Each circle represents a value on the scale, and the radial dividers (lines extending from the center) separate the categories or intervals (in the case of a histogram). Lower values usually start at the center and increase with each circle outward, but any outer circle can also be set as zero so that the circles inside it can show negative values. Bars usually extend outward from the center, but they can also start elsewhere to show a range of values (as in a span chart).

Bars can also be stacked, as in a stacked bar chart, as shown in the figure below:

The following example shows how to draw a single-series radial column chart:

import datetime
import numpy as np
from matplotlib import colors
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure, show, rc
import pandas as pd
plt.rcParams["patch.force_edgecolor"] = True
# plt.rc('axes', axisbelow=True)
plt.rcParams['axes.axisbelow'] = True


def dateRange(beginDate, endDate):
    """Return every date from beginDate to endDate (inclusive) as 'YYYY-MM-DD' strings."""
    dates = []
    dt = datetime.datetime.strptime(beginDate, "%Y-%m-%d")
    date = beginDate[:]
    while date <= endDate:
        dates.append(date)
        dt = dt + datetime.timedelta(1)
        date = dt.strftime("%Y-%m-%d")
    return dates


mydata = pd.DataFrame(dict(day=dateRange("2016-01-01", "2016-02-01"),
                           Price=-np.sort(-np.random.normal(loc=30, scale=10, size=32)) +
                           np.random.normal(loc=3, scale=3, size=32)))
# The strings produced by dateRange use the "%Y-%m-%d" format
mydata['day'] = pd.to_datetime(mydata['day'], format="%Y-%m-%d")
mydata
# %%
n_row = mydata.shape[0]
angle = np.arange(0, 2*np.pi, 2*np.pi/n_row)  # one angle per bar
radius = np.array(mydata.Price)

fig = figure(figsize=(4, 4), dpi=90)
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8], polar=True)

# set_theta_offset sets the angular offset of the zero position, in radians
ax.set_theta_offset(np.pi/2-np.pi/n_row)
# set_theta_direction: 1, 'counterclockwise' or 'anticlockwise' makes the
# positive direction counterclockwise; -1 or 'clockwise' makes it clockwise
ax.set_theta_direction(-1)
# set_rlabel_position sets the angle (in degrees) at which the radial labels are drawn
ax.set_rlabel_position(360-180/n_row)

# plt.get_cmap is used because cm.get_cmap was removed in Matplotlib 3.9
cmap = plt.get_cmap('Blues_r', n_row)
color = [colors.rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]
plt.bar(angle, radius, color=color, alpha=0.9,
        width=0.2, align="center", linewidth=0.25)

plt.ylim(-15, 60)  # a negative lower limit leaves a hole in the middle
index = np.arange(0, n_row, 3)  # label every third day
plt.xticks(angle[index], labels=[x.strftime('%m-%d')
                                 for x in mydata.day[index]], size=12)
plt.yticks(np.arange(0, 60, 10), verticalalignment='center',
           horizontalalignment='right')

plt.grid(which='major', axis="x", linestyle='-',
         linewidth=0.5, color='gray', alpha=0.5)
plt.grid(which='major', axis="y", linestyle='-',
         linewidth=0.5, color='gray', alpha=0.5)
plt.show()

The result is shown below:
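The call `plt.ylim(-15, 60)` in the example above is what opens up the empty circle in the middle of the chart: the radial axis starts below zero, so the zero ring is pushed away from the center. Assuming a linear radial scale whose origin sits at the lower limit, the position of any value along the radius can be worked out with one line of arithmetic. The helper name is hypothetical.

```python
def radial_fraction(value, r_min, r_max):
    """Fraction of the plot radius at which `value` is drawn.

    Assumes a linear radial axis running from r_min (centre) to r_max (rim).
    """
    return (value - r_min) / (r_max - r_min)


hole = radial_fraction(0, -15, 60)
print(hole)  # the zero ring sits at 20% of the radius
```

Under that assumption, making the lower limit more negative enlarges the hole, which leaves more room near the center where narrow bars would otherwise crowd together.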

The following example shows how to draw a multi-series radial column chart:

import numpy as np
from matplotlib import colors
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure, show, rc
import pandas as pd
plt.rcParams["patch.force_edgecolor"] = True

mydata = pd.DataFrame(dict(day=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
                           Peter=[10, 60, 50, 20, 10, 90, 30],
                           Jack=[20, 50, 10, 10, 30, 60, 50],
                           Eelin=[30, 50, 20, 40, 10, 40, 50]))
n_row = mydata.shape[0]
n_col = mydata.shape[1]
angle = np.arange(0, 2*np.pi, 2*np.pi/n_row)  # one angle per day

# plt.get_cmap is used because cm.get_cmap was removed in Matplotlib 3.9
cmap = plt.get_cmap('Reds', n_col)
color = [colors.rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]

# Data to plot
radius1 = np.array(mydata.Peter)
radius2 = np.array(mydata.Jack)
radius3 = np.array(mydata.Eelin)

fig = figure(figsize=(4, 4), dpi=90)
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8], polar=True)

# set_theta_offset sets the angular offset of the zero position, in radians
ax.set_theta_offset(np.pi/2)
# set_theta_direction: 1, 'counterclockwise' or 'anticlockwise' makes the
# positive direction counterclockwise; -1 or 'clockwise' makes it clockwise
ax.set_theta_direction(-1)
# set_rlabel_position sets the angle (in degrees) at which the radial labels are drawn
ax.set_rlabel_position(360)

# barwidth1 is the angular step between series, barwidth2 the width of each bar
barwidth1 = 0.2
barwidth2 = 0.2
plt.bar(angle, radius1, width=barwidth2, align="center",
        color=color[0], edgecolor="k", alpha=1, label="Peter")
plt.bar(angle+barwidth1, radius2, width=barwidth2, align="center",
        color=color[1], edgecolor="k", alpha=1, label="Jack")
plt.bar(angle+barwidth1*2, radius3, width=barwidth2, align="center",
        color=color[2], edgecolor="k", alpha=1, label="Eelin")
plt.legend(loc="center", bbox_to_anchor=(1.2, 0, 0, 1))

plt.ylim(-30, 100)
plt.xticks(angle+2*np.pi/n_row/4, labels=mydata.day, size=12)
plt.yticks(np.arange(0, 101, 30), verticalalignment='center',
           horizontalalignment='right')

plt.grid(which='major', axis="x", linestyle='-',
         linewidth=0.5, color='gray', alpha=0.5)
plt.grid(which='major', axis="y", linestyle='-',
         linewidth=0.5, color='gray', alpha=0.5)
plt.show()

The result is shown below:

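Heat Maps

A heat map is a statistical chart that displays data by coloring cells. The rule that maps values to colors must be specified when drawing it. For example, larger values can be shown in darker colors and smaller values in lighter ones, or larger values in warmer colors and smaller values in cooler ones.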
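A color mapping rule of the "larger value, darker color" kind can be written as a linear interpolation between two colors. The sketch below is a plain-Python illustration of the idea, not the colormap implementation of any plotting library. The function name and the two endpoint colors are arbitrary choices.

```python
def lerp_color(value, vmin, vmax, low=(255, 255, 255), high=(178, 24, 43)):
    """Map `value` in [vmin, vmax] to a hex color between `low` and `high`."""
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values that fall outside the range
    rgb = [round(lo + (hi - lo) * t) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(lerp_color(-1, -1, 1))  # smallest value -> lightest color, #ffffff
print(lerp_color(1, -1, 1))   # largest value  -> darkest color, #b2182b
```

Diverging schemes such as warm versus cool follow the same principle with a third, neutral color placed at the midpoint.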

The following example shows how to draw a heat map:

import numpy as np
import pandas as pd
from plotnine import *
from plotnine.data import mtcars

# numeric_only=True skips the non-numeric 'name' column (needed with pandas 2.0 and later)
mat_corr = np.round(mtcars.corr(numeric_only=True), 1).reset_index()
# Long format: one row per pair of variables
mydata = pd.melt(mat_corr, id_vars='index', var_name='var', value_name='value')
mydata
# %%
# Variant 1: colored tiles with the correlation value printed in each cell
base_plot = (ggplot(mydata, aes(x='index', y='var', fill='value', label='value')) +
             geom_tile(colour="black") +
             geom_text(size=8, colour="white") +
             scale_fill_cmap(name='RdYlBu_r') +
             coord_equal() +
             theme(dpi=100, figure_size=(4, 4)))
print(base_plot)
# %%
# Variant 2: circles whose area reflects the absolute correlation
mydata['AbsValue'] = np.abs(mydata.value)
base_plot = (ggplot(mydata, aes(x='index', y='var', fill='value', size='AbsValue')) +
             geom_point(shape='o', colour="black") +
             # geom_text(size=8, colour="white") +
             scale_size_area(max_size=11, guide=False) +
             scale_fill_cmap(name='RdYlBu_r') +
             coord_equal() +
             theme(dpi=100, figure_size=(4, 4)))
print(base_plot)
# %%
# Variant 3: squares whose area reflects the absolute correlation
base_plot = (ggplot(mydata, aes(x='index', y='var', fill='value', size='AbsValue')) +
             geom_point(shape='s', colour="black") +
             # geom_text(size=8, colour="white") +
             scale_size_area(max_size=10, guide=False) +
             scale_fill_cmap(name='RdYlBu_r') +
             coord_equal() +
             theme(dpi=100, figure_size=(4, 4)))
print(base_plot)

The result is shown below:
