|NO.Z.00049|——————————|BigDataEnd|——|Hadoop&Python.v13|——|Arithmetic.v13|Pandas数据分析库:Pandas分组聚合|
一、分组聚合
Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart ——W.S.Landor
### --- 分组聚合
import numpy as np
import pandas as pd
~~~ # 准备数据
df = pd.DataFrame(data = {'sex':np.random.randint(0,2,size = 300), # 0男,1?
'class':np.random.randint(1,9,size = 300), # 1~8?个班
'Python':np.random.randint(0,151,size = 300), # Python成绩
'Keras':np.random.randint(0,151,size =300), # Keras成绩
'Tensorflow':np.random.randint(0,151,size=300),
'Java':np.random.randint(0,151,size = 300),
'C++':np.random.randint(0,151,size = 300)})
df['sex'] = df['sex'].map({0:'男',1:'?'}) # 将0,1映射成男?
~~~ # 1、分组->可迭代对象
~~~ # 1.1 先分组再获取数据
g = df.groupby(by = 'sex')[['Python','Java']] # 单分组
for name,data in g:
print('组名:',name)
print('数据:',data)
df.groupby(by = ['class','sex'])[['Python']] # 多分组
~~~ # 1.2 对?列值进?分组
df['Python'].groupby(df['class']) # 单分组
df['Keras'].groupby([df['class'],df['sex']]) # 多分组
~~~ # 1.3 按数据类型分组
df.groupby(df.dtypes,axis = 1)
~~~ # 1.4 通过字典进?分组
m =
{'sex':'category','class':'category','Python':'IT','Keras':'IT','Tensorflow':'I
T','Java':'IT','C++':'IT'}
for name,data in df.groupby(m,axis = 1):
print('组名',name)
print('数据',data)
二、分组聚合
### --- 分组聚合
~~~ # 分组直接调?函数进?聚合
~~~ # 按照性别分组,其他列均值聚合
df.groupby(by = 'sex').mean().round(1) # 保留1位?数
~~~ # 按照班级和性别进?分组,Python、Keras的最?值聚合
df.groupby(by = ['class','sex'])[['Python','Keras']].max()
~~~ # 按照班级和性别进?分组,计数聚合。统计每个班,男??数
df.groupby(by = ['class','sex']).size()
~~~ # 基本描述性统计聚合
df.groupby(by = ['class','sex']).describe()
三、分组聚合apply、transform
### --- 分组后调?apply,transform封装单?函数计算
~~~ # 返回分组结果
df.groupby(by = ['class','sex'])[['Python','Keras']].apply(np.mean).round(1)
def normalization(x):
return (x - x.min())/(x.max() - x.min()) # 最?值最?值归?化
~~~ # 返回全数据,返回DataFrame.shape和原DataFrame.shape?样。
df.groupby(by = ['class','sex'])
[['Python','Tensorflow']].transform(normalization).round(3)
四、分组聚合agg
### --- agg 多中统计汇总操作
~~~ # 分组后调?agg应?多种统计汇总
df.groupby(by = ['class','sex'])
[['Tensorflow','Keras']].agg([np.max,np.min,pd.Series.count])
~~~ # 分组后不同属性应?多种不同统计汇总
df.groupby(by = ['class','sex'])[['Python','Keras']].agg({'Python':[('最?值',np.max),('最?值',np.min)],
'Keras':[('计数',pd.Series.count),('中位数',np.median)]})
五、透视表pivot_table
### --- 透视表
~~~ # 透视表也是?种分组聚合运算
def count(x):
return len(x)
df.pivot_table(values=['Python','Keras','Tensorflow'], # 要透视分组的值
index=['class','sex'], # 分组透视指标
aggfunc={'Python':[('最?值',np.max)], # 聚合运算
'Keras':[('最?值',np.min),('中位数',np.median)],
'Tensorflow':[('最?值',np.min),('平均值',np.mean),('计数',count)]})
Walter Savage Landor:strove with none,for none was worth my strife.Nature I loved and, next to Nature, Art:I warm'd both hands before the fire of life.It sinks, and I am ready to depart ——W.S.Landor