How should I understand groupby in pandas? Are there any examples?

1. Grouping by multiple columns with Pandas groupby
import pandas as pd

# Sample data: six people with gender, employment status, and age
data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Groups in DataFrame:")

# One group per distinct (Gender, Employed) combination
groups = data.groupby(['Gender', 'Employed'])
for group_key, group_value in groups:
    # group_value is the sub-DataFrame holding the rows of group_key
    print(group_value)
    print("")
Output:
       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Groups in DataFrame:
   Name  Gender Employed  Age
3  Emma  Female       No   24

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
4      Luna  Female      Yes   28

     Name Gender Employed  Age
1  Travis   Male       No   28
5   Anish   Male       No   25

  Name Gender Employed  Age
2  Bob   Male      Yes   27
This creates 4 groups from the DataFrame. Rows that share the same values in both the Gender and Employed columns are placed in the same group.
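To check which rows ended up in which group without printing every sub-DataFrame, the grouped object can be inspected directly. The following is a minimal sketch that reuses the sample data above; the name group_rows is only an illustrative helper and not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

groups = data.groupby(['Gender', 'Employed'])

# Number of distinct (Gender, Employed) combinations found in the data
print(groups.ngroups)

# Map each group key (a tuple) to the row labels that belong to it.
# group_rows is a hypothetical helper name used only for this sketch.
group_rows = {key: list(labels) for key, labels in groups.groups.items()}
for key, labels in group_rows.items():
    print(key, labels)
```

Each key is a (Gender, Employed) tuple, and the row labels match the index values shown in the printed groups above.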
2. Counting the rows in each group with Pandas
To count the rows in each group created by DataFrame.groupby(), we can call the size() method on the grouped result.
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Count of Each group:")

# size() counts the rows in each group; reset_index() turns the result
# into a DataFrame with the counts in a column named "Count"
grouped_df = (
    data.groupby(['Gender', 'Employed'])
    .size()
    .reset_index(name="Count")
)
print(grouped_df)
Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Count of Each group:
   Gender Employed  Count
0  Female       No      1
1  Female      Yes      2
2    Male       No      2
3    Male      Yes      1
It prints the DataFrame, then the groups created from it together with the number of rows in each group.
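It may help to look at what size() returns before reset_index() is applied. The sketch below, again using the same sample data, suggests that the intermediate result is a Series whose index has one level per grouping column; the variable names sizes and counts are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

# size() returns a Series indexed by (Gender, Employed)
sizes = data.groupby(['Gender', 'Employed']).size()

# The index levels are named after the grouping columns
print(list(sizes.index.names))

# A single count can be looked up with a (Gender, Employed) tuple
print(sizes.loc[('Female', 'Yes')])

# reset_index() moves both index levels into ordinary columns and
# names the column of counts "Count"
counts = sizes.reset_index(name="Count")
print(list(counts.columns))
```

Keeping the Series form is convenient for lookups by key, while the reset_index() form is easier to print or merge with other tables.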

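If we want the largest group count for each value of the Employed column, we can group the result of size() again, this time by the Employed level of its index, and then call max() to take the largest count within each of those groups.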
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")

# Count rows per (Gender, Employed) group, then regroup those counts
# by index level 1, which is the Employed level
groups = data.groupby(['Gender', 'Employed']).size().groupby(level=1)
print(groups.max())
Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Employed
No     2
Yes    2
dtype: int64
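For each value of Employed, it shows the largest row count among the groups formed from the Gender and Employed columns. Both values are 2 here: for No, the counts are 1 (Female) and 2 (Male); for Yes, they are 2 (Female) and 1 (Male).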
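Referring to an index level by position (level=1) can be hard to read. As a hedged alternative, the sketch below refers to the level by its name, "Employed", and checks that both forms give the same result on the sample data; the variable names by_position and by_name are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

sizes = data.groupby(['Gender', 'Employed']).size()

# Regroup the counts by index level, first by position, then by name
by_position = sizes.groupby(level=1).max()
by_name = sizes.groupby(level='Employed').max()

print(by_name)

# Both ways of naming the level should produce identical Series
print(by_position.equals(by_name))
```

Using the level name keeps the code correct even if the order of the grouping columns is later changed.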
