Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34442214/

Date: 2020-09-14 00:25:39  Source: igfitidea

Pivoting a Pandas Dataframe containing strings - 'No numeric types to aggregate' error

Tags: python, pandas, pivot-table

Asked by jmhead

There are a good number of questions about this error, but after looking around I still can't find, or wrap my mind around, a solution. I'm trying to pivot a data frame of strings so that some row data becomes columns, but it hasn't worked out so far.

Shape of my df

<class 'pandas.core.frame.DataFrame'>
Int64Index: 515932 entries, 0 to 515931
Data columns (total 5 columns):
id                 515932 non-null object
cc_contact_id      515932 non-null object
Network_Name       515932 non-null object
question           515932 non-null object
response_answer    515932 non-null object
dtypes: object(5)
memory usage: 23.6+ MB

Sample format

id  contact_id  question    response_answer
16  137519  2206    State   Ca
17  137520  2206    State   Ca
18  137521  2206    State   Ca
19  137522  2206    State   Ca
20  137523  2208    City    Lancaster
21  137524  2208    City    Lancaster
22  137525  2208    City    Lancaster
23  137526  2208    City    Lancaster
24  137527  2208    Trip_End Location   Home
25  137528  2208    Trip_End Location   Home
26  137529  2208    Trip_End Location   Home
27  137530  2208    Trip_End Location   Home

What I would like to pivot to

id  contact_id      State   City       Trip_End Location
16  137519  2206    Ca      None       None None
20  137523  2208    None    Lancaster  None None
24  137527  2208    None    None       None Home
etc. etc. 

Where the question values become the columns, with each response_answer in its corresponding column, while retaining the ids.

What I have tried

unified_df = pd.DataFrame(unified_data, columns=target_table_headers, dtype=object)

pivot_table = unified_df.pivot_table('response_answer',['id','cc_contact_id'],'question')
# OR
pivot_table = unified_df.pivot_table('response_answer','question')

DataError: No numeric types to aggregate

What is the way to pivot a data frame with string values?

Answered by cwharland

The default aggfunc in pivot_table is the mean (np.mean), which doesn't know what to do with strings, and you haven't properly indicated what the index should be. Try something like:

pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
                                     columns='question', 
                                     values='response_answer',
                                     aggfunc=lambda x: ' '.join(x))

This explicitly sets one row per (id, contact_id) pair and pivots the set of response_answer values on question. The aggfunc just ensures that if the raw data has multiple answers to the same question, they are concatenated together with spaces. The syntax of pivot_table might vary depending on your pandas version.

Here's a quick example:

In [24]: import pandas as pd

In [25]: import random

In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})

In [27]: df.head()
Out[27]:
     id question response
0  3100        1      116
1  4500        2      113
2  5000        1      120
3  3900        2      103
4  4300        0      117

In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
id          100 non-null int64
question    100 non-null object
response    100 non-null object
dtypes: int64(1), object(2)
memory usage: 3.1+ KB

In [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()
Out[29]:
question        0        1    2        3
id
1000      110 120      NaN  100      NaN
1100          NaN  106 108  104      NaN
1200      104 113      119  NaN      101
1300          102      NaN  116  108 120
1400          NaN      NaN  116      NaN
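
As a rough sketch of the same call on data shaped like the question's sample: the rows below are made up for illustration (including a hypothetical second City answer for one id), and the names df and wide are placeholders. Under those assumptions, the join-based aggfunc leaves single answers unchanged and concatenates repeated ones.

```python
import pandas as pd

# Made-up rows in the question's long format; id 2 answers "City" twice.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "contact_id": [2206, 2206, 2208, 2208, 2208],
    "question": ["State", "City", "State", "City", "City"],
    "response_answer": ["Ca", "Lancaster", "Ca", "Palmdale", "Lancaster"],
})

# One row per (id, contact_id); one column per distinct question.
# Repeated answers to the same question are joined with a space.
wide = df.pivot_table(
    index=["id", "contact_id"],
    columns="question",
    values="response_answer",
    aggfunc=lambda x: " ".join(x),
)

print(wide)
```

If concatenation is not what you want, a different string-capable aggfunc such as 'first' can be swapped in.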

Answered by johnInHome

There are several ways.

1

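# Identity aggregation: assumes each (id, contact_id, Network_Name, question) group holds exactly one row
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate(lambda x: x).unstack().reset_index()
df1.columns = df1.columns.tolist()  # drop the 'question' name from the column index
print(df1)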

2

# No aggregation step: unstack raises a ValueError if any key combination repeats
df1 = df.set_index(["id","contact_id","Network_Name","question"])['response_answer'].unstack().reset_index()
df1.columns = df1.columns.tolist()
print(df1)

3

# Keep the first response when a key combination repeats
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate('first').unstack().reset_index()
df1.columns = df1.columns.tolist()
print(df1)

4

# Passing values as a list yields two-level columns; droplevel() removes the outer 'response_answer' level
df1 = df.pivot_table(index=["id","contact_id","Network_Name"], columns='question', values=['response_answer'], aggfunc='first')
df1.columns = df1.columns.droplevel()
df1 = df1.reset_index()
df1.columns = df1.columns.tolist()
print(df1)

All four give the same result.

    id  contact_id  Network_Name       City State Trip_End_Location
0   16      137519          2206       None    Ca              None
1   17      137520          2206       None    Ca              None
2   18      137521          2206       None    Ca              None
3   19      137522          2206       None    Ca              None
4   20      137523          2208  Lancaster  None              None
5   21      137524          2208  Lancaster  None              None
6   22      137525          2208  Lancaster  None              None
7   23      137526          2208  Lancaster  None              None
8   24      137527          2208       None  None              Home
9   25      137528          2208       None  None              Home
10  26      137529          2208       None  None              Home
11  27      137530          2208       None  None              Home
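
One difference between the options shows up when a key combination repeats in the raw data. The following is a minimal sketch on made-up rows, with the key columns trimmed to id and question for brevity; the names keys, unstack_failed and first are placeholders, not part of the original answer. Without an aggregation step, unstack refuses to reshape duplicated keys, whereas the 'first' variant keeps one value per cell.

```python
import pandas as pd

# Made-up rows: the (16, "State") combination appears twice.
df = pd.DataFrame({
    "id": [16, 16, 20],
    "question": ["State", "State", "City"],
    "response_answer": ["Ca", "Ca", "Lancaster"],
})

keys = ["id", "question"]

# set_index + unstack has no way to resolve duplicated keys,
# so pandas raises a ValueError instead of reshaping.
try:
    df.set_index(keys)["response_answer"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# groupby + 'first' reduces each group to a single value before unstacking.
first = df.groupby(keys)["response_answer"].first().unstack()

print(unstack_failed)
print(first)
```

Which variant fits depends on whether duplicates in your data should be treated as an error or silently collapsed.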