pandas - Group by partial string

Question

Asked by Fabio Lamanna

I would like to group a DataFrame by partial substrings. This is a sample .csv file:

GridCode,Key
1000,Colour
1000,Colours
1001,Behaviours
1001,Behaviour
1002,Favourite
1003,COLORS
1004,Honours

What I have done so far is import the file with df = pd.read_csv('sample.csv') and then convert all the strings to lowercase with df['Key'] = df['Key'].str.lower(). The first thing I tried was a groupby on GridCode and Key:

g = df.groupby([df['GridCode'],df['Key']]).size()

then unstack and fill:

d = g.unstack().fillna(0)

and the resulting DataFrame is:

Key       behaviour  behaviours  colors  colour  colours  favourite  honours
GridCode                                                                    
1000              0           0       0       1        1          0        0
1001              1           1       0       0        0          0        0
1002              0           0       0       0        0          1        0
1003              0           0       1       0        0          0        0
1004              0           0       0       0        0          0        1

Now I would like to count, for each GridCode, only the keys that contain the substring 'our' (in this sample, every key except colors), and put those counts in a new column named after the substring. The expected result would be:

Key       'our'
GridCode
1000          2
1001          2
1002          1
1003          0
1004          1

I also tried to mask the DataFrame with mask = df['Key'].str.contains('our') and then df1 = df[mask], but I can't figure out how to make a new column with the new groupby counts. Any help would be really appreciated.

Answer 1

Answered by behzad.nouri

>>> import re  # for the re.IGNORECASE flag
>>> df['Key'].str.contains('our', flags=re.IGNORECASE).groupby(df['GridCode']).sum()
GridCode
1000        2
1001        2
1002        1
1003        0
1004        1
Name: Key, dtype: float64
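
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the sample data from the question. It assumes a few things that are not in the original answer: the CSV is read from an in-memory string via io.StringIO, case-insensitivity is requested with case=False instead of the regex flag, and the names has_our, counts, result and masked_counts are purely illustrative. It also shows one possible way to make the question's masking attempt work, by reindexing so that GridCode 1003 is kept with a count of 0.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """GridCode,Key
1000,Colour
1000,Colours
1001,Behaviours
1001,Behaviour
1002,Favourite
1003,COLORS
1004,Honours
"""
df = pd.read_csv(io.StringIO(csv_text))

# Boolean Series: True where Key contains 'our', ignoring case.
has_our = df['Key'].str.contains('our', case=False)

# Summing booleans counts the True values in each GridCode group.
# Groups with no match (1003) stay in the result with a count of 0.
counts = has_our.groupby(df['GridCode']).sum().astype(int)

# Illustrative: expose the counts as a one-column DataFrame named 'our'.
result = counts.to_frame(name='our')
print(result)

# The masking route from the question drops GridCode 1003 entirely,
# so it has to be added back with reindex to get the 0 row.
masked_counts = (
    df[has_our]
    .groupby('GridCode')
    .size()
    .reindex(counts.index, fill_value=0)
)
print(masked_counts)
```

The astype(int) call is only there because the dtype of a summed boolean Series has differed between pandas versions (the output above shows float64).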

Also, instead of

df.groupby([df['GridCode'],df['Key']])

it is better to do:

df.groupby(['GridCode', 'Key'])
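
As a rough check that the two spellings group the same way, the sketch below builds the lowercased sample data by hand and compares them. The variable names (by_series, by_name, same, table) are illustrative, and the use of unstack(fill_value=0) in place of unstack().fillna(0) is my own substitution rather than part of the original answer; it avoids the intermediate NaN values.

```python
import pandas as pd

# Lowercased sample data from the question, built by hand.
df = pd.DataFrame({
    'GridCode': [1000, 1000, 1001, 1001, 1002, 1003, 1004],
    'Key': ['colour', 'colours', 'behaviours', 'behaviour',
            'favourite', 'colors', 'honours'],
})

# Grouping by the Series objects versus grouping by column names.
by_series = df.groupby([df['GridCode'], df['Key']]).size()
by_name = df.groupby(['GridCode', 'Key']).size()

# Both spellings should produce the same counts and the same index.
same = by_series.equals(by_name)
print(same)

# Pivot Key into columns; missing combinations are filled with 0.
table = by_name.unstack(fill_value=0)
print(table)
```

Passing column names is shorter and avoids repeating df inside the call.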
