Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/51004029/

Time: 2020-09-14 05:44:02  Source: igfitidea

Create a new dataframe based on rows with a certain value

python pandas dataframe

Asked by Tom

I have a large dataframe of transactions which I want to break into two smaller dataframes based on a certain column ("Type"). If "Type" is "S" then add that entire row to the "cust_sell" dataframe, and if "Type" is "P" to the "cust_buy" dataframe. I am using a for loop, but this is only adding the index value to the dataframe. Any help is appreciated!

from win32com.shell import shell, shellcon
import pandas as pd

filename = (shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)) + r'\MSRB T-1_test.xlsx'
wb = pd.read_excel(filename, sheet_name='T1-20062017', index_col=0, header=0)
cust_buy = []
cust_sell = []

# Create a list of customer buys and sells separately
for i in wb.index:
    if wb['Type'][i] == 'S':
        cust_sell.append([i])
    elif wb['Type'][i] == 'P':
        cust_buy.append([i])

Answered by Ankur Sinha

You do not need to write loops. You can do it easily with pandas.

Assuming your dataframe looks like this:

import pandas as pd  

mainDf = pd.DataFrame()
mainDf['Type'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
mainDf['Dummy'] = [1, 2, 3, 4, 5, 6, 7, 8]

To create dataframe for S and P types, you can just do this:

cust_sell = mainDf[mainDf.Type == 'S']
cust_buy = mainDf[mainDf.Type == 'P']

cust_sell output:

  Type  Dummy
0    S      1
1    S      2
2    S      3
5    S      6
7    S      8

cust_buy output:

  Type  Dummy
3    P      4
4    P      5
6    P      7
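
The filtering above can be broken into its two steps. The following is a minimal sketch on the same toy data; the intermediate name is_sell and the use of .copy() and reset_index(drop=True) are optional additions for illustration, not part of the original answer.

```python
import pandas as pd

# Same toy data as in the answer above.
mainDf = pd.DataFrame({
    'Type': ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S'],
    'Dummy': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Step 1: the comparison yields a boolean Series, one True/False per row.
is_sell = mainDf['Type'] == 'S'

# Step 2: indexing with that Series keeps only the rows marked True.
# .copy() (optional) makes the result an independent dataframe.
cust_sell = mainDf[is_sell].copy()

# reset_index(drop=True) (optional) renumbers the kept rows from 0.
cust_buy = mainDf[mainDf['Type'] == 'P'].reset_index(drop=True)

print(is_sell.tolist())
print(cust_buy)
```

Without reset_index, the filtered rows keep their original index labels, which is why the outputs above show 0, 1, 2, 5, 7 and 3, 4, 6.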

Answered by Seb

As @trollster said, it is indeed better to create dataframes for cust_sell and cust_buy directly. But let's look at what is going wrong in your code. When you write:

for i in wb.index:

it means i will take the values of wb.index. And when you print wb.index, you get:

Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

This means i takes the values 0, 1, 2, ... So when you do:

cust_sell.append([i])

you are appending to the list cust_sell a new list that contains a single element, i (an integer). If you want to add the entire row, you should use:

cust_sell.append(list(wb.loc[i,:]))

You will end up with a list of lists, each one containing the values of one row.
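
For reference, here is a rough sketch of the question's loop with the corrected append put back in. The small wb built here is a stand-in for the Excel data (an assumption, since the real file is not available), and the final conversion back to dataframes with the hypothetical names cust_sell_df and cust_buy_df is an optional extra step.

```python
import pandas as pd

# Stand-in for the question's `wb`; the real one comes from pd.read_excel.
wb = pd.DataFrame({
    'Type': ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S'],
    'Dummy': [1, 2, 3, 4, 5, 6, 7, 8],
})

cust_buy = []
cust_sell = []

for i in wb.index:
    row = list(wb.loc[i, :])  # all values of row i as a plain list
    if wb.loc[i, 'Type'] == 'S':
        cust_sell.append(row)
    elif wb.loc[i, 'Type'] == 'P':
        cust_buy.append(row)

# Optional: rebuild dataframes from the lists of rows.
# (The original index labels are not carried over by this step.)
cust_sell_df = pd.DataFrame(cust_sell, columns=wb.columns)
cust_buy_df = pd.DataFrame(cust_buy, columns=wb.columns)

print(cust_sell)
print(cust_buy_df)
```

This works, but on a large dataframe the row-by-row loop will generally be much slower than the boolean filtering shown in the other answer.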

Answered by jpp

Using dict + groupby you can create a dictionary of dataframes, keyed by the values in the "Type" column. This solution does not require you to manually specify all unique types and is more easily extendable than a manual loop.

Data from @trollster.

res = dict(tuple(mainDf.groupby('Type')))

{'P':   Type  Dummy
      3    P      4
      4    P      5
      6    P      7,
 'S':   Type  Dummy
      0    S      1
      1    S      2
      2    S      3
      5    S      6
      7    S      8}
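
To get back the two dataframes the question asks for, the dictionary can simply be indexed by type. A small sketch on the same toy data follows; the dict-comprehension variant (res_alt, a name made up here) is just an alternative spelling and is not from the original answer.

```python
import pandas as pd

mainDf = pd.DataFrame({
    'Type': ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S'],
    'Dummy': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Iterating a groupby yields (key, sub-dataframe) pairs;
# dict() turns those pairs into a lookup table.
res = dict(tuple(mainDf.groupby('Type')))

cust_sell = res['S']
cust_buy = res['P']

# Equivalent spelling as a dict comprehension.
res_alt = {key: group for key, group in mainDf.groupby('Type')}

print(sorted(res))
print(cust_buy)
```

If a new type later appears in the "Type" column, it shows up as an extra key without any change to the code.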