python pandas: split a data frame based on a column value

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36192633/

Time: 2020-09-14 00:55:45  Source: igfitidea

python pandas : split a data frame based on a column value

python numpy pandas

Asked by Edamame

I have a CSV file. When I read it into a pandas data frame, it looks like this:

import pandas as pd

data = pd.read_csv('test1.csv')
print(data)

The output looks like:

   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0


Is there a way to split the data frame based on the value in the result column? I.e., if result=0, the row goes to a new data frame data_0:

   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0

and if result=1, the row goes to a data frame data_1:

   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1


Is there a pandas function that can do that? Or do I have to write my own loop to create the two data frames? Thanks a lot!

Answered by anOkCoder

Pandas allows you to slice and manipulate the data in a very straightforward way. You can also do the same as Yakym, accessing the column with the key instead of the attribute name.

data_0 = data[data['result'] == 0]
data_1 = data[data['result'] == 1]

You can even add result columns by operating on the existing columns directly, e.g.:

data['v_sum'] = data['v1'] + data['v2'] + data['v3']
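
As a minimal sketch of the boolean-mask split above: the frame below is built in memory to mirror the question's table (an assumption, since test1.csv itself is not available), and the name mask is introduced only for illustration. Note that a plain mask keeps the original row labels (0 and 3 for data_0), so reset_index(drop=True) is added to match the renumbered output the question asks for.

```python
import pandas as pd

# Sample frame mirroring the question's table (stands in for test1.csv).
data = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})

# Boolean Series: True for rows where result == 0.
mask = data["result"] == 0

# reset_index(drop=True) renumbers the rows of each part from 0.
data_0 = data[mask].reset_index(drop=True)
# ~mask selects every row where result != 0; that equals result == 1
# only because this column holds just the two values 0 and 1.
data_1 = data[~mask].reset_index(drop=True)

print(data_0)
print(data_1)
```

Computing the mask once and negating it avoids a second comparison, but it is only a two-way split; a column with more than two distinct values needs one comparison per value or a groupby.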

Answered by jezrael

You can try creating a dictionary of DataFrames with groupby, if the column result has many different values:

print(data)
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0

datas = {}
for i, g in data.groupby('result'):
    # Key each group as 'data_<value>' and renumber its rows from 0.
    datas['data_' + str(i)] = g.reset_index(drop=True)

print(datas['data_0'])
   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0

print(datas['data_1'])
   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1
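
The groupby loop above can also be written as a dictionary comprehension. This is only a sketch of the same idea in Python 3 syntax; the sample frame is rebuilt in memory to mirror the question's table, and the loop variable names value and group are chosen here for readability.

```python
import pandas as pd

# Sample frame mirroring the question's table (stands in for test1.csv).
data = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})

# One entry per distinct value in 'result'; each group is renumbered from 0.
datas = {
    f"data_{value}": group.reset_index(drop=True)
    for value, group in data.groupby("result")
}

print(sorted(datas))
print(datas["data_1"])
```

Because the keys are derived from the values found in the column, this form does not need to know in advance how many distinct values exist.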

Answered by hilberts_drinking_problem

df1 = data[data.result==0]
df2 = data[data.result==1]

Have a look at this.

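
One caveat when choosing between attribute access (data.result) and key access (data['result']), sketched below with a hypothetical frame: the column named count is invented purely for illustration and does not appear in the question. Attribute access only reaches a column when its name is a valid identifier that does not collide with an existing DataFrame attribute or method.

```python
import pandas as pd

# Hypothetical frame; 'count' is an illustrative column name only.
df = pd.DataFrame({"result": [0, 1, 1, 0], "count": [5, 6, 7, 8]})

# For a plain name, attribute and key access return the same column.
same_column = df.result.equals(df["result"])

# 'count' collides with the DataFrame.count method, so attribute access
# yields the method, while key access still yields the column.
attribute_is_method = callable(df.count)
count_column = df["count"]

print(same_column)
print(attribute_is_method)
print(count_column.tolist())
```

Key access works for any column name, which is why it is often the safer default in reusable code.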