Original source: http://stackoverflow.com/questions/36192633/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
python pandas : split a data frame based on a column value
Asked by Edamame
I have a CSV file. When I read it into a pandas data frame, it looks like this:
import pandas as pd
data = pd.read_csv('test1.csv')
print(data)
The output looks like:
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
Is there a way to split the data frame based on the value in the result column? I.e., if result=0, the row goes to a new data frame data_0:
   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0
and if result=1, the row goes to a data frame data_1:
   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1
Is there a pandas function that can do that? Or do I have to write my own loop to create the two data frames? Thanks a lot!
Answered by anOkCoder
Pandas allows you to slice and manipulate the data in a very straightforward way. You may also do the same as Yakym, accessing the column with the key instead of the attribute name.
data_0 = data[data['result'] == 0]
data_1 = data[data['result'] == 1]
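As a self-contained sketch of the filtering above: the frame below is rebuilt inline from the sample output in the question (an assumption, since test1.csv itself is not shown), and reset_index(drop=True) is added so that each piece is renumbered from 0, as in the output the question asks for. The name mask is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's printed output (test1.csv is not available here).
data = pd.DataFrame({
    'v1': [12, 34, 32, 7],
    'v2': [31, 52, 4, 89],
    'v3': [31, 4, 5, 2],
    'result': [0, 1, 1, 0],
})

# Boolean mask: True for rows where result is 0.
mask = data['result'] == 0

# Rows where the mask is True, and rows where it is False.
# reset_index(drop=True) discards the old row labels and renumbers from 0.
data_0 = data[mask].reset_index(drop=True)
data_1 = data[~mask].reset_index(drop=True)

print(data_0)
print(data_1)
```

Note that ~mask selects every row where result is not 0. That matches result == 1 here only because the column holds just two distinct values.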
You can even add result columns by manipulating the row data directly, e.g.:
data['v_sum'] = data['v1'] + data['v2'] + data['v3']
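When there are many value columns, the same row-wise sum can probably be written more compactly by selecting the columns as a list and calling sum(axis=1). This is a minimal sketch on an inline copy of the question's sample data (an assumption); value_cols is a hypothetical helper name.

```python
import pandas as pd

# Sample data rebuilt from the question's printed output (an assumption).
data = pd.DataFrame({
    'v1': [12, 34, 32, 7],
    'v2': [31, 52, 4, 89],
    'v3': [31, 4, 5, 2],
    'result': [0, 1, 1, 0],
})

# Hypothetical list of the columns to add up.
value_cols = ['v1', 'v2', 'v3']

# axis=1 sums across the selected columns, giving one total per row.
data['v_sum'] = data[value_cols].sum(axis=1)

print(data)
```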
Answered by jezrael
You can try creating a dictionary of DataFrames with groupby if the column result has many different values:
print(data)
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
datas = {}
# groupby yields one (value, sub-frame) pair per distinct value in 'result'
for i, g in data.groupby('result'):
    # print('data_' + str(i))
    # print(g)
    datas.update({'data_' + str(i): g.reset_index(drop=True)})
print(datas['data_0'])
   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0
print(datas['data_1'])
   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1
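The loop above can also be written as a dictionary comprehension. The sketch below is one possible variant, not part of the original answer. It rebuilds the sample data inline (an assumption, since test1.csv is not shown) and keeps the 'data_<value>' key naming used above.

```python
import pandas as pd

# Sample data rebuilt from the question's printed output (an assumption).
data = pd.DataFrame({
    'v1': [12, 34, 32, 7],
    'v2': [31, 52, 4, 89],
    'v3': [31, 4, 5, 2],
    'result': [0, 1, 1, 0],
})

# One dictionary entry per distinct value in 'result'.
# Each sub-frame is renumbered from 0 with reset_index(drop=True).
datas = {
    'data_' + str(value): group.reset_index(drop=True)
    for value, group in data.groupby('result')
}

print(sorted(datas))
print(datas['data_1'])
```

Keeping the pieces in a dictionary avoids creating one variable per value, which helps when the number of distinct values is not known in advance.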