Pandas: split a CSV into multiple CSVs (or DataFrames) by column

Question

Asked by Elias Cort Aguelo

I'm stuck on a problem, and any help or tips would be appreciated.

The problem: I have a CSV file in which one column can take several different values, for example:

Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3

I've loaded the data into a DataFrame, and I need to split that DataFrame into multiple DataFrames based on the value of the column "The_evil_column":

df1
Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1

df2
Fruit;Color;The_evil_column
Orange;Green;something2
Apple;Red;something2

df3
Fruit;Color;The_evil_column
Apple;Red;something3

After reading some posts I'm even more confused, and I'd appreciate some tips on this.

Answer 1

Answered by MaxU

You can generate a dictionary of DataFrames, keyed by the values of the column:

d = {g:x for g,x in df.groupby('The_evil_column')}

In [95]: d.keys()
Out[95]: dict_keys(['something1', 'something2', 'something3'])

In [96]: d['something1']
Out[96]:
    Fruit   Color The_evil_column
0   Apple     Red      something1
1   Apple   Green      something1
2  Orange  Orange      something1
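
For a self-contained way to try this, the sketch below rebuilds the question's sample data from an in-memory string and builds the same kind of dictionary. The io.StringIO buffer and the names CSV_TEXT and frames_by_value are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of a file on disk.
CSV_TEXT = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";")

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# which map directly onto dictionary items.
frames_by_value = {key: group for key, group in df.groupby("The_evil_column")}

for key, group in frames_by_value.items():
    print(key, len(group))
```

Each value in the dictionary is an ordinary DataFrame, so it can be filtered, modified or saved like any other.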

Or a list of DataFrames:

In [103]: l = [x for _,x in df.groupby('The_evil_column')]

In [104]: l[0]
Out[104]:
    Fruit   Color The_evil_column
0   Apple     Red      something1
1   Apple   Green      something1
2  Orange  Orange      something1

In [105]: l[1]
Out[105]:
    Fruit  Color The_evil_column
3  Orange  Green      something2
4   Apple    Red      something2

In [106]: l[2]
Out[106]:
   Fruit Color The_evil_column
5  Apple   Red      something3
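
Two details of the list above may matter in practice: groupby sorts the group keys by default, so the list follows sorted key order rather than the order of appearance in the file, and each piece keeps its original row labels (3, 4 and 5 in the output above). A minimal sketch, assuming you want first-appearance order and a fresh 0-based index per piece; the small inline frame and the name pieces are illustrative only.

```python
import pandas as pd

# Illustrative frame in which "something2" appears before "something1".
df = pd.DataFrame(
    {
        "Fruit": ["Orange", "Apple", "Apple", "Orange"],
        "Color": ["Green", "Red", "Green", "Orange"],
        "The_evil_column": ["something2", "something1", "something1", "something2"],
    }
)

# sort=False keeps groups in order of first appearance;
# reset_index(drop=True) renumbers each piece from 0.
pieces = [
    group.reset_index(drop=True)
    for _, group in df.groupby("The_evil_column", sort=False)
]

for piece in pieces:
    print(piece, end="\n\n")
```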

UPDATE (writing each group to its own CSV file):

In [111]: g = pd.read_csv(filename, sep=';').groupby('The_evil_column')

In [112]: g.ngroups   # number of unique values in the `The_evil_column` column
Out[112]: 3

In [113]: g.apply(lambda x: x.to_csv(r'c:\temp\{}.csv'.format(x.name)))
Out[113]:
Empty DataFrame
Columns: []
Index: []

This will produce 3 files:

In [115]: glob.glob(r'c:\temp\something*.csv')
Out[115]:
['c:\\temp\\something1.csv',
 'c:\\temp\\something2.csv',
 'c:\\temp\\something3.csv']
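
The same files can also be written with an explicit loop, which avoids calling apply purely for its side effects. This is a sketch under stated assumptions rather than the answer's own method: the temporary output directory is hypothetical, the column values are assumed to be safe to use as file names, and index=False and sep=";" are choices made here so that the output matches the input format.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Sample data from the question, held in memory instead of a file on disk.
CSV_TEXT = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";")

# Hypothetical output location; a temporary directory keeps the example portable.
out_dir = Path(tempfile.mkdtemp())

written = []
for value, group in df.groupby("The_evil_column"):
    path = out_dir / f"{value}.csv"
    # index=False leaves out the row labels; sep=";" matches the input file.
    group.to_csv(path, sep=";", index=False)
    written.append(path.name)

print(sorted(written))
```

If the column can contain characters that are not valid in file names, the values would need sanitizing before being used in the path.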

Answer 2

Answered by Bartłomiej

You can just filter the frame by the value of the column:

frame=pd.read_csv('file.csv',delimiter=';')
frame['The_evil_column']=='something1'

This returns a boolean Series:

0     True
1     True
2     True
3    False
4    False
5    False
Name: The_evil_column, dtype: bool

You can therefore use this boolean mask to select the matching rows:

frame1 = frame[frame['The_evil_column']=='something1']

Afterwards you can drop the column, since it now holds a single value:

frame1 = frame1.drop('The_evil_column', axis=1)
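
The filter and the drop can also be chained into a single expression. A minimal sketch, assuming the question's sample data is built inline instead of being read from file.csv; drop(columns=...) is equivalent to drop(..., axis=1).

```python
import pandas as pd

# The question's sample data, built inline for a runnable example.
frame = pd.DataFrame(
    {
        "Fruit": ["Apple", "Apple", "Orange", "Orange", "Apple", "Apple"],
        "Color": ["Red", "Green", "Orange", "Green", "Red", "Red"],
        "The_evil_column": [
            "something1", "something1", "something1",
            "something2", "something2", "something3",
        ],
    }
)

# Boolean mask: True for the rows that belong to the wanted group.
mask = frame["The_evil_column"] == "something1"

# Select the matching rows, then drop the column used for the split.
frame1 = frame.loc[mask].drop(columns="The_evil_column")

print(frame1)
```

This has to be repeated for every distinct value, so it fits best when only one or two of the groups are needed.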

Answer 3

Answered by Rahul Chawla

A simpler but less efficient way is:

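import pandas as pd

# The sample file is semicolon-delimited, so the separator must be given.
data = pd.read_csv('input.csv', sep=';')

out = []

# One boolean filter per distinct value; unique() keeps first-seen order.
for evil_element in data['The_evil_column'].unique():
    out.append(data[data['The_evil_column'] == evil_element])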

out will then hold a list of DataFrames, one per distinct value of The_evil_column.
