Python Pandas: split a column into multiple columns by comma

Question

Asked by Anekdotin

I am trying to split a column into multiple columns based on comma/space separation.

My dataframe currently looks like this:

     KEYS                                                  1
0   FIT-4270                                          4000.0439
1   FIT-4269                                          4000.0420, 4000.0471
2   FIT-4268                                          4000.0419
3   FIT-4266                                          4000.0499
4   FIT-4265                                          4000.0490, 4000.0499, 4000.0500, 4000.0504,

I would like it to look like this:

       KEYS          1          2          3          4
0  FIT-4270  4000.0439
1  FIT-4269  4000.0420  4000.0471
2  FIT-4268  4000.0419
3  FIT-4266  4000.0499
4  FIT-4265  4000.0490  4000.0499  4000.0500  4000.0504

My code currently removes the KEYS column and I'm not sure why. Could anyone improve it or help fix the issue?

v = dfcleancsv[1]

# Splits the column by spaces into new columns, but removes KEYS?
dfcleancsv = dfcleancsv[1].str.split(' ').apply(Series, 1)

Answer 1

Answered by Anthony R

In case someone else wants to split a single column (delimited by a value) into multiple columns, try this:

series.str.split(',', expand=True)

This answered the question I came here looking for.

Credit to EdChum's code, which also adds the split columns back to the dataframe:

pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)

Note: the first argument, df[[0]], is a DataFrame (the double brackets select a one-column DataFrame rather than a Series).

The second argument, df[1].str.split(', ', expand=True), splits the Series df[1] into a DataFrame of new columns.
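
As a minimal sketch of the pattern above, the following rebuilds a few rows of the question's data and applies it. The sample frame, the rstrip step that handles the trailing comma, and the relabelling of the new columns are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample rows modelled on the question; the column label 1 is an integer.
df = pd.DataFrame({
    "KEYS": ["FIT-4270", "FIT-4269", "FIT-4265"],
    1: ["4000.0439", "4000.0420, 4000.0471",
        "4000.0490, 4000.0499, 4000.0500, 4000.0504,"],
})

# Assumption: strip trailing commas/spaces first so the last row
# does not end up with a stray empty piece.
parts = df[1].str.rstrip(", ").str.split(", ", expand=True)

# Relabel the new columns 1..n to match the layout the question asks for.
parts.columns = range(1, parts.shape[1] + 1)

# df[["KEYS"]] is a one-column DataFrame; concat places the pieces beside it.
result = pd.concat([df[["KEYS"]], parts], axis=1)
print(result)
```

Rows with fewer values than the widest row are padded with missing values in the extra columns.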

Answer 2

Answered by Anekdotin

Using EdChum's answer of

pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)

I was able to solve it by substituting my variables.

dfcleancsv = pd.concat([dfcleancsv['KEYS'], dfcleancsv[1].str.split(', ', expand=True)], axis=1)
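
For context on why KEYS disappeared in the question: the split result holds only the new columns, so assigning it back to dfcleancsv replaces the whole frame. The sketch below illustrates this on made-up rows and shows an alternative to pd.concat based on DataFrame.join; the sample data and the "part_" prefix are hypothetical.

```python
import pandas as pd

# Illustrative frame modelled on the question's data (not the asker's real file).
dfcleancsv = pd.DataFrame({
    "KEYS": ["FIT-4270", "FIT-4269"],
    1: ["4000.0439", "4000.0420, 4000.0471"],
})

# The split result contains only the new columns; rebinding dfcleancsv
# to it is what discarded KEYS in the question's code.
split_only = dfcleancsv[1].str.split(", ", expand=True)
print(list(split_only.columns))

# Alternative to pd.concat: drop the source column and join the pieces
# on the shared index. "part_" is an arbitrary prefix chosen here.
fixed = dfcleancsv.drop(columns=[1]).join(split_only.add_prefix("part_"))
print(fixed)
```

Both concat and join align on the index, so either works as long as the index has not been changed between the split and the recombination.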

Answer 3

Answered by Siraj S.

Maybe this will work:

df = pd.concat([df['KEYS'],df[1].apply(pd.Series)],axis=1)
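
One caveat worth checking with this approach: apply(pd.Series) only spreads values across columns when each cell is already a list. On plain comma-separated strings it appears to return a single column, so a split step is still needed. A small sketch on made-up rows:

```python
import pandas as pd

# Illustrative rows modelled on the question's data.
df = pd.DataFrame({
    "KEYS": ["FIT-4270", "FIT-4269"],
    1: ["4000.0439", "4000.0420, 4000.0471"],
})

# On plain strings, each cell becomes a one-element Series: nothing is split.
unsplit = df[1].apply(pd.Series)

# Splitting into lists first lets apply(pd.Series) expand each list into columns.
via_apply = df[1].str.split(", ").apply(pd.Series)

# str.split(..., expand=True) produces the same layout in one step.
via_expand = df[1].str.split(", ", expand=True)

print(unsplit.shape, via_apply.shape, via_expand.shape)
```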

Answer 4

Answered by Paul Rougieux

The OP had a variable number of output columns. In the particular case of a fixed number of output columns, another elegant way to name the resulting columns is to use multiple assignment.

Load a sample dataset and reshape it to long format to obtain a variable called organ_dimension.

import seaborn
iris = seaborn.load_dataset('iris')
df = iris.melt(id_vars='species', var_name='organ_dimension', value_name='value')

Split the organ_dimension variable into two variables, organ and dimension, on the _ separator. Based on this answer to "How to split a column into two columns?"

df['organ'], df['dimension'] = df['organ_dimension'].str.split('_', 1).str
df.head()

Out[10]: 
  species organ_dimension  value  organ dimension
0  setosa    sepal_length    5.1  sepal    length
1  setosa    sepal_length    4.9  sepal    length
2  setosa    sepal_length    4.7  sepal    length
3  setosa    sepal_length    4.6  sepal    length
4  setosa    sepal_length    5.0  sepal    length
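
Older pandas releases emitted deprecation warnings both for passing the split limit positionally and for unpacking the .str accessor, so newer releases may reject that line. The sketch below is a variant that should not depend on either behaviour: it passes n=1 by keyword and assigns the expanded result to two named columns. The small frame stands in for the melted iris data and is an assumption made so the example runs without downloading anything.

```python
import pandas as pd

# Stand-in for the melted iris data used above.
df = pd.DataFrame({
    "species": ["setosa", "setosa"],
    "organ_dimension": ["sepal_length", "petal_width"],
    "value": [5.1, 0.2],
})

# Split once on "_" and assign the two resulting columns by name.
df[["organ", "dimension"]] = df["organ_dimension"].str.split("_", n=1, expand=True)
print(df.head())
```

This only works when every row splits into exactly the number of pieces being assigned, which is the fixed-column case this answer targets.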

Answer 5

Answered by yafomars

Better and faster, using vectorization as below:

df = df.apply(lambda x:pd.Series(x))

Answer 6

Answered by Kanishk Arya

Check this out

Responder_id    LanguagesWorkedWith
0   1   HTML/CSS;Java;JavaScript;Python
1   2   C++;HTML/CSS;Python
2   3   HTML/CSS
3   4   C;C++;C#;Python;SQL
4   5   C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA
... ... ...
87564   88182   HTML/CSS;Java;JavaScript
87565   88212   HTML/CSS;JavaScript;Python
87566   88282   Bash/Shell/PowerShell;Go;HTML/CSS;JavaScript;W...
87567   88377   HTML/CSS;JavaScript;Other(s):
87568   88863   Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...

Split the LanguagesWorkedWith column into multiple columns by using data = data1['LanguagesWorkedWith'].str.split(';').apply(pd.Series):

data1 = pd.read_csv('data.csv', sep=',')
data1.set_index('Responder_id', inplace=True)
data1
data1.loc[1, :]
data = data1['LanguagesWorkedWith'].str.split(';').apply(pd.Series)
data.head()
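
A self-contained sketch of the same idea, for readers without data.csv: the inline CSV text below copies the first three rows shown above, and str.split(';', expand=True) is used in place of apply(pd.Series). The io.StringIO stand-in for the file is an assumption made for illustration.

```python
import io

import pandas as pd

# Inline stand-in for data.csv, using the first three rows shown above.
csv_text = """Responder_id,LanguagesWorkedWith
1,HTML/CSS;Java;JavaScript;Python
2,C++;HTML/CSS;Python
3,HTML/CSS
"""

data1 = pd.read_csv(io.StringIO(csv_text), index_col="Responder_id")

# One column per language position; shorter rows are padded with missing values.
data = data1["LanguagesWorkedWith"].str.split(";", expand=True)
print(data.head())
```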
