Notice: this page is a Chinese-English parallel translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/32813112/

Time: 2020-09-13 23:56:15  Source: igfitidea

Splitting one column into multiple columns with python pandas

python, pandas, split

Asked by lorelai

I would like to split a column into multiple columns in my data frame. It is separated with commas.

I would like to apply something like the 'Text to Columns' function in Excel.

I will give my own headings after I split the columns. 'Turnstile' is the name of my column. I have:

(A006, R079, 00-00-04, 5 AVE-59 ST)

types of data in each row. In the end I would like to have:

A006    R079   00-00-04   5 AVE-59 ST

with the headings I will create.

Most recently I tried:

df.Turnstile.str.split().tolist()

But all I get is 'nan'.

When I check the type of the 'Turnstile' column, it says 'object'. I tried to convert that series into strings with:

df['Turnstile'] = df[['Turnstile'].astype(str)]

but it gives me:

AttributeError: 'list' object has no attribute 'astype'

Please advise.

Thank you.

Answered by measureallthethings

Maybe another way of looking at this is converting a column of tuples to a DataFrame, like so:

In [10]: DataFrame(df['Turnstile'].tolist())
Out[10]:
      0     1         2            3
0  A006  R079  00-00-04  5 AVE-59 ST
1  A006  R079  00-00-04  5 AVE-59 ST
2  A006  R079  00-00-04  5 AVE-59 ST
3  A006  R079  00-00-04  5 AVE-59 ST
4  A006  R079  00-00-04  5 AVE-59 ST
5  A006  R079  00-00-04  5 AVE-59 ST
6  A006  R079  00-00-04  5 AVE-59 ST
7  A006  R079  00-00-04  5 AVE-59 ST
8  A006  R079  00-00-04  5 AVE-59 ST
9  A006  R079  00-00-04  5 AVE-59 ST

If that's the case, here's an example that converts the column of tuples to a DataFrame and adds it back to the original dataframe:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# create a fake dataframe, repeating the tuple given in the example
In [2]: df = DataFrame(data={'Observations': np.random.randn(10) * np.arange(10),
...:     'Turnstile': (('A006', 'R079', '00-00-04', '5 AVE-59 ST'),)*10})

In [3]: df.head()
Out[3]:
   Observations                            Turnstile
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)

# all at once turn the column of tuples into a dataframe and concat that with the original df
In [4]: df = pd.concat([df,DataFrame(df['Turnstile'].tolist())], axis=1, join='outer')

In [5]: df.head()
Out[5]:
   Observations                            Turnstile     0     1         2  \
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04

             3
0  5 AVE-59 ST
1  5 AVE-59 ST
2  5 AVE-59 ST
3  5 AVE-59 ST
4  5 AVE-59 ST

# I assume you don't need this column anymore
In [6]: del df['Turnstile']

If that works, you can of course name the new columns as needed.
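As a rough sketch of that naming step, assuming the column really does hold tuples: the new headings can be passed when the tuples are turned into a DataFrame. The heading names below are hypothetical placeholders, not taken from the question.

```python
import pandas as pd

# Small stand-in for the asker's data, assuming 'Turnstile' holds tuples.
df = pd.DataFrame({
    "Observations": [0.0, 1.5, -2.0],
    "Turnstile": [("A006", "R079", "00-00-04", "5 AVE-59 ST")] * 3,
})

# Hypothetical headings; replace them with your own.
new_cols = ["ca", "unit", "scp", "station"]

# Expand the tuples into named columns, keeping the original row index.
parts = pd.DataFrame(df["Turnstile"].tolist(), columns=new_cols, index=df.index)

# Swap the tuple column for the expanded columns.
result = pd.concat([df.drop(columns="Turnstile"), parts], axis=1)
print(result)
```

Passing index=df.index keeps the rows aligned in concat even when the original frame does not use the default 0..n-1 index.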

Answered by maxymoo

There are a couple of options here. If your data is in true CSV format, say as an export from Excel, you can use pandas.read_csv to read in the file, and it will automatically be split into columns based on the column delimiters.
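A minimal sketch of the read_csv route, assuming the raw data is comma-separated text with no header row. An in-memory string stands in for the exported file, and the column names are hypothetical.

```python
import io

import pandas as pd

# Stand-in for an exported CSV file; in practice pass a file path instead.
csv_text = (
    "A006,R079,00-00-04,5 AVE-59 ST\n"
    "A006,R079,00-00-04,5 AVE-59 ST\n"
)

# header=None because the data has no heading row; names supplies our own.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["ca", "unit", "scp", "station"],  # hypothetical headings
)
print(df)
```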

If your data is a string column with commas, you can use str.split to redefine your columns, but as far as I know, you need to dump the resulting column as a raw Python list and then recast it as a DataFrame:

import pandas as pd
df = pd.DataFrame([["A006, R079, 00-00-04, 5 AVE-59 ST"]])
df2 = pd.DataFrame(df[0].str.split(',').tolist())
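As a possible alternative to the list round-trip, str.split also accepts expand=True, which returns the pieces as DataFrame columns directly. This is only a sketch: the column names are hypothetical, and the strip step is there because the sample string has a space after each comma.

```python
import pandas as pd

df = pd.DataFrame({"Turnstile": ["A006, R079, 00-00-04, 5 AVE-59 ST"] * 3})

# expand=True spreads the split pieces across columns instead of lists.
parts = df["Turnstile"].str.split(",", expand=True)

# Remove the leading space left behind after each comma.
parts = parts.apply(lambda col: col.str.strip())

# Hypothetical headings; replace them with your own.
parts.columns = ["ca", "unit", "scp", "station"]
print(parts)
```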

Answered by lightyagami96

Try df.Turnstile.str.split(',')
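Note that splitting on ',' only helps when the column holds actual strings. As an illustration, assuming (as the first answer guesses) that the column might hold tuples instead, the .str methods return NaN for every non-string element, which would explain the 'nan' results in the question:

```python
import pandas as pd

# Two hypothetical versions of the column: tuples vs. plain strings.
tuples = pd.Series([("A006", "R079", "00-00-04", "5 AVE-59 ST")] * 2)
strings = pd.Series(["A006, R079, 00-00-04, 5 AVE-59 ST"] * 2)

# .str.split cannot split a tuple, so every element comes back as NaN.
all_nan = tuples.str.split(",").isna().all()
print(all_nan)

# On real strings, each element becomes a list of four pieces.
piece_counts = strings.str.split(",").str.len().tolist()
print(piece_counts)
```

Checking type(df['Turnstile'].iloc[0]) is a quick way to tell which case applies, since the 'object' dtype covers both.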