Split a pandas Series into a DataFrame by delimiter

Question

Asked by O.rka

I'm trying to split a pandas Series object by a particular delimiter, "; " in this case. I want to turn it into a DataFrame. There will always be the same number of "columns", or to be more exact, the same number of "; " delimiters that indicate columns. I thought the answer to "python, how to convert a pandas series into a pandas DataFrame?" would do the trick, but it didn't. I don't want to iterate through the rows; I'm sure pandas has a shortcut that's more efficient.

Does anyone know of the most efficient way to split this series into a dataframe by "; "?

# Example data
import pandas as pd

SR_test = pd.Series(["a; b; c; d; e","aa; bb; cc; dd; ee","a1; b2; c3; d4; e5"])
# print(SR_test)
# 0         a; b; c; d; e
# 1    aa; bb; cc; dd; ee
# 2    a1; b2; c3; d4; e5

#Convert each row one at a time (not efficient)
tmp = []
for element in SR_test:
    tmp.append([e.strip() for e in element.split("; ")])
DF_split = pd.DataFrame(tmp)
# print(DF_split)
#     0   1   2   3   4
# 0   a   b   c   d   e
# 1  aa  bb  cc  dd  ee
# 2  a1  b2  c3  d4  e5

Answer 1

Answered by jezrael

You can use str.split:

df = SR_test.str.split('; ', expand=True)
print(df)

    0   1   2   3   4
0   a   b   c   d   e
1  aa  bb  cc  dd  ee
2  a1  b2  c3  d4  e5

Another, faster solution, if the Series has no NaN values:

print(pd.DataFrame([ x.split('; ') for x in SR_test.tolist() ]))
    0   1   2   3   4
0   a   b   c   d   e
1  aa  bb  cc  dd  ee
2  a1  b2  c3  d4  e5
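
The "no NaN values" condition matters because a missing value is a float, which has no split method. The sketch below illustrates the difference on a small made-up Series (the name s and its contents are assumptions for illustration, not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value in the middle.
s = pd.Series(["a; b; c", np.nan, "d; e; f"])

# str.split propagates missing values: the NaN row becomes a row of NaN.
vectorised = s.str.split("; ", expand=True)
print(vectorised)

# The plain list comprehension calls .split on the float NaN and fails.
try:
    pd.DataFrame([x.split("; ") for x in s.tolist()])
    comprehension_failed = False
except AttributeError as exc:
    comprehension_failed = True
    print("list comprehension failed:", exc)
```

If missing values are possible, str.split is the safer choice; the list comprehension would need them dropped or filled first.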

Timings:

SR_test = pd.concat([SR_test]*1000).reset_index(drop=True)

In [21]: %timeit SR_test.str.split('; ', expand=True)
10 loops, best of 3: 34.5 ms per loop

In [22]: %timeit pd.DataFrame([ x.split('; ') for x in SR_test.tolist() ])
100 loops, best of 3: 9.59 ms per loop
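
Outside IPython, a comparable measurement can be made with the standard timeit module. This is only a rough sketch: the function names and the repetition count are arbitrary choices, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

base = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])
# Same enlargement as in the timings above: 3000 rows.
big = pd.concat([base] * 1000).reset_index(drop=True)


def with_str_split():
    return big.str.split("; ", expand=True)


def with_list_comprehension():
    return pd.DataFrame([x.split("; ") for x in big.tolist()])


runs = 10  # arbitrary number of repetitions
t_split = timeit.timeit(with_str_split, number=runs) / runs
t_list = timeit.timeit(with_list_comprehension, number=runs) / runs

print(f"str.split:          {t_split * 1000:.2f} ms per call")
print(f"list comprehension: {t_list * 1000:.2f} ms per call")
```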

Answer 2

Answered by EdChum

Use the vectorised str.split with the param expand=True and pass the result as the data arg to the DataFrame constructor:

In [4]:
df = pd.DataFrame(SR_test.str.split(';',expand=True))
df

Out[4]:
    0    1    2    3    4
0   a    b    c    d    e
1  aa   bb   cc   dd   ee
2  a1   b2   c3   d4   e5
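
Note that splitting on ';' without the trailing space leaves a leading space in every cell after the first, which is why the columns above look padded. A minimal sketch of one way to tidy that up, assuming the stray whitespace is unwanted (the names loose and tidy are illustrative):

```python
import pandas as pd

SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])

# Splitting on ';' alone keeps the space that followed each delimiter.
loose = SR_test.str.split(";", expand=True)
print(repr(loose.iloc[0, 1]))

# Strip the surrounding whitespace column by column.
tidy = loose.apply(lambda col: col.str.strip())
print(repr(tidy.iloc[0, 1]))
```

Splitting on '; ' directly, as in the question, avoids the extra step when the delimiter is always followed by exactly one space.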
