Split Pandas Series into DataFrame by delimiter
Original URL: http://stackoverflow.com/questions/37224002/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by O.rka
I'm trying to split a pandas Series object by a particular delimiter, "; " in this case. I want to turn it into a DataFrame. There will always be the same number of "columns" or, to be more exact, the same number of "; " delimiters that indicate columns. I thought this would do the trick, but it didn't: "python, how to convert a pandas series into a pandas DataFrame?". I don't want to iterate through; I'm sure pandas has a shortcut that's more efficient.
Does anyone know the most efficient way to split this Series into a DataFrame by "; "?
# Example data
import pandas as pd

SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])
# print(SR_test)
# 0         a; b; c; d; e
# 1    aa; bb; cc; dd; ee
# 2    a1; b2; c3; d4; e5

# Convert each row one at a time (not efficient)
tmp = []
for element in SR_test:
    tmp.append([e.strip() for e in element.split("; ")])
DF_split = pd.DataFrame(tmp)
# print(DF_split)
#     0   1   2   3   4
# 0   a   b   c   d   e
# 1  aa  bb  cc  dd  ee
# 2  a1  b2  c3  d4  e5
Answered by jezrael
You can use str.split with expand=True:
df = SR_test.str.split('; ', expand=True)
print(df)
    0   1   2   3   4
0   a   b   c   d   e
1  aa  bb  cc  dd  ee
2  a1  b2  c3  d4  e5
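As a rough illustration of what expand=True changes, the sketch below splits the same sample data with and without it. The column labels c0 to c4 are hypothetical names picked for this example; they are not part of the original answer.

```python
import pandas as pd

SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])

# expand=True spreads the split pieces across DataFrame columns
df = SR_test.str.split("; ", expand=True)

# Hypothetical column labels, chosen only for illustration
df.columns = ["c0", "c1", "c2", "c3", "c4"]

# Without expand=True the result is a Series whose values are Python lists
lists = SR_test.str.split("; ")

print(df)
print(lists)
```

Without expand=True you would still need a second step to turn the lists into columns, which is why the answer passes the flag directly.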
Another, faster solution, if the Series has no NaN values:
print(pd.DataFrame([x.split('; ') for x in SR_test.tolist()]))
    0   1   2   3   4
0   a   b   c   d   e
1  aa  bb  cc  dd  ee
2  a1  b2  c3  d4  e5
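The NaN caveat can be seen with a small sketch. The Series s below is made-up sample data (not from the question) that contains one missing value: the .str accessor passes the missing value through, while the list comprehension ends up calling .split on a float and raises.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value, for illustration only
s = pd.Series(["a; b; c", np.nan, "a1; b2; c3"])

# The .str accessor skips the missing value and leaves that row empty
df_str = s.str.split("; ", expand=True)
print(df_str)

# The plain-Python version calls .split on a float NaN, which has no such method
try:
    pd.DataFrame([x.split("; ") for x in s.tolist()])
    failed = False
except AttributeError as exc:
    failed = True
    print("list comprehension failed:", exc)
```

If the faster approach is still wanted on data with gaps, one option is to drop or fill the missing values first (for example with s.dropna()), at the cost of changing which rows appear in the result.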
Timings:
SR_test = pd.concat([SR_test]*1000).reset_index(drop=True)
In [21]: %timeit SR_test.str.split('; ', expand=True)
10 loops, best of 3: 34.5 ms per loop
In [22]: %timeit pd.DataFrame([ x.split('; ') for x in SR_test.tolist() ])
100 loops, best of 3: 9.59 ms per loop
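The %timeit lines above are IPython magics. Outside IPython, a similar comparison can be sketched with the standard-library timeit module. The helper names with_str_split and with_list_comprehension are hypothetical, and the numbers will differ by machine and pandas version, so treat the output as indicative only.

```python
import timeit

import pandas as pd

SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])
SR_big = pd.concat([SR_test] * 1000).reset_index(drop=True)


def with_str_split():
    # Vectorized string accessor; tolerates missing values
    return SR_big.str.split("; ", expand=True)


def with_list_comprehension():
    # Plain Python split per element; requires every value to be a string
    return pd.DataFrame([x.split("; ") for x in SR_big.tolist()])


# Sanity check: both approaches produce the same cell values
same_values = (
    with_str_split().to_numpy().tolist()
    == with_list_comprehension().to_numpy().tolist()
)

runs = 10
t_str = timeit.timeit(with_str_split, number=runs)
t_list = timeit.timeit(with_list_comprehension, number=runs)

print("same values:", same_values)
print("str.split:          %.2f ms per loop" % (1000 * t_str / runs))
print("list comprehension: %.2f ms per loop" % (1000 * t_list / runs))
```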