Pandas split dataframe column for every character
Original source: http://stackoverflow.com/questions/43848680/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Warry S.
I have multiple dataframe columns which look like this:
Day1
0 DDDDDDDDDDBBBBBBAAAAAAAAAABBBBBBDDDDDDDDDDDDDDDD
1 DDDDDDDDDDBBBBBBAAAAAAAAAABBBBBBDDDDDDDDDDDDDDDD
2 DDDDDDDDDDBBBBBBAAAAAAAAAABBBBBBDDDDDDDDDDDDDDDD
3 DDDDDDDDDDBBBBBBAAAAAAAAAABBBBBBDDDDDDDDDDDDDDDD
4 DDDDDDDDDDBBBBBBAAAAAAAAAABBBBBBDDDDDDDDDDDDDDDD
What I want is for every character to be separated into its own column:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 ....
0 D D D D D D D D D D B B B B ....
1 D D D D D D D D D D B B B B ....
2 D D D D D D D D D D B B B B ....
3 D D D D D D D D D D B B B B ....
4 D D D D D D D D D D B B B B ....
So the "Day1" column is split into 48 columns, and every column holds one of the values A/B/C/D.
I tried using split, but that didn't work.
Answered by EdChum
You can call apply and, for each row, call pd.Series on the list of the values:
In [16]:
df['Day1'].apply(lambda x: pd.Series(list(x)))
Out[16]:
0 1 2 3 4 5 6 7 8 9 ... 38 39 40 41 42 43 44 45 46 47
0 D D D D D D D D D D ... D D D D D D D D D D
1 D D D D D D D D D D ... D D D D D D D D D D
2 D D D D D D D D D D ... D D D D D D D D D D
3 D D D D D D D D D D ... D D D D D D D D D D
4 D D D D D D D D D D ... D D D D D D D D D D
[5 rows x 48 columns]
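For anyone who wants to try this without the original data, here is a minimal sketch. The sample frame is reconstructed from the strings shown in the question, and the names `row`, `wide` and `wide_alt` are only illustrative. The second variant (building the frame from a list of lists) is not part of the answer; it is a common alternative that avoids creating one Series per row.

```python
import pandas as pd

# Sample data modelled on the question: five identical 48-character strings.
row = "D" * 10 + "B" * 6 + "A" * 10 + "B" * 6 + "D" * 16
df = pd.DataFrame({"Day1": [row] * 5})

# Approach from the answer: one pd.Series per row, one element per character.
wide = df["Day1"].apply(lambda x: pd.Series(list(x)))

# Alternative (an assumption, not from the answer): build the frame from a
# list of lists, keeping the original index.
wide_alt = pd.DataFrame([list(s) for s in df["Day1"]], index=df.index)

print(wide.shape)
print(wide_alt.shape)
```

Both results should have one column per character position, labelled 0 to 47.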
It looks like you have trailing spaces; remove these using str.rstrip:
df['Day1'] = df['Day1'].str.rstrip()
Then do the above.
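To see why the trailing spaces matter, here is a small sketch on made-up data (the short strings and the names `before` and `after` are hypothetical, chosen only to keep the output readable). Each trailing space would otherwise become its own column.

```python
import pandas as pd

# Hypothetical data: the second value carries two trailing spaces.
df = pd.DataFrame({"Day1": ["DDBBAA", "DDBBAA  "]})

# Without stripping, the spaces become extra columns; shorter rows are padded
# with missing values.
before = pd.DataFrame([list(s) for s in df["Day1"]])

# Strip trailing whitespace first, then split into characters.
df["Day1"] = df["Day1"].str.rstrip()
after = pd.DataFrame([list(s) for s in df["Day1"]])

print(before.shape)
print(after.shape)
```

After stripping, the number of columns matches the number of real characters.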
Answered by MaxU
Use the Series.str.extractall() method:
In [19]: df.Day1.str.extractall('(.)', flags=re.U)[0].unstack().rename_axis(None, axis=1)
Out[19]:
0 1 2 3 4 5 6 7 8 9 ... 38 39 40 41 42 43 44 45 46 47
0 D D D D D D D D D D ... D D D D D D D D D D
1 D D D D D D D D D D ... D D D D D D D D D D
2 D D D D D D D D D D ... D D D D D D D D D D
3 D D D D D D D D D D ... D D D D D D D D D D
4 D D D D D D D D D D ... D D D D D D D D D D
[5 rows x 48 columns]
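The one-liner above can be unpacked into steps. The following is a sketch on data reconstructed from the question; `matches` and `wide` are illustrative names, and `import re` is needed for the `re.U` flag. The axis is passed by keyword here because recent pandas versions no longer accept it positionally in `rename_axis`.

```python
import re

import pandas as pd

# Sample data modelled on the question: five identical 48-character strings.
row = "D" * 10 + "B" * 6 + "A" * 10 + "B" * 6 + "D" * 16
df = pd.DataFrame({"Day1": [row] * 5})

# extractall returns one row per regex match, indexed by
# (original row label, match number).
matches = df["Day1"].str.extractall("(.)", flags=re.U)

# Column 0 holds the single capture group; unstack pivots the match number
# into columns, and rename_axis drops the leftover "match" columns name.
wide = matches[0].unstack().rename_axis(None, axis=1)

print(wide.shape)
```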
Answered by arjepak
Try this:
df['Day1'].str.split(pat=r"\s*", expand=True)
It will have empty first and last columns, so you have to trim the resulting dataframe using
df['Day1'].str.split(pat=r"\s*", expand=True).iloc[:, 1:-1]
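A sketch of the split-and-trim idea on data reconstructed from the question follows. It rests on two assumptions worth checking in your environment: the column has object dtype, so pandas hands the pattern to Python's `re` module, and the interpreter is Python 3.7 or newer, where `re.split` also splits on empty matches. The names `raw` and `trimmed` are illustrative.

```python
import pandas as pd

# Sample data modelled on the question; object dtype is an assumption made so
# that the pattern is handled by Python's re module.
row = "D" * 10 + "B" * 6 + "A" * 10 + "B" * 6 + "D" * 16
df = pd.DataFrame({"Day1": pd.Series([row] * 5, dtype=object)})

# r"\s*" matches the empty string between characters, so every character
# lands in its own column, with an empty string at each end.
raw = df["Day1"].str.split(pat=r"\s*", expand=True)

# Drop the empty first and last columns.
trimmed = raw.iloc[:, 1:-1]

print(raw.shape)
print(trimmed.shape)
```

Note that the trimmed frame keeps the column labels 1 to 48; assign a new range to `trimmed.columns` if labels starting at 0 are needed.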