pandas: convert a string to a DataFrame
Original question: http://stackoverflow.com/questions/32357545/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Retrieved: 2020-09-13 23:51:14 Source: igfitidea
Convert a string to dataframe
Asked by Colonel Beauvel
I have a string like this:
txt = 'A AGILENT TECH INC \nAA ALCOA INC '
and I want to obtain a DataFrame like this:
In [185]: pd.DataFrame({'col1':['A','AA'],'col2':['AGILENT TECH INC','ALCOA INC']})
Out[185]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC
So far I have tried:
from StringIO import StringIO
import re
pd.DataFrame.from_csv(StringIO(re.sub(' +\n', ';', txt)), sep=';')
Out[204]:
Empty DataFrame
Columns: [AA ALCOA INC ]
Index: []
But the result is not what I expected. It seems I am not using the options of from_csv or StringIO correctly.
It is certainly linked to this question.
Answered by EdChum
Answered by Cody Bouche
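import re
txt = 'A AGILENT TECH INC \nAA ALCOA INC '
# Split each line once, on the first run of whitespace, giving one [ticker, name] pair per line
rows = [re.split(r'\s+', x.strip(), maxsplit=1) for x in txt.splitlines()]
# Transpose the rows so that each dict entry holds one column
result = {'col{0}'.format(i + 1): list(col) for i, col in enumerate(zip(*rows))}
# {'col1': ['A', 'AA'], 'col2': ['AGILENT TECH INC', 'ALCOA INC']}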
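As a rough sketch of how a dictionary of columns like this can be built and handed to pd.DataFrame: the helper name text_to_frame is hypothetical, and the code assumes every non-empty line is a ticker, then whitespace, then a name.

```python
import re
import pandas as pd

def text_to_frame(txt):
    # Hypothetical helper: split each non-empty line once, on the first run of whitespace
    rows = [re.split(r'\s+', line.strip(), maxsplit=1)
            for line in txt.splitlines() if line.strip()]
    # Transpose the rows into columns and name them col1, col2, ...
    columns = {'col{0}'.format(i + 1): list(col)
               for i, col in enumerate(zip(*rows))}
    return pd.DataFrame(columns)

df = text_to_frame('A AGILENT TECH INC \nAA ALCOA INC ')
print(df)
```

Splitting only once keeps multi-word company names intact, whether the columns are separated by one space or several.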
Answered by Nader Hisham
import re
import pandas as pd
txt = 'A AGILENT TECH INC \nAA ALCOA INC '
# First build a list in which each element is one line of the text,
# replacing the first run of whitespace in each line with '__'
lines = [re.sub(r'\s+', '__', line.strip(), count=1) for line in txt.split('\n')]
Out[143]:
['A__AGILENT TECH INC', 'AA__ALCOA INC']
# Then create a Series from the resulting lines
S = pd.Series(lines)
Out[144]:
0    A__AGILENT TECH INC
1          AA__ALCOA INC
dtype: object
# Split on the `__` marker inserted above, then convert the Series to a list of lists
data = S.str.split('__').tolist()
Out[145]:
[['A', 'AGILENT TECH INC'], ['AA', 'ALCOA INC']]
pd.DataFrame(data, columns=['col1', 'col2'])
Out[142]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC
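The same steps can probably be condensed with pandas string methods, without the intermediate `__` marker. A minimal sketch, assuming the first run of whitespace on each line separates the two columns:

```python
import pandas as pd

txt = 'A AGILENT TECH INC \nAA ALCOA INC '

# One Series element per line, with leading and trailing whitespace removed
lines = pd.Series(txt.splitlines()).str.strip()

# n=1 splits only once, on the first run of whitespace;
# expand=True returns the pieces as DataFrame columns
df = lines.str.split(n=1, expand=True)
df.columns = ['col1', 'col2']
print(df)
```

This relies on every line containing at least one whitespace separator; a line without one would leave a missing value in col2.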

