pandas: convert a string to a DataFrame

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32357545/

Time: 2020-09-13 23:51:14  Source: igfitidea

Convert a string to dataframe

python string pandas

Asked by Colonel Beauvel

I have a string like this:

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

And I want to obtain a DataFrame like this:

In [185]: pd.DataFrame({'col1':['A','AA'],'col2':['AGILENT TECH INC','ALCOA INC']})
Out[185]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC

So far I have tried:

from StringIO import StringIO
import re

pd.DataFrame.from_csv(StringIO(re.sub(' +\n', ';', txt)), sep=';')

Out[204]:
Empty DataFrame
Columns: [AA     ALCOA INC                     ]
Index: []

But the result is not what I expected. It seems I am not using all the options of from_csv or StringIO correctly.

It is certainly linked to this question.
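
A quick way to see why the attempt returns an empty frame is to look at the string that the substitution produces. The sketch below assumes the same txt as above; the name joined is introduced here only for illustration.

```python
import re

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# ' +\n' matches the trailing spaces together with the newline, so the
# substitution replaces the line break itself with ';' and the two
# records end up on a single line.
joined = re.sub(' +\n', ';', txt)

print(repr(joined))
print(len(joined.splitlines()))  # number of lines left for the parser
```

Only one line is left, and from_csv by default treats the first line as the header and the first column as the index, which would explain an empty frame whose only column is named after the second record.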

Answered by EdChum

Use read_fwf and pass the column widths:

In [15]:
import io
import pandas as pd
txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '
df = pd.read_fwf(io.StringIO(txt), header=None, widths=[7, 37], names=['col1', 'col2'])
df
Out[15]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC
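
The widths argument is shorthand for contiguous column ranges, so the same layout can also be spelled out with colspecs. This is a minimal sketch rather than part of the original answer; the end offset 37 is an assumption about the line length, and read_fwf only needs it to reach past the last character of the company name.

```python
import io
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# Half-open [start, end) character ranges: the ticker sits in the first
# 7 characters, the company name starts at offset 7.
# The end offset 37 is an assumed upper bound on the line length.
colspecs = [(0, 7), (7, 37)]

df = pd.read_fwf(io.StringIO(txt), header=None, colspecs=colspecs,
                 names=['col1', 'col2'])
print(df)
```

Explicit colspecs are mostly useful when the columns are not contiguous or when some character ranges should be skipped; for a simple layout like this one, widths is shorter.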

Answered by Cody Bouche

import re

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

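# Split each line on runs of two or more spaces, giving one list per row
rows = [re.split(r'\s{2,}', x.strip()) for x in txt.splitlines()]

# Transpose the rows so that each key maps to one column
result = {'col{0}'.format(i + 1): list(col) for i, col in enumerate(zip(*rows))}

# {'col1': ['A', 'AA'], 'col2': ['AGILENT TECH INC', 'ALCOA INC']}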
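
The same "two or more spaces" rule can also be handed to pandas directly as a regular-expression separator. The following is a sketch under that assumption, not part of the original answer; the name cleaned is introduced here for illustration.

```python
import io
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# Strip each line first: trailing spaces can otherwise yield an extra
# empty field when splitting on runs of two or more spaces.
cleaned = '\n'.join(line.strip() for line in txt.splitlines())

# A regular-expression separator requires the Python parsing engine.
df = pd.read_csv(io.StringIO(cleaned), sep=r'\s{2,}', engine='python',
                 header=None, names=['col1', 'col2'])
print(df)
```

This relies on the company names never containing two consecutive spaces; if they can, fixed-width parsing is the safer choice.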

Answered by Nader Hisham

import re
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '
# First build a list in which each element is one line of the text;
# in the same step, replace the first run of whitespace in each line with '__'
lines = [re.sub(r'\s+', '__', line.strip(), count=1) for line in txt.split('\n')]
Out[143]:
['A__AGILENT TECH INC', 'AA__ALCOA INC']
# Then create a Series from the resulting lines
S = pd.Series(lines)

Out[144]:
0    A__AGILENT TECH INC
1          AA__ALCOA INC
dtype: object
# Split on the '__' marker inserted above, then convert the Series to a list of lists
data = S.str.split('__').tolist()
Out[145]:
[['A', 'AGILENT TECH INC'], ['AA', 'ALCOA INC']]
pd.DataFrame(data, columns=['col1', 'col2'])
Out[142]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC
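
The marker-and-split steps above can probably be collapsed by letting the string accessor split on whitespace once per line. This is a sketch of that variation, not the answer's own code, and it assumes every line has the form "ticker, whitespace, name".

```python
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# One line per Series element, with surrounding whitespace removed.
s = pd.Series(txt.splitlines()).str.strip()

# With no pattern, str.split splits on runs of whitespace; n=1 stops after
# the first split and expand=True returns the pieces as DataFrame columns.
df = s.str.split(n=1, expand=True)
df.columns = ['col1', 'col2']
print(df)
```

Because only the first run of whitespace is used as a boundary, single spaces inside the company name are left untouched.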