pandas: convert a string to a DataFrame

Question

Asked by Colonel Beauvel

I have a string like this:

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

And I want to obtain a DataFrame like this:

In [185]: pd.DataFrame({'col1':['A','AA'],'col2':['AGILENT TECH INC','ALCOA INC']})
Out[185]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC

So far I have tried:

from StringIO import StringIO
import re

pd.DataFrame.from_csv(StringIO(re.sub(' +\n', ';', txt)), sep=';')

Out[204]:
Empty DataFrame
Columns: [AA     ALCOA INC                     ]
Index: []

But the result is not what I expected. It seems I am not using the options of from_csv or StringIO correctly.

It is certainly linked to this question.

Answer 1

Answered by EdChum

Use read_fwf and pass the column widths:

In [15]:
import io
import pandas as pd
txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '
df = pd.read_fwf(io.StringIO(txt), header=None, widths=[7, 37], names=['col1', 'col2'])
df
Out[15]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC
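As a variation on the same idea, read_fwf also accepts explicit column boundaries through colspecs instead of widths. The following is a minimal sketch, assuming the first field occupies characters 0 to 6 and the second starts at character 7; the boundaries and the column names are illustrative, not part of the original answer.

```python
import io
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# colspecs takes half-open (start, end) character intervals, one per column.
# The boundaries below are assumptions read off the sample string.
df = pd.read_fwf(
    io.StringIO(txt),
    header=None,
    colspecs=[(0, 7), (7, 37)],
    names=['col1', 'col2'],
)
print(df)
```

Explicit boundaries can be easier to check against the raw text than widths when columns are not contiguous.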

Answer 2

Answered by Cody Bouche

import re

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

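# Split each line on runs of two or more whitespace characters (one list per row)
rows = [re.split(r'\s{2,}', x.strip()) for x in txt.splitlines()]
# Transpose the rows so that each dictionary key holds one column
result = {'col{0}'.format(i + 1): list(col) for i, col in enumerate(zip(*rows))}

# {'col1': ['A', 'AA'], 'col2': ['AGILENT TECH INC', 'ALCOA INC']}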
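The answer stops at a dictionary, so here is a minimal sketch of one way to go from the split lines to the DataFrame the question asks for. It assumes that fields are always separated by at least two whitespace characters and that no field contains such a run itself; the column names are illustrative.

```python
import re
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# One list per input line: strip the padding, then split on 2+ whitespace characters
rows = [re.split(r'\s{2,}', line.strip()) for line in txt.splitlines()]

# Each inner list becomes one row of the DataFrame
df = pd.DataFrame(rows, columns=['col1', 'col2'])
print(df)
```

Passing the rows directly avoids having to transpose them into per-column lists first.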

Answer 3

Answered by Nader Hisham

import re
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '
# First create a list in which each element represents one line;
# in the same step, replace the first run of whitespace with '__'
lines = [re.sub(r'\s+', '__', line.strip(), count=1) for line in txt.split('\n')]
lines
Out[143]:
['A__AGILENT TECH INC', 'AA__ALCOA INC']
# then create a series of all resulting lines 
S = pd.Series(lines)

Out[144]:
0    A__AGILENT TECH INC
1          AA__ALCOA INC
dtype: object
# split on `__`, which replaced the first run of whitespace above, then convert the Series to a list of lists
data = S.str.split('__').tolist()
Out[145]:
[['A', 'AGILENT TECH INC'], ['AA', 'ALCOA INC']]
pd.DataFrame( data, columns = ['col1' , 'col2'])
Out[142]:
  col1              col2
0    A  AGILENT TECH INC
1   AA         ALCOA INC
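The placeholder step can probably be skipped. This is a different technique from the one in the answer: a minimal sketch that splits each line once on whitespace with Series.str.split, assuming the first field never contains a space. The column names are illustrative.

```python
import pandas as pd

txt = 'A      AGILENT TECH INC              \nAA     ALCOA INC                     '

# One Series element per line, with the trailing padding removed
s = pd.Series(txt.split('\n')).str.strip()

# With no pattern given, str.split splits on runs of whitespace;
# n=1 limits it to a single split, expand=True returns a DataFrame
df = s.str.split(n=1, expand=True)
df.columns = ['col1', 'col2']
print(df)
```

Because only the first split is performed, the spaces inside the company names are left untouched.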
