Python: importing text into Pandas using multiple delimiters

Question

Asked by CastleH

I have some data that looks like this:

c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3

I want to import it into a 3 column data frame, with columns e.g.

a , b, c
1,  1, 0.5
etc

I have been trying to read in the data as 2 columns split on ':', and then to split the first column on ' '. However I'm finding it irksome. Is there a better way to sort it out on import directly?

currently:

data1 = pd.read_csv(file_loc, skiprows=3, delimiter=':', names=['AB', 'C'])
# DataFrame takes `columns`, not `names`; `n=1` limits the split to the first space
data2 = pd.DataFrame(data1.AB.str.split(' ', n=1).tolist(), columns=['A', 'B'])

However, this is further complicated by the fact that my data has a leading space...

I feel like this should be a simple task, but currently I'm thinking of reading it line by line and using some find replace to sanitise the data before importing.
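
For reference, the line-by-line clean-up described above might look roughly like the sketch below. It is only an illustration: the sample text is embedded as a string rather than read from the real file, and the helper name `parse_lines` and its `n_header` parameter are made up for this example.

```python
import re
import pandas as pd

# Sample input mirroring the file in the question: three header lines, then data.
raw_text = """c stuff
c more header
c begin data
 1 1:.5
 1 2:6.5
 1 3:5.3
"""

def parse_lines(text, n_header=3):
    """Hypothetical helper: strip each data line, then split on whitespace or ':'."""
    rows = []
    for line in text.splitlines()[n_header:]:
        line = line.strip()              # removes the leading space
        if not line:                     # ignore blank lines
            continue
        a, b, c = re.split(r"[\s:]+", line)
        rows.append((int(a), int(b), float(c)))
    return pd.DataFrame(rows, columns=["a", "b", "c"])

df_manual = parse_lines(raw_text)
print(df_manual)
```

This works, but it reimplements parsing that `read_csv` can already do, which is why a way to handle both separators directly on import is preferable.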

Answer 1

Accepted answer by DSM

One way might be to use the regex separators permitted by the Python parser engine (engine='python'). For example:

>>> !cat castle.dat
c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'], 
                     sep=' |:', engine='python')
>>> df
   a  b    c
0  1  1  0.5
1  1  2  6.5
2  1  3  5.3
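
To try the same call without creating `castle.dat`, a self-contained variant can feed the sample text through `io.StringIO`. This is a minimal sketch under that assumption: the variable names `sample` and `df_regex` are illustrative, and the separator is the same `' |:'` pattern used above (a single space or a colon).

```python
import io
import pandas as pd

# The same content as castle.dat, held in memory for the example.
sample = (
    "c stuff\n"
    "c more header\n"
    "c begin data         \n"
    " 1 1:.5\n"
    " 1 2:6.5\n"
    " 1 3:5.3\n"
)

df_regex = pd.read_csv(
    io.StringIO(sample),
    skiprows=3,               # skip the three "c ..." header lines
    names=["a", "b", "c"],    # the file has no header row of its own
    sep=" |:",                # regex: split on a space or a colon
    engine="python",          # regex separators need the Python engine
)
print(df_regex)
```

If the columns in a real file may be separated by runs of spaces or tabs, a pattern such as `r"\s+|:"` is a likely adjustment, though it has not been checked against any data beyond the sample shown here.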
