Using multi-character delimiters in Python Pandas read_csv

Question

Asked by slaw

It appears that the pandas read_csv function only allows single-character delimiters/separators. Is there some way to use a string of characters, such as "*|*" or "%%", instead?

Answer 1

Accepted answer by slaw

The solution is to use read_table instead of read_csv. Suppose file.csv contains:

1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25

So, we could read this with:

# sep is interpreted as a regular expression, so * and | are escaped
pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')
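
A self-contained sketch of the same call, using an in-memory buffer in place of file.csv (the io.StringIO stand-in and the variable names are assumptions for illustration). Because a separator longer than one character is treated as a regular expression, the python engine is requested explicitly:

```python
import io

import pandas as pd

# In-memory stand-in for the contents of file.csv shown above.
raw = "1*|*2*|*3*|*4*|*5\n12*|*12*|*13*|*14*|*15\n21*|*22*|*23*|*24*|*25\n"

# The literal '*' and '|' characters are regex metacharacters,
# so each one is escaped with a backslash in a raw string.
df = pd.read_table(io.StringIO(raw), header=None, sep=r"\*\|\*", engine="python")

print(df.shape)
print(df.iloc[1].tolist())
```

With header=None the columns are simply numbered from 0.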

Answer 2

Answered by jvans

Pandas now supports multi-character delimiters. A separator longer than one character is interpreted as a regular expression, so special characters must be escaped:

import pandas as pd
# csv_file is the path to the input file
pd.read_csv(csv_file, sep=r"\*\|\*", engine="python")
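
Escaping the delimiter by hand is easy to get wrong. One possible approach, sketched here with a hypothetical in-memory sample, is to let re.escape build the pattern from the literal delimiter:

```python
import io
import re

import pandas as pd

delimiter = "*|*"  # the literal separator used in the data

# Hypothetical sample input with a header row, for illustration only.
data = io.StringIO("a*|*b*|*c\n1*|*2*|*3\n")

# re.escape turns the literal delimiter into a pattern that matches it exactly.
df = pd.read_csv(data, sep=re.escape(delimiter), engine="python")

print(df.columns.tolist())
print(df.iloc[0].tolist())
```

The same call should work for a delimiter such as "%%", which contains no regex metacharacters.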

Answer 3

Answered by Ami Tavory

As Padraic Cunningham writes in the comment above, it's unclear why you want this. The Wiki entry for the CSV spec states the following about delimiters:

... separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),

It's unsurprising that neither the csv module nor pandas supports what you're asking.

However, if you really want to do so, you're pretty much down to using Python's string manipulations. The following example shows how to turn the dataframe into a "csv" with $$ separating lines and %% separating columns.

'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())

Of course, you don't have to turn it into a string like this prior to writing it into a file.
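
A runnable variant of the one-liner above, as a sketch only: the sample frame is hypothetical, and itertuples(index=False) is swapped in for to_records() because to_records() includes the index by default, which adds an extra leading column to every row.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Join the values of each row with %% and the rows themselves with $$.
# index=False keeps the row index out of the output.
text = "$$".join(
    "%%".join(str(value) for value in row)
    for row in df.itertuples(index=False)
)

print(text)
```

The resulting string can then be written to a file with an ordinary file.write call.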

Answer 4

Answered by control-zed

This is not a Pythonic approach, but it works: you can parse the rows yourself with something like this:

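import re

def row_reader(row, fd):
    """Split row on the delimiter fd, keeping quoted fields together."""
    arr = []
    in_arr = row.rstrip('\n').split(fd)
    i = 0
    while i < len(in_arr):
        # A piece that opens a quote without closing it starts a multi-piece field.
        if re.match('^".*', in_arr[i]) and not re.match('.*"$', in_arr[i]):
            flag = True
            buf = ''
            while flag and i < len(in_arr):
                buf += in_arr[i]
                if re.match('.*"$', in_arr[i]):
                    flag = False
                i += 1
                # Restore the delimiter that split() removed inside the quotes.
                buf += fd if flag else ''
            arr.append(buf)
        else:
            arr.append(in_arr[i])
            i += 1
    return arr

# file_name is the path to the delimited input file.
with open(file_name, 'r') as infile:
    for row in infile:
        for field in row_reader(row, '%%'):
            print(field)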
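
The same idea, splitting on the delimiter and then re-joining the pieces that belong to one quoted field, can be sketched without regular expressions. split_quoted is a hypothetical helper, not part of pandas or the csv module, and it assumes that quotes only appear at the start and end of a field:

```python
def split_quoted(line, delimiter):
    """Split line on delimiter, keeping delimiters inside "quoted" fields."""
    fields = []
    pending = None  # pieces of a quoted field that is still open
    for piece in line.rstrip("\n").split(delimiter):
        if pending is not None:
            pending.append(piece)
            if piece.endswith('"'):
                # Closing quote found: glue the pieces back together.
                fields.append(delimiter.join(pending))
                pending = None
        elif piece.startswith('"') and not (len(piece) > 1 and piece.endswith('"')):
            # Opening quote without a closing one: start collecting pieces.
            pending = [piece]
        else:
            fields.append(piece)
    if pending is not None:
        # Unterminated quote: keep whatever was collected.
        fields.append(delimiter.join(pending))
    return fields


print(split_quoted('1%%"a%%b"%%3\n', "%%"))
```

As in the function above, the surrounding quotes are kept in the returned fields.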
