Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/37904450/

Time: 2020-09-14 01:24:54 Source: igfitidea

Python import CSV short code (pandas?) delimited with ';' and ',' in entries

Tags: python, csv, pandas, dataframe, separator

Asked by Alexei Martianov

I need to import a CSV file in Python on Windows. My file is delimited by ';' and has strings with non-English symbols and commas (',').

I've read these posts:

Importing a CSV file into a sqlite3 database table using Python

Python import csv to list

When I run:

import csv

with open('d:/trade/test.csv', 'r', newline='') as f1:
    reader1 = csv.reader(f1)  # default delimiter is ','
    your_list1 = list(reader1)

I get an issue: commas are changed to the '-' symbol.

When I try:

import pandas

df = pandas.read_csv(csvfile)  # csvfile: path to the ';'-delimited file

I get this error:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2.

Please help. I would prefer to use pandas, as the code is shorter and does not require listing all the field names from the CSV file.

I understand there may be a workaround of temporarily replacing the commas. Still, I would like to solve it with some pandas parameters.

Answered by jezrael

Pandas solution: use read_csv with the regex separator [;,]. You need to add engine='python', because otherwise you get this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

import pandas as pd
import io

temp=u"""a;b;c
1;1,8
1;2,1
1;3,6
1;4,3
1;5,7
"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep="[;,]", engine='python')
print (df)

   a  b  c
0  1  1  8
1  1  2  1
2  1  3  6
3  1  4  3
4  1  5  7
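
One way to confirm the fallback described in the warning is to record warnings while parsing. This is a minimal sketch that reuses part of the sample data from the answer; the helper name parser_warnings is hypothetical, and it assumes a pandas version that still emits this ParserWarning for regex separators.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

temp = """a;b;c
1;1,8
1;2,1
"""


def parser_warnings(**kwargs):
    """Hypothetical helper: parse temp and return the frame plus any ParserWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        df = pd.read_csv(io.StringIO(temp), sep="[;,]", **kwargs)
    return df, [w for w in caught if issubclass(w.category, ParserWarning)]


# Default engine: pandas falls back to the python engine and warns.
df_default, warned_default = parser_warnings()
# Explicit python engine: same result, no fallback warning.
df_python, warned_python = parser_warnings(engine="python")

print(len(warned_default) > 0, len(warned_python) == 0)
print(df_python.equals(df_default))
```

Both calls end up using the python engine, so the parsed frames are identical; engine='python' only silences the fallback warning.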

Answered by Alexei Martianov

The pandas documentation says the following about the parameters:

pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

sep : str, default ','

    Delimiter to use. If sep is None, will try to automatically determine this.

Pandas did not parse my file delimited by ';' because the default for sep is not None (which would mean automatic detection) but ','. Adding the sep parameter set to ';' fixed the issue.
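
A minimal sketch of this fix, assuming a file whose text fields contain commas. The sample data and column names are made up, and io.StringIO stands in for the real file path. With sep=';' the commas stay inside the values. The sep=None option from the documentation quoted above also works, but only with engine='python', which then guesses the delimiter using Python's csv.Sniffer.

```python
import io

import pandas as pd

# Hypothetical sample: ';' separates fields, ',' appears inside the text.
data = """name;city
Åsa;Paris, France
Jürgen;Köln, Deutschland
"""

# Explicit separator: the commas are kept as part of the values.
df = pd.read_csv(io.StringIO(data), sep=";")
print(df)

# sep=None: the python engine detects the delimiter automatically.
df_auto = pd.read_csv(io.StringIO(data), sep=None, engine="python")
print(df_auto.values.tolist() == df.values.tolist())
```

Passing sep explicitly is the more predictable choice; automatic detection is convenient when the delimiter varies between files.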

Answered by totoro

Unless your CSV file is broken, you can try to make the csv module guess your format.

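import csv

with open('d:/trade/test.csv', 'r', newline='') as f1:
    # Guess the dialect (delimiter, quoting) from the first 1024 characters.
    dialect = csv.Sniffer().sniff(f1.read(1024))
    f1.seek(0)  # rewind so the reader starts at the beginning
    r = csv.reader(f1, dialect=dialect)
    for row in r:
        print(row)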
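
To try the same approach without a file on disk, here is a self-contained sketch that uses io.StringIO in place of d:/trade/test.csv. The sample rows are hypothetical, chosen so that ';' appears consistently on every line while ',' does not.

```python
import csv
import io

# Hypothetical sample standing in for d:/trade/test.csv.
sample = "name;city\nÅsa;Paris, France\nBob;Oslo\n"

f1 = io.StringIO(sample)
# Sniff the format from the first 1024 characters, then rewind.
dialect = csv.Sniffer().sniff(f1.read(1024))
f1.seek(0)

rows = list(csv.reader(f1, dialect=dialect))
print(dialect.delimiter)  # detected separator
print(rows)

# If the delimiter is known in advance, it can be passed directly instead.
rows_explicit = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(rows == rows_explicit)
```

The sniffer relies on the delimiter appearing a consistent number of times per line, so on small or irregular samples it can guess wrong; passing delimiter=';' avoids the guess entirely.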

Answered by Santosh Pathak

Try specifying the encoding. You will need to find out the encoding of the file you are trying to read.

I have used ASCII in this example, but yours could be different.

import pandas as pd

df = pd.read_csv(fname, encoding='ascii')  # fname: path to the CSV file
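
A minimal sketch of matching the encoding, assuming (hypothetically) that the file was saved in the Windows Cyrillic code page cp1251. The temporary file and sample text are made up for illustration. Note that 'ascii' cannot decode non-English characters, so it only works for plain-ASCII files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample containing non-ASCII text and a comma inside a field.
text = "name;city\nАлексей;Москва, Россия\n"

# Write it with a legacy Windows encoding to mimic a real-world file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(text.encode("cp1251"))
    fname = tmp.name

try:
    # Reading with the matching encoding (and the right separator) works.
    df = pd.read_csv(fname, sep=";", encoding="cp1251")
    print(df)

    # 'ascii' cannot decode these bytes, so reading fails.
    try:
        pd.read_csv(fname, sep=";", encoding="ascii")
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True
    print(ascii_failed)
finally:
    os.remove(fname)
```

If the real encoding is unknown, trying the likely candidates for the file's origin (for example utf-8, then a Windows code page) is usually quicker than guessing blindly.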