Original URL: http://stackoverflow.com/questions/37904450/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python import CSV short code (pandas?) delimited with ';' and ',' in entries
Question by Alexei Martianov
I need to import a CSV file in Python on Windows. My file is delimited by ';' and has strings with non-English symbols and commas (',').
I've read these posts:
Importing a CSV file into a sqlite3 database table using Python
When I run:
import csv

with open('d:/trade/test.csv', 'r') as f1:
    reader1 = csv.reader(f1)
    your_list1 = list(reader1)
I run into a problem: commas are changed to the '-' symbol.
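For reference, the csv.reader call above uses the module's default ',' delimiter. A minimal sketch of the same approach with delimiter=';', so that commas stay inside values; the sample data and the helper name read_semicolon_rows are hypothetical, and io.StringIO stands in for the real file:

```python
import csv
import io

# Hypothetical sample: ';' separates fields, ',' appears inside values.
SAMPLE = "name;comment\nJosé;hello, world\nZoë;a, b, c\n"


def read_semicolon_rows(text):
    """Parse ';'-delimited text with the csv module, keeping commas in fields."""
    # For a real file, open(path, newline='', encoding=...) would replace StringIO.
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return list(reader)


rows = read_semicolon_rows(SAMPLE)
for row in rows:
    print(row)
```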
When I try:
df = pandas.read_csv(csvfile)
I get this error:
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2.
Please help. I would prefer to use pandas, because the code is shorter and I don't have to list all the field names from the CSV file.
I understand there could be a workaround of temporarily replacing the commas. Still, I would like to solve it with pandas parameters.
Answer by jezrael
Pandas solution: use read_csv with the regex separator [;,]. You need to add engine='python', because otherwise you get this warning:
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
import pandas as pd
import io
temp=u"""a;b;c
1;1,8
1;2,1
1;3,6
1;4,3
1;5,7
"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep="[;,]", engine='python')
print (df)
   a  b  c
0  1  1  8
1  1  2  1
2  1  3  6
3  1  4  3
4  1  5  7
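Note that the regex [;,] treats every comma as a separator, including commas inside text values. If commas only occur inside text fields, a plain ';' separator keeps them intact and needs no regex, so the default C engine can be used. A sketch with hypothetical data:

```python
import io
import re

import pandas as pd

# Hypothetical sample: ';' separates fields and column c contains a comma.
temp = "a;b;c\n1;2;red, green\n3;4;blue\n"

# Plain ';' separator: no regex, so no engine='python' is needed,
# and "red, green" stays a single value in column c.
df = pd.read_csv(io.StringIO(temp), sep=";")
print(df)

# The regex separator would instead split that data row into four pieces.
pieces = re.split("[;,]", "1;2;red, green")
print(pieces)
```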
Answer by Alexei Martianov
The pandas documentation says about the parameters:
pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
sep : str, default ','
Delimiter to use. If sep is None, will try to automatically determine this.
Pandas did not parse my file delimited by ';' because the default for sep is not None (which would mean automatic detection) but ','. Setting the sep parameter to ';' fixed the issue.
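The documentation quoted above also mentions sep=None for automatic detection. A minimal sketch, assuming hypothetical in-memory data; according to the pandas docs only the Python engine can detect the separator (it uses Python's csv.Sniffer), so engine="python" is passed explicitly:

```python
import io

import pandas as pd

# Hypothetical ';'-delimited data with a comma inside a text field.
temp = "id;city;note\n1;Zürich;red, green\n2;Paris;blue\n"

# sep=None: let pandas guess the delimiter (Python engine only).
df = pd.read_csv(io.StringIO(temp), sep=None, engine="python")
print(df)
```

Detection is a guess made from the data, so stating sep=';' explicitly, as in the answer, is the more predictable choice when the delimiter is known.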
Answer by totoro
Unless your CSV file is broken, you can try to let the csv module guess your format:
import csv

with open('d:/trade/test.csv', 'r') as f1:
    dialect = csv.Sniffer().sniff(f1.read(1024))
    f1.seek(0)
    r = csv.reader(f1, dialect=dialect)
    for row in r:
        print(row)
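The same Sniffer approach can be tried on in-memory text before pointing it at the real file. This sketch wraps it in a hypothetical helper, read_with_sniffed_dialect, and uses the Sniffer's optional delimiters argument to limit the candidates to ';' and ','; that restriction is an assumption matching the question and can be dropped:

```python
import csv
import io


def read_with_sniffed_dialect(f, sample_size=1024, delimiters=";,"):
    """Guess the CSV dialect from the start of a text stream, then parse it."""
    # Only the characters in `delimiters` are considered as separators.
    dialect = csv.Sniffer().sniff(f.read(sample_size), delimiters=delimiters)
    f.seek(0)  # rewind so the reader starts again from the first line
    return list(csv.reader(f, dialect=dialect))


# Hypothetical stand-in for open('d:/trade/test.csv', newline='', encoding=...).
sample = io.StringIO(
    "name;city;note\nMüller;München;red, green\nAnna;Berlin;one, two\n"
)
rows = read_with_sniffed_dialect(sample)
for row in rows:
    print(row)
```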
Answer by Santosh Pathak
Try specifying the encoding; you will need to find out the encoding of the file you are trying to read.
I have used ASCII for this example, but it could be different.
df = pd.read_csv(fname, encoding='ascii')
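Since the question's file contains non-English symbols, ASCII will likely not be enough. A self-contained sketch combining encoding with sep=';': it writes a small temporary file in cp1251 (a Windows Cyrillic code page, chosen here purely as an assumption) and reads it back. The helper name read_semicolon_csv is hypothetical, and the real file's encoding might be utf-8, cp1252, or something else:

```python
import os
import tempfile

import pandas as pd


def read_semicolon_csv(path, encoding):
    """Read a ';'-delimited CSV whose encoding is known in advance."""
    return pd.read_csv(path, sep=";", encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    # Hypothetical file written in cp1251; substitute your file's real encoding.
    with open(path, "w", encoding="cp1251", newline="") as f:
        f.write("city;note\nМосква;red, green\n")
    df = read_semicolon_csv(path, encoding="cp1251")

print(df)
```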