How to open a CSV file as a pandas DataFrame

Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/48383288/

Date: 2020-09-14 05:05:10  Source: igfitidea

Tags: python, pandas, csv, dataframe

Asked by Antenna_

I have a CSV file with three columns; the third column contains long text. When I tried to open the file with pandas.read_csv, I got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte

However, the file opens without any problem with:

with open('file.csv', 'r', encoding='utf-8', errors = "ignore") as csvfile:

I don't know how to convert this data into a DataFrame, and I don't think pandas.read_csv handles this error properly.

So, how can I open this file and get a DataFrame?

Answered by

Try this:

Open the CSV file in a text editor and save it in UTF-8 format.

Then read the file as normal:

import pandas
csvfile = pandas.read_csv('file.csv', encoding='utf-8')
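
If a text editor is not convenient, the re-save step can also be scripted. The following is only a minimal sketch: the helper name resave_as_utf8 and the file names are hypothetical, and "cp1252" is an assumed source encoding chosen for the demo. The question does not say what the file's real encoding is, and the conversion is only correct if that encoding is known.

```python
import tempfile
from pathlib import Path


def resave_as_utf8(src, dst, source_encoding):
    """Decode src with source_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")


# Demo on a throwaway file; "cp1252" is only an assumed source encoding.
workdir = Path(tempfile.mkdtemp())
original = workdir / "file.csv"
converted = workdir / "file_utf8.csv"

# In cp1252 the character U+00A1 is stored as the single byte 0xa1,
# which is not a valid start byte in UTF-8.
original.write_bytes("id,name,text\n1,a,\u00a1hola!\n".encode("cp1252"))

resave_as_utf8(original, converted, source_encoding="cp1252")
roundtrip = converted.read_text(encoding="utf-8")
print(roundtrip)
```

After the conversion, pandas.read_csv(converted, encoding='utf-8') should read the new file without a decode error. If the assumed source encoding is wrong, the script may still run but the text will be garbled.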

Answered by jamescampbell

I would try the built-in csv reader first, then put the data into pandas.

import csv
with open('eggs.csv', newline='') as csvfile:
     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
     for row in spamreader:
         print(', '.join(row))

If this doesn't work either, then at least you have confirmed that the problem is in the CSV file itself, not pandas choking on the encoding.

The other recommendation is to make sure you are using Python 3.x, which handles encoding issues much better than 2.7.

If you can provide a sample of your file, I can test it myself and update my answer accordingly.
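
The answer's snippet prints the rows but stops short of the "put the data into pandas" step. Here is one possible sketch of that step. It assumes a comma-delimited file whose first row holds the column names, and it borrows the encoding='utf-8', errors='ignore' arguments from the question; the sample data and column names are invented for the demo.

```python
import csv
import tempfile
from pathlib import Path

import pandas as pd

# Throwaway sample: three columns, with one byte (0xa1) that is not valid UTF-8.
path = Path(tempfile.mkdtemp()) / "file.csv"
path.write_bytes(b"id,title,body\n1,first,\xa1long text\n2,second,more text\n")

# errors="ignore" silently drops undecodable bytes, so some characters are lost.
with open(path, "r", encoding="utf-8", errors="ignore", newline="") as csvfile:
    reader = csv.reader(csvfile)  # default comma delimiter, assumed here
    header = next(reader)         # first row holds the column names
    rows = list(reader)

# Every value arrives as a string; convert column types afterwards if needed.
df = pd.DataFrame(rows, columns=header)
print(df)
```

Because csv.reader yields lists of strings, all columns in the resulting DataFrame hold strings, unlike pandas.read_csv, which infers numeric types.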

Answered by Shubham Yadav

You can try a different encoding, such as "ISO-8859-1".

In your case:

with open('file.csv', 'r', encoding = 'ISO-8859-1', errors = "ignore") as csvfile:
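
The with line above has no body. One way to complete it, as a sketch, is to pass the open file object to pandas.read_csv, which accepts file-like objects as well as paths. The sample file and its column names are made up for the demo. The errors argument is left out here because ISO-8859-1 maps every possible byte value to a character, so decoding with it cannot fail.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Throwaway sample containing the byte 0xa1 from the question's error message.
path = Path(tempfile.mkdtemp()) / "file.csv"
path.write_bytes(b"id,title,body\n1,first,\xa1long text\n")

# Python decodes the bytes; pandas only parses the already-decoded text.
with open(path, "r", encoding="ISO-8859-1", newline="") as csvfile:
    data_file = pd.read_csv(csvfile)

print(data_file)
```

In ISO-8859-1 the byte 0xa1 decodes to an inverted exclamation mark. If the file was really written in some other encoding, the read still succeeds but the text will not match what the author typed.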

Or try this:

import pandas as pd
data_file = pd.read_csv("file.csv", encoding = "ISO-8859-1")
print(data_file)
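
When the encoding is unknown, a common pattern is to try UTF-8 first and fall back to ISO-8859-1 only if decoding fails. The sketch below illustrates that idea; the helper name read_csv_with_fallback and the list of candidate encodings are assumptions for illustration, not part of pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "ISO-8859-1")):
    """Return (DataFrame, encoding) for the first encoding that decodes the file."""
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the file; try the next one
    raise ValueError(f"none of {encodings} could decode {path}")


# Demo: a file that starts with byte 0xa1, which is invalid as UTF-8.
sample = Path(tempfile.mkdtemp()) / "file.csv"
sample.write_bytes(b"\xa1hola,b,c\n1,2,3\n")

df, used = read_csv_with_fallback(sample)
print(used)
print(df)
```

Keep ISO-8859-1 last in the list: it accepts any byte sequence, so it never raises, and a successful read does not prove that it is the file's true encoding.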