Pandas importing CSV and Excel file error

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19293316/

Tags: python, pandas, import-from-excel, import-from-csv

Asked by Baktaawar

I am trying to use Python Pandas to import a CSV file. Example data from the file is shown below; the first row holds the comma-separated column names.

End Customer Organization ID,End Customer Organization Name,End Customer Top Parent Organization ID,End Customer Top Parent Organization Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales Date 
11027676,Baroda Western Uttar Pradesh Gramin Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,"Hcl Infosystems Ltd - Partnerdghftrutyhb frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw",Server & CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server & CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,"Open Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho",125.85,1,FY07,12/28/2006
12835756,Uttam Strips Pvt Ltd,12835756,Uttam Strips Pvt Ltd,12565538,Redington C/O Fortis Financial Services Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc. Def,0,0,FY09,9/15/2008
12233135,Bhagwan Singh Tondon,12233135,Bhagwan Singh Tondon,2652941,H B S Systems Pvt Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA - New,0,0,FY09,9/15/2008
11602305,Maya Academy Of Advanced Cinematics,9750934,Maya Entertainment Ltd,336146,Embee Software Pvt Ltd,Server & CAL,Windows Server & CAL,Windows Server HPC,Windows Compute Cluster Server,Non-specific,Open,Open V/MYO - Rec,OLV Perpet L&SA Recur-Def,0,0,FY09,9/25/2008
13336009,Remiel Softech Solution Pvt Ltd,13336009,Remiel Softech Solution Pvt Ltd,13335482,Redington C/O Remiel Softech Solutions Pvt Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc. Def,0,0,FY09,12/23/2008

I am using the below code to import:

import pandas as pd

df=pd.read_csv('file path.csv',sep=',')

It gave the following error:

Traceback (most recent call last):
  File "<pyshell#25>", line 1, in <module>
    df=pd.read_csv(filename,sep=',')
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
    return parser.read()
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
  File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:6964)
  File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas\parser.c:7780)
  File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:8793)
  File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:9484)
  File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas\parser.c:10642)
  File "parser.pyx", line 1046, in pandas.parser.TextReader._string_convert (pandas\parser.c:10853)
  File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas\parser.c:15657)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 90: invalid start byte
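
A minimal sketch of what this error means, using made-up sample bytes rather than the real file: the byte 0xC1 can never begin a UTF-8 sequence, so a strict UTF-8 decode fails, while single-byte code pages such as cp1252 or ISO-8859-1 do map it to a character. Decoding without an error does not prove that the chosen encoding is the right one.

```python
# Hypothetical sample bytes; 0xC1 is the byte reported in the traceback.
raw = b"Empresa \xc1gua"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    # 0xC1 is not a valid UTF-8 start byte, so decoding stops here.
    utf8_ok = False
    print(exc)

# Single-byte code pages assign a character to 0xC1, so these succeed.
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("iso-8859-1")
print(as_cp1252)
```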

Since it looked like a Unicode error, I ran it again with a different encoding:

df=pd.read_csv(filename,encoding='utf-16',sep=',')

It gave the following error:

Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    df=pd.read_csv(filename,encoding='utf-16',sep=',')
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 586, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 957, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 477, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4434)
  File "parser.pyx", line 592, in pandas.parser.TextReader._get_header (pandas\parser.c:5660)
  File "parser.pyx", line 768, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:7451)
  File "parser.pyx", line 1661, in pandas.parser.raise_parser_error (pandas\parser.c:18744)
pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

I am not sure why this is happening. I even tried converting the CSV file to Excel with Text to Columns and reading it with the Pandas read_excel function. That also gave an error (below):

Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    df=pd.read_excel('J:\dmqp on 192.168.1.41\MS Sales Dump (FY09)xls','MS Sales Dump (FY09)')
  File "C:\Python33\lib\site-packages\pandas\io\excel.py", line 52, in read_excel
    return ExcelFile(path_or_buf,kind=kind).parse(sheetname=sheetname,
  File "C:\Python33\lib\site-packages\pandas\io\excel.py", line 68, in __init__
    import xlrd # throw an ImportError if we need to
ImportError: No module named 'xlrd'
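
The traceback shows pandas trying to import xlrd, so this ImportError means that package is missing from the Python environment; it says nothing about the file itself. The sketch below checks for the module before calling read_excel. The helper name excel_reader_available is hypothetical, and the pip command in the message assumes pip is how packages are installed on this machine.

```python
import importlib.util

def excel_reader_available(module_name="xlrd"):
    """Return True if the named optional Excel reader module can be imported."""
    return importlib.util.find_spec(module_name) is not None

if not excel_reader_available():
    # Assumption: packages are managed with pip in this environment.
    print("xlrd is not installed; try: pip install xlrd")
```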

Can someone help with the above errors and explain what is going wrong when importing the data both as CSV and as Excel?

I also tried this code with a different encoding:

df=pd.read_csv(filename,encoding='iso-8859-1',sep=',')

It didn't give any error, but it imported everything as one single column instead of splitting it into separate columns.

>>>df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 263244 entries, 0 to 263243
Data columns (total 1 columns):
End Customer Organization ID,End Customer Organization Name,End Customer Top Parent Organization ID,End Customer Top Parent Organization Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales Date    263244  non-null values
dtypes: object(1)
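
The question does not show why everything landed in one column. One possible cause, offered only as an assumption, is that each whole line in the real file is wrapped in quotes, or that the file uses a delimiter other than a comma. A quick way to check is to look at the raw first line before handing the file to pandas. The helper describe_first_line and its result keys are hypothetical names for this sketch.

```python
def describe_first_line(path, encoding="iso-8859-1"):
    """Report simple facts about the first line of a delimited text file."""
    # newline="" keeps line endings untouched so they can be stripped explicitly.
    with open(path, "r", encoding=encoding, newline="") as f:
        first = f.readline().rstrip("\r\n")
    # A line that starts and ends with a quote may be parsed as one quoted field.
    wrapped = len(first) >= 2 and first[0] == '"' and first[-1] == '"'
    # Count common delimiter candidates to see which one dominates.
    counts = {d: first.count(d) for d in (",", ";", "\t", "|")}
    return {"wrapped_in_quotes": wrapped, "delimiter_counts": counts, "first_line": first}
```

If wrapped_in_quotes comes back True, the parser is treating each line as a single quoted field, and the file needs its outer quotes removed (or a different quoting setting) before the commas can act as separators.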

As a check, I stored the example data above in a text file and imported that. This is the output I got:

>>> df =pd.read_csv(r'J:\Data.txt')
>>> print(df)
   End Customer Organization ID  \
0                      11027676   
1                      12835756   
2                      12233135   
3                      11602305   
4                      13336009   

                      End Customer Organization Name  \
0  Baroda Western Uttar Pradesh Gramin Bankgfhgfn...   
1                               Uttam Strips Pvt Ltd   
2                               Bhagwan Singh Tondon   
3                Maya Academy Of Advanced Cinematics   
4                    Remiel Softech Solution Pvt Ltd   

   End Customer Top Parent Organization ID  \
0                                  4078446   
1                                 12835756   
2                                 12233135   
3                                  9750934   
4                                 13336009   

           End Customer Top Parent Organization Name  Reseller Top Parent ID  \
0  Bank Of Barodadfhhgfjyjtkyukujkyujkuhykluiluil...                 1809012   
1                               Uttam Strips Pvt Ltd                12565538   
2                               Bhagwan Singh Tondon                 2652941   
3                             Maya Entertainment Ltd                  336146   
4                    Remiel Softech Solution Pvt Ltd                13335482   

                            Reseller Top Parent Name  \
0  Hcl Infosystems Ltd - Partnerdghftrutyhb frhyw...   
1        Redington C/O Fortis Financial Services Ltd   
2                              H B S Systems Pvt Ltd   
3                             Embee Software Pvt Ltd   
4     Redington C/O Remiel Softech Solutions Pvt Ltd   

                                            Business  \
0  Server & CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmgh...   
1                                                MBS   
2                                       Server & CAL   
3                                       Server & CAL   
4                                                MBS   

                                    Rev Sum Division  \
0  SQL Server & CALdfhtrhtrgbhrghrye5y45y45yu56ju...   
1                                       Dynamics ERP   
2                                   SQL Server & CAL   
3                               Windows Server & CAL   
4                                       Dynamics ERP   

                                    Rev Sum Category  \
0  SQL CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfa...   
1                                       Dynamics NAV   
2                                            SQL CAL   
3                                 Windows Server HPC   
4                                       Dynamics NAV   

                                      Product Family       Version  \
0  SQL CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmj...          2005   
1                   Dynamics NAV Business Essentials  Non-specific   
2                                            SQL CAL  Non-specific   
3                     Windows Compute Cluster Server  Non-specific   
4                   Dynamics NAV Business Essentials  Non-specific   

                                       Pricing Level  \
0  Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasd...   
1                                              Other   
2                                               Open   
3                                               Open   
4                                              Other   

                               Summary Pricing Level  \
0  Open Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgf...   
1                                             MBS SA   
2                                          Open L&SA   
3                                   Open V/MYO - Rec   
4                                             MBS SA   

                                Detail Pricing Level  MS Sales Amount  \
0  Open Stddfm,vdnoghioerivnsdflierohgushdfovhsio...           125.85   
1                       MBS New Customer Enhanc. Def             0.00   
2                           Deferred Open L&SA - New             0.00   
3                          OLV Perpet L&SA Recur-Def             0.00   
4                       MBS New Customer Enhanc. Def             0.00   

   MS Sales Licenses Fiscal Year Sales Date   
0                  1        FY07  12/28/2006  
1                  0        FY09   9/15/2008  
2                  0        FY09   9/15/2008  
3                  0        FY09   9/25/2008  
4                  0        FY09  12/23/2008  
>>> 

This adds a '\' after each column, and the column names do not appear side by side. Instead, each column seems to start on a new line after the previous one is imported.
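
The trailing '\' is a line-continuation marker that pandas prints when a DataFrame is too wide for the display width; it is not part of the data or the column names. A small sketch, using two of the sample rows and a subset of the columns, shows how to confirm the import worked and how to print without wrapping. The option values chosen here are illustrative.

```python
import pandas as pd

# A small frame built from two of the sample rows above (subset of columns).
df = pd.DataFrame({
    "End Customer Organization ID": [12835756, 12233135],
    "End Customer Organization Name": ["Uttam Strips Pvt Ltd", "Bhagwan Singh Tondon"],
    "Reseller Top Parent Name": ["Redington C/O Fortis Financial Services Ltd", "H B S Systems Pvt Ltd"],
    "Fiscal Year": ["FY09", "FY09"],
    "Sales Date": ["9/15/2008", "9/15/2008"],
})

# The shape and column list do not depend on how the frame is printed.
print(df.shape)
print(list(df.columns))

# Turn off wrapping for this block only, so all columns print on one row.
with pd.option_context("display.expand_frame_repr", False, "display.max_columns", 20):
    unwrapped = repr(df)
print(unwrapped)
```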

Answered by BVJ

I guess your main problem has to do with encoding. I have suffered the pain of dealing with odd encodings in CSV files. What helped me in those cases was to detect the real encoding of the file first and then load it with pandas using that encoding.

Give the following code a try:

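from chardet.universaldetector import UniversalDetector

def test_encoding(file_name):
    """Guess the text encoding of file_name using chardet."""
    detector = UniversalDetector()
    # Read raw bytes; the encoding is not known yet.
    with open(file_name, 'rb') as f:
        for line in f:
            detector.feed(line)
            # Stop early once chardet has reached a conclusion.
            if detector.done:
                break
    # close() finalizes the guess and fills in detector.result.
    detector.close()
    r = detector.result
    return "Detected encoding %s with confidence %s" % (r['encoding'], r['confidence'])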

This will try to infer the encoding of your file, and then you can pass the detected encoding to pandas to load the file correctly. Hope it helps...
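
If chardet is not available, a rough dependency-free fallback is to try a short list of candidate encodings in order and keep the first one that decodes the whole file. This is a different technique from the detector above, and both the helper name first_working_encoding and the candidate list are assumptions for illustration. ISO-8859-1 accepts every byte in Python, so it belongs last, and a successful decode only shows the bytes are valid in that encoding, not that the text is correct.

```python
def first_working_encoding(path, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate encoding that decodes the whole file, else None."""
    # Reads the whole file into memory; fine for a sketch, costly for huge files.
    with open(path, "rb") as f:
        raw = f.read()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
        return name
    return None
```

The returned name can then be passed as the encoding argument, for example pd.read_csv(path, encoding=first_working_encoding(path)).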
