Using Pandas to read an Excel file from a URL - XLRDError

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/37243121/

Date: 2020-09-14 01:14:41  Source: igfitidea

python pandas xlrd

Asked by s666

I am trying to read Excel files into pandas from the following URLs:

url1 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls'

url2 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/STTI_Historical.xls'

using the code:

pd.read_excel(url1)

However, it doesn't work and I get the error:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '2000/01/'

After searching on Google, it seems that .xls files served from URLs are sometimes stored in a different format behind the scenes, such as HTML or XML.

When I manually download the file and open it in Excel, I get an error message: "The file format and extension don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it."

When I do open it, it looks just like a normal Excel file.

I came across a post online suggesting that I open the file in a text editor to look for any hint of its real format, but I don't see anything like that when I open it in Notepad++.
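
One way to check the real format without relying on the extension is to look at the first bytes of the file. The following is a minimal sketch, not part of the original question: the function name `sniff_spreadsheet_format` and its return labels are made up for illustration, and the check is only a heuristic. It relies on two well-known signatures: legacy .xls files are OLE2 compound files, and .xlsx files are ZIP archives.

```python
# Known file signatures ("magic bytes")
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx is a ZIP archive


def sniff_spreadsheet_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file (heuristic only)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Markup formats start with "<" once leading whitespace is ignored
    if head.lstrip().startswith(b"<"):
        return "html-or-xml"
    # Anything else is treated as plain text (for example CSV or TSV)
    return "text"


# The error message above shows the file starting with '2000/01/'
print(sniff_spreadsheet_format(b"2000/01/"))
```

To apply it to a downloaded file, you could pass in something like `open(path, "rb").read(16)`, where `path` is wherever you saved the file.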

Could someone please help me read this "xls" file into a pandas DataFrame properly?

Answered by jezrael

It seems the file is really tab-separated text, so you can use read_csv:

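import pandas as pd

# Parse the file as tab-separated text instead of as an Excel workbook
df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],                   # parse the first column as dates
                 names=['a','b','c','d','e','f'])   # assign column names
print(df)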

Then I check whether the last column f contains any values other than NaN:

print(df[df.f.notnull()])

Empty DataFrame
Columns: [a, b, c, d, e, f]
Index: []

So column f contains only NaN, and you can leave it out with the usecols parameter:

import pandas as pd

df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],
                 names=['a','b','c','d','e','f'],
                 usecols=['a','b','c','d','e'])   # skip the all-NaN column f
print(df)
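
If you would rather not hard-code which column is empty, a possible alternative to usecols is to drop every all-NaN column after reading. This is a small sketch under assumptions, not the answer's own method: the helper name `read_tab_separated` and the sample data are hypothetical, and the sample only imitates a file where each row ends with a trailing tab.

```python
import io

import pandas as pd


def read_tab_separated(source, names):
    """Read tab-separated text and drop columns that contain only NaN."""
    df = pd.read_csv(source, sep="\t", names=names, parse_dates=[0])
    # how="all" removes a column only when every value in it is missing
    return df.dropna(axis=1, how="all")


# Hypothetical sample: each row ends with a tab, giving an empty last column
sample = "2000/01/03\t1.0\t2.0\t\n2000/01/04\t1.5\t2.5\t\n"
df = read_tab_separated(io.StringIO(sample), names=["a", "b", "c", "d"])
print(df.columns.tolist())
```

The same helper should accept a URL in place of the `io.StringIO` object, since read_csv takes either.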

Answered by ihightower

In case this helps someone: you can read an Excel file stored on Google Drive directly by URL, without any login requirement. I tried it in Google Colab and it worked.

  • Upload an Excel file to Google Drive, or use one that is already uploaded
  • Share the file with "Anyone with the link" (I don't know whether view-only access works; I tried with full access)
  • Copy the link

You will get something like this.

share url: https://drive.google.com/file/d/---some--long--string/view?usp=sharing

Get the download URL by starting a download of the file and copying the URL the browser uses for it.

It will look something like this (note that it contains the same Google file ID as above):

download url: https://drive.google.com/u/0/uc?id=---some--long--string&export=download
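
Since both URLs contain the same file ID, the download URL can probably be built from the share URL directly. The sketch below assumes the two URL shapes shown above stay as they are, which Google may change at any time; the function name `drive_download_url` is made up for illustration.

```python
import re


def drive_download_url(share_url: str) -> str:
    """Build a download URL from a Google Drive share URL (assumed URL shapes)."""
    # The file ID is the path segment that follows /file/d/
    match = re.search(r"/file/d/([^/]+)", share_url)
    if match is None:
        raise ValueError("no Google Drive file ID found in: " + share_url)
    return f"https://drive.google.com/u/0/uc?id={match.group(1)}&export=download"


# "FILE_ID" is a placeholder, not a real file
print(drive_download_url("https://drive.google.com/file/d/FILE_ID/view?usp=sharing"))
```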

Now go to Google Colab and paste the following code:

import pandas as pd

fileurl   = r'https://drive.google.com/file/d/---some--long--string/view?usp=sharing'
filedlurl = r'https://drive.google.com/u/0/uc?id=---some--long--string&export=download'

df = pd.read_excel(filedlurl)
df

That's it: the file contents are now in df.