Reading an Excel file from a URL with Pandas - XLRDError

Question

Asked by s666

I am trying to read Excel files into pandas from the following URLs:

url1 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls'

url2 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/STTI_Historical.xls'

using the code:

pd.read_excel(url1)

However it doesn't work and I get the error:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '2000/01/'

After searching on Google, it seems that .xls files offered through URLs are sometimes actually stored in a different format behind the scenes, such as HTML or XML.

When I manually download the file and open it in Excel, I get this error message: "The file format and extension don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it."

When I do open it, it appears just like a normal excel file.

I came across a post online suggesting that I open the file in a text editor to see whether it contains any hint about its real format, but I don't see anything of the sort when I open it in Notepad++.
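A more direct check than reading the file in an editor is to look at its first bytes. The sketch below is only illustrative: `sniff_format` is a hypothetical helper, not part of pandas or xlrd, and the rules for "markup" and "tab-separated text" are rough assumptions. It relies on the fact that a genuine legacy .xls file starts with the OLE2 signature and an .xlsx file starts with a ZIP signature.

```python
def sniff_format(head: bytes) -> str:
    """Guess a file's real format from its first bytes (hypothetical helper)."""
    # OLE2 compound file signature: a genuine legacy .xls workbook
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"
    # ZIP local file header: the container used by .xlsx
    if head.startswith(b"PK\x03\x04"):
        return "xlsx"
    # Rough guess (assumption): HTML or XML saved with an .xls extension
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<html", b"<!doctype", b"<?xml", b"<table")):
        return "markup"
    # Rough guess (assumption): plain text with tab-separated fields
    if b"\t" in head:
        return "tab-separated text"
    return "unknown"


# Made-up sample bytes that begin like the text quoted in the XLRDError message
print(sniff_format(b"2000/01/03\t100.0\t101.5\n"))
```

To apply it to a downloaded file, read a small prefix in binary mode, for example `open(path, "rb").read(512)`, and pass that to the function.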

Could someone please help me read this "xls" file into a pandas DataFrame properly?

Answer 1

Answered by jezrael

It seems you can use read_csv:

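import pandas as pd

# Despite the .xls extension, the file parses as tab-separated text,
# so read it with read_csv and supply column names (there is no header row).
df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],
                 names=['a','b','c','d','e','f'])
print(df)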

Then I check whether the last column, f, contains any values other than NaN:

print(df[df.f.notnull()])

Empty DataFrame
Columns: [a, b, c, d, e, f]
Index: []

It contains only NaN, so you can drop the last column, f, with the usecols parameter:

import pandas as pd

# Same call as before, but usecols keeps only the five populated columns.
df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],
                 names=['a','b','c','d','e','f'],
                 usecols=['a','b','c','d','e'])
print(df)
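If you would rather not hard-code column names, another option is to let pandas number the columns and drop any that are entirely NaN. This is only a sketch on made-up in-memory data: the sample rows are invented, and the trailing tab on each line is an assumption about why the real file yields an empty last column.

```python
import io

import pandas as pd

# Hypothetical sample: tab-separated, no header row, and a trailing tab
# on every line (assumed cause of the all-NaN sixth column).
sample = (
    "2000/01/03\t1.0\t2.0\t3.0\t4.0\t\n"
    "2000/01/04\t1.5\t2.5\t3.5\t4.5\t\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, parse_dates=[0])

# Drop every column whose values are all NaN, without naming it explicitly
df = df.dropna(axis=1, how="all")
print(df.shape)
```

The same `dropna(axis=1, how="all")` call should work on the DataFrame read from the URL, since the check above found the last column empty.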

Answer 2

Answered by ihightower

If this helps anyone: you can read an Excel file stored on Google Drive directly by URL into pandas, with no login required. I tried it in Google Colab and it worked.

Upload an Excel file to Google Drive, or use one that is already uploaded
Share the file with anyone who has the link (I don't know whether view-only access works; I tried with full access)
Copy the link

You will get something like this.

share url: https://drive.google.com/file/d/---some--long--string/view?usp=sharing

Get the download URL by attempting to download the file, then copy the URL from there.

It will look something like this (with the same Google file ID as above):

download url: https://drive.google.com/u/0/uc?id=---some--long--string&export=download

Now go to Google Colab and paste the following code:

import pandas as pd

fileurl   = r'https://drive.google.com/file/d/---some--long--string/view?usp=sharing'
filedlurl = r'https://drive.google.com/u/0/uc?id=---some--long--string&export=download'

df = pd.read_excel(filedlurl)
df

That's it. The file is now in your df.
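Since the share URL and the download URL carry the same file ID, the download URL can probably be built from the share URL instead of copied by hand. The sketch below assumes both URLs always follow the two patterns shown in this answer; `drive_download_url` is a hypothetical helper, and `FILE_ID_123` is a placeholder, not a real file ID.

```python
import re


def drive_download_url(share_url: str) -> str:
    """Build a download URL from a Google Drive share URL (hypothetical helper).

    Assumes the share URL has the form .../file/d/<id>/view?usp=sharing
    and that the download URL has the form shown in the answer above.
    """
    match = re.search(r"/file/d/([^/]+)", share_url)
    if match is None:
        raise ValueError(f"no file ID found in {share_url!r}")
    file_id = match.group(1)
    return f"https://drive.google.com/u/0/uc?id={file_id}&export=download"


# Placeholder ID for illustration only
print(drive_download_url(
    "https://drive.google.com/file/d/FILE_ID_123/view?usp=sharing"
))
```

The returned string can then be passed to `pd.read_excel` in the same way as `filedlurl` above.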
