
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/43590153/

Date: 2020-09-14 03:27:50  Source: igfitidea

HTTP Error 403: Forbidden when reading HTML

python, pandas

Question by ge00rge

I would like to read the following HTML page:


import pandas as pd

daily_info = pd.read_html('https://www.investing.com/earnings-calendar/', flavor='html5lib')

print(daily_info)

Unfortunately, the following error appears:


urllib.error.HTTPError: HTTP Error 403: Forbidden

Is there any way to fix it?

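The traceback shows that the page was fetched through `urllib`, which identifies itself with a default `User-Agent` header. A plausible cause of the 403 (an assumption; the site's blocking rules are not documented here) is that the server rejects that non-browser identifier. This short sketch only prints the header that urllib sends by default, so you can see what the server receives:

```python
import urllib.request

# build_opener() creates the same kind of default opener that urllib.request.urlopen uses
opener = urllib.request.build_opener()

# The opener carries one default header: ('User-agent', 'Python-urllib/<major>.<minor>')
print(opener.addheaders)
```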

Answer by MaxU

Pretend to be a browser by sending browser-like request headers:


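import io

import pandas as pd
import requests

url = 'https://www.investing.com/earnings-calendar/'

# Browser-like request headers, so the server is less likely to answer with 403 Forbidden
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}

r = requests.get(url, headers=header)
r.raise_for_status()  # raise an exception if the server still returns an error status

# Newer pandas versions deprecate passing literal HTML strings, so wrap the text in StringIO
dfs = pd.read_html(io.StringIO(r.text))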
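Before looking at the result of the `requests` version, note that the same idea can be sketched with only the standard library's `urllib.request`, if you would rather not add `requests` as a dependency. This is an alternative, not part of the original answer: `BROWSER_HEADERS`, `build_browser_request` and `read_html_as_browser` are hypothetical names, and the header values are copied from the answer. The site may still refuse the request if it checks more than headers.

```python
import io
import urllib.request

import pandas as pd

# Hypothetical constant: the same header values used in the requests-based answer
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"
    ),
    "X-Requested-With": "XMLHttpRequest",
}


def build_browser_request(url: str) -> urllib.request.Request:
    """Build a Request whose headers take precedence over urllib's default User-Agent."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def read_html_as_browser(url: str, **kwargs) -> list:
    """Download the page with browser-like headers, then parse its tables with pandas."""
    with urllib.request.urlopen(build_browser_request(url)) as resp:
        # Use the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    # Wrap in StringIO: newer pandas versions deprecate passing literal HTML strings
    return pd.read_html(io.StringIO(html), **kwargs)
```

Usage, assuming the site still serves the page to this client: `dfs = read_html_as_browser('https://www.investing.com/earnings-calendar/')`. Extra keyword arguments such as `flavor='html5lib'` are passed straight to `pd.read_html`.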

Result:


In [201]: len(dfs)
Out[201]: 7

In [202]: dfs[0]
Out[202]:
    0   1   2   3
0 NaN NaN NaN NaN

In [203]: dfs[1]
Out[203]:
                 Unnamed: 0                                      Company    EPS /  Forecast Revenue /  Forecast.1 Market Cap  Time  \
0    Monday, April 24, 2017                                          NaN    NaN         NaN     NaN           NaN        NaN   NaN
1                       NaN                                 Acadia (AKR)     --      / 0.11      --          / --      2.63B   NaN
2                       NaN                                  Agree (ADC)     --      / 0.39      --          / --      1.34B   NaN
3                       NaN                                   Alcoa (AA)     --      / 0.53      --          / --      5.84B   NaN
4                       NaN                        American Campus (ACC)     --      / 0.27      --          / --      6.62B   NaN
5                       NaN                   Ameriprise Financial (AMP)     --      / 2.52      --          / --     19.76B   NaN
6                       NaN                          Avacta Group (AVTG)     --        / --   1.26M          / --     47.53M   NaN
7                       NaN                         Bank of Hawaii (BOH)    1.2      / 1.08  165.8M          / --      3.48B   NaN
8                       NaN                         Bank of Marin (BMRC)   0.74       / 0.8      --          / --    422.29M   NaN
9                       NaN                                Banner (BANR)     --      / 0.68      --          / --      1.82B   NaN
10                      NaN                           Barrick Gold (ABX)     --       / 0.2      --          / --     22.44B   NaN
11                      NaN                           Barrick Gold (ABX)     --      / 0.28      --          / --     30.28B   NaN
12                      NaN               Berkshire Hills Bancorp (BHLB)     --      / 0.54      --          / --      1.25B   NaN
13                      NaN   Brookfield Canada Office Properties (BOXC)     --        / --      --          / --        NaN   NaN

...
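In this output, `dfs[1]` interleaves two kinds of rows: date rows, where only `Unnamed: 0` is filled, and company rows, where `Unnamed: 0` is `NaN`. A minimal sketch for attaching each date to the company rows beneath it, assuming the column names printed above (the page layout may well have changed since 2017); `attach_dates` and the new `Date` column are hypothetical names:

```python
import pandas as pd


def attach_dates(df: pd.DataFrame, date_col: str = "Unnamed: 0") -> pd.DataFrame:
    """Copy each date row's value onto the company rows that follow it."""
    out = df.copy()
    # Date rows carry the date in date_col; forward-fill it down to the company rows
    out["Date"] = out[date_col].ffill()
    # Keep only rows that describe a company, then drop the original date column
    out = out[out["Company"].notna()].drop(columns=[date_col])
    return out.reset_index(drop=True)
```

For example, `earnings = attach_dates(dfs[1])`. If you only want tables containing a given text, `pd.read_html` also accepts a `match` argument (for example `match="Company"`), which may let you skip empty tables such as `dfs[0]`.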