pandas: Using pandas to read a downloaded HTML file
Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/25056120/
Using pandas to read a downloaded HTML file
Asked by lokheart
As the title says, I tried using read_html, but it gives me the following error:
In [17]: temp = pd.read_html('C:/age0.html', flavor='lxml')
File "<string>", line unknown
XMLSyntaxError: htmlParseStartTag: misplaced <html> tag, line 65, column 6
What have I done wrong?
update 01
The HTML contains some JavaScript at the top, followed by an HTML table. In R, I processed it by parsing the HTML with the XML package, which gave me a data frame. I want to do the same in Python. Should I use something else, like BeautifulSoup, before passing it to pandas?
Accepted answer by ZJS
I think you are on the right track by using an HTML parser like Beautiful Soup. pandas.read_html() reads an HTML table, not an HTML page.
You would want to do something like this...
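from io import StringIO
from bs4 import BeautifulSoup
import pandas as pd
# Parse the page and keep only the first <table> element
with open('C:/age0.html', 'r') as f:
    table = BeautifulSoup(f.read(), 'html.parser').find('table')
# Pass the table's markup as text; read_html returns a list of DataFrames
df = pd.read_html(StringIO(str(table)))[0]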
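To try the approach above without the original file, here is a minimal sketch that runs on an in-memory page. The markup, the column names, and the variable names are made up for illustration, and it assumes that BeautifulSoup plus at least one pandas HTML backend (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page: a script block at the top, followed by one table.
page = """
<html>
  <head><script>var x = 1;</script></head>
  <body>
    <table>
      <tr><th>age</th><th>count</th></tr>
      <tr><td>0</td><td>12</td></tr>
      <tr><td>1</td><td>15</td></tr>
    </table>
  </body>
</html>
"""

# Parse the whole page with the built-in parser and isolate the first table.
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table")
if table is None:
    raise ValueError("no <table> element found in the page")

# Hand only the table markup to pandas; read_html returns a list of DataFrames.
tables = pd.read_html(StringIO(str(table)))
df = tables[0]
print(df)
```

Checking for None before calling read_html gives a clearer error when the page has no table at all.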
Answer by srana
First, install the packages below for parsing:
- pip install BeautifulSoup4
- pip install lxml
- pip install html5lib
Then use read_html to read the HTML tables on any HTML page.
import pandas as pds
pds_df = pds.read_html('C:/age0.html')
pds_df[0]
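Because read_html returns a list with one DataFrame per table it finds, a page with several tables needs some way to pick the right one. The sketch below uses made-up markup with two tables and the `match` parameter of read_html to keep only tables whose text contains a given string. The table contents and variable names are assumptions for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup containing two separate tables.
html = """
<table>
  <tr><th>name</th><th>age</th></tr>
  <tr><td>Ann</td><td>30</td></tr>
</table>
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Oslo</td><td>700000</td></tr>
</table>
"""

# Without a filter, every table on the page comes back as its own DataFrame.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))

# `match` keeps only the tables whose text matches the given string or regex.
city_tables = pd.read_html(StringIO(html), match="population")
city_df = city_tables[0]
print(city_df)
```

Indexing with [0], as in the answer, is enough when the wanted table is the first one on the page.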
I hope this will help.
Good Luck!!

