python 将 XML 导入 SQL 数据库
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/723757/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Import XML into SQL database
提问by Jacob Lyles
I'm working with a 20 gig XML file that I would like to import into a SQL database (preferably MySQL, since that is what I am familiar with). This seems like it would be a common task, but after Googling around a bit I haven't been able to figure out how to do it. What is the best way to do this?
我正在处理一个 20 gig XML 文件,我想将其导入 SQL 数据库(最好是 MySQL,因为这是我熟悉的)。这似乎是一项常见任务,但在谷歌搜索后我一直无法弄清楚如何去做。做这个的最好方式是什么?
I know this ability is built into MySQL 6.0, but that is not an option right now because it is an alpha development release.
我知道此功能内置于 MySQL 6.0 中,但目前还不是一个选项,因为它是一个 alpha 开发版本。
Also, if I have to do any scripting I would prefer to use Python because that's what I am most familiar with.
另外,如果我必须编写任何脚本,我更喜欢使用 Python,因为这是我最熟悉的。
Thanks.
谢谢。
回答by Ryan Ginstrom
You can use the getiterator() function to iterate over the XML file without parsing the whole thing at once. You can do this with ElementTree, which is included in the standard library, or with lxml.
您可以使用 getiterator() 函数遍历 XML 文件,而无需一次性解析整个文件。您可以使用包含在标准库中的ElementTree或lxml 来完成此操作。
for record in root.getiterator('record'):
add_element_to_database(record) # Depends on your database interface.
# I recommend SQLAlchemy.
回答by Ryan Ginstrom
Take a look at the iterparse()
function from ElementTree
or cElementTree
(I guess cElementTree would be best if you can use it)
查看iterparse()
来自ElementTree
or的函数cElementTree
(如果你可以使用它,我想 cElementTree 会是最好的)
This piece describes more or less what you need to do: http://effbot.org/zone/element-iterparse.htm#incremental-parsing
这篇文章或多或少地描述了您需要做的事情:http: //effbot.org/zone/element-iterparse.htm#incremental-parsing
This will probably be the most efficient way to do it in Python. Make sure not to forget to call .clear()
on the appropriate elements (you reallydon't want to build an in memory tree of a 20gig xml file: the .getiterator()
method described in another answer is slightly simpler, but doesrequire the whole tree first - I assume that the poster actually had iterparse()
in mind as well)
这可能是在 Python 中最有效的方法。请务必不要忘记叫.clear()
上相应的元素(你真的不希望建立一个在20gig xml文件的内存树:对.getiterator()
在另一个答案描述的方法是稍微简单,但它首先需要整个树-我认为海报实际上也iterparse()
考虑到了)
回答by Joel Hooks
I've done this several times with Python, but never with such a big XML file. ElementTreeis an excellent XML library for Python that would be of assistance. If it was possible, I would divide the XML up into smaller files to make it easier to load into memory and parse.
我已经用 Python 做过几次了,但从来没有用过这么大的 XML 文件。ElementTree是一个出色的 Python XML 库,可以提供帮助。如果可能,我会将 XML 分成更小的文件,以便更容易加载到内存中并进行解析。
回答by John Saunders
It may be a common task, but maybe 20GB isn't as common with MySQL as it is with SQL Server.
这可能是一项常见的任务,但 20GB 可能在 MySQL 中不像在 SQL Server 中那么常见。
I've done this using SQL Server Integration Services and a bit of custom code. Whether you need either of those depends on what you need to do with 20GB of XML in a database. Is it going to be a single column of a single row of a table? One row per child element?
我已经使用 SQL Server Integration Services 和一些自定义代码完成了这项工作。您是否需要其中任何一个取决于您需要对数据库中的 20GB XML 做什么。它会是表格单行的单列吗?每个子元素一行?
SQL Server has an XML datatype if you simply want to store the XML as XML. This type allows you to do queries using XQuery, allows you to create XML indexes over the XML, and allows the XML column to be "strongly-typed" by referring it to a set of XML schemas, which you store in the database.
如果您只想将 XML 存储为 XML,则 SQL Server 具有 XML 数据类型。这种类型允许您使用 XQuery 进行查询,允许您在 XML 上创建 XML 索引,并允许通过将 XML 列引用到存储在数据库中的一组 XML 模式来“强类型化”XML 列。
回答by bortzmeyer
The MySQL documentationdoes not seem to indicate that XML import is restricted to version 6. It apparently works with 5, too.
在MySQL文档似乎并没有表示XML导入仅限于6个版本,显然与5作品了。