Oracle:加载一个大的 xml 文件?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/998055/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Oracle: loading a large xml file?
提问by Mark Harrison
So now that I have a large bit of XML data I'm interested in:
所以现在我有很多我感兴趣的 XML 数据:
http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump
http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump
I'd like to load this into Oracle to play with.
我想将其加载到 Oracle 中进行播放。
How can I directly load a large XML file directly into Oracle? Server-side solutions (where the data file can be opened on the server) and client-side solutions welcomed.
如何将大型 XML 文件直接加载到 Oracle 中?欢迎服务器端解决方案(可以在服务器上打开数据文件)和客户端解决方案。
Here's a bit of badges.xml for a concrete example.
这是一个具体示例的一些badges.xml。
<?xml version="1.0" encoding="UTF-8" ?>
<badges>
<row UserId="3718" Name="Teacher" Date="2008-09-15T08:55:03.923"/>
<row UserId="994" Name="Teacher" Date="2008-09-15T08:55:03.957"/>
...
回答by Vincent Malgrat
You can access the XML files on the server via SQL. With your data in the /tmp/tmp.xml, you would first declare the directory:
您可以通过 SQL 访问服务器上的 XML 文件。使用 /tmp/tmp.xml 中的数据,您首先要声明目录:
SQL> create directory d as '/tmp';
Directory created
You could then query your XML File directly:
然后您可以直接查询您的 XML 文件:
SQL> SELECT XMLTYPE(bfilename('D', 'tmp.xml'), nls_charset_id('UTF8')) xml_data
2 FROM dual;
XML_DATA
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<badges>
[...]
To access the fields in your file, you could use the method described in another SOfor example:
要访问文件中的字段,您可以使用另一个 SO 中描述的方法,例如:
SQL> SELECT UserId, Name, to_timestamp(dt, 'YYYY-MM-DD"T"HH24:MI:SS.FF3') dt
2 FROM (SELECT XMLTYPE(bfilename('D', 'tmp.xml'),
nls_charset_id('UTF8')) xml_data
3 FROM dual),
4 XMLTable('for $i in /badges/row
5 return $i'
6 passing xml_data
7 columns UserId NUMBER path '@UserId',
8 Name VARCHAR2(50) path '@Name',
9 dt VARCHAR2(25) path '@Date');
USERID NAME DT
---------- ---------- ---------------------------
3718 Teacher 2008-09-15 08:55:03.923
994 Teacher 2008-09-15 08:55:03.957
回答by Jim Hudson
Seems like you're talking about 2 issues -- first, getting the XML document to where Oracle can see it. And then maybe making it so that standard relational tools can be applied to the data.
似乎您在谈论 2 个问题——首先,将 XML 文档放到 Oracle 可以看到的地方。然后也许可以将标准关系工具应用于数据。
For the first, you or your DBA can create a table with a BLOB, CLOB, or BFILE column and load the data. If you have access to the server on which the database lives, you can define a DIRECTORY object in the database that points to an operating system directory. Then put your file there. And then either set it up as a BFILE or read it in. (CLOB and BLOB store in the database; BFILE stores a pointed to a file on the operating system side).
首先,您或您的 DBA 可以创建一个包含 BLOB、CLOB 或 BFILE 列的表并加载数据。如果您有权访问数据库所在的服务器,则可以在数据库中定义一个指向操作系统目录的 DIRECTORY 对象。然后把你的文件放在那里。然后要么将其设置为 BFILE,要么将其读入。(CLOB 和 BLOB 存储在数据库中;BFILE 存储指向操作系统端的文件)。
Alternatively , use some tool that will let you directly write CLOBs to the database. Anyway, that gets you to the point where you can see the XML instance document in the database.
或者,使用一些工具可以让您直接将 CLOB 写入数据库。无论如何,这让您可以在数据库中看到 XML 实例文档。
So now you have the instance document visible. Step 1 is done.
所以现在您可以看到实例文档。步骤 1 完成。
Depending on the version, Oracle has some pretty good tools for shredding the XML into relational tables.
根据版本的不同,Oracle 有一些非常好的工具可以将 XML 分解为关系表。
It can be pretty declarative. While this gets beyond what I've actually done (I have a project where I'll be trying it this fall), you can theoretically load your XML Schema into the database and annotate it with the crosswalk between the relational tables and the XML. Then take your CLOB or BFILE and convert it to an XMLTYPE column with the defined schema and you're done -- the shredding happens automatically, the data is all there, it's all relational, it's all available to standard SQL without the XQUERY or XML extensions.
它可以是非常声明性的。虽然这超出了我实际完成的范围(我有一个项目,我将在今年秋天尝试它),但理论上您可以将 XML 模式加载到数据库中,并使用关系表和 XML 之间的交叉路口对其进行注释。然后将您的 CLOB 或 BFILE 并将其转换为具有定义模式的 XMLTYPE 列,您就完成了 - 分解会自动发生,数据都在那里,都是关系型的,所有这些都可用于标准 SQL,无需 XQUERY 或 XML扩展名。
Of course, if you'd rather use XQUERY, then just take the CLOB or BFILE, convert it to an XMLTYPE, and go for it.
当然,如果您更愿意使用 XQUERY,那么只需使用 CLOB 或 BFILE,将其转换为 XMLTYPE,然后继续使用。
回答by Pierre
I would do a simple:
我会做一个简单的:
grep '<row' file.xml |\
gawk -F '"' '{printf("insert into badges(userid,name,date) values (\"%s\",\"%s\",\"%s\");\n",,,); } > request.sql
or you can create A java program using a SAX parser. Each time your handler finds a new Element 'row', you get the attributes and insert a new record in your database.
或者您可以使用 SAX 解析器创建一个 java 程序。每次您的处理程序找到一个新的元素“行”时,您都会获取属性并在数据库中插入一条新记录。