Generating large XML files from Oracle: best practices

Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16897448/

Time: 2020-09-19 01:42:13  Source: igfitidea

oracle plsql

Asked by ErikL

I'm currently starting work on a project where I will have to write code (pl/sql) to export large XML files, based upon several tables in a database.

The export files could become quite large and could contain up to 700,000 customers (with their addresses, orders, telephone numbers, etc.).

I was wondering if anyone had some tips on the best approach for this. I could obviously just write out selects with lots of XMLELEMENTs in them, but that would mean the whole file would be generated in memory.

There's also an XML schema (XSD) available with which the files have to comply. I was also wondering if there's any way to "map" the tables to the XML schema.

Any tips are appreciated.

Accepted answer by Ben

XML has some... shortcomings in this area. Large XML files, as you note, can hog RAM and UNDO like there's no tomorrow.

I don't honestly believe that something called "best practices" exists; it all depends on your own database, server and queries. However, here's what a colleague (I can't claim credit) has just done in order to write a huge amount (4.5GB) of XML to disk from a large number (20?) of large tables (10-400m rows) with extremely complex sub-queries.

  • Actually write out all of those XMLElements

  • If your SELECT statement is at all complex, create a table first.

  • Select from the table, taking one reasonable element at a time, hopefully based on your ID. For instance, if you have the following structure, it would make sense to split it on <record>:

    <someXML>
        <record ID="1">
            <blah>
                <moreBlah/>
            </blah>
        </record>
        <record ID="2">
            <blah>
                <moreBlah/>
            </blah>
        </record>
    </someXML>
    
  • Select each record as a CLOB from the database. You then end up with a series of CLOBs that will make up your output XML.

  • Write the opening tag first, then write each CLOB to disk, individually or in chunks, and finish with the closing tag.

  • Ensure you write to disk locally. If writing over the network can't be avoided, write to a network share that has a big fat cable pointing at it. You can always move your file afterwards, and this'll be more efficient than writing across the network (or a city/country) in chunks.

  • Parallelize! This isn't always possible but if you can do it then do so.

  • Be careful of parallelizing. You don't want to be writing malformed XML.
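
The steps above could be sketched in PL/SQL roughly as follows. This is only a minimal sketch under assumptions, not the colleague's actual code: `export_records` (with `id` and `blah_text` columns) stands in for the table created up front, `XML_EXPORT_DIR` is a hypothetical directory object that would have to exist on the database server, and `export.xml` is a made-up file name. It writes in binary mode with UTL_FILE.put_raw so that a record with no line breaks is not subject to the text-mode line-length limit.

```plsql
DECLARE
  -- Characters per write; kept small so a multi-byte chunk still fits in 32767 bytes.
  c_chunk CONSTANT PLS_INTEGER := 8000;
  v_file  UTL_FILE.file_type;
  v_len   INTEGER;
  v_pos   INTEGER;

  -- Write a piece of text to the open file and flush it immediately.
  PROCEDURE write_text(p_text IN VARCHAR2) IS
  BEGIN
    UTL_FILE.put_raw(v_file, UTL_RAW.cast_to_raw(p_text), TRUE);
  END write_text;
BEGIN
  -- XML_EXPORT_DIR and export.xml are assumptions for this sketch.
  v_file := UTL_FILE.fopen('XML_EXPORT_DIR', 'export.xml', 'wb', 32767);

  -- Opening tag first.
  write_text('<someXML>' || CHR(10));

  -- One <record> element per row, each fetched as its own CLOB.
  -- export_records is a hypothetical staging table.
  FOR r IN (
    SELECT XMLSERIALIZE(CONTENT
             XMLELEMENT("record",
               XMLATTRIBUTES(t.id AS "ID"),
               XMLELEMENT("blah", t.blah_text))
             AS CLOB) AS record_xml
    FROM   export_records t
    ORDER  BY t.id
  ) LOOP
    -- Write this record's CLOB to disk in chunks.
    v_len := DBMS_LOB.getlength(r.record_xml);
    v_pos := 1;
    WHILE v_pos <= v_len LOOP
      write_text(DBMS_LOB.substr(r.record_xml, c_chunk, v_pos));
      v_pos := v_pos + c_chunk;
    END LOOP;
    write_text(CHR(10));
  END LOOP;

  -- Closing tag last, so the file is a single well-formed document.
  write_text('</someXML>' || CHR(10));
  UTL_FILE.fclose(v_file);
EXCEPTION
  WHEN OTHERS THEN
    -- Only close the file if it was actually opened, then re-raise.
    IF UTL_FILE.is_open(v_file) THEN
      UTL_FILE.fclose(v_file);
    END IF;
    RAISE;
END;
/
```

Only one record's CLOB is held in memory at a time. To parallelize, each session would need its own ID range and its own file, so that partial writes never interleave.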

I'm effectively advocating tbone's approach, except doing it in chunks instead. Whatever you do, avoid putting the entire thing in memory.

Answer by tbone

Try using DBMS_XMLGEN first. There are other approaches as well; see this Oracle XML DB doc.

DECLARE
  v_ctx   DBMS_XMLGEN.ctxhandle;
  v_file  UTL_FILE.file_type;
  v_xml   CLOB;
  v_more  BOOLEAN := TRUE;
BEGIN
  -- Create XML context.
  v_ctx := DBMS_XMLGEN.newcontext('SELECT table_name, tablespace_name FROM user_tables WHERE rownum < 6');

  -- Set parameters to alter default Rowset and Row tag names and default case.
  DBMS_XMLGEN.setrowsettag(v_ctx, 'USER_TABLES'); 
  DBMS_XMLGEN.setrowtag(v_ctx, 'TABLE'); 
  --DBMS_XMLGEN.settagcase(v_ctx, DBMS_XMLGen.LOWER_CASE);

  -- Add an IE specific XSL stylesheet reference so browser can transform the file.
  --DBMS_XMLGEN.setstylesheetheader(v_ctx, 'C:\Development\XML\IEStyle.xsl', 'text/xsl');

  -- Create the XML document.
  v_xml := DBMS_XMLGEN.getxml(v_ctx);
  DBMS_XMLGEN.closecontext(v_ctx);

  -- Output XML document to file.
  -- max_linesize is set explicitly; the default (about 1024) is easy to exceed.
  v_file := UTL_FILE.fopen('C:\Development\XML\', 'test1.xml', 'w', 32767);
  WHILE v_more LOOP
    UTL_FILE.put(v_file, SUBSTR(v_xml, 1, 32767));
    IF LENGTH(v_xml) > 32767 THEN
      v_xml :=  SUBSTR(v_xml, 32768);
    ELSE
      v_more := FALSE;
    END IF;
  END LOOP;
  UTL_FILE.fclose(v_file);

  -- test insert into table
  /*
  insert into t_clob (clob_col) values (v_xml);
  commit;
  */
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.put_line(SUBSTR(SQLERRM, 1, 255));
    -- Only close the file if it was actually opened.
    IF UTL_FILE.is_open(v_file) THEN
      UTL_FILE.fclose(v_file);
    END IF;
END;

Note that I borrowed most of this from the excellent oracle-base site.

Answer by Nime Cloud

Another trick is writing the results into multiple XML files, e.g. 10,000 rows per file, with names like Table_01_Rows_00001_99999.xml. Then merge the XML files later if necessary.
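
One way this could look, as a rough sketch, is to let DBMS_XMLGEN page through the query with setmaxrows and write each page to its own file. The directory object `XML_EXPORT_DIR` and the `user_tables_NNNN.xml` naming scheme are assumptions made up for this example, and the query is just the one from tbone's answer without the rownum filter.

```plsql
DECLARE
  c_rows_per_file CONSTANT PLS_INTEGER := 10000;  -- rows per output file
  c_chunk         CONSTANT PLS_INTEGER := 8000;   -- characters per write
  v_ctx     DBMS_XMLGEN.ctxhandle;
  v_xml     CLOB;
  v_file    UTL_FILE.file_type;
  v_file_no PLS_INTEGER := 0;
  v_len     INTEGER;
  v_pos     INTEGER;
BEGIN
  v_ctx := DBMS_XMLGEN.newcontext(
             'SELECT table_name, tablespace_name FROM user_tables ORDER BY table_name');
  DBMS_XMLGEN.setrowsettag(v_ctx, 'USER_TABLES');
  DBMS_XMLGEN.setrowtag(v_ctx, 'TABLE');

  -- Each getxml call now returns at most c_rows_per_file rows.
  DBMS_XMLGEN.setmaxrows(v_ctx, c_rows_per_file);

  LOOP
    v_xml := DBMS_XMLGEN.getxml(v_ctx);
    -- Stop once a call fetches no more rows.
    EXIT WHEN DBMS_XMLGEN.getnumrowsprocessed(v_ctx) = 0;

    -- XML_EXPORT_DIR and the file name pattern are hypothetical.
    v_file_no := v_file_no + 1;
    v_file := UTL_FILE.fopen('XML_EXPORT_DIR',
                             'user_tables_' || TO_CHAR(v_file_no, 'FM0000') || '.xml',
                             'wb', 32767);

    -- Write this page's document to its file in chunks.
    v_len := DBMS_LOB.getlength(v_xml);
    v_pos := 1;
    WHILE v_pos <= v_len LOOP
      UTL_FILE.put_raw(v_file,
                       UTL_RAW.cast_to_raw(DBMS_LOB.substr(v_xml, c_chunk, v_pos)),
                       TRUE);
      v_pos := v_pos + c_chunk;
    END LOOP;
    UTL_FILE.fclose(v_file);
  END LOOP;

  DBMS_XMLGEN.closecontext(v_ctx);
EXCEPTION
  WHEN OTHERS THEN
    -- Only close the file if it is still open, then re-raise.
    IF UTL_FILE.is_open(v_file) THEN
      UTL_FILE.fclose(v_file);
    END IF;
    RAISE;
END;
/
```

Each file produced this way is a complete document with its own USER_TABLES root element, so merging them later means stripping the repeated root tags rather than simply concatenating the files.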
