java Talend tExtractXMLField

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/4991318/

Time: 2020-10-30 08:58:23  Source: igfitidea

Talend tExtractXMLField

java xml xpath talend

Asked by AntonioCS

I have this job in Talend that is supposed to retrieve a field and loop through it.

My big problem is that the code is looping through the XML fields but it's returning null. Here is a sample of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
    <empresa>
        <imoveis>
            <imovel>
                <!-- some fields -->

                <fotos>
                    <nome id="" order="">photo1</nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                </fotos>
            </imovel>
            <!-- other entries here -->
        </imoveis>
    </empresa>
</empresas>

Now, using the tExtractXMLField component, I am trying to get the "fotos" element. Here is what I have in the component: [Image: tExtractXMLField component settings]

I have tried changing the XPath query and the XPath loop query, but either the job doesn't loop through the field or I get null in the value field in the tMap.

Here is an image of the job:

[Image: screenshot of the Talend job]

You can see that I have retrieved 4 items from the XML but what I get is null in the "nome" field. There must be something wrong with the XPath but I can't seem to find the problem :(

Hope someone can help me out. Thanks.

Note: I am using Talend v4.1.2 on Ubuntu 10.10 64-bit.

Answered by bluish

If you want to loop on <nome> nodes, your Loop XPath Query has to be

"/empresas/empresa/imoveis/imovel/fotos/nome"

and the foto_nome XPath Query should be something like

"text()"

Take care: I also corrected an error in your XML that could cause issues (the closing </imoveis> tag was missing the "s").
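To see how a loop query and a relative field query combine, here is a minimal sketch in plain Java using the standard javax.xml.xpath API. It is not Talend's generated code, only an illustration of the same two XPath expressions; the class name XPathLoopDemo, the method extractNomes and the inlined SAMPLE string are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathLoopDemo {
    // Hypothetical inline copy of the question's XML, reduced to the <fotos> part.
    static final String SAMPLE =
        "<empresas><empresa><imoveis><imovel><fotos>"
        + "<nome id=\"\" order=\"\">photo1</nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "</fotos></imovel></imoveis></empresa></empresas>";

    static List<String> extractNomes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // "Loop XPath query": one iteration per <nome> node.
        NodeList loop = (NodeList) xpath.evaluate(
            "/empresas/empresa/imoveis/imovel/fotos/nome", doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < loop.getLength(); i++) {
            Node nome = loop.item(i);
            // Field query, evaluated relative to the current loop node.
            values.add(xpath.evaluate("text()", nome));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractNomes(SAMPLE));
    }
}
```

The point of the sketch is that the field query is resolved relative to each node matched by the loop query, so the two expressions have to agree on which element is the current node.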

Answered by Andrei B.

There are two ways to go about it. One way is to use XMLinput directly, following the instructions that bluish mentioned.

The other way is to continue on the path that you chose. In the XMLinput, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass through the fotos element with the Get Nodes option checked. The XPath Query of your fotos element should be "../fotos" or ".".

Your extractXMLField component looks well configured. Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.

Answered by Brij

[Image: sample Talend job]
I have made a test job that should help. If I'm not mistaken, you want to get all the "nome" elements under the "fotos" tag.

Answered by MordicusEtCubitus

I think you are confusing reading XML with extracting XML from XML.

Reading XML: If the XML you have provided is the file read by your tFileInputXML, you don't need tExtractXMLField; just configure the tFileInputXML like this:

  • set the XPath loop to the <nome> elements, like this: "//nome"
  • add 3 columns in the tFileInputXML component: id, order and content
  • get the content column with the XPath query "."
  • get the id value with the XPath query "@id"
  • get the order value with the XPath query "@order"
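The column mapping above can be sketched outside Talend with the standard javax.xml.xpath API. This is only an illustration of what the three relative queries select, not what tFileInputXML runs internally. The class NomeColumnsDemo, the method readRows and the sample id/order values are hypothetical (the question's XML has empty attributes).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NomeColumnsDemo {
    // Hypothetical sample: the id and order values are invented for illustration.
    static final String SAMPLE =
        "<fotos>"
        + "<nome id=\"1\" order=\"2\">photo1</nome>"
        + "<nome id=\"\" order=\"\"></nome>"
        + "</fotos>";

    // Returns one row per <nome>, as {id, order, content}.
    static List<String[]> readRows(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Loop query: every <nome> element anywhere in the document.
        NodeList loop = (NodeList) xpath.evaluate("//nome", doc, XPathConstants.NODESET);

        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < loop.getLength(); i++) {
            Node nome = loop.item(i);
            rows.add(new String[] {
                xpath.evaluate("@id", nome),     // id column
                xpath.evaluate("@order", nome),  // order column
                xpath.evaluate(".", nome)        // content column
            });
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String[] row : readRows(SAMPLE)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```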

[Image: tFileInputXML configuration]

Extracting XML from XML: That is the goal of the tExtractXMLField component: it allows you to parse XML data contained in a database column or in another XML document as if it were itself a data flow.

In a nutshell, tExtractXMLField creates a flow of data from a column containing XML. It is very useful when parsing a SOAP query result: the server reply is usually provided as XML, like this one:

<arg2> 
  <![CDATA[
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <exportInscriptionEnLigneType>
      <date>2015-04-10</date>
      <nbDossiers>2</nbDossiers>
      <reference>20150410100</reference>
      <listeDossiers>
        <dossier>
          <numOrdre>1</numOrdre>
          <identifiantDossier>AAAAA</identifiantDossier>
        </dossier>
        <dossier>
          <numOrdre>2</numOrdre>
          <identifiantDossier>BBBBB</identifiantDossier>
        </dossier>
      </listeDossiers>
    </exportInscriptionEnLigneType>
]]>
</arg2> 

In the XML above, the <arg2> element contains an XML document that you may need to parse.

tExtractXMLField was created for this purpose. I've written a tutorial on how to do this; please have a look at "how to extract xml from xml". It is in French, but the screenshots may help you understand the few comments provided.
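The "XML inside XML" idea can be sketched in plain Java as two parsing passes: read the text of the outer element, then parse that text as a document of its own. This is an illustration under stated assumptions, not how tExtractXMLField is implemented; the class NestedXmlDemo, its methods and the shortened SAMPLE string are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NestedXmlDemo {
    // Hypothetical, shortened version of the SOAP reply shown above.
    static final String SAMPLE =
        "<arg2><![CDATA[\n"
        + "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<exportInscriptionEnLigneType><listeDossiers>"
        + "<dossier><numOrdre>1</numOrdre>"
        + "<identifiantDossier>AAAAA</identifiantDossier></dossier>"
        + "<dossier><numOrdre>2</numOrdre>"
        + "<identifiantDossier>BBBBB</identifiantDossier></dossier>"
        + "</listeDossiers></exportInscriptionEnLigneType>\n"
        + "]]></arg2>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    static List<String> extractDossierIds(String outerXml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Pass 1: the CDATA content is just text from the outer document's
        // point of view. Trim it so the inner XML declaration comes first.
        String innerXml = xpath.evaluate("/arg2", parse(outerXml)).trim();

        // Pass 2: parse that text as its own document and loop over <dossier>.
        NodeList loop = (NodeList) xpath.evaluate(
            "/exportInscriptionEnLigneType/listeDossiers/dossier",
            parse(innerXml), XPathConstants.NODESET);

        List<String> ids = new ArrayList<>();
        for (int i = 0; i < loop.getLength(); i++) {
            ids.add(xpath.evaluate("identifiantDossier", loop.item(i)));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractDossierIds(SAMPLE));
    }
}
```

The trim() call matters in this sketch because an XML declaration is only allowed at the very start of a document, and the CDATA section begins with whitespace.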

Hope it will help.

Best regards,

Answered by OpenCoderX

Try changing your loop XPath to the top-level element in the file, "empresas". Sometimes that works for me. I have also seen the <?xml version="1.0" encoding="ISO-8859-1"?> declaration cause problems before; you could try removing it.

Also make sure that the encoding is set correctly in the tFileInputXML.
