使用 JSON Input 步骤处理不均匀数据

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9397884/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-03 18:11:58  来源:igfitidea点击:

Use JSON Input step to process uneven data

jsonpentahokettledata-integration

提问by rsilva4

I'm trying to process the following with an JSON Input step:

我正在尝试使用 JSON 输入步骤处理以下内容:

{"address":[
  {"AddressId":"1_1","Street":"A Street"},
  {"AddressId":"1_101","Street":"Another Street"},
  {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},
  {"AddressId":"1_102","Locality":"New York"}
]}

However this seems not to be possible:

然而,这似乎是不可能的:

Json Input.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : 
The data structure is not the same inside the resource! 
We found 1 values for json path [$..Locality], which is different that the number retourned for path [$..Street] (3509 values). 
We MUST have the same number of values for all paths.

The step provides Ignore Missing Pathflag but it only works if all the rows misses the same path. In that case that step acts as as expected an fills the missing values with null.

该步骤提供了忽略缺失路径标志,但它仅在所有行都错过相同路径时才有效。在这种情况下,该步骤按预期运行,并用空值填充缺失值。

This limits the power of this step to read uneven data, which was really one of my priorities.

这限制了这一步读取不均匀数据的能力,这确实是我的优先事项之一。

My step Fields are defined as follows:

我的步骤字段定义如下:

JSON Input Fields definition

JSON 输入字段定义

Am I missing something? Is this the correct behavior?

我错过了什么吗?这是正确的行为吗?

回答by rsilva4

What I have done is use JSON Input using $.address[*] to read to a jsonRow field the full map of each element p.e:

我所做的是使用 JSON Input 使用 $.address[*] 将每个元素 pe 的完整映射读取到 jsonRow 字段:

{"address":[
    {"AddressId":"1_1","Street":"A Street"},  
    {"AddressId":"1_101","Street":"Another Street"},  
    {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},   
    {"AddressId":"1_102","Locality":"New York"} 
]}

This results in 4 jsonRows one for each element, p.e. jsonRow = {"AddressId":"1_101","Street":"Another Street"}. Then using a Javascript step I map my values using this:

这导致每个元素 pe 有 4 个 jsonRows jsonRow = {"AddressId":"1_101","Street":"Another Street"}。然后使用 Javascript 步骤,我使用以下方法映射我的值:

var AddressId = getFromMap('AddressId', jsonRow);
var Street = getFromMap('Street', jsonRow);
var Locality = getFromMap('Locality', jsonRow);

In a second script tab I inserted minified JSON parse code from https://github.com/douglascrockford/JSON-jsand the getFromMap function:

在第二个脚本选项卡中,我从https://github.com/douglascrockford/JSON-js和 getFromMap 函数插入了缩小的 JSON 解析代码:

function getFromMap(key,jsonRow){
  try{
   var map = JSON.parse(jsonRow);
  }
  catch(e){
   var message = "Unparsable JSON: "+jsonRow+" Desc: "+e.message;
   var nr_errors = 1;
   var field = "jsonRow";
   var errcode = "JSON_PARSE";
   _step_.putError(getInputRowMeta(), row, nr_errors, message, field, errcode);
   trans_Status = SKIP_TRANSFORMATION;
   return null;
  }

  if(map[key] == undefined){
   return null;
  }
  trans_Status = CONTINUE_TRANSFORMATION;
  return map[key]
}

回答by bsecker

You can solve this by changing the JSONPath and splitting up the steps in two JSON input steps. The following website explains a lot about JSONPath: http://goessner.net/articles/JsonPath/

您可以通过更改 JSONPath 并将步骤拆分为两个 JSON 输入步骤来解决此问题。以下网站对 JSONPath 进行了大量解释:http://goessner.net/articles/JsonPath/

$..AddressId

Does in fact return all the AddressId's in the address array, BUT since Pentaho is using grid rows for input and output [4 rows x 3 columns], it can't handle a missing value aka null value when you want as results return all the Streets (3 rows) and return all the Locality (2 rows), simply because there are no null values in the array itself as in you can't drive out of your garage with 3 wheels on your car instead of the usual 4.

实际上是否返回地址数组中的所有 AddressId,但是由于 Pentaho 使用网格行进行输入和输出 [4 行 x 3 列],当您希望结果返回所有Streets(3 行)并返回所有 Locality(2 行),这仅仅是因为数组本身没有空值,因为你不能用 3 个轮子而不是通常的 4 个轮子把你的车开出车库。

I guess your script returns null (where X is zero) values like:

我猜您的脚本返回 null (其中 X 为零)值,例如:

A S X
A S X
A S L
A X L

The scripting step can be avoided same by changing the Fields path of the first JSONinput step into:

通过将第一个 JSONinput 步骤的 Fields 路径更改为:

$.address[*]

This is to retrieve all the 4 address lines. Create a next JSONinput step based on the new source field which contains the address line(s) to retrieve the address details per line:

这是为了检索所有 4 个地址线。根据包含地址行的新源字段创建下一个 JSONinput 步骤,以检索每行的地址详细信息:

$.AddressId
$.Street
$.Locality

This yields the null values on the four address lines when a address details is not available in an address line.

当地址线中的地址详细信息不可用时,这会在四个地址线上产生空值。