Notice: This page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/7920570/

Time: 2020-10-30 22:01:53  Source: igfitidea

Talend -- one row to many, variable number of output rows

Tags: java, talend

Asked by batman

Background: It's common in Talend to use something like tSplitRow to map one row with many fields into multiple rows. A row with fields:

Date | Name | MorningPhone | Day Phone | EveningPhone

...could be split into:

Date | Name | Phone

...and you'll always have 3 resulting rows from one row.



Question: What if I want a variable number of rows, determined by a variable number of fields?

I have a schema: UniqueID | FieldSet, where FieldSet is a delimited field whose number of columns is divisible by nine. If there are 45 fields in this delimited column, I want 5 rows. 81 fields => 9 rows.
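To make the arithmetic concrete, here is a minimal standalone Java sketch of the grouping being asked for. The class name `FieldSetSplitter`, the method `splitIntoRows`, and the pipe delimiter are assumptions for illustration; the question does not say which delimiter FieldSet uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldSetSplitter {

    /** Splits a pipe-delimited field set into rows of fieldsPerRow fields each. */
    static List<String[]> splitIntoRows(String fieldSet, int fieldsPerRow) {
        // The pipe is a regex metacharacter, so it is escaped for split().
        // The -1 limit keeps trailing empty fields so the count stays accurate.
        String[] fields = fieldSet.split("\\|", -1);
        if (fields.length % fieldsPerRow != 0) {
            throw new IllegalArgumentException(
                    "Expected a multiple of " + fieldsPerRow + " fields, got " + fields.length);
        }
        List<String[]> rows = new ArrayList<>();
        for (int start = 0; start < fields.length; start += fieldsPerRow) {
            rows.add(Arrays.copyOfRange(fields, start, start + fieldsPerRow));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Build a sample FieldSet with 45 fields: f1|f2|...|f45
        StringBuilder sample = new StringBuilder();
        for (int i = 1; i <= 45; i++) {
            if (i > 1) {
                sample.append('|');
            }
            sample.append('f').append(i);
        }
        List<String[]> rows = splitIntoRows(sample.toString(), 9);
        System.out.println(rows.size() + " rows");
        for (String[] row : rows) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

With 45 fields and nine fields per row, the list holds 5 rows; with 81 fields it would hold 9.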

I'm trying to use tJavaRow to parse the fields, but I don't know how to combine that with tSplitRow to generate the appropriate number of rows.

Ideas? Thanks!

Answered by batman

I used a custom tJavaRow -- this turned a specially formatted string into a new table. Sort of a hack, but it worked.

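// tJavaRow code: turn one delimited field into tab-separated rows of nine fields each.
final int FIELDS_PER_ROW = 9;

String input = "";
try {
      input = java.net.URLDecoder.decode(input_row.CustomField16, "ASCII");
} catch (java.io.UnsupportedEncodingException e) {
      e.printStackTrace();
}

// Only the part before the first ';' holds the pipe-delimited fields.
String[] pieces = input.split(";");

// The pipe is a regex metacharacter, so it must be escaped as "\\|".
String[] allfields = pieces[0].split("\\|");

StringBuilder out = new StringBuilder();

// Walk the fields in steps of nine; an incomplete trailing group is skipped.
for (int start = 0; start + FIELDS_PER_ROW <= allfields.length; start += FIELDS_PER_ROW) {

      StringBuilder xrow = new StringBuilder(allfields[start]);
      for (int j = start + 1; j < start + FIELDS_PER_ROW; j++) {
            xrow.append("\t").append(allfields[j]);
      }

      // Prefix every generated row with the columns shared by the input row.
      out.append(input_row.LoadTime).append("\t")
            .append(input_row.minutepart).append("\t")
            .append(input_row.TXID).append("\t")
            .append(input_row.SessionString).append("\t")
            .append(xrow).append("\n");
}

output_row.BULK = out.toString();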

Answered by Daniel San

Talend has evolved since this question was asked, and a much better way of doing this is to use the tNormalize component.

First, we use a file like this as input:

pepe|123|123
juan|454|2423|34343|5454

We read this file using a tFileInputRegex component. We have to define the regular expression and the schema. The regular expression will be (the backslash is doubled because Talend treats the field as a Java string):

"^([^|]+)\\|(.+)"

The schema (a screenshot in the original answer) has one column for each of the two capture groups in the regular expression.
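The regular expression can be checked outside Talend with plain `java.util.regex`. The sketch below is only an illustration: the class name `RegexSplitDemo` and the helper `parse` are hypothetical, and it does not reproduce how tFileInputRegex works internally. It shows that the first capture group takes everything before the first pipe and the second group takes the rest of the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSplitDemo {

    // Same pattern as in the answer, written as a Java string literal (backslash doubled).
    static final Pattern LINE = Pattern.compile("^([^|]+)\\|(.+)");

    /** Returns {firstGroup, secondGroup} for a matching line, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = { "pepe|123|123", "juan|454|2423|34343|5454" };
        for (String line : lines) {
            String[] cols = parse(line);
            System.out.println(cols[0] + " -> " + cols[1]);
        }
    }
}
```

For the two sample lines this prints `pepe -> 123|123` and `juan -> 454|2423|34343|5454`, so the second column still holds the delimited values that the next step has to split.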

Then, we connect the tFileInputRegex to a tNormalize. We set the item separator to:

"\\|"

Finally, we use the output as needed.
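The row expansion that the normalize step performs can be pictured with a few lines of plain Java. This is a rough sketch of the idea, not Talend's implementation; `NormalizeDemo` and `normalize` are hypothetical names, and the two-column input (a key plus a delimited value list) is assumed from the regular expression above.

```java
import java.util.ArrayList;
import java.util.List;

public class NormalizeDemo {

    /** Emits one {key, item} pair for every item in the delimited values string. */
    static List<String[]> normalize(String key, String values, String separatorRegex) {
        List<String[]> out = new ArrayList<>();
        for (String item : values.split(separatorRegex)) {
            out.add(new String[] { key, item });
        }
        return out;
    }

    public static void main(String[] args) {
        // The two rows produced by the regex step for the sample file.
        String[][] input = {
            { "pepe", "123|123" },
            { "juan", "454|2423|34343|5454" }
        };
        for (String[] row : input) {
            for (String[] outRow : normalize(row[0], row[1], "\\|")) {
                System.out.println(outRow[0] + "\t" + outRow[1]);
            }
        }
    }
}
```

In this sketch the sample file yields two output rows for pepe and four for juan, so the number of output rows follows the number of delimited values in each input row.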