Java ETL: hard to find a suitable one

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/4251336/

Time: 2020-08-14 15:01:08  Source: igfitidea

Tags: java, etl, embeddable

Question by tpdi

I'm looking for an embeddable Java ETL, i.e., an Extract Transform Load engine that can be called from Java code.

I'm finding it surprisingly hard to find a suitable one.

I'm mainly looking at loading delimited text files into database tables, with some minor transforms along the way.

I'd like the following features:

  • the ability to specify the simple mappings externally, e.g., text column 5 to database column foo, specified in some XML mapping file
  • the ability to give the database node a javax.sql.DataSource
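
The first requirement, an externally specified column mapping, is small enough to sketch with the JDK's built-in DOM parser. This is only an illustration under assumptions: the `<mapping table="..."><column source="5" target="foo"/></mapping>` file format and the `XmlColumnMapping` class are hypothetical and do not correspond to the mapping format of any tool discussed on this page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Hypothetical loader for an external mapping file such as:
 *
 *   <mapping table="my_table">
 *     <column source="5" target="foo"/>
 *   </mapping>
 *
 * "source" is the 1-based column in the delimited text file,
 * "target" is the database column it should be loaded into.
 */
final class XmlColumnMapping {
    final String table;
    final Map<Integer, String> columns; // source index -> target column, in file order

    private XmlColumnMapping(String table, Map<Integer, String> columns) {
        this.table = table;
        this.columns = columns;
    }

    static XmlColumnMapping parse(InputStream xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(xml)
                    .getDocumentElement();
            Map<Integer, String> columns = new LinkedHashMap<>();
            NodeList nodes = root.getElementsByTagName("column");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element column = (Element) nodes.item(i);
                columns.put(Integer.parseInt(column.getAttribute("source")),
                            column.getAttribute("target"));
            }
            return new XmlColumnMapping(root.getAttribute("table"), columns);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap parser failures so callers see a single unchecked error type
            throw new IllegalArgumentException("Unreadable mapping file", e);
        }
    }
}
```

Using a `LinkedHashMap` keeps the columns in the order they appear in the file, so a later INSERT statement can list the target columns in that same order.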

CloverETL allows mapping to be specified in XML, but database connections must be either JNDI names or a properties file specifying driverClass, url, dbusername, password, etc. Since I already have javax.sql.DataSource instances set up by my dependency injection framework, properties files seem painful and fragile, especially if I want this to work in several environments (dev, test, prod).

KETL tells me that "We are currently in the process of completely overhauling our documentation for KETL. Because of this, only the installation guide has been updated." Honest, but not helpful.

Octopus is now "http://www.together.at/prod/database/tdt", which is "under construction".

Pentaho seems to use the same "specify driverClass" style that CloverETL does, rather than using a DataSource, but Pentaho's documentation for calling their engine from Java code is just difficult to find.

Basically I'd really like to be able to do this pseudo-code:

extractTransformLoad(
        getInputFile("input.csv"),
        getXMLMapping("myMappingFile.xml"),
        new DatabaseWriter(getDatasource()));
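
For the simple case described (delimited text into one table, light transforms), the wished-for call can be approximated in plain JDBC. This sketch is not the API of any ETL tool mentioned here; `SimpleDelimitedLoader`, `insertSql`, and `mapRow` are hypothetical names. It assumes the mapping is an ordered map from 1-based text column index to database column name (for example, one read from an external mapping file), that table and column names come from trusted configuration because they are concatenated into the SQL, and that the database can convert string parameters to the target column types.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import javax.sql.DataSource;

/** Hypothetical plain-JDBC stand-in for the extractTransformLoad(...) call above. */
final class SimpleDelimitedLoader {
    private static final int BATCH_SIZE = 1000;

    private SimpleDelimitedLoader() {}

    /** e.g. {5=foo, 1=id} -> "INSERT INTO my_table (foo, id) VALUES (?, ?)" */
    static String insertSql(String table, Map<Integer, String> mapping) {
        String columns = String.join(", ", mapping.values());
        String placeholders = String.join(", ", Collections.nCopies(mapping.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
    }

    /** Picks the mapped fields (1-based indexes) out of one line, trimming each as a minor transform. */
    static List<String> mapRow(String line, String delimiter, Map<Integer, String> mapping) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        List<String> values = new ArrayList<>();
        for (int source : mapping.keySet()) {
            values.add(fields[source - 1].trim());
        }
        return values;
    }

    /** Loads every non-empty line of input into table through the injected DataSource; returns the row count. */
    static int extractTransformLoad(Reader input, String delimiter, String table,
                                    Map<Integer, String> mapping, DataSource dataSource)
            throws IOException, SQLException {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(insertSql(table, mapping));
             BufferedReader reader = new BufferedReader(input)) {
            conn.setAutoCommit(false);
            int rows = 0;
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.isEmpty()) {
                        continue;
                    }
                    List<String> values = mapRow(line, delimiter, mapping);
                    for (int i = 0; i < values.size(); i++) {
                        stmt.setString(i + 1, values.get(i)); // JDBC parameters are 1-based
                    }
                    stmt.addBatch();
                    if (++rows % BATCH_SIZE == 0) {
                        stmt.executeBatch(); // flush periodically to bound memory
                    }
                }
                stmt.executeBatch();
                conn.commit();
            } catch (IOException | SQLException e) {
                conn.rollback(); // undo the partial load before rethrowing
                throw e;
            }
            return rows;
        }
    }
}
```

Because the DataSource is passed in rather than built from driverClass/url properties, the same code runs against whatever dev, test, or prod DataSource the dependency injection framework supplies. The naive `split` does not handle quoted fields that contain the delimiter; that, along with type conversion and error reporting, is where a real CSV parser or one of the ETL tools earns its keep.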

Any suggestions?

Answer by Aravind Yarram

Here is a list of all the Java-based open source ETL libraries. I see you have evaluated a few of them already, but there are more. Also, this seems to be a duplicate of https://stackoverflow.com/questions/272517/please-recommend-a-powerful-java-based-etl-framework

Answer by Loïc Guillois

Do you know Talend?

It's a tool based on Eclipse (Talend Open Studio), but you can use it directly in Java by writing your own code or by exporting jobs to Java classes.

Answer by Agad

CloverETL Engine is easily embeddable as well as extensible, so you can write your own connection and plug it into CloverETL. The DBConnection object will be slightly changed in CloverETL 3.1 to make it more extensible, and implementing a descendant that uses a DataSource to connect to the database will be child's play.

Answer by ejboy

Disclosure: I'm the author of Scriptella ETL, but I believe this tool might be useful for your case.

It's a lightweight open source ETL with one-liner integration with Java. It also supports the Spring Framework and comes with built-in drivers for CSV, text, XML, Excel and other data sources.

Example of importing a CSV file into a table:

<!DOCTYPE etl SYSTEM "http://scriptella.org/dtd/etl.dtd">
<etl>
  <connection id="in" driver="csv" url="data.csv" />
  <connection id="out" driver="oracle" url="jdbc:oracle:thin:@localhost:1521:ORCL" 
      classpath="ojdbc14.jar" user="scott" password="tiger" />
  <!-- Copy all CSV rows to a database table -->
  <query connection-id="in">
      <!-- Empty query means select all columns -->
      <script connection-id="out">
          INSERT INTO Table_Name VALUES (?id, ?priority, ?summary, ?status)
      </script>
  </query>
</etl>

Running from Java:

import java.io.File;
import scriptella.execution.EtlExecutor;
// Execute the etl.xml file
EtlExecutor.newExecutor(new File("etl.xml")).execute();

Running from command-line:

scriptella [file_name]

Integration with Spring:

  1. Use the "spring" driver and the name of the bean to reference data sources. Example:

    <connection id="spring" driver="spring" url="datasourceBeanName" />
    
  2. Add EtlExecutorBean to the application context in order to execute the job:

    <bean id="createDb" class="scriptella.driver.spring.EtlExecutorBean">
        <property name="configLocation" value="create-db.etl.xml" />
        <property name="progressIndicator"><ref local="progress" /></property>
        <property name="autostart" value="true" /> <!-- Etl will be run during app context initialization -->
    </bean>
    

For additional details see the Spring example.
