Java 通过 JDBC 对 CSV 文件执行 SQL

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1468998/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-12 12:38:10  来源:igfitidea点击:

Execute SQL on CSV files via JDBC

javasqltextjdbccsv

提问by Markos Fragkakis

I need to apply an SQL query to CSV files (comma-separated text files). My SQL is predefined from another tool, and is not eligible to change. It may contain embedded selects and table aliases in the FROM part.

我需要将 SQL 查询应用于 CSV 文件(逗号分隔的文本文件)。我的 SQL 是从另一个工具预定义的,不能更改。它可能在 FROM 部分中包含嵌入的选择和表别名。

For my task I have found two open-source(this is a project requirement) libraries that provide JDBC drivers:

对于我的任务,我找到了两个提供 JDBC 驱动程序的开源(这是一个项目要求)库:

  1. CsvJdbc
  2. XlSQL
  3. JBoss Teiid
  4. Create an Apache Derby DB, load all CSVs as tables and execute the query.
  1. 数据表
  2. XlSQL
  3. JBoss Teiid
  4. 创建一个 Apache Derby DB,将所有 CSV 加载为表并执行查询。

These are the problems I encountered:

这些是我遇到的问题:

  1. it does not accept the syntax of the SQL (it uses internal selects and table aliases). Furthermore, it has not been maintained since 2004.
  2. I could not get it to work, as it has as dependency a SAX Parser that causes exception when parsing other documents. Similarly, no change since 2004.
  3. Have not checked if it supports the syntax, but seems like an overhead. It needs several entities defines (Virtual Databases, Bindings). From the mailing list they told me that last release supports runtime creation of required objects. Has anyone used it for such simple task (normally it can connect to several types of data, like CSV, XML or other DBS and create a virtual, unified one)?
  4. Can this even be done easily?
  1. 它不接受 SQL 的语法(它使用内部选择和表别名)。此外,它自 2004 年以来就没有得到维护。
  2. 我无法让它工作,因为它依赖一个 SAX 解析器,在解析其他文档时会导致异常。同样,自 2004 年以来没有变化。
  3. 没有检查它是否支持语法,但似乎是一个开销。它需要定义多个实体(虚拟数据库、绑定)。从邮件列表中,他们告诉我上一个版本支持所需对象的运行时创建。有没有人将它用于如此简单的任务(通常它可以连接到多种类型的数据,如 CSV、XML 或其他 DBS 并创建一个虚拟的、统一的)?
  4. 这甚至可以轻松完成吗?

From the 4 things I considered/tried, only 3 and 4 seem to me viable. Any advice on these, or any other way in which I can query my CSV files?

从我考虑/尝试过的 4 件事中,只有 3 和 4 件事在我看来是可行的。关于这些的任何建议,或我可以查询我的 CSV 文件的任何其他方式?

Cheers

干杯

采纳答案by ChssPly76

If your SQL is predefined and cannot be changed your best option is to load your CSV into a database and run queries against it.

如果您的 SQL 是预定义的并且无法更改,您最好的选择是将您的 CSV 加载到数据库中并对其运行查询。

Apache Derby is a viable option, so are MySQL, which even has a CSV storage engineor PostgreSQL.

Apache Derby 是一个可行的选择,MySQL也是一个可行的选择,它甚至有一个CSV 存储引擎PostgreSQL

Does your SQL use any proprietary functions / extensions? If so, that may limit your choices.

您的 SQL 是否使用任何专有函数/扩展?如果是这样,那可能会限制您的选择。

回答by alex

I'd say embedded db. I'd suggest either Javadb (Derby built into the Java API) or H2 if you don't care about pulling the extra dependency.

我会说嵌入式数据库。如果您不关心拉出额外的依赖项,我建议您使用 Javadb(Derby 内置于 Java API)或 H2。

回答by Vladimir Dyuzhev

I would load the data into HSQL (HypersonicSQL). Pure Java, correct SQL, well-proven. Pretty much anything else has a bigger footprint.

我会将数据加载到 HSQL (HysonicSQL) 中。纯 Java,正确的 SQL,久经考验。几乎任何其他东西都有更大的足迹。

回答by Kolmogorov

There is a Groovy script, gcsvsql that lets you treat csv files as database tables, including joins. With gcsvsql you can do things like:

有一个 Groovy 脚本 gcsvsql 可让您将 csv 文件视为数据库表,包括连接。使用 gcsvsql,您可以执行以下操作:

gcsvsql "select * from people.csv where age > 40"

gcsvsql "select people.name,children.child from people.csv,children.csv where people.name=children.name"

gcsvsql "select avg(score) from people.csv where age < 40"

gcsvsql "select * from people.csv where age > 40"

gcsvsql "从 people.csv,children.csv 中选择 people.name,children.child,其中 people.name=children.name"

gcsvsql“从 people.csv 中选择 avg(score),其中年龄 < 40”

You can find this script, which is based on the h2 database engine, at Google code here:

你可以在谷歌代码中找到这个基于 h2 数据库引擎的脚本:

http://code.google.com/p/gcsvsql/

http://code.google.com/p/gcsvsql/

回答by Kolmogorov

If you are wanting to treat csv files as databases from within a Java program, you should look at the h2 database engine.It has really nice support for reading/writing CSV files and working with in-memory databases. It's a successor to hsql, faster and with added features. You can read about the csv support in the h2 tutorial.

如果您想在 Java 程序中将 csv 文件视为数据库,您应该查看h2 数据库引擎。它对读/写 CSV 文件和使用内存数据库有很好的支持。它是 hsql 的后继者,速度更快且具有附加功能。您可以在 h2 教程中阅读有关 csv 支持的信息。

回答by mariotomo

maybe a bit late, sorry for that.

可能有点晚了,抱歉。

I've been developing the csvjdbcfor over a year now and since a few weeks I've got "administrator" rights on that project so I've been able to release the most recent version I had produced. it does all "we" need (we: me and my current my colleagues) need, and I'm adding things as bugs are filed.

我开发csvjdbc已经一年多了,几周后我就获得了该项目的“管理员”权限,因此我能够发布我制作的最新版本。它满足了所有“我们”需要(我们:我和我现在的同事)的需要,并且我正在添加一些东西作为错误提交。

have a look at it now and decide again. (the web documentation still needs reviewing, for better insight, check the test cases, which are very extensive).

现在看看它,然后再决定。(Web 文档仍然需要,为了更好地了解,请检查非常广泛的测试用例)。

回答by Mirko

I know, it's a very old case, but...

我知道,这是一个非常古老的案例,但是......

CsvJdbcis a cool library, but there are some issues using DbUtilswhile mapping ResultsSets to PoJos. A second bad thing is, that dosn't have a good support for different Datatypes.

CsvJdbc是一个很酷的库,但是在将 ResultsSets 映射到 PoJos 时使用DbUtils存在一些问题。第二个坏事是,它对不同的数据类型没有很好的支持。

After playing with CSVJdbcI will use a stupid CsvParser to read the Files an pump them into a HsqlDB or something like that.

在玩过CSVJdbc之后,我将使用一个愚蠢的 CsvParser 来读取文件并将它们泵入 HsqlDB 或类似的东西。