Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19936620/

Time: 2020-09-08 09:07:21  Source: igfitidea

Viewing a very large CSV file?

Tags: database, csv

Asked by David michael

I have a very large 3.5 GB CSV file that I'd like to be able to read, sort through and filter for results based on various inputs. I'm pretty sure I can just import it to a MySQL database and go from there, but is there any program or online tool available that involves simply uploading the CSV and the rest is automatic?

Accepted answer by Neil McGuigan

You could try PostgreSQL 9.1+ and its file_fdw (File Foreign Data Wrapper), which would pretend that the CSV file is a table. If you replaced the CSV file with another CSV file of the same name, then you would see the new info immediately in the database.
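
A minimal sketch of what the file_fdw setup might look like. The server name csv_files and the table name my_csv_ft are made up for illustration, and the column list is a placeholder that has to match the columns of your own CSV:

```sql
-- file_fdw ships with PostgreSQL as a contrib extension.
create extension if not exists file_fdw;

-- "csv_files" is a hypothetical server name.
create server csv_files foreign data wrapper file_fdw;

-- "my_csv_ft" is a hypothetical table name; list every CSV column here, in order.
-- Drop the header option if the file has no header row.
create foreign table my_csv_ft (
  some_field text
) server csv_files
options (filename '/srv/vendor-name/my.csv', format 'csv', header 'true');

-- Each query re-reads the file, so a replaced file shows up immediately.
select count(*) from my_csv_ft;
```

Because every query scans the whole file, this is convenient for occasional lookups but slow for repeated filtering on a multi-gigabyte CSV.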

You can improve performance by using a materialized view (PG 9.3+), which essentially creates a real database table from the CSV data. You could use pgAgent to refresh the materialized view on a schedule.
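
As a rough illustration, assuming a foreign table named my_csv_ft (a hypothetical name) already exists over the CSV file, the materialized view and its refresh could look like this:

```sql
-- Copy the CSV data into a real, indexable relation (PostgreSQL 9.3+).
-- "my_csv_mv" and "my_csv_ft" are hypothetical names.
create materialized view my_csv_mv as
select * from my_csv_ft;

-- Unlike the foreign table, the materialized view can be indexed.
create index my_csv_mv_some_field_idx on my_csv_mv (some_field);

-- Re-run this (for example from a scheduled pgAgent job) after the CSV changes.
refresh materialized view my_csv_mv;
```

The trade-off is that queries no longer see changes to the file until the next refresh.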

Another alternative would be to use the COPY statement:

/* the columns in this table are the same as the columns in your csv: */
create table if not exists my_csv (
  some_field text, ...
);

/* COPY appends, so truncate the table if loading fresh data again: */
truncate table my_csv;

/*
COPY from a server-side file requires a postgres superuser
(or, in PostgreSQL 11+, membership in the pg_read_server_files role);
use psql \copy if you can't get those privileges.
Put the csv file in /srv/vendor-name/
*/

copy 
  my_csv 
from 
  '/srv/vendor-name/my.csv'
with (
  format csv
);
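
For the psql \copy route mentioned in the comment above, a sketch might look like the following. It assumes the my_csv table from the example has been created, and the filter value is a placeholder rather than anything from the original post:

```sql
-- Run inside psql. \copy is a psql meta-command, so it must fit on one line;
-- the path is resolved on the client machine, not on the server.
\copy my_csv from '/srv/vendor-name/my.csv' with (format csv)

-- Once loaded, filter and sort with ordinary SQL.
-- 'some value' is a placeholder.
select *
from my_csv
where some_field = 'some value'
order by some_field
limit 100;
```

Adding an index on the columns you filter by most often should speed up repeated queries on a table of this size.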

Answer by Estevão Lucas

Yes, there is.

You can use OpenRefine (formerly Google Refine). OpenRefine is like a spreadsheet on steroids.

The file size that you can manipulate depends on your computer's memory.

Answer by Siddarth Kanted

Since it is a CSV file, you can open it in OpenRefine:

  1. Download OpenRefine from http://openrefine.org/download.html
  2. It is open source. Unzip openrefine.zip.
  3. Run openrefine-2.7-rc.1\openrefine.exe.
  4. It is a web app, so open http://127.0.0.1:3333/ in Chrome.
  5. Upload the large CSV file. In my case the file size was 3.61 GB and it opened successfully.

https://snag.gy/02WECq.jpg

Answer by Gunnar Bernstein

I had the same problem with a CSV file of over 3 million lines. It could not be opened in OpenOffice Calc, Writer, or Notepad++.

Then I used OpenOffice 4 Base as a poor man's solution, since it can link to a CSV file. A short description follows (the wording may not be exact, as I use the German OpenOffice).

  1. Prepare: your file needs a .csv extension, and the first line should contain the field names. Put the file in a new directory on its own to avoid confusion; otherwise all files in the directory will be imported.
  2. File - New - Database. The wizard should come up.
  3. Choose "Connect to an existing database", format TEXT (your file needs to have a .csv extension).
  4. Next. Choose the path to the directory containing the file (oddly, not the file itself). Choose csv. Choose the correct field delimiters.
  5. Next and Finish.
  6. Choose a name for your newly created db.

If everything is right, you now see the table view with your newly created table.

You can also use gVim to view the file as you would in Notepad, e.g. to add the first line with the column descriptions.

You may create queries on this table. As the table has no indexes, they are quite slow. Since OpenOffice does not show an hourglass while a query runs, it may seem that the system has crashed.

Base is very limited and feels like an early beta. Creating new tables in that DB is not possible (thus no insert query that selects from the text file).

Export to CSV is not possible. Reasonably sized query results can be copied and pasted into Calc, although this is time-consuming.

Answer by RangerRick

Sure, there are quite a few spreadsheet-like tools that support big data, IBM BigSheets being a major example.

For an online product with a free trial period, I'd recommend Datameer. I've had relatively good success with it.

Answer by jastr

CSV Explorer is an online tool to read, sort, and filter CSVs with millions of rows. Upload the CSV and it will automatically import it and let you start working with the data.

https://www.CSVExplorer.com

Answer by Mohammad Alqudah

I had a file with ~100 million records. I used the Linux command line to view the file (just taking a look).

$ more myBigFile.CSV

or

$ nano myBigFile.CSV

It worked with a 6 GB file.
