Java: How to find out whether a file is a CSV file?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/3068545/

Time: 2020-10-30 00:11:38  Source: igfitidea

How to find if the file is a CSV file?

Tags: java, csv, web-applications

Asked by Mithun

I have a scenario wherein the user uploads a file to the system. The only file type that the system understands is CSV, but the user can upload any type of file, e.g. jpeg, doc, html. I need to throw an exception if the user uploads anything other than a CSV file.

Can anybody tell me how I can find out whether the uploaded file is a CSV file or not?

Accepted answer by Willis Blackburn

I can think of several methods.

One way is to try to decode the file using UTF-8. (This is built into Java and is probably built into .NET too.) If the file decodes properly, then you at least know that it's a text file of some kind.
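
As a rough sketch of that decoding check in Java: a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws on invalid byte sequences instead of silently replacing them. The class and method names here are hypothetical, and the sketch assumes the upload is already in memory as a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /** Returns true if the bytes decode as valid UTF-8 (hypothetical helper). */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A successful decode only suggests that the file is text; a binary format such as JPEG will usually fail quickly, but passing this check says nothing yet about lines and fields.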

Once you know it's a text file, parse out the individual fields from each line and check that you get the number of fields that you expect. If the number of fields per line is inconsistent then you might just have a file that contains text but is not organized into lines and fields.

Otherwise you have a CSV. Then you can validate the fields.
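
The field-count step might look something like the following. This is a minimal sketch, not a full CSV parser: it assumes comma separators, double-quote quoting, and no quoted fields that span several lines. The names `CsvShapeCheck`, `countFields`, and `hasConsistentFields` are made up for illustration.

```java
import java.util.List;

final class CsvShapeCheck {
    /** Counts fields in one line, treating commas inside double quotes as data. */
    static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    /** True if there is at least one line and every line has expectedFields fields. */
    static boolean hasConsistentFields(List<String> lines, int expectedFields) {
        if (lines.isEmpty()) {
            return false;
        }
        for (String line : lines) {
            if (countFields(line) != expectedFields) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `hasConsistentFields(lines, 3)` would reject a file in which some line splits into two or four fields when the application expects three columns.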

Answer by Vinko Vrsalovic

CSV files vary a lot, and they all could be called, legitimately, CSV files.

I guess your approach is not the best one; the correct approach would be to tell whether the uploaded file is a text file the application can parse, instead of whether it's a CSV or not.

You would report errors whenever you can't parse the file, be it a JPG, MP3 or CSV in a format you cannot parse.

To do that, I would try to find a library that parses various CSV file formats; otherwise you have a long road ahead writing code to parse the many possible types of CSV files (or restricting the application's flexibility by supporting only a few CSV formats).

One such library for Java is opencsv.

Answer by Jamie Wong

If you're using a CSV parsing library, all you have to do is catch any errors it throws.

If the CSV parser you're using is remotely robust, it will throw some useful errors in the event that it doesn't understand the file format.

Answer by Hans Olsson

I don't know if you can tell with 100% certainty in any way, but I'd suggest that the first validations should be:

  1. Is the file extension .csv?
  2. Count the number of commas on each line of the file; there should normally be the same number of commas on every line for it to be a valid CSV file. (As jkramer said, this only works if the files can't contain quoted commas.)
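
Those two quick checks could be sketched as follows. The class and method names are hypothetical, the lines are assumed to be already read into a list, and the comma count is deliberately naive, so it has exactly the quoted-comma limitation mentioned in the list above.

```java
import java.util.List;
import java.util.Locale;

final class QuickCsvChecks {
    /** Case-insensitive check of the file extension only. */
    static boolean hasCsvExtension(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".csv");
    }

    /** True if the list is non-empty and every line has the same number of commas. */
    static boolean sameCommaCountOnEveryLine(List<String> lines) {
        long expected = -1;
        for (String line : lines) {
            long commas = line.chars().filter(ch -> ch == ',').count();
            if (expected < 0) {
                expected = commas;          // first line sets the reference count
            } else if (commas != expected) {
                return false;
            }
        }
        return expected >= 0;               // false when there were no lines at all
    }
}
```

Both checks are cheap filters rather than proof: a renamed JPEG passes the extension check, and a valid CSV with a quoted comma fails the comma count.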

Answer by jkramer

If it's a web application, you might want to check the Content-Type HTTP header the browser sends when uploading/posting a file through a form. If there's a binding for the language you're using, you might also try libmagic, which is pretty good at recognizing file types. For example, the UNIX tool file uses it.

http://sourceforge.net/projects/libmagic/
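
For the Content-Type check, a small sketch is given here. It assumes the header value has already been obtained from the upload request as a string. `text/csv` is the registered media type for CSV; accepting `application/vnd.ms-excel` as well is an assumption to adjust, based on some browsers reporting that type for .csv files. The class and method names are hypothetical.

```java
import java.util.Locale;

final class UploadContentType {
    /** True if the client-declared media type is one we treat as CSV. */
    static boolean looksLikeCsv(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" and normalise case.
        String mediaType = contentTypeHeader.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/csv")
                || mediaType.equals("application/vnd.ms-excel"); // assumption: accepted alias
    }
}
```

The header is supplied by the client, so it is best treated as a hint and combined with a check of the actual content.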

Answer by Vivek G

Try this:

// Returns null if the type cannot be determined; throws IOException on I/O errors.
String type = Files.probeContentType(Paths.get(filepath));

Answer by hariszhr

I solved it like this: read the file with UTF-16 encoding; if no comma is found in the file, it means the UTF-16 encoding didn't work. Which means that this csv file is of Excel format (NOT plain text).

    // readCSVFile and readFile are the author's own helpers (not shown here).
    if (fileA.endsWith(".csv") && fileB.endsWith(".csv")) {
        // First attempt: read both files with UTF-16 encoding.
        second_list = readCSVFile(fileA);
        new_list = readCSVFile(fileB);
        if (!String.join("", second_list).contains(",") || !String.join("", new_list).contains(",")) {
            // No comma found in at least one file: read these files with UTF-8 encoding instead.
            System.out.println("[WARN] csv files will be read like text files. (UTF-16 encoding couldn't find any comma in the file, i.e., UTF-16 encoding didn't work)");
            second_list = readFile(fileA);
            new_list = readFile(fileB);
        }
        // Otherwise keep the csv as UTF-16 encoded.
    }