PHP: importing a CSV that has line breaks within the actual fields

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/5470991/

Date: 2020-08-25 21:32:38  Source: igfitidea

Importing CSV that has line breaks within the actual fields

php csv import line-breaks

Asked by Horse

I am using PHP to import a CSV file that originates from an Excel spreadsheet. Some of the fields contain line breaks, so when I reopen the CSV in Excel or OpenOffice Calc, it misinterprets where the records should break.

In my script, which uses fgetcsv to go through each line, records are also being split at line breaks where they shouldn't be.

I could cleanse the data manually, but a) that would take ages, as it's a 10k-line file, and b) the data is exported from a client's existing piece of software.

Any ideas on how to solve this automatically during the import? I would have thought delimiting the fields would have sorted it, but it does not.

Accepted answer by MacGucky

I had that problem too and did not find a way to read the data correctly.

In my case it was a one-time import, so I wrote a script that searched for all line breaks within a column and replaced them with something like #####. Then I imported the data and replaced that placeholder with line breaks again.
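
The placeholder round trip described above could look roughly like the following. This is only a sketch under assumptions that are not in the original answer: fields containing line breaks are enclosed in double quotes, embedded quotes are written as "", the ##### marker never occurs in the real data, and the helper names maskLineBreaks and unmaskLineBreaks are made up for illustration. It uses arrow functions, so it needs PHP 7.4 or later.

```php
<?php
// Hypothetical placeholder; assumed never to appear in the real data.
const PLACEHOLDER = '#####';

// Replace line breaks that sit inside double-quoted fields with the placeholder.
function maskLineBreaks(string $csv): string
{
    return preg_replace_callback(
        '/"[^"]*(?:""[^"]*)*"/', // one quoted field, "" allowed inside
        fn (array $m): string => str_replace(["\r\n", "\n", "\r"], PLACEHOLDER, $m[0]),
        $csv
    );
}

// Turn the placeholder back into a line break after the import.
function unmaskLineBreaks(string $value): string
{
    return str_replace(PLACEHOLDER, "\n", $value);
}

// Made-up sample: the note of record 1 spans two physical lines.
$csv = "id,note\n1,\"first line\nsecond line\"\n2,plain\n";

$masked = maskLineBreaks($csv);

// Every record now sits on one physical line, so a line-based import works.
$rows = array_map(
    fn (string $line): array => array_map(
        'unmaskLineBreaks',
        str_getcsv($line, ',', '"', '\\')
    ),
    explode("\n", trim($masked))
);
```

After this runs, $rows holds one array per record, and the note field of record 1 contains its original line break again.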

If you need a regular import, you could write your own CSV parser that handles the problem. If the text columns are enclosed in "", you could treat everything between two " as one column (checking for escaped " within the content).
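
A hand-written parser along those lines might be sketched as follows. It is not from the original answer: the function name parseCsvText is hypothetical, and the sketch assumes a comma separator, double-quote enclosures, and quotes inside a field escaped by doubling them (""). It is a minimal illustration, not a complete CSV implementation.

```php
<?php
// Minimal quote-aware CSV splitter: everything between two enclosing
// quotes, including line breaks, is kept as part of a single column.
function parseCsvText(string $text, string $sep = ',', string $quote = '"'): array
{
    $rows = [];
    $row = [];
    $field = '';
    $inQuotes = false;
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if ($inQuotes) {
            if ($ch === $quote) {
                if ($i + 1 < $len && $text[$i + 1] === $quote) {
                    $field .= $quote;   // doubled quote = literal quote
                    $i++;
                } else {
                    $inQuotes = false;  // closing quote
                }
            } else {
                $field .= $ch;          // may be a line break: keep it
            }
        } elseif ($ch === $quote) {
            $inQuotes = true;           // opening quote
        } elseif ($ch === $sep) {
            $row[] = $field;            // end of column
            $field = '';
        } elseif ($ch === "\n" || $ch === "\r") {
            if ($ch === "\r" && $i + 1 < $len && $text[$i + 1] === "\n") {
                $i++;                   // treat \r\n as one line ending
            }
            $row[] = $field;            // end of record
            $rows[] = $row;
            $row = [];
            $field = '';
        } else {
            $field .= $ch;
        }
    }

    // Flush a final record that has no trailing line ending.
    if ($field !== '' || $row !== []) {
        $row[] = $field;
        $rows[] = $row;
    }
    return $rows;
}

// Made-up sample with a multi-line field and an escaped quote.
$rows = parseCsvText("a,\"b\nc\",d\n1,\"x \"\"y\"\"\",3\n");
```

The only state the loop needs is whether it is currently inside quotes; separators and line breaks are treated as structure only when it is not.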

Answer by danieltalsky

The accepted answer didn't solve the problem for me, but I eventually found this CSV parser library on Google Code that works well for multiline fields in CSVs.

parsecsv-for-php:
https://github.com/parsecsv/parsecsv-for-php

For historical purposes, the original project home was:
http://code.google.com/p/parsecsv-for-php/

Answer by Mike Wilding

My solution is the following:

nl2br(string);

http://php.net/manual/en/function.nl2br.php

Once you get down to the individual cell (string) level, run it on the string and it will convert the line breaks to HTML <br /> tags for you.
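
As a rough illustration of the cell-level step, the sketch below reads a small made-up CSV with fgetcsv and then applies nl2br to one cell. It assumes the multi-line field is enclosed in double quotes, in which case fgetcsv returns it as a single cell with the line break preserved. The sample data and the htmlspecialchars call are additions for the example, not part of the original answer.

```php
<?php
// Made-up sample held in memory; a real import would fopen() the file.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "id,note\n1,\"first line\nsecond line\"\n");
rewind($handle);

// Length 0 means no line-length limit; the remaining arguments spell out
// the default separator, enclosure and escape characters explicitly.
$header = fgetcsv($handle, 0, ',', '"', '\\');
$row = fgetcsv($handle, 0, ',', '"', '\\');
fclose($handle);

// Escape the cell for HTML output, then turn its line breaks into <br />.
$html = nl2br(htmlspecialchars($row[1]));
```

Note that nl2br inserts the <br /> tag in front of each line break and leaves the line break itself in place.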

Answer by V. H?gman

It's an old thread, but I encountered this problem and solved it with a regex, so you can avoid pulling in a library just for this. The code here is in PHP, but it can be adapted to other languages.

$parsedCSV = preg_replace('/(,|\n|^)"(?:([^\n"]*)\n([^\n"]*))*"/', '$1"$2 $3"', $parsedCSV);

This solution assumes that fields containing a line break are enclosed in double quotes, which seems to be a valid assumption, at least for what I have seen so far. Also, the opening double quote must follow a , or be placed at the start of a line (including the first line).

Example:

field1,"field2-part1\nfield2-part2",field3

Here the \n is replaced by a space, so the result would be:

field1,"field2-part1 field2-part2",field3

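The regex is meant to handle multiple line breaks as well. Note, however, that $2 and $3 only hold the text from the last repetition of the group, so with this exact pattern a field containing two or more line breaks loses its earlier lines; a callback-based replacement (preg_replace_callback) is a safer way to cover that case.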

This might not be efficient if the content is very large, but it can help in many cases, and the idea can be reused, perhaps optimized by processing the content in smaller chunks (though you would then need to handle quoted fields that are cut at the boundaries of fixed-size buffers).
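
One way to make the same idea cope with any number of line breaks per field is to match each quoted field as a whole and rewrite it in a callback. The following is a sketch of that variant rather than the answer's own code: the function name flattenQuotedLineBreaks is hypothetical, and it keeps the answer's assumptions (fields with line breaks are double-quoted, and the opening quote follows a comma or starts a line) while additionally assuming that quotes inside a field are doubled.

```php
<?php
// Replace every line break inside a double-quoted field with a single space.
function flattenQuotedLineBreaks(string $csv): string
{
    return preg_replace_callback(
        // $m[1]: start of text, a comma, or a line break before the field
        // $m[2]: the whole quoted field, with "" allowed inside
        '/(^|,|\n)("[^"]*(?:""[^"]*)*")/',
        static function (array $m): string {
            return $m[1] . preg_replace('/\r?\n/', ' ', $m[2]);
        },
        $csv
    );
}

// Made-up sample: the second field contains two line breaks.
$parsedCSV = "field1,\"field2-part1\nfield2-part2\nfield2-part3\",field3\n";
$parsedCSV = flattenQuotedLineBreaks($parsedCSV);
```

Because the callback receives the complete field, nothing depends on how many times an inner group repeated, and the line break that ends the record is left alone.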

Answer by ghispi

Although this is an old question, the answer might still be relevant to people. There is now a newer, framework-independent library, http://csv.thephpleague.com/, which supports newline characters in fields as well as some filtering.

Answer by Aditya P Bhatt

Yes, you need to find that comma and replace it with some special character sequence, such as {()}, and finally replace that sequence with the , you were originally looking for.

Hope that helps you.
