php-excel-reader - problem with UTF-8

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/3666412/

Time: 2020-08-25 10:37:51  Source: igfitidea

php excel unicode

Asked by Viktor Stískala

I'm using php-excel-reader 2.21 to convert an XLS file to CSV. I wrote a simple script to do that, but I have problems with Unicode characters: the reader does not return the values of some cells.

For example, it has no problem with the cell content ceník položek, but it does have problems with nákup, VYROBCE, PáS, HRUBY, NáKLADNí and some others. For these cells it returns an empty value ("").

Here is the code snippet I use for conversion:

<?php
set_time_limit(120);
require_once 'excel_reader2.php';
// Third argument: the output encoding requested from the reader
$data = new Spreadsheet_Excel_Reader("cenik.xls", false, 'UTF-8');

$f = fopen('file.csv', 'w');
for ($row = 1; $row <= $data->rowcount(); $row++)
{
    $out = '';
    for ($col = 1; $col <= $data->colcount(); $col++)
    {
        $val = $data->val($row, $col);

        // normalize ” to ", then escape \ and " characters inside the cell
        // (a literal backslash needs four backslashes in the pattern and eight in the replacement)
        $escaped = preg_replace(array('#”#u', '#\\\\#u', '#[”"]#u'), array('"', '\\\\\\\\', '\"'), $val);
        // strict comparison, so that a cell containing "0" is not treated as empty
        if ($val === null || $val === '')
            $out .= ',';
        else
            $out .= '"' . $escaped . '",';
    }
    // remove last comma (,)
    fwrite($f, substr($out, 0, -1));
    fwrite($f, "\n");
}
fclose($f);

?>

Note that the cell and row indexes start from 1. Any suggestions?
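
One thing worth checking, offered as an assumption and not something the question confirms: with the `u` modifier, `preg_replace` returns `NULL` when the subject is not valid UTF-8, so a cell that the reader hands back as single-byte text would be written as an empty `""` by a script like the one above. A minimal sketch, using a hand-written byte string in place of a real cell value (it assumes the mbstring extension is available for the last check):

```php
<?php
// Hypothetical cell value: "nákup" as single-byte text (á = 0xE1), not UTF-8.
$val = "n\xE1kup";

// Same kind of call as in the conversion script: the u modifier requires valid UTF-8.
$escaped = preg_replace('#"#u', '\"', $val);

var_dump($escaped === null);                          // true: the replace failed
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);  // true: malformed UTF-8 subject
var_dump(mb_check_encoding($val, 'UTF-8'));           // false: not valid UTF-8
```

If the first two checks print `true` for the failing cells, the empty output comes from the escaping step and not from the reader returning nothing.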

Answered by cypher

I hope it's the same problem as I had: In excel_reader2.php on line 1120, replace

$retstr = ($asciiEncoding) ? $retstr : $this->_encodeUTF16($retstr);

with

$retstr = ($asciiEncoding) ? iconv('cp1250', 'utf-8', $retstr) : $this->_encodeUTF16($retstr);

That should fix it; however, I suggest you use a different Excel reader, such as PHPExcel, to avoid problems like these.
Note that you need the iconv extension enabled on the server.
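
To illustrate what the `iconv` call in the replacement line does, here is a minimal sketch. The input string is a hypothetical stand-in for what the reader holds in `$retstr`; it assumes the text is single-byte cp1250, as the answer does.

```php
<?php
// Hypothetical input: "nákup" encoded as cp1250 (á is the single byte 0xE1).
$raw = "n\xE1kup";

// iconv(from, to, string) returns the converted string, or false on failure.
$utf8 = iconv('cp1250', 'utf-8', $raw);

var_dump($utf8 === "n\xC3\xA1kup");      // true: á is now the two bytes 0xC3 0xA1
var_dump(strlen($raw), strlen($utf8));   // byte lengths before and after conversion
```

The converted string is valid UTF-8, so later steps that expect UTF-8 input can process it.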

Answered by thuclh

I have an answer to this problem that lets you keep using php_excel_reader as usual. Add a function to the Spreadsheet_Excel_Reader class:

function seems_utf8($str) {
    for ($i=0; $i<strlen($str); $i++) {
        if (ord($str[$i]) < 0x80) continue; # 0bbbbbbb
        elseif ((ord($str[$i]) & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif ((ord($str[$i]) & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif ((ord($str[$i]) & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif ((ord($str[$i]) & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif ((ord($str[$i]) & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == strlen($str)) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

Then add the following below line 1120: $retstr = $this->seems_utf8($retstr) ? $retstr : utf8_encode($retstr);

Done!
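
The same "convert only if it is not already UTF-8" idea can be sketched with the mbstring functions, which may be useful because `utf8_encode()` is deprecated as of PHP 8.2. This is an alternative sketch, not the answer's code: the helper name `to_utf8_if_needed` is hypothetical, and `mb_check_encoding` is a stricter validity test than `seems_utf8`.

```php
<?php
// Hypothetical helper, not part of php-excel-reader: keep valid UTF-8 as is,
// otherwise treat the bytes as ISO-8859-1 (the encoding utf8_encode() assumes).
function to_utf8_if_needed(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

var_dump(to_utf8_if_needed("n\xE1kup") === "n\xC3\xA1kup");      // true: converted
var_dump(to_utf8_if_needed("n\xC3\xA1kup") === "n\xC3\xA1kup");  // true: left untouched
```

Checking first avoids double-encoding strings that the reader has already returned as UTF-8.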

You can also use the php_excel_reader file that I modified. Download here: File excel_reader2.php. Use it the same way as the original excel reader.
