Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/1525830/

Time: 2020-08-25 02:56:40  Source: igfitidea

How do I use filesystem functions in PHP, using UTF-8 strings?

Tags: php, utf-8, directory, filesystems, mkdir

Asked by Acacio Nerull

I can't use mkdir to create folders with UTF-8 characters:

<?php
$dir_name = "Depósito";
mkdir($dir_name);
?>

When I browse this folder in Windows Explorer, the folder name looks like this:

DepÃ³sito

What should I do?

I'm using PHP 5.

Accepted answer by Steve Clay

Just urlencode the string you want to use as a filename. All characters returned from urlencode are valid in filenames (NTFS/HFS/UNIX), and you can urldecode the filenames back to UTF-8 (or whatever encoding they were in).

Caveats (all apply to the solutions below as well):

  • After url-encoding, the filename must be fewer than 255 characters (probably bytes).
  • UTF-8 has multiple representations for many characters (using combining characters). If you don't normalize your UTF-8, you may have trouble searching with glob or reopening an individual file.
  • You can't rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames, then use a sorting algorithm aware of UTF-8 (and collations).
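
A minimal sketch of the urlencode approach with the first two caveats applied. The helper names name_to_disk and name_from_disk are hypothetical, and the normalization step assumes the intl extension; without it, the name is encoded as given.

```php
<?php
// Hypothetical helpers (not from the original answer): map a UTF-8 name
// to an ASCII-only on-disk name with urlencode(), and back again.
function name_to_disk(string $utf8Name): string
{
    // Normalize first if the intl extension is available, so composed and
    // decomposed forms of the same character map to one filename.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($utf8Name, Normalizer::FORM_C);
        if ($normalized !== false) {
            $utf8Name = $normalized;
        }
    }
    $encoded = urlencode($utf8Name);
    // First caveat from the answer: keep the encoded name under 255 bytes.
    if (strlen($encoded) >= 255) {
        throw new LengthException('Encoded filename is too long');
    }
    return $encoded;
}

function name_from_disk(string $diskName): string
{
    return urldecode($diskName);
}

$disk = name_to_disk("Depósito");
echo $disk, "\n";                 // Dep%C3%B3sito
echo name_from_disk($disk), "\n"; // Depósito
```

The encoded name is what you would pass to mkdir or fopen; names read back with scandir go through name_from_disk before sorting or display.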

Worse Solutions

The following are less attractive solutions, more complicated and with more caveats.

On Windows, the PHP filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names. This gives you two choices:

  1. Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 char will be stored as multiple single ISO-8859-1 characters. E.g. ó will appear as Ã³ in Windows Explorer.

  2. Limit your file/directory names to characters representable in ISO-8859-1. In practice, you'll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.
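
The two choices can be illustrated without touching the disk. This is a sketch, not the answer's own code: it assumes the mbstring extension and uses mb_convert_encoding in place of utf8_decode and utf8_encode, which are deprecated as of PHP 8.2.

```php
<?php
// Requires the mbstring extension.
$utf8 = "Depósito";

// Choice 1: the raw UTF-8 bytes reach a layer that reads them as ISO-8859-1.
// Re-reading the bytes that way reproduces the garbled name.
$seenAsLatin1 = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $seenAsLatin1, "\n"; // DepÃ³sito

// Choice 2: convert to ISO-8859-1 before the filesystem call, and convert
// names read back from the filesystem to UTF-8 again.
$forFilesystem = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$roundTripped  = mb_convert_encoding($forFilesystem, 'UTF-8', 'ISO-8859-1');
var_dump(strlen($forFilesystem));  // int(8): one byte per character
var_dump($roundTripped === $utf8); // bool(true)
```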

Caveats galore!

  • If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you're out of luck.
  • Windows may use an encoding other than ISO-8859-1 in non-English locales. I'd guess it will usually be one of ISO-8859-#, but this means you'll need to use mb_convert_encoding instead of utf8_decode.

This nightmare is why you should probably just transliterate to create filenames.
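
One possible shape for such a transliteration, as a sketch only. The helper name to_ascii_filename and its character map are hypothetical and deliberately small; a real project would more likely rely on a maintained transliteration table.

```php
<?php
// Hypothetical helper: reduce a UTF-8 name to a plain ASCII filename.
function to_ascii_filename(string $utf8Name): string
{
    // Illustrative map only; extend it for the characters you expect.
    static $map = [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
        'à' => 'a', 'è' => 'e', 'ì' => 'i', 'ò' => 'o', 'ù' => 'u',
        'ñ' => 'n', 'ç' => 'c', 'ü' => 'u', 'ö' => 'o', 'ä' => 'a',
    ];
    $ascii = strtr($utf8Name, $map);
    // Replace every remaining byte outside a conservative ASCII set.
    return preg_replace('/[^A-Za-z0-9._-]/', '_', $ascii);
}

echo to_ascii_filename("Depósito"), "\n";         // Deposito
echo to_ascii_filename("Caffé Brillì.txt"), "\n"; // Caffe_Brilli.txt
```

Unlike the urlencode approach, this mapping is lossy: the original name cannot be recovered from the filename, so store it elsewhere if you need it.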

Answer by Umberto Salsi

Under Unix and Linux (and possibly under OS X too), the current file system encoding is given by the LC_CTYPE locale parameter (see function setlocale()). For example, it may evaluate to something like en_US.UTF-8, which means the encoding is UTF-8. File names and their paths can then be created with fopen() or retrieved by dir() in this encoding.
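
To see what LC_CTYPE currently evaluates to, the locale can be queried without changing it. A small sketch; the parsing step assumes the usual language_COUNTRY.charset shape, which not every locale name follows.

```php
<?php
// Passing "0" as the locale queries the current setting without changing it.
$ctype = setlocale(LC_CTYPE, "0");
echo "LC_CTYPE: ", $ctype, "\n";

// Assumption: the charset, if any, follows the last dot in the locale name.
$dot = strrpos((string) $ctype, '.');
$charset = $dot === false ? null : substr((string) $ctype, $dot + 1);
echo "Charset: ", $charset ?? "(none reported, e.g. the C locale)", "\n";
```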

Under Windows, PHP operates as a "non-Unicode aware program", so file names are converted back and forth between the UTF-16 used by the file system (Windows 2000 and later) and the selected "code page". In the control panel "Regional and Language Options", the tab panel "Formats" sets the code page retrieved by the LC_CTYPE option, while "Administrative -> Language for non-Unicode Programs" sets the translation code page for file names. In western countries the LC_CTYPE parameter evaluates to something like language_country.1252, where 1252 is the code page, also known as "Windows-1252 encoding", which is similar (but not exactly equal) to ISO-8859-1. In Japan the 932 code page is usually set instead, and so on for other countries. Under PHP you may create files whose names can be expressed in the current code page. Conversely, file names and paths retrieved from the file system are converted from UTF-16 to bytes using the "best-fit" mapping of the current code page.

This mapping is approximate, so some characters might be mangled in an unpredictable way. For example, Caffé Brillì.txt would be returned by dir() as the PHP string Caff\xE9 Brill\xEC.txt, as expected, if the current code page is 1252, while it would be returned as the approximate Caffe Brilli.txt on a Japanese system, because accented vowels are missing from the 932 code page and are replaced with their "best-fit" non-accented vowels. Characters that cannot be translated at all are retrieved as ? (question mark). In general, under Windows there is no safe way to detect such artifacts.
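
The code page 1252 case can be reproduced in isolation. This sketch assumes the mbstring extension; note that mbstring does not implement the Windows "best-fit" mapping, so it only illustrates the exact conversions and the ? substitution, not the Japanese-system result.

```php
<?php
// Requires the mbstring extension.
$utf8 = "Caffé Brillì.txt";

// The bytes a code page 1252 system would hand back for this name.
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
var_dump($cp1252 === "Caff\xE9 Brill\xEC.txt"); // bool(true)

// Converting back recovers the UTF-8 name, but only because every
// character in it exists in code page 1252.
var_dump(mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252') === $utf8); // bool(true)

// A character outside the code page cannot round-trip: with default
// settings mbstring substitutes "?".
var_dump(mb_convert_encoding("多.txt", 'Windows-1252', 'UTF-8')); // string(5) "?.txt"
```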

More details are available in my reply to the PHP bug no. 47096.

Answer by Anatol Belski

PHP 7.1 supports UTF-8 filenames on Windows disregarding the OEM codepage.
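
Code that has to run on both older and newer PHP versions could branch on this. A sketch only: the helper name needs_legacy_conversion is hypothetical, and the Windows-1252 target is an assumption about the active code page.

```php
<?php
// Hypothetical helper: decide whether a UTF-8 name must be converted to a
// legacy code page before being passed to filesystem functions.
function needs_legacy_conversion(): bool
{
    $onWindows = strtoupper(substr(PHP_OS, 0, 3)) === 'WIN';
    return $onWindows && PHP_VERSION_ID < 70100;
}

$name = "Depósito";
if (needs_legacy_conversion()) {
    // Assumption: the active code page is Windows-1252; requires mbstring.
    $name = mb_convert_encoding($name, 'Windows-1252', 'UTF-8');
}
echo bin2hex($name), "\n"; // hex dump of the bytes that would reach mkdir()
```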

Answer by Lars D

The problem is that Windows uses UTF-16 for filesystem strings, whereas Linux and others use different character sets, but often UTF-8. You provided a UTF-8 string, but this is interpreted as another 8-bit character set encoding in Windows, maybe Latin-1, and then the non-ASCII character, which is encoded with 2 bytes in UTF-8, is handled as if it were 2 characters in Windows.

A normal solution is to keep your source code 100% in ASCII, and to have strings somewhere else.
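
If a non-ASCII string does have to live in the source, escape sequences keep the file itself pure ASCII. A short sketch; this only removes any dependence on the source file's encoding and does not change how Windows interprets the resulting bytes.

```php
<?php
// Both literals are pure ASCII in the source file but yield UTF-8 bytes.
$byBytes     = "Dep\xC3\xB3sito"; // explicit UTF-8 bytes, also works in PHP 5
$byCodePoint = "Dep\u{00F3}sito"; // Unicode escape, PHP 7.0 and later

var_dump($byBytes === $byCodePoint); // bool(true)
var_dump(strlen($byBytes));          // int(9): "ó" takes two bytes in UTF-8
```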

Answer by Nicolas Grekas

Using the com_dotnet PHP extension, you can access Windows' Scripting.FileSystemObject, and then do everything you want with UTF-8 file/folder names.

I packaged this as a PHP stream wrapper, so it's very easy to use:

https://github.com/nicolas-grekas/Patchwork-UTF8/blob/lab-windows-fs/class/Patchwork/Utf8/WinFsStreamWrapper.php

First verify that the com_dotnet extension is enabled in your php.ini, then enable the wrapper with:

stream_wrapper_register('win', 'Patchwork\Utf8\WinFsStreamWrapper');

Finally, use the functions you're used to (mkdir, fopen, rename, etc.), but prefix your path with win://

For example:

<?php
$dir_name = "Depósito";
mkdir('win://' . $dir_name );
?>

Answer by Oleg

You could use this extension to solve your issue: https://github.com/kenjiuno/php-wfio

$file = fopen("wfio://多国語.txt", "rb"); // in UTF-8
// ...
fclose($file);

Answer by RafaSashi

My set of tools for using the filesystem with UTF-8 names on Windows or Linux via PHP, compatible with the .htaccess file-exists check:

// Call define_cur_os() once before using the functions below:
// they all depend on the CUR_OS constant it defines.
function define_cur_os(){
    //$cur_os=strtolower(php_uname());
    $cur_os=strtolower(PHP_OS);
    if(substr($cur_os, 0, 3) === 'win'){
        $cur_os='windows';
    }
    define('CUR_OS',$cur_os);
}

// Decode a url-encoded name and, on Windows, convert it from UTF-8
// to ISO-8859-1 so the filesystem functions accept it.
function filesystem_encode($file_name=''){
    $file_name=urldecode($file_name);
    if(CUR_OS=='windows'){
        $file_name=iconv("UTF-8", "ISO-8859-1//TRANSLIT", $file_name);
    }
    return $file_name;
}

// Create the directory (recursively) if it does not exist yet
// and return the encoded path.
function custom_mkdir($dir_path='', $chmod=0755){
    $dir_path=filesystem_encode($dir_path);
    if(!is_dir($dir_path)){
        if(!mkdir($dir_path, $chmod, true)){
            //handle mkdir error
        }
    }
    return $dir_path;
}

// $dir_path is expected to end with a directory separator,
// because it is concatenated directly with $file_name.
function custom_fopen($dir_path='', $file_name='', $mode='w'){
    if($dir_path!='' && $file_name!=''){
        $dir_path=custom_mkdir($dir_path);
        $file_name=filesystem_encode($file_name);
        return fopen($dir_path.$file_name, $mode);
    }
    return false;
}

function custom_file_exists($file_path=''){
    $file_path=filesystem_encode($file_path);
    return file_exists($file_path);
}

function custom_file_get_contents($file_path=''){
    $file_path=filesystem_encode($file_path);
    return file_get_contents($file_path);
}

Additional resources

Answer by Yesterday

I don't need to write much, it works well:

<?php
$dir_name = mb_convert_encoding("Depósito", "ISO-8859-1", "UTF-8");
mkdir($dir_name);
?>

Answer by TomoMiha

Try the CodeIgniter Text helper from this link. Read about the convert_accented_characters() function; it can be customised.