Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2357680/

Time: 2020-09-18 20:06:13  Source: igfitidea

Dealing with eacute and other special characters using Oracle, PHP and Oci8

Tags: php, oracle, utf-8, character-encoding, oci8

Asked by ddallala

Hi, I am trying to store names in an Oracle database and fetch them back using PHP and oci8.

However, if I insert an é directly into the Oracle database and use oci8 to fetch it back, I just receive an e.

Do I have to encode all special characters (including é) into HTML entities (i.e. &eacute;) before inserting them into the database, or am I missing something?

Thanks



UPDATE: Mar 1 at 18:40

Found this function: http://www.php.net/manual/en/function.utf8-decode.php#85034

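// Note: the version in the linked comment used ereg() and the /e modifier, both removed in PHP 7.
function charset_decode_utf_8($string) {
    // Only do the slow conversion if the string contains 8-bit (non-ASCII) bytes.
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }
    // Turn three-byte UTF-8 sequences into numeric HTML entities.
    $string = preg_replace_callback('/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/', function ($m) {
        return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
    }, $string);
    // Turn two-byte UTF-8 sequences into numeric HTML entities.
    $string = preg_replace_callback('/([\xC0-\xDF])([\x80-\xBF])/', function ($m) {
        return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
    }, $string);
    return $string;
}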

It seems to work, although I am not sure it is the optimal solution.



UPDATE: Mar 8 at 15:45

Oracle's character set is ISO-8859-1.
In PHP I added:

putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1");

to force the oci8 connection to use that character set. Retrieving the é using oci8 from PHP now worked! (For VARCHARs at least; for CLOBs I had to use utf8_encode to extract the value.)
So then I tried saving the data from PHP to Oracle, and it doesn't work: somewhere along the way from PHP to Oracle the é becomes a ?.



UPDATE: Mar 9 at 14:47

So, getting closer. After adding the NLS_LANG variable, direct oci8 inserts with é work.

The problem is actually on the PHP side. With the ExtJs framework, a submitted form is encoded using encodeURIComponent.
So é is sent as %C3%A9 and then decoded back into é.
However, its length is now 2 (strlen($my_sent_value) = 2) and not 1. And if in PHP I try $my_sent_value == 'é', the result is FALSE.

I think that if I can re-encode all these characters in PHP back into single bytes and then insert them into Oracle, it should work.

Still no luck though.
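
The length of 2 is what one would expect if the value arrives as UTF-8, where é is the two bytes C3 A9, while ISO-8859-1 stores it as the single byte E9. Here is a minimal sketch of that difference, assuming the mbstring extension is available; the variable names are made up for illustration:

```php
<?php
// What PHP ends up with after URL-decoding the submitted value: é as UTF-8.
$sent = rawurldecode('%C3%A9');

echo strlen($sent), "\n";    // 2 (strlen counts bytes, not characters)
echo bin2hex($sent), "\n";   // c3a9

// Convert the UTF-8 bytes to ISO-8859-1 (assumes mbstring is loaded).
$latin1 = mb_convert_encoding($sent, 'ISO-8859-1', 'UTF-8');

echo strlen($latin1), "\n";  // 1
echo bin2hex($latin1), "\n"; // e9

// Comparing against an ISO-8859-1 é only succeeds after the conversion.
var_dump($sent === "\xE9");   // bool(false)
var_dump($latin1 === "\xE9"); // bool(true)
```

This would also explain why the == comparison fails when the script file itself is saved as ISO-8859-1: the literal é in the source is one byte, the submitted one is two.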



UPDATE: Mar 10 at 11:05

I keep thinking I am so close (yet so far away).

putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9"); works very sporadically.

I created a small PHP script to test:

header('Content-Type: text/plain; charset=ISO-8859-1');
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9");
$conn= oci_connect("user", "pass", "DB");
$stmt = oci_parse($conn, "UPDATE temp_tb SET string_field = '|é|'");
oci_execute($stmt, OCI_COMMIT_ON_SUCCESS);

After running this once and logging into the Oracle database directly, I see that STRING_FIELD is set to |?|. Obviously not what I had come to expect from my previous experience.
However, if I refresh that PHP page twice quickly... it worked!
In Oracle I correctly saw |é|.

It seems that the environment variable is not being set, or not being passed along, in time for the first execution of the script, but is available for the second execution.

My next experiment is to export the variable into PHP's environment. However, I need to restart Apache for that, so we'll see what happens; hopefully it works.

Accepted answer by ddallala

This is what I finally ended up doing to solve this problem.

I modified the profile of the daemon running PHP to include:

NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

so that the oci8 connection uses ISO-8859-1.

Then, in my PHP configuration, I set the default charset to ISO-8859-1:

default_charset = "iso-8859-1"

When inserting into an Oracle table via oci8 from PHP, I do:

utf8_decode($my_sent_value)

And when receiving data from Oracle, printing the variable should just work:

echo $my_received_value

However, when sending that data over Ajax I have had to use:

utf8_encode($my_received_value)
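
The two conversions above could be wrapped in a pair of helpers, roughly like this. It is only a sketch: the function names are hypothetical, it assumes the mbstring extension, and it uses mb_convert_encoding() in place of utf8_decode()/utf8_encode(), which are deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper names, not part of the original answer.
// Assumes mbstring and an oci8 connection that runs as ISO-8859-1.

// Browser/Ajax input (UTF-8) -> bytes for the ISO-8859-1 connection.
function to_oracle_latin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// Value fetched over the ISO-8859-1 connection -> UTF-8 for Ajax/JSON output.
function from_oracle_latin1(string $latin1): string
{
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$my_sent_value = "Andr\xC3\xA9";                   // "André" as UTF-8 (6 bytes)
$for_insert    = to_oracle_latin1($my_sent_value);  // 5 bytes, é is 0xE9
$for_ajax      = from_oracle_latin1($for_insert);   // UTF-8 again

echo bin2hex($for_insert), "\n";                    // 416e6472e9
echo json_encode(['name' => $for_ajax]), "\n";      // {"name":"Andr\u00e9"}
```

Keep in mind that ISO-8859-1 cannot represent every character, so anything outside it will not survive the trip to the database in this setup.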

Answer by Álvaro González

I presume you are aware of these facts:

  • There are many different character sets: you have to pick one and, of course, know which one you are using.
  • Oracle is perfectly capable of storing text without HTML entities (such as &eacute;). HTML entities are used in, well, HTML. Oracle is not a web browser ;-)

You must also know that HTML entities are not bound to a specific charset; on the contrary, they're used to represent characters in a charset-independent context.

You talk about ISO-8859-1 and UTF-8 interchangeably. What charset do you want to use? ISO-8859-1 is easy to use, but it can only store text in some Latin-script languages (such as Spanish) and it lacks some common characters like the € symbol. UTF-8 is trickier to use, but it can store all characters defined by the Unicode Consortium (which include everything you'll ever need).

Once you've made the decision, you must configure Oracle to hold data in that charset and choose an appropriate column type. E.g., VARCHAR2 is fine for plain ASCII, NVARCHAR2 is good for UTF-8.

Answer by Javier Campo

I had to face this problem: Latin American special characters are stored as "?" or "?" in my Oracle database, and I can't change NLS_CHARACTER_SET because we're not the database owners.

So, I found a workaround:

1) ASP.NET code: create a function that converts a string to hexadecimal characters:

    public string ConvertirStringAHex(String input)
    {
        Encoding encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
        Byte[] stringBytes = encoding.GetBytes(input);
        StringBuilder sbBytes = new StringBuilder(stringBytes.Length);
        foreach (byte b in stringBytes)
        {
            sbBytes.AppendFormat("{0:X2}", b);
        }
        return sbBytes.ToString();
    }

2) Apply the function above to the variable you want to encode, like this:

     myVariableHex = ConvertirStringAHex( myVariable );

In Oracle, use the following:

 PROCEDURE STORE_IN_TABLE( iTEXTO IN VARCHAR2 )
 IS
 BEGIN
   INSERT INTO myTable( SPECIAL_TEXT )  
   VALUES ( UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW( iTEXTO )) );
   COMMIT;
 END;

Of course, iTEXTO is the Oracle parameter which receives the value of "myVariableHex" from ASP.NET code.
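
Since the question is about PHP, the same hex step might look like this on the PHP side. This is a hypothetical counterpart of ConvertirStringAHex, not something from the answer; it assumes the input is UTF-8 and that the mbstring extension is available.

```php
<?php
// Hypothetical PHP counterpart of ConvertirStringAHex (the name is made up).
// Assumes UTF-8 input and the mbstring extension.
function string_to_latin1_hex(string $utf8): string
{
    // Same two steps as the C# version: get the ISO-8859-1 bytes,
    // then print each byte as two uppercase hex digits.
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return strtoupper(bin2hex($latin1));
}

echo string_to_latin1_hex("Jos\xC3\xA9"), "\n"; // "José" -> 4A6F73E9
```

The resulting string is plain ASCII, so it can be passed as iTEXTO without depending on the client character set.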

Hope it helps. If there's something to improve, please don't hesitate to post your comments.

Sources: http://www.nullskull.com/faq/834/convert-string-to-hex-and-hex-to-string-in-net.aspx https://forums.oracle.com/thread/44799

Answer by jah

If you really cannot change the character set that Oracle will use, then how about Base64-encoding your data before storing it in the database? That way, you can accept characters from any character set and store them as ISO-8859-1 (because Base64 outputs a subset of the ASCII character set, which maps exactly to ISO-8859-1). Base64 encoding increases the length of the string by about 33% (closer to 37% when the output is wrapped into MIME-style lines).
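
A quick sketch of the Base64 round trip in PHP, using only core functions; the sample value and variable names are just for illustration:

```php
<?php
$name = "Andr\xC3\xA9";            // "André" as UTF-8, 6 bytes

// Encode before the INSERT: the result contains only ASCII characters.
$stored = base64_encode($name);
echo $stored, "\n";                 // QW5kcsOp
echo strlen($name), ' -> ', strlen($stored), "\n"; // 6 -> 8

// Decode after the SELECT; strict mode returns false on invalid input.
$restored = base64_decode($stored, true);
var_dump($restored === $name);      // bool(true)
```

The trade-off is that the stored values can no longer be searched or sorted meaningfully in SQL, and the columns need the extra room.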

If your data is only ever going to be displayed as HTML, then you might as well store HTML entities as you suggested, but be aware that a single entity can take up to 10 characters per unencoded character, e.g. &thetasym; is ϑ.