Oracle Text will not work with NVARCHAR2. What else might be unavailable?

Notice: this page reproduces a popular StackOverflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4401043/

Time: 2020-09-10 02:57:25  Source: igfitidea

Tags: oracle, unicode, character-encoding, nvarchar

Asked by Benoit

We are going to migrate an application so that it supports Unicode, and we have to choose between a Unicode character set for the whole database and Unicode columns stored in N[VAR]CHAR2.

We know that we will no longer be able to index column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.

Apart from that, are other major differences likely to arise when we try to take advantage of Oracle's features?

Also, is it likely that newer versions of Oracle will add features that support only CHAR columns or only NCHAR columns, but not both?

Thank you for your answers.

Note following Justin's answer:

Thank you for your answer. I will discuss your points, applied to our case:

Our application is usually alone on the Oracle database and takes care of the data itself. The only other software that connects to the database is Toad, Tora or SQL Developer.

We also use SQL*Loader and SQL*Plus to communicate with the database for basic statements or to upgrade between versions of the product. We have not heard of any specific problem with any of that software regarding NVARCHAR2.

We are also not aware of any database administrators among our customers who would like to use other tools on the database that could not support data in NVARCHAR2, and we are not really concerned about whether their tools might break. After all, they are skilled at their job and can find other tools if necessary.

Your last two points are more insightful for our case. We do not use many of Oracle's built-in packages, but we do use some. We will explore that problem.

Could we also expect a performance hit if our application (which is compiled under Visual C++ and uses wchar_t to store UTF-16) has to perform encoding conversions on all processed data?

Answered by Justin Cave

If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.

  • There are plenty of third-party utilities and libraries that simply don't support NCHAR/NVARCHAR2 columns or that don't make working with NCHAR/NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.
  • For custom applications, working with NCHAR/NVARCHAR2 columns requires jumping through some hoops that working with Unicode-encoded CHAR/VARCHAR2 columns does not. In JDBC code, for example, you'd constantly be calling the setFormOfUse method (an Oracle extension on OraclePreparedStatement). Other languages and frameworks will have other gotchas; some will be relatively well documented and minor, others will be relatively obscure.
  • Many built-in packages will only accept (or return) a VARCHAR2 rather than an NVARCHAR2. You'll still be able to call them because of implicit conversion, but you may end up with character set conversion issues.
  • In general, being able to avoid character set conversion issues within the database, and relegating those issues to the edge where the database is actually sending data to or receiving data from a client, makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission. Figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and an NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.
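To make the last bullet concrete, here is a rough Python sketch of the kind of silent corruption it describes. It is not Oracle code: Python's cp1252 codec stands in for a non-Unicode database character set (similar to Oracle's WE8MSWIN1252), and the helper name store_in_varchar2 is made up for illustration.

```python
# Hypothetical stand-in for a non-Unicode database character set.
# Python's "cp1252" codec is used here to mimic Oracle's WE8MSWIN1252.
DB_CHARSET = "cp1252"


def store_in_varchar2(text: str) -> str:
    """Simulate storing text in a VARCHAR2 column of a cp1252 database.

    Characters the character set cannot represent are replaced with "?",
    which is roughly what a lossy implicit conversion does.
    """
    return text.encode(DB_CHARSET, errors="replace").decode(DB_CHARSET)


varchar2_part = "Customer: "   # representable in the database character set
nvarchar2_part = "田中さん"      # only representable in the national character set

# Concatenate both parts and store the result in a "VARCHAR2".
stored = store_in_varchar2(varchar2_part + nvarchar2_part)
print(stored)  # Customer: ????
```

Nothing raises an error here; the Japanese characters are simply gone by the time anything reaches the network, which is why this class of bug is so hard to track down.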

Oracle designed the NCHAR/NVARCHAR2 data types for two cases: when you need to support legacy applications that don't support Unicode in the same database as new applications that do, and when it is beneficial to store some Unicode data with a different encoding (e.g. you have a large amount of Japanese data that you would prefer to store in an NVARCHAR2 using the UTF-16 encoding rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/NVARCHAR2 at all costs.
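The storage argument for Japanese data can be checked with a few lines of Python. This is only a sketch of the byte counts, assuming the database character set is UTF-8 based (AL32UTF8) and the national character set is UTF-16 based (AL16UTF16); the sample strings and the helper name encoded_sizes are made up for illustration.

```python
samples = {
    "English": "Unicode migration",
    "Japanese": "ユニコードへの移行",
}


def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label}: {len(text)} chars, UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

Most Japanese characters take three bytes in UTF-8 but two in UTF-16, while ASCII text goes the other way (one byte versus two), so the saving only matters when the data is predominantly East Asian.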

Responding to your followups

"Our application is usually alone on the Oracle database and takes care of the data itself. The only other software that connects to the database is Toad, Tora or SQL Developer."

What do you mean by "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.

I'm also assuming that you are using some sort of API or library to access the database, even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/NVARCHAR2, and whether the API you're using supports NCHAR/NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't mean that you won't need to make (potentially significant) changes to support NCHAR/NVARCHAR2 columns.

"We also use SQL*Loader and SQL*Plus to communicate with the database for basic statements or to upgrade between versions of the product. We have not heard of any specific problem with any of that software regarding NVARCHAR2."

Those applications all work with NCHAR/NVARCHAR2. NCHAR/NVARCHAR2 columns do introduce some additional complexity into scripts, particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.
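One common workaround for such string constants is to keep the script itself pure ASCII and write the text with Oracle's UNISTR function, which takes backslash-escaped UTF-16 code units. The Python helper below is a minimal sketch of generating such literals; the name to_unistr is made up, and it deliberately escapes everything except ASCII letters, digits and spaces so that quotes and backslashes need no special handling.

```python
def to_unistr(text: str) -> str:
    """Build an ASCII-only Oracle UNISTR('...') literal for text."""
    parts = []
    data = text.encode("utf-16-be")  # UNISTR escapes are UTF-16 code units
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        ch = chr(unit)
        if unit < 0x80 and (ch.isalnum() or ch == " "):
            parts.append(ch)               # safe ASCII is kept as-is
        else:
            parts.append(f"\\{unit:04X}")  # everything else becomes \XXXX
    return "UNISTR('" + "".join(parts) + "')"


print(to_unistr("日本"))  # UNISTR('\65E5\672C')
```

The result can be pasted into an INSERT or UPDATE in a SQL*Plus script, so the script file never has to contain characters that the database character set or the client's NLS_LANG setting might mangle.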

"We are also not aware of any database administrators among our customers who would like to use other tools on the database that could not support data in NVARCHAR2, and we are not really concerned about whether their tools might break. After all, they are skilled at their job and can find other tools if necessary."

While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool, their enterprise ETL tool, or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a showstopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace yours.

"Could we also expect a performance hit if our application (which is compiled under Visual C++ and uses wchar_t to store UTF-16) has to perform encoding conversions on all processed data?"

I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you are saying that you bypass Oracle's NLS layer to do character set conversion on your own.
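If the conversions in question are simply between the UTF-16 that wchar_t holds under Visual C++ and a UTF-8 database character set, they are at least lossless, since both encode the same Unicode code points. The Python sketch below only illustrates that round trip; it says nothing about speed, the helper names are made up, and in a real application this work would normally be done by the Oracle client libraries rather than by application code.

```python
def utf16_to_utf8(utf16_bytes: bytes) -> bytes:
    """Convert little-endian UTF-16 bytes (as in a wchar_t buffer) to UTF-8."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(utf8_bytes: bytes) -> bytes:
    """Convert UTF-8 bytes back to little-endian UTF-16."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


# Latin text with accents, Japanese, and a character outside the BMP.
original = "Ünïcödé 日本語 😀".encode("utf-16-le")
round_tripped = utf8_to_utf16(utf16_to_utf8(original))
print(round_tripped == original)  # True
```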

My bottom line, though, is that I don't see any advantages to using NCHAR/NVARCHAR2 given what you're describing, and there are plenty of potential downsides. Even if you can dismiss 99% of the downsides as irrelevant to your particular needs, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8, presumably) and just using that.