postgresql 如何更改 postgres 数据库的字符编码?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5090858/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-10 22:50:31  来源:igfitidea点击:

How do you change the character encoding of a postgres database?

postgresqlunicode

提问by Nick Retallack

I have a database that was set up with the default character set SQL_ASCII. I want to switch it to UNICODE. Is there an easy way to do that?

我有一个使用默认字符集 SQL_ASCII 设置的数据库。我想将其切换为 UNICODE。有没有简单的方法来做到这一点?

采纳答案by Daniel Kutik

To change the encoding of your database:

要更改数据库的编码:

  1. Dump your database
  2. Drop your database,
  3. Create new database with the different encoding
  4. Reload your data.
  1. 转储你的数据库
  2. 删除你的数据库,
  3. 使用不同的编码创建新数据库
  4. 重新加载您的数据。

Make sure the client encoding is set correctly during all this.

确保在所有这些过程中正确设置了客户端编码。

Source: http://archives.postgresql.org/pgsql-novice/2006-03/msg00210.php

来源:http: //archives.postgresql.org/pgsql-novice/2006-03/msg00210.php

回答by araqnid

First off, Daniel's answer is the correct, safe option.

首先,丹尼尔的答案是正确、安全的选择。

For the specific case of changing from SQL_ASCII to something else, you can cheat and simply poke the pg_database catalogue to reassign the database encoding. This assumes you've already stored any non-ASCII characters in the expected encoding (or that you simply haven't used any non-ASCII characters).

对于从 SQL_ASCII 更改为其他内容的特定情况,您可以欺骗并简单地戳 pg_database 目录以重新分配数据库编码。这假设您已经以预期的编码存储了任何非 ASCII 字符(或者您根本没有使用任何非 ASCII 字符)。

Then you can do:

然后你可以这样做:

update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb'

This will not change the collation of the database, just how the encoded bytes are converted into characters (so now length('£123')will return 4 instead of 5). If the database uses 'C' collation, there should be no change to ordering for ASCII strings. You'll likely need to rebuild any indices containing non-ASCII characters though.

这不会改变数据库的排序规则,只会改变编码字节转换为字符的方式(所以现在length('£123')将返回 4 而不是 5)。如果数据库使用“C”排序规则,则 ASCII 字符串的排序应该不会发生变化。不过,您可能需要重建任何包含非 ASCII 字符的索引。

Caveat emptor. Dumping and reloading provides a way to check your database content is actually in the encoding you expect, and this doesn't. And if it turns out you did have some wrongly-encoded data in the database, rescuing is going to be difficult. So if you possibly can, dump and reinitialise.

买者自负。转储和重新加载提供了一种方法来检查您的数据库内容实际上是您期望的编码,而事实并非如此。如果事实证明您在数据库中确实有一些错误编码的数据,那么救援将很困难。因此,如果可能,请转储并重新初始化。

回答by Douglas

Dumping a database with a specific encoding and try to restore it on another database with a different encoding could result in data corruption. Data encoding must be set BEFORE any data is inserted into the database.

转储具有特定编码的数据库并尝试在具有不同编码的另一个数据库上恢复它可能会导致数据损坏。必须在将任何数据插入数据库之前设置数据编码。

Check this: When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data.

检查这一点复制任何其他数据库时,无法更改源数据库的编码和区域设置,因为这可能会导致数据损坏。

And this: Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore.LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt.(But you can alleviate this restriction using collations, as discussed in Section 22.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.

一点在创建数据库时,某些语言环境类别必须固定其值。您可以为不同的数据库使用不同的设置,但是一旦创建了一个数据库,您就不能再为该数据库更改它们。LC_COLLATE 和 LC_CTYPE 是这些类别。它们会影响索引的排序顺序,因此必须保持固定,否则文本列上的索引会损坏。但是您可以使用排序规则来缓解这种限制,如第 22.2 节所述。这些类别的默认值在运行 initdb 时确定,并且在创建新数据库时使用这些值,除非在 CREATE DATABASE 命令中另有指定。



I would rather rebuild everything from the begining properly with a correct local encoding on your debian OS as explained here:

我宁愿从作为解释正确与正确的本地编码在您的Debian操作系统begining重建一切在这里

su root

Reconfigure your local settings :

重新配置您的本地设置:

dpkg-reconfigure locales

Choose your locale (like for instance for french in Switzerland : fr_CH.UTF8)

选择您的语言环境(例如瑞士的法语:fr_CH.UTF8)

Uninstall and clean properly postgresql :

正确卸载并清理 postgresql :

apt-get --purge remove postgresql\*
rm -r /etc/postgresql/
rm -r /etc/postgresql-common/
rm -r /var/lib/postgresql/
userdel -r postgres
groupdel postgres

Re-install postgresql :

重新安装 postgresql :

aptitude install postgresql-9.1 postgresql-contrib-9.1 postgresql-doc-9.1

Now any new database will be automatically be created with correct encoding, LC_TYPE (character classification), and LC_COLLATE (string sort order).

现在,任何新数据库都将使用正确的编码、LC_TYPE(字符分类)和 LC_COLLATE(字符串排序顺序)自动创建。

回答by Sergey Zarubin

Daniel Kutik's answer is correct, but it can be even more safe, with database renaming.

Daniel Kutik 的回答是正确的,但通过将数据库重命名为.

So, the truly safe way is:

所以,真正安全的方法是:

  1. Create new database with the different encoding and name
  2. Dump your database
  3. Restore dump to the new DB
  4. Test that your application runs correctly with the new DB
  5. Rename old DB to something meaningful
  6. Rename new DB
  7. Test application again
  8. Drop the old database
  1. 使用不同的编码和名称创建新数据库
  2. 转储你的数据库
  3. 将转储还原到新数据库
  4. 测试您的应用程序是否使用新数据库正确运行
  5. 将旧数据库重命名为有意义的名称
  6. 重命名新数据库
  7. 再次测试应用
  8. 删除旧数据库

In case of emergency, just rename DBs back

在紧急情况下,只需将数据库重命名回

回答by rubo77

# dump into file
pg_dump myDB > /tmp/myDB.sql

# create an empty db with the right encoding (on older versions the escaped single quotes are needed!)
psql -c 'CREATE DATABASE "tempDB" WITH OWNER = "myself" LC_COLLATE = '\''de_DE.utf8'\'' TEMPLATE template0;'

# import in the new DB
psql -d tempDB -1 -f /tmp/myDB.sql

# rename databases
psql -c 'ALTER DATABASE "myDB" RENAME TO "myDB_wrong_encoding";' 
psql -c 'ALTER DATABASE "tempDB" RENAME TO "myDB";'

# see the result
psql myDB -c "SHOW LC_COLLATE"