Disclaimer: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/59667/

Time: 2020-08-31 23:21:40  Source: igfitidea

What are the use cases for selecting CHAR over VARCHAR in SQL?

Tags: sql, sql-server, tsql

Asked by SkunkSpinner

I realize that CHAR is recommended if all my values are fixed-width. But, so what? Why not just pick VARCHAR for all text fields, just to be safe?

Answered by Jim McKeeth

Generally pick CHAR if all rows will have close to the same length. Pick VARCHAR when the length varies significantly. CHAR may also be a bit faster because all the rows are of the same length.

It varies by DB implementation, but generally VARCHAR uses one or two more bytes of storage (for length or termination) in addition to the actual data. So (assuming you are using a one-byte character set), storing the word "FooBar" takes:

  • CHAR(6) = 6 bytes (no overhead)
  • VARCHAR(10) = 8 bytes (2 bytes of overhead)
  • CHAR(10) = 10 bytes (4 bytes of overhead)

Bottom line: CHAR can be faster and more space efficient for data of relatively the same length (within two characters' length difference).
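
The byte counts in the list above can be reproduced with a small model. This is only a sketch: the function names are hypothetical, it assumes a one-byte character set, and the 2-byte VARCHAR overhead is the SQL Server figure quoted in this answer, so other engines may differ.

```python
def char_bytes(declared_len: int) -> int:
    """CHAR(n) always stores n bytes (one-byte character set assumed)."""
    return declared_len


def varchar_bytes(value: str, overhead: int = 2) -> int:
    """VARCHAR stores the actual data plus a length indicator (assumed 2 bytes)."""
    return len(value) + overhead


word = "FooBar"
print("CHAR(6):    ", char_bytes(6))
print("VARCHAR(10):", varchar_bytes(word))
print("CHAR(10):   ", char_bytes(10))
```

With these assumptions, CHAR only wins while the declared width stays within the overhead (here two bytes) of the actual data length.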

Note: Microsoft SQL has 2 bytes of overhead for a VARCHAR. This may vary from DB to DB, but generally there is at least 1 byte of overhead needed to indicate length or EOL on a VARCHAR.

As was pointed out by Gaven in the comments, if you are using a multi-byte, variable length character set like UTF8 then CHAR stores the maximum number of bytes necessary to store the number of characters. So if UTF8 needs at most 3 bytes to store a character, then CHAR(6) will be fixed at 18 bytes, even if only storing latin1 characters. So in this case VARCHAR becomes a much better choice.
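
The multi-byte case can be sketched the same way. The helper names below are hypothetical, the 3-bytes-per-character maximum and the 2-byte VARCHAR overhead are the figures used in this answer, and whether a given engine really reserves the maximum for CHAR is engine-specific.

```python
def char_reserved_bytes(declared_chars: int, max_bytes_per_char: int) -> int:
    """CHAR(n) as described above: reserve the worst case for every character."""
    return declared_chars * max_bytes_per_char


def varchar_used_bytes(value: str, encoding: str = "utf-8", overhead: int = 2) -> int:
    """VARCHAR: the bytes actually needed by the encoded value plus a length indicator."""
    return len(value.encode(encoding)) + overhead


print(char_reserved_bytes(6, 3))     # CHAR(6) with up to 3 bytes per character
print(varchar_used_bytes("FooBar"))  # the same six latin1 characters in a VARCHAR
```

Under these assumptions the CHAR column reserves 18 bytes for a value that the VARCHAR column stores in 8.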

Answered by Ethan Post

If you're working with me and you're working with Oracle, I would probably make you use varchar in almost every circumstance. The assumption that char uses less processing power than varchar may be true...for now...but database engines get better over time, and this sort of general rule has the makings of a future "myth".

Another thing: I have never seen a performance problem because someone decided to go with varchar. You will make much better use of your time writing good code (fewer calls to the database) and efficient SQL (how do indexes work, how does the optimizer make decisions, why is exists usually faster than in...).

Final thought: I have seen all sorts of problems with use of CHAR, people looking for '' when they should be looking for ' ', or people looking for 'FOO' when they should be looking for 'FOO (bunch of spaces here)', or people not trimming the trailing blanks, or bugs with Powerbuilder adding up to 2000 blanks to the value it returns from an Oracle procedure.

Answered by Hank Gay

In addition to performance benefits, CHAR can be used to indicate that all values should be the same length, e.g., a column for U.S. state abbreviations.

Answered by Jarrett Meyer

Char is a little bit faster, so if you have a column that you KNOW will be a certain length, use char. For example, storing (M)ale/(F)emale/(U)nknown for gender, or 2 characters for a US state.

Answered by Jeff

Does NChar or Char perform better than their var alternatives?

Great question. The simple answer is yes in certain situations. Let's see if this can be explained.

Obviously we all know that if I create a table with a column of varchar(255) (let's call this column myColumn) and insert a million rows but put only a few characters into myColumn for each row, the table will be much smaller (overall number of data pages needed by the storage engine) than if I had created myColumn as char(255). Any time I do an operation (DML) on that table and request a lot of rows, it will be faster when myColumn is varchar because I don't have to move around all those "extra" spaces at the end. Move, as in when SQL Server does internal sorts such as during a distinct or union operation, or if it chooses a merge during its query plan, etc. Move could also mean the time it takes to get the data from the server to my local PC or to another computer or wherever it is going to be consumed.

But there is some overhead in using varchar. SQL Server has to use a two-byte indicator (overhead) on each row to know how many bytes that particular row's myColumn has in it. It's not the extra 2 bytes that present the problem; it's having to "decode" the length of the data in myColumn on every row.

In my experience, it makes the most sense to use char instead of varchar on columns that will be joined on in queries. For example, the primary key of a table, or some other column that will be indexed: CustomerNumber on a demographic table, or CodeID on a decode table, or perhaps OrderNumber on an order table. By using char, the query engine can more quickly perform the join because it can do straight pointer arithmetic (deterministically) rather than having to move its pointers a variable number of bytes as it reads the pages. I know I might have lost you on that last sentence. Joins in SQL Server are based around the idea of "predicates." A predicate is a condition. For example, myColumn = 1, or OrderNumber < 500.

So if SQL Server is performing a DML statement, and the predicates, or "keys" being joined on are a fixed length (char), the query engine doesn't have to do as much work to match rows from one table to rows from another table. It won't have to find out how long the data is in the row and then walk down the string to find the end. All that takes time.
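
The "pointer arithmetic" argument can be illustrated with a toy byte buffer. This is not how SQL Server actually lays out pages; the layout (a 2-byte little-endian length prefix before each variable-length value) and all function names are assumptions made purely for illustration.

```python
import struct


def pack_fixed(values, width):
    """Concatenate values space-padded to a fixed width (CHAR-like)."""
    return b"".join(v.encode("ascii").ljust(width) for v in values)


def read_fixed(buf, width, i):
    """Jump straight to record i: the offset is a single multiplication."""
    start = i * width
    return buf[start:start + width].decode("ascii")


def pack_variable(values):
    """Prefix each value with a 2-byte length (VARCHAR-like, assumed layout)."""
    return b"".join(struct.pack("<H", len(v)) + v.encode("ascii") for v in values)


def read_variable(buf, i):
    """Walk the buffer, decoding every length prefix, until record i is reached."""
    pos = 0
    for _ in range(i):
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2 + length
    (length,) = struct.unpack_from("<H", buf, pos)
    return buf[pos + 2:pos + 2 + length].decode("ascii")


keys = ["A1", "B22", "C333"]
fixed = pack_fixed(keys, 4)
variable = pack_variable(keys)
print(repr(read_fixed(fixed, 4, 0)))       # padded value, trailing spaces included
print(repr(read_variable(variable, 1)))    # found only after decoding one prefix
```

The fixed-width reader does constant work per lookup, while the variable-width reader has to decode a length for every value it skips. The fixed-width reader also returns the trailing spaces, which is the "rtrim" cost mentioned in this answer.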

Now bear in mind this can easily be poorly implemented. I have seen char used for primary key fields in online systems. The width must be kept small, i.e., char(15) or something reasonable. And it works best in online systems because you are usually only retrieving or upserting a small number of rows, so having to "rtrim" those trailing spaces you'll get in the result set is a trivial task compared with having to join millions of rows from one table to millions of rows on another table.

Another reason CHAR makes sense over varchar on online systems is that it reduces page splits. By using char, you are essentially "reserving" (and wasting) that space, so if a user comes along later and puts more data into that column, SQL Server has already allocated space for it and in it goes.

Another reason to use CHAR is similar to the second reason. If a programmer or user does a "batch" update to millions of rows, adding some sentence to a note field for example, you won't get a call from your DBA in the middle of the night wondering why their drives are full. In other words, it leads to more predictable growth of the size of a database.

So those are 3 ways an online (OLTP) system can benefit from char over varchar. I hardly ever use char in a warehouse/analysis/OLAP scenario because usually you have SO much data that all those char columns can add up to lots of wasted space.

Keep in mind that char can make your database much larger, but most backup tools (for example LiteSpeed or RedGate SQL Backup) have data compression, so your backups tend to be about the same size as if you had used varchar.

Another use is in views created for exporting data to a fixed-width file. Let's say I have to export some data to a flat file to be read by a mainframe. It is fixed width (not delimited). I like to store the data in my "staging" table as varchar (thus consuming less space in my database) and then use a view to CAST everything to its char equivalent, with the length corresponding to the width of the fixed-width field for that column. For example:

-- Staging table keeps the data as varchar to save space
create table tblStagingTable (
pkID BIGINT IDENTITY(1,1),
CustomerFirstName varchar(30),
CustomerLastName varchar(30),
CustomerCityStateZip varchar(100),
CustomerCurrentBalance money )

-- Column list and VALUES must match: four columns, four values
insert into tblStagingTable
(CustomerFirstName, CustomerLastName, CustomerCityStateZip, CustomerCurrentBalance)
values ('Joe','Blow','123 Main St Washington, MD 12345', 123.45)
GO

-- The view pads every column to its fixed export width
-- (CREATE VIEW must be the first statement in its batch, hence the GO)
create view vwStagingTable AS
SELECT CustomerFirstName = CAST(CustomerFirstName as CHAR(30)),
CustomerLastName = CAST(CustomerLastName as CHAR(30)),
CustomerCityStateZip = CAST(CustomerCityStateZip as CHAR(100)),
CustomerCurrentBalance = CAST(CAST(CustomerCurrentBalance as NUMERIC(9,2)) AS CHAR(10))
FROM tblStagingTable
GO

SELECT * from vwStagingTable

This is cool because internally my data takes up less space because it's using varchar. But when I use DTS or SSIS or even just a cut and paste from SSMS to Notepad, I can use the view and get the right number of trailing spaces. In DTS we used to have a feature called, damn, I forget, I think it was called "suggest columns" or something. In SSIS you can't do that anymore; you have to tediously define the flat file connection manager. But since you have your view set up, SSIS can know the width of each column, and it can save a lot of time when building your data flow tasks.
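
For comparison, the same fixed-width record can be produced in application code. This is a minimal sketch that assumes the column widths used in the view (30, 30, 100 and 10); the function name is hypothetical, and the slicing is only meant to imitate how a cast to a shorter CHAR would cut off an over-long value.

```python
def fixed_width_line(first: str, last: str, city_state_zip: str, balance: float) -> str:
    """Pad (or cut) each field to its export width and concatenate them."""
    return (
        first.ljust(30)[:30]
        + last.ljust(30)[:30]
        + city_state_zip.ljust(100)[:100]
        + f"{balance:.2f}".ljust(10)[:10]
    )


line = fixed_width_line("Joe", "Blow", "123 Main St Washington, MD 12345", 123.45)
print(len(line))         # total record width: 30 + 30 + 100 + 10
print(repr(line[160:]))  # the balance field with its trailing spaces
```

Doing the padding in a view keeps the widths in one place next to the data, which is the point the answer makes about SSIS picking them up automatically.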

So bottom line... use varchar. There are a very small number of reasons to use char, and they are all about performance. If you have a system with hundreds of millions of rows, you will see a noticeable difference if the predicates are deterministic (char), but for most systems using char is simply wasting space.

Hope that helps. Jeff

Answered by Tony BenBrahim

There are performance benefits, but here is one that has not been mentioned: row migration. With char, you reserve the entire space in advance. So let's say you have a char(1000) and you store 10 characters: you will use up all 1000 characters of space. In a varchar2(1000), you will only use 10 characters. The problem comes when you modify the data. Let's say you update the column to now contain 900 characters. It is possible that the space to expand the varchar is not available in the current block. In that case, the DB engine must migrate the row to another block, and make a pointer in the original block to the new row in the new block. To read this data, the DB engine will now have to read 2 blocks.
No one can unequivocally say that varchar or char is better. There is a space-for-time tradeoff, and you must consider whether the data will be updated, especially if there is a good chance that it will grow.
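
Row migration can be sketched with a toy model of blocks. Everything here is an assumption made for illustration: the `Block` class, the `update_row` function, and the 2000-byte block size are hypothetical, and a real engine tracks far more than row lengths.

```python
class Block:
    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = {}      # row_id -> bytes used by the row stored here
        self.forwards = {}  # row_id -> index of the block the row migrated to

    def free(self):
        return self.capacity - sum(self.rows.values())


def update_row(blocks, block_no, row_id, new_len):
    """Grow a row in place if it fits, otherwise migrate it to a new block.

    Returns how many blocks must be read to fetch the row afterwards.
    """
    block = blocks[block_no]
    growth = new_len - block.rows[row_id]
    if growth <= block.free():
        block.rows[row_id] = new_len
        return 1
    # Not enough room: move the row and leave a forwarding pointer behind.
    del block.rows[row_id]
    target = Block(block.capacity)
    target.rows[row_id] = new_len
    blocks.append(target)
    block.forwards[row_id] = len(blocks) - 1
    return 2


# varchar-like: row "a" starts at 10 bytes, so other rows fill most of the block.
varchar_blocks = [Block(2000)]
varchar_blocks[0].rows = {"a": 10, "filler": 1500}
varchar_reads = update_row(varchar_blocks, 0, "a", 900)

# char-like: row "a" already reserves 1000 bytes, so the update needs no extra room.
char_blocks = [Block(2000)]
char_blocks[0].rows = {"a": 1000, "filler": 500}
char_reads = update_row(char_blocks, 0, "a", 1000)

print(varchar_reads, char_reads)
```

The model also shows the other side of the tradeoff: the char-like block had room for less filler data to begin with.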

Answered by Bryan Rehbein

There is a difference between early performance optimization and using a best-practice type of rule. If you are creating new tables where you will always have a fixed-length field, it makes sense to use CHAR; you should be using it in that case. This isn't early optimization, but rather implementing a rule of thumb (or best practice).

For example, if you have a 2-letter state field, use CHAR(2). If you have a field with the actual state names, use VARCHAR.

Answered by Grzegorz Gierlik

I would choose varchar unless the column stores a fixed-length value like a US state code, which is always 2 chars long, and the list of valid US state codes doesn't change often :).

In every other case, even when storing a hashed password (which is fixed length), I would choose varchar.

Why? A char column is always padded with spaces, so for a column my_column defined as char(5) and holding the value 'ABC', the comparison:

my_column = 'ABC' -- my_column stores 'ABC  ', which is different from 'ABC'

is false.

This feature could lead to many irritating bugs during development, and it makes testing harder.
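
The padding problem is easy to reproduce in application code. Whether the comparison is false inside the database depends on the engine's padding rules for comparisons, so treat this as a sketch of what a client sees once the padded value leaves the database; the `as_char` helper is hypothetical.

```python
def as_char(value: str, width: int) -> str:
    """Mimic CHAR(width) storage: pad with trailing spaces, cut if too long."""
    return value.ljust(width)[:width]


stored = as_char("ABC", 5)
print(repr(stored))                 # the value as a client would receive it
print(stored == "ABC")              # naive comparison against the literal
print(stored.rstrip(" ") == "ABC")  # comparison after trimming trailing blanks
```

Every consumer of the column has to remember the trim, which is where the bugs described above tend to come from.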

Answered by Scott Duffy

CHAR takes up less storage space than VARCHAR if all your data values in that field are the same length. Now perhaps in 2009 an 800GB database is, for all intents and purposes, the same as an 810GB one if you converted the VARCHARs to CHARs, but for short strings (1 or 2 characters), CHAR is still an industry "best practice", I would say.

Now if you look at the wide variety of data types most databases provide even for integers alone (bit, tiny, int, bigint), there ARE reasons to choose one over the other. Simply choosing bigint every time is actually being a bit ignorant of the purposes and uses of the field. If a field simply represents a person's age in years, a bigint is overkill. Now it's not necessarily "wrong", but it's not efficient.

But it's an interesting argument, and as databases improve over time, it could be argued that CHAR vs VARCHAR does get less relevant.

Answered by Craig

Many people have pointed out that if you know the exact length of the value, using CHAR has some benefits. But while storing US states as CHAR(2) is great today, when you get the message from sales that 'We have just made our first sale to Australia', you are in a world of pain. I always tend to overestimate how long I think fields will need to be, rather than making an 'exact' guess, to cover for future events. VARCHAR will give me more flexibility in this area.