Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/893454/

Time: 2020-09-08 07:19:15  Source: igfitidea

Is it a good idea to use an integer column for storing US ZIP codes in a database?

database database-design types street-address postal-code

Asked by Sean Hanley

At first glance, it would appear I have two basic choices for storing ZIP codes in a database table:

  1. Text (probably most common), e.g. char(5), or varchar(9) to support the +4 extension
  2. Numeric, e.g. a 32-bit integer

Both would satisfy the requirements of the data, if we assume that there are no international concerns. In the past we've generally just gone the text route, but I was wondering if anyone does the opposite? Just from brief comparison it looks like the integer method has two clear advantages:

  • It is, by its nature, automatically limited to numerics only (whereas without validation the text style could store letters and such, which are not, to my knowledge, ever valid in a ZIP code). This doesn't mean we could/would/should forgo validating user input as normal, though!
  • It takes less space, being 4 bytes (which should be plenty even for 9-digit ZIP codes) instead of 5 or 9 bytes.

Also, it seems like it wouldn't hurt display output much. It is trivial to slap a ToString() on a numeric value, use simple string manipulation to insert a hyphen or space or whatever for the +4 extension, and use string formatting to restore leading zeroes.
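
For illustration, here is roughly what that display step could look like. This is a minimal Python sketch rather than the ToString() approach mentioned above; the function name `format_zip` and the rule "anything up to 99999 is a plain 5-digit ZIP" are assumptions made for the example, not something the question specifies.

```python
def format_zip(value: int) -> str:
    """Render an integer-encoded ZIP as 'NNNNN' or 'NNNNN-NNNN'.

    Assumption: values up to 99999 are plain 5-digit ZIPs and anything
    larger is a 9-digit ZIP+4. A ZIP+4 whose integer value happens to be
    99999 or less would be misread, which is one cost of integer storage.
    """
    if value < 0 or value > 999_999_999:
        raise ValueError(f"not a ZIP-sized integer: {value}")
    if value <= 99_999:
        # Zero-pad to five digits to restore leading zeroes.
        return f"{value:05d}"
    # Zero-pad to nine digits, then split into ZIP and +4 parts.
    digits = f"{value:09d}"
    return f"{digits[:5]}-{digits[5:]}"


print(format_zip(2134))
print(format_zip(123456789))
```

Note that the formatter has to guess whether a stored value was a 5-digit or a 9-digit code, since the integer itself no longer says.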

Is there anything that would discourage using int as a datatype for US-only ZIP codes?

Answered by S.Lott

A numeric ZIP code is -- in a small way -- misleading.

Numbers should mean something numeric. ZIP codes don't add or subtract or participate in any numeric operations. 12309 - 12345 does not compute the distance from downtown Schenectady to my neighborhood.

Granted, for ZIP codes, no one is confused. However, for other number-like fields, it can be confusing.

Since ZIP codes aren't numbers -- they just happen to be coded with a restricted alphabet -- I suggest avoiding a numeric field. The 1-byte saving isn't worth much. And I think that meaning is more important than the byte.



Edit.

"As for leading zeroes..." is my point. Numbers don't have leading zeros. The presence of meaningful leading zeros on ZIP codes is yet another proof that they're not numeric.

Answered by Tom

Are you ever going to store non-US postal codes? Canada's are 6 characters with some letters. I usually just use a 10-character field. Disk space is cheap; having to rework your data model is not.

Answered by Mark

Use a string with validation. ZIP codes can begin with 0, so numeric is not a suitable type. Also, this applies neatly to international postal codes (e.g. UK, which is up to 8 characters). In the unlikely case that postal codes are a bottleneck, you could limit it to 10 characters, but check out your target formats first.

Here are validation regexes for UK, US and Canada.
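
The linked regexes did not survive in this copy, so as a stand-in here is a rough Python sketch of format validation for US and Canadian codes. The patterns are simplified assumptions (the Canadian one ignores the letters that are never actually used, and UK is left out because its format is more involved), and `is_valid_format` is a name made up for the example.

```python
import re

# Simplified format patterns; these are assumptions for illustration,
# not the regexes the answer originally linked to.
US_ZIP = re.compile(r"[0-9]{5}(-[0-9]{4})?")
CA_POSTAL = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")

PATTERNS = {"US": US_ZIP, "CA": CA_POSTAL}


def is_valid_format(code: str, country: str) -> bool:
    """Return True if the code has the right shape for the given country."""
    normalized = code.strip().upper()
    # fullmatch requires the whole string to match, not just a prefix.
    return PATTERNS[country].fullmatch(normalized) is not None


print(is_valid_format("02134", "US"))
print(is_valid_format("1234", "US"))
print(is_valid_format("k1a 0b1", "CA"))
```

A check like this only confirms the shape of the code; it says nothing about whether the code is actually assigned.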



Yes, you can pad to get the leading zeroes back. However, you're theoretically throwing away information that might help in case of errors. If someone finds 1235 in the database, was that originally 01235, or has another digit been missed?
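
A small Python sketch of that ambiguity, with helper names invented for the example: once the value has gone through an integer, a correct "01235" and a truncated "1235" look identical, while the text form still lets you spot the short entry.

```python
def roundtrip_via_int(zip_text: str) -> str:
    """Simulate storing a ZIP as an integer and padding it back on output."""
    return f"{int(zip_text):05d}"


def looks_complete(zip_text: str) -> bool:
    """With text storage, a missing digit is still visible as a short string."""
    return len(zip_text) == 5 and zip_text.isascii() and zip_text.isdigit()


# Both inputs collapse to the same stored integer, so padding "repairs"
# the truncated entry into something that merely looks valid.
print(roundtrip_via_int("01235"))
print(roundtrip_via_int("1235"))

# Kept as text, the truncated entry can be flagged for review instead.
print(looks_complete("01235"))
print(looks_complete("1235"))
```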

Best practice says you should say what you mean. A zip code is a code, not a number. Are you going to add/subtract/multiply/divide zip codes? And from a practical perspective, it's far more important that you're excluding extended zips.

Answered by TheTXI

Normally you would use a non-numerical datatype such as a varchar, which would allow for more zip code types. If you are dead set on only allowing 5-digit [XXXXX] or 9-digit [XXXXX-XXXX] zip codes, you could then use a char(5) or char(10), but I would not recommend it. Varchar is the safest and most sane choice.
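
As a rough sketch of the text-column approach, the Python snippet below uses the standard-library sqlite3 module with a CHECK constraint. The table and column names are made up, and SQLite is only a convenient stand-in: it does not enforce the VARCHAR(10) length, and other databases spell pattern checks differently.

```python
import sqlite3

# GLOB patterns for "NNNNN" and "NNNNN-NNNN"; SQLite's GLOB supports [0-9].
FIVE = "[0-9]" * 5
FOUR = "[0-9]" * 4

conn = sqlite3.connect(":memory:")
conn.execute(
    f"""
    CREATE TABLE address (
        id  INTEGER PRIMARY KEY,
        -- SQLite treats VARCHAR(10) as TEXT and ignores the length,
        -- so the CHECK constraint is what actually restricts the value.
        zip VARCHAR(10) NOT NULL
            CHECK (zip GLOB '{FIVE}' OR zip GLOB '{FIVE}-{FOUR}')
    )
    """
)


def try_insert(zip_text: str) -> bool:
    """Insert a ZIP; return False if the CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO address (zip) VALUES (?)", (zip_text,))
        return True
    except sqlite3.IntegrityError:
        return False


for candidate in ("02134", "12345-6789", "1234", "ABCDE"):
    print(candidate, try_insert(candidate))
```

The leading zero in "02134" survives because the value is never converted to a number.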

Edit: It should also be noted that if you don't plan on doing numerical calculations on the field, you should not use a numerical data type. A ZIP code is not a number in the sense that you add to or subtract from it. It is just a string that happens to be made up typically of digits, so you should refrain from using numerical data types for it.

Answered by BenAlabaster

From a technical standpoint, some points raised here are fairly trivial. I work with address data cleansing on a daily basis, in particular cleansing address data from all over the world. It's not a trivial task by any stretch of the imagination. When it comes to zip codes, you could store them as an integer, although it may not be "semantically" correct. The fact is, the data is numeric in form whether or not, strictly speaking, it is considered numeric in value.

However, the very real drawback of storing them as numeric types is that you'll lose the ability to easily see whether the data was entered incorrectly (i.e. has missing digits) or whether the system removed leading zeros. That leads to costly operations to validate potentially invalid zip codes that were otherwise correct.

It's also very hard to force the user to input correct data if one of the repercussions is a delay of business. Users often don't have the patience to enter correct data if it's not immediately obvious. Using a regex is one way of guaranteeing correct data; however, if the user enters a value that doesn't conform and they're shown an error, they may just omit this value altogether or enter something that conforms but is otherwise incorrect. One example [using Canadian postal codes] is that you often see A0A 0A0 entered, which isn't valid but conforms to the regex for Canadian postal codes. More often than not, this is entered by users who are forced to provide a postal code, but they either don't know what it is or don't have all of it correct.
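
To make the A0A 0A0 case concrete, here is a small Python sketch. The pattern is a simplified assumption about the Canadian format, and the placeholder list and function name are hypothetical; in practice such a list would have to come from your own data.

```python
import re

# Simplified Canadian format: letter-digit-letter, optional space,
# digit-letter-digit. An assumption for illustration only.
CA_FORMAT = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")

# Hypothetical list of values users type just to get past the form.
KNOWN_PLACEHOLDERS = {"A0A0A0"}


def check_postal_code(code: str) -> str:
    """Classify a code as 'bad format', 'suspicious placeholder', or 'ok'."""
    normalized = code.strip().upper()
    if CA_FORMAT.fullmatch(normalized) is None:
        return "bad format"
    if normalized.replace(" ", "") in KNOWN_PLACEHOLDERS:
        # Passes the regex, yet is the kind of filler value described above.
        return "suspicious placeholder"
    return "ok"


print(check_postal_code("A0A 0A0"))
print(check_postal_code("K1A 0B1"))
print(check_postal_code("12345"))
```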

One suggestion is to validate the whole of the entry as a unit, checking that the zip code is correct when compared with the rest of the address. If it is incorrect, then offering alternate valid zip codes for the address will make it easier for users to input valid data. Likewise, if the zip code is correct for the street address, but the street number falls outside the domain of that zip code, then offer alternate street numbers for that zip code/street combination.

Answered by V'rasana Oannes

Unless you have a business requirement to perform mathematical calculations on ZIP code data, there's no point in using an INT. You're over-engineering.

Hope this helps,

Bill

Answered by kexx

No, because

  • You never do math functions on a zip code
  • Could contain dashes
  • Could start with 0
  • NULL values are sometimes interpreted as zero for scalar types like integer (e.g. when you export the data somehow)
  • A zip code, even if it's a number, is a designation of an area, meaning it is a name rather than a numeric quantity of anything

Answered by benc

ZIP code is really a coded namespace, if you think about it. Traditionally digits, but also a hyphen and capital letters:

"10022-SHOE"

http://www.saksfifthavenue.com/main/10022-shoe.jsp

Realistically, a lot of business applications will not need to support this edge case, even if it is valid.

Answered by therealrodk

I learned recently that in Ruby one reason you would want to avoid this is that some ZIP codes begin with leading zeroes, which, if stored as an integer, will automatically be converted to octal.

From the docs:

You can use a special prefix to write numbers in decimal, hexadecimal, octal or binary formats. For decimal numbers use a prefix of 0d, for hexadecimal numbers use a prefix of 0x, for octal numbers use a prefix of 0 or 0o…
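
The same kind of surprise can be shown in Python, with the caveat that this is an analogous sketch and not the Ruby behaviour quoted above: Python 3 requires an explicit 0o prefix for octal literals and rejects a bare leading zero like 02134 as a syntax error, so the octal reading only appears if you ask for base 8.

```python
zip_text = "02134"

# Parsed as decimal, the leading zero is silently dropped.
as_decimal = int(zip_text)

# Parsed as base 8 (what a leading-zero literal means in some languages),
# the same digits produce a completely different number.
as_octal = int(zip_text, 8)

# Python's explicit octal literal syntax gives the same base-8 value.
octal_literal = 0o2134

print(as_decimal, as_octal, octal_literal)
```

Either way, the digits only stay intact if the ZIP is kept as text.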

Answered by Steve

If you were to use an integer for US ZIPs, you would want to multiply the leading part by 10,000 and add the +4. The encoding in the database has nothing to do with input validation. You can always require the input to be valid or not, but the storage is a matter of how much you think your requirements or the USPS will change. (Hint: your requirements will change.)
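
A minimal Python sketch of that encoding, assuming input already validated as NNNNN or NNNNN-NNNN; the function names are invented for the example. It also shows one limitation: a missing +4 is stored as 0, so it cannot be told apart from an explicit -0000.

```python
def encode_zip(zip_text: str) -> int:
    """Pack 'NNNNN' or 'NNNNN-NNNN' into one integer: zip5 * 10,000 + plus4."""
    five, _, plus4 = zip_text.partition("-")
    # Assumption: a missing +4 is stored as 0000.
    return int(five) * 10_000 + (int(plus4) if plus4 else 0)


def decode_zip(value: int) -> str:
    """Unpack the integer back into 'NNNNN-NNNN', restoring leading zeroes."""
    five, plus4 = divmod(value, 10_000)
    return f"{five:05d}-{plus4:04d}"


encoded = encode_zip("12345-6789")
print(encoded, decode_zip(encoded))

# The largest possible value still fits in a signed 32-bit integer.
print(encode_zip("99999-9999") <= 2**31 - 1)
```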