postgresql 是否可以在 Postgres 中存储 1 个字节的数字?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3090138/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Is it possible to store a 1 byte number in Postgres?
提问by punkish
I have 7 8-bit integer values per record that I want to store in Postgres. Pg doesn't offer a single byte integer type, SMALLINT, or 2 bytes, being the smallest integer datatype. Is there anyway I can store my 7 8-bit numbers and save on space?
我要存储在 Postgres 中的每条记录有 7 个 8 位整数值。Pg 不提供单字节整数类型 SMALLINT 或 2 个字节,它们是最小的整数数据类型。无论如何,我可以存储我的 7 个 8 位数字并节省空间吗?
Would an array type with a 7 element array be more compact? Or, should I make a binary representation of my 7 numbers (for example, using pack in Perl) and store that in a single bytea field?
具有 7 个元素数组的数组类型会更紧凑吗?或者,我是否应该对我的 7 个数字进行二进制表示(例如,在 Perl 中使用 pack)并将其存储在单个 bytea 字段中?
Any other suggestions?
还有其他建议吗?
采纳答案by Greg Smith
Given that the overhead for any row in PostgreSQL is 23 bytes(HeapTupleHeaderData), if you really care about small amounts of space this much you've probably picked the wrong way to store your data.
鉴于 PostgreSQL 中任何行的开销是23 个字节(HeapTupleHeaderData),如果您真的如此关心少量空间,那么您可能选择了错误的数据存储方式。
Regardless, since all the more complicated types have their own overhead (bytea adds four bytes of overhead for example, bit strings 5 to 8), the only way to accomplish what you're looking for is to use a bigint (8 bytes), numerically shifting each value and OR-ing together the result. You can do this using the bit string operationsto make the code easier--compute as bit string, then cast to bigint before storing--or just manually multiply/add if you want speed to be better. For example, here's how you store two bytes together into a two byte structure and then get them back again:
无论如何,由于所有更复杂的类型都有自己的开销(例如,bytea 添加了 4 个字节的开销,位串 5 到 8),完成您要查找的内容的唯一方法是使用 bigint(8 个字节),对每个值进行数字移位并将结果进行 OR 运算。您可以使用位串操作来使代码更容易——计算为位串,然后在存储之前转换为 bigint——或者如果你希望速度更快,只需手动乘/加。例如,下面是如何将两个字节一起存储到一个两字节结构中,然后再将它们取回:
int2 = 256 * byte1 + byte2
byte1 = int2 / 256
byte2 = int2 % 256
You can extend the same idea into storing 7 of them that way. Retrieval overhead is still going to be terrible, but you will have actually saved some space in the process. But not very much relative to just the row header.
您可以将相同的想法扩展到以这种方式存储其中的 7 个。检索开销仍然会很糟糕,但您实际上会在此过程中节省一些空间。但与行标题关系不大。
回答by Wojtas
There is pg_catalog.char
( another notation - "char" ) type which uses only 1 byte to store its value.
有pg_catalog.char
(另一种表示法 - "char" )类型仅使用 1 个字节来存储其值。
select pg_column_size( 'A' );
pg_column_size
----------------
2
(1 row)
select pg_column_size( 'A'::"char" );
pg_column_size
----------------
1
(1 row)
回答by Hossein
I haven't tested them myself, but there are extensions for this; e.g http://pgxn.org/dist/tinyint/.
我自己还没有测试过它们,但是有一些扩展;例如http://pgxn.org/dist/tinyint/。
回答by Evan Carroll
"char"
"char"
This is a one-byte type in PostgreSQL that fits in the range of -128,127. From the docs,
这是 PostgreSQL 中的一个字节类型,适用于 -128,127 的范围。从文档中,
The type
"char"
(note the quotes) is different fromchar(1)
in that it only uses one byte of storage.It is internally used in the system catalogs as a simplistic enumeration type.
类型
"char"
(注意引号)的不同之处char(1)
在于它只使用一个字节的存储空间。它在系统目录中作为简单的枚举类型在内部使用。
You can bias this towards [-128,127], by subtracting 128 from any input in the range of [0-255] before you write to the database, and adding it back on the output when you read from the database.
您可以将其偏向 [-128,127],方法是在写入数据库之前从 [0-255] 范围内的任何输入中减去 128,并在从数据库中读取时将其添加回输出。
-- works
SELECT (-128)::"char", 127::"char";
-- generates out of range
SELECT (-128)::"char";
SELECT 128::"char";
-- Shifts to unsigned range.
-- If you're going to be using "char"
-- review the results of this query!
SELECT
x::int AS "inputUnsigned",
chr(x) AS "extendedASCII",
-- this is the "char" types representation for that input.
signed::"char" AS "charRepresentation",
signed AS "inputUnsignedToSigned",
signed+128 AS "inputUnsignedToSignedToUnsigned"
FROM generate_series(1,255) AS gs(x)
-- Here we map the input in the range of [0,255] to [-128,127]
CROSS JOIN LATERAL ( VALUES (x::int-128) )
AS v(signed);
Small excerpt of the output
输出的小摘录
inputUnsigned | extendedASCII | charRepresentation | inputUnsignedToSigned | inputUnsignedToSignedToUnsigned
---------------+---------------+--------------------+-----------------------+---------------------------------
....
190 | ? | > | 62 | 190
191 | ? | ? | 63 | 191
192 | à | @ | 64 | 192
193 | á | A | 65 | 193
194 | ? | B | 66 | 194
195 | ? | C | 67 | 195
196 | ? | D | 68 | 196
...
We use generate_series(1,255)
only because chr(0)
throws because you can't generate or output the ASCII NUL
(PostgreSQL uses cstrings)
我们使用generate_series(1,255)
只因为chr(0)
throws 因为你不能生成或输出 ASCII NUL
(PostgreSQL 使用 cstrings)
pguint
Extension
pguint
延期
Pguintis an extension that provides two one byte representations,
Pguint是一个扩展,它提供两个单字节表示,
int1
(signed)uint1
(unsigned)
int1
(签)uint1
(未签名)
回答by Anonymous
First you asked about 7, but now 6 bytes. Six 8-bit values corresponds exactly to MAC address size and PostgreSQL's built in type macaddr. You can insert those bytes using MAC syntax f.i. A1-B2-C3-D4-E5-F6.
首先,您询问了 7 个字节,但现在询问了 6 个字节。六个 8 位值与 MAC 地址大小和 PostgreSQL 的内置类型 macaddr 完全对应。您可以使用 MAC 语法 fi A1-B2-C3-D4-E5-F6 插入这些字节。
回答by Jordan S. Jones
You will want to look into the bytea
data type referenced here: http://www.postgresql.org/docs/8.4/interactive/datatype-binary.html
您需要查看bytea
此处引用的数据类型:http: //www.postgresql.org/docs/8.4/interactive/datatype-binary.html
There is also bit string data types: http://www.postgresql.org/docs/8.4/interactive/datatype-bit.html
还有位串数据类型:http: //www.postgresql.org/docs/8.4/interactive/datatype-bit.html
回答by Jordan S. Jones
Will you ever lookup records using these values?
您会使用这些值查找记录吗?
If yes - use normal datatypes like int4 (or even int8 if you're on 64bit architecture).
如果是 - 使用像 int4 这样的普通数据类型(如果你在 64 位架构上,甚至是 int8)。
If not - first ask yourself - what is the point of storing this values in Pg? You can use bytea (complicated i/o), or bitstrings (even more complicated i/o) but what is the point? How many billion records you're going to have? Did you actually check that smaller datatype uses less space (hint: it doesn't, check it - there are data alignment issues involved)? Are you working under impression that smaller datatype is faster (it isn't. It's actually more complex to compare two int2 values than two int4 values on 32bit architecture).
如果不是 - 首先问问自己 - 将这些值存储在 Pg 中有什么意义?您可以使用 bytea(复杂的 i/o)或位串(甚至更复杂的 i/o),但有什么意义呢?您将拥有多少亿条记录?您是否真的检查过较小的数据类型使用较少的空间(提示:它没有,请检查它 - 涉及数据对齐问题)?您是否认为较小的数据类型更快(事实并非如此。在 32 位架构上比较两个 int2 值实际上比比较两个 int4 值更复杂)。