Python: how to store an IP address in MySQL

Question

Asked by OhioDude

We've got a healthy debate going on in the office this week. We're creating a database to store proxy information, and for the most part we have the schema worked out, except for how we should store IPs. One camp wants to use 4 smallints, one for each octet, and the other wants to use 1 big int with INET_ATON.

These tables are going to be huge, so performance is key. I am in the middle here, as I normally use MS SQL and 4 small ints in my world. I don't have enough experience storing IPs at this kind of volume.

We'll be using perl and python scripts to access the database to further normalize the data into several other tables for top talkers, interesting traffic etc.

I am sure there are some here in the community who have done something similar to what we are doing, and I am interested in hearing about their experiences and which route is best for IP addresses: 1 big int or 4 small ints.

EDIT - One of our concerns is space; this database is going to be huge, on the order of 500,000,000 records a day. So we are trying to weigh the space issue along with the performance issue.

EDIT 2 - Some of the conversation has turned to the volume of data we are going to store... that's not my question. The question is which is the preferable way to store an IP address, and why. Like I've said in my comments, we work for a large Fortune 50 company. Our log files contain usage data from our users. This data in turn will be used within a security context to drive some metrics and several security tools.

Answer 1

Answered by Andre Miller

I would suggest looking at what type of queries you will be running to decide which format you adopt.

Only if you need to pull out or compare individual octets would you have to consider splitting them up into separate fields.

Otherwise, store it as a 4-byte integer. That also has the bonus of allowing you to use the MySQL built-in INET_ATON() and INET_NTOA() functions.
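For the Python scripts mentioned in the question, the standard-library ipaddress module can perform the same conversions as INET_ATON() and INET_NTOA() on the client side. A minimal sketch; the helper names inet_aton_int and inet_ntoa_str are made up for illustration and are not part of MySQL or of any library:

```python
import ipaddress


def inet_aton_int(dotted: str) -> int:
    """Return the unsigned 32-bit integer for a dotted-quad IPv4 string."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa_str(value: int) -> str:
    """Return the dotted-quad string for an unsigned 32-bit integer."""
    return str(ipaddress.IPv4Address(value))


print(inet_aton_int("192.168.1.1"))  # 3232235777
print(inet_ntoa_str(3232235777))     # 192.168.1.1
```

The resulting integer fits an UNSIGNED INT column, so the conversion can happen either in the script or in the SQL statement, whichever is more convenient.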

Performance vs. Space

Storage:

If you are only going to support IPv4 addresses, then your datatype in MySQL can be an UNSIGNED INT, which only uses 4 bytes of storage.

To store the individual octets you would only need UNSIGNED TINYINT columns, not SMALLINTs; each TINYINT uses 1 byte of storage.

Both methods would use similar storage, with perhaps slightly more overhead for the separate fields.
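As a rough back-of-the-envelope check, the raw column payload can be estimated from the 500,000,000 records a day quoted in the question. This sketch counts only the bytes of the IP values themselves (4 bytes for INT, 1 byte per TINYINT, 8 bytes for BIGINT) and ignores row, index, and engine overhead, which are assumptions left out here:

```python
ROWS_PER_DAY = 500_000_000  # figure quoted in the question

# Storage size of each candidate layout, in bytes per stored address.
LAYOUTS = {
    "INT UNSIGNED": 4,
    "4 x TINYINT UNSIGNED": 4 * 1,
    "BIGINT": 8,
}


def daily_bytes(bytes_per_row: int, rows: int = ROWS_PER_DAY) -> int:
    """Raw payload per day for one IP column layout, without any overhead."""
    return bytes_per_row * rows


for layout, size in LAYOUTS.items():
    print(f"{layout}: {daily_bytes(size) / 1e9:.1f} GB/day")
```

The single INT and the four TINYINTs come out the same, while a BIGINT doubles the payload without adding anything for IPv4.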

Performance:

Using a single field will yield much better performance: it's a single comparison instead of 4. You mentioned that you will only run queries against the whole IP address, so there should be no need to keep the octets separate. Using MySQL's INET_* functions will do the conversion between the text and integer representations once for the comparison.
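One consequence of the single-integer layout is that a subnet check also becomes a plain range comparison on one column. A small sketch using Python's ipaddress module; the function name cidr_bounds and the table and column names in the comment (proxy_log, ip) are hypothetical:

```python
import ipaddress
from typing import Tuple


def cidr_bounds(cidr: str) -> Tuple[int, int]:
    """Return the lowest and highest address of an IPv4 network as integers."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)


low, high = cidr_bounds("10.0.0.0/8")
# The bounds could then be bound as parameters to a query such as:
#   SELECT ... FROM proxy_log WHERE ip BETWEEN %s AND %s
print(low, high)  # 167772160 184549375
```

With four separate octet columns the same filter would need a condition per octet, which is the extra comparison cost described above.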

Answer 2

Answered by Quassnoi

A BIGINT is 8 bytes in MySQL.

To store IPv4 addresses, an UNSIGNED INT is enough, which I think is what you should use.

I can't imagine a scenario where 4 octets would gain more performance than a single INT, and the latter is much more convenient.

Also note that if you are going to issue queries like this:

SELECT  *
FROM    ips
WHERE   ? BETWEEN start_ip AND end_ip

, where start_ip and end_ip are columns in your table, the performance will be poor.

These queries are used to find out if a given IP is within a subnet range (usually to ban it).

To make these queries efficient, you should store the whole range as a LineString object with a SPATIAL index on it, and query like this:

SELECT  *
FROM    ips
WHERE   MBRContains(?, ip_range)

See this entry in my blog for more detail on how to do it:

Banning IPs

Answer 3

Answered by Greg Hewgill

Use PostgreSQL; there's a native data type for that.

More seriously, I would fall into the "one 32-bit integer" camp. An IP address only makes sense when all four octets are considered together, so there's no reason to store the octets in separate columns in the database. Would you store a phone number using three (or more) different fields?

Answer 4

Answered by Rich Bradshaw

Having separate fields doesn't sound particularly sensible to me, much like splitting a zip code or a phone number into sections.

Might be useful if you wanted specific info on the sections, but I see no real reason to not use a 32 bit int.

Answer 5

Answered by user105033

Efficient conversion of IP to int and int to IP in Perl (could be useful to you):

# Pack the four octets of a dotted-quad string into one 32-bit integer.
sub ip2dec {
    my @octs = split /\./, shift;
    return ($octs[0] << 24) + ($octs[1] << 16) + ($octs[2] << 8) + $octs[3];
}

# Peel the octets off the integer, from most to least significant.
sub dec2ip {
    my $number     = shift;
    my $first_oct  = $number >> 24;
    my $rest_1     = $number - ($first_oct << 24);
    my $second_oct = $rest_1 >> 16;
    my $rest_2     = $rest_1 - ($second_oct << 16);
    my $third_oct  = $rest_2 >> 8;
    my $fourth_oct = $rest_2 - ($third_oct << 8);
    return "$first_oct.$second_oct.$third_oct.$fourth_oct";
}
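Since the question mentions Python scripts as well, the same two conversions can be written in Python with shifts and a byte mask. This is an illustrative port that reuses the Perl names, not code from the original answer, and like the Perl version it does no input validation:

```python
def ip2dec(dotted: str) -> int:
    """Pack the four octets of a dotted-quad string into one integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def dec2ip(number: int) -> str:
    """Extract the four octets, most significant first, and join them."""
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(ip2dec("10.1.2.3"))  # 167838211
print(dec2ip(167838211))   # 10.1.2.3
```

Masking with 0xFF after each shift replaces the chain of subtractions used in the Perl dec2ip.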

Answer 6

Answered by Roger

Old thread, but for the benefit of readers, consider using ip2long (a PHP function). It translates an IP address into an integer.

Basically, you will be converting with ip2long when storing into the DB, then converting back with long2ip when retrieving from the DB. The field type in the DB will be INT, so you will save space and gain better performance compared to storing the IP as a string.

Answer 7

Answered by hanshenrik

For both IPv4 and IPv6 compatibility, use VARBINARY(16). IPv4 addresses will always be BINARY(4) and IPv6 addresses will always be BINARY(16), so VARBINARY(16) seems like the most efficient way to support both. To convert them from the normal readable format to binary, use INET6_ATON('127.0.0.1'), and to reverse that, use INET6_NTOA(binary).
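On the Python side, the ipaddress module produces the same 4-byte or 16-byte packed form, which could be bound directly to a VARBINARY(16) parameter. A minimal sketch; the helper names to_binary and from_binary are made up for illustration:

```python
import ipaddress


def to_binary(text: str) -> bytes:
    """Return 4 bytes for an IPv4 address or 16 bytes for an IPv6 address."""
    return ipaddress.ip_address(text).packed


def from_binary(raw: bytes) -> str:
    """Turn a 4-byte or 16-byte packed address back into readable text."""
    return str(ipaddress.ip_address(raw))


print(to_binary("127.0.0.1"))                # b'\x7f\x00\x00\x01'
print(len(to_binary("2001:db8::1")))         # 16
print(from_binary(to_binary("2001:db8::1"))) # 2001:db8::1
```

The length of the stored value (4 or 16 bytes) is what distinguishes the two address families when reading rows back.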
