Notice: this page is a Chinese/English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/8228950/

Time: 2020-08-26 04:13:14  Source: igfitidea

Compressing text before storing it in the database

Tags: php, mysql, compression, archive

Asked by Silver Light

I need to store a very large amount of text in a MySQL database. There will be millions of records with a LONGTEXT field, and the database will be huge.

So I want to ask: is there a safe way to compress text before storing it in a TEXT field to save space, with the ability to extract it again when needed?

Something like:

$archived_text = compress_text($huge_text);
// saving $archived_text to database here
// ...

// ...
// getting compressed text from database
$archived_text = get_text_from_db();
$huge_text = uncompress_text($archived_text);

Is there a way to do this with PHP or MySQL? All the texts are UTF-8 encoded.

UPDATE

My application is a large literature website where users can add their texts. Here is the table I have:

CREATE TABLE `book_parts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `book_id` int(11) NOT NULL,
  `title` varchar(200) DEFAULT NULL,
  `content` longtext,
  `order_num` int(11) DEFAULT NULL,
  `views` int(10) unsigned DEFAULT '0',
  `add_date` datetime DEFAULT NULL,
  `is_public` tinyint(3) unsigned NOT NULL DEFAULT '1',
  `published_as_draft` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `key_order_num` (`order_num`),
  KEY `add_date` (`add_date`),
  KEY `key_book_id` (`book_id`,`is_public`,`order_num`),
  CONSTRAINT FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

Currently it has about 800k records and weighs 4 GB; 99% of queries are SELECT. I have every reason to think these numbers will increase dramatically. I would rather not store the texts in files, because there is quite heavy logic around them and my website gets quite a few hits.

Answered by Oroboros102

Are you going to index these texts? How big is the read load on these texts? The insert load?

You can use InnoDB data compression, a transparent and modern approach. See the docs for more info.

If you have really huge texts (say, each text is above 10 MB), then it is a good idea not to store them in MySQL. Store the gzip-compressed texts in the file system and keep only pointers and metadata in MySQL. You can easily expand your storage in the future and move it to, e.g., a DFS.
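
The file-system approach could look roughly like the sketch below. The helper names `store_text_file()` and `load_text_file()`, the `.txt.gz` naming scheme, and the use of the system temp directory are assumptions made for illustration; only the returned path (plus metadata) would go into MySQL.

```php
<?php
// Sketch: keep gzip-compressed text on disk; the database row would hold only $path.
// store_text_file() and load_text_file() are hypothetical helper names.

function store_text_file(string $dir, int $id, string $text): string
{
    $path = $dir . DIRECTORY_SEPARATOR . $id . '.txt.gz';
    // gzencode() produces a standard gzip stream, readable by command-line gzip tools.
    if (file_put_contents($path, gzencode($text, 6)) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return $path; // store this pointer (plus metadata) in MySQL
}

function load_text_file(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $text = gzdecode($raw);
    if ($text === false) {
        throw new RuntimeException("Corrupted archive: $path");
    }
    return $text;
}

// Example run with a repetitive UTF-8 sample text.
$dir      = sys_get_temp_dir();
$text     = str_repeat("Все тексты в кодировке UTF-8. ", 1000);
$path     = store_text_file($dir, 42, $text);
$restored = load_text_file($path);
```

Because the file and the database row are written separately, a crash between the two writes can leave them out of sync, so real code would need some cleanup or reconciliation step.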

Update: another plus of storing texts outside MySQL: the DB stays small and fast. Minus: high probability of data inconsistency.

Update 2: if you have plenty of programming resources, please take a look at projects like this one: http://code.google.com/p/mysql-filesystem-engine/.

Final update: according to your info, you can just use InnoDB compression; it uses zlib, the same compression family as ZIP. You can start with these params:

CREATE TABLE book_parts
 (...) 
 ENGINE=InnoDB
 ROW_FORMAT=COMPRESSED 
 KEY_BLOCK_SIZE=8;

Later you will need to tune KEY_BLOCK_SIZE. Compare the COMPRESS_OPS_OK and COMPRESS_OPS counters; in MySQL they are exposed as columns of the INFORMATION_SCHEMA.INNODB_CMP table rather than through SHOW STATUS. The ratio of COMPRESS_OPS_OK to COMPRESS_OPS should be close to 1.0; see the docs.
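
As a rough illustration of that check, the sketch below turns the two counters into a ratio. The helper name `compression_success_ratio()` and the counter values are made up for the example; the query string shows where the real numbers would come from, assuming the `INFORMATION_SCHEMA.INNODB_CMP` table is available on your server.

```php
<?php
// Hypothetical helper: turn the two INNODB_CMP counters into a success ratio.
function compression_success_ratio(int $compressOps, int $compressOpsOk): ?float
{
    if ($compressOps === 0) {
        return null; // nothing has been compressed yet
    }
    return $compressOpsOk / $compressOps;
}

// Query that would supply the counters (one row per compressed page size).
$sql = 'SELECT page_size, compress_ops, compress_ops_ok FROM information_schema.INNODB_CMP';

// Made-up counter values, for illustration only.
$ratio = compression_success_ratio(1000, 970);
```

A ratio well below 1.0 means many compression attempts fail to fit the chosen page size, which is the signal to try a larger KEY_BLOCK_SIZE.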

Answered by Marc B

If you're compressing (e.g. with gzip), then don't use TEXT fields of any sort. They're not binary-safe. Data going into or coming out of TEXT fields is subject to character set translation, which will probably (though not necessarily) mangle the compressed data and give you a corrupted result when you retrieve and uncompress the text.

Use BLOB fields instead, which are binary-transparent and do not do any translation of the data.
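
A small sketch of why this matters: compressed output is arbitrary bytes, not valid UTF-8, so anything that treats it as text may alter it. The sample string is made up, and the UTF-8 check via `preg_match('//u', ...)` is just one convenient way to test validity.

```php
<?php
// Sketch: show that zlib output is not valid UTF-8, so a TEXT column with a
// utf8 charset cannot be trusted to hold it unchanged.

$text       = str_repeat('Some UTF-8 text: žluťoučký kůň. ', 50);
$compressed = gzcompress($text);

// preg_match() with the /u modifier returns false when the subject is not valid UTF-8.
$textIsUtf8       = preg_match('//u', $text) === 1;
$compressedIsUtf8 = preg_match('//u', $compressed) === 1;

// The bytes decompress correctly only if nothing re-encodes them in between,
// which is what a BLOB column provides.
$roundTrip = gzuncompress($compressed);
```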

Answered by user1931858

It might be better to define the text field as a BLOB and compress the data in PHP, which also saves communication cost between PHP and MySQL. Note that a plain BLOB holds at most 64 KB, so MEDIUMBLOB or LONGBLOB may be needed for long texts.

CREATE TABLE book_parts (
    ......
    content blob default NULL,
    ......
)

In PHP, use gzcompress and gzuncompress.

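// Assumes $mysqli is an open mysqli connection (the old mysql_* API was removed in PHP 7).
$content = '......';
$compressed = gzcompress($content);
$stmt = $mysqli->prepare("REPLACE INTO book_parts (content) VALUES (?)");
$stmt->bind_param('s', $compressed); // bound parameters need no manual escaping
$stmt->execute();

$id = 111;
$stmt = $mysqli->prepare("SELECT content FROM book_parts WHERE id = ?");
$stmt->bind_param('i', $id);
$stmt->execute();
$stmt->bind_result($compressed);
if ($stmt->fetch()) {
    $content = gzuncompress($compressed);
}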

Answered by Devart

You may also want to use a COMPRESS option to enable compression of packets. Read some information about this option:

For PHP I have found this: the MYSQLI_CLIENT_COMPRESS flag for the mysqli_real_connect function.
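
A minimal sketch of passing that flag is shown below. The helper name `connect_with_compression()`, the port, and the credentials in the usage comment are placeholders, not values from the question. Note that this compresses the traffic between PHP and MySQL only; it does not reduce what is stored on disk.

```php
<?php
// Sketch: open a mysqli connection with protocol compression enabled.
// connect_with_compression() is a hypothetical helper; host and credentials are placeholders.

function connect_with_compression(string $host, string $user, string $pass, string $db): mysqli
{
    $mysqli = mysqli_init();
    // MYSQLI_CLIENT_COMPRESS asks client and server to compress the packets they exchange.
    $ok = $mysqli->real_connect($host, $user, $pass, $db, 3306, null, MYSQLI_CLIENT_COMPRESS);
    if (!$ok) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    return $mysqli;
}

// Usage (needs a reachable server):
// $mysqli = connect_with_compression('localhost', 'user', 'secret', 'library');
```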

Answered by user447951

You could use the PHP functions gzdeflate and gzinflate for the text.
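
For comparison with gzcompress, here is a short sketch using a made-up sample string. gzdeflate writes a raw DEFLATE stream without the zlib header and checksum that gzcompress adds, so its output is slightly smaller but carries no built-in integrity check.

```php
<?php
// Sketch: compare the zlib format (gzcompress) with raw DEFLATE (gzdeflate).
$text = str_repeat('All the texts are utf-8 encoded. ', 200);

$zlib = gzcompress($text, 9); // zlib format: header + DEFLATE data + checksum
$raw  = gzdeflate($text, 9);  // raw DEFLATE data only

// Each format must be read back with its matching function.
$restored = gzinflate($raw);

printf(
    "original: %d bytes, gzcompress: %d bytes, gzdeflate: %d bytes\n",
    strlen($text),
    strlen($zlib),
    strlen($raw)
);
```

Whichever pair you pick, use it consistently: gzinflate cannot read gzcompress output, and gzuncompress cannot read gzdeflate output.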

Answered by ajreal

There are no benefits in compressing large texts into a database.

Here are the problems you might face in the long run:

  • If the server crashes the data may be hard to recover.
  • Not ideal for search.
  • It takes additional time to transfer the data between the mysql server and the browser.
  • Time consuming for backup (not using replication).

I think storing these large texts into a disk file will be easier for:

  • Distributed backup (rsync).
  • PHP to handle file upload.