Best practice: Import mySQL file in PHP; split queries

Note: This page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/1883079/

Tags: php, mysql

Asked by Pekka

I have a situation where I have to update a web site on a shared hosting provider. The site has a CMS. Uploading the CMS's files is pretty straightforward using FTP.

I also have to import a big (relative to the confines of a PHP script) database file (around 2-3 MB uncompressed). MySQL is closed for access from the outside, so I have to upload a file using FTP and start a PHP script to import it. Sadly, I do not have access to the mysql command line client, so I have to parse and query it using native PHP. I also can't use LOAD DATA INFILE. I also can't use any kind of interactive front-end like phpMyAdmin; it needs to run in an automated fashion. I also can't use mysqli_multi_query().

Does anybody know of, or have, an already coded, simple solution that reliably splits such a file into single queries (there could be multi-line statements) and runs them? I would like to avoid fiddling with it myself because of the many gotchas I'm likely to come across (how to detect whether a field delimiter is part of the data; how to deal with line breaks in memo fields; and so on). There must be a ready-made solution for this.

Answered by Alix Axel

Here is a memory-friendly function that should be able to split a big file into individual queries without needing to read the whole file into memory at once:

function SplitSQL($file, $delimiter = ';')
{
    set_time_limit(0); // imports can take a while; lift the execution time limit

    if (is_file($file) === true)
    {
        $file = fopen($file, 'r');

        if (is_resource($file) === true)
        {
            $query = array();

            while (feof($file) === false)
            {
                // collect lines until one ends with the delimiter
                $query[] = fgets($file);

                if (preg_match('~' . preg_quote($delimiter, '~') . '\s*$~iS', end($query)) === 1)
                {
                    $query = trim(implode('', $query));

                    // assumes an open connection via the legacy mysql extension
                    if (mysql_query($query) === false)
                    {
                        echo '<h3>ERROR: ' . $query . '</h3>' . "\n";
                    }

                    else
                    {
                        echo '<h3>SUCCESS: ' . $query . '</h3>' . "\n";
                    }

                    // flush output buffers so progress shows up while the script runs
                    while (ob_get_level() > 0)
                    {
                        ob_end_flush();
                    }

                    flush();
                }

                // $query became a string after execution; reset it for the next statement
                if (is_string($query) === true)
                {
                    $query = array();
                }
            }

            return fclose($file);
        }
    }

    return false;
}

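A minimal call sketch (the credentials, database, and file name are hypothetical; the function relies on the legacy mysql extension, so a connection must already be open):

mysql_connect('localhost', 'user', 'password'); // hypothetical credentials
mysql_select_db('mydatabase');                  // hypothetical database name

SplitSQL('dump.sql'); // uses the default ';' delimiter
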
I tested it on a big phpMyAdmin SQL dump and it worked just fine.

Some test data:

CREATE TABLE IF NOT EXISTS "test" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "name" TEXT,
    "description" TEXT
);

BEGIN;
    INSERT INTO "test" ("name", "description")
    VALUES (";;;", "something for you mind; body; soul");
COMMIT;

UPDATE "test"
    SET "name" = "; "
    WHERE "id" = 1;

And the respective output:

SUCCESS: CREATE TABLE IF NOT EXISTS "test" ( "id" INTEGER PRIMARY KEY AUTOINCREMENT, "name" TEXT, "description" TEXT );
SUCCESS: BEGIN;
SUCCESS: INSERT INTO "test" ("name", "description") VALUES (";;;", "something for you mind; body; soul");
SUCCESS: COMMIT;
SUCCESS: UPDATE "test" SET "name" = "; " WHERE "id" = 1;

Answered by Zaje

A single-page phpMyAdmin alternative: Adminer, just one PHP script file. Check: http://www.adminer.org/en/

Answered by Bill Karwin

When StackOverflow released their monthly data dump in XML format, I wrote PHP scripts to load it into a MySQL database. I imported about 2.2 gigabytes of XML in a few minutes.

My technique is to prepare() an INSERT statement with parameter placeholders for the column values, then use XMLReader to loop over the XML elements and execute() my prepared query, plugging in values for the parameters. I chose XMLReader because it's a streaming XML reader; it reads the XML input incrementally instead of requiring the whole file to be loaded into memory.

You could also read a CSV file one line at a time with fgetcsv().

If you're importing into InnoDB tables, I recommend starting and committing transactions explicitly, to reduce the overhead of autocommit. I commit every 1000 rows, but this is arbitrary.

I'm not going to post the code here (because of StackOverflow's licensing policy), but here it is in pseudocode:

connect to database
open data file
PREPARE parameterized INSERT statement
begin first transaction
loop, reading lines from data file: {
    parse line into individual fields
    EXECUTE prepared query, passing data fields as parameters
    if ++counter % 1000 == 0,
        commit transaction and begin new transaction
}
commit final transaction

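A rough PHP sketch of that pseudocode (not the author's actual code): it uses PDO and fgetcsv(), and the table, columns, file name, and credentials are all illustrative:

<?php

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Prepared once; executed once per row with fresh parameter values.
$stmt = $pdo->prepare('INSERT INTO mytable (col1, col2, col3) VALUES (?, ?, ?)');

$handle = fopen('data.csv', 'r');
$counter = 0;

$pdo->beginTransaction();
while (($fields = fgetcsv($handle)) !== false) {
    $stmt->execute($fields);      // pass the parsed fields as parameters
    if (++$counter % 1000 == 0) { // commit every 1000 rows (arbitrary)
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit();                   // final, possibly partial, batch
fclose($handle);
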
Writing this code in PHP is not rocket science, and it runs pretty quickly when one uses prepared statements and explicit transactions. Those features are not available in the outdated mysql PHP extension, but you can use them if you use mysqli or PDO_MySQL.

I also added convenient stuff like error checking, progress reporting, and support for default values when the data file doesn't include one of the fields.

I wrote my code in an abstract PHP class that I subclass for each table I need to load. Each subclass declares the columns it wants to load and maps them to fields in the XML data file by name (or by position if the data file is CSV).

Answered by Lukáš Lalinský

Can't you install phpMyAdmin, gzip the file (which should make it much smaller) and import it using phpMyAdmin?

EDIT: Well, if you can't use phpMyAdmin, you can use the code from phpMyAdmin. I'm not sure about this particular part, but it's generally nicely structured.

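As a side note, if the dump is uploaded gzipped, PHP's zlib functions can stream it line by line without decompressing it on disk first; a small sketch (the file name is illustrative):

// Read a gzipped dump one line at a time; memory use stays small.
$handle = gzopen('dump.sql.gz', 'r');

while (!gzeof($handle)) {
    $line = gzgets($handle, 8192);
    // ...feed $line into a statement splitter such as SplitSQL() above
}

gzclose($handle);
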
Answered by hobodave

Export

The first step is getting the input in a sane format for parsing when you export it. From your question it appears that you have control over the exporting of this data, but not the importing.

~: mysqldump test --opt --skip-extended-insert | grep -v '^--' | grep . > test.sql

This dumps the test database, excluding all comment lines and blank lines, into test.sql. It also disables extended inserts, meaning there is one INSERT statement per line. This will help limit memory usage during the import, but at the cost of import speed.

Import

The import script is as simple as this:

<?php

$mysqli = new mysqli('localhost', 'hobodave', 'p4ssw3rd', 'test');
$handle = fopen('test.sql', 'rb');
if ($handle) {
    while (!feof($handle)) {
        // This assumes you don't have a row that is > 1MB (1000000),
        // which is unlikely given the size of your DB.
        // Note that it has a DIRECT effect on your script's memory
        // usage.
        $buffer = stream_get_line($handle, 1000000, ";\n");
        $mysqli->query($buffer);
    }
    fclose($handle);
}
echo "Peak MB: ", memory_get_peak_usage(true)/1024/1024;

This will utilize an absurdly low amount of memory as shown below:

daves-macbookpro:~ hobodave$ du -hs test.sql 
 15M    test.sql
daves-macbookpro:~ hobodave$ time php import.php 
Peak MB: 1.75
real    2m55.619s
user    0m4.998s
sys 0m4.588s

What that says is you processed a 15MB mysqldump with a peak RAM usage of 1.75 MB in just under 3 minutes.

Alternate Export

If you have a high enough memory_limit and this approach is too slow, you can try the following export instead:

~: mysqldump test --opt | grep -v '^--' | grep . > test.sql

This will allow extended inserts, which insert multiple rows in a single query. Here are the statistics for the same database:

daves-macbookpro:~ hobodave$ du -hs test.sql 
 11M    test.sql
daves-macbookpro:~ hobodave$ time php import.php 
Peak MB: 3.75
real    0m23.878s
user    0m0.110s
sys 0m0.101s

Notice that it uses over 2x the RAM at 3.75 MB, but takes about 1/6th as long. I suggest trying both methods and seeing which suits your needs.

Edit:

I was unable to get a newline to appear literally in any mysqldump output using any of the CHAR, VARCHAR, BINARY, VARBINARY, and BLOB field types. If you do have BLOB/BINARY fields, though, then please use the following just in case:

~: mysqldump5 test --hex-blob --opt | grep -v '^--' | grep . > test.sql

Answered by Jeremy Dorn

Splitting a query cannot be reliably done without parsing. Here is valid SQL that would be impossible to split correctly with a regular expression.

SELECT ";"; SELECT ";\"; a;";
SELECT ";
    abc";

I wrote a small SqlFormatter class in PHP that includes a query tokenizer. I added a splitQuery method to it that splits all queries (including the above example) reliably.

https://github.com/jdorn/sql-formatter/blob/master/SqlFormatter.php

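A minimal usage sketch (this assumes splitQuery() returns an array of statement strings, per the repository; the file name and the query call are illustrative):

require 'SqlFormatter.php';

// Split the dump into individual statements, then run each one.
$sql = file_get_contents('dump.sql'); // the whole file must fit in memory
foreach (SqlFormatter::splitQuery($sql) as $statement) {
    mysql_query($statement);
}
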
You can remove the format and highlight methods if you don't need them.

One downside is that it requires the whole SQL string to be in memory, which could be a problem if you're working with huge SQL files. I'm sure that with a little bit of tinkering you could make the getNextToken method work on a file pointer instead.

Answered by Kovge

First of all, thanks for this topic. It saved me a lot of time :) And let me make a little fix to your code. Sometimes, if TRIGGERS or PROCEDURES are in the dump file, it is not enough to examine the ; delimiter. In these cases the SQL code may contain a DELIMITER [something] directive, which says that the statement will not end with ; but with [something]. For example, a section in xxx.sql:

    DELIMITER //
    CREATE TRIGGER `mytrigger` BEFORE INSERT ON `mytable`
    FOR EACH ROW BEGIN
         SET NEW.`create_time` = NOW();
    END
    //
    DELIMITER ;

So we first need a flag to detect that the query does not end with ;, and we must delete the unwanted DELIMITER lines, because mysql_query does not need a delimiter (the end of the string is the delimiter), so mysql_query needs something like this:

    CREATE TRIGGER `mytrigger` BEFORE INSERT ON `mytable`
    FOR EACH ROW BEGIN
         SET NEW.`create_time` = NOW();
    END;

So, after a little work, here is the fixed code:

function SplitSQL($file, $delimiter = ';')
{
    set_time_limit(0);

    $matches = array();
    $otherDelimiter = false;

    if (is_file($file) === true)
    {
        $file = fopen($file, 'r');

        if (is_resource($file) === true)
        {
            $query = array();

            while (feof($file) === false)
            {
                $query[] = fgets($file);

                if (preg_match('~' . preg_quote('delimiter', '~') . '\s*([^\s]+)$~iS', end($query), $matches) === 1)
                {
                    // DELIMITER directive detected; drop that line, it is not part of the SQL
                    array_pop($query);

                    $otherDelimiter = ($matches[1] != $delimiter);

                    if (!$otherDelimiter)
                    {
                        // back to the default delimiter: remove the previous line
                        // (the custom delimiter on its own) and close the statement
                        array_pop($query);
                        $query[] = $delimiter;
                    }
                }

                if (!$otherDelimiter && preg_match('~' . preg_quote($delimiter, '~') . '\s*$~iS', end($query)) === 1)
                {
                    $query = trim(implode('', $query));

                    if (mysql_query($query) === false)
                    {
                        echo '<h3>ERROR: ' . $query . '</h3>' . "\n";
                    }
                    else
                    {
                        echo '<h3>SUCCESS: ' . $query . '</h3>' . "\n";
                    }

                    while (ob_get_level() > 0)
                    {
                        ob_end_flush();
                    }

                    flush();
                }

                if (is_string($query) === true)
                {
                    $query = array();
                }
            }

            return fclose($file);
        }
    }

    return false;
}

I hope I could help somebody too. Have a nice day!

Answered by Frederic Anand

http://www.ozerov.de/bigdump/ was very useful for me in importing a 200+ MB SQL file.

Note: the SQL file should already be present on the server so that the process can complete without any issue.

Answered by mluebke

Can you use LOAD DATA INFILE?

If you format your db dump file using SELECT INTO OUTFILE, this should be exactly what you need. No reason to have PHP parse anything.

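For illustration, a rough sketch of that round trip (the table and file paths are hypothetical; LOAD DATA INFILE needs the FILE privilege, and the question states it is unavailable on this particular host):

<?php

$mysqli = new mysqli('localhost', 'user', 'password', 'test');

// Export, run on the source server ("\\t" passes the literal \t escape to MySQL):
$mysqli->query("SELECT * INTO OUTFILE '/tmp/mytable.txt'
    FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'
    FROM mytable");

// Import, run on the target server:
$mysqli->query("LOAD DATA INFILE '/tmp/mytable.txt'
    INTO TABLE mytable
    FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'");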