Bulk insert huge data into SQLite using Python

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18219779/

Date: 2020-08-19 10:14:26  Source: igfitidea

Bulk insert huge data into SQLite using Python

Tags: python, sqlite

Asked by Shar

I read this: Importing a CSV file into a sqlite3 database table using Python


and it seems that everyone suggests using line-by-line reading instead of using bulk .import from SQLite. However, that will make the insertion really slow if you have millions of rows of data. Is there any other way to circumvent this?


Update: I tried the following code to insert line by line, but the speed is not as good as I expected. Is there any way to improve it?


import codecs

# 'connection' is the open sqlite3 connection and 'output' its cursor, both set up
# earlier in the script; allLogFilesName is a list of log file paths
for logFileName in allLogFilesName:
    logFile = codecs.open(logFileName, 'rb', encoding='utf-8')
    for logLine in logFile:
        logLineAsList = logLine.split('\t')
        # one INSERT statement per log line
        output.execute('''INSERT INTO log VALUES(?, ?, ?, ?)''', logLineAsList)
    logFile.close()
connection.commit()
connection.close()

Accepted answer by alecxe

Divide your data into chunks on the fly using generator expressions, and make the inserts inside a transaction. Here's a quote from the SQLite optimization FAQ:


Unless already in a transaction, each SQL statement has a new transaction started for it. This is very expensive, since it requires reopening, writing to, and closing the journal file for each statement. This can be avoided by wrapping sequences of SQL statements with BEGIN TRANSACTION; and END TRANSACTION; statements. This speedup is also obtained for statements which don't alter the database.


Here's how your code might look:

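The answer's original code sample is not reproduced on this page. A minimal sketch of the approach it describes, reusing the variable names from the question, might look like the following (the database file name, the four-column table layout, the chunk size, and the example file list are assumptions):

import codecs
import sqlite3
from itertools import islice

CHUNK_SIZE = 10000  # assumption: tune to your data

connection = sqlite3.connect('log.db')  # assumed database file
connection.execute('CREATE TABLE IF NOT EXISTS log(a, b, c, d)')  # assumed 4-column layout

allLogFilesName = ['server1.log', 'server2.log']  # hypothetical list of log file paths

for logFileName in allLogFilesName:
    logFile = codecs.open(logFileName, 'rb', encoding='utf-8')
    # generator expression: rows are produced lazily, one tab-separated line at a time
    rows = (logLine.rstrip('\n').split('\t') for logLine in logFile)
    while True:
        chunk = list(islice(rows, CHUNK_SIZE))  # next chunk of rows
        if not chunk:
            break
        # all inserts for the chunk run inside a single transaction
        connection.executemany('INSERT INTO log VALUES (?, ?, ?, ?)', chunk)
        connection.commit()
    logFile.close()

connection.close()

Committing once per chunk keeps memory use bounded while still avoiding a journal write per row.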

Also, SQLite has the ability to import CSV files (the .import command of the sqlite3 command-line shell).


Answered by davidfg4

SQLite can do tens of thousands of inserts per second; just make sure to do all of them in a single transaction by surrounding the inserts with BEGIN and COMMIT. (executemany() does this automatically.)

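For illustration only, explicitly surrounding row-by-row inserts with BEGIN and COMMIT might look like the sketch below; the table layout and sample rows are made up, and isolation_level=None puts the connection in autocommit mode so the explicit transaction statements take effect:

import sqlite3

# autocommit mode: transactions are controlled explicitly with BEGIN/COMMIT
con = sqlite3.connect('log.db', isolation_level=None)
cur = con.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS log(a, b, c, d)')

rows = [('2020-08-19', 'INFO', 'worker-1', 'started')] * 100000  # hypothetical sample rows

cur.execute('BEGIN')  # open one transaction for the whole batch
for row in rows:
    cur.execute('INSERT INTO log VALUES (?, ?, ?, ?)', row)
cur.execute('COMMIT')  # the journal is handled once for all the inserts, not once per row

con.close()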

As always, don't optimize before you know speed will be a problem. Test the easiest solution first, and only optimize if the speed is unacceptable.


Answered by Fred

Since this is the top result on a Google search I thought it might be nice to update this question.


From the Python sqlite3 docs, you can use:


import sqlite3

persons = [
    ("Hugo", "Boss"),
    ("Calvin", "Klein")
]

con = sqlite3.connect(":memory:")

# Create the table
con.execute("create table person(firstname, lastname)")

# Fill the table
con.executemany("insert into person(firstname, lastname) values (?,?)", persons)

I have used this method to commit over 50k row inserts at a time and it's lightning fast.

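Applied to the tab-separated log files from the question, the same executemany() pattern might look roughly like this; the file name, database file, and four-column layout are assumptions, and csv.reader streams rows lazily so the file never has to fit in memory:

import csv
import sqlite3

con = sqlite3.connect('log.db')  # assumed database file
con.execute('CREATE TABLE IF NOT EXISTS log(a, b, c, d)')  # assumed 4-column layout

with open('server.log', newline='', encoding='utf-8') as f:  # hypothetical log file
    reader = csv.reader(f, delimiter='\t')  # yields one row of fields per tab-separated line
    # executemany() accepts any iterable, so the reader feeds rows straight into the insert
    con.executemany('INSERT INTO log VALUES (?, ?, ?, ?)', reader)

con.commit()
con.close()

This assumes every line has exactly four fields; a ragged line would raise a sqlite3.ProgrammingError about the number of bindings.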