What's the most efficient way to insert thousands of records into a table (MySQL, Python, Django)?

Disclaimer: this page is a translation of a top StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow.

Original: http://stackoverflow.com/questions/850117/
Asked by Roee Adler
I have a database table with a unique string field and a couple of integer fields. The string field is usually 10-100 characters long.
Once every minute or so I have the following scenario: I receive a list of 2-10 thousand tuples corresponding to the table's record structure, e.g.
[("hello", 3, 4), ("cat", 5, 3), ...]
I need to insert all these tuples into the table (assume I verified that none of these strings appears in the database). For clarification, I'm using InnoDB, and I have an auto-increment primary key for this table; the string is not the PK.
My code currently iterates through this list; for each tuple it creates a Django model object with the appropriate values and calls ".save()", something like so:
@transaction.commit_on_success
def save_data_elements(input_list):
    for (s, i1, i2) in input_list:
        entry = DataElement(string=s, number1=i1, number2=i2)
        entry.save()
This code is currently one of the performance bottlenecks in my system, so I'm looking for ways to optimize it.
For example, I could generate SQL statements, each containing an INSERT command for 100 tuples ("hard-coded" into the SQL), and execute them, but I don't know if that would improve anything.
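That batching idea can be sketched as follows. This is a hedged sketch, not the asker's actual code: the table and column names (`myapp_dataelement`, `string`, `number1`, `number2`) are assumptions based on a default Django model, and parameter placeholders are used instead of hard-coding values into the SQL, which avoids quoting bugs and SQL injection:

```python
def chunked_insert_statements(rows, chunk_size=100):
    """Build multi-row INSERT statements, chunk_size rows per statement.

    Returns a list of (sql, params) pairs; each pair would be run
    with cursor.execute(sql, params) on a MySQLdb cursor.
    Table/column names here are illustrative assumptions.
    """
    statements = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # One "(%s, %s, %s)" value clause per row in this chunk.
        placeholders = ", ".join(["(%s, %s, %s)"] * len(chunk))
        sql = ("INSERT INTO myapp_dataelement (string, number1, number2) "
               "VALUES " + placeholders)
        # Flatten the chunk into a single parameter list.
        params = [value for row in chunk for value in row]
        statements.append((sql, params))
    return statements
```

Each statement then inserts up to `chunk_size` rows in one round trip instead of one row per `.save()` call.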
Do you have any suggestion to optimize such a process?
Thanks
Answered by Chad Birch
For MySQL specifically, the fastest way to load data is using LOAD DATA INFILE, so if you could convert the data into the format that LOAD DATA INFILE expects, it'll probably be the fastest way to get it into the table.
Answered by Nadia Alramli
You can write the rows to a file in the format "field1","field2",... and then use LOAD DATA to load them:
# Build one quoted, comma-separated line per row
# (named `lines` so the input `data` is not overwritten).
lines = '\n'.join(','.join('"%s"' % field for field in row) for row in data)
with open('data.txt', 'w') as f:
    f.write(lines)
Then execute this:
LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table FIELDS TERMINATED BY ',' ENCLOSED BY '"';
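One caveat: hand-rolled quoting breaks if a field itself contains a comma or a double quote. A sketch of writing the same file with the stdlib csv module, which handles escaping (the file name and example rows mirror the answer; the LOAD DATA statement's FIELDS TERMINATED BY ',' ENCLOSED BY '"' clauses must match the writer's settings):

```python
import csv

rows = [("hello", 3, 4), ("cat", 5, 3)]

# csv.writer escapes embedded commas and quotes for us;
# QUOTE_ALL wraps every field in double quotes.
with open('data.txt', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
```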
Answered by Sean McSomething
If you don't use LOAD DATA INFILE as some of the other suggestions mention, two things you can do to speed up your inserts are:
- Use prepared statements - this cuts out the overhead of parsing the SQL for every insert
- Do all of your inserts in a single transaction - this would require using a DB engine that supports transactions (like InnoDB)
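A sketch of both points together via the Python DB-API: one parameterized statement reused for every row with executemany, wrapped in a single transaction. sqlite3 is used here only as a stand-in so the pattern is self-contained; with MySQL you would use a MySQLdb connection and %s placeholders instead of ?, and the table name is an assumption:

```python
import sqlite3

def bulk_insert(conn, rows):
    # One parameterized statement executed for every row, inside a
    # single transaction: the connection context manager commits once
    # on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO data_element (string, number1, number2) "
            "VALUES (?, ?, ?)",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_element "
    "(string TEXT UNIQUE, number1 INTEGER, number2 INTEGER)"
)
bulk_insert(conn, [("hello", 3, 4), ("cat", 5, 3)])
```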
Answered by staticsan
If you can do a hand-rolled INSERT statement, then that's the way I'd go. A single INSERT statement with multiple value clauses is much, much faster than lots of individual INSERT statements.
Answered by weevilgenius
Regardless of the insert method, you will want to use the InnoDB engine for maximum read/write concurrency. MyISAM will lock the entire table for the duration of the insert whereas InnoDB (under most circumstances) will only lock the affected rows, allowing SELECT statements to proceed.
Answered by roopesh
I do not know the exact details, but you can use a JSON-style data representation and use it as fixtures or something. I saw something similar in the Django Video Workshop by Douglas Napoleone. See the videos at http://www.linux-magazine.com/online/news/django_video_workshop and http://www.linux-magazine.com/online/features/django_reloaded_workshop_part_1. Hope this one helps.
Hope you can work it out. I just started learning django, so I can just point you to resources.
Answered by KM.
What format do you receive the data in? If it is a file, you can do some sort of bulk load: http://www.classes.cs.uchicago.edu/archive/2005/fall/23500-1/mysql-load.html
Answered by NathanD
This is unrelated to the actual load of data into the DB, but...
If providing a "The data is loading... The load will be done shortly" type of message to the user is an option, then you can run the INSERTs or LOAD DATA asynchronously in a different thread.
Just something else to consider.
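A minimal sketch of that idea: hand the list to a worker thread and return immediately, so the caller can show the "data is loading" message right away. The helper name is hypothetical, and in a real Django deployment a task queue or cron job is usually a more robust choice than an in-process thread:

```python
import threading

def save_data_elements_async(input_list, save_func):
    """Run the (slow) bulk insert off the calling thread.

    save_func would be something like the save_data_elements function
    from the question; this helper itself is illustrative.
    """
    worker = threading.Thread(target=save_func, args=(input_list,),
                              daemon=True)
    worker.start()
    return worker  # caller may join() or poll is_alive() if needed
```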