使用 Laravel DB Seed 插入 100 万条记录
声明:本页面是 StackOverFlow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,同时您必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/35429584/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me):
StackOverFlow
Insert 1 million records using Laravel DB Seed
提问 by Mifas
I'm using Faker to generate dummy data and am trying to add 1 million records. Somehow I can only reach around 100,000 rows. Following is my code:
我正在使用 Faker 生成虚拟数据,并尝试添加 100 万条记录。不知何故,我只能插入大约 10 万行。以下是我的代码:
$no_of_rows = 1000000;

for ($i = 1; $i <= $no_of_rows; $i++) {
    $user_data[] = [
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => $password,
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    ];
}

User::insert($user_data);
I'm getting the following error message:
我收到以下错误消息:
PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted
I already set ini_set('memory_limit', '1024M');
我已经设置了 ini_set('memory_limit', '1024M');
Any useful thoughts or solutions?
任何有用的想法或解决方案?
回答 by Matija Boban
The core issue here is that the Faker library instance (commonly used to generate data in Laravel) is memory-heavy, and it doesn't get properly cleared by the garbage collector when used in large loops.
这个问题的核心在于 Faker 库实例(通常用于在 Laravel 中生成数据)占用内存较大,并且在大循环中使用时不会被垃圾收集器正确清除。
I agree with the chunked processing that @Rob Mkrtchyan added above, but since this is Laravel I would suggest a more elegant solution using the Factory facility.
我同意上面 @Rob Mkrtchyan 提到的分块处理方式,但由于这是 Laravel,我建议使用 Factory 工具实现一个更优雅的解决方案。
You can create a specific model factory (in Laravel 5.3 this should be placed in database/factories/), for instance:
您可以创建一个特定的模型工厂(在 Laravel 5.3 中,这应该放在 database/factories/ 中),例如:
$factory->define(Tests::class, function (Faker\Generator $faker) {
    return [
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => bcrypt('secret'),
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    ];
});
Then it's a simple matter of running the factory within your DB seeder class. Please note that the number 200 indicates the number of seed data entries to create.
然后只需在您的 DB seeder 类中运行该工厂即可。请注意,数字 200 表示要创建的种子数据条目的数量。
factory(Tests::class, 200)
    ->create();
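In a seeder class, that call simply goes inside the run() method; below is a minimal sketch (the TestsTableSeeder class name is just illustrative, not from the original answer):
在 seeder 类中,这个调用只需放在 run() 方法里即可;下面是一个最小示例(类名 TestsTableSeeder 只是示意,并非原答案中的内容):
use Illuminate\Database\Seeder;

// Minimal seeder sketch wrapping the factory call shown above (Laravel 5.x style).
class TestsTableSeeder extends Seeder
{
    public function run()
    {
        // Create 200 fake records via the model factory defined earlier.
        factory(Tests::class, 200)->create();
    }
}
You would then register it in DatabaseSeeder::run() with $this->call(TestsTableSeeder::class); and run php artisan db:seed.
然后在 DatabaseSeeder::run() 中通过 $this->call(TestsTableSeeder::class); 注册它,并运行 php artisan db:seed 即可。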
The reason for using the seed factory is that it allows you much more flexibility in setting variables, etc. For documentation on this, you can consult the Laravel docs on database seeding.
使用种子工厂的原因是它在设置变量等方面给您更大的灵活性。有关这方面的文档,您可以查阅 Laravel 关于数据库填充(seeding)的文档。
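For example, the create() call also accepts attribute overrides, so specific columns can be pinned while the rest still come from Faker (a small sketch based on the factory above; the override values are only examples):
例如,create() 调用还可以接受属性覆盖,这样可以固定某些列,其余列仍由 Faker 生成(下面是基于上述工厂的小示例,覆盖的值仅作举例):
// Pin selected columns for this batch; all other columns still come
// from the factory definition above.
factory(Tests::class, 200)->create([
    'confirmed' => 1,
    'gender' => 'FEMALE',
]);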
Now, since you are dealing with a large number of records, it's trivial to implement a chunked solution which will aid the PHP garbage collection. For instance:
现在,由于您要处理大量记录,实施一个有助于 PHP 垃圾回收的分块方案也很简单。例如:
for ($i = 0; $i < 5000; $i++) {
    factory(Tests::class, 200)->create();
}
I did a quick test, and in this configuration your script's memory usage should stay around 12-15 MB (depending on other system factors, of course) regardless of the number of data entries created.
我做了一个快速测试,在此配置下,无论创建多少条数据,您的脚本内存使用量都应保持在 12-15 MB 左右(当然也取决于其他系统因素)。
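If you want to verify the memory figure in your own environment, one rough way (a sketch reusing the same chunked loop; the logging interval is arbitrary) is to print PHP's peak memory usage every so many chunks:
如果您想在自己的环境中验证这个内存数字,一个粗略的办法(沿用上面的分块循环,打印间隔是随意选的)是每隔若干个块打印一次 PHP 的峰值内存:
for ($i = 0; $i < 5000; $i++) {
    factory(Tests::class, 200)->create();

    // Print peak memory every 100 chunks when seeding from the console.
    if ($i % 100 === 0) {
        echo sprintf("chunk %d: peak memory %.1f MB\n", $i, memory_get_peak_usage(true) / 1024 / 1024);
    }
}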
回答 by RamRaider
The variables set in the loop never get used until it finishes, so if the only intention of the loop was to add a million records, you could do away with accumulating them and use something like this instead. This way the array used to populate the DB is redeclared on each iteration rather than having more and more entries added to it.
循环中设置的变量在循环结束前一直用不到,所以如果这个循环的唯一目的就是添加一百万条记录,您可以不再累积数组,而是使用类似下面的写法。这样,用于填充数据库的数组在每次迭代时都会重新声明,而不是不断添加越来越多的条目。
$no_of_rows = 1000000;

for ($i = 0; $i < $no_of_rows; $i++) {
    $user_data = array(
        'status' => 'ACTIVE',
        'username' => $faker->userName,
        'email' => $faker->email,
        'password' => $password,
        'firstname' => $faker->firstName,
        'surname' => $faker->lastName,
        'mobilenumber' => $faker->phoneNumber,
        'confirmed' => (int)$faker->boolean(50),
        'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
        'dob' => $faker->date(),
        'address_line_1' => $faker->address,
        'address_line_2' => '',
        'post_code' => $faker->postcode,
    );

    User::insert($user_data);
    $user_data = null;
}
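Note that, as in the question, this snippet assumes $faker and $password are already defined in the seeder; a minimal setup sketch (the plain-text password is only an example) would be:
请注意,与问题中的代码一样,这段代码假设 seeder 中已经定义了 $faker 和 $password;下面是一个最小的初始化示例(明文密码仅作举例):
// Setup assumed by the snippets in this answer: one Faker generator
// and one pre-hashed password reused for every row.
$faker = Faker\Factory::create();
$password = bcrypt('secret');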
On the basis of your last comment I can see why you would use chunks. There was no way to know the syntax of the SQL before posting the answer, so perhaps this might be more suitable:
根据您的最后一条评论,我明白了为什么要使用分块。在发布答案之前无法知道 SQL 的写法,所以下面这种方式可能更合适:
$no_of_rows = 1000000;
$range = range(1, $no_of_rows);
$chunksize = 1000;

foreach (array_chunk($range, $chunksize) as $chunk) {
    $user_data = array(); /* array is re-initialised on each major iteration */

    foreach ($chunk as $i) {
        $user_data[] = array(
            'status' => 'ACTIVE',
            'username' => $faker->userName,
            'email' => $faker->email,
            'password' => $password,
            'firstname' => $faker->firstName,
            'surname' => $faker->lastName,
            'mobilenumber' => $faker->phoneNumber,
            'confirmed' => (int)$faker->boolean(50),
            'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
            'dob' => $faker->date(),
            'address_line_1' => $faker->address,
            'address_line_2' => '',
            'post_code' => $faker->postcode,
        );
    }

    User::insert($user_data);
}
回答 by Rob
Hello, here is a good solution:
您好,这里有一个不错的解决方案:
public function run()
{
    // 1000 chunks of 1000 rows each = 1,000,000 records in total.
    $faker = Faker\Factory::create();
    $password = bcrypt('secret');

    for ($j = 0; $j < 1000; $j++) {
        // Re-initialise the buffer so only 1000 rows are held in memory
        // and previously inserted rows are not inserted again.
        $user_data = [];

        for ($i = 0; $i < 1000; $i++) {
            $user_data[] = [
                'status' => 'ACTIVE',
                'username' => $faker->userName,
                'email' => $faker->email,
                'password' => $password,
                'firstname' => $faker->firstName,
                'surname' => $faker->lastName,
                'mobilenumber' => $faker->phoneNumber,
                'confirmed' => (int)$faker->boolean(50),
                'gender' => $faker->boolean(50) ? 'MALE' : 'FEMALE',
                'dob' => $faker->date(),
                'address_line_1' => $faker->address,
                'address_line_2' => '',
                'post_code' => $faker->postcode,
            ];
        }

        User::insert($user_data);
    }
}
This code keeps only a 1000-element array in memory at a time, and you can run it without changing any default PHP settings.
这段代码在内存中一次只保留一个 1000 个元素的数组,您无需更改任何默认的 PHP 设置就可以运行它。
Enjoy.
祝您使用愉快。
回答 by piyush anques
Hello, here is a very good and very fast data-insert solution:
您好,这里有一个非常好、速度非常快的数据插入解决方案:
$no_of_data = 1000000;
$test_data = array();

for ($i = 0; $i < $no_of_data; $i++) {
    $test_data[$i]['number'] = "1234567890";
    $test_data[$i]['message'] = "Test Data";
    $test_data[$i]['status'] = "Delivered";
}

$chunk_data = array_chunk($test_data, 1000);

if (isset($chunk_data) && !empty($chunk_data)) {
    foreach ($chunk_data as $chunk_data_val) {
        DB::table('messages')->insert($chunk_data_val);
    }
}
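As a small variation, the same chunked insert can also be written with Laravel's collection helper instead of array_chunk() (a sketch equivalent to the loop above; each 1000-row chunk is still sent as a single query):
作为一个小变体,同样的分块插入也可以用 Laravel 的集合辅助方法代替 array_chunk() 来编写(下面是与上述循环等价的示例,每个 1000 行的块仍然作为一条查询发送):
// Equivalent chunked insert using Laravel's collection helper.
collect($test_data)
    ->chunk(1000)
    ->each(function ($chunk) {
        DB::table('messages')->insert($chunk->values()->all());
    });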