shell script runs out of memory

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow

Original: http://stackoverflow.com/questions/27839594/
Asked by vefthym
I have written the following random-number generator shell script:
for i in $(seq 1 $1)           # for as many times as the first argument ($1) defines...
do
    echo "$i $((RANDOM%$2))"   # print the current iteration number and a random number in [0, $2)
done
I run it like that:
./generator.sh 1000000000 101 > data.txt
to generate 1B rows of an id and a random number in [0,100], and store this data in the file data.txt.
My desired output is:
1 39
2 95
3 61
4 27
5 85
6 44
7 49
8 75
9 52
10 66
...
It works fine for a small number of rows, but with 1B rows I get the following OOM error:
./generator.sh: xrealloc: ../bash/subst.c:5179: cannot allocate 18446744071562067968 bytes (4299137024 bytes allocated)
Which part of my program creates the error? How could I write the data.txt file line-by-line?

I have tried replacing the echo line with:
echo "$i $((RANDOM%$2))" >> $3
where $3 is data.txt, but I see no difference.
Answered by Martin Tournoij
The problem is your for loop:
for i in $(seq 1 $1)
This will first expand $(seq 1 $1), creating a very big list, which you then pass to for.
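The eager expansion can be demonstrated on a small scale. This sketch is not from the original answer; it only counts how many words the command substitution materialises at once:

```shell
# The command substitution builds the complete text "1 2 ... N" before
# anything iterates over it; with N = 10^9 that text alone occupies
# gigabytes of memory.
args=$(seq 1 5)
set -- $args   # unquoted on purpose: word-splitting turns each number into an argument
echo "$#"      # all 5 words existed in memory at once before this line ran
```

With `seq 1 1000000000` the same expansion is what bash's xrealloc fails to allocate.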
Using while, however, we can read the output of seq line-by-line, which takes only a small amount of memory:
seq 1 1000000000 | while read i; do
echo $i
done
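Applied to the original generator, the same pipe-based approach might look like this. The two-column output and the `$2` bound are carried over from the question; this exact script is a sketch, not part of the original answer:

```shell
#!/bin/bash
# generator.sh: stream the sequence through a pipe so seq's output is
# consumed one line at a time instead of being expanded into a word list.
seq 1 "$1" | while read -r i; do
    echo "$i $((RANDOM % $2))"
done
```

Note that the `while` body runs in a subshell, but it still inherits the positional parameters, so `$2` is visible inside the loop.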
Answered by Jidder
If you want it fast, this should work.
You will need to compile it with g++, using the form:
g++ -o <executable> <C++file>
For example, I did it this way:
g++ -o inseq.exe CTest.cpp
CTest.cpp:
#include <iostream>
#include <string>
#include <fstream>
#include <iomanip>
#include <cstdlib>
#include <sstream>

int main(int argc, char *argv[])
{
    std::stringstream ss;
    int x = atoi(argv[1]);
    for (int i = 1; i <= x; i++)
    {
        ss << i << "\n";
        if (i % 10000 == 0)          // flush the buffer every 10000 lines
        {
            std::cout << ss.rdbuf();
            ss.clear();
            ss.str(std::string());
        }
    }
    std::cout << ss.rdbuf();
    ss.clear();
    ss.str(std::string());
}
Speed comparisons

Best of 3 timed runs for each of the methods presented, on a 1000000-line file.
Jidder
$ time ./inseq 1000000 > file
real 0m0.143s
user 0m0.131s
sys 0m0.011s
Carpetsmoker
$ cat Carpet.sh
#!/bin/bash
seq 1 $1 | while read i; do
echo $i
done
$ time ./Carpet.sh 1000000 > file
real 0m12.223s
user 0m9.753s
sys 0m2.140s
Hari Shankar
$ cat Hari.sh
#!/bin/bash
for ((i=0; i<$1; ++i))
do
    echo "$i $((RANDOM%$2))"
done
$ time ./Hari.sh 1000000 > file
real 0m9.729s
user 0m8.084s
sys 0m1.064s
As you can see from the results, my way is "slightly" faster, by about 60-70x.
Edit
Because Python is great:
$ cat Py.sh
#!/usr/bin/python
for x in xrange(1, 1000001):
print (x)
$ time ./Py.sh >file
real 0m0.543s
user 0m0.499s
sys 0m0.016s
That's about 4x slower than the C++ version, so if the file took an hour to generate with C++, it would take four hours with these two lines of Python.
EDIT 2
I decided to try Python and C++ on the 1000000000-line file.
For a non-CPU-intensive task, this seems to be using a lot of CPU:
PID USER %CPU TIME+ COMMAND
56056 me 96 2:51.43 Py.sh
Results for Python:
real 9m37.133s
user 8m53.550s
sys 0m8.348s
Results for C++:
real 3m9.047s
user 2m53.400s
sys 0m2.842s
Answered by Hari Menon
$(seq 1 $1) computes the whole list before iterating over it, so it takes memory to store the entire list of 10^9 numbers, which is a lot.
I am not sure if you can make seq run lazily, i.e. produce the next number only when it is needed. You can use a simple for loop instead:
for ((i=0; i<$1; ++i))
do
    echo "$i $((RANDOM%$2))"
done
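Putting the pieces together, a full generator built on this arithmetic loop might look like the following. The argument positions follow the question's usage (`$1` = row count, `$2` = exclusive upper bound); the script as a whole is my sketch, not part of the original answer, and it starts the id at 1 to match the desired output shown in the question:

```shell
#!/bin/bash
# generator.sh: $1 = number of rows, $2 = exclusive upper bound for the
# random column. The C-style loop never materialises the whole sequence,
# so memory use stays constant no matter how many rows are requested.
for ((i=1; i<=$1; ++i))
do
    echo "$i $((RANDOM % $2))"
done
```

Invoked as in the question, `./generator.sh 1000000000 101 > data.txt` then writes one "id random" pair per line.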