shell script runs out of memory

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow

Original: http://stackoverflow.com/questions/27839594/
Asked by vefthym
I have written the following random-number generator shell script:
for i in $(seq 1 $1)           # for as many times as the first argument ($1) defines...
do
    echo "$i $((RANDOM%$2))"   # print the current iteration number and a random number in [0, $2)
done
I run it like that:
./generator.sh 1000000000 101 > data.txt
to generate 1B rows of an id and a random number in [0,100], and store this data in the file data.txt.
My desired output is:
1 39
2 95
3 61
4 27
5 85
6 44
7 49
8 75
9 52
10 66
...
It works fine for a small number of rows, but with 1B rows I get the following OOM error:
./generator.sh: xrealloc: ../bash/subst.c:5179: cannot allocate 18446744071562067968 bytes (4299137024 bytes allocated)
Which part of my program creates the error? How could I write the data.txt file line-by-line?

I have tried replacing the echo line with:
echo "$i $((RANDOM%$2))" >> $3
where $3 is data.txt, but I see no difference.
Answered by Martin Tournoij
The problem is your for loop:
for i in $(seq 1 $1)
This will first expand $(seq 1 $1), creating a very big list, which you then pass to for.
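The eager expansion can be demonstrated on a small scale. This sketch is not from the original answer; it only counts how many words the command substitution materialises at once:

```shell
# The command substitution builds the complete text "1 2 ... N" before
# anything iterates over it; with N = 10^9 that text alone occupies
# gigabytes of memory.
args=$(seq 1 5)
set -- $args   # unquoted on purpose: word-splitting turns each number into an argument
echo "$#"      # all 5 words existed in memory at once before this line ran
```

With `seq 1 1000000000` the same expansion is what bash's xrealloc fails to allocate.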
Using while, however, we can read the output of seq line-by-line, which takes only a small amount of memory:
seq 1 1000000000 | while read i; do
echo $i
done
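Applied to the original generator, the same pipe-based approach might look like this. The two-column output and the `$2` bound are carried over from the question; this exact script is a sketch, not part of the original answer:

```shell
#!/bin/bash
# generator.sh: stream the sequence through a pipe so seq's output is
# consumed one line at a time instead of being expanded into a word list.
seq 1 "$1" | while read -r i; do
    echo "$i $((RANDOM % $2))"
done
```

Note that the `while` body runs in a subshell, but it still inherits the positional parameters, so `$2` is visible inside the loop.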
Answered by Jidder
If you want it fast, this should work.
You will need to compile it with g++, using the form:
g++ -o <executable> <C++file>
For example, I did it this way:
g++ -o inseq.exe CTest.cpp
CTest.cpp:
#include <iostream>
#include <string>
#include <fstream>
#include <iomanip>
#include <cstdlib>
#include <sstream>

int main(int argc, char *argv[])
{
    std::stringstream ss;
    int x = atoi(argv[1]);
    for (int i = 1; i <= x; i++)
    {
        ss << i << "\n";
        if (i % 10000 == 0)          // flush the buffer every 10000 lines
        {
            std::cout << ss.rdbuf();
            ss.clear();
            ss.str(std::string());
        }
    }
    std::cout << ss.rdbuf();
    ss.clear();
    ss.str(std::string());
}
Speed comparisons

Best of 3 timed runs for each of the methods presented, on a 1000000-line file.
Jidder
$ time ./inseq 1000000 > file
real 0m0.143s
user 0m0.131s
sys 0m0.011s
Carpetsmoker
$ cat Carpet.sh
#!/bin/bash
seq 1 $1 | while read i; do
echo $i
done
$ time ./Carpet.sh 1000000 > file
real 0m12.223s
user 0m9.753s
sys 0m2.140s
Hari Shankar
$ cat Hari.sh
#!/bin/bash
for ((i=0; i<$1; ++i))
do
    echo "$i $((RANDOM%$2))"
done
$ time ./Hari.sh 1000000 > file
real 0m9.729s
user 0m8.084s
sys 0m1.064s
As you can see from the results, my way is "slightly" faster, by about 60-70x.
Edit
Because Python is great:
$ cat Py.sh
#!/usr/bin/python
for x in xrange(1, 1000001):
print (x)
$ time ./Py.sh >file
real 0m0.543s
user 0m0.499s
sys 0m0.016s
That's about 4x slower than the C++ version, so if the file took an hour to generate with C++, it would take four hours with these two lines of Python.
EDIT 2
I decided to try Python and C++ on the 1000000000-line file.
For a non-CPU-intensive task, this seems to be using a lot of CPU:
PID USER %CPU TIME+ COMMAND
56056 me 96 2:51.43 Py.sh
Results for Python:
real 9m37.133s
user 8m53.550s
sys 0m8.348s
Results for C++:
real 3m9.047s
user 2m53.400s
sys 0m2.842s
Answered by Hari Menon
$(seq 1 $1) computes the whole list before iterating over it, so it takes memory to store the entire list of 10^9 numbers, which is a lot.
I am not sure if you can make seq run lazily, i.e. produce the next number only when it is needed. You can use a simple for loop instead:
for ((i=0; i<$1; ++i))
do
    echo "$i $((RANDOM%$2))"
done
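Putting the pieces together, a full generator built on this arithmetic loop might look like the following. The argument positions follow the question's usage (`$1` = row count, `$2` = exclusive upper bound); the script as a whole is my sketch, not part of the original answer, and it starts the id at 1 to match the desired output shown in the question:

```shell
#!/bin/bash
# generator.sh: $1 = number of rows, $2 = exclusive upper bound for the
# random column. The C-style loop never materialises the whole sequence,
# so memory use stays constant no matter how many rows are requested.
for ((i=1; i<=$1; ++i))
do
    echo "$i $((RANDOM % $2))"
done
```

Invoked as in the question, `./generator.sh 1000000000 101 > data.txt` then writes one "id random" pair per line.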