How to split one text file into multiple *.txt files on Linux?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/19031144/
Asked by Kris
I have a text file file.txt (12 MB) containing:
something1
something2
something3
something4
(...)
Is there any way to split file.txt into 12 *.txt files, say file2.txt, file3.txt, file4.txt (...)?
Accepted answer by CS Pei
You can use the Linux core utility split:
split -b 1M -d file.txt file
Note that both M and MB are accepted, but they denote different sizes: MB is 1000 * 1000 bytes, while M is 1024^2 bytes.
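To see the difference between the two units in practice, the following sketch splits the same file both ways. It assumes GNU split; the scratch directory, the file name sample.bin, and the prefixes dec_ and bin_ are made up for illustration.

```shell
# Hypothetical scratch directory and file names, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1

# 2,050,000 bytes of filler: more than 2 * 1000^2, less than 2 * 1024^2.
head -c 2050000 /dev/zero > sample.bin

split -b 1MB -d sample.bin dec_   # pieces of 1000 * 1000 bytes
split -b 1M  -d sample.bin bin_   # pieces of 1024 * 1024 bytes

dec_count=$(ls dec_* | wc -l)
bin_count=$(ls bin_* | wc -l)
echo "1MB pieces: $dec_count"
echo "1M pieces:  $bin_count"
```

With this input the 1MB run needs a third, small piece for the last 50,000 bytes, while the 1M run fits everything into two pieces.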
If you want to split by lines instead, use the -l parameter.
UPDATE
# lines per piece, rounded up so that at most 12 files are produced
lines=$(( ($(wc -l < file.txt) + 11) / 12 )) ; split -l "$lines" -d file.txt file
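Shell integer division rounds down, so when the line count is not a multiple of 12 the leftover lines spill into a 13th file. The sketch below compares rounding down with rounding up on a 100-line sample; the scratch directory and the prefixes floor_ and ceil_ are hypothetical.

```shell
# Hypothetical scratch directory and prefixes, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 100 > file.txt              # 100 sample lines

total=$(wc -l < file.txt)
floor=$(( total / 12 ))           # 8: rounded down
ceil=$(( (total + 11) / 12 ))     # 9: rounded up

split -l "$floor" -d file.txt floor_
split -l "$ceil"  -d file.txt ceil_

floor_count=$(ls floor_* | wc -l)
ceil_count=$(ls ceil_* | wc -l)
echo "rounded down: $floor_count files; rounded up: $ceil_count files"
```

Rounding up guarantees at most 12 files, although for some line counts it yields fewer than 12.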
Another solution, as suggested by Kirill, is the following:
split -n l/12 file.txt
Note that it is the letter l, not the digit one. split -n has a few options, like N, k/N, l/k/N, r/N, r/k/N.
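As a quick check of the l/12 form, the sketch below runs it on a 100-line sample and verifies that the pieces add up to the original. It assumes a GNU split recent enough to support -n; the scratch directory is hypothetical, and because no prefix is given the pieces get split's default prefix x.

```shell
# Hypothetical scratch directory, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 100 > file.txt              # 100 sample lines

# l/12: twelve pieces, never breaking a line in the middle.
# Without a prefix argument the pieces are named xaa, xab, ...
split -n l/12 file.txt

piece_count=$(ls x?? | wc -l)
echo "pieces: $piece_count"
cat x?? | cmp -s - file.txt && echo "pieces concatenate back to file.txt"
```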
Answer by konsolebox
Using bash:
readarray -t LINES < file.txt
COUNT=${#LINES[@]}
for I in "${!LINES[@]}"; do
    # Map the 0-based line index I to an output file number from 1 to 12.
    INDEX=$(( (I * 12 - 1) / COUNT + 1 ))
    echo "${LINES[I]}" >> "file${INDEX}.txt"
done
Using awk:
awk '{
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        x = (i * 12 - 1) / NR + 1
        sub(/\..*$/, "", x)    # drop the fractional part
        print a[i] > ("file" x ".txt")
    }
}' file.txt
Unlike split, this approach makes the number of lines per file as even as possible.
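To get a feel for how evenly the index formula distributes lines, the sketch below applies the formula from the bash loop to a hypothetical input of 100 lines and tallies how many lines each of the 12 files would receive. Nothing is split here; the scratch directory and tally.txt are made-up names.

```shell
# Hypothetical scratch directory and tally file, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1

count=100   # hypothetical number of lines
parts=12
i=0
: > tally.txt
while [ "$i" -lt "$count" ]; do
    # Same formula as the bash loop, with a 0-based line index.
    index=$(( (i * parts - 1) / count + 1 ))
    echo "$index" >> tally.txt
    i=$(( i + 1 ))
done

# One row per output file: "<number of lines> <file index>".
sort -n tally.txt | uniq -c
sizes=$(sort -n tally.txt | uniq -c | awk '{ print $1 }' | sort -nu | tr '\n' ' ')
echo "distinct piece sizes: $sizes"
```

For 100 lines every file receives either 8 or 9 lines, so no single file ends up with a much smaller remainder.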
Answer by amruta takawale
split -l 100 input_file output_file
where -l is the number of lines in each file. This will create:
- output_fileaa
- output_fileab
- output_fileac
- output_filead
- ....
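The naming scheme in the list above can be reproduced on a small sample. The sketch below assumes a 250-line input created in a hypothetical scratch directory, so three pieces are expected.

```shell
# Hypothetical scratch directory, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 250 > input_file            # 250 sample lines

# 100 lines per piece; the default suffixes are aa, ab, ac, ...
split -l 100 input_file output_file

ls output_file*
wc -l output_fileaa output_fileab output_fileac
```

The last piece holds the remaining 50 lines.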
Answer by schoon
John's answer won't produce .txt files as the OP wants. Use:
split -b=1M -d file.txt file --additional-suffix=.txt
Answer by Nicolas D
Regardless of what is said above, on my Ubuntu 16 I had to do:
split -b 10M -d system.log system_split.log
Please note the space between -b and the value.
Answer by Morgan32
Try something like this:
awk -vc=1 'NR%1000000==0{++c}{print $0 > c".txt"}' Datafile.txt
for filename in *.txt; do mv "$filename" "Prefix_$filename"; done;
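An awk one-liner of the form NR%N==0{++c}{print $0 > c".txt"} bumps the counter c before printing every N-th line, so the first output file ends up one line shorter than the rest. The sketch below shows this with N = 10 on a 25-line sample; the scratch directory is hypothetical.

```shell
# Hypothetical scratch directory, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 25 > Datafile.txt           # 25 sample lines

# Chunk size 10 instead of 1000000. The target name is parenthesized so
# that every awk implementation parses the redirection the same way.
awk -v c=1 'NR % 10 == 0 { ++c } { print $0 > (c ".txt") }' Datafile.txt

first=$(wc -l < 1.txt)
second=$(wc -l < 2.txt)
third=$(wc -l < 3.txt)
echo "1.txt: $first, 2.txt: $second, 3.txt: $third"
```

Line 10 already lands in 2.txt, which is why 1.txt holds only 9 lines.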
Answer by Ryan
I agree with @CS Pei, however this didn't work for me:
split -b=1M -d file.txt file
...as the = after -b threw it off. Instead, I simply deleted it, left no space between -b and the value, and used a lowercase "m":
split -b1m -d file.txt file
And to append ".txt", we use what @schoon said:
split -b1m -d file.txt file --additional-suffix=.txt
I had a 188.5MB txt file and I used this command [but with -b5m for 5.2MB files], and it returned 35 split files, all of which were txt files of 5.2MB except the last, which was 5.0MB. Now, since I wanted my lines to stay whole, I wanted to split the main file every 1 million lines, but the split command didn't allow me to even do -100000, let alone -1000000, so splitting by large numbers of lines did not work for me.
Answer by stackoverflowuser2010
On my Linux system (Red Hat Enterprise 6.9), the split command does not have the command-line options for either -n or --additional-suffix.
Instead, I've used this:
split -d -l NUM_LINES really_big_file.txt split_files.txt.
where -d adds a numeric suffix to the end of split_files.txt. and -l specifies the number of lines per file.
For example, suppose I have a really big file like this:
$ ls -laF
total 1391952
drwxr-xr-x 2 user.name group 40 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group 4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
This file has 100,000 lines, and I want to split it into files with at most 30,000 lines. This command will run the split and append an integer at the end of the output file pattern split_files.txt.:
$ split -d -l 30000 really_big_file.txt split_files.txt.
The resulting files are split correctly with at most 30,000 lines per file:
$ ls -laF
total 2783904
drwxr-xr-x 2 user.name group 156 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group 4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
-rw-r--r-- 1 user.name group 428604626 Sep 14 15:43 split_files.txt.00
-rw-r--r-- 1 user.name group 427152423 Sep 14 15:43 split_files.txt.01
-rw-r--r-- 1 user.name group 427141443 Sep 14 15:43 split_files.txt.02
-rw-r--r-- 1 user.name group 142454325 Sep 14 15:43 split_files.txt.03
$ wc -l *.txt*
100000 really_big_file.txt
30000 split_files.txt.00
30000 split_files.txt.01
30000 split_files.txt.02
10000 split_files.txt.03
200000 total
Answer by bcag2
If each part should have the same number of lines, for example 22, here is my solution:
split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file
and you obtain file02.txt with the first 22 lines, file03.txt with the next 22 lines, and so on (the numeric suffix is two digits wide by default).
Thanks to @hamruta-takawale, @dror-s and @stackoverflowuser2010
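The command above can be tried on a small sample to see the resulting names. The sketch assumes a GNU split that supports --numeric-suffixes=FROM and --additional-suffix, and uses a hypothetical scratch directory with a 50-line file.txt. With the default two-character suffix, numbering from 2 produces file02.txt, file03.txt, and so on.

```shell
# Hypothetical scratch directory, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 50 > file.txt               # 50 sample lines

# Start numbering at 2, 22 lines per piece, and append .txt to each name.
split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file

names=$(ls file0*.txt | tr '\n' ' ')
echo "created: $names"
wc -l file0*.txt
```

The first two pieces hold 22 lines each and the last one holds the remaining 6.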