bash: How to concatenate files with the same prefix (and many prefixes)?
Original question: http://stackoverflow.com/questions/20194294/
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute the content to the original authors (not me): StackOverflow
Asked by DoubleDecker
I have many files that share the same prefix; only the part after the underscore differs. I also have many different prefixes! The underscore does not appear anywhere else in the file names. How can I concatenate all the files with the same prefix into a new file? I should add that I have thousands of different prefixes, so I cannot feed them to a loop.
Answered by fedorqui 'SO stop harming'
You can do something like:
cat /path/prefix* >> new_file
It will cat (that is, concatenate files and print on the standard output) all files whose names start with /path/prefix, and append the result to new_file. The rest of each name can be anything.
Before running it, it is a good idea to run ls /path/prefix* to make sure the pattern matches all the files you want (and only those).
Example
$ ls
aa_bb prefix_23 prefix_235 prefix_nnn
$ ls prefix_*
prefix_23 prefix_235 prefix_nnn
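The check-then-concatenate routine above could be wrapped in a small function. This is only a sketch: the name concat_prefix and its two arguments are made up for illustration, and it assumes the output file name does not itself start with the prefix.

```shell
# Hypothetical helper (not from the original answer).
# Usage: concat_prefix PREFIX OUTPUT_FILE
concat_prefix() {
    local prefix out
    prefix=$1
    out=$2
    # Expand the glob into the positional parameters
    set -- "$prefix"*
    # An unmatched glob stays literal (or expands to nothing),
    # so test whether the first "match" really exists
    if [ ! -e "$1" ]; then
        echo "no files match ${prefix}*" >&2
        return 1
    fi
    # Overwrite instead of append, so re-running gives the same result
    cat -- "$@" > "$out"
}
```

For the listing above, concat_prefix prefix_ new_file would write prefix_23, prefix_235 and prefix_nnn into new_file, in that order, and leave aa_bb alone.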
Answered by sage88
I had to do something very similar, and I don't feel that the previous answers solve your problem: they require a huge amount of manual input when there are many different prefixes, rather than just a few prefixes with lots of files each. If I knew the pattern of your prefixes I could give more specific advice, but for now I will assume that your prefixes are numbers with leading zeros (as they are in my files). I am going to assume the following layout, although the approach does not depend on it:
~/test01/001-test.txt
~/test01/002-test.txt
~/test01/003-test.txt
~/test02/001-test.txt
~/test02/002-test.txt
~/test02/003-test.txt
Once this is set up, I change into a merge directory, where I want all the merged files to be written, and then run the cat command in a for loop.
cd ~/merge
for i in {001..003}; do cat ../test*/"$i"*.txt > "$i"-merge.txt ; done
This uses 001, 002, and 003 as prefixes, looks in all of the test directories for files that match each prefix, and merges them in the order the shell expands the glob (sorted by path). The end result will appear in:
~/merge/001-merge.txt
~/merge/002-merge.txt
~/merge/003-merge.txt
I know this is quite late, but hopefully it helps someone else. I have to do this with 5000 prefixes, so I completely understand.
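With thousands of prefixes, the {001..003} range could be replaced by prefixes discovered from the file names themselves. The following is a rough sketch under the layout assumed above (test*/NNN-*.txt); the function name merge_across_dirs is hypothetical, and it assumes the merge directory starts out empty and does not itself match test*.

```shell
# Hypothetical helper (not from the original answer).
# Usage: merge_across_dirs PARENT_DIR MERGE_DIR
merge_across_dirs() {
    local parent merge f name prefix
    parent=$1
    merge=$2
    mkdir -p "$merge" || return 1
    for f in "$parent"/test*/*-*.txt; do
        [ -f "$f" ] || continue
        name=${f##*/}        # strip the directory part
        prefix=${name%%-*}   # keep the text before the first "-"
        # Each prefix is merged once; later files with the same prefix are skipped
        [ -e "$merge/$prefix-merge.txt" ] && continue
        cat -- "$parent"/test*/"$prefix"-*.txt > "$merge/$prefix-merge.txt"
    done
}
```

Calling merge_across_dirs ~ ~/merge on the layout above should produce the same three merged files without listing the prefixes by hand.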
Answered by AJ Mill
I had a similar problem: I had many files and wanted to group and cat them by prefix. I used this little script:
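# List the files, keep each prefix (the text before "_") only once, then loop over the prefixes
ls | awk -F '_' '!x[$1]++{print $1}' | while read -r line
do
    # Concatenate every file that starts with this prefix followed by "_"
    cat "$line"_* > "all_$line.txt"
done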
ls lists all the files in the directory.
In awk, the -F '_' option sets the underscore as the field delimiter, and the code itself acts like uniq: it prints each prefix only once.
Then we loop over all the prefixes and cat all the files that share each prefix.
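The same grouping can be sketched without parsing the output of ls, using shell parameter expansion to extract the prefix. This is an illustration only: the function name merge_by_prefix and the PREFIX.txt output naming are assumptions, not part of the original answer.

```shell
# Hypothetical helper (not from the original answer).
# Usage: merge_by_prefix SRC_DIR OUT_DIR
merge_by_prefix() {
    local src out f name prefix
    src=$1
    out=$2
    mkdir -p "$out" || return 1
    # First pass: create or empty one output file per prefix,
    # so that re-running the function does not append twice
    for f in "$src"/*_*; do
        [ -f "$f" ] || continue
        name=${f##*/}        # strip the directory part
        prefix=${name%%_*}   # keep the text before the first "_"
        : > "$out/$prefix.txt"
    done
    # Second pass: append each file to the output for its prefix.
    # The glob expands in sorted order, so files are appended in name order.
    for f in "$src"/*_*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        prefix=${name%%_*}
        cat -- "$f" >> "$out/$prefix.txt"
    done
}
```

Because the glob is expanded inside a for loop rather than passed to an external command, this should keep working with a very large number of files, at the cost of one cat call per file.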
Answered by Alfe
If you have a very large number of files, plain shell globbing (prefix_* and the like) is sometimes unsuitable, for example because the expanded argument list can exceed the system limit and the command fails with an "Argument list too long" error.
In that case you can use a loop and append the files one by one:
find dir -type f -name 'prefix_*' -exec bash -c 'cat "$1" >> result' _ {} \;
This appends all files matching prefix_* one by one to the file result (which should not exist at the start; if in doubt, run rm result first).
If you have lots of different prefixes, you can of course append one group after the other without removing the result file in between.
All the other options that the Unix tool find offers can be used as well, of course. If you need help with that, feel free to ask again.
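As a variation on the one-by-one approach, find can batch the file names itself with -exec ... {} +, which starts far fewer cat processes while still staying under the argument-length limit. A minimal sketch, assuming the result file lives outside the searched directory; the function name concat_found is hypothetical.

```shell
# Hypothetical helper (not from the original answer).
# Usage: concat_found DIR PATTERN RESULT_FILE
concat_found() {
    local dir pattern result
    dir=$1
    pattern=$2
    result=$3
    # Start from an empty result file
    : > "$result" || return 1
    # "{} +" makes find pass as many file names to each cat call as fit
    # on one command line, running cat again if more remain.
    # Files are concatenated in the order find visits them, which is not sorted.
    find "$dir" -type f -name "$pattern" -exec cat {} + >> "$result"
}
```

For example, concat_found dir 'prefix_*' result collects the same files as the command above. If a sorted order matters, the loop-based approach is the safer choice.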