bash 如何在 Linux/Unix 上根据文件类型添加文件扩展名?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/352837/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-17 20:35:28  来源:igfitidea点击:

How to add file extensions based on file type on Linux/Unix?

pythonlinuxbashunixshell

提问by Palmin

This is a question regarding Unix shell scripting (any shell), but any other "standard" scripting language solution would also be appreciated:

这是一个关于 Unix shell 脚本(任何 shell)的问题,但任何其他“标准”脚本语言解决方案也将不胜感激:

I have a directory full of files where the filenames are hash values like this:

我有一个充满文件的目录,其中文件名是这样的哈希值:

fd73d0cf8ee68073dce270cf7e770b97
fec8047a9186fdcc98fdbfc0ea6075ee

These files have different original file types such as png, zip, doc, pdf etc.

这些文件具有不同的原始文件类型,例如 png、zip、doc、pdf 等。

Can anybody provide a script that would rename the files so they get their appropriate file extension, probably based on the output of the filecommand?

任何人都可以提供一个脚本来重命名文件,以便他们获得适当的文件扩展名,可能基于file命令的输出?

Answer:

回答:

J.F. Sebastian'sscript will work for both ouput of the filenames as well as the actual renaming.

JF Sebastian 的脚本既适用于文件名的输出,也适用于实际的重命名。

采纳答案by jfs

Here's mimetypes' version:

这是 mimetypes 的版本:

#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter. 

   `ext` is mime-based.

"""
import fileinput
import mimetypes
import os
import sys
from subprocess import Popen, PIPE

if len(sys.argv) > 1 and sys.argv[1] == '--rename':
    do_rename = True
    del sys.argv[1]
else:
    do_rename = False    

for filename in (line.rstrip() for line in fileinput.input()):
    output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
    mime = output.split(';', 1)[0].lower().strip()
    ext = mimetypes.guess_extension(mime, strict=False)
    if ext is None:
        ext = os.path.extsep + 'undefined'
    filename_ext = filename + ext
    print filename_ext
    if do_rename:
       os.rename(filename, filename_ext)

Example:

例子:

$ ls *.file? | python add-ext.py --rename
avi.file.avi
djvu.file.undefined
doc.file.dot
gif.file.gif
html.file.html
ico.file.obj
jpg.file.jpe
m3u.file.ksh
mp3.file.mp3
mpg.file.m1v
pdf.file.pdf
pdf.file2.pdf
pdf.file3.pdf
png.file.png
tar.bz2.file.undefined


Following @Phil H's response that follows @csl' response:

继@Phil H 的回应和@csl 的回应之后:

#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter. 

   `ext` is mime-based.
"""
# Mapping of mime-types to extensions is taken form here:
# http://as3corelib.googlecode.com/svn/trunk/src/com/adobe/net/MimeTypeMap.as
mime2exts_list = [
    ["application/andrew-inset","ez"],
    ["application/atom+xml","atom"],
    ["application/mac-binhex40","hqx"],
    ["application/mac-compactpro","cpt"],
    ["application/mathml+xml","mathml"],
    ["application/msword","doc"],
    ["application/octet-stream","bin","dms","lha","lzh","exe","class","so","dll","dmg"],
    ["application/oda","oda"],
    ["application/ogg","ogg"],
    ["application/pdf","pdf"],
    ["application/postscript","ai","eps","ps"],
    ["application/rdf+xml","rdf"],
    ["application/smil","smi","smil"],
    ["application/srgs","gram"],
    ["application/srgs+xml","grxml"],
    ["application/vnd.adobe.apollo-application-installer-package+zip","air"],
    ["application/vnd.mif","mif"],
    ["application/vnd.mozilla.xul+xml","xul"],
    ["application/vnd.ms-excel","xls"],
    ["application/vnd.ms-powerpoint","ppt"],
    ["application/vnd.rn-realmedia","rm"],
    ["application/vnd.wap.wbxml","wbxml"],
    ["application/vnd.wap.wmlc","wmlc"],
    ["application/vnd.wap.wmlscriptc","wmlsc"],
    ["application/voicexml+xml","vxml"],
    ["application/x-bcpio","bcpio"],
    ["application/x-cdlink","vcd"],
    ["application/x-chess-pgn","pgn"],
    ["application/x-cpio","cpio"],
    ["application/x-csh","csh"],
    ["application/x-director","dcr","dir","dxr"],
    ["application/x-dvi","dvi"],
    ["application/x-futuresplash","spl"],
    ["application/x-gtar","gtar"],
    ["application/x-hdf","hdf"],
    ["application/x-javascript","js"],
    ["application/x-koan","skp","skd","skt","skm"],
    ["application/x-latex","latex"],
    ["application/x-netcdf","nc","cdf"],
    ["application/x-sh","sh"],
    ["application/x-shar","shar"],
    ["application/x-shockwave-flash","swf"],
    ["application/x-stuffit","sit"],
    ["application/x-sv4cpio","sv4cpio"],
    ["application/x-sv4crc","sv4crc"],
    ["application/x-tar","tar"],
    ["application/x-tcl","tcl"],
    ["application/x-tex","tex"],
    ["application/x-texinfo","texinfo","texi"],
    ["application/x-troff","t","tr","roff"],
    ["application/x-troff-man","man"],
    ["application/x-troff-me","me"],
    ["application/x-troff-ms","ms"],
    ["application/x-ustar","ustar"],
    ["application/x-wais-source","src"],
    ["application/xhtml+xml","xhtml","xht"],
    ["application/xml","xml","xsl"],
    ["application/xml-dtd","dtd"],
    ["application/xslt+xml","xslt"],
    ["application/zip","zip"],
    ["audio/basic","au","snd"],
    ["audio/midi","mid","midi","kar"],
    ["audio/mpeg","mp3","mpga","mp2"],
    ["audio/x-aiff","aif","aiff","aifc"],
    ["audio/x-mpegurl","m3u"],
    ["audio/x-pn-realaudio","ram","ra"],
    ["audio/x-wav","wav"],
    ["chemical/x-pdb","pdb"],
    ["chemical/x-xyz","xyz"],
    ["image/bmp","bmp"],
    ["image/cgm","cgm"],
    ["image/gif","gif"],
    ["image/ief","ief"],
    ["image/jpeg","jpg","jpeg","jpe"],
    ["image/png","png"],
    ["image/svg+xml","svg"],
    ["image/tiff","tiff","tif"],
    ["image/vnd.djvu","djvu","djv"],
    ["image/vnd.wap.wbmp","wbmp"],
    ["image/x-cmu-raster","ras"],
    ["image/x-icon","ico"],
    ["image/x-portable-anymap","pnm"],
    ["image/x-portable-bitmap","pbm"],
    ["image/x-portable-graymap","pgm"],
    ["image/x-portable-pixmap","ppm"],
    ["image/x-rgb","rgb"],
    ["image/x-xbitmap","xbm"],
    ["image/x-xpixmap","xpm"],
    ["image/x-xwindowdump","xwd"],
    ["model/iges","igs","iges"],
    ["model/mesh","msh","mesh","silo"],
    ["model/vrml","wrl","vrml"],
    ["text/calendar","ics","ifb"],
    ["text/css","css"],
    ["text/html","html","htm"],
    ["text/plain","txt","asc"],
    ["text/richtext","rtx"],
    ["text/rtf","rtf"],
    ["text/sgml","sgml","sgm"],
    ["text/tab-separated-values","tsv"],
    ["text/vnd.wap.wml","wml"],
    ["text/vnd.wap.wmlscript","wmls"],
    ["text/x-setext","etx"],
    ["video/mpeg","mpg","mpeg","mpe"],
    ["video/quicktime","mov","qt"],
    ["video/vnd.mpegurl","m4u","mxu"],
    ["video/x-flv","flv"],
    ["video/x-msvideo","avi"],
    ["video/x-sgi-movie","movie"],
    ["x-conference/x-cooltalk","ice"]]

#NOTE: take only the first extension
mime2ext = dict(x[:2] for x in mime2exts_list)

if __name__ == '__main__':
    import fileinput, os.path
    from subprocess import Popen, PIPE

    for filename in (line.rstrip() for line in fileinput.input()):
        output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
        mime = output.split(';', 1)[0].lower().strip()
        print filename + os.path.extsep + mime2ext.get(mime, 'undefined')


Here's a snippet for old python's versions (not tested):

这是旧 python 版本的片段(未测试):

#NOTE: take only the first extension
mime2ext = {}
for x in mime2exts_list:
    mime2ext[x[0]] = x[1]

if __name__ == '__main__':
    import os
    import sys

    # this version supports only stdin (part of fileinput.input() functionality)
    lines = sys.stdin.read().split('\n')
    for line in lines:
        filename = line.rstrip()
        output = os.popen('file -bi ' + filename).read()        
        mime = output.split(';')[0].lower().strip()
        try: ext = mime2ext[mime]
        except KeyError:
             ext = 'undefined'
        print filename + '.' + ext

It should work on Python 2.3.5 (I guess).

它应该适用于 Python 2.3.5(我猜)。

回答by csl

You can use

您可以使用

file -i filename

to get a MIME-type. You could potentially lookup the type in a list and then append an extension. You can find a list of MIME-typesand example file extensionson the net.

得到一个 MIME 类型。您可以潜在地在列表中查找类型,然后附加一个扩展名。您可以在网上找到MIME 类型列表示例文件扩展名

回答by Phil H

Following csl's response:

以下 csl 的回应:

You can use

file -i filename

to get a MIME-type. You could potentially lookup the type in a list and then append an extension. You can find list of MIME-types and suggested file extensions on the net.

您可以使用

file -i filename

得到一个 MIME 类型。您可以潜在地在列表中查找类型,然后附加一个扩展名。您可以在网上找到 MIME 类型列表和建议的文件扩展名。

I'd suggest you write a script that takes the output of file -i filename, and returns an extension (split on spaces, find the '/', look up that term in a table file) in your language of choice - a few lines at most. Then you can do something like:

我建议您编写一个脚本,该脚本采用 的输出file -i filename,并以您选择的语言返回扩展名(按空格拆分,找到“/”,在表文件中查找该术语) - 最多几行。然后你可以做这样的事情:

ls | while read f; do mv "$f" "$f".`file -i "$f" | get_extension.py`; done

in bash, or throw that in a bash script. Or make the get_extension script bigger, but that makes it less useful next time you want the relevant extension.

在 bash 中,或将其放入 bash 脚本中。或者将 get_extension 脚本变大,但这会在您下次需要相关扩展时变得不那么有用。

Edit: change from for f in *to ls | while read fbecause the latter handles filenames with spaces in (a particular nightmare on Windows).

编辑:从 更改for f in *ls | while read f因为后者处理带有空格的文件名(Windows 上的特定噩梦)。

回答by Keltia

Of course, it should be added that deciding on a MIME type just based on file(1) output can be very inaccurate/vague (what's "data" ?) or even completely incorrect...

当然,应该补充一点,仅根据 file(1) 输出决定 MIME 类型可能非常不准确/模糊(什么是“数据”?)甚至完全不正确......

回答by silversleevesx

Agreeing with Keltia, and elaborating some on his answer:

同意 Keltia 的观点,并详细说明他的回答:

Take care -- some filetypes may be problematic. JPEG2000, for example.
And others might return too much info given the "file" command without any option tags. The way to avoidthis is to use "file -b" for a brief return of information.

BZT

注意——某些文件类型可能有问题。 例如JPEG2000
鉴于没有任何选项标签的“文件”命令,其他人可能会返回太多信息。避免这种情况的方法是使用“file -b”来简要返回信息。

BZT