Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/965053/

Time: 2020-09-09 00:27:52  Source: igfitidea

Extract filename and extension in Bash

Tags: bash, string, filenames

Asked by ibz

I want to get the filename (without extension) and the extension separately.

The best solution I found so far is:

NAME=`echo "$FILE" | cut -d'.' -f1`
EXTENSION=`echo "$FILE" | cut -d'.' -f2`

This is wrong because it doesn't work if the file name contains multiple . characters. If, let's say, I have a.b.js, it will consider a and b.js, instead of a.b and js.

It can be easily done in Python with

file, ext = os.path.splitext(path)

but I'd prefer not to fire up a Python interpreter just for this, if possible.

Any better ideas?

Answered by Petesh

First, get the file name without the path:

filename=$(basename -- "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"

Alternatively, you can focus on the last '/' of the path instead of the '.' which should work even if you have unpredictable file extensions:

filename="${fullfile##*/}"
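One caveat with the expansions above: when the name contains no dot, `${filename##*.}` matches nothing and returns the whole name. A minimal sketch of one possible guard, assuming a hypothetical sample path and assuming that a name without a dot should yield an empty extension:

```shell
fullfile="/tmp/archive/README"     # hypothetical sample path
filename="${fullfile##*/}"         # strip everything up to the last '/'

case "$filename" in
    *.*) extension="${filename##*.}"   # text after the last '.'
         name="${filename%.*}" ;;      # text before the last '.'
    *)   extension=""                  # no dot: treat as "no extension"
         name="$filename" ;;
esac

echo "name=$name extension=$extension"
```

Names such as .bashrc still match `*.*` here and would need a separate check if they should count as having no extension.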

You may want to check the documentation on shell parameter expansion in the Bash manual.

Answered by Juliano

~% FILE="example.tar.gz"

~% echo "${FILE%%.*}"
example

~% echo "${FILE%.*}"
example.tar

~% echo "${FILE#*.}"
tar.gz

~% echo "${FILE##*.}"
gz

For more details, see shell parameter expansion in the Bash manual.

Answered by Tomi Po

Usually you already know the extension, so you might wish to use:

basename filename .extension

for example:

basename /path/to/dir/filename.txt .txt

and we get

filename
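When the extension is known in advance, the same call also works in a loop. A small sketch, assuming a hypothetical list of paths (none of them needs to exist on disk); the `--` guards against names that begin with a dash:

```shell
# Hypothetical input paths, used only for illustration.
for path in /path/to/dir/report.txt ./notes.txt plain.txt; do
    stem=$(basename -- "$path" .txt)   # drop the directory and the .txt suffix
    echo "$stem"
done
```

If a name does not end in the given suffix, basename leaves it unchanged apart from removing the directory part.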

Answered by sotapme

You can use the magic of POSIX parameter expansion:

bash-3.2$ FILENAME=somefile.tar.gz
bash-3.2$ echo "${FILENAME%%.*}"
somefile
bash-3.2$ echo "${FILENAME%.*}"
somefile.tar


There's a caveat: if your filename is of the form ./somefile.tar.gz, then echo ${FILENAME%%.*} greedily removes the longest match, starting at the first ., and you get the empty string.

(You can work around that with a temporary variable:

FULL_FILENAME=$FILENAME
FILENAME=${FULL_FILENAME##*/}
echo "${FILENAME%%.*}"

)



This site explains more.

${variable%pattern}
  Trim the shortest match from the end
${variable##pattern}
  Trim the longest match from the beginning
${variable%%pattern}
  Trim the longest match from the end
${variable#pattern}
  Trim the shortest match from the beginning
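
A short sketch applying all four forms to a single value; the sample path is hypothetical, and the comments show what each expansion leaves behind:

```shell
path="/var/log/app.backup.tar.gz"   # hypothetical sample value

first_cut="${path#*.}"    # shortest '*.' from the beginning -> backup.tar.gz
last_ext="${path##*.}"    # longest  '*.' from the beginning -> gz
no_ext="${path%.*}"       # shortest '.*' from the end       -> /var/log/app.backup.tar
no_exts="${path%%.*}"     # longest  '.*' from the end       -> /var/log/app

printf '%s\n' "$first_cut" "$last_ext" "$no_ext" "$no_exts"
```

Note that the patterns see the whole string, so a dot in a directory name would also be matched; stripping the directory first with `${path##*/}` avoids that.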

Answered by Doctor J

That doesn't seem to work if the file has no extension, or no filename. Here is what I'm using; it only uses builtins and handles more (but not all) pathological filenames.

#!/bin/bash
for fullpath in "$@"
do
    filename="${fullpath##*/}"                      # Strip longest match of */ from start
    dir="${fullpath:0:${#fullpath} - ${#filename}}" # Substring from 0 thru pos of filename
    base="${filename%.[^.]*}"                       # Strip shortest match of . plus at least one non-dot char from end
    ext="${filename:${#base} + 1}"                  # Substring from len of base thru end
    if [[ -z "$base" && -n "$ext" ]]; then          # If we have an extension and no base, it's really the base
        base=".$ext"
        ext=""
    fi

    echo -e "$fullpath:\n\tdir  = \"$dir\"\n\tbase = \"$base\"\n\text  = \"$ext\""
done

And here are some testcases:

$ basename-and-extension.sh / /home/me/ /home/me/file /home/me/file.tar /home/me/file.tar.gz /home/me/.hidden /home/me/.hidden.tar /home/me/.. .
/:
    dir  = "/"
    base = ""
    ext  = ""
/home/me/:
    dir  = "/home/me/"
    base = ""
    ext  = ""
/home/me/file:
    dir  = "/home/me/"
    base = "file"
    ext  = ""
/home/me/file.tar:
    dir  = "/home/me/"
    base = "file"
    ext  = "tar"
/home/me/file.tar.gz:
    dir  = "/home/me/"
    base = "file.tar"
    ext  = "gz"
/home/me/.hidden:
    dir  = "/home/me/"
    base = ".hidden"
    ext  = ""
/home/me/.hidden.tar:
    dir  = "/home/me/"
    base = ".hidden"
    ext  = "tar"
/home/me/..:
    dir  = "/home/me/"
    base = ".."
    ext  = ""
.:
    dir  = ""
    base = "."
    ext  = ""

Answered by Bjarke Freund-Hansen

You can use basename.

Example:

$ basename foo-bar.tar.gz .tar.gz
foo-bar

You do need to provide basename with the extension to be removed; however, if you always run tar with -z, then you know the extension will be .tar.gz.

This should do what you want:

tar -zxvf 
cd $(basename  .tar.gz)

Answered by paxdiablo

pax> echo a.b.js | sed 's/\.[^.]*$//'
a.b
pax> echo a.b.js | sed 's/^.*\.//'
js

works fine, so you can just use:

pax> FILE=a.b.js
pax> NAME=$(echo "$FILE" | sed 's/\.[^.]*$//')
pax> EXTENSION=$(echo "$FILE" | sed 's/^.*\.//')
pax> echo $NAME
a.b
pax> echo $EXTENSION
js

The commands, by the way, work as follows.

The command for NAME substitutes a "." character followed by any number of non-"." characters up to the end of the line with nothing (i.e., it removes everything from the final "." to the end of the line, inclusive). This is basically a non-greedy substitution using regex trickery.

The command for EXTENSION substitutes any number of characters followed by a "." character, anchored at the start of the line, with nothing (i.e., it removes everything from the start of the line to the final dot, inclusive). This is a greedy substitution, which is the default behavior.
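
One edge case follows from the explanation above: when FILE contains no dot, neither pattern matches, so sed prints the input unchanged and EXTENSION ends up equal to the whole name. A sketch of one possible guard, using sed's `-n` option together with the `p` flag so that output appears only when a substitution actually happened; the sample name is hypothetical:

```shell
FILE="Makefile"   # hypothetical name without any dot

# Unchanged when there is no dot, which is the desired NAME here.
NAME=$(printf '%s\n' "$FILE" | sed 's/\.[^.]*$//')

# -n suppresses default output; the p flag prints only on a successful substitution.
EXTENSION=$(printf '%s\n' "$FILE" | sed -n 's/^.*\.//p')

echo "NAME=$NAME EXTENSION=$EXTENSION"
```

With FILE=a.b.js the same two commands still give a.b and js.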

Answered by Kebabbert

Mellen writes in a comment on a blog post:

Using Bash, there's also ${file%.*} to get the filename without the extension and ${file##*.} to get the extension alone. That is,

file="thisfile.txt"
echo "filename: ${file%.*}"
echo "extension: ${file##*.}"

Outputs:

filename: thisfile
extension: txt

Answered by Cyker

No need to bother with awk or sed or even perl for this simple task. There is a pure-Bash, os.path.splitext()-compatible solution which only uses parameter expansions.

Reference Implementation

Documentation of os.path.splitext(path):

Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').

Python code:

root, ext = os.path.splitext(path)

Bash Implementation

Honoring leading periods

root="${path%.*}"
ext="${path#"$root"}"

Ignoring leading periods

root="${path#.}";root="${path%"$root"}${root%.*}"
ext="${path#"$root"}"

Tests

Here are test cases for the Ignoring leading periods implementation, which should match the Python reference implementation on every input.

|---------------|-----------|-------|
|path           |root       |ext    |
|---------------|-----------|-------|
|' .txt'        |' '        |'.txt' |
|' .txt.txt'    |' .txt'    |'.txt' |
|' txt'         |' txt'     |''     |
|'*.txt.txt'    |'*.txt'    |'.txt' |
|'.cshrc'       |'.cshrc'   |''     |
|'.txt'         |'.txt'     |''     |
|'?.txt.txt'    |'?.txt'    |'.txt' |
|'\n.txt.txt'   |'\n.txt'   |'.txt' |
|'\t.txt.txt'   |'\t.txt'   |'.txt' |
|'a b.txt.txt'  |'a b.txt'  |'.txt' |
|'a*b.txt.txt'  |'a*b.txt'  |'.txt' |
|'a?b.txt.txt'  |'a?b.txt'  |'.txt' |
|'a\nb.txt.txt' |'a\nb.txt' |'.txt' |
|'a\tb.txt.txt' |'a\tb.txt' |'.txt' |
|'txt'          |'txt'      |''     |
|'txt.pdf'      |'txt'      |'.pdf' |
|'txt.tar.gz'   |'txt.tar'  |'.gz'  |
|'txt.txt'      |'txt'      |'.txt' |
|---------------|-----------|-------|

Test Results

All tests passed.
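
For reuse, the two expansions can be wrapped in a function. This is a sketch, not a verified drop-in replacement for os.path.splitext(): the function name splitext and the choice to return results through the global variables root and ext are assumptions made for illustration, and the directory part is split off first so that dots in directory names are left alone.

```shell
# Hypothetical helper: sets the global variables "root" and "ext" for one path.
splitext() {
    path=$1
    name=${path##*/}                  # last path component
    dir=${path%"$name"}               # directory part with trailing '/', may be empty
    root=${name#.}                    # drop one leading period, if any
    root=${name%"$root"}${root%.*}    # restore that period, trim the last extension
    ext=${name#"$root"}               # whatever follows root is the extension
    root=$dir$root
}

splitext "txt.tar.gz"
echo "root=$root ext=$ext"
```

Only single leading periods are handled; names with several leading periods are not covered by this sketch.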

Answered by Some programmer dude

You could use the cut command (the --complement option is a GNU extension) to remove the last two extensions (the ".tar.gz" part):

$ echo "foo.tar.gz" | cut -d'.' --complement -f2-
foo


As noted by Clayton Hughes in a comment, this will not work for the actual example in the question. So as an alternative I propose using sed with extended regular expressions, like this:

$ echo "mpc-1.0.1.tar.gz" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'
mpc-1.0.1

It works by removing the last two (alpha-numeric) extensions unconditionally.
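
The same result can be sketched with parameter expansion alone, by trimming the shortest `.*` suffix twice. Unlike the sed version, this does not require the extensions to be alphanumeric, so it removes the last two dot-separated suffixes whatever they contain; the variable names are illustrative:

```shell
file="mpc-1.0.1.tar.gz"
stem="${file%.*}"    # first trim:  mpc-1.0.1.tar
stem="${stem%.*}"    # second trim: mpc-1.0.1
echo "$stem"
```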

[Updated again after comment from Anders Lindahl]
