Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/9102313/

Time: 2020-09-11 01:46:45  Source: igfitidea

rsync to get a list of only file names

Tags: file, list, filenames, rsync

Asked by user1172282

Here's an example of the command I'm using:

rsync --list-only --include "*2012*.xml" --exclude "*.xml" serveripaddress::pt/dir/files/ --port=111 > output.txt

How can I get a listing of just the file names without the extra information like permissions, timestamp, etc.?

Edit: And is it possible to output each file name on a new line?

Accepted answer by glglgl

Hoping the question will be moved to the appropriate site, I'll answer here nevertheless.

You could append a pipe with awk:

rsync ... | awk '{ $1=$2=$3=$4=""; print substr($0,5); }' >output.txt

This eliminates all the unwanted information by outputting everything from the 5th field, but works only if none of the first four fields in the output format gets an additional whitespace somewhere (which is unlikely).

This awk solution won't work if there are file names starting with whitespace.
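To illustrate the limitation, here is a minimal Python sketch that mimics the field-based idea. The helper name `strip_four_fields` and both sample lines are assumptions made up for illustration; they only imitate the shape of `rsync --list-only` output.

```python
def strip_four_fields(line):
    """Drop the first four whitespace-separated fields of a listing line."""
    # split(None, 4) splits on runs of whitespace at most four times,
    # so the fifth element is the remainder of the line.
    parts = line.rstrip("\n").split(None, 4)
    return parts[4] if len(parts) == 5 else ""

# Hypothetical sample lines in the shape printed by `rsync --list-only`.
plain = "-rw-r--r--        1234 2012/02/01 10:15:00 report 2012.xml\n"
leading_space = "-rw-r--r--        1234 2012/02/01 10:15:00  spaced.xml\n"

print(repr(strip_four_fields(plain)))          # 'report 2012.xml'
print(repr(strip_four_fields(leading_space)))  # 'spaced.xml' (leading space lost)
```

Splitting on whitespace cannot tell the separator after the time stamp apart from a space that belongs to the file name, which is why the second name comes out wrong.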

A more robust way to solve this is a somewhat more complex program, which also has to make assumptions about the output format.

It works this way: For each line,

  • Cut off the first 10 bytes. Verify that they are followed by a number of spaces. Cut them off as well.
  • Cut off all following digits. Verify that they are followed by one space. Cut that off as well.
  • Cut off the next 19 bytes. Verify that they contain a date and a time stamp in the appropriate format. (I don't know why the date's components are separated with / instead of -; this is not compliant with ISO 8601.)
  • Verify that one space follows. Cut that off as well. Leave any following whitespace characters intact, as they belong to the file name.
  • If the line has passed all these verifications, it is likely that the remainder of the line contains the file name.
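The steps above can be sketched literally in Python. This is only an illustration: the function name `extract_name` and the sample line are assumptions, the sketch works on characters rather than raw bytes, and it does not handle escaped file names or sizes printed with thousands separators.

```python
import re

# Date and time in the layout described above, e.g. "2012/02/01 10:15:00".
STAMP_RE = re.compile(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}")

def extract_name(line):
    """Return the file name part of one listing line, or None if a check fails."""
    rest = line.rstrip("\n")
    # Step 1: drop the 10 permission characters, then the spaces after them.
    rest = rest[10:]
    stripped = rest.lstrip(" ")
    if stripped == rest:
        return None  # no spaces followed the permission field
    rest = stripped
    # Step 2: drop the size digits, then exactly one space.
    digits = len(rest) - len(rest.lstrip("0123456789"))
    if digits == 0 or rest[digits:digits + 1] != " ":
        return None
    rest = rest[digits + 1:]
    # Step 3: the next 19 characters must be the date and time stamp.
    if not STAMP_RE.fullmatch(rest[:19]):
        return None
    rest = rest[19:]
    # Step 4: exactly one separating space; everything after it is the name.
    if not rest.startswith(" "):
        return None
    return rest[1:]

# Hypothetical sample line; the name starts with a space on purpose.
sample = "-rw-r--r--        1234 2012/02/01 10:15:00  spaced.xml\n"
print(repr(extract_name(sample)))  # ' spaced.xml'
```

Unlike a whitespace split, this keeps the leading space of the name, because only the single separator after the time stamp is removed.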


It gets even worse: for very esoteric corner cases, there is more to watch out for, because file names can be escaped. Certain unprintable bytes are replaced by an escape sequence (\#ooo, with ooo being their octal code), and this process must be reversed.

Thus, neither awk nor a simple sed script will do here if we want to do it properly.

Instead, the following Python script can be used:

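import re
import sys

# One line of `rsync --list-only` output: 10 permission characters, spaces,
# the size (digits; some rsync versions add thousands separators), the date,
# the time, one space, and then the file name up to the end of the line.
LINE_RE = re.compile(rb'.{10} +[\d,]+ ..../../.. ..:..:.. (.*)\n')
# Unprintable bytes are escaped as \#ooo (three octal digits).
QUOTED_RE = re.compile(rb'\\#(\d\d\d)')

def rsync_list(fileobj):
    """Yield the unescaped file name (as bytes) for each listing line."""
    for line in fileobj:
        match = LINE_RE.match(line)
        assert match, repr(line)       # error if the line has another shape
        quoted_fname = match.group(1)  # the file name part ...
        # ... must be unquoted: replace each escape with the byte it stands for.
        yield QUOTED_RE.sub(lambda m: bytes([int(m.group(1), 8)]), quoted_fname)

if __name__ == '__main__':
    # Work on bytes throughout so that names in any encoding survive.
    for fname in rsync_list(sys.stdin.buffer):
        sys.stdout.buffer.write(fname + b'\0')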

This outputs the list of file names separated by NUL characters, similar to the way find -print0 and many other tools work, so that even a file name containing a newline character (which is valid!) is retained correctly:

rsync . | python rsf.py | xargs -0 stat -c '%i'

correctly shows the inode number of every given file.
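If the NUL-separated list is consumed from Python instead of a shell pipeline, a sketch along the following lines might do. The helper name `split_nul` and the sample data are assumptions for illustration; in practice the bytes would come from the script's output.

```python
import os

def split_nul(data):
    """Split a NUL-separated byte string into a list of file names (bytes)."""
    # Every name is terminated by NUL, so drop the empty piece after the last one.
    return [name for name in data.split(b"\0") if name]

# Hypothetical input: two names, one of which contains a newline.
data = b"plain.xml\0with\nnewline.xml\0"
names = split_nul(data)
print(names)

for name in names:
    # os.stat accepts bytes paths; missing files are reported, not fatal.
    try:
        print(os.stat(name).st_ino)
    except OSError:
        print("cannot stat:", name)
```

Keeping the names as bytes avoids guessing the encoding used on the remote side.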

I may well have missed some corner case I didn't think of, but I believe the script handles the vast majority of cases correctly (I tested it with all 255 possible one-byte file names as well as a file name starting with a space).

Answer by William Entriken

After years of work, here is my solution to this age-old problem:

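DIR=`mktemp -d /tmp/rsync.XXXXXX`
rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ $DIR > output.txt
rmdir $DIR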

Answer by Ark-kun

rsync ... | sed -E 's|^([^\s]+\s+){4}||'

Answer by bxm

Further to https://stackoverflow.com/a/29522388/2858703

If your mktemp supports the --dry-run option, there's no need to actually create the temporary directory:

rsync -nr --out-format='%n' serveripaddress::pt/dir/files/ $(mktemp -d --dry-run) > output.txt