How to make a Ruby string safe for a filesystem?
Source: http://stackoverflow.com/questions/1939333/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by marcgg
I have user entries as filenames. Of course this is not a good idea, so I want to drop everything except `[a-z]`, `[A-Z]`, `[0-9]`, `_` and `-`.
For instance:
my§document$is°° very&interesting___thisIs%nice445.doc.pdf
should become
my_document_is_____very_interesting___thisIs_nice445_doc.pdf
and then ideally
my_document_is_very_interesting_thisIs_nice445_doc.pdf
Is there a nice and elegant way for doing this?
Accepted answer by miku
From http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:
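```ruby
# `returning` is an old Rails (ActiveSupport) helper; in plain Ruby,
# `filename.strip.tap do |name|` does the same thing.
def sanitize_filename(filename)
  returning filename.strip do |name|
    # NOTE: File.basename doesn't work right with Windows paths on Unix
    # get only the filename, not the whole path
    name.gsub!(/^.*(\\|\/)/, '')
    # Replace every character except 0-9, A-Z, a-z, '.' and '-' with '_'
    name.gsub!(/[^0-9A-Za-z.\-]/, '_')
  end
end
```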
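`returning` is deprecated in Rails and does not exist in plain Ruby. A minimal sketch of the same two steps using Ruby's built-in `Object#tap`, assuming a UTF-8 source file (the default since Ruby 2.0); `sanitize_filename_plain` is a hypothetical name chosen so it does not clash with the answer's method:

```ruby
# Plain-Ruby version of the accepted answer: Object#tap stands in for
# Rails' old `returning`. The name sanitize_filename_plain is hypothetical.
def sanitize_filename_plain(filename)
  filename.strip.tap do |name|
    # Drop everything up to the last backslash or slash so only the base
    # name is left, for both Windows-style and Unix-style paths.
    name.gsub!(/^.*(\\|\/)/, '')
    # Replace each character outside 0-9, A-Z, a-z, "." and "-" with "_".
    name.gsub!(/[^0-9A-Za-z.\-]/, '_')
  end
end

puts sanitize_filename_plain('C:\\Users\\marc\\my report.pdf') # => my_report.pdf
puts sanitize_filename_plain('my§document$is°° very&interesting___thisIs%nice445.doc.pdf')
# => my_document_is___very_interesting___thisIs_nice445.doc.pdf
```

As the next answer points out, this keeps `.doc.pdf` unchanged and does not collapse runs of underscores.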
Answer by Anders Sjöqvist
I'd like to suggest a solution that differs from the old one. Note that the old one uses the deprecated `returning`. It is also specific to Rails, and you didn't explicitly mention Rails in your question (only as a tag). Also, the existing solution fails to turn `.doc.pdf` into `_doc.pdf`, as you requested. And, of course, it doesn't collapse the underscores into one.
Here's my solution:
```ruby
def sanitize_filename(filename)
  # Split the name when finding a period which is preceded by some
  # character, and is followed by some character other than a period,
  # if there is no following period that is followed by something
  # other than a period (yeah, confusing, I know)
  fn = filename.split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)

  # We now have one or two parts (depending on whether we could find
  # a suitable period). For each of these parts, replace any unwanted
  # sequence of characters with an underscore
  fn.map! { |s| s.gsub(/[^a-z0-9\-]+/i, '_') }

  # Finally, join the parts with a period and return the result
  fn.join('.')
end
```
You haven't specified all the details about the conversion. Thus, I'm making the following assumptions:
- There should be at most one filename extension, which means that there should be at most one period in the filename
- Trailing periods do not mark the start of an extension
- Leading periods do not mark the start of an extension
- Any sequence of characters other than `A-Z`, `a-z`, `0-9` and `-` should be collapsed into a single `_` (i.e. the underscore is itself regarded as a disallowed character, so the string `'$%__°#'` becomes `'_'` rather than `'___'` from the parts `'$%'`, `'__'` and `'°#'`)
The complicated part is splitting the filename into the main part and the extension. With a regular expression, I search for the last period that is followed by something other than a period, i.e. a period with no later period in the string meeting the same criterion. It must, however, be preceded by some character, to make sure it's not the first character in the string.
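The split step is easiest to understand by running the regex on its own against the edge cases from the assumptions above. A small sketch; `EXTENSION_SPLIT` is a hypothetical constant holding the answer's regex unchanged:

```ruby
# The answer's split regex, unchanged; EXTENSION_SPLIT is a hypothetical name.
EXTENSION_SPLIT = /(?<=.)\.(?=[^.])(?!.*\.[^.])/m

# Only the last qualifying period splits the name.
p 'nice445.doc.pdf'.split(EXTENSION_SPLIT) # => ["nice445.doc", "pdf"]
# A leading period has no character before it, so it never splits.
p '.bashrc'.split(EXTENSION_SPLIT)         # => [".bashrc"]
# A trailing period is not followed by a non-period, so it never splits.
p 'archive.'.split(EXTENSION_SPLIT)        # => ["archive."]

# The collapsing rule: a whole run of disallowed characters, underscores
# included, becomes one "_".
p '$%__°#'.gsub(/[^a-z0-9\-]+/i, '_')       # => "_"
```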
My results from testing the function:
```
1.9.3p125 :006 > sanitize_filename 'my§document$is°° very&interesting___thisIs%nice445.doc.pdf'
 => "my_document_is_very_interesting_thisIs_nice445_doc.pdf"
```
which I think is what you requested. I hope this is nice and elegant enough.
Answer by albandiguer
If you use Rails, you can also use `String#parameterize`. It isn't intended for filenames, but the result is usually satisfactory; note that it lowercases the string and turns the periods into separators too, so the extension dot is lost.
```ruby
"my§document$is°° very&interesting___thisIs%nice445.doc.pdf".parameterize
```
Answer by morgler
In Rails you might also be able to use `sanitized` from `ActiveStorage::Filename`:
```ruby
ActiveStorage::Filename.new("foo:bar.jpg").sanitized # => "foo-bar.jpg"
ActiveStorage::Filename.new("foo/bar.jpg").sanitized # => "foo-bar.jpg"
```
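Outside Rails, a similar effect can be approximated by replacing a handful of problematic characters with `-`. This is a sketch, not Rails' implementation: the character set below (path separators, `:` and a few shell-unfriendly characters, plus tabs and line breaks) is an assumption, and `sanitize_like_active_storage` is a hypothetical name:

```ruby
# Assumed set of characters to replace with "-": %, $, |, :, ;, /, backslash,
# tab, carriage return and newline.
UNSAFE_FILENAME_CHARS = /[%$|:;\/\\\t\r\n]/

def sanitize_like_active_storage(filename)
  # Trim surrounding whitespace, then swap each unsafe character for "-".
  filename.strip.gsub(UNSAFE_FILENAME_CHARS, '-')
end

puts sanitize_like_active_storage('foo:bar.jpg') # => foo-bar.jpg
puts sanitize_like_active_storage('foo/bar.jpg') # => foo-bar.jpg
```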
Answer by Blair Anderson
For Rails, I found myself wanting to keep any file extensions but use `parameterize` for the remaining characters:
```ruby
filename = "my§doc$is°° very&itng___thsIs%nie445.doc.pdf"
cleaned = filename.split(".").map(&:parameterize).join(".")
```
For implementation details and ideas, see the source: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
```ruby
def parameterize(string, separator: "-", preserve_case: false)
  # Replace accented chars with their ASCII equivalents.
  parameterized_string = transliterate(string)

  # Turn unwanted chars into the separator.
  parameterized_string.gsub!(/[^a-z0-9\-_]+/i, separator)

  # ... (collapsing repeated separators, trimming them and downcasing omitted)
end
```
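The core of that excerpt is easy to reproduce without Rails. A minimal sketch, assuming you can do without Rails' transliteration step (so an accented letter such as `é` becomes the separator instead of `e`); `simple_parameterize` and `clean_keeping_extension` are hypothetical names:

```ruby
# Approximation of ActiveSupport's parameterize, minus transliteration.
def simple_parameterize(string, separator: '-')
  # Turn each run of unwanted characters into the separator, as in the excerpt.
  result = string.gsub(/[^a-z0-9\-_]+/i, separator)
  sep = Regexp.escape(separator)
  # Collapse repeated separators, then trim a separator from either end.
  result = result.gsub(/#{sep}{2,}/, separator).gsub(/\A#{sep}|#{sep}\z/, '')
  result.downcase
end

# The approach from this answer: parameterize each dot-separated part and
# join the parts back with dots, so the extension survives.
def clean_keeping_extension(filename)
  filename.split('.').map { |part| simple_parameterize(part) }.join('.')
end

puts clean_keeping_extension('my§doc$is°° very&itng___thsIs%nie445.doc.pdf')
# => my-doc-is-very-itng___thsis-nie445.doc.pdf
```

Underscores survive because, like the excerpt, the sketch treats `_` as an allowed character.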
Answer by Jan Warchoł
Answer by David
If your goal is just to generate a filename that is "safe" to use on all operating systems (and not to remove any and all non-ASCII characters), then I would recommend the zaru gem. It doesn't do everything the original question asks for, but the filename it produces should be safe to use (and it keeps any filename-safe Unicode characters untouched):
```ruby
Zaru.sanitize! " what\ēver//wëird:user:înput:"
# => "whatēverwëirduserînput"
Zaru.sanitize! "my§docu*ment$is°° very&interes:ting___thisIs%nice445.doc.pdf"
# => "my§document$is°° very&interesting___thisIs%nice445.doc.pdf"
```
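If adding a gem isn't an option, the same idea (delete only what common filesystems reject and keep everything else) can be sketched in plain Ruby. This is not Zaru's code: the character set is an assumption, `portable_filename` is a hypothetical name, and the sketch skips cases a gem like Zaru may also handle, such as reserved Windows names (`CON`, `NUL`, ...), length limits and names that end up empty:

```ruby
# Assumed set: ASCII control characters plus / \ : * ? " < > |
# (the characters Windows rejects in file names).
INVALID_FILENAME_CHARS = /[\x00-\x1F\/\\:*?"<>|]/

def portable_filename(name)
  # Collapse whitespace runs to a single space, drop invalid characters,
  # then trim the ends; everything else, including non-ASCII, is kept.
  name.gsub(/[[:space:]]+/, ' ').gsub(INVALID_FILENAME_CHARS, '').strip
end

puts portable_filename('  what\\ēver//wëird:user:înput:') # => whatēverwëirduserînput
```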

