从文件名中删除特殊字符的批处理文件脚本 (Windows)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/261515/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Batch file script to remove special characters from filenames (Windows)
提问by njr101
I have a large set of files, some of which contain special characters in the filename (e.g. ?,?,%, and others). I'd like a script file to iterate over these files and rename them removing the special characters. I don't really mind what it does, but it could replace them with underscores for example e.g.
我有一大堆文件,其中一些在文件名中包含特殊字符(例如?、?、% 等)。我想要一个脚本文件来遍历这些文件并重命名它们以删除特殊字符。我真的不介意它做什么,但它可以用下划线替换它们,例如
St?rung%20.doc would be renamed to St_rung_20.doc
St?rung%20.doc 将重命名为 St_rung_20.doc
In order of preference:
按优先顺序:
- A Windiws batch file
- A Windows script file to run with cscript (vbs)
- A third party piece of software that can be run from the command-line (i.e. no user interaction required)
- Another language script file, for which I'd have to install an additional script engine
- Windiws 批处理文件
- 使用 cscript (vbs) 运行的 Windows 脚本文件
- 可以从命令行运行的第三方软件(即不需要用户交互)
- 另一个语言脚本文件,我必须为它安装一个额外的脚本引擎
Background: I'm trying to encrypt these file with GnuPG on Windows but it doesn't seem to handle special characters in filenames with the --encrypt-files option.
背景:我试图在 Windows 上使用 GnuPG 加密这些文件,但它似乎没有使用 --encrypt-files 选项处理文件名中的特殊字符。
采纳答案by njr101
Thanks to Tomalak who actually pointed me in the right direction. Thought I'd post here for completeness.
感谢 Tomalak,他实际上为我指明了正确的方向。以为我会在这里张贴完整性。
The problem seems to be that the codepage used by GPG is fixed (Latin I) independent of the codepage configured in the console. But once he pointed this out, I figured out how to workaraound this.
问题似乎是 GPG 使用的代码页是固定的(拉丁文 I),独立于控制台中配置的代码页。但是一旦他指出这一点,我就想出了如何解决这个问题。
The trick is to change the codepage before generating the file list. This will actually make the filelist appear to be incorrect when viewed in the console. However, when passed to GPG, it works fine. GPG accepts the files and spits out the encrytped files with correct filenames.
诀窍是在生成文件列表之前更改代码页。这实际上会使在控制台中查看时文件列表看起来不正确。但是,当传递给 GPG 时,它工作正常。GPG 接受文件并以正确的文件名吐出加密文件。
The batch file looks something like this:
批处理文件如下所示:
chcp 1252
dir /b /s /a-d MyFolder >filelist.txt
gpg -r [email protected] --encrypt-files <filelist.txt
回答by Tomalak
Have you tried setting cmd.exe into another codepage before you feed the file names to gnupg? Issue chcp 65001
to set cmd.exe to Unicode beforehand and try again.
在将文件名提供给 gnupg 之前,您是否尝试将 cmd.exe 设置为另一个代码页?问题chcp 65001
来设定CMD.EXE为Unicode事先并再试一次。
If that fails, the following VBScript would do it:
如果失败,则以下 VBScript 会执行此操作:
Option Explicit
Dim fso: Set fso = CreateObject("Scripting.FileSystemObject")
Dim invalidChars: Set invalidChars = New RegExp
' put all characters that you want to strip inside the brackets
invalidChars.Pattern = "[??ü?&%]"
invalidChars.IgnoreCase = True
invalidChars.Global = True
If WScript.Arguments.Unnamed.Count = 0 Then
WScript.Echo "Please give folder name as argument 1."
WScript.Quit 1
End If
Recurse fso.GetFolder(WScript.Arguments.Unnamed(0))
Sub Recurse(f)
Dim item
For Each item In f.SubFolders
Recurse item
Sanitize item
Next
For Each item In f.Files
Sanitize item
Next
End Sub
Sub Sanitize(folderOrFile)
Dim newName: newName = invalidChars.Replace(folderOrFile.Name, "_")
If folderOrFile.Name = newName Then Exit Sub
WScript.Echo folderOrFile.Name, " -> ", newName
folderOrFile.Name = newName
End Sub
call it like this:
像这样称呼它:
cscript replace.vbs "c:\path\to\my\files"
You can also drag&drop a folder onto it in Windows Explorer.
您还可以在 Windows 资源管理器中将文件夹拖放到它上面。
回答by Treb
From http://www.robvanderwoude.com/bht.html:
来自http://www.robvanderwoude.com/bht.html:
use NT's SET's string substitution to replace or remove characters anywhere in a string:
使用 NT 的 SET 字符串替换来替换或删除字符串中任意位置的字符:
SET STRING=[ABCDEFG]
SET STRING=%STRING:[=%
SET STRING=%STRING:]=%
ECHO String: %STRING%
will display
String: ABCDEFG
SET STRING=[ABCDEFG]
SET STRING=%STRING:[=(%
SET STRING=%STRING:]=)%
ECHO String: %STRING%
will display
String: (ABCDEFG)
SET STRING=[ABCDEFG]
SET STRING=%STRING:~1,7%
ECHO String: %STRING%
will display
String: ABCDEFG
If you use this attempt, you will have to process each character you want to replace (e.g. ?,?,ü,?,?,ü,?, but also á,à,é,è...) seperately.
如果您尝试使用此方法,则必须单独处理要替换的每个字符(例如 ?,?,ü,?,?,ü,?,以及 á,à,é,è...)。
回答by Etienne URBAH
Following 'RenameFilesWithAccentedAndDiacriticalLatinChars.pl' PERL script renames files with accented and diacritical Latin characters :
遵循 'RenameFilesWithAccentedAndDiacriticalLatinChars.pl' PERL 脚本重命名带有重音和变音拉丁字符的文件:
- This PERL script starts from the folder given in parameter, or else from the current folder.
- It recursively searches for files with characters belonging to 80 - FF of CP 1250, CP 1252, CP 1254 and CP 1257 (mostly accented Latin characters) or Latin characters having diacritical marks.
- It calculates new file names by removing the accents and diacritical marks only from Latin characters (For example, été --> Ete).
- It displays all proposed renaming and perhaps conflicts, and asks the user for global approval.
- If the user has approved, it renames all files having no conflict.
- 这个 PERL 脚本从参数中给定的文件夹开始,或者从当前文件夹开始。
- 它递归地搜索包含属于 CP 1250、CP 1252、CP 1254 和 CP 1257 的 80 - FF 字符(主要是重音拉丁字符)或带有变音符号的拉丁字符的文件。
- 它通过仅从拉丁字符中删除重音和变音符来计算新文件名(例如,été --> Ete)。
- 它显示所有提议的重命名和可能的冲突,并要求用户进行全局批准。
- 如果用户已批准,它会重命名所有没有冲突的文件。
Option '--batch' avoids interactive questions. Use with care.
选项“--batch”避免了交互式问题。小心使用。
Option '--' avoids the next parameter to be interpreted as option.
选项“--”避免将下一个参数解释为选项。
Special Warning:
特别警告:
- This script was originally encoded in UTF-8, and should stay so.
- This script may rename a lot of files.
- Files names are theoretically all encoded only with UTF-8. But some file names may be found to contain also some characters having legacy encoding.
- The author has applied efforts for consistency checks, robustness, conflict detection and use of appropriate encoding. So this script should only rename files by removing accents and diacritical marks from Latin characters.
- But this script has been tested only under a limited number of OS (Windows, Mac OS X, Linux) and a limited number of terminal encodings (CP 850, ISO-8859-1, UTF-8).
- So, under weird circumstances, this script could rename many files with random names.
- Therefore, this script should be used with care, and modified with extreme care (beware encoding of internal strings, inputs, outputs and commands)
- 这个脚本最初是用 UTF-8 编码的,应该保持这样。
- 这个脚本可能会重命名很多文件。
- 文件名理论上都是只用 UTF-8 编码的。但是可能会发现某些文件名还包含某些具有旧编码的字符。
- 作者在一致性检查、鲁棒性、冲突检测和适当编码的使用方面付出了努力。所以这个脚本应该只通过从拉丁字符中删除重音符号和变音符号来重命名文件。
- 但是此脚本仅在有限数量的操作系统(Windows、Mac OS X、Linux)和有限数量的终端编码(CP 850、ISO-8859-1、UTF-8)下进行了测试。
- 所以,在奇怪的情况下,这个脚本可以用随机名称重命名许多文件。
- 因此,此脚本应谨慎使用,并极其谨慎地修改(注意内部字符串、输入、输出和命令的编码)
#!/usr/bin/perl -w
#=============================================================================
#
# Copyright 2010 Etienne URBAH
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details at
# http://www.gnu.org/licenses/gpl.html
#
# For usage and SPECIAL WARNING, see the 'Help' section below.
#
#=============================================================================
use 5.008_000; # For correct Unicode support
use warnings;
use strict;
use Encode;
$| = 1; # Autoflush STDOUT
#-----------------------------------------------------------------------------
# Function ucRemoveEolUnderscoreDash :
# Set Uppercase, remove End of line, Underscores and Dashes
#-----------------------------------------------------------------------------
sub ucRemoveEolUnderscoreDash
{
local $_ = uc($_[0]);
chomp;
tr/_\-//d;
$_;
}
#-----------------------------------------------------------------------------
# Constants
#-----------------------------------------------------------------------------
my $Encoding_Western = 'ISO-8859-1';
my $Encoding_Central = 'ISO-8859-2';
my $Encoding_Baltic = 'ISO-8859-4';
my $Encoding_Turkish = 'ISO-8859-9';
my $Encoding_W_Euro = 'ISO-8859-15';
my $Code_Page_OldWest = 850;
my $Code_Page_Central = 1250;
my $Code_Page_Western = 1252;
my $Code_Page_Turkish = 1254;
my $Code_Page_Baltic = 1257;
my $Code_Page_UTF8 = 65001;
my $HighBitSetChars = pack('C*', 0x80..0xFF);
my %SuperEncodings =
( &ucRemoveEolUnderscoreDash($Encoding_Western), 'cp'.$Code_Page_Western,
&ucRemoveEolUnderscoreDash($Encoding_Central), 'cp'.$Code_Page_Central,
&ucRemoveEolUnderscoreDash($Encoding_Baltic), 'cp'.$Code_Page_Baltic,
&ucRemoveEolUnderscoreDash($Encoding_Turkish), 'cp'.$Code_Page_Turkish,
&ucRemoveEolUnderscoreDash($Encoding_W_Euro), 'cp'.$Code_Page_Western,
&ucRemoveEolUnderscoreDash('cp'.$Code_Page_OldWest),
'cp'.$Code_Page_Western );
my %EncodingNames = ( 'cp'.$Code_Page_Central, 'Central European',
'cp'.$Code_Page_Western, 'Western European',
'cp'.$Code_Page_Turkish, ' Turkish ',
'cp'.$Code_Page_Baltic, ' Baltic ' );
my %NonAccenChars = (
#--------------------------------#
'cp'.$Code_Page_Central, # Central European (cp1250) #
#--------------------------------#
#_?_?…??_‰??????_‘'“”?–—_???????#
'E_,_,.++_%S_STZZ_````.--_Ts_stzz'.
#?ˇ??¤?|§¨????-??°±??′μ?·????????#
'_``LoAlS`CS_--RZ`+,l`uP.,as_L~lz'.
#?á???????é??ěí?????ó???×??ú?üY??#
'RAAAALCCCEEEEIIDDNNOOOOxRUUUUYTS'.
#?áa??????é??ěí???ńňó???÷??ú?üy?˙#
'raaaalccceeeeiiddnnoooo%ruuuuyt`',
#--------------------------------#
'cp'.$Code_Page_Western, # Western European (cp1252) #
#--------------------------------#
#_???…???‰???_?__‘'“”?–—?????_??#
'E_,f,.++^%S_O_Z__````.--~Ts_o_zY'.
#??¢£¤¥|§¨?a??-?ˉ°±23′μ?·?1o?????#
'_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.
#àá??????èéê?ìí??D?òó???×?ùú?üYT?#
'AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTS'.
#àáa?????èéê?ìí??e?òó???÷?ùú?üyt?#
'aaaaaaaceeeeiiiidnooooo%ouuuuyty',
#--------------------------------#
'cp'.$Code_Page_Turkish, # Turkish (cp1254) #
#--------------------------------#
#_???…???‰???____‘'“”?–—?????__?#
'E_,f,.++^%S_O____````.--~Ts_o__Y'.
#??¢£¤¥|§¨?a??-?ˉ°±23′μ?·?1o?????#
'_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.
#àá??????èéê?ìí????òó???×?ùú?ü???#
'AAAAAAACEEEEIIIIGNOOOOOxOUUUUISS'.
#àáa?????èéê?ìí????òó???÷?ùú?ü???#
'aaaaaaaceeeeiiiignooooo%ouuuuisy',
#--------------------------------#
'cp'.$Code_Page_Baltic, # Baltic (cp1257) #
#--------------------------------#
#_?_?…??_‰_?_¨ˇ?_‘'“”?–—_?_?_ˉ?_#
'E_,_,.++_%___``,_````.--_T___-,_'.
#??¢£¤?|§?????-??°±23′μ?·?1??????#
'__cLo_lSOCR_--RA`+23`uP.o1r_qh3a'.
#??ā????ē?é????ī????óō??×???ūü???#
'AIACAAEECEZEGKILSNNOOOOxULSUUZZS'.
#??ā????ē?é????ī??ń?óō??÷???ūü??˙#
'aiacaaeecezegkilsnnoooo%ulsuuzz`' );
my %AccentedChars;
my $AccentedChars = '';
my $NonAccenChars = '';
for ( $Code_Page_Central, $Code_Page_Western,
$Code_Page_Turkish, $Code_Page_Baltic )
{
$AccentedChars{'cp'.$_} = decode('cp'.$_, $HighBitSetChars);
$AccentedChars .= $AccentedChars{'cp'.$_};
$NonAccenChars .= $NonAccenChars{'cp'.$_};
}
#print "\n", length($NonAccenChars), ' ', $NonAccenChars,"\n";
#print "\n", length($AccentedChars), ' ', $AccentedChars,"\n";
my $QuotedMetaNonAccenChars = quotemeta($NonAccenChars);
my $DiacriticalChars = '';
for ( 0x0300..0x036F, 0x1DC0..0x1DFF )
{ $DiacriticalChars .= chr($_) }
#-----------------------------------------------------------------------------
# Parse options and parameters
#-----------------------------------------------------------------------------
my $b_Help = 0;
my $b_Interactive = 1;
my $b_UTF8 = 0;
my $b_Parameter = 0;
my $Folder;
for ( @ARGV )
{
if ( lc($_) eq '--' )
{ $b_Parameter = 1 }
elsif ( (not $b_Parameter) and (lc($_) eq '--batch') )
{ $b_Interactive = 0 }
elsif ( (not $b_Parameter) and (lc($_) eq '--utf8') )
{ $b_UTF8 = 1 }
elsif ( $b_Parameter or (substr($_, 0, 1) ne '-') )
{
if ( defined($Folder) )
{ die "mode con codepage select=1252
@echo off
Setlocal enabledelayedexpansion
::folder only (/D option)
for /R /D %%d in (*) do (
set an=%%~nd
set bn=!an:.=_!
set cn=!bn:-=_!
set dn=!cn: =_!
set en=!dn:á=A!
set fn=!en:é=E!
set gn=!fn:í=I!
set hn=!gn:ó=O!
set in=!hn:ú=U!
set jn=!in:ü=U!
set kn=!jn:á=a!
set ln=!kn:é=e!
set mn=!ln:í=i!
set nn=!mn:ó=o!
set on=!nn:ú=u!
set pn=!on:ü=u!
set qn=!pn:?=N!
set zn=!on:?=n!
set ax=%%~xd
set bx=!ax:.=_!
set cx=!bx:-=_!
set dx=!cx: =_!
set bx=!ax:.=_!
set cx=!bx:-=_!
set dx=!cx: =_!
set ex=!dx:á=A!
set fx=!ex:é=E!
set gx=!fx:í=I!
set hx=!gx:ó=O!
set ix=!hx:ú=U!
set jx=!ix:ü=U!
set kx=!jx:á=a!
set lx=!kx:é=e!
set mx=!lx:í=i!
set nx=!mx:ó=o!
set ox=!nx:ú=u!
set px=!ox:ü=u!
set qx=!px:?=N!
set zx=!ox:?=n!
if [!an!]==[] (set zn=)
if [!ax!]==[] (set zx=)
set newname=!zn!!zx!
if /i not [%%~nd%%~xd]==[!newname!] rename "%%d" !newname!
)
endlocal
pause
accepts only 1 parameter\n" }
else
{ $Folder = $_ }
}
else
{ $b_Help = 1 }
}
#-----------------------------------------------------------------------------
# Help
#-----------------------------------------------------------------------------
if ( $b_Help )
{
die << "END_OF_HELP"
##代码## [--help] [--batch] [--] [folder]
This script renames files with accented and diacritical Latin characters :
- This PERL script starts from the folder given in parameter, or else from
the current folder.
- It recursively searches for files with characters belonging to 80 - FF of
CP 1250, CP 1252, CP 1254 and CP 1257 (mostly accented Latin characters)
or Latin characters having diacritical marks.
- It calculates new file names by removing the accents and diacritical marks
only from Latin characters (For example, été --> Ete).
- It displays all proposed renaming and perhaps conflicts, and asks the user
for global approval.
- If the user has approved, it renames all files having no conflict.
Option '--batch' avoids interactive questions. Use with care.
Option '--' avoids the next parameter to be interpreted as option.
SPECIAL WARNING :
- This script was originally encoded in UTF-8, and should stay so.
- This script may rename a lot of files.
- Files names are theoretically all encoded only with UTF-8. But some file
names may be found to contain also some characters having legacy encoding.
- The author has applied efforts for consistency checks, robustness, conflict
detection and use of appropriate encoding.
So this script should only rename files by removing accents and diacritical
marks from Latin characters.
- But this script has been tested only under a limited number of OS
(Windows, Mac OS X, Linux) and a limited number of terminal encodings
(CP 850, ISO-8859-1, UTF-8).
- So, under weird circumstances, this script could rename many files with
random names.
- Therefore, this script should be used with care, and modified with extreme
care (beware encoding of internal strings, inputs, outputs and commands)
END_OF_HELP
}
#-----------------------------------------------------------------------------
# If requested, change current folder
#-----------------------------------------------------------------------------
if ( defined($Folder) )
{ chdir($Folder) or die "Can NOT set '$Folder' as current folder\n" }
#-----------------------------------------------------------------------------
# Following instruction is MANDATORY.
# The return value should be non-zero, but on some systems it is zero.
#-----------------------------------------------------------------------------
utf8::decode($AccentedChars);
# or die "##代码##: '$AccentedChars' should be UTF-8 but is NOT.\n";
#-----------------------------------------------------------------------------
# Check consistency on 'tr'
#-----------------------------------------------------------------------------
$_ = $AccentedChars;
eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
if ( $@ ) { warn $@ }
if ( $@ or ($_ ne $NonAccenChars) )
{ die "##代码##: Consistency check on 'tr' FAILED :\n\n",
"Translated Accented Chars : ", length($_), ' : ', $_, "\n\n",
" Non Accented Chars : ", length($NonAccenChars), ' : ',
$NonAccenChars, "\n" }
#-----------------------------------------------------------------------------
# Constants depending on the OS
#-----------------------------------------------------------------------------
my $b_Windows = ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') );
my ($Q, $sep, $sep2, $HOME, $Find, @List, $cwd, @Move);
if ( $b_Windows )
{
$Q = '"';
$sep = '\';
$sep2 = '\\';
$HOME = $ENV{'USERPROFILE'};
$Find = 'dir /b /s';
@List = ( ( (`ver 2>&1` =~ m/version\s+([0-9]+)/i) and ( >= 6) ) ?
('icacls') :
( 'cacls') );
$cwd = `cd`; chomp $cwd; $cwd = quotemeta($cwd);
@Move = ('move');
}
else
{
$Q = "'";
$sep = '/';
$sep2 = '/';
$HOME = $ENV{'HOME'};
$Find = 'find .';
@List = ('ls', '-d', '--');
@Move = ('mv', '--');
if ( -w '/bin' ) { die "##代码##: For safety reasons, ",
"usage is BLOCKED to administrators.\n"}
}
my $Encoding;
my $ucEncoding;
my $InputPipe = '-|'; # Used as global variable
#-----------------------------------------------------------------------------
# Under Windows, associate input and output encodings to code pages :
# - Get the original code page,
# - If it is not UTF-8, try to set it to UTF-8,
# - Define the input encoding as the one associated to the ACTIVE code page,
# - If STDOUT is the console, encode output for the ORIGINAL code page.
#-----------------------------------------------------------------------------
my $Code_Page_Original;
my $Code_Page_Active;
if ( $b_Windows )
{
#-----------------------------------------------------------------------
# Get the original code page
#-----------------------------------------------------------------------
$_ = `chcp`;
m/([0-9]+)$/ or die "Non numeric Windows code page : ", $_;
$Code_Page_Original = ;
print 'Windows Original Code Page = ', $Code_Page_Original,
( $Code_Page_Original == $Code_Page_UTF8 ?
' = UTF-8, display is perhaps correct with a true type font.' :
'' ), "\n\n";
$Code_Page_Active = $Code_Page_Original ;
#-----------------------------------------------------------------------
# The input encoding must be the same as the ACTIVE code page
#-----------------------------------------------------------------------
$Encoding = ( $Code_Page_Active == $Code_Page_UTF8 ?
'utf8' :
'cp'.$Code_Page_Active ) ;
$InputPipe .= ":encoding($Encoding)";
print "InputPipe = '$InputPipe'\n\n";
#-----------------------------------------------------------------------
# If STDOUT is the console, output encoding must be the same as the
# ORIGINAL code page
#-----------------------------------------------------------------------
if ( $Code_Page_Original != $Code_Page_UTF8 )
{
no warnings 'unopened';
@_ = stat(STDOUT);
use warnings;
if ( scalar(@_) and ($_[0] == 1) )
{ binmode(STDOUT, ":encoding(cp$Code_Page_Original)") }
else
{ binmode(STDOUT, ":encoding($Encoding)") }
}
}
#-----------------------------------------------------------------------------
# Under *nix, if the 'LANG' environment variable contains an encoding,
# verify that this encoding is supported by the OS and by PERL.
#-----------------------------------------------------------------------------
elsif ( defined($ENV{'LANG'}) and ($ENV{'LANG'} =~ m/\.([^\@]+)$/i) )
{
$Encoding = ;
my $Kernel = `uname -s`;
chomp $Kernel;
my $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
if ( (lc($Kernel) ne 'darwin') and not grep {$_ eq $ucEncoding}
( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
`locale -m` ) )
{ die "Encoding = '$Encoding' or '$ucEncoding' NOT supported ".
"by the OS\n" }
my $ucLocale = &ucRemoveEolUnderscoreDash($ENV{'LANG'});
if ( not grep {$_ eq $ucLocale}
( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
`locale -a` ) )
{ die "Locale = '$ENV{LANG}' or '$ucLocale' NOT supported ".
"by the OS\n" }
if ( not defined(Encode::find_encoding($Encoding)) )
{ die "Encoding = '$Encoding' or '$ucEncoding' NOT supported ".
"by PERL\n" }
print "Encoding = '$Encoding' is supported by the OS and PERL\n\n";
binmode(STDOUT, ":encoding($Encoding)");
}
#-----------------------------------------------------------------------------
# Check consistency between parameter of 'echo' and output of 'echo'
#-----------------------------------------------------------------------------
undef $_;
if ( defined($Encoding) )
{
$ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
if ( defined($SuperEncodings{$ucEncoding}) )
{ $_ = substr($AccentedChars{$SuperEncodings{$ucEncoding}},
0x20, 0x60) }
elsif ( defined($AccentedChars{$Encoding}) )
{ $_ = $AccentedChars{$Encoding} }
elsif ( $Encoding =~ m/^utf-?8$/i )
{ $_ = $AccentedChars }
}
if ( not defined($_) ) # Chosen chars are same in 4 code pages
{ $_ = decode('cp'.$Code_Page_Central,
pack('C*', 0xC9, 0xD3, 0xD7, 0xDC, # éó×ü
0xE9, 0xF3, 0xF7, 0xFC)) } # éó÷ü
#print $_, " (Parameter)\n\n";
#system 'echo', $_;
utf8::decode($_);
#print "\n", $_, " (Parameter after utf8::decode)\n\n";
my @EchoCommand = ( $b_Windows ?
"echo $_" :
('echo', $_) );
#system @EchoCommand;
open(ECHO, $InputPipe, @EchoCommand) or die 'echo $_: ', $!;
my $Output = join('', <ECHO>);
close(ECHO);
chomp $Output;
#print "\n", $Output, " (Output of 'echo')\n";
utf8::decode($Output);
#print "\n", $Output, " (Output of 'echo' after utf8::decode)\n\n";
if ( $Output ne $_ )
{
warn "##代码##: Consistency check between parameter ",
"of 'echo' and output of 'echo' FAILED :\n\n",
"Parameter of 'echo' : ", length($_), ' : ', $_, "\n\n",
" Output of 'echo' : ", length($Output), ' : ', $Output, "\n";
exit 1;
}
#-----------------------------------------------------------------------------
# Print the translation table
#-----------------------------------------------------------------------------
if ( defined($Encoding) )
{
undef $_;
$ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
if ( defined($SuperEncodings{$ucEncoding}) )
{
$_ = $SuperEncodings{$ucEncoding};
print "--------- $EncodingNames{$_} ---------\n",
' ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
else
{
for ( 'cp'.$Code_Page_Central, 'cp'.$Code_Page_Western,
'cp'.$Code_Page_Turkish, 'cp'.$Code_Page_Baltic )
{
if ( ('cp'.$Encoding eq $_) or ($Encoding =~ m/^utf-?8$/i) )
{ print "--------- $EncodingNames{$_} ---------\n",
' ', substr($AccentedChars{$_}, 0, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
}
}
}
#-----------------------------------------------------------------------------
# Completely optional :
# Inside the Unison file, find the accented file names to ignore
#-----------------------------------------------------------------------------
my $UnisonFile = $HOME.$sep.'.unison'.$sep.'common.unison';
my @Ignores;
if ( open(UnisonFile, '<', $UnisonFile) )
{
print "\nUnison File '", $UnisonFile, "'\n";
while ( <UnisonFile> )
{
if ( m/^\s*ignore\s*=\s*Name\s*(.+)/ )
{
$_ = ;
if ( m/[$AccentedChars]/ )
{ push(@Ignores, $_) }
}
}
close(UnisonFile);
}
print map(" Ignore: ".$_."\n", @Ignores);
#-----------------------------------------------------------------------------
# Function OutputAndErrorFromCommand :
#
# Execute the command given as array in parameter, and return STDOUT + STDERR
#
# Reads global variable $InputPipe
#-----------------------------------------------------------------------------
sub OutputAndErrorFromCommand
{
local $_;
my @Command = @_; # Protects content of @_ from any modification
#---------------------------------------------------------------------------
# Under Windows, fork fails, so :
# - Enclose into double quotes parameters containing blanks or simple
# quotes,
# - Use piped open with redirection of STDERR.
#---------------------------------------------------------------------------
if ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') )
{
for ( @Command )
{ s/^((-|.*(\s|')).*)$/$Q$Q/ }
my $Command = join(' ', @Command);
#print "\n", $Command;
open(COMMAND, $InputPipe, "$Command 2>&1") or die '$Command: ', $!;
}
#---------------------------------------------------------------------------
# Under Unix, quoting is too difficult, but fork succeeds
#---------------------------------------------------------------------------
else
{
my $pid = open(COMMAND, $InputPipe);
defined($pid) or die "Can't fork: $!";
if ( $pid == 0 ) # Child process
{
open STDERR, '>&=STDOUT';
exec @Command; # Returns only on failure
die "Can't @Command";
}
}
$_ = join('', <COMMAND>); # Child's STDOUT + STDERR
close COMMAND;
chomp;
utf8::decode($_);
$_;
}
#-----------------------------------------------------------------------------
# Find recursively all files inside the current folder.
# Verify accessibility of files with accented names.
# Calculate non-accented file names from accented file names.
# Build the list of duplicates.
#-----------------------------------------------------------------------------
my %Olds; # $Olds{$New} = [ $Old1, $Old2, ... ]
my $Old;
my $Dir;
my $Command;
my $ErrorMessage;
my $New;
my %News;
print "\n\nFiles with accented name and the corresponding non-accented name ",
":\n";
open(FIND, $InputPipe, $Find) or die $Find, ': ', $!;
FILE:
while ( <FIND> )
{
chomp;
#---------------------------------------------------------------------------
# If the file path contains UTF-8, following instruction is MANDATORY.
# If the file path does NOT contain UTF-8, it should NOT hurt.
#---------------------------------------------------------------------------
utf8::decode($_);
if ( $b_Windows )
{ s/^$cwd$sep2// }
else
{ s/^\.$sep2// }
#---------------------------------------------------------------------------
# From now on : $_ = Dir/OldFilename
#---------------------------------------------------------------------------
push(@{$Olds{$_}}, $_);
if ( m/([^$sep2]+)$/ and
( =~ m/[$AccentedChars]|([\ -\~][$DiacriticalChars])/) )
{
if ( $b_Windows and m/$Q/ )
{
print "\n $Q$_$Q\n*** contains quotes.\n";
next;
}
for my $Ignore ( @Ignores )
{
if ( m/$Ignore$/ )
{ next FILE }
}
$Old = $_ ;
m/^(.*$sep2)?([^$sep2]+)$/;
$Dir = ( defined() ? : '');
$_ = ;
#---------------------------------------------------------------------
# From now on : $Old = Dir/OldFilename
# $_ = OldFilename
#---------------------------------------------------------------------
print "\n $Q$Old$Q\n";
$ErrorMessage = &OutputAndErrorFromCommand(@List, $Old);
if ( $? != 0 )
{ print "*** $ErrorMessage\n" }
else
{
#---------------------------------------------------------------
# Change accented Latin chars to non-accented chars.
# Remove all diacritical marks after Latin chars.
#---------------------------------------------------------------
eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
s/([\ -\~])[$DiacriticalChars]+//g;
#---------------------------------------------------------------
# From now on : $Old = Dir/OldFilename
# $_ = NewFilename
#---------------------------------------------------------------
if ( $@ )
{ warn $@ }
else
{
$New = $Dir.$_;
if ( $b_Windows or (not utf8::is_utf8($Dir)) ) # Weird
{ utf8::decode($New) } # but necessary
$News{$Old} = $New;
push(@{$Olds{$New}}, $Old);
}
print "--> $Q$Dir$_$Q\n";
}
}
}
close(FIND);
#-----------------------------------------------------------------------------
# Print list of duplicate non-accented file names
#-----------------------------------------------------------------------------
my $b_NoDuplicate = 1;
for my $New ( sort keys %Olds )
{
if ( scalar(@{$Olds{$New}}) > 1 )
{
if ( $b_NoDuplicate )
{
print "\n\nFollowing files would have same non-accented name ",
":\n";
$b_NoDuplicate = 0;
}
print "\n", map(' '.$_."\n", @{$Olds{$New}}), '--> ', $New, "\n";
for ( @{$Olds{$New}} )
{ delete $News{$_} };
}
}
#-----------------------------------------------------------------------------
# If there are NO file to rename, then exit
#-----------------------------------------------------------------------------
my $Number = scalar(keys %News);
print "\n\n";
if ( $Number < 1 )
{
print "There are NO file to rename\n";
exit;
}
#-----------------------------------------------------------------------------
# Ask the user for global approval of renaming
#-----------------------------------------------------------------------------
if ( $b_Interactive )
{
print "In order to really rename the ", $Number,
" files which can safely be renamed, type 'rename' : ";
$_ = <STDIN>;
sleep 1; # Gives time to PERL to handle interrupts
if ( not m/^rename$/i )
{ exit 1 }
}
else
{ print $Number, " files will be renamed\n\n" }
#-----------------------------------------------------------------------------
# Rename accented file names sorted descending by name size
#-----------------------------------------------------------------------------
$Number = 0;
my $Move = join(' ', @Move);
for ( sort {length($b) <=> length($a)} keys %News )
{
$ErrorMessage = &OutputAndErrorFromCommand(@Move, $_, $News{$_});
if ( $? == 0 )
{ $Number++ }
else
{ print "\n$Move $Q$_$Q\n", (' ' x length($Move)),
" $Q$News{$_}$Q\n", ('*' x length($Move)), " $ErrorMessage\n" }
}
print "\n$Number files have been successfully renamed\n";
__END__
回答by Manolo
I'm using this batch to rename folders and seems to work fine so far... In my case codepage is 1252, yours might be different.
我正在使用此批处理来重命名文件夹,到目前为止似乎工作正常......在我的情况下,代码页是 1252,你的可能会有所不同。
##代码##