Original question: http://stackoverflow.com/questions/508033/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on Stack Overflow, not to this page.
Convert tabs to spaces in a .NET string
Asked by Arsalan Ahmed
I am building a text parser using regular expressions. I need to convert all tab characters in a string to space characters. I cannot assume how many spaces a tab should occupy; otherwise I could simply replace each tab with, say, 4 space characters. Is there a good solution for this type of problem? I need to do this in code, so I cannot use an external tool.
Unfortunately, none of these answers addresses the problem I have encountered. I am extracting text from external text files, and I cannot assume how they were created or which operating system was used to create them. I believe the length of the tab character can vary, so when I encounter a tab while reading a text file, I want to know how many space characters I should replace it with.
Answered by GateKiller
Unfortunately, you need to assume how many spaces a tab represents. You should set this to a fixed value (like the four mentioned) or make it a user option.
The quickest way to do this in .NET is (I'm using C#):
var NewString = "This is a string with a\tTab";
var TabLength = 4;
var TabSpace = new String(' ', TabLength);
NewString = NewString.Replace("\t", TabSpace);
You can then change the TabLength variable to anything you want; as mentioned previously, this is typically four space characters.
A tab is the same length in every operating system: one tab character! What differs is the way software displays it. Typically it is shown with the width of four space characters, and even that assumes the display uses a fixed-width font such as Courier New.
For example, my IDE of choice allows me to change the width of the tab character to a value that suits me.
Answered by ckal
I'm not sure how tabs will be read in from a Unix text file, or whatever your various formats are, but this works for inline text. Perhaps it will help.
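var textWithTabs = "some\tvalues\tseparated\twith\ttabs";
var textWithSpaces = string.Empty;
var textValues = textWithTabs.Split('\t');
for (var i = 0; i < textValues.Length; i++)
{
    var val = textValues[i];
    textWithSpaces += val;
    // Pad to the next multiple of 8 columns, but only where a tab actually followed.
    if (i < textValues.Length - 1)
        textWithSpaces += new string(' ', 8 - val.Length % 8);
}
Console.WriteLine(textWithTabs);
Console.WriteLine(textWithSpaces);
Console.Read();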
Answered by Nick McCowin
I think what you mean is that you'd like to replace each tab with the number of spaces it would effectively be expanded to. The first way that comes to mind doesn't involve regular expressions (and I don't know whether this problem can be solved with them).
- Step through the string character by character, keeping track of your current position in the string.
- When you find a tab, replace it with N spaces, where N = tab_length - (current_position % tab_length).
- Add N to your current position and continue through the string.
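The steps above can be sketched in C# roughly as follows. This is only an illustration under assumptions: the names `TabExpander` and `Expand` are made up for this example, positions are counted from 0, `tabLength` is assumed to be positive, and the position is reset at every line break, which the steps do not mention explicitly.

```csharp
using System;
using System.Text;

public static class TabExpander
{
    // Hypothetical helper: replaces each tab with enough spaces
    // to reach the next multiple of tabLength.
    public static string Expand(string input, int tabLength)
    {
        var output = new StringBuilder();
        int position = 0; // 0-based column within the current line

        foreach (char c in input)
        {
            if (c == '\t')
            {
                // N = tab_length - (current_position % tab_length)
                int n = tabLength - (position % tabLength);
                output.Append(' ', n);
                position += n;
            }
            else if (c == '\r' || c == '\n')
            {
                output.Append(c);
                position = 0; // assumption: columns restart after a line break
            }
            else
            {
                output.Append(c);
                position++;
            }
        }
        return output.ToString();
    }
}
```

For instance, `TabExpander.Expand("a\tbc\td", 4)` pads "a" with three spaces and "bc" with two, so that "bc" starts at column 4 and "d" at column 8.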
Answered by HappyTown
(If you are looking for how to convert tabs to spaces in an editor, see the end of my answer.)
I was recently required to replace tabs with spaces.
The solution replaces each tab with up to 4 or 8 spaces, depending on the tab length.
The logic iterates through the input string one character at a time and keeps track of the current position (column #) in the output string.
- If it encounters \t (tab character): finds the next tab stop, calculates how many spaces are needed to reach it, and replaces \t with that many spaces.
- If it encounters \n (new line): appends it to the output string and resets the position pointer to 1 for the new line. New lines are \r\n on Windows and \n on Unix (and its flavors), so I suppose this should work on both platforms. I have tested on Windows, but don't have Unix handy.
- Any other character: appends it to the output string and increments the position.
using System.Text;

namespace CSharpScratchPad
{
    class TabToSpaceConvertor
    {
        static int GetNearestTabStop(int currentPosition, int tabLength)
        {
            // If already at the tab stop, jump to the next tab stop.
            if ((currentPosition % tabLength) == 1)
                currentPosition += tabLength;
            else
            {
                // If in the middle of two tab stops, move forward to the nearest.
                for (int i = 0; i < tabLength; i++, currentPosition++)
                    if ((currentPosition % tabLength) == 1)
                        break;
            }
            return currentPosition;
        }

        public static string Process(string input, int tabLength)
        {
            if (string.IsNullOrEmpty(input))
                return input;

            StringBuilder output = new StringBuilder();
            int positionInOutput = 1;
            foreach (var c in input)
            {
                switch (c)
                {
                    case '\t':
                        int spacesToAdd = GetNearestTabStop(positionInOutput, tabLength) - positionInOutput;
                        output.Append(new string(' ', spacesToAdd));
                        positionInOutput += spacesToAdd;
                        break;
                    case '\n':
                        output.Append(c);
                        positionInOutput = 1;
                        break;
                    default:
                        output.Append(c);
                        positionInOutput++;
                        break;
                }
            }
            return output.ToString();
        }
    }
}
The calling code would look like this:
string input = "I\tlove\tYosemite\tNational\tPark\t\t,\t\t\tGrand Canyon,\n\t\tand\tZion";
string output = CSharpScratchPad.TabToSpaceConvertor.Process(input, 4);
The output string would get the value:
I   love    Yosemite    National    Park        ,           Grand Canyon,
        and Zion
How do I convert tabs to spaces in an editor?
If you stumbled upon this question because you could not find the option to convert tabs to spaces in your editor (just as I did, to the point of thinking about writing my own utility for it), here is where the option is located in different editors:
Notepad++: Edit → Blank Operations → TAB to Space
Visual Studio: Edit → Advanced → Untabify Selected Lines
SQL Management Studio: Edit → Advanced → Untabify Selected Lines
Answered by HappyTown
This is exactly what the question is asking for. I wrote this back in Visual Basic 6.0. I made a few quick VB.NET 2010 updates, but it could use some more cleanup. Just be sure to set the desired tab width; it's set to 8 in the code. Just pass it the string, or even fix the text right inside a text box like so:
RichTextBox1.Text = strFixTab(RichTextBox1.Text)
Function strFixTab(ByVal TheStr As String) As String
    Dim c As Integer
    Dim i As Integer
    Dim T As Integer
    Dim RetStr As String
    Dim ch As String
    Dim TabWidth As Integer = 8 ' Set the desired tab width
    c = 1
    For i = 1 To TheStr.Length
        ch = Mid(TheStr, i, 1)
        If ch = vbTab Then
            T = (TabWidth + 1) - (c Mod TabWidth)
            If T = TabWidth + 1 Then T = 1
            RetStr &= Space(T)
            c += T - 1
        Else
            RetStr &= ch
        End If
        If ch = vbCr Or ch = vbLf Then
            c = 1
        Else
            c += 1
        End If
    Next
    Return RetStr
End Function
Answered by DrWicked
I am not sure if my solution is more efficient in execution, but it is more compact in code. This is close to the solution by user ckal, but reassembles the split strings using the Join function rather than '+='.
public static string ExpandTabs(string input, int tabLength)
{
    string[] parts = input.Split('\t');
    // Pad every part except the last one out to the next tab stop.
    for (int i = 0; i < parts.Length - 1; i++)
        parts[i] += new string(' ', tabLength - (parts[i].Length % tabLength));
    return string.Join("", parts);
}
Answered by Digiproc
Quite a few answers on here neglect that a tab means the number of spaces to the next tab stop, not "four (or eight) spaces". Quite a few answers also neglect carriage returns and line feeds, and therefore don't handle multiline content. So without further ado:
// Requires: using System.Collections.Generic;
public static string TabsToSpaces(string inTxt, int tabLen = 4)
{
    var outTxt = new List<string>();
    var textValues = inTxt.Split('\t');
    for (var i = 0; i < textValues.Length; i++)
    {
        var val = textValues[i];
        if (i == textValues.Length - 1)
        {
            // No tab followed the last segment, so there is nothing to pad.
            outTxt.Add(val);
            break;
        }
        // Only the text after the last line break counts toward the column.
        var lineStart = val.LastIndexOfAny(new[] { '\r', '\n' }) + 1;
        var numSpaces = tabLen - (val.Length - lineStart) % tabLen;
        outTxt.Add(val + new string(' ', numSpaces));
    }
    return String.Join("", outTxt);
}
(By the way, this is also CPU efficient in that it doesn't recopy giant strings.)
Answered by Miyagi Coder
You can use the Replace function:
char tabs = '\u0009';
String newLine = withTabs.Replace(tabs.ToString(), " ");
Answered by Ian Jacobs
Do you want to be able to convert a tab to N spaces? One quick and dirty option is:
output = input.Replace("\t", "".PadRight(N, ' '));
Obviously N has to be defined somewhere, be it user input or elsewhere in the program.
Answered by TheSmurf
Regex.Replace(input, "\t", " ");
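A plain Regex.Replace like the one above swaps every tab for the same fixed string. If the goal is to pad to the next tab stop while still using a regular expression, one possible sketch is to pass a match evaluator that works out the column for each tab. The names `RegexTabDemo` and `ExpandLine` are hypothetical, and the sketch assumes a single line of input (no line breaks) and a positive `tabWidth`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTabDemo
{
    // Hypothetical helper: expands the tabs of ONE line to tab stops.
    public static string ExpandLine(string line, int tabWidth)
    {
        int added = 0; // extra characters inserted so far by earlier tabs

        // Matches are evaluated left to right, so 'added' is up to date
        // by the time each tab is processed.
        return Regex.Replace(line, "\t", m =>
        {
            int column = m.Index + added;          // 0-based column in the output
            int spaces = tabWidth - column % tabWidth;
            added += spaces - 1;                   // the tab itself took one character
            return new string(' ', spaces);
        });
    }
}
```

Multi-line text would need to be split into lines first, with each line passed to `ExpandLine` separately, because `m.Index` is counted from the start of the whole input.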

