How to check if a .txt file is in ASCII or UTF-8 format in a Windows environment?

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/6947749/

Date: 2020-09-03 15:47:44  Source: igfitidea

.net windows

Asked by rk1962

I have converted a .txt file from ASCII to UTF-8 using UltraEdit. However, I am not sure how to verify that it is in UTF-8 format in a Windows environment.

Thank you!

Accepted answer by Mark Ransom

Text files in Windows don't have a format. There's an unofficial convention that if the file starts with the BOM code point encoded in UTF-8, then it's UTF-8, but that convention isn't universally supported. That would be the 3-byte sequence "\xef\xbb\xbf", i.e. ï»¿ in the Latin-1 character set.
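Because the BOM convention is not universally followed, a file can be valid UTF-8 without starting with those three bytes. One way to check the content itself is to try decoding it strictly. The following is a minimal Python sketch under that assumption; the names `classify_encoding` and `classify_file` are hypothetical, and "other" simply means the bytes are neither pure ASCII nor valid UTF-8.

```python
import codecs


def classify_encoding(data: bytes) -> str:
    """Classify raw bytes as 'utf-8-bom', 'ascii', 'utf-8', or 'other'."""
    # A leading EF BB BF is the UTF-8 encoding of the BOM code point.
    # Only the prefix is checked here, not the validity of the rest.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        data.decode("ascii")  # every byte must be below 0x80
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")  # strict by default: invalid sequences raise
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def classify_file(path: str) -> str:
    """Read a file in binary mode and classify its contents."""
    with open(path, "rb") as f:
        return classify_encoding(f.read())
```

Note that a pure-ASCII file is also valid UTF-8, so "ascii" here means only that no byte above 0x7F was found. A legacy single-byte encoding such as Latin-1 will usually land in "other", although some byte sequences happen to be valid in both.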

Answer by Ofer Zelig

Open the file in Notepad. Click 'Save As...'. In the 'Encoding:' combo box you will see the current file format.

Answer by Miguel Hermoso

Open the file in Notepad++ and look at the "Encoding" menu. It shows the current encoding and lets you convert the file to any of the available encodings.

Answer by SLaks

Open it in a hex editor and make sure that the first three bytes are a UTF-8 BOM (EF BB BF).
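If no hex editor is at hand, the same check can be done by reading the first bytes directly. This is a small Python sketch, not part of the original answer; the function name `first_bytes_hex` is hypothetical.

```python
def first_bytes_hex(path: str, count: int = 3) -> str:
    """Return the first `count` bytes of a file as space-separated hex pairs."""
    with open(path, "rb") as f:  # binary mode: no decoding is applied
        head = f.read(count)
    return " ".join(f"{b:02X}" for b in head)
```

For a file saved as UTF-8 with a BOM, the returned string is `EF BB BF`. Any other value means the file has no UTF-8 BOM, which does not by itself rule out UTF-8 content.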

Answer by Luminator

If you use Windows 10 and have Windows Subsystem for Linux (WSL) installed, this can be done easily by typing "file <filename>" in the shell.

For example:

    $ file code.cpp
    code.cpp: C source, UTF-8 Unicode (with BOM) text, with CRLF line terminators

Answer by Eric Moon

I had a directory of files that I wanted to check. I created an Excel macro to determine ANSI vs. UTF-8. This worked for me.

    Sub GetTextFileEncoding()
        Dim sFile As String
        Dim sPath As String
        Dim iRow As Long

        'Set defaults and initial values
        iRow = 1
        sPath = "C:textfiles\"    'Folder to scan; adjust to your own path
        sFile = Dir(sPath & "*.txt")

        Do While Len(sFile) > 0
            'Show the file name and its detected encoding on the active worksheet
            Cells(iRow, 1).Value = sFile
            Cells(iRow, 2).Value = FileEncodeType(sPath & sFile)

            'Get next file
            sFile = Dir

            'Increment row
            iRow = iRow + 1
        Loop
    End Sub

    'Returns "UTF-8" if the file starts with the UTF-8 BOM (EF BB BF), otherwise "ANSI".
    'Note: a UTF-8 file saved without a BOM is also reported as "ANSI".
    Function FileEncodeType(sFile As String) As String
        Dim iFile As Integer
        Dim bBOM(0 To 2) As Byte

        FileEncodeType = "ANSI"

        'Read the first three bytes in binary mode, so the result does not
        'depend on the system code page or on the length of the first line
        iFile = FreeFile
        Open sFile For Binary Access Read As #iFile
        If LOF(iFile) >= 3 Then
            Get #iFile, , bBOM
            If bBOM(0) = &HEF And bBOM(1) = &HBB And bBOM(2) = &HBF Then
                FileEncodeType = "UTF-8"
            End If
        End If
        Close #iFile
    End Function
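For readers without Excel, the same folder scan can be sketched in Python. This is an illustrative equivalent rather than the author's method: the name `scan_folder` is hypothetical, and like the macro it only looks for the BOM, so UTF-8 files saved without one are labelled "ANSI".

```python
import codecs
from pathlib import Path


def scan_folder(folder: str, pattern: str = "*.txt") -> dict:
    """Map each matching file name to "UTF-8" (BOM found) or "ANSI" (no BOM)."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, "rb") as f:
            # Compare the first three bytes with the UTF-8 BOM (EF BB BF)
            has_bom = f.read(3) == codecs.BOM_UTF8
        results[path.name] = "UTF-8" if has_bom else "ANSI"
    return results
```

Calling `scan_folder` with the path of a folder returns a dictionary that can be printed or written to a CSV file in place of the worksheet columns.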