Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/2941264/

Time: 2020-09-06 13:08:04  Source: igfitidea

How to convert Xml files to Text Files

Tags: xml, text

Asked by Jason

I have around 8000 XML files that need to be converted into text files. Each text file must contain the title, description and keywords from the XML file, without the tags and with all other elements and attributes removed. In other words, I need to create 8000 text files containing the title, description and keywords of each XML file. I need code to do this systematically. Any help would be greatly appreciated. Thanks in advance.

Answer by marc_s

Going from XML to text smells like a job for XSLT: it's an XML-based transformation language that can take XML input and convert it into any text-based output.

You can read up on XSLT on lots of websites; one of the better tutorials is the W3Schools one.

Since you didn't post any sample XML, I have no clue what your XML looks like, and also no idea what your output should be. But assuming it would look something like:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <title>Some Title</title>
  <description>Some description</description>
  <keywords>
    <keyword>Keyword1</keyword>
    <keyword>Keyword2</keyword>
    <keyword>Keyword3</keyword>
    <keyword>Keyword4</keyword>
  </keywords>
</root>

you could easily write an XSLT transformation to turn that into

YourTextFile.txt

Some Title
Some description
Keyword1,Keyword2,Keyword3,Keyword4

or whatever other format you are looking for.
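The answer recommends XSLT, which needs an XSLT processor to run. As a stdlib-only sketch of the same mapping (this is plain Python, not XSLT), the code below uses `xml.etree.ElementTree` to turn the sample document above into the three-line layout: title, description, then the keywords joined by commas. The function name `xml_to_text` is hypothetical, and the element names are the ones assumed in the sample.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer (XML declaration omitted for brevity)
SAMPLE = """<root>
  <title>Some Title</title>
  <description>Some description</description>
  <keywords>
    <keyword>Keyword1</keyword>
    <keyword>Keyword2</keyword>
    <keyword>Keyword3</keyword>
    <keyword>Keyword4</keyword>
  </keywords>
</root>"""


def xml_to_text(xml_string):
    """Return title, description and comma-joined keywords, one per line."""
    root = ET.fromstring(xml_string)
    title = root.findtext("title", "").strip()
    description = root.findtext("description", "").strip()
    # Collect the text of every <keyword> under <keywords>
    keywords = [k.text.strip() for k in root.iterfind("keywords/keyword") if k.text]
    return "\n".join([title, description, ",".join(keywords)])


if __name__ == "__main__":
    print(xml_to_text(SAMPLE))
```

Changing the separator in the final `join` calls is enough to produce a different layout.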

Answer by Gabriel

My suggestion would be to use Python. You can use the interactive interpreter to try out the pattern while you are setting it up; the command line goes a long way toward setting this sort of thing up properly. Assuming the XML is valid, this should give you the most flexibility with the least hassle.

So, assuming the following XML format:

<root>
  <title>Document Title</title>
  <content>Some document content.</content>
  <keywords>test, document, keyword</keywords>
</root>

and assuming the output of each document should be:

Document Title

Some document content.

test, document, keyword

The python code might look something like:

import sys
import os
from xml.etree.ElementTree import ElementTree


def read_the_xml(f):
    """Parse the XML file f and write its title, content and keywords to f + '.txt'."""
    xcontent = ElementTree()
    xcontent.parse(f)
    # findtext returns "" when the element is missing or has no text,
    # so join() never receives None
    doc = [xcontent.findtext("title", ""),
           xcontent.findtext("content", ""),
           xcontent.findtext("keywords", "")]
    # 'with' guarantees the output file is flushed and closed
    with open(f + ".txt", "w", encoding="utf-8") as out:
        out.write("\n\n".join(doc))
    return True


def main(argv=None):
    if argv is None:
        argv = sys.argv
    # Convert every path given on the command line, skipping ones that don't exist
    for arg in argv[1:]:
        if os.path.exists(arg):
            read_the_xml(arg)


if __name__ == "__main__":
    main()

From there you could write a batch file to update the files regularly (assuming a Windows environment, though Python runs on any platform).
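If that batch job runs regularly over all 8000 files, one possible refinement (an assumption, not part of this answer) is to regenerate a text file only when its XML source is newer than the existing output. The sketch below compares modification times with `os.path.getmtime`; the helper name `needs_update` is hypothetical, and it assumes the script's `f + ".txt"` naming for output files.

```python
import os


def needs_update(xml_path):
    """Return True if xml_path + '.txt' is missing or older than the XML file."""
    txt_path = xml_path + ".txt"
    if not os.path.exists(txt_path):
        return True
    # Regenerate only when the source was modified after the last conversion
    return os.path.getmtime(xml_path) > os.path.getmtime(txt_path)
```

In `main`, the existing `os.path.exists(arg)` check could be extended with `and needs_update(arg)` before the conversion function is called.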

Answer by Robert Harvey

There are a couple of possibilities. If it is simple XML, you can read it like any other text file, filter out the tags (everything between the angle brackets) and add in your own strategically placed punctuation. Or you can open an XML reader and a text writer and output the content any way you want.
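A minimal sketch of the XML reader plus text writer option, assuming Python's standard library: `xml.etree.ElementTree.iterparse` acts as the reader, handing over each element as soon as it has been parsed, and an ordinary file object is the writer. The element names `title`, `description` and `keywords` come from the question; the function name `convert` and the one-line-per-element output layout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

WANTED = ("title", "description", "keywords")


def convert(xml_path, txt_path):
    """Stream xml_path and write the text of the wanted elements to txt_path."""
    with open(txt_path, "w", encoding="utf-8") as writer:
        # iterparse yields each element once its closing tag has been read
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag in WANTED:
                # itertext() gathers text from the element and all its children,
                # so nested <keyword> elements end up on a single line
                text = " ".join(t.strip() for t in elem.itertext() if t.strip())
                writer.write(text + "\n")
```

Elements outside `WANTED` and all attributes are simply never written, which matches the requirement to drop everything except title, description and keywords.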

If you read the file names from the folder into a collection, you can loop through them and process all of the files automatically.
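In Python, that loop might look like the sketch below, assuming all the XML files sit in one directory and each output file should share its input's name with a `.txt` extension. `convert_folder` and the `convert_one` callback are hypothetical names; any per-file conversion function taking an input and an output path could be plugged in.

```python
from pathlib import Path


def convert_folder(folder, convert_one):
    """Call convert_one(xml_path, txt_path) for every *.xml file in folder.

    Returns the number of files processed.
    """
    count = 0
    # sorted() makes the processing order deterministic
    for xml_path in sorted(Path(folder).glob("*.xml")):
        txt_path = xml_path.with_suffix(".txt")  # doc.xml -> doc.txt
        convert_one(xml_path, txt_path)
        count += 1
    return count
```

Passing the converter in as a parameter keeps the folder traversal separate from the XML handling, so either of the approaches above can be reused unchanged.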

Answer by USS

I had a similar issue when I copied text messages from my phone to a file: it was in .xml format and had symbols and characters between each word, and I wanted to edit those out. So I downloaded Notepad++ and opened the .xml file in it. Say you want to delete all instances of <title>. Highlight it and then click the Replace icon (the blue b→a icon in the toolbar at the top). The highlighted text will appear in the "Find what" field; leave the "Replace with" field blank and choose Replace All, and every instance of that text will be removed. Do that for each tag and symbol, replacing it with whatever it should be. I had over 4800 lines and it worked great.
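The same Replace All idea can be scripted so it scales to thousands of files. This is a rough sketch, assuming the tags are simple: a regular expression deletes everything from `<` to the next `>`, just as an empty "Replace with" field would. It does not handle comments or CDATA sections containing `>`, and it leaves entity references such as `&amp;` untouched, so for anything beyond simple files an XML parser is the safer choice. The function name `strip_tags` is hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")  # matches any <...> tag, including <?xml ...?>


def strip_tags(text):
    """Delete every tag, like Replace All with an empty 'Replace with' field."""
    without_tags = TAG.sub("", text)
    # Drop lines that are left empty once their tags are gone
    lines = [line.strip() for line in without_tags.splitlines()]
    return "\n".join(line for line in lines if line)
```

Unlike the manual approach, this removes every tag in one pass instead of one tag name at a time.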