适用于Python的XML编写工具-IGI

时间：2020-03-05 18:51:21 　来源:igfitidea点击:

我目前正在尝试ElementTree，它看起来还不错，它可以转义HTML实体，依此类推。我是否错过了我从未听说过的真正奇妙的东西？

这类似于我的实际操作：

import xml.etree.ElementTree as ET
root = ET.Element('html')
head = ET.SubElement(root,'head')
script = ET.SubElement(head,'script')
script.set('type','text/javascript')
script.text = "var a = 'I love &aacute; letters'"
body = ET.SubElement(root,'body')
h1 = ET.SubElement(body,'h1')
h1.text = "And I like the fact that 3 > 1"
tree = ET.ElementTree(root)
tree.write('foo.xhtml')

# more foo.xhtml
<html><head><script type="text/javascript">var a = 'I love &amp;aacute;
letters'</script></head><body><h1>And I like the fact that 3 &gt; 1</h1>
</body></html>

解决方案

回答

我们实际上不是想要这样的东西吗：

html(head(script(type='text/javascript', content='var a = ...')),
body(h1('And I like the fact that 3 < 1'), p('just some paragraph'))

我想我在某处看到了类似的东西。太好了

编辑：实际上，我今天去写了一个库来做到这一点：magictree

我们可以像这样使用它：

from magictree import html, head, script, body, h1, p
root = html(
         head(
           script('''var a = 'I love &amp;aacute; letters''', 
                  type='text/javascript')),
         body(
           h1('And I like the fact that 3 > 1')))

# root is a plain Element object, like those created with ET.Element...
# so you can write it out using ElementTree :)
tree = ET.ElementTree(root)
tree.write('foo.xhtml')

" magictree"中的魔力在于导入的工作方式：在需要时创建" Element"工厂。看一下源代码，它基于另一个StackOverflow问题的答案。

回答

另一种方法是使用lxml中的E Factory构建器(也可在Elementtree中使用)

>>> from lxml import etree

>>> from lxml.builder import E

>>> def CLASS(*args): # class is a reserved word in Python
...     return {"class":' '.join(args)}

>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n      ",
...         E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )

>>> print(etree.tostring(page, pretty_print=True))
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
      <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reservered characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>

回答

我假设我们实际上是在创建XML DOM树，因为我们要验证此文件中包含的内容是有效的XML，否则我们将只向文件中写入静态字符串。如果验证输出确实是目标，那么我建议

from xml.dom.minidom import parseString

doc = parseString("""<html>
    <head>
        <script type="text/javascript">
            var a = 'I love &amp;aacute; letters'
        </script>
    </head>
    <body>
        <h1>And I like the fact that 3 &gt; 1</h1>
    </body>
    </html>""")

with open("foo.xhtml", "w") as f:
    f.write( doc.toxml() )

这使我们可以只编写要输出的XML，验证其正确性(因为parseString如果无效则将引发异常)，并使代码看起来更好。

大概我们不只是每次都编写相同的静态XML并且想要进行替换。在这种情况下，我会有像

var a = '%(message)s'

然后使用％运算符进行替换，例如

</html>""" % {"message": "I love &amp;aacute; letters"})

回答

尝试http://uche.ogbuji.net/tech/4suite/amara。它非常完整，并且具有一组直接的访问工具。普通Unicode支持等

#
#Output the XML entry
#
def genFileOLD(out,label,term,idval):
    filename=entryTime() + ".html"
    writer=MarkupWriter(out, indent=u"yes")
    writer.startDocument()
    #Test element and attribute writing
    ans=namespace=u'http://www.w3.org/2005/Atom'
    xns=namespace=u'http://www.w3.org/1999/xhtml'
    writer.startElement(u'entry',
       ans,
       extraNss={u'x':u'http://www.w3.org/1999/xhtml' ,
                 u'dc':u'http://purl.org/dc/elements/1.1'})
    #u'a':u'http://www.w3.org/2005/Atom',
    #writer.attribute(u'xml:lang',unicode("en-UK"))

    writer.simpleElement(u'title',ans,content=unicode(label))
    #writer.simpleElement(u'a:subtitle',ans,content=u' ')
    id=unicode("http://www.dpawson.co.uk/nodesets/"+afn.split(".")[0])
    writer.simpleElement(u'id',ans,content=id)
    writer.simpleElement(u'updated',ans,content=unicode(dtime()))
    writer.startElement(u'author',ans)
    writer.simpleElement(u'name',ans,content=u'Dave ')
    writer.simpleElement(u'uri',ans,
      content=u'http://www.dpawson.co.uk/nodesets/'+afn+".xml")
    writer.endElement(u'author')
    writer.startElement(u'category', ans)
    if (prompt):
        label=unicode(raw_input("Enter label "))
    writer.attribute(u'label',unicode(label))
    if (prompt):
        term = unicode(raw_input("Enter term to use "))
    writer.attribute(u'term', unicode(term))
    writer.endElement(u'category')
    writer.simpleElement(u'rights',ans,content=u'\u00A9 Dave 2005-2008')
    writer.startElement(u'link',ans)
    writer.attribute(u'href',
         unicode("http://www.dpawson.co.uk/nodesets/entries/"+afn+".html"))
    writer.attribute(u'rel',unicode("alternate"))
    writer.endElement(u'link')
    writer.startElement(u'published', ans)
    dt=dtime()
    dtu=unicode(dt)
    writer.text(dtu)
    writer.endElement(u'published')
    writer.simpleElement(u'summary',ans,content=unicode(label))
    writer.startElement(u'content',ans)
    writer.attribute(u'type',unicode("xhtml"))
    writer.startElement(u'div',xns)
    writer.simpleElement(u'h3',xns,content=unicode(label))
    writer.endElement(u'div')
    writer.endElement(u'content')
    writer.endElement(u'entry')

回答

我最终使用saxutils.escape(str)生成了有效的XML字符串，然后使用Eli的方法对其进行了验证，以确保我不会错过任何标签

from xml.sax import saxutils
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

xml = '''<?xml version="1.0" encoding="%s"?>\n
<contents title="%s" crawl_date="%s" in_text_date="%s" 
url="%s">\n<main_post>%s</main_post>\n</contents>''' %
(self.encoding, saxutils.escape(title), saxutils.escape(time), 
saxutils.escape(date), saxutils.escape(url), saxutils.escape(contents))
try:
    minidoc = parseString(xml)
catch ExpatError:
    print "Invalid xml"

回答

总有SimpleXMLWriter，它是ElementTree工具包的一部分。该接口非常简单。

这是一个例子：

from elementtree.SimpleXMLWriter import XMLWriter
import sys

w = XMLWriter(sys.stdout)
html = w.start("html")

w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()

w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")

w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")

w.close(html)

适用于Python的XML编写工具

解决方案

回答

回答

回答

回答

回答

回答

相关推荐

最近更新

标签

适用于Python的XML编写工具

解决方案

回答

回答

回答

回答

回答

回答

相关推荐

测试对象是否不是类型

const和readonly有什么区别？

.NET序列化类设计问题

Python中的单引号与双引号

相关推荐

最近更新

标签