Original source: http://stackoverflow.com/questions/19866929/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Merge two HTML files into master HTML file
Asked by incutonez
Let's say I have the following HTML files:
html1.html
<html>
<head>
<link href="blah.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div>this here be a div, y'all</div>
</body>
</html>
html2.html
<html>
<head>
<script src="blah.js"></script>
</head>
<body>
<span>this here be a span, y'all</span>
</body>
</html>
I want to take these two files and make a master file that would look like this:
<html>
<head>
<link href="blah.css" rel="stylesheet" type="text/css" />
<script src="blah.js"></script>
</head>
<body>
<div>this here be a div, y'all</div>
<span>this here be a span, y'all</span>
</body>
</html>
Is this possible using a simple Linux command? I've tried looking at join, but it looks like that joins on a common field, and I'm not necessarily going to have common fields... I just need to basically add the difference, but also have the main structure still intact (I guess this could be referred to as a left-join?). Doesn't look like cat will work either... as that merges by appending one file, then the next, etc.
If there isn't a simple Linux command, my next step is to either write a script that compares both files line by line, or create a master HTML file that references these two individual files somehow.
Accepted answer by Robin Green
Your example files are well-formed XHTML. Excellent! This means you can use a simple XSLT script. See "How to merge two XML files with XSLT".
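Well-formedness is what makes this approach possible: any XML tool can parse the files and work on the head and body elements directly. The sketch below is not XSLT. It swaps in Python's standard xml.etree.ElementTree to illustrate the same kind of merge on the two sample files from the question. The function name merge and the rule of skipping head children that serialize identically are assumptions made for illustration, not part of the answer.

```python
import xml.etree.ElementTree as ET

# The two sample files from the question, inlined so the sketch is self-contained.
HTML1 = """<html>
<head>
<link href="blah.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div>this here be a div, y'all</div>
</body>
</html>"""

HTML2 = """<html>
<head>
<script src="blah.js"></script>
</head>
<body>
<span>this here be a span, y'all</span>
</body>
</html>"""


def merge(master, other):
    """Append the head and body children of `other` to `master` (hypothetical helper)."""
    for section in ("head", "body"):
        dst = master.find(section)
        src = other.find(section)
        if dst is None or src is None:
            continue
        # Serialized forms of what the master section already holds.
        seen = {ET.tostring(c, encoding="unicode").strip() for c in dst}
        for child in src:
            key = ET.tostring(child, encoding="unicode").strip()
            # Assumption: only head entries are de-duplicated; body content is always kept.
            if section == "head" and key in seen:
                continue
            seen.add(key)
            dst.append(child)
    return master


master = merge(ET.fromstring(HTML1), ET.fromstring(HTML2))
# method="html" keeps an explicit </script> closing tag in the output.
result = ET.tostring(master, encoding="unicode", method="html")
print(result)
```

This only works because the inputs parse as XML; real-world HTML that is not well-formed would need an HTML-aware parser first.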
Answer by bkxp
You can use the html-merge tool to merge multiple HTML files while preserving their internal hypertext links. It's a Win32 program, but you can run it on Linux using Wine. Download page: https://sourceforge.net/projects/htmlmg/files/
Answer by Lars Bilke
Answer by Robin Dinse
Here is a simple solution that uses Python's lxml library. Note that it only copies the element children of the body tag (selected with child::*), not text nodes. Copying text nodes as well would require changing the selector to child::node() and adding some extra logic for appending them.
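#!/usr/bin/python3
import os
import sys

from lxml.html import parse, tostring

# At least one input file and one output file are required.
if len(sys.argv) < 3:
    print("Usage: merge.py [file1] ... [filen] [outfile]")
    sys.exit(1)

# Ask before overwriting an existing output file.
if os.path.isfile(sys.argv[-1]):
    if input('Overwrite ' + sys.argv[-1] + '? (y/n) ') != 'y':
        sys.exit(0)

def tostr(n):
    # Serialize a node for comparison, ignoring trailing text;
    # fall back to str() for anything tostring() cannot handle.
    try:
        return tostring(n, with_tail=False)
    except Exception:
        return str(n)

# The first file is the master; the remaining inputs are merged into it.
tree = parse(sys.argv[1])
for f in sys.argv[2:-1]:
    print(f)
    tree2 = parse(f)
    for n in tree2.xpath('//head/child::*'):
        # Copy a head element only if the master head has no identical one.
        # (The original compared against tree2 itself, so nothing was ever copied.)
        if all(tostr(n) != tostr(n2)
               for n2 in tree.xpath('//head/child::*')):
            tree.xpath('//head')[0].append(n)
    for n in tree2.xpath('//body/child::*'):
        tree.xpath('//body')[0].append(n)

# Serialize as HTML so that empty elements such as <script> keep their closing tag.
tree.write(sys.argv[-1], method='html')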
Save this to a file merge.py and run chmod +x merge.py.
Usage: merge.py [file1] ... [filen] [outfile]
If it fails, one or more files are malformed and need to be fixed either manually or with htmllint or hxnormalize.
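The answer notes that copying text nodes needs extra logic. One reason is that in the ElementTree data model (shared by lxml and the standard library) text is not a child node: it lives in the .text attribute of the parent and the .tail attribute of the preceding sibling. The following is a minimal sketch of that logic, using the standard xml.etree.ElementTree on well-formed input rather than lxml. The helper name append_with_text and the sample strings are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def append_with_text(dst, src):
    """Append src's children to dst, keeping src's text nodes (hypothetical helper)."""
    if src.text:
        if len(dst):
            # Leading text of src becomes the tail of dst's current last child.
            last = dst[-1]
            last.tail = (last.tail or "") + src.text
        else:
            # dst has no children yet, so the text goes directly into dst.
            dst.text = (dst.text or "") + src.text
    for child in list(src):
        # Text that follows a child is stored in child.tail and travels with it.
        dst.append(child)


body1 = ET.fromstring("<body><div>a div</div></body>")
body2 = ET.fromstring("<body>loose text <span>a span</span> trailing text</body>")
append_with_text(body1, body2)
merged = ET.tostring(body1, encoding="unicode")
print(merged)
# prints: <body><div>a div</div>loose text <span>a span</span> trailing text</body>
```

The same .text and .tail handling would apply inside the lxml script above, though the details there are untested here.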