Linux: Merge two HTML files into master HTML file

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/19866929/

Date: 2020-08-07 01:17:50  Source: igfitidea

Tags: html, linux, join, merge

Asked by incutonez

Let's say I have the following HTML files:

html1.html

<html>
  <head>
    <link href="blah.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <div>this here be a div, y'all</div>
  </body>
</html>

html2.html

<html>
  <head>
    <script src="blah.js"></script>
  </head>
  <body>
    <span>this here be a span, y'all</span>
  </body>
</html>

I want to take these two files and make a master file that would look like this:

<html>
  <head>
    <link href="blah.css" rel="stylesheet" type="text/css" />
    <script src="blah.js"></script>
  </head>
  <body>
    <div>this here be a div, y'all</div>
    <span>this here be a span, y'all</span>
  </body>
</html>

Is this possible using a simple Linux command? I've tried looking at join, but it looks like that joins on a common field, and I'm not necessarily going to have common fields. I just need to add the difference while keeping the main structure intact (I guess this could be referred to as a left join?). It doesn't look like cat will work either, as that merges by appending one file, then the next, and so on.

If there isn't a simple Linux command, my next step is to either write a script that compares both files line by line, or create a master HTML file that references these two individual files somehow.

Accepted answer by Robin Green

Your example files are well-formed XHTML. Excellent! This means you can use a simple XSLT script. See the related question "How to merge two XML files with XSLT".
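Because the example files are well-formed, any XML tool can parse them, not only XSLT. As a rough illustration of that point (this is not the XSLT approach itself and is not from the original answer), the sketch below uses Python's standard xml.etree.ElementTree module instead. The function name merge_xhtml is hypothetical, and the code assumes the files declare no XML namespace, as in the question's examples.

```python
import xml.etree.ElementTree as ET


def merge_xhtml(master_src: str, other_src: str) -> str:
    """Merge the head and body children of other_src into master_src.

    Hypothetical helper: both inputs must be well-formed XML without
    a namespace declaration on the <html> element.
    """
    master = ET.fromstring(master_src)
    other = ET.fromstring(other_src)

    # Head: append only elements the master head does not already contain.
    head, other_head = master.find("head"), other.find("head")
    if head is not None and other_head is not None:
        seen = {ET.tostring(child).strip() for child in head}
        for child in list(other_head):
            key = ET.tostring(child).strip()
            if key not in seen:
                head.append(child)
                seen.add(key)

    # Body: append every element, duplicates included.
    body, other_body = master.find("body"), other.find("body")
    if body is not None and other_body is not None:
        for child in list(other_body):
            body.append(child)

    return ET.tostring(master, encoding="unicode")
```

Calling merge_xhtml with the contents of html1.html and html2.html should yield a single document whose head holds both the link and the script element. Note that the default XML serialization writes empty elements in self-closing form, which may need adjusting if the result is served as plain HTML.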

Answer by bkxp

You can use the html-merge tool to merge multiple HTML files while preserving their internal hypertext links. It is a Win32 program, but you can run it on Linux using Wine. Download page: https://sourceforge.net/projects/htmlmg/files/

Answer by Lars Bilke

Use pandoc to merge, for example, all HTML files in the current directory:

pandoc -s *.html -o output.html
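One thing to watch with the glob above: on a second run, *.html also matches output.html itself. A minimal sketch of one way around that, assuming you drive pandoc from Python, is to build the argument list explicitly and skip the output file. The helper name build_pandoc_command is hypothetical; only the -s and -o options from the command above are used, and the inputs are passed in plain sorted order, which may differ slightly from your shell's locale-aware glob order.

```python
import glob
import os


def build_pandoc_command(directory=".", output="output.html"):
    """Return the pandoc argument list for merging every .html file in
    `directory`, excluding the output file itself (hypothetical helper)."""
    inputs = sorted(
        path
        for path in glob.glob(os.path.join(directory, "*.html"))
        if os.path.basename(path) != output
    )
    return ["pandoc", "-s", *inputs, "-o", os.path.join(directory, output)]
```

To actually run it, pass the list to subprocess.run(build_pandoc_command(), check=True); this requires pandoc to be installed and on the PATH.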

Answer by Robin Dinse

Here is a simple solution that uses Python's lxml library. It only copies the element children of the body tag (selected with child::*), not text nodes; copying those as well would require changing the selector to child::node() and adding some extra logic for appending text nodes.

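#!/usr/bin/python3
"""Merge the head and body children of several HTML files into the first."""
import os
import sys

from lxml.html import parse, tostring

# At least one input file and one output file are required.
if len(sys.argv) < 3:
    print("Usage: merge.py [file1] ... [filen] [outfile]")
    sys.exit(1)

# Ask before overwriting an existing output file.
if os.path.isfile(sys.argv[-1]):
    if input('Overwrite ' + sys.argv[-1] + '? (y/n) ') != 'y':
        sys.exit(0)


def tostr(n):
    """Serialize a node without its trailing text so duplicates compare equal."""
    try:
        return tostring(n, with_tail=False)
    except TypeError:
        return str(n)


tree = parse(sys.argv[1])
for f in sys.argv[2:-1]:
    print(f)
    tree2 = parse(f)
    for n in tree2.xpath('//head/child::*'):
        # Compare against the master tree (not tree2, where n always matches
        # itself) and append only elements the master head does not have yet.
        if all(tostr(n) != tostr(n2)
               for n2 in tree.xpath('//head/child::*')):
            tree.xpath('//head')[0].append(n)
    for n in tree2.xpath('//body/child::*'):
        tree.xpath('//body')[0].append(n)

# Serialize as HTML so empty elements such as <script> keep a closing tag.
tree.write(sys.argv[-1], method='html')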

Save this to a file named merge.py and run chmod +x merge.py to make it executable.

Usage: merge.py [file1] ... [filen] [outfile]

If it fails, one or more files are malformed and need to be fixed either manually or with htmllint or hxnormalize.
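On the text-node caveat mentioned at the start of this answer: in the ElementTree data model, which lxml shares, text is not a separate node. It is stored in the .text attribute of the parent (text before the first child) or the .tail attribute of the preceding element. The following is a minimal sketch of the "extra logic" that implies, written against the standard library's xml.etree.ElementTree as a stand-in for lxml; the function name append_with_text is hypothetical and is not part of the original script.

```python
import xml.etree.ElementTree as ET


def append_with_text(target, source):
    """Append source's leading text and all of its children to target.

    Hypothetical helper: each child's trailing text travels with it in
    .tail, so only source.text needs special handling.
    """
    if source.text:
        if len(target):
            # Target already has children: the text belongs after the last one.
            last = target[-1]
            last.tail = (last.tail or "") + source.text
        else:
            # Target has no children: the text becomes its own leading text.
            target.text = (target.text or "") + source.text
    for child in list(source):
        target.append(child)
```

With this, merging a body that contains bare text such as "hello " before its first element keeps that text in the output instead of silently dropping it.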