Linux: Merge two HTML files into a master HTML file

Question

Asked by incutonez

Let's say I have the following HTML files:

html1.html

<html>
  <head>
    <link href="blah.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <div>this here be a div, y'all</div>
  </body>
</html>

html2.html

<html>
  <head>
    <script src="blah.js"></script>
  </head>
  <body>
    <span>this here be a span, y'all</span>
  </body>
</html>

I want to take these two files and make a master file that would look like this:

<html>
  <head>
    <link href="blah.css" rel="stylesheet" type="text/css" />
    <script src="blah.js"></script>
  </head>
  <body>
    <div>this here be a div, y'all</div>
    <span>this here be a span, y'all</span>
  </body>
</html>

Is this possible using a simple Linux command? I've tried looking at join, but it looks like that joins on a common field, and I won't necessarily have common fields. I basically just need to add the difference while keeping the main structure intact (I guess this could be called a left join?). It doesn't look like cat will work either, as it merges by appending one file after the next.

If there isn't a simple Linux command, my next step is to either write a script that compares both files line by line, or create a master HTML file that references these two individual files somehow.

Answer 1

Accepted answer by Robin Green

Your example files are well-formed XHTML. Excellent! This means you can use a simple XSLT script. See How to merge two XML files with XSLT
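
Because the example files are well-formed XHTML, any XML parser can read them, not only an XSLT processor. The following is a minimal sketch of the same merge using Python's standard xml.etree.ElementTree module instead of XSLT. It is not the linked XSLT solution. The function name merge_xhtml, the helper _key, and the constants HTML1 and HTML2 are made up for illustration, and the sketch assumes that both files have a head and a body and declare no XML namespace. ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# The two example files from the question, inlined for illustration.
HTML1 = """<html>
  <head>
    <link href="blah.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <div>this here be a div, y'all</div>
  </body>
</html>"""

HTML2 = """<html>
  <head>
    <script src="blah.js"></script>
  </head>
  <body>
    <span>this here be a span, y'all</span>
  </body>
</html>"""


def _key(element):
    # Serialized form of an element, with the surrounding whitespace
    # stripped, so that two elements can be compared as strings.
    return ET.tostring(element, encoding="unicode").strip()


def merge_xhtml(first, second):
    """Append the head and body children of `second` to those of `first`."""
    master = ET.fromstring(first)
    other = ET.fromstring(second)
    head, body = master.find("head"), master.find("body")

    # Head elements already present in the master are not added again.
    seen = {_key(child) for child in head}
    for child in other.find("head"):
        if _key(child) not in seen:
            seen.add(_key(child))
            head.append(child)

    # Body elements are always appended, in document order.
    for child in other.find("body"):
        body.append(child)

    ET.indent(master, space="  ")  # Python 3.9+; only tidies whitespace
    return ET.tostring(master, encoding="unicode", method="html")
```

Calling print(merge_xhtml(HTML1, HTML2)) prints the merged document. Like the lxml script in a later answer, this sketch copies element children only, and it fails with a parse error on HTML that is not well-formed XML.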

Answer 2

Answered by bkxp

You can use the html-merge tool to merge multiple HTML files while preserving their internal hypertext links. It's a Win32 program, but you can run it on Linux using Wine. Download page: https://sourceforge.net/projects/htmlmg/files/

Answer 3

Answered by Lars Bilke

Use pandoc to merge, for example, all HTML files in the current directory:

pandoc -s *.html -o output.html

Answer 4

Answered by Robin Dinse

Here is a simple solution that uses Python's lxml library. It only copies the element children of the body tag, selected with child::*, not text nodes. Copying those as well would require changing the selector to child::node() and adding some extra logic for appending text nodes.

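#!/usr/bin/python3
import os
import sys
from lxml.html import tostring, parse

# At least one input file and the output file are required.
if len(sys.argv) < 3:
    print("Usage: merge.py [file1] ... [filen] [outfile]")
    sys.exit(1)

# Ask before overwriting an existing output file.
if os.path.isfile(sys.argv[-1]):
    if input('Overwrite ' + sys.argv[-1] + '? (y/n) ') != 'y':
        sys.exit(0)

def tostr(n):
    # Serialize a node (without trailing text) so nodes can be compared.
    try:
        return tostring(n, with_tail=False)
    except Exception:
        return str(n)

tree = parse(sys.argv[1])
head = tree.xpath('//head')[0]
body = tree.xpath('//body')[0]
for f in sys.argv[2:-1]:
    print(f)
    tree2 = parse(f)
    for n in tree2.xpath('//head/child::*'):
        # Copy a head element only if the merged head does not contain it yet.
        if all(tostr(n) != tostr(n2) for n2 in head):
            head.append(n)
    for n in tree2.xpath('//body/child::*'):
        body.append(n)

# Serialize as HTML so that empty tags such as <script> keep their end tag.
tree.write(sys.argv[-1], method='html')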

Save this to a file named merge.py and run chmod +x merge.py.

Usage: merge.py [file1] ... [filen] [outfile]

If it fails, one or more files are malformed and need to be fixed, either manually or with htmllint or hxnormalize.
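
The extra logic for text nodes mentioned at the start of this answer could look roughly like the sketch below. It relies on the ElementTree data model, in which text before the first child is stored in the parent's text attribute and text after a child is stored in that child's tail attribute. lxml elements expose the same two attributes. The function name append_with_text is hypothetical, and the sketch uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml installed.

```python
import xml.etree.ElementTree as ET


def append_with_text(target, source):
    """Append all child nodes of `source`, text included, to `target`."""
    # Text that sits directly inside `source`, before its first child element.
    leading = source.text or ""
    if leading:
        if len(target):
            # The target already has children: the text follows the last one.
            last = target[-1]
            last.tail = (last.tail or "") + leading
        else:
            # The target is empty: the text becomes its own leading text.
            target.text = (target.text or "") + leading
    # Text that follows a child element is stored in that child's tail,
    # so it travels along when the element is appended.
    for child in list(source):
        target.append(child)


# Small demonstration with two hypothetical body fragments.
body1 = ET.fromstring("<body><div>a</div></body>")
body2 = ET.fromstring("<body>loose text<span>b</span> trailing</body>")
append_with_text(body1, body2)
merged = ET.tostring(body1, encoding="unicode")
```

In the script above, a call such as append_with_text(body, tree2.xpath('//body')[0]) would take the place of the loop over //body/child::*. That substitution is an untested assumption here, since the demonstration only exercises the standard-library elements.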
