javascript: How can I get the html contained in the body tag of an iframe without it replacing characters?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/7263808/

Time: 2020-10-25 23:30:28  Source: igfitidea

How can I get the html contained in the body tag of an iframe without it replacing characters?

javascript, jquery, iframe

Asked by Kevin B

I'm currently trying to get the contents of an iframe's body without any mangling of content by the browser.

I could do it by including the content in a textarea, however I want to avoid that.

Using .innerHTML results in special characters such as <, >, and & being converted to &lt;, &gt;, and &amp; respectively.
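The conversion can be reproduced outside the browser. The sketch below mimics how HTML serialization escapes the contents of a text node; escapeTextNode is a hypothetical name used for illustration, not a browser API, and it only covers text content (attribute values are escaped differently).

```javascript
// Hypothetical helper that mimics how .innerHTML serializes a text node.
// It is an illustration, not a browser API.
function escapeTextNode(text) {
  return text
    .replace(/&/g, '&amp;')        // must run first so later entities are not double-escaped
    .replace(/\u00a0/g, '&nbsp;')  // non-breaking spaces are serialized as &nbsp;
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// The text node still holds the original characters; only its serialized form changes.
var original = 'I am > than this & < that';
console.log(escapeTextNode(original));
// I am &gt; than this &amp; &lt; that
```

This is why reading .nodeValue of a text node returns the characters unchanged while .innerHTML of its parent does not.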

To test, build an html file containing:

{ 
 "id": 5, 
 "testtext":"I am > than this & < that", 
 "html":"<div>\"worky\"</div>" 
}

and then another page that includes that file in an iframe:

<!doctype html>
<html>
  <head>
    <script src="http://code.jquery.com/jquery-latest.js"></script>
  </head>
  <body>
    <iframe id="myIframe" name="myIframe" src="test.html"></iframe><br />
    Result:<br />
    <textarea id='result'></textarea>
    <script>
      $("#myIframe").load(function(){
        var iframeBody = window.frames.myIframe.document
            .getElementsByTagName("body")[0], result;
        result = iframeBody.innerHTML;
        $("#result").val(result);
      });
    </script>
  </body>
</html>

I have tried this:

result = $(iframeBody).contents().map(function(){
      return this.nodeValue ? this.nodeValue : this.innerHTML;
}).get().join("");

however it loses the div.

EDIT:

I have somewhat of a solution,

var iframeBody, result;
$("#myIframe").load(function(){
  iframeBody = window.frames.myIframe.document
    .getElementsByTagName("body")[0];
  result = $(iframeBody).contents().map(function(){
    if (this.nodeValue) {
        return this.nodeValue   
    }
    else {
        return $(this).clone().wrap('<p>').parent().html();
    }
  }).get().join("");
  $("#result").val(result);
});

However it will still encode things within the html that aren't html. I'm not sure if I'm ok with that.

EDIT AGAIN

Here's a little more context. I'm modifying a jquery iframe ajax transport to work without requiring a textarea in the iframe to hold the content when the content isn't html. For the most part it works fine without a textarea, however it ends up mangling any special html characters when you retrieve that text using .innerHTML. One way to avoid the mangling is to get the text using .nodeValue, however that doesn't work when you come across an html element. If you return json that contains an html string for whatever reason, it needs to be able to extract that json string exactly as it was returned within the iframe, meaning leaving all characters intact.

For the purpose of testing, this jsfiddle is enough of a test. Imagine that the div used in the fiddle is the body of the iframe and you can test the results in jsfiddle. The problem I'm having really has nothing to do with the iframe or its load event.

http://jsfiddle.net/P623a/2/

In that fiddle, the only issue is the & being converted to &amp; inside of the div within the json.

Solution

I'm going to just require that the page is served with a proper content type (application/json, script, or text/plain) if the response is json/jsonp/script and contains a dom element. If it isn't served that way under those conditions, the error handler is triggered.

When encoded properly, the iframe will end up having a body tag that contains <pre>your content</pre>, which you can get the content of using .innerText while preserving the special characters.
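As a rough sketch of that rule (not the actual transport code): the helper below takes the iframe's document, returns the text of a lone pre element, and throws when other elements are present. The names extractRawText and readText are hypothetical, and the assumption that a non-HTML response is rendered as a single pre inside body may not hold in every browser.

```javascript
// Hypothetical helper, not the actual transport code from the question.
// `doc` is assumed to be the iframe's document (e.g. iframe.contentDocument).
function extractRawText(doc) {
  var body = doc.body;
  var children = body.children;

  // Assumption: a text/plain or application/json response is rendered
  // as <body><pre>...</pre></body>.
  if (children.length === 1 && children[0].tagName === 'PRE') {
    return readText(children[0]);
  }

  // Any other element means the response was parsed as HTML, so the
  // original characters can no longer be recovered reliably.
  if (children.length > 0) {
    throw new Error('Response was parsed as HTML; send it as text/plain or application/json.');
  }

  // Only text nodes: no tags were parsed (character references such as
  // &amp; would still have been decoded by the HTML parser).
  return readText(body);
}

// Prefer textContent and fall back to innerText for older browsers.
function readText(node) {
  return typeof node.textContent === 'string' ? node.textContent : node.innerText;
}
```

In a transport, the thrown error would be caught and routed to the error handler instead of being passed on as data.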

Answered by s4y

The browser is interpreting the data in the iframe as HTML and, as far as I know, there is no way to get at the original text (à la view source).

Here are the options I can come up with:

  • Make the response valid HTML — wrap it in a document and encode the data you want, something like this:

    <!DOCTYPE html>
    <html>
    <head>
    <body>
    { 
     "id": 5, 
     "testtext":"I am &gt; than this &amp; &lt; that", 
     "html":"&lt;div&gt;\"worky\"&lt;/div&gt;" 
    }
    
  • Send your response with a MIME type that doesn't get interpreted as HTML, like application/json or text/plain. The browser will probably build a document around it (putting the data in, say, a pre) and you can get at it the same way.
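The first option could be produced on the server along these lines. This is a minimal sketch that assumes the server side can run JavaScript (for example Node); escapeHtml and wrapJsonForIframe are hypothetical names, not part of any library.

```javascript
// Hypothetical server-side helpers; the names are illustrative only.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // first, so the entities below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap a value in a minimal HTML document with the JSON entity-encoded.
function wrapJsonForIframe(value) {
  var json = JSON.stringify(value);
  return '<!DOCTYPE html><html><head></head><body>' + escapeHtml(json) + '</body></html>';
}

var page = wrapJsonForIframe({
  id: 5,
  testtext: 'I am > than this & < that',
  html: '<div>"worky"</div>'
});
console.log(page);
```

When the browser parses this page it decodes the entities again, so reading the body's text should give back the JSON as it was before encoding.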

In either case, you can get at the innerText (or textContent, depending on browser) of the document or the nodeValue of the text node which contains your data, like this:

var iframeBody = iframe.contentDocument.body,
    json = iframeBody.textContent || iframeBody.innerText;

Answered by Rusty Jeans

The code you have in test1.html has no "body"; you can't call .getElementsByTagName("body") if there's no body. Try:

$("#myIframe").load(function(){
    $("#result").val($(this).contents().text());
});

Answered by ShankarSangoli

You are setting the iframe load event handler after the iframe tag, which already has its source. So it's quite possible that the iframe gets loaded before the load event handler is attached. I am not saying this is the issue, but it will create an issue if the iframe loads quickly. You can provide an inline load event handler in the iframe tag itself.

Try this

<!doctype html>
<html>
  <head>
    <script src="http://code.jquery.com/jquery-latest.js"></script>
    <script type="text/javascript">
    function copyIframeContent(iframe){
        var iframeContent = $(iframe).contents();
        $("#result").html(iframeContent.find('body').html());
    }
    </script>
  </head>
  <body>
    <iframe id="myIframe" onload="copyIframeContent(this);" name="myIframe" src="test.html"></iframe><br />
    Result:<br />
    <textarea id='result'></textarea>
  </body>
</html>

I hope this helps you.
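Another way to avoid the same race, sketched here as an alternative to the inline onload attribute, is to register the listener before the iframe is given a source. The helper name loadIframe is hypothetical, and the commented usage assumes the element ids from the question's markup.

```javascript
// Hypothetical helper: register the load listener before the iframe starts loading.
// `iframe` is assumed to be an <iframe> element that has no src yet.
function loadIframe(iframe, url, onLoad) {
  iframe.addEventListener('load', function () {
    onLoad(iframe);
  });
  // Assigning src last means the listener is already in place when loading begins.
  iframe.src = url;
}

// Possible browser usage:
// var iframe = document.createElement('iframe');
// loadIframe(iframe, 'test.html', function (frame) {
//   document.getElementById('result').value = frame.contentDocument.body.textContent;
// });
// document.body.appendChild(iframe);
```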

Answered by dmidz

I think you first have to try with valid html if you plan to use nodeValue or similar. You can't just assume that the browser will add the body for you; this is not html at all:

{ 
 "id": 5, 
 "testtext":"I am > than this & < that", 
 "html":"<div>\"worky\"</div>" 
}

It is weird to try to parse a dom that is not html! The fact is, if you want any chance to manipulate or traverse it with jQuery, you must at least wrap everything in one top-level container, like:

<div>
// even if you don't want use body or html tag, things must be wrapped here
</div>

I think there is a misconception about what you are trying to accomplish and how. Wouldn't it be easier to just load some json (like you wrote)? You are trying to roll a cube... If you want to parse your pure data through the dom anyway, you can test something like this:

<p>
<p>id<span>5</span></p>
<p>testtext<span>I "am" > than this & < that</span></p>
</p>

Of course you can't just insert html as plain text, because how is the browser supposed to know what to do? Just make a simple test:

var div = $('<div/>').appendTo('body').html('I "am" > than this & < that');
console.log('plainText :', div.text(), ', html :', div.html());
// works as expected...

Answered by Kevin M

Can you HTML-encode your JSON string before you pass it to the iframe? For example, if you change your html string "<div>\"worky\"</div>" to "&lt;div>\"worky\"&lt;/div>", it shows the div html properly. The div elements are being written to the dom when the iframe is loaded, so you need to prevent it from parsing the html elements in your string as markup.
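A related option, which is an assumption on top of this answer rather than something it proposes, is to escape at the JSON level instead of with HTML entities: the characters <, > and & are written as \uXXXX escapes, so the HTML parser never sees them and JSON.parse restores them. The helper name stringifyHtmlSafe is hypothetical.

```javascript
// Hypothetical helper: replace characters the HTML parser treats specially
// with JSON \uXXXX escapes. In valid JSON these characters only occur inside strings.
function stringifyHtmlSafe(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026');
}

var data = { id: 5, testtext: 'I am > than this & < that', html: '<div>"worky"</div>' };
var safe = stringifyHtmlSafe(data);
console.log(safe);

// JSON.parse turns the escapes back into the original characters.
console.log(JSON.parse(safe).html);
// <div>"worky"</div>
```

With this encoding the text read back from the iframe can be handed to JSON.parse directly, with no entity decoding step in between.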
