Original source: http://stackoverflow.com/questions/13205289/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Javascript: How to retrieve text from a webpage
Asked by Qwertyfshag
I want to retrieve the text within a webpage as a string. Is this possible? I am new to Javascript.
For example:
var url = "http://en.wikipedia.org/wiki/Programming";
var result = url.getText(); // <---- stores text as a string
document.write(result);
How do I write the getText method? I want either the entire HTML source code (from which I can extract the text) or just the text. I would like to do this from within a web browser.
I tried this and I am able to get an index number:
var url = "http://www.youtube.com/results?search_query=cat&page=2";
var result;
function go(){
result = url.search(/cat/i);
document.write(result);
}
This gives me an index of 44. That means that reading a page is possible. Can I do the opposite and enter the index to retrieve the text?
Answer by Danny Hong
Ajax does not support cross-domain requests: the browser's same-origin policy blocks them unless the target server opts in. You need a server-side language.
Answer by psema4
If the Ajax/cross-domain situation is not an issue for you, you can extract the text of a web page with:
var el = document.body; // or some other element reference
var text = el.innerText || el.textContent;
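The question also asks whether an index can be turned back into text. A minimal sketch of that idea, assuming the page text has already been obtained as a string: `getElementText` and `textAt` are hypothetical helper names, and a plain object stands in for a DOM element so the sketch runs outside a browser.

```javascript
// Return the text of an element, falling back to textContent
// where innerText is unavailable.
function getElementText(el) {
  return el.innerText || el.textContent || "";
}

// Return `length` characters of `text` starting at `index`.
function textAt(text, index, length) {
  return text.slice(index, index + length);
}

// In a browser you would pass document.body here.
// The plain object below is only a stand-in for an element.
var fakeElement = { textContent: "Cats and other animals" };
var pageText = getElementText(fakeElement);

// search() returns the index of the first match, or -1 if there is none.
var index = pageText.search(/cat/i);

// Going the other way: read 4 characters starting at that index.
var word = textAt(pageText, index, 4);
```

Note that in the question's snippet, search() runs on the URL string itself, not on the page behind it, so the index of 44 is a position within the URL.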
If you need to read text from pages in the same domain as your application, you can use Ajax directly.
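A same-domain request could look roughly like the sketch below. It uses the fetch API in place of the older XMLHttpRequest style of Ajax, which is a substitution on my part. The function name `getText` is borrowed from the question, the optional `fetchImpl` parameter is an assumption added so the function can be exercised without a network, and the path in the usage comment is hypothetical.

```javascript
// Fetch a URL and resolve with the response body as a string.
// `fetchImpl` is optional; it defaults to the global fetch.
async function getText(url, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  var response = await doFetch(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.text();
}

// Usage in a page served from the same domain (hypothetical path):
// getText("/some/page.html").then(function (html) {
//   console.log(html.length);
// });
```

This returns the raw HTML source. For a cross-domain URL the same call is rejected by the browser unless the target server allows it.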
If you need to read text from pages outside of your domain, you'll have to jump through a few extra hoops like setting up a proxy server or dealing with CORS - http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
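With a proxy, the browser requests a URL on your own domain and your server fetches the external page on its behalf. A minimal sketch of the client side only, assuming a hypothetical `/proxy` endpoint on your server that accepts the target address in a `url` query parameter (neither the endpoint nor the parameter name comes from the answer):

```javascript
// Build the same-domain URL that asks the (hypothetical) proxy
// endpoint to fetch `targetUrl` for us.
function buildProxyUrl(proxyBase, targetUrl) {
  // encodeURIComponent keeps the target's own "?", "&" and "/" from
  // being confused with the proxy URL's query string.
  return proxyBase + "?url=" + encodeURIComponent(targetUrl);
}

var proxied = buildProxyUrl("/proxy", "http://en.wikipedia.org/wiki/Programming");

// In a browser the result can then be requested like any same-domain page:
// fetch(proxied).then(function (r) { return r.text(); });
```

The server-side half of the proxy is not shown here; it can be written in any server-side language.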
Answer by Rayshawn
You would be better off using a more powerful server-side language to do that, not JavaScript. Python or PHP would be decent choices.