Original question: http://stackoverflow.com/questions/277055/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
Remove HTML Tags from an NSString on the iPhone
Question by lfalin
There are a couple of different ways to remove HTML tags from an NSString in Cocoa.
One way is to render the string into an NSAttributedString and then grab the rendered text.
Another way is to use NSXMLDocument's -objectByApplyingXSLTString method to apply an XSLT transform that does it.
Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?
One suggestion has been to simply look for opening and closing tag characters, but this method only works in very trivial cases.
For example these cases (from the Perl Cookbook chapter on the same subject) would break this method:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
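To make the failure concrete, here is a small Objective-C sketch that was not part of the question. It applies the naive "<[^>]+>" pattern to three of the cases above. NaiveStripTags is a hypothetical helper name. The results listed in the comments follow from the fact that the pattern always stops at the first '>' after a '<'.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: deletes every run matching "<[^>]+>", i.e. from a '<'
// up to the FIRST following '>'. This is the naive approach described above.
static NSString *NaiveStripTags(NSString *input) {
    NSRegularExpression *tag =
        [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:0 error:NULL];
    return [tag stringByReplacingMatchesInString:input
                                         options:0
                                           range:NSMakeRange(0, input.length)
                                    withTemplate:@""];
}

// What the naive pattern leaves behind for the cases above:
//   NaiveStripTags(@"<IMG SRC = \"foo.gif\" ALT = \"A > B\">")  ->  @" B\">"
//   NaiveStripTags(@"<!-- <A comment> -->")                     ->  @" -->"
//   NaiveStripTags(@"<script>if (a<b && a>c)</script>")         ->  @"if (ac)"
```

In each case the '>' inside an attribute value, a comment, or a script ends the "tag" too early or too late. A real parser is needed to avoid this.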
Answer by m.kocikowski
A quick and "dirty" solution (it removes everything between < and >) that works with iOS >= 3.2:
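#import <Foundation/Foundation.h>
@interface NSString (StripHTML)
- (NSString *)stringByStrippingHTML;
@end
@implementation NSString (StripHTML)
// Repeatedly finds the first "<...>" run and deletes it until none are left.
- (NSString *)stringByStrippingHTML {
    NSRange r;
    NSString *s = [self copy]; // ARC; with manual retain/release use [[self copy] autorelease]
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}
@end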
I have this declared as a category on NSString.
Answer by Leigh McCulloch
This NSString category uses NSXMLParser to accurately remove any HTML tags from an NSString. It is a single .m and .h file pair that can easily be included in your project.
https://gist.github.com/leighmcculloch/1202238
You can then strip HTML as follows:
Import the header:
#import "NSString_stripHtml.h"
And then call stripHtml:
NSString* mystring = @"<b>Hello</b> World!!";
NSString* stripped = [mystring stripHtml];
// stripped is now @"Hello World!!"
This also works with malformed HTML that technically isn't XML.
Answer by MANCHIKANTI KRISHNAKISHORE
UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"This is <font color='red'>simple</font>";
// "contentToHTMLString" is an undocumented key, not part of UIKit's public API.
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
[self.view addSubview:textview]; // assumes this runs inside a UIViewController
Works fine for me.
Answer by Kirtikumar A.
You can use it like this:
-(void)myMethod
{
    NSString* htmlStr = @"<some>html</string>";
    NSString* strWithoutFormatting = [self stringByStrippingHTML:htmlStr];
}

-(NSString *)stringByStrippingHTML:(NSString*)str
{
    NSRange r;
    while ((r = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    {
        str = [str stringByReplacingCharactersInRange:r withString:@""];
    }
    return str;
}
Answer by Mohamed AHDIDOU
Use this:
NSString *myregex = @"<[^>]*>"; // regex to remove any HTML tag
NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [htmlString stringByReplacingOccurrencesOfRegex:myregex withString:@""];
Don't forget to add #import "RegexKitLite.h" to your code. Here is the link to download this API: http://regexkit.sourceforge.net/#Downloads
Answer by Colin Barrett
Take a look at NSXMLParser. It's a SAX-style parser. You should be able to use it to detect tags or other unwanted elements in the XML document and ignore them, capturing only pure text.
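Here is a minimal sketch of that SAX-style idea. It is not code from this answer, and it assumes ARC. The sketch also assumes the input becomes well-formed XML once it is wrapped in a single root element. Typical HTML such as an unclosed <br> or an &nbsp; entity is not well-formed XML. In that case NSXMLParser reports an error, and the sketch returns only the text it collected before the error. HTMLTextCollector and PlainTextFromMarkup are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: appends every run of character data and ignores
// element, comment and CDATA events, so only the text content survives.
@interface HTMLTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation HTMLTextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// Called (possibly several times per text node) with the decoded characters.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}
@end

// Hypothetical helper. Wraps the fragment in one root element so that a
// string like @"<b>Hello</b> World!!" is a well-formed XML document.
static NSString *PlainTextFromMarkup(NSString *markup) {
    NSString *wrapped = [NSString stringWithFormat:@"<root>%@</root>", markup];
    NSData *data = [wrapped dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    HTMLTextCollector *collector = [[HTMLTextCollector alloc] init];
    parser.delegate = collector; // the delegate is not retained; 'collector' keeps it alive
    [parser parse];              // returns NO on malformed input; we keep what was collected
    return [collector.text copy];
}
```

Comments are reported through a separate delegate callback that this sketch does not implement. As a result, a case like <!-- <A comment> --> drops out completely instead of confusing the parser.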
Answer by hpique
Here's a more efficient solution than the accepted answer:
- (NSString*)hp_stringByRemovingTags
{
    static NSRegularExpression *regex = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    // Use reverse enumerator to delete characters without affecting indexes
    NSArray *matches = [regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
    NSEnumerator *enumerator = matches.reverseObjectEnumerator;

    NSTextCheckingResult *match = nil;
    NSMutableString *modifiedString = self.mutableCopy;
    while ((match = [enumerator nextObject]))
    {
        [modifiedString deleteCharactersInRange:match.range];
    }
    return modifiedString;
}
The above NSString category uses a regular expression to find all the matching tags, makes a copy of the original string, and finally removes all the tags in place by iterating over them in reverse order. It's more efficient because:
- The regular expression is initialised only once.
- A single copy of the original string is used.
This performed well enough for me, but a solution using NSScanner might be more efficient.
Like the accepted answer, this solution doesn't address all the edge cases requested by @lfalin. Those would require much more expensive parsing, which the average use case most likely doesn't need.
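For comparison, here is a sketch of what an NSScanner version of the same tag stripping could look like. It is not taken from any of the answers, it has not been benchmarked, and it assumes ARC. Like the regex solutions, it treats everything from a '<' to the next '>' as a tag, so it shares their edge cases. StringByStrippingTagsWithScanner is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: copies text between tags and skips each "<...>" run.
// An unterminated '<' causes the rest of the string to be dropped.
static NSString *StringByStrippingTagsWithScanner(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil; // keep whitespace between tags
    NSMutableString *result = [NSMutableString stringWithCapacity:html.length];

    while (![scanner isAtEnd]) {
        NSString *text = nil;
        // Copy everything up to the next '<' (or to the end of the string).
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Skip the tag itself: the '<', its contents, and the closing '>'.
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```

Each pass through the loop consumes at least one character, so the loop always terminates. The result is built in a single mutable string instead of allocating a new string for every tag removed.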
Answer by Rémy
Without a loop (at least on our side):
- (NSString *)removeHTML {
    static NSRegularExpression *regexp;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regexp = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    return [regexp stringByReplacingMatchesInString:self
                                            options:kNilOptions
                                              range:NSMakeRange(0, self.length)
                                       withTemplate:@""];
}
Answer by Pavan Sisode
NSAttributedString *str = [[NSAttributedString alloc] initWithData:[trimmedString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)} documentAttributes:nil error:nil];
NSString *plainText = str.string; // trimmedString is your HTML input; plainText is the rendered text without tags
Answer by Jim Liu
#import "RegexKitLite.h"
NSString *text = [html stringByReplacingOccurrencesOfRegex:@"<[^>]+>" withString:@""];