ios: Remove HTML tags from an NSString on the iPhone

Question

Asked by lfalin

There are a couple of different ways to remove HTML tags from an NSString in Cocoa.

One way is to render the string into an NSAttributedString and then grab the rendered text.

Another way is to use NSXMLDocument's -objectByApplyingXSLTString method to apply an XSLT transform that does it.

Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?

One suggestion has been to simply look for opening and closing tag characters, but this method won't work except for very trivial cases.

For example, these cases (from the Perl Cookbook chapter on the same subject) would break this method:

<IMG SRC = "foo.gif" ALT = "A > B">

<!-- <A comment> -->

<script>if (a<b && a>c)</script>

<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

Answer 1

Answered by m.kocikowski

A quick and "dirty" solution (it removes everything between < and >) that works with iOS >= 3.2:

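- (NSString *)stringByStrippingHTML {
  NSRange r;
  // Under ARC a plain copy is enough; with manual reference counting use [[self copy] autorelease].
  NSString *s = [self copy];
  // Find the next "<...>" run and delete it, repeating until none remain.
  while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    s = [s stringByReplacingCharactersInRange:r withString:@""];
  return s;
}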

I have this declared as a category on NSString.
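To make the limits of this pattern concrete, here is a minimal sketch that applies the same regular expression in a single call and can be fed the edge cases from the question. The function name NaiveStripTags is hypothetical and is not part of the answer; the sketch assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer): a one-shot version of the same
// "<[^>]+>" strip, using NSString's built-in regular expression search.
static NSString *NaiveStripTags(NSString *html) {
    return [html stringByReplacingOccurrencesOfString:@"<[^>]+>"
                                           withString:@""
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, html.length)];
}
```

For simple input such as @"<b>Hello</b> World!!" this returns @"Hello World!!". For the IMG example from the question, the match stops at the first > inside the ALT attribute, so the tail of the tag is left behind as text.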

Answer 2

Answered by Leigh McCulloch

This NSString category uses NSXMLParser to accurately remove any HTML tags from an NSString. It consists of a single .m and .h file that can be included in your project easily.

https://gist.github.com/leighmcculloch/1202238

You then strip HTML by doing the following:

Import the header:

#import "NSString_stripHtml.h"

And then call stripHtml:


NSString* mystring = @"<b>Hello</b> World!!";
NSString* stripped = [mystring stripHtml];
// stripped will be = Hello World!!

This also works with malformed HTML that technically isn't XML.

Answer 3

Answered by MANCHIKANTI KRISHNAKISHORE

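UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"This is <font color='red'>simple</font>";
// contentToHTMLString is not a documented UITextView property; this relies on key-value coding.
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
// Assumes this runs inside a view controller; addSubview: is an instance method, not a class method.
[self.view addSubview:textview];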

Works fine for me.

Answer 4

Answered by Kirtikumar A.

You can use it like this:

- (void)myMethod
{
    NSString *htmlStr = @"<some>html</string>";
    NSString *strWithoutFormatting = [self stringByStrippingHTML:htmlStr];
}

- (NSString *)stringByStrippingHTML:(NSString *)str
{
    NSRange r;
    // Remove one "<...>" match per pass until no tags remain.
    while ((r = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    {
        str = [str stringByReplacingCharactersInRange:r withString:@""];
    }
    return str;
}

Answer 5

Answered by Mohamed AHDIDOU

Use this:

NSString *myregex = @"<[^>]*>"; // regex to remove any HTML tag

NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [htmlString stringByReplacingOccurrencesOfRegex:myregex withString:@""];

Don't forget to include this in your code: #import "RegexKitLite.h". Here is the link to download this API: http://regexkit.sourceforge.net/#Downloads

Answer 6

Answered by Colin Barrett

Take a look at NSXMLParser. It's a SAX-style parser. You should be able to use it to detect tags or other unwanted elements in the XML document and ignore them, capturing only pure text.
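A minimal sketch of that idea follows, assuming ARC. The class TextCollector and the function PlainTextFromMarkup are hypothetical names introduced for illustration, and wrapping the input in a single root element is an assumption of this sketch so that fragments parse as one document. Only the character data reported by the parser is kept.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that accumulates the text found between tags.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector

- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// NSXMLParser calls this for each run of character data; tags never reach it.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}

@end

// Hypothetical helper: returns the character data of a markup fragment.
static NSString *PlainTextFromMarkup(NSString *markup) {
    // Wrap the fragment in one root element so it forms a single document.
    NSString *wrapped = [NSString stringWithFormat:@"<root>%@</root>", markup];
    NSData *data = [wrapped dataUsingEncoding:NSUTF8StringEncoding];

    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    TextCollector *collector = [[TextCollector alloc] init];
    parser.delegate = collector;

    // parse returns NO if the input is not well-formed XML; whatever text
    // was collected before the error is still returned.
    [parser parse];
    return [collector.text copy];
}
```

For well-formed input such as @"<b>Hello</b> World!!" this yields @"Hello World!!". Input that is not well-formed XML stops the parser at the first error, so only the text seen up to that point comes back.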


Answer 7

Answered by hpique

Here's a more efficient solution than the accepted answer:

- (NSString*)hp_stringByRemovingTags
{
    static NSRegularExpression *regex = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    // Use reverse enumerator to delete characters without affecting indexes
    NSArray *matches =[regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
    NSEnumerator *enumerator = matches.reverseObjectEnumerator;

    NSTextCheckingResult *match = nil;
    NSMutableString *modifiedString = self.mutableCopy;
    while ((match = [enumerator nextObject]))
    {
        [modifiedString deleteCharactersInRange:match.range];
    }
    return modifiedString;
}

The above NSString category uses a regular expression to find all the matching tags, makes a copy of the original string, and finally removes all the tags in place by iterating over them in reverse order. It's more efficient because:

The regular expression is initialised only once.
A single copy of the original string is used.

This performed well enough for me, but a solution using NSScanner might be more efficient.

Like the accepted answer, this solution doesn't address all the border cases requested by @lfalin. Those would require much more expensive parsing, which the average use case most likely doesn't need.
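For comparison, the NSScanner approach mentioned above could look roughly like the sketch below. It is not part of the original answer, the function name StripTagsWithScanner is hypothetical, and no performance claim is made for it. It has the same edge-case limitations as the regular expression, and it differs slightly in that an unterminated < discards everything after it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: copies the text outside "<...>" runs in a single pass.
static NSString *StripTagsWithScanner(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // By default NSScanner skips whitespace; turn that off so spacing is preserved.
    scanner.charactersToBeSkipped = nil;

    NSMutableString *result = [NSMutableString stringWithCapacity:html.length];
    while (!scanner.isAtEnd) {
        // Copy everything up to the next "<" (returns NO if already at "<").
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Skip the tag itself: "<", its contents, and the closing ">".
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```

Calling StripTagsWithScanner(@"<b>Hello</b> World!!") returns @"Hello World!!".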

Answer 8

Answered by Rémy

Without a loop (at least on our side):

- (NSString *)removeHTML {

    static NSRegularExpression *regexp;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regexp = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    return [regexp stringByReplacingMatchesInString:self
                                            options:kNilOptions
                                              range:NSMakeRange(0, self.length)
                                       withTemplate:@""];
}

Answer 9

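Answered by Pavan Sisode

// trimmedString holds the HTML source. NSHTMLTextDocumentType requires iOS 7 or later.
NSAttributedString *str = [[NSAttributedString alloc] initWithData:[trimmedString dataUsingEncoding:NSUTF8StringEncoding]
                                                           options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                                                     NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
                                                documentAttributes:nil
                                                             error:nil];
// The rendered text without tags is the attributed string's plain string.
NSString *plainText = str.string;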

Answer 10

Answered by Jim Liu

#import "RegexKitLite.h"

NSString *text = [html stringByReplacingOccurrencesOfRegex:@"<[^>]+>" withString:@""];
