Objective C HTML escape/unescape

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/659602/

Time: 2020-08-28 23:28:45  Source: igfitidea

iphone html objective-c cocoa-touch escaping

Asked by Alex Wayne

Wondering if there is an easy way to do a simple HTML escape/unescape in Objective C. What I want is something like this pseudocode:

NSString *string = @"&lt;span&gt;Foo&lt;/span&gt;";
[string stringByUnescapingHTML];

Which returns

<span>Foo</span>

Hopefully unescaping all other HTML entities as well, and even numeric codes like &#1234; and the like.

Are there any methods in Cocoa Touch/UIKit to do this?

Accepted answer by Andrew Grant

This link contains the solution below. Cocoa CF has the CFXMLCreateStringByUnescapingEntities function, but that's not available on the iPhone.

@interface MREntitiesConverter : NSObject <NSXMLParserDelegate>{
    NSMutableString* resultString;
}

@property (nonatomic, retain) NSMutableString* resultString;

- (NSString*)convertEntitiesInString:(NSString*)s;

@end


@implementation MREntitiesConverter

@synthesize resultString;

- (id)init
{
    // Assign the result of [super init] to self before using it.
    self = [super init];
    if (self) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}

- (NSString*)convertEntitiesInString:(NSString*)s {
    if (!s) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    // Reset the buffer so repeated calls do not accumulate earlier results.
    [self.resultString setString:@""];
    // Wrap the input in a dummy root element so NSXMLParser decodes the entities.
    NSString* xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser* xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    return [NSString stringWithFormat:@"%@",resultString];
}

- (void)dealloc {
    [resultString release];
    [super dealloc];
}

@end

Answer by Michael Waterfall

Check out my NSString category for XMLEntities. There are methods to decode XML entities (including all HTML character references), encode XML entities, strip tags, and remove newlines and whitespace from a string:

- (NSString *)stringByStrippingTags;
- (NSString *)stringByDecodingXMLEntities; // Including all HTML character references
- (NSString *)stringByEncodingXMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;

Answer by Nikita Rybak

Another HTML NSString category from Google Toolbox for Mac
Despite the name, this works on iOS too.

http://google-toolbox-for-mac.googlecode.com/svn/trunk/Foundation/GTMNSString+HTML.h

/// Get a string where internal characters that are escaped for HTML are unescaped 
//
///  For example, '&amp;' becomes '&'
///  Handles &#32; and &#x32; cases as well
///
//  Returns:
//    Autoreleased NSString
//
- (NSString *)gtm_stringByUnescapingFromHTML;

And I had to include only three files in the project: header, implementation and GTMDefines.h.
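
For readers who only need the numeric cases mentioned in the header comment (&#32; and &#x32;), the following is a minimal sketch of how such references could be decoded with Foundation alone. It is not the Google Toolbox for Mac implementation; `DecodeNumericReferences` is a hypothetical helper name, and named entities such as &amp; are deliberately left untouched.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of Google Toolbox for Mac: decodes only numeric
// character references such as &#32; (decimal) and &#x32; (hexadecimal).
static NSString *DecodeNumericReferences(NSString *input) {
    if (!input) return nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));"
                             options:0
                               error:NULL];
    NSMutableString *result = [NSMutableString string];
    NSUInteger last = 0;
    for (NSTextCheckingResult *match in
         [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)]) {
        NSRange whole = match.range;
        // Copy the text between the previous match and this one unchanged.
        [result appendString:[input substringWithRange:NSMakeRange(last, whole.location - last)]];
        last = NSMaxRange(whole);

        // Group 1 holds hex digits, group 2 holds decimal digits.
        NSRange hexRange = [match rangeAtIndex:1];
        BOOL isHex = (hexRange.location != NSNotFound);
        NSString *digits = [input substringWithRange:(isHex ? hexRange : [match rangeAtIndex:2])];
        unsigned long cp = strtoul(digits.UTF8String, NULL, isHex ? 16 : 10);

        if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            // Not a valid Unicode scalar value: keep the reference as written.
            [result appendString:[input substringWithRange:whole]];
        } else if (cp < 0x10000) {
            unichar unit = (unichar)cp;
            [result appendString:[NSString stringWithCharacters:&unit length:1]];
        } else {
            // Characters outside the BMP need a UTF-16 surrogate pair.
            unsigned long v = cp - 0x10000;
            unichar pair[2] = { (unichar)(0xD800 + (v >> 10)), (unichar)(0xDC00 + (v & 0x3FF)) };
            [result appendString:[NSString stringWithCharacters:pair length:2]];
        }
    }
    // Append whatever follows the last match.
    [result appendString:[input substringFromIndex:last]];
    return result;
}
```

With this sketch, `DecodeNumericReferences(@"a&#32;b&#x32;c")` yields `a b2c`. A full decoder also needs a table of named entities, which is what the category above provides.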

Answer by Andrew Kozlik

This is an incredibly hacked-together solution I did, but if you want to simply unescape a string without worrying about parsing, do this:

- (NSString *)htmlEntityDecode:(NSString *)string
{
    string = [string stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
    string = [string stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
    string = [string stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
    string = [string stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
    // Do this last so that, e.g. @"&amp;lt;" goes to @"&lt;" not @"<"
    string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];

    return string;
}

I know it's by no means elegant, but it gets the job done. You can then decode an element by calling:

string = [self htmlEntityDecode:string];

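Like I said, it's hacky but it works. If you want to encode a string, just reverse the stringByReplacingOccurrencesOfString parameters, and replace & first rather than last so that the ampersands of the newly inserted entities are not escaped again.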
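
For the encoding direction, here is a minimal sketch that assumes only the same five entities matter. `HTMLEntityEncode` is a hypothetical name, not something from the original answer. Note that the order flips compared with decoding: the ampersand has to be handled first.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): escapes the same five
// characters that htmlEntityDecode: handles.
static NSString *HTMLEntityEncode(NSString *string) {
    // Replace "&" first; otherwise the ampersands of the new entities
    // would themselves be escaped again.
    string = [string stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    string = [string stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
    string = [string stringByReplacingOccurrencesOfString:@"'" withString:@"&apos;"];
    string = [string stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    string = [string stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    return string;
}
```

One caveat: &apos; is defined in XML and HTML5 but not in HTML 4, so &#39; may be the safer replacement if older HTML consumers are a concern.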

Answer by orj

In iOS 7 you can use NSAttributedString's ability to import HTML to convert HTML entities to an NSString.

E.g.:

@interface NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString;
@end

@implementation NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString
{
    NSDictionary *options = @{ NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
                               NSCharacterEncodingDocumentAttribute :@(NSUTF8StringEncoding) };

    NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];

    return [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];
}

@end

Then in your code when you want to clean up the entities:

NSString *cleanString = [[NSAttributedString attributedStringWithHTMLString:question.title] string];

This is probably the simplest way, but I don't know how performant it is. You should probably be pretty damn sure the content you're "cleaning" doesn't contain any <img> tags or stuff like that, because this method will download those images during the HTML to NSAttributedString conversion. :)

Answer by BadPirate

Here's a solution that neutralizes all characters by turning each one into an HTML hex entity for its Unicode value. I used this for my own need: making sure a string that came from the user but was placed inside a web view couldn't carry any XSS attacks:

Interface:

@interface NSString (escape)
- (NSString*)stringByEncodingHTMLEntities;
@end

Implementation:

@implementation NSString (escape)

- (NSString*)stringByEncodingHTMLEntities {
    // Rather than mapping each individual entity and checking if it needs to be replaced, we simply replace every character with the hex entity
    // Note: characterAtIndex: returns UTF-16 code units, so a character outside the BMP is emitted as two surrogate entities.

    NSMutableString *resultString = [NSMutableString string];
    for(NSUInteger pos = 0; pos<[self length]; pos++)
        [resultString appendFormat:@"&#x%x;",[self characterAtIndex:pos]];
    return [NSString stringWithString:resultString];
}

@end

Usage Example:

UIWebView *webView = [[UIWebView alloc] init];
NSString *userInput = @"<script>alert('This is an XSS ATTACK!');</script>";
NSString *safeInput = [userInput stringByEncodingHTMLEntities];
[webView loadHTMLString:safeInput baseURL:nil];

Your mileage will vary.
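
One place where mileage may vary is text outside the Basic Multilingual Plane (emoji, for instance), because the loop above walks UTF-16 code units rather than whole code points. The following is a minimal sketch of the same "encode everything" idea working on code points instead. `HexEntityEncode` is a hypothetical name, not from the answer, and the sketch assumes the input converts cleanly to UTF-32.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): encodes every Unicode
// code point of the input as a hexadecimal character reference.
static NSString *HexEntityEncode(NSString *input) {
    // UTF-32 yields one 32-bit value per code point, so surrogate pairs are
    // already merged. The little-endian variant is requested explicitly so
    // that no byte-order mark is included and the byte order is known.
    NSData *utf32 = [input dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
    const uint8_t *bytes = (const uint8_t *)utf32.bytes;
    NSMutableString *result = [NSMutableString string];
    for (NSUInteger i = 0; i + 4 <= utf32.length; i += 4) {
        // Assemble the little-endian 32-bit code point byte by byte.
        uint32_t codePoint = (uint32_t)bytes[i]
                           | ((uint32_t)bytes[i + 1] << 8)
                           | ((uint32_t)bytes[i + 2] << 16)
                           | ((uint32_t)bytes[i + 3] << 24);
        [result appendFormat:@"&#x%x;", codePoint];
    }
    return result;
}
```

If the conversion to UTF-32 fails (for example, for a string containing an unpaired surrogate), `dataUsingEncoding:` returns nil and this sketch returns an empty string, which is at least safe to place in a web view.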

Answer by T Blank

The least invasive and most lightweight way to encode and decode HTML or XML strings is to use the GTMNSStringHTMLAdditions CocoaPod.

It is simply the Google Toolbox for Mac NSString category GTMNSString+HTML, stripped of the dependency on GTMDefines.h. So all you need to add is one .h and one .m, and you're good to go.

Example:

#import "GTMNSString+HTML.h"

// Encoding a string with XML / HTML elements
NSString *stringToEncode = @"<TheBeat>Goes On</TheBeat>";
NSString *encodedString = [stringToEncode gtm_stringByEscapingForHTML];

// encodedString looks like this now:
// &lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;

// Decoding a string with XML / HTML encoded elements
NSString *stringToDecode = @"&lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;";
NSString *decodedString = [stringToDecode gtm_stringByUnescapingFromHTML];

// decodedString looks like this now:
// <TheBeat>Goes On</TheBeat>

Answer by Blago

This is an easy to use NSString category implementation:

It is far from complete but you can add some missing entities from here: http://code.google.com/p/statz/source/browse/trunk/NSString%2BHTML.m

Usage:

#import "NSString+HTML.h"

NSString *raw = [NSString stringWithFormat:@"<div></div>"];
NSString *escaped = [raw htmlEscapedString];

Answer by Brain2000

The MREntitiesConverter above is an HTML stripper, not an encoder.

If you need an encoder, go here: Encode NSString for XML/HTML

Answer by diadyne

If you need to generate a literal you might consider using a tool like this:

http://www.freeformatter.com/java-dotnet-escape.html#ad-output

to accomplish the work for you.

See also this answer.
