json 如何在 Swift 中解码 HTML 实体?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/25607247/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How do I decode HTML entities in Swift?
提问by code_cookies
I am pulling a JSON file from a site and one of the strings received is:
我正在从站点中提取一个 JSON 文件,收到的字符串之一是:
The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi
How can I convert things like ‘into the correct characters?
如何将诸如此类的内容‘转换为正确的字符?
I've made a Xcode Playground to demonstrate it:
我制作了一个 Xcode Playground 来演示它:
import UIKit
var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)
let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary
var a = dataDictionary["posts"] as NSArray
println(a[0]["title"])
回答by akashivskyy
This answer was last revised for Swift 5.2 and iOS 13.4 SDK.
此答案最后为 Swift 5.2 和 iOS 13.4 SDK 修订。
There's no straightforward way to do that, but you can use NSAttributedStringmagic to make this process as painless as possible (be warned that this method will strip all HTML tags as well).
没有直接的方法可以做到这一点,但您可以使用NSAttributedString魔法使这个过程尽可能轻松(请注意,此方法也会删除所有 HTML 标记)。
Remember to initialize NSAttributedStringfrom main thread only. It uses WebKit to parse HTML underneath, thus the requirement.
请记住仅从主线程初始化NSAttributedString。它使用 WebKit 来解析下面的 HTML,因此需要。
// This is a[0]["title"] in your case
let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
guard let data = htmlEncodedString.data(using: .utf8) else {
return
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return
}
// The Weeknd ‘King Of The Fall'
let decodedString = attributedString.string
extension String {
init?(htmlEncodedString: String) {
guard let data = htmlEncodedString.data(using: .utf8) else {
return nil
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
}
let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
let decodedString = String(htmlEncodedString: encodedString)
回答by Martin R
@akashivskyy's answer is great and demonstrates how to utilize NSAttributedStringto decode HTML entities. One possible disadvantage
(as he stated) is that allHTML markup is removed as well, so
@akashivskyy 的回答很好,并演示了如何利用NSAttributedString来解码 HTML 实体。一个可能的缺点(如他所说)是所有HTML 标记也被删除,所以
<strong> 4 < 5 & 3 > 2</strong>
becomes
变成
4 < 5 & 3 > 2
On OS X there is CFXMLCreateStringByUnescapingEntities()which does the job:
在 OS X 上,它CFXMLCreateStringByUnescapingEntities()可以完成这项工作:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. @ "
let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded, nil) as String
println(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 . @
but this is not available on iOS.
但这在 iOS 上不可用。
Here is a pure Swift implementation. It decodes character entities
references like <using a dictionary, and all numeric character
entities like @or €. (Note that I did not list all
252 HTML entities explicitly.)
这是一个纯 Swift 实现。它像<使用字典一样解码字符实体引用,以及像@or一样的所有数字字符实体€。(请注意,我没有明确列出所有 252 个 HTML 实体。)
Swift 4:
斯威夫特 4:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ Substring : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "?",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "@"
// decodeNumeric("20ac", 16) --> ""
func decodeNumeric(_ string : Substring, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("@") --> "@"
// decode("€") --> ""
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : Substring) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self[position...].range(of: "&") {
result.append(contentsOf: self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
guard let semiRange = self[position...].range(of: ";") else {
// No matching ';'.
break
}
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(contentsOf: entity)
}
}
// Copy remaining characters to `result`:
result.append(contentsOf: self[position...])
return result
}
}
Example:
例子:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. @ "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 . @
Swift 3:
斯威夫特 3:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "?",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "@"
// decodeNumeric("20ac", 16) --> ""
func decodeNumeric(_ string : String, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("@") --> "@"
// decode("€") --> ""
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.range(of: "&", range: position ..< endIndex) {
result.append(self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.range(of: ";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.append(self[position ..< endIndex])
return result
}
}
Swift 2:
斯威夫特 2:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "?",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "@"
// decodeNumeric("20ac", 16) --> ""
func decodeNumeric(string : String, base : Int32) -> Character? {
let code = UInt32(strtoul(string, nil, base))
return Character(UnicodeScalar(code))
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("@") --> "@"
// decode("€") --> ""
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
result.appendContentsOf(self[position ..< ampRange.startIndex])
position = ampRange.startIndex
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.endIndex]
position = semiRange.endIndex
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.appendContentsOf(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.appendContentsOf(self[position ..< endIndex])
return result
}
}
回答by yishus
Swift 3version of @akashivskyy's extension,
@akashivskyy 扩展的Swift 3版本,
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
回答by AamirR
Swift 4
斯威夫特 4
- String extension computed variable
- Without extra guard, do, catch, etc...
- Returns the original strings if decoding fails
- 字符串扩展计算变量
- 没有额外的守卫,做,抓住,等等......
- 如果解码失败返回原始字符串
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
回答by Zaid Pathan
Swift 2version of @akashivskyy's extension,
@akashivskyy 扩展的Swift 2版本 ,
extension String {
init(htmlEncodedString: String) {
if let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding){
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
do{
if let attributedString:NSAttributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil){
self.init(attributedString.string)
}else{
print("error")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}catch{
print("error: \(error)")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}else{
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}
}
回答by pipizanzibar
Swift 4 Version
斯威夫特 4 版本
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
回答by wLc
extension String{
func decodeEnt() -> String{
let encodedData = self.dataUsingEncoding(NSUTF8StringEncoding)!
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
return attributedString.string
}
}
let encodedString = "The Weeknd ‘King Of The Fall’"
let foo = encodedString.decodeEnt() /* The Weeknd ‘King Of The Fall' */
回答by Youming Lin
I was looking for a pure Swift 3.0 utility to escape to/unescape from HTML character references (i.e. for server-side Swift apps on both macOS and Linux) but didn't find any comprehensive solutions, so I wrote my own implementation: https://github.com/IBM-Swift/swift-html-entities
我一直在寻找一个纯 Swift 3.0 实用程序来转义/取消转义 HTML 字符引用(即适用于 macOS 和 Linux 上的服务器端 Swift 应用程序),但没有找到任何全面的解决方案,所以我编写了自己的实现:https: //github.com/IBM-Swift/swift-html-entities
The package, HTMLEntities, works with HTML4 named character references as well as hex/dec numeric character references, and it will recognize special numeric character references per the W3 HTML5 spec (i.e. €should be unescaped as the Euro sign (unicode U+20AC) and NOT as the unicode character for U+0080, and certain ranges of numeric character references should be replaced with the replacement character U+FFFDwhen unescaping).
包HTMLEntities使用 HTML4 命名字符引用以及十六进制/十进制数字字符引用,它将根据 W3 HTML5 规范识别特殊数字字符引用(即,€应该不转义为欧元符号 (unicode U+20AC) 而不是 unicode字符U+0080, 以及某些范围的数字字符引用U+FFFD在转义时应替换为替换字符)。
Usage example:
用法示例:
import HTMLEntities
// encode example
let html = "<script>alert(\"abc\")</script>"
print(html.htmlEscape())
// Prints ”<script>alert("abc")</script>"
// decode example
let htmlencoded = "<script>alert("abc")</script>"
print(htmlencoded.htmlUnescape())
// Prints ”<script>alert(\"abc\")</script>"
And for OP's example:
对于 OP 的示例:
print("The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall' [Video Premiere] | @TheWeeknd | #SoPhi "
Edit: HTMLEntitiesnow supports HTML5 named character references as of version 2.0.0. Spec-compliant parsing is also implemented.
编辑:HTMLEntities从 2.0.0 版开始,现在支持 HTML5 命名字符引用。还实现了符合规范的解析。
回答by Naishta
Swift 4:
斯威夫特 4:
The total solution that finally worked for me with HTML code and newline characters and single quotes
最终使用 HTML 代码、换行符和单引号对我有用的整体解决方案
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
Usage:
用法:
let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded
I then had to apply some more filters to get rid of single quotes(for example, don't, hasn't, It's, etc.), and new line characters like \n:
然后我不得不应用更多的过滤器来去除单引号(例如,不要、没有、它是等)和换行符,如\n:
var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains(extension String {
func htmlDecoded()->String {
guard (self != "") else { return self }
var newStr = self
let entities = [
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
]
for (name,value) in entities {
newStr = newStr.stringByReplacingOccurrencesOfString(name, withString: value)
}
return newStr
}
}
) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)
回答by Bseaborn
This would be my approach. You could add the entities dictionary from https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555Michael Waterfall mentions.
这将是我的方法。您可以添加来自https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555Michael Waterfall 提及的实体字典。
let encoded = "this is so "good""
let decoded = encoded.htmlDecoded() // "this is so "good""
Examples used:
使用的例子:
let encoded = "this is so "good"".htmlDecoded() // "this is so "good""
OR
或者
##代码##
