Java equivalent to JavaScript's encodeURIComponent that produces identical output?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/607176/
Asked by John Topley
I've been experimenting with various bits of Java code, trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug:
encodeURIComponent('"A" B ± "');
Then I get:
"%22A%22%20B%20%C2%B1%20%22"
Here's my little test Java program:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String s = "\"A\" B ± \"";
        System.out.println("URLEncoder.encode returns "
            + URLEncoder.encode(s, "UTF-8"));
        System.out.println("getBytes returns "
            + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
    }
}
This program outputs:
URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B ± "
Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?
EDIT: I'm using Java 1.4, moving to Java 5 shortly.
Accepted answer by Tomalak
Looking at the implementation differences, I see that:
JavaScript's encodeURIComponent:
- literal characters (regex representation): [-a-zA-Z0-9._*~'()!]
Java 1.5.0 documentation on URLEncoder:
- literal characters (regex representation): [-a-zA-Z0-9._*]
- the space character " " is converted into a plus sign "+".
So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:
- replace all occurrences of "+" with "%20"
- replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
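The two post-processing steps above can be sketched roughly as follows. This is only an illustration: the class name PostProcessSketch is made up, and String.replace(CharSequence, CharSequence) assumes Java 5 or later (on Java 1.4 one would fall back to replaceAll with an escaped pattern).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the class name is an assumption for illustration only.
final class PostProcessSketch
{
    // URLEncoder.encode followed by the post-processing steps listed above.
    static String encodeURIComponent(String s)
    {
        try
        {
            return URLEncoder.encode(s, "UTF-8")
                .replace("+", "%20")  // URLEncoder writes a space as "+"
                .replace("%21", "!")  // restore the characters that
                .replace("%27", "'")  // encodeURIComponent leaves literal
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
        }
        catch (UnsupportedEncodingException e)
        {
            // UTF-8 is a required charset, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        // The torture test string from the question (\u00B1 is the plus-minus sign).
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
    }
}
```

Literal replacement is used here instead of regular expressions so that no pattern escaping is needed.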
Answer by Ravi Wallau
Using the JavaScript engine that is shipped with Java 6:
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngineManager factory = new ScriptEngineManager();
        ScriptEngine engine = factory.getEngineByName("JavaScript");
        engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
    }
}
Output: %22A%22%20B%20%c2%b1%20%22
The case is different but it's closer to what you want.
Answer by John Topley
This is the class I came up with in the end:
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 *
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley
 */
public class EncodingUtil
{
    /**
     * Decodes the passed UTF-8 String using an algorithm that's compatible with
     * JavaScript's <code>decodeURIComponent</code> function. Returns
     * <code>null</code> if the String is <code>null</code>.
     *
     * @param s The UTF-8 encoded String to be decoded
     * @return the decoded String
     */
    public static String decodeURIComponent(String s)
    {
        if (s == null)
        {
            return null;
        }
        String result = null;
        try
        {
            result = URLDecoder.decode(s, "UTF-8");
        }
        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
            result = s;
        }
        return result;
    }

    /**
     * Encodes the passed String as UTF-8 using an algorithm that's compatible
     * with JavaScript's <code>encodeURIComponent</code> function. Returns
     * <code>null</code> if the String is <code>null</code>.
     *
     * @param s The String to be encoded
     * @return the encoded String
     */
    public static String encodeURIComponent(String s)
    {
        // Guard added so the method matches its documented null behavior.
        if (s == null)
        {
            return null;
        }
        String result = null;
        try
        {
            result = URLEncoder.encode(s, "UTF-8")
                .replaceAll("\\+", "%20")
                .replaceAll("\\%21", "!")
                .replaceAll("\\%27", "'")
                .replaceAll("\\%28", "(")
                .replaceAll("\\%29", ")")
                .replaceAll("\\%7E", "~");
        }
        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
            result = s;
        }
        return result;
    }

    /**
     * Private constructor to prevent this class from being instantiated.
     */
    private EncodingUtil()
    {
        super();
    }
}
Answer by sangupta
I came up with another implementation, documented at http://blog.sangupta.com/2010/05/encodeuricomponent-and.html. The implementation can also handle Unicode bytes.
Answer by Joe Mill
I came up with my own version of encodeURIComponent, because the posted solution has one problem: if there is a + present in the String, which should be encoded, it will be converted to a space.
So here is my class:
import java.io.UnsupportedEncodingException;
import java.util.BitSet;

public final class EscapeUtils
{
    /** used for the encodeURIComponent function */
    private static final BitSet dontNeedEncoding;

    static
    {
        dontNeedEncoding = new BitSet(256);
        // a-z
        for (int i = 97; i <= 122; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // A-Z
        for (int i = 65; i <= 90; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // 0-9
        for (int i = 48; i <= 57; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // '()*
        for (int i = 39; i <= 42; ++i)
        {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(33); // !
        dontNeedEncoding.set(45); // -
        dontNeedEncoding.set(46); // .
        dontNeedEncoding.set(95); // _
        dontNeedEncoding.set(126); // ~
    }

    /**
     * A Utility class should not be instantiated.
     */
    private EscapeUtils()
    {
    }

    /**
     * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
     *
     * @param input
     *            A component of a URI
     * @return the escaped URI component
     */
    public static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return input;
        }
        StringBuilder filtered = new StringBuilder(input.length());
        char c;
        // Note: the input is processed one UTF-16 char at a time, so a
        // surrogate pair (code point above U+FFFF) is not encoded as a
        // single UTF-8 sequence.
        for (int i = 0; i < input.length(); ++i)
        {
            c = input.charAt(i);
            if (dontNeedEncoding.get(c))
            {
                filtered.append(c);
            }
            else
            {
                final byte[] b = charToBytesUTF(c);
                for (int j = 0; j < b.length; ++j)
                {
                    filtered.append('%');
                    filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                    filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                }
            }
        }
        return filtered.toString();
    }

    private static byte[] charToBytesUTF(char c)
    {
        try
        {
            return new String(new char[] { c }).getBytes("UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            return new byte[] { (byte) c };
        }
    }
}
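The class above converts one char at a time, which splits surrogate pairs. A possible variant, sketched below, converts the whole string to UTF-8 first and then walks the bytes, so characters outside the Basic Multilingual Plane come out as a single four-byte sequence. The class name ByteWiseEncoderSketch is made up for illustration, and StandardCharsets assumes Java 7 or later. One known difference remains: JavaScript throws a URIError for a lone surrogate, whereas this sketch would emit %3F for it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical variant; the class name is an assumption for illustration only.
final class ByteWiseEncoderSketch
{
    // Characters that encodeURIComponent leaves unescaped.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return null;
        }
        // Encode the whole string at once so surrogate pairs stay together.
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length * 3);
        for (byte b : bytes)
        {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80 && UNRESERVED.indexOf(v) >= 0)
            {
                out.append((char) v);
            }
            else
            {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0xF]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
        System.out.println(encodeURIComponent("\uD83D\uDE00")); // U+1F600
    }
}
```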
Answer by honzajde
I have found the PercentEscaper class from the google-http-java-client library, which can be used to implement encodeURIComponent quite easily.
See: PercentEscaper in the google-http-java-client javadoc; google-http-java-client home
Answer by Mike Bryant
I have successfully used the java.net.URI class like so:
import java.net.URI;
import java.net.URISyntaxException;

public static String uriEncode(String string) {
    String result = string;
    if (null != string) {
        try {
            // Split off a leading "scheme:" prefix, if there is one.
            String scheme = null;
            String ssp = string;
            int es = string.indexOf(':');
            if (es > 0) {
                scheme = string.substring(0, es);
                ssp = string.substring(es + 1);
            }
            // The multi-argument URI constructor quotes illegal characters.
            result = (new URI(scheme, ssp, null)).toString();
        } catch (URISyntaxException usex) {
            // ignore and use string that has syntax error
        }
    }
    return result;
}
Answer by Chris Nitchie
I use java.net.URI#getRawPath(), e.g.
String s = "a+b c.html";
String fixed = new URI(null, null, s, null).getRawPath();
The value of fixed will be a+b%20c.html, which is what you want.
Post-processing the output of URLEncoder.encode() will obliterate any pluses that are supposed to be in the URI. For example
URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");
will give you a%20b%20c.html, which will be interpreted as a b c.html.
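How a literal plus sign is treated is easy to check directly. The sketch below (the class and method names are made up for illustration) prints what URLEncoder and URI#getRawPath() produce for the same input. As far as the URLEncoder documentation goes, "+" is not among its safe characters, so a plus in the input should come out as %2B before any replacement runs, and only spaces become "+". It is worth running this on your own JDK before relying on either claim.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical check; class and method names are assumptions for illustration.
final class PlusSignCheck
{
    // Plain URLEncoder output, with no post-processing.
    static String urlEncode(String s)
    {
        try
        {
            return URLEncoder.encode(s, "UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            throw new IllegalStateException(e);
        }
    }

    // The getRawPath() approach from the answer above.
    static String rawPath(String s)
    {
        try
        {
            return new URI(null, null, s, null).getRawPath();
        }
        catch (URISyntaxException e)
        {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args)
    {
        String s = "a+b c.html";
        System.out.println(urlEncode(s));
        System.out.println(urlEncode(s).replace("+", "%20"));
        System.out.println(rawPath(s));
    }
}
```

Note that getRawPath() leaves the plus sign unescaped, whereas JavaScript's encodeURIComponent escapes it as %2B, so the two are not interchangeable when identical output is required.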
Answer by Aliaksei Nikuliak
Guava library has PercentEscaper:
Escaper percentEscaper = new PercentEscaper("-_.*", false);
"-_.*" are the safe characters
false tells PercentEscaper to escape a space as '%20', not '+'
Answer by silver
This is a straightforward example of Ravi Wallau's solution:
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class EncodeURIComponentDemo {

    public String buildSafeURL(String partialURL, String documentName)
            throws ScriptException {
        ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
        ScriptEngine scriptEngine = scriptEngineManager
                .getEngineByName("JavaScript");
        // eval() returns an Object, so convert it with String.valueOf()
        String urlSafeDocumentName = String.valueOf(scriptEngine
                .eval("encodeURIComponent('" + documentName + "')"));
        String safeURL = partialURL + urlSafeDocumentName;
        return safeURL;
    }

    public static void main(String[] args) {
        EncodeURIComponentDemo demo = new EncodeURIComponentDemo();
        String partialURL = "https://www.website.com/document/";
        String documentName = "Tom & Jerry Manuscript.pdf";
        try {
            System.out.println(demo.buildSafeURL(partialURL, documentName));
        } catch (ScriptException se) {
            se.printStackTrace();
        }
    }
}
Output: https://www.website.com/document/Tom%20%26%20Jerry%20Manuscript.pdf
It also answers the hanging question in the comments by Loren Shqipognja on how to pass a String variable to encodeURIComponent(). The method scriptEngine.eval() returns an Object, so it can be converted to a String via String.valueOf(), among other methods.