How to generate strings that share the same hashcode in Java?
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/12925988/
Question by StarPinkER
An existing system written in Java uses the hashcode of a string as its routing strategy for load balancing.
Now, I cannot modify the system, but I need to generate strings that share the same hashcode to test the worst case.
I supply those strings from the command line and expect the system to route all of them to the same destination.
Is it possible to generate a large number of strings that share the same hashcode?
To make this question clear:
String[] getStringsInSameHashCode(int number) {
    // Return an array of length "number".
    // Every element of the array shares the same hashcode.
    // The elements must all be different from each other.
    throw new UnsupportedOperationException("to be implemented");
}
Remarks: any hashCode value is acceptable, and there is no constraint on what the strings contain, but they must all be different from each other.
EDIT: Overriding methods of the String class is not acceptable, because I feed those strings from the command line.
Instrumentation is also not acceptable, because it would affect the system.
Answer by hetaoblog
See the test method below. Basically, for two-character strings you only need a1*31 + b1 = a2*31 + b2, which means (a1 - a2)*31 = b2 - b1. For example, 'A'*31 + 'a' = 65*31 + 97 = 2112 = 66*31 + 66 = 'B'*31 + 'B', so "Aa" and "BB" collide.
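public void testHash()
{
    System.out.println("A:" + ((int)'A'));
    System.out.println("B:" + ((int)'B'));
    System.out.println("a:" + ((int)'a'));
    System.out.println(hash("Aa".hashCode()));
    System.out.println(hash("BB".hashCode()));
    System.out.println(hash("Aa".hashCode()));
    System.out.println(hash("BB".hashCode()));
    System.out.println(hash("AaAa".hashCode()));
    System.out.println(hash("BBBB".hashCode()));
    System.out.println(hash("AaBB".hashCode()));
    System.out.println(hash("BBAa".hashCode()));
}

// hash() was not shown in the answer. The printed values match the supplemental
// hash that java.util.HashMap applied in Java 6, so it is assumed to be that function.
// ("Aa".hashCode() itself is 2112; hash(2112) is 2260.)
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}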
You will get:
A:65
B:66
a:97
2260
2260
2260
2260
2019172
2019172
2019172
2019172
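The condition (a1 - a2)*31 = b2 - b1 can also be searched mechanically. The following is a minimal sketch, not part of the original answer. It restricts the search to ASCII letters, which is an arbitrary choice made for readability, and the class name TwoCharCollisions is hypothetical. It lists every two-letter string whose hashCode() equals that of a given two-letter target:

```java
import java.util.ArrayList;
import java.util.List;

public class TwoCharCollisions {
    // Returns every two-letter string (A-Z, a-z) whose hashCode equals that of target.
    static List<String> collisionsOf(String target) {
        List<String> out = new ArrayList<>();
        int wanted = target.hashCode();
        String letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        for (char a : letters.toCharArray()) {
            for (char b : letters.toCharArray()) {
                // For two chars, hashCode() is a*31 + b, so this is the a1*31+b1 = a2*31+b2 test.
                if (a * 31 + b == wanted) {
                    out.add("" + a + b);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collisionsOf("Aa"));
        System.out.println(collisionsOf("FB"));
    }
}
```

Running it prints [Aa, BB] and [Ea, FB]. These are the pairs used in this answer and in the "FB"/"Ea" answer further down.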
Edit: someone said this was not straightforward enough, so I added the part below:
@Test
public void testN() throws Exception {
    List<String> l = HashCUtil.generateN(3);
    for (int i = 0; i < l.size(); ++i) {
        System.out.println(l.get(i) + "---" + l.get(i).hashCode());
    }
}
AaAaAa---1952508096
AaAaBB---1952508096
AaBBAa---1952508096
AaBBBB---1952508096
BBAaAa---1952508096
BBAaBB---1952508096
BBBBAa---1952508096
BBBBBB---1952508096
Below is the source code. It might not be efficient, but it works:
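import java.util.ArrayList;
import java.util.List;

public class HashCUtil {
    // "Aa" and "BB" have the same length and the same hashCode (2112).
    private static final String[] base = new String[] {"Aa", "BB"};

    // Returns all 2^n strings built from n blocks of base; they all share one hashCode.
    public static List<String> generateN(int n)
    {
        if (n <= 0)
        {
            return null;
        }
        List<String> list = generateOne(null);
        for (int i = 1; i < n; ++i)
        {
            list = generateOne(list);
        }
        return list;
    }

    // Prepends each base block to every string in strList,
    // or returns the base blocks themselves when strList is null or empty.
    public static List<String> generateOne(List<String> strList)
    {
        if ((null == strList) || (0 == strList.size()))
        {
            strList = new ArrayList<String>();
            for (int i = 0; i < base.length; ++i)
            {
                strList.add(base[i]);
            }
            return strList;
        }
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < base.length; ++i)
        {
            for (String str : strList)
            {
                result.add(base[i] + str);
            }
        }
        return result;
    }
}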
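generateN(n) always returns 2^n strings, while the question asks for exactly "number" of them. Below is a minimal sketch of the question's getStringsInSameHashCode built from the same "Aa"/"BB" blocks. It is one possible approach, not the answerer's code, and the class name SameHashStrings is hypothetical. The bits of each index select the blocks, so the output has the requested length, every string has the same length, and all strings are distinct.

```java
import java.util.Arrays;

public class SameHashStrings {
    // "Aa" and "BB" both hash to 2112 and have the same length, so any sequence
    // of k such blocks hashes to the same value as any other sequence of k blocks.
    private static final String[] BLOCKS = {"Aa", "BB"};

    static String[] getStringsInSameHashCode(int number) {
        if (number < 0) {
            throw new IllegalArgumentException("number must be >= 0");
        }
        // Use enough blocks that 2^k >= number (at least one block).
        int k = 1;
        while (k < 31 && (1 << k) < number) {
            k++;
        }
        String[] result = new String[number];
        for (int i = 0; i < number; i++) {
            StringBuilder sb = new StringBuilder(2 * k);
            // Bit j of i selects the block at that position, so distinct i give distinct strings.
            for (int j = k - 1; j >= 0; j--) {
                sb.append(BLOCKS[(i >>> j) & 1]);
            }
            result[i] = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] strings = getStringsInSameHashCode(5);
        for (String s : strings) {
            System.out.println(s + " " + s.hashCode());
        }
        System.out.println(Arrays.toString(strings));
    }
}
```

For number = 5 this yields AaAaAa, AaAaBB, AaBBAa, AaBBBB and BBAaAa, which all hash to 1952508096 (the same value as in the testN output above).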
Look at String.hashCode() (this is the implementation from older JDKs such as Java 6, where String still had offset and count fields):
public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
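The loop above computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with int overflow. A consequence is that hash(a + b) == hash(a)*31^len(b) + hash(b). This is why swapping a block for another block of equal length and equal hash leaves the whole string's hash unchanged. Below is a minimal sketch that checks both facts. The names PolyHash, polyHash and pow31 are hypothetical helpers.

```java
public class PolyHash {
    // Re-computes String.hashCode() from its definition:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int (mod 2^32) overflow.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // 31^n with int overflow, matching the arithmetic hashCode() uses.
    static int pow31(int n) {
        int p = 1;
        for (int i = 0; i < n; i++) {
            p *= 31;
        }
        return p;
    }

    public static void main(String[] args) {
        String a = "hello", b = "world";
        System.out.println(polyHash(a) == a.hashCode());
        // hash(a + b) == hash(a) * 31^len(b) + hash(b)
        System.out.println((a + b).hashCode() == a.hashCode() * pow31(b.length()) + b.hashCode());
    }
}
```

Both lines print true.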
Answer by yelliver
I think finding an equal-hash string for a long string directly is too hard, but it is easy for a short string (2 or 3 characters). Look at the equation below. (Sorry, I can't post an image because I'm a new member.)
Notice that "FB" and "Ea" have the same hashcode, so any two strings like s1+"FB"+s2 and s1+"Ea"+s2 also have the same hashcode: the two substrings have equal length, so they are multiplied by the same power of 31. So the easy solution is to take any 2-char substring of an existing string and replace it with a different 2-char substring that has the same hashcode.
For example, take the string "helloworld" and its 2-char substring "he": hashcode("he") = 'h'*31 + 'e' = ('h'*31 + 31) + ('e' - 31) = ('h'+1)*31 + 'F' = 'i'*31 + 'F' = hashcode("iF"), so the desired string is "iFlloworld". Here we increased 'h' by 1. We can also increase it by 2, 3, etc. (decreasing 'e' by 31 each time), but the result will be wrong if either char value goes outside the char range.
The code below works well for small levels, but large levels make the char values overflow (I can fix that later if you want). This code changes the first 2 chars, but I plan to change it to use the last 2 chars, because the first 2 chars are multiplied by the largest powers of 31.
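public static String samehash(String s, int level) {
    if (s.length() < 2)
        return s;
    String sub2 = s.substring(0, 2);
    // Raising the first char by `level` and lowering the second by 31*level
    // keeps c0*31 + c1, and therefore the whole hashCode, unchanged.
    int n0 = sub2.charAt(0) + level;
    int n1 = sub2.charAt(1) - 31 * level;
    // Guard against char overflow: both values must stay in 0..65535.
    if (n0 < Character.MIN_VALUE || n0 > Character.MAX_VALUE
            || n1 < Character.MIN_VALUE || n1 > Character.MAX_VALUE)
        throw new IllegalArgumentException("level out of range for this string");
    String newsub2 = new String(new char[] { (char) n0, (char) n1 });
    String re = newsub2 + s.substring(2);
    return re;
}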
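To see how many collisions one seed string yields with this trick, the sketch below tries every level for the first pair and keeps only those where both chars stay in range. It is a self-contained illustration, not the answerer's code, and the names LevelCollisions, shiftFirstPair and allVariants are hypothetical. For "helloworld" the valid levels run from -104 (the first char drops to 0) up to 3 (the second char would go negative at level 4). That gives 108 strings, including the original, all with the same hashCode(). Many of them contain non-printable chars, which may be awkward to pass on a command line.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelCollisions {
    // Same idea as samehash above: raise the first char by `level` and lower
    // the second by 31*level, which leaves c0*31 + c1 unchanged.
    // Returns null when s is too short or either char would leave 0..65535.
    static String shiftFirstPair(String s, int level) {
        if (s.length() < 2) {
            return null;
        }
        int c0 = s.charAt(0) + level;
        int c1 = s.charAt(1) - 31 * level;
        if (c0 < 0 || c0 > Character.MAX_VALUE || c1 < 0 || c1 > Character.MAX_VALUE) {
            return null;
        }
        return "" + (char) c0 + (char) c1 + s.substring(2);
    }

    // Collects the strings for every valid level (negative, zero and positive).
    static List<String> allVariants(String s) {
        List<String> out = new ArrayList<>();
        for (int level = -Character.MAX_VALUE; level <= Character.MAX_VALUE; level++) {
            String t = shiftFirstPair(s, level);
            if (t != null) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String seed = "helloworld";
        System.out.println(shiftFirstPair(seed, 1));
        List<String> variants = allVariants(seed);
        System.out.println(variants.size() + " strings, all with hash " + seed.hashCode());
    }
}
```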
Answer by Stephen C
I was wondering if there was a "universal" solution, e.g. some constant string XYZ such that s.hashCode() == (s + XYZ).hashCode() for any string s. Finding such a string involves solving a fairly complicated equation ... which was beyond my rusty mathematical ability. But then it dawned on me that h == 31*h + ch is always true when h and ch are both zero!
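That identity means a leading NUL character ('\0') leaves the running hash at 0. Any number of leading NULs can therefore be prepended without changing hashCode(), and each count gives a distinct string. The sketch below illustrates this; the names NulPrefixCollisions and withNulPrefixes are hypothetical. Note that NUL characters generally cannot be passed as command-line arguments on POSIX systems, because arguments are NUL-terminated.

```java
import java.util.ArrayList;
import java.util.List;

public class NulPrefixCollisions {
    // Returns `count` strings: s prefixed by 0, 1, ..., count-1 NUL characters.
    // Each leading '\0' runs h = 31*h + 0 while h is still 0, so it never
    // changes the final hashCode().
    static List<String> withNulPrefixes(String s, int count) {
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < count; i++) {
            out.add(prefix + s);
            prefix.append('\0');
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : withNulPrefixes("hello", 4)) {
            System.out.println(t.length() + " " + t.hashCode());
        }
    }
}
```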
Based on that insight, the following method should create a different String with the same hashcode as its argument:
public String collider(String s) {
    return "\0" + s;
}
If NUL characters are problematic for you, prepending any string whose hashcode is zero would work too ... albeit that the colliding strings would be longer than if you used zero.
Answer by Male
You can instrument the java.lang.String class so that its hashCode() method always returns the same number.
I suppose Javassist is the easiest way to do this kind of instrumentation.
In short:
- obtain an instance of java.lang.instrument.Instrumentation by using a Java agent (see the java.lang.instrument package documentation for details)
- redefine the java.lang.String class by using the Instrumentation.redefineClasses(ClassDefinition[]) method
The code will look roughly like this:
ClassPool classPool = new ClassPool(true);
CtClass stringClass = classPool.get("java.lang.String");
CtMethod hashCodeMethod = stringClass.getDeclaredMethod("hashCode", null);
hashCodeMethod.setBody("{return 0;}");
byte[] bytes = stringClass.toBytecode();
ClassDefinition[] classDefinitions = new ClassDefinition[] {new ClassDefinition(String.class, bytes)};
instrumentation.redefineClasses(classDefinitions); // this Instrumentation instance can be obtained via a Java agent
Also, don't forget that the agent's manifest file must specify Can-Redefine-Classes: true to be able to use the redefineClasses(ClassDefinition[]) method.
Answer by Code-Apprentice
Will this work for you? It just creates a lot of copies of the same String literal that you can then use in your testing.
String s = "Some String";
for (int i = 0; i < SOME_VERY_BIG_NUMBER; ++i) {
    String copy = new String(s);
    // Do something with copy.
}