javascript 如何检查字符串是否完全由相同的子字符串组成?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/55823298/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How do I check if a string is entirely made of the same substring?
提问by Maheer Ali
I have to create a function which takes a string, and it should return trueor falsebased on whether the input consists of a repeated character sequence. The length of the given string is always greater than 1and the character sequence must have at least one repetition.
我必须创建一个接受字符串的函数,它应该返回true或false基于输入是否包含重复的字符序列。给定字符串的长度总是大于1并且字符序列必须至少有一次重复。
"aa" // true(entirely contains two strings "a")
"aaa" //true(entirely contains three string "a")
"abcabcabc" //true(entirely containas three strings "abc")
"aba" //false(At least there should be two same substrings and nothing more)
"ababa" //false("ab" exists twice but "a" is extra so false)
I have created the below function:
我创建了以下函数:
function check(str){
if(!(str.length && str.length - 1)) return false;
let temp = '';
for(let i = 0;i<=str.length/2;i++){
temp += str[i]
//console.log(str.replace(new RegExp(temp,"g"),''))
if(!str.replace(new RegExp(temp,"g"),'')) return true;
}
return false;
}
console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false
Checking of this is part of the real problem. I can't afford a non-efficient solution like this. First of all, it's looping through half of the string.
检查这一点是真正问题的一部分。我负担不起这样的低效解决方案。首先,它循环遍历字符串的一半。
The second problem is that it is using replace()in each loop which makes it slow. Is there a better solution regarding performance?
第二个问题是它replace()在每个循环中使用,这使它变慢。有没有更好的性能解决方案?
回答by templatetypedef
There's a nifty little theorem about strings like these.
关于像这样的字符串有一个漂亮的小定理。
A string consists of the same pattern repeated multiple times if and only if the string is a nontrivial rotation of itself.
一个字符串由多次重复的相同模式组成,当且仅当该字符串是其自身的非平凡旋转。
Here, a rotation means deleting some number of characters from the front of the string and moving them to the back. For example, the string hellocould be rotated to form any of these strings:
在这里,旋转意味着从字符串的前面删除一些字符并将它们移到后面。例如,hello可以旋转字符串以形成以下任何字符串:
hello (the trivial rotation)
elloh
llohe
lohel
ohell
To see why this works, first, assume that a string consists of k repeated copies of a string w. Then deleting the first copy of the repeated pattern (w) from the front of the string and tacking it onto the back will give back the same string. The reverse direction is a bit trickier to prove, but the idea is that if you rotate a string and get back what you started with, you can apply that rotation repeatedly to tile the string with multiple copies of the same pattern (that pattern being the string you needed to move to the end to do the rotation).
要了解为什么会这样,首先,假设一个字符串由字符串 w 的 k 个重复副本组成。然后从字符串的前面删除重复模式 (w) 的第一个副本并将其添加到后面将返回相同的字符串。证明相反的方向有点棘手,但这个想法是,如果你旋转一个字符串并回到你开始的地方,你可以重复应用该旋转来用相同模式的多个副本平铺字符串(该模式是您需要移动到最后进行旋转的字符串)。
Now the question is how to check whether this is the case. For that, there's another beautiful theorem we can use:
现在的问题是如何检查是否是这种情况。为此,我们可以使用另一个美丽的定理:
If x and y are strings of the same length, then x is a rotation of y if and only if x is a substring of yy.
如果 x 和 y 是相同长度的字符串,则 x 是 y 的旋转当且仅当 x 是 yy 的子串。
As an example, we can see that lohelis a rotation of helloas follows:
作为一个例子,我们可以看到这lohel是一个hello如下旋转:
hellohello
^^^^^
In our case, we know that every string x will always be a substring of xx (it'll appear twice, once at each copy of x). So basically we just need to check if our string x is a substring of xx without allowing it to match at the first or halfway character. Here's a one-liner for that:
在我们的例子中,我们知道每个字符串 x 将始终是 xx 的子字符串(它会出现两次,在 x 的每个副本上出现一次)。所以基本上我们只需要检查我们的字符串 x 是否是 xx 的子字符串,而不允许它在第一个或中间字符处匹配。这是一个单线:
function check(str) {
return (str + str).indexOf(str, 1) !== str.length;
}
Assuming indexOfis implemented using a fast string matching algorithm, this will run in time O(n), where n is the length of the input string.
假设indexOf使用快速字符串匹配算法实现,这将在 O(n) 时间内运行,其中 n 是输入字符串的长度。
Hope this helps!
希望这可以帮助!
回答by Pranav C Balan
You can do it by a capturing groupand backreference. Just check it's the repetition of the first captured value.
您可以通过捕获组和反向引用来完成。只需检查它是第一个捕获值的重复。
function check(str) {
return /^(.+)+$/.test(str)
}
console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false
In the above RegExp:
在上面的正则表达式中:
^and$stands for start and end anchorsto predict the position.(.+)captures any pattern and captures the value(except\n).\1is backreference of first captured value and\1+would check for repetition of captured value.
^并$代表开始和结束锚点来预测位置。(.+)捕获任何模式并捕获值(除了\n)。\1是第一个捕获值的反向引用,\1+将检查捕获值的重复。
For RegExp debugging use: https://regex101.com/r/pqlAuP/1/debugger
对于 RegExp 调试使用:https: //regex101.com/r/pqlAuP/1/debugger
Performance : https://jsperf.com/reegx-and-loop/13
回答by MBo
Perhaps the fastest algorithmic approach is building a Z-functionin linear time:
也许最快的算法方法是在线性时间内构建Z 函数:
The Z-function for this string is an array of length n where the i-th element is equal to the greatest number of characters starting from the position i that coincide with the first characters of s.
In other words, z[i] is the length of the longest common prefix between s and the suffix of s starting at i.
该字符串的 Z 函数是一个长度为 n 的数组,其中第 i 个元素等于从与 s 的第一个字符重合的位置 i 开始的最大字符数。
换句话说,z[i] 是 s 和从 i 开始的 s 后缀之间最长公共前缀的长度。
C++ implementation for reference:
C++ 实现供参考:
vector<int> z_function(string s) {
int n = (int) s.length();
vector<int> z(n);
for (int i = 1, l = 0, r = 0; i < n; ++i) {
if (i <= r)
z[i] = min (r - i + 1, z[i - l]);
while (i + z[i] < n && s[z[i]] == s[i + z[i]])
++z[i];
if (i + z[i] - 1 > r)
l = i, r = i + z[i] - 1;
}
return z;
}
JavaScript implementation
Added optimizations - building a half of z-array and early exit
JavaScript 实现
添加优化 - 构建一半 z-array 并提前退出
function z_function(s) {
var n = s.length;
var z = Array(n).fill(0);
var i, l, r;
//for our task we need only a half of z-array
for (i = 1, l = 0, r = 0; i <= n/2; ++i) {
if (i <= r)
z[i] = Math.min(r - i + 1, z[i - l]);
while (i + z[i] < n && s[z[i]] == s[i + z[i]])
++z[i];
//we can check condition and return here
if (z[i] + i === n && n % i === 0) return true;
if (i + z[i] - 1 > r)
l = i, r = i + z[i] - 1;
}
return false;
//return z.some((zi, i) => (i + zi) === n && n % i === 0);
}
console.log(z_function("abacabacabac"));
console.log(z_function("abcab"));
Then you need to check indexes ithat divide n. If you find such ithat i+z[i]=nthen the string scan be compressed to the length iand you can return true.
然后你需要检查i除 n 的索引。如果你发现这样i,i+z[i]=n那么字符串s可以被压缩到长度i,你可以返回true.
For example, for
例如,对于
string s= 'abacabacabac' with length n=12`
z-array is
z数组是
(0, 0, 1, 0, 8, 0, 1, 0, 4, 0, 1, 0)
and we can find that for
我们可以发现对于
i=4
i+z[i] = 4 + 8 = 12 = n
and
n % i = 12 % 4 = 0`
so smight be represented as substring of length 4 repeated three times.
sos可能表示为重复 3 次的长度为 4 的子串。
回答by user42723
I read the answer of gnasher729 and implemented it. The idea is that if there are any repetitions, then there must be (also) a prime number of repetitions.
我阅读了 gnasher729 的答案并实现了它。这个想法是,如果有任何重复,那么必须(也)有质数的重复。
function* primeFactors (n) {
for (var k = 2; k*k <= n; k++) {
if (n % k == 0) {
yield k
do {n /= k} while (n % k == 0)
}
}
if (n > 1) yield n
}
function check (str) {
var n = str.length
primeloop:
for (var p of primeFactors(n)) {
var l = n/p
var s = str.substring(0, l)
for (var j=1; j<p; j++) {
if (s != str.substring(l*j, l*(j+1))) continue primeloop
}
return true
}
return false
}
A slightly different algorithm is this:
一个稍微不同的算法是这样的:
function check (str) {
var n = str.length
for (var p of primeFactors(n)) {
var l = n/p
if (str.substring(0, n-l) == str.substring(l)) return true
}
return false
}
I've updated the jsPerf pagethat contains the algorithms used on this page.
我更新了jsPerf 页面,其中包含此页面上使用的算法。
回答by gnasher729
Assume the string S has length N and is made of duplicates of the substring s, then the length of s divides N. For example, if S has length 15, then the substring has length 1, 3, or 5.
假设字符串 S 的长度为 N 并且由子字符串 s 的副本组成,那么 s 的长度除以 N。例如,如果 S 的长度为 15,则子字符串的长度为 1、3 或 5。
Let S be made of (p*q) copies of s. Then S is also made of p copies of (s, repeated q times). We have therefore two cases: If N is prime or 1, then S can only be made of copies of the substring of length 1. If N is composite, then we only need to check substrings s of length N / p for primes p dividing the length of S.
让 S 由 (p*q) 个 s 副本组成。然后 S 也由 (s, 重复 q 次) 的 p 个副本组成。因此我们有两种情况:如果 N 是素数或 1,那么 S 只能由长度为 1 的子串的副本组成。如果 N 是复合的,那么我们只需要检查长度为 N / p 的子串 s 的素数 p 除法S 的长度。
So determine N = the length of S, then find all its prime factors in time O (sqrt (N)). If there is only one factor N, check if S is the same string repeated N times, otherwise for each prime factor p, check if S consists of p repeations of the first N / p characters.
所以确定N=S的长度,然后在时间O(sqrt(N))中找到它的所有质因数。如果只有一个因子 N,则检查 S 是否是重复 N 次的同一个字符串,否则对于每个质因子 p,检查 S 是否由前 N / p 个字符的 p 次重复组成。
回答by Axel Podehl
I think a recursive function might be very fast as well. The first observation is that the maximum repeated pattern length is half as long as the total string. And we could just test all possible repeated pattern lengths: 1, 2, 3, ..., str.length/2
我认为递归函数也可能非常快。第一个观察结果是最大重复模式长度是总字符串长度的一半。我们可以测试所有可能的重复模式长度:1, 2, 3, ..., str.length/2
The recursive function isRepeating(p,str) tests if this pattern is repeated in str.
递归函数 isRepeating(p,str) 测试此模式是否在 str 中重复。
If str is longer than the pattern, the recursion requires the first part (same length as p) to be a repetition as well as the remainder of str. So str is effectively broken up into pieces of length p.length.
如果 str 比模式长,则递归要求第一部分(与 p 长度相同)以及 str 的其余部分重复。所以 str 被有效地分解成长度为 p.length 的片段。
If the tested pattern and str are of equal size, recursion ends here, successfully.
如果测试的模式和 str 大小相等,递归到此结束,成功。
If the length is different (happens for "aba" and pattern "ab") or if the pieces are different, then false is returned, propagating up the recursion.
如果长度不同(发生在 "aba" 和模式 "ab" 中)或者如果片段不同,则返回 false,向上传播递归。
function check(str)
{
if( str.length==1 ) return true; // trivial case
for( var i=1;i<=str.length/2;i++ ) { // biggest possible repeated pattern has length/2 characters
if( str.length%i!=0 ) continue; // pattern of size i doesn't fit
var p = str.substring(0, i);
if( isRepeating(p,str) ) return true;
}
return false;
}
function isRepeating(p, str)
{
if( str.length>p.length ) { // maybe more than 2 occurences
var left = str.substring(0,p.length);
var right = str.substring(p.length, str.length);
return left===p && isRepeating(p,right);
}
return str===p;
}
console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false
Performance: https://jsperf.com/reegx-and-loop/13
回答by JustABeginner
Wrote this in Python. I know it is not the platform, but it did take 30 mins of time. P.S.=> PYTHON
这是用 Python 写的。我知道这不是平台,但确实花了 30 分钟的时间。PS=> 蟒蛇
def checkString(string):
gap = 1
index= 0
while index < len(string)/2:
value = [string[i:i+gap] for i in range(0,len(string),gap) ]
x = [string[:gap]==eachVal for eachVal in value]
if all(x):
print("THEY ARE EQUAL")
break
gap = gap+1
index= index+1
checkString("aaeaaeaaeaae")
回答by SunKnight0
My approach is similar to gnasher729, in that it uses the potential length of the substring as the main focus, but it is less math-y and process intensive:
我的方法类似于 gnasher729,因为它使用子串的潜在长度作为主要焦点,但它的数学和过程密集程度较低:
L: Length of original string
L:原始字符串的长度
S: Potential lengths of valid sub-strings
S:有效子串的潜在长度
Loop S from (integer part of) L/2 to 1. If L/S is an integer check your original string against the fist S characters of the original string repeated L/S times.
从 L/2 的(整数部分)到 1 循环 S。如果 L/S 是整数,请检查原始字符串与重复 L/S 次的原始字符串的第一个 S 字符。
The reason for looping from L/2 backwards and not from 1 onwards is to get the largest possible substring. If you want the smallest possible substring loop from 1 to L/2. Example: "abababab" has both "ab" and "abab" as possible substrings. Which of the two would be faster if you only care about a true/false result depends on the type of strings/substrings this will be applied to.
从 L/2 向后循环而不是从 1 开始循环的原因是为了获得最大的可能子串。如果您想要从 1 到 L/2 的最小可能子串循环。示例:“abababab”同时具有“ab”和“abab”作为可能的子字符串。如果您只关心真/假结果,两者中的哪一个会更快取决于这将应用于的字符串/子字符串的类型。
回答by Per Alexandersson
The following Mathematica code almostdetects if the list is repeated at least once. If the string is repeated at least once, it returns true, but it might also return true if the string is a linear combination of repeated strings.
以下 Mathematica 代码几乎可以检测列表是否至少重复一次。如果字符串至少重复一次,则返回 true,但如果字符串是重复字符串的线性组合,则它也可能返回 true。
IsRepeatedQ[list_] := Module[{n = Length@list},
Round@N@Sum[list[[i]] Exp[2 Pi I i/n], {i, n}] == 0
];
This code looks for the "full-length" contribution, which must be zero in a repeating string, but the string accbbdis also considered repeated,
as it is a sum of the two repeated strings abababand 012012.
此代码查找“全长”贡献,它在重复字符串中必须为零,但该字符串accbbd也被视为重复,因为它是两个重复字符串ababab和的总和012012。
The idea is to use Fast Fourier Transform, and look for the frequency spectra. By looking at other frequencies, one should be able to detect this strange scenario as well.
这个想法是使用快速傅立叶变换,并寻找频谱。通过查看其他频率,人们也应该能够检测到这种奇怪的情况。
回答by Austin Mullins
The basic idea here is to examine any potential substring, beginning at length 1 and stopping at half of the original string's length. We only look at substring lengths that divide the original string length evenly (i.e. str.length % substring.length == 0).
这里的基本思想是检查任何潜在的子字符串,从长度 1 开始,到原始字符串长度的一半停止。我们只看平均划分原始字符串长度的子字符串长度(即 str.length % substring.length == 0)。
This implementation looks at the first character of each possible substring iteration before moving to the second character, which might save time if the substrings are expected to be long. If no mismatch is found after examining the entire substring, then we return true.
此实现在移动到第二个字符之前查看每个可能的子字符串迭代的第一个字符,如果子字符串预计很长,这可能会节省时间。如果在检查整个子字符串后没有发现不匹配,那么我们返回 true。
We return false when we run out of potential substrings to check.
当我们用完要检查的潜在子字符串时,我们返回 false。
function check(str) {
const len = str.length;
for (let subl = 1; subl <= len/2; ++subl) {
if ((len % subl != 0) || str[0] != str[subl])
continue;
let i = 1;
for (; i < subl; ++i)
{
let j = 0;
for (; j < len; j += subl)
if (str[i] != str[j + i])
break;
if (j != len)
break;
}
if (i == subl)
return true;
}
return false;
}
console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

