Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/5151483/
How to find the number of distinct subsequences of a string?
Asked by IVlad
Here is another SPOJ problem that asks how to find the number of distinct subsequences of a string.
For example,
Input
AAA
ABCDEFG
CODECRAFT
Output
4
128
496
How can I solve this problem?
Answer by IVlad
It's a classic dynamic programming problem.
Let:
dp[i] = number of distinct subsequences ending with a[i]
sum[i] = dp[0] + dp[1] + ... + dp[i], so sum[n] will be your answer.
last[c] = last position of character c in the given string so far (0 if c has not appeared yet).
An empty string has one subsequence (the empty one), so dp[0] = sum[0] = 1.
read a
n = strlen(a)
sum[0] = dp[0] = 1
for i = 1 to n
    dp[i] = sum[i - 1] - sum[last[a[i]] - 1]    // take sum[-1] as 0 when a[i] has not appeared before
    sum[i] = sum[i - 1] + dp[i]
    last[a[i]] = i
return sum[n]
Explanation
dp[i] = sum[i - 1] - sum[last[a[i]] - 1]
Initially, we assume we can append a[i] to all subsequences ending on previous characters (and to the empty subsequence), but this might violate the condition that the counted subsequences need to be distinct. Remember that last[a[i]] gives us the last position at which a[i] has appeared so far. The only subsequences we overcount are those that the previous a[i] was appended to, so we subtract those.
sum[i] = sum[i - 1] + dp[i]
last[a[i]] = i
Update these values as per their definition.
If your indexing starts from 0, use a[i - 1] wherever I used a[i]. Also remember to wrap your computations in a mod function if you're going to submit code. This should be implemented like this:
mod(x) = (x % m + m) % m
in order to correctly handle negative values in some languages (such as C/C++).
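The recurrence and the mod wrapper described in this answer could be put together roughly as follows. This is only a sketch: the function names `mod` and `countDistinctSubsequences`, the 256-entry `last` table and the default modulus 1000000007 are assumptions made for illustration, not part of the answer, and the string is read with 0-based indexing (so `a[i - 1]` stands in for `a[i]`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: maps x into [0, m) even when x is negative.
long long mod(long long x, long long m) {
    assert(m > 0);
    return (x % m + m) % m;
}

// Counts distinct subsequences of a (the empty one included) modulo m.
// The default modulus is an assumption; use the one the judge asks for.
long long countDistinctSubsequences(const std::string& a,
                                    long long m = 1000000007LL) {
    const std::size_t n = a.size();
    // sum[i] = dp[0] + ... + dp[i], with dp[0] = 1 for the empty subsequence.
    std::vector<long long> sum(n + 1, 0);
    sum[0] = mod(1, m);
    // last[c] = 1-based position of the latest occurrence of c, 0 if unseen.
    std::vector<std::size_t> last(256, 0);
    for (std::size_t i = 1; i <= n; ++i) {
        const unsigned char c = static_cast<unsigned char>(a[i - 1]);
        // sum[last[c] - 1] is treated as 0 when c has not appeared yet.
        const long long overcounted = last[c] > 0 ? sum[last[c] - 1] : 0;
        const long long dp_i = mod(sum[i - 1] - overcounted, m);
        sum[i] = mod(sum[i - 1] + dp_i, m);
        last[c] = i;
    }
    return sum[n];
}
```

Only the running sums are stored here, since each dp[i] is needed just once to update sum[i].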
Answer by Mostafiz Rahman
There exists an easier solution to this problem.
The idea is: if all characters of the string are distinct, the total number of subsequences is 2^n.
Now, if we find a character that has already occurred before, we should consider only its last occurrence (otherwise the subsequences won't be distinct). So we have to subtract the number of subsequences due to its previous occurrence.
My implementation is like this:
read s
dp[0] = 1
len = strlen(s)
last[256] = {0}   // `last` is indexed by character: 1-based position of its latest occurrence, 0 if unseen
for (i = 1; i <= len; i++)
{
    c = s[i - 1]   // s is 0-indexed, while i counts characters from 1
    dp[i] = (dp[i - 1] * 2)
    if (last[c] > 0) dp[i] = (dp[i] - dp[last[c] - 1])
    last[c] = i
}
// dp[len] is the answer
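A compilable version of the doubling idea above might look like the sketch below. The name `countByDoubling`, the table size of 256 and the default modulus 1000000007 are assumptions added for illustration; the modulus is assumed small enough that doubling a reduced value cannot overflow a 64-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical name. Returns the number of distinct subsequences of s
// (the empty one included) modulo m.
long long countByDoubling(const std::string& s, long long m = 1000000007LL) {
    assert(m > 0);
    const std::size_t len = s.size();
    // dp[i] = number of distinct subsequences of the first i characters.
    std::vector<long long> dp(len + 1, 0);
    dp[0] = 1 % m;
    // last[c] = 1-based position of the latest occurrence of c, 0 if unseen.
    std::vector<std::size_t> last(256, 0);
    for (std::size_t i = 1; i <= len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i - 1]);
        dp[i] = (dp[i - 1] * 2) % m;
        if (last[c] > 0) {
            // Remove what the previous occurrence of c already produced;
            // adding m keeps the result non-negative.
            dp[i] = ((dp[i] - dp[last[c] - 1]) % m + m) % m;
        }
        last[c] = i;
    }
    return dp[len];
}
```

For the sample inputs from the question, `countByDoubling("AAA")` gives 4 and `countByDoubling("CODECRAFT")` gives 496.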
Answer by KPMG
Here is my code:
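#include <cstring>
#include <iostream>
#include <string>
#include <vector>
typedef long long ll;
// Returns the number of distinct subsequences of s[0..n], including the empty one.
// visited[c] is the latest index of character c seen so far (-1 if none);
// L[k] is the number of distinct subsequences of s[0..k-1].
ll fun(const std::string& s, ll visited[256], ll n, ll L[]) {
    ll ans = 0;
    if (n < 0) {
        return 1; // the empty string has exactly one subsequence
    }
    ans = fun(s, visited, n - 1, L);
    L[n] = ans;
    ans = ans * 2; // every earlier subsequence, with and without s[n]
    const int c = static_cast<unsigned char>(s[n]); // avoid negative indices for signed char
    if (visited[c] >= 0) {
        ans -= L[visited[c]]; // drop the duplicates created by the previous occurrence
    }
    visited[c] = n;
    return ans;
}
int main() {
    std::string s;
    std::cin >> s;
    ll n = s.length();
    ll visited[256];
    std::vector<ll> L(n, -1); // std::vector instead of a non-standard variable-length array
    std::memset(visited, -1, sizeof(visited));
    std::cout << fun(s, visited, n - 1, L.data());
    return 0;
}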
Explanation:
I scan from the back of the string, i.e. from the last element to the first, and therefore pass the first n-1 characters on for further scanning in the recursion.
Once n == -1 (equivalently, n < 0), I have reached the empty string and return 1, because the number of subsequences of an empty string is 1.
So, on returning from the recursion, we know that adding the current non-duplicate character to the previous string doubles the number of subsequences. Doubling happens because I can now add this character at the end of all the previous subsequences. So counting the subsequences with and without this character gives double the number of all previous subsequences.
Assuming that the current character is not a duplicate, I multiply the previous number of subsequences by 2.
After the total number of subsequences of the first n-1 characters has been computed, we double it for the first n characters.
But suppose the character currently encountered (the nth character) has already appeared among the first n-1 characters, i.e. it is found within the string s[0..n-1] (note: s[n] is the current character). Then we have to subtract the number of subsequences of the part of s up to (excluding) the position where this character was last encountered. That number has already been computed and stored in L at that position, i.e. L[visited[s[n]]].
E.g. BACA: for the given string, the A in the 4th position has already been encountered before (while returning from the recursion, we first encounter B, then A, then C and at last A), so we deduct the number of subsequences calculated up to (excluding) the A in the 2nd position, which is 2 (because the number of subsequences before that A is 2).
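To follow the BACA example number by number, the same bookkeeping can be written as a loop that records the count for every prefix. This is an illustrative sketch, not the answer's code: `prefixSubsequenceCounts` is a hypothetical name, `counts[k]` plays the role of L[k] (the number of subsequences before index k), and no modulus is applied, so it is only meant for short strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts[k] = number of distinct subsequences
// (empty one included) of the first k characters of s.
std::vector<long long> prefixSubsequenceCounts(const std::string& s) {
    assert(s.size() < 62); // 2^n must fit in a signed 64-bit integer
    std::vector<long long> counts(s.size() + 1, 0);
    counts[0] = 1; // the empty prefix has one subsequence
    std::vector<int> visited(256, -1); // 0-based index of latest occurrence
    for (std::size_t n = 0; n < s.size(); ++n) {
        const unsigned char c = static_cast<unsigned char>(s[n]);
        counts[n + 1] = counts[n] * 2; // with and without s[n]
        if (visited[c] >= 0) {
            // Subtract the count recorded just before the previous occurrence.
            counts[n + 1] -= counts[static_cast<std::size_t>(visited[c])];
        }
        visited[c] = static_cast<int>(n);
    }
    return counts;
}
```

For "BACA" the returned vector is 1, 2, 4, 8, 14: the last step doubles 8 to 16 and then subtracts the 2 recorded before the first A.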
So, every time we have calculated the number of subsequences for the first n-1 characters, we store it in the array L.
Notice: L[k] stores the number of subsequences before the kth index.
I've used the visited array to check whether the character I'm currently at has already been scanned.
On encountering the current character, I update the visited array with the current position n. This needs to be done because we have to exclude the duplicate sequences.
Note: visited[] is initialized with all -1 because the position of any character in the string s is non-negative (0-based indexing).
Summary:
How do you arrive at the number of duplicates? Let's say the current character is at position i and its last occurrence was at position j. Then we will have duplicate subsequences: any subsequence of s[0..j-1] followed by the character at i is identical to the same subsequence followed by the character at j. So, to eliminate these, you subtract the number of subsequences possible up to (excluding) j. Note that L[0] = 1 means that up to (excluding) index 0 there is exactly 1 subsequence (the empty string has 1 subsequence).
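One way to convince yourself that the subtraction removes exactly the duplicates is to compare against a brute-force count on short strings. The sketch below is not from any of the answers; `bruteForceDistinctSubsequences` is a hypothetical name, and it enumerates all 2^n index subsets, so it is only usable for small n.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical checker: builds every subsequence explicitly and lets
// std::set collapse the duplicates. Exponential, for testing only.
std::size_t bruteForceDistinctSubsequences(const std::string& s) {
    assert(s.size() <= 20); // keep 2^n enumeration small
    std::set<std::string> seen;
    const unsigned long total = 1UL << s.size();
    for (unsigned long mask = 0; mask < total; ++mask) {
        std::string candidate;
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (mask & (1UL << i)) {
                candidate.push_back(s[i]); // keep character i in this subset
            }
        }
        seen.insert(candidate); // duplicates are ignored by the set
    }
    return seen.size();
}
```

For "BACA" this finds 14 distinct strings out of 16 subsets; the two collapsed ones are "A" and "BA", which can each be formed with either A.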
Answer by pradip
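// The version I submitted got WA (wrong answer); the likely causes are marked below.
#include <vector>
const long long MOD = 1000000007;
int finding_dist_subs(int len, char data[])
{
    if (len == 0) return 1; // only the empty subsequence
    // dp[i] = number of non-empty distinct subsequences of data[0..i]
    std::vector<long long> dp(len);
    dp[0] = 1;
    for (int i = 1; i < len; i++)
    {
        dp[i] = (dp[i - 1] * 2 + 1) % MOD;
        for (int j = i - 1; j >= 0; j--)
        {
            if (data[i] == data[j])
            {
                // subsequences already produced by the previous occurrence at j
                long long dup = (j != 0) ? dp[j - 1] + 1 : 1;
                // likely cause 1: a plain % can go negative in C/C++, so add MOD first
                dp[i] = ((dp[i] - dup) % MOD + MOD) % MOD;
                break;
            }
        }
    }
    // likely cause 2: the expected output (4 for AAA) also counts the empty subsequence
    return static_cast<int>((dp[len - 1] + 1) % MOD);
}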