Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/3231580/

Time: 2020-09-09 00:48:07  Source: igfitidea

MATLAB: comparison of cell arrays of strings

Tags: matlab, string, comparison, vectorization, cell-array

Asked by Dave

I have two cell arrays of strings, and I want to check if they contain the same strings (they do not have to be in the same order, nor do we know if they are of the same lengths).

For example:

a = {'2' '4' '1' '3'};
b = {'1' '2' '4' '3'};

or

a = {'2' '4' '1' '3' '5'};
b = {'1' '2' '4' '3'};

First I thought of strcmp, but it would require looping over the contents of one cell array and comparing each string against the other array. I also considered ismember, using something like:

ismember(a,b) & ismember(b,a)

but we don't know in advance that they are the same length (different lengths being an obvious case of inequality). So how would you perform this comparison in the most efficient way, without writing too many if/else cases?

Answered by gnovice

You could use the function SETXOR, which will return the values that are not in the intersection of the two cell arrays. If it returns an empty array, then the two cell arrays contain the same values:

arraysAreEqual = isempty(setxor(a,b));
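
As a quick check, here is a minimal sketch that applies this to the two example pairs from the question (the variable names a1, b1, a2 and b2 are only illustrative). Keep in mind that setxor treats its inputs as sets, so repeated strings within one array are ignored.

```matlab
% Example pairs from the question (names a1/b1/a2/b2 are illustrative)
a1 = {'2' '4' '1' '3'};
b1 = {'1' '2' '4' '3'};
a2 = {'2' '4' '1' '3' '5'};
b2 = {'1' '2' '4' '3'};

sameFirst  = isempty(setxor(a1, b1));   % same strings, different order
sameSecond = isempty(setxor(a2, b2));   % a2 has the extra string '5'
extra      = setxor(a2, b2);            % strings found in only one of the arrays
```

When the check fails, the output of setxor itself shows which strings caused the mismatch.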





EDIT: Some performance measures...

Since you were curious about performance measures, I thought I'd test the speed of my solution against the two solutions listed by Amro (which use ISMEMBER and STRCMP/CELLFUN). I first created two large cell arrays:

a = cellstr(num2str((1:10000).'));  % A cell array with 10,000 strings
b = cellstr(num2str((1:10001).'));  % A cell array with 10,001 strings

Next, I ran each solution 100 times to get a mean execution time. Then I swapped a and b and reran it. Here are the results:

    Method     |      Time     |  a and b swapped
---------------+---------------+------------------
Using SETXOR   |   0.0549 sec  |    0.0578 sec
Using ISMEMBER |   0.0856 sec  |    0.0426 sec
Using STRCMP   |       too long to bother ;)

Notice that the SETXOR solution has consistently fast timing. The ISMEMBER solution will actually run slightly faster if a has elements that are not in b. This is due to the short-circuit && which skips the second half of the calculation (because we already know a and b do not contain the same values). However, if all of the values in a are also in b, the ISMEMBER solution is significantly slower.

Answered by Amro

You can still use the ISMEMBER function as you did, with a small modification:

arraysAreEqual = all(ismember(a,b)) && all(ismember(b,a))

Also, you can write the loop version with STRCMP as one line:

arraysAreEqual = all( cellfun(@(s)any(strcmp(s,b)), a) )
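
One caveat, as far as I can tell: this one-liner only tests that every string in a occurs in b, so it also returns true when b holds extra strings. A minimal sketch of a symmetric variant that checks both directions (the helper name isSubset is hypothetical, not part of the answer):

```matlab
a = {'1' '2'};
b = {'1' '2' '3'};          % b has an extra string

% Hypothetical helper: true if every string in x also occurs in y
isSubset = @(x, y) all(cellfun(@(s) any(strcmp(s, y)), x));

oneWay  = isSubset(a, b);                      % true, although the sets differ
bothWay = isSubset(a, b) && isSubset(b, a);    % false once both directions are checked
```

The second direction doubles the work in the worst case, so this does nothing to help the slow timing of the STRCMP approach.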


EDIT: I'm adding a third solution adapted from another SO question:

g = grp2idx([a;b]);
v = all( unique(g(1:numel(a))) == unique(g(numel(a)+1:end)) );
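
A few assumptions seem to be built into this snippet: grp2idx comes from the Statistics Toolbox, [a;b] needs both inputs to be column cell arrays, and the == comparison is not safe when the two arrays contain a different number of distinct strings. Below is a sketch of the same idea without the toolbox, assuming the third output of unique can stand in for grp2idx (only the grouping matters here, not the particular index values), with isequal used so that differing sizes simply give false:

```matlab
a = {'2' '4' '1' '3' '5'};
b = {'1' '2' '4' '3'};

% Third output of unique: a group index for every string in the combined list
[~, ~, g] = unique([a(:); b(:)]);
ga = unique(g(1:numel(a)));         % groups present in a
gb = unique(g(numel(a)+1:end));     % groups present in b

v = isequal(ga, gb);                % false here: '5' occurs only in a
```

Using a(:) and b(:) lets the sketch accept row or column cell arrays alike. This is a swapped-in variant, not the code that was timed in the answer.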

In the same spirit, I performed a timing comparison (using the TIMEIT function):

function perfTests()
    a = cellstr( num2str((1:10000)') );            % 10,000 strings
    b = a( randperm(length(a)) );

    timeit( @() func1(a,b) )
    timeit( @() func2(a,b) )
    timeit( @() func3(a,b) )
    timeit( @() func4(a,b) )
end

function v = func1(a,b)
    v = isempty(setxor(a,b));                      % gnovice's answer
end

function v = func2(a,b)
    v = all(ismember(a,b)) && all(ismember(b,a));
end

function v = func3(a,b)
    v = all( cellfun(@(s)any(strcmp(s,b)), a) );
end

function v = func4(a,b)
    g = grp2idx([a;b]);
    v = all( unique(g(1:numel(a))) == unique(g(numel(a)+1:end)) );
end

The results, in the same order as the functions (times in seconds, lower is better):

ans =
     0.032527
ans =
     0.055853
ans =
       8.6431
ans =
     0.022362

Answered by Mikhail

Take a look at the function intersect.

What MATLAB Help says:

[c, ia, ib] = intersect(a, b) also returns column index vectors ia and ib such that c = a(ia) and b(ib) (or c = a(ia,:) and b(ib,:)).
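
As a rough sketch of how this could answer the original question (this is my reading, not something the answer states): the two arrays hold the same strings when the intersection is as large as each array's set of distinct strings.

```matlab
a = {'2' '4' '1' '3' '5'};
b = {'1' '2' '4' '3'};

[c, ia, ib] = intersect(a, b);      % c holds the strings common to a and b

% c can be rebuilt from either input through the index vectors
fromA = isequal(c(:), reshape(a(ia), [], 1));
fromB = isequal(c(:), reshape(b(ib), [], 1));

% Equal as sets only if nothing falls outside the intersection
arraysAreEqual = numel(c) == numel(unique(a)) && numel(c) == numel(unique(b));
```

The index vectors are not needed for the equality test itself, but they show where each common string sits in a and in b.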
