java codility 训练基因组范围查询

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/19552754/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-12 18:18:00  来源:igfitidea点击:

java codility training Genomic-range-query

javaalgorithm

提问by pshemek

The task is:

任务是:

A non-empty zero-indexed string S is given. String S consists of N characters from the set of upper-case English letters A, C, G, T.

给出了一个非空的零索引字符串 S。字符串 S 由大写英文字母 A、C、G、T 中的 N 个字符组成。

This string actually represents a DNA sequence, and the upper-case letters represent single nucleotides.

这个字符串实际上代表一个DNA序列,大写字母代表单个核苷酸。

You are also given non-empty zero-indexed arrays P and Q consisting of M integers. These arrays represent queries about minimal nucleotides. We represent the letters of string S as integers 1, 2, 3, 4 in arrays P and Q, where A = 1, C = 2, G = 3, T = 4, and we assume that A < C < G < T.

您还将获得由 M 个整数组成的非空零索引数组 P 和 Q。这些数组代表关于最小核苷酸的查询。我们在数组 P 和 Q 中将字符串 S 的字母表示为整数 1、2、3、4,其中 A = 1,C = 2,G = 3,T = 4,我们假设 A < C < G < T .

Query K requires you to find the minimal nucleotide from the range (P[K], Q[K]), 0 ≤ P[i] ≤ Q[i] < N.

查询 K 要求您从 (P[K], Q[K]), 0 ≤ P[i] ≤ Q[i] < N 范围内找到最小核苷酸。

For example, consider string S = GACACCATA and arrays P, Q such that:

例如,考虑字符串 S = GACACCATA 和数组 P, Q 使得:

P[0] = 0    Q[0] = 8
P[1] = 0    Q[1] = 2
P[2] = 4    Q[2] = 5
P[3] = 7    Q[3] = 7

The minimal nucleotides from these ranges are as follows:

这些范围内的最小核苷酸如下:

    (0, 8) is A identified by 1,
    (0, 2) is A identified by 1,
    (4, 5) is C identified by 2,
    (7, 7) is T identified by 4.

Write a function:

写一个函数:

class Solution { public int[] solution(String S, int[] P, int[] Q); } 

that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M characters specifying the consecutive answers to all queries.

给定一个由 N 个字符组成的非空零索引字符串 S 和两个由 M 个整数组成的非空零索引数组 P 和 Q,返回一个由 M 个字符组成的数组,指定所有查询的连续答案。

The sequence should be returned as:

该序列应返回为:

    a Results structure (in C), or
    a vector of integers (in C++), or
    a Results record (in Pascal), or
    an array of integers (in any other programming language).

For example, given the string S = GACACCATA and arrays P, Q such that:

例如,给定字符串 S = GACACCATA 和数组 P, Q 使得:

P[0] = 0    Q[0] = 8
P[1] = 0    Q[1] = 2
P[2] = 4    Q[2] = 5
P[3] = 7    Q[3] = 7

the function should return the values [1, 1, 2, 4], as explained above.

如上所述,该函数应返回值 [1, 1, 2, 4]。

Assume that:

假使,假设:

    N is an integer within the range [1..100,000];
    M is an integer within the range [1..50,000];
    each element of array P, Q is an integer within the range [0..N ? 1];
    P[i] ≤ Q[i];
    string S consists only of upper-case English letters A, C, G, T.

Complexity:

复杂:

    expected worst-case time complexity is O(N+M);
    expected worst-case space complexity is O(N), 
         beyond input storage 
         (not counting the storage required for input arguments).

Elements of input arrays can be modified.

可以修改输入数组的元素。

My solution is:

我的解决办法是:

class Solution {
    public int[] solution(String S, int[] P, int[] Q) {
        final  char c[] = S.toCharArray();
        final int answer[] = new int[P.length];
        int tempAnswer;
        char tempC;

        for (int iii = 0; iii < P.length; iii++) {
            tempAnswer = 4;
            for (int zzz = P[iii]; zzz <= Q[iii]; zzz++) {
                tempC = c[zzz];
                if (tempC == 'A') {
                    tempAnswer = 1;
                    break;
                } else if (tempC == 'C') {
                    if (tempAnswer > 2) {
                        tempAnswer = 2;
                    }
                } else if (tempC == 'G') {
                    if (tempAnswer > 3) {
                        tempAnswer = 3;
                    }

                }
            }
            answer[iii] = tempAnswer;
        }

        return answer;
    }
}

It is not optimal, I believe it's supposed to be done within one loop, any hint how can I achieve it?

这不是最优的,我相信它应该在一个循环内完成,任何提示我如何实现它?

You can check quality of your solution here https://codility.com/train/test name is Genomic-range-query.

您可以在此处检查解决方案的质量https://codility.com/train/测试名称是 Genomic-range-query。

采纳答案by codebusta

Here is the solution that got 100 out of 100 in codility.com. Please read about prefix sums to understand the solution:

这是在 codility.com 中获得 100 分的解决方案。请阅读前缀和以了解解决方案:

public static int[] solveGenomicRange(String S, int[] P, int[] Q) {
        //used jagged array to hold the prefix sums of each A, C and G genoms
        //we don't need to get prefix sums of T, you will see why.
        int[][] genoms = new int[3][S.length()+1];
        //if the char is found in the index i, then we set it to be 1 else they are 0
        //3 short values are needed for this reason
        short a, c, g;
        for (int i=0; i<S.length(); i++) {
            a = 0; c = 0; g = 0;
            if ('A' == (S.charAt(i))) {
                a=1;
            }
            if ('C' == (S.charAt(i))) {
                c=1;
            }
            if ('G' == (S.charAt(i))) {
                g=1;
            }
            //here we calculate prefix sums. To learn what's prefix sums look at here https://codility.com/media/train/3-PrefixSums.pdf
            genoms[0][i+1] = genoms[0][i] + a;
            genoms[1][i+1] = genoms[1][i] + c;
            genoms[2][i+1] = genoms[2][i] + g;
        }

        int[] result = new int[P.length];
        //here we go through the provided P[] and Q[] arrays as intervals
        for (int i=0; i<P.length; i++) {
            int fromIndex = P[i];
            //we need to add 1 to Q[i], 
            //because our genoms[0][0], genoms[1][0] and genoms[2][0]
            //have 0 values by default, look above genoms[0][i+1] = genoms[0][i] + a; 
            int toIndex = Q[i]+1;
            if (genoms[0][toIndex] - genoms[0][fromIndex] > 0) {
                result[i] = 1;
            } else if (genoms[1][toIndex] - genoms[1][fromIndex] > 0) {
                result[i] = 2;
            } else if (genoms[2][toIndex] - genoms[2][fromIndex] > 0) {
                result[i] = 3;
            } else {
                result[i] = 4;
            }
        }

        return result;
    }

回答by pshemek

Here is the solution, supposing someone is still interested.

这是解决方案,假设有人仍然感兴趣。

class Solution {
        public int[] solution(String S, int[] P, int[] Q) {
            int[] answer = new int[P.length];
            char[] chars = S.toCharArray();
            int[][] cumulativeAnswers = new int[4][chars.length + 1];

            for (int iii = 0; iii < chars.length; iii++) {
                if (iii > 0) {
                    for (int zzz = 0; zzz < 4; zzz++) {
                        cumulativeAnswers[zzz][iii + 1] = cumulativeAnswers[zzz][iii];
                    }
                }

                switch (chars[iii]) {
                    case 'A':
                        cumulativeAnswers[0][iii + 1]++;
                        break;
                    case 'C':
                        cumulativeAnswers[1][iii + 1]++;
                        break;
                    case 'G':
                        cumulativeAnswers[2][iii + 1]++;
                        break;
                    case 'T':
                        cumulativeAnswers[3][iii + 1]++;
                        break;
                }
            }

            for (int iii = 0; iii < P.length; iii++) {
                for (int zzz = 0; zzz < 4; zzz++) {

                    if ((cumulativeAnswers[zzz][Q[iii] + 1] - cumulativeAnswers[zzz][P[iii]]) > 0) {
                        answer[iii] = zzz + 1;
                        break;
                    }

                }
            }

            return answer;
        }
    }

回答by SrcXform

pshemek's solution constrains itself to the space complexity (O(N)) - even with the 2-d array and the answer array because a constant (4) is used for the 2-d array. That solution also fits in with the computational complexity - whereas mine is O (N^2) - though the actual computational complexity is much lower because it skips over entire ranges that include minimal values.

pshemek 的解决方案将自身限制为空间复杂度 (O(N)) - 即使使用二维数组和答案数组,因为常量 (4) 用于二维数组。该解决方案也符合计算复杂性 - 而我的是 O (N^2) - 尽管实际计算复杂性要低得多,因为它跳过了包括最小值的整个范围。

I gave it a try - but mine ends up using more space - but makes more intuitive sense to me (C#):

我试了一下 - 但我的最终使用了更多空间 - 但对我来说更直观(C#):

public static int[] solution(String S, int[] P, int[] Q)
{
    const int MinValue = 1;
    Dictionary<char, int> stringValueTable = new Dictionary<char,int>(){ {'A', 1}, {'C', 2}, {'G', 3}, {'T', 4} };

    char[] inputArray = S.ToCharArray();
    int[,] minRangeTable = new int[S.Length, S.Length]; // The meaning of this table is [x, y] where x is the start index and y is the end index and the value is the min range - if 0 then it is the min range (whatever that is)
    for (int startIndex = 0; startIndex < S.Length; ++startIndex)
    {
        int currentMinValue = 4;
        int minValueIndex = -1;
        for (int endIndex = startIndex; (endIndex < S.Length) && (minValueIndex == -1); ++endIndex)
        {
            int currentValue = stringValueTable[inputArray[endIndex]];
            if (currentValue < currentMinValue)
            {
                currentMinValue = currentValue;
                if (currentMinValue == MinValue) // We can stop iterating - because anything with this index in its range will always be minimal
                    minValueIndex = endIndex;
                else
                    minRangeTable[startIndex, endIndex] = currentValue;
            }
            else
                minRangeTable[startIndex, endIndex] = currentValue;
        }

        if (minValueIndex != -1) // Skip over this index - since it is minimal
            startIndex = minValueIndex; // We would have a "+ 1" here - but the "auto-increment" in the for statement will get us past this index
    }

    int[] result = new int[P.Length];
    for (int outputIndex = 0; outputIndex < result.Length; ++outputIndex)
    {
        result[outputIndex] = minRangeTable[P[outputIndex], Q[outputIndex]];
        if (result[outputIndex] == 0) // We could avoid this if we initialized our 2-d array with 1's
            result[outputIndex] = 1;
    }

    return result;
}

In pshemek's answer - the "trick" in the second loop is simply that once you've determined you've found a range with the minimal value - you don't need to continue iterating. Not sure if that helps.

在 pshemek 的回答中 - 第二个循环中的“技巧”很简单,一旦你确定你找到了一个具有最小值的范围 - 你不需要继续迭代。不确定这是否有帮助。

回答by Maruccio

In ruby (100/100)

红宝石 (100/100)

def interval_sum x,y,p
    p[y+1] - p[x]
end

def solution(s,p,q)

    #Hash of arrays with prefix sums

    p_sums = {}
    respuesta = []


    %w(A C G T).each do |letter|
        p_sums[letter] = Array.new s.size+1, 0
    end 

    (0...s.size).each do |count|
        %w(A C G T).each do |letter|
            p_sums[letter][count+1] = p_sums[letter][count] 
        end if count > 0

        case s[count]
        when 'A'
            p_sums['A'][count+1] += 1
        when 'C'
            p_sums['C'][count+1] += 1
        when 'G'
            p_sums['G'][count+1] += 1
        when 'T'
            p_sums['T'][count+1] += 1
        end 

    end




    (0...p.size).each do |count|


        x = p[count]
        y = q[count]


        if interval_sum(x, y, p_sums['A']) > 0 then
            respuesta << 1
            next
        end 

        if interval_sum(x, y, p_sums['C']) > 0 then
            respuesta << 2
            next
        end 

        if interval_sum(x, y, p_sums['G']) > 0 then
            respuesta << 3
            next
        end 

        if interval_sum(x, y, p_sums['T']) > 0 then
            respuesta << 4
            next
        end 

    end

    respuesta

end

回答by Viswanath

import java.util.Arrays;
import java.util.HashMap;
class Solution {

   static HashMap<Character, Integer > characterMapping = new HashMap<Character, Integer>(){{
    put('A',1);
    put('C',2);
    put('G',3);
    put('T',4);
  }};

  public static int minimum(int[] arr) {

    if (arr.length ==1) return arr[0];

    int smallestIndex = 0;
    for (int index = 0; index<arr.length; index++) {
      if (arr[index]<arr[smallestIndex]) smallestIndex=index;
    }
    return arr[smallestIndex];
  }

    public int[] solution(String S, int[] P, int[] Q) {
        final char[] characterInput = S.toCharArray();
    final int[] integerInput = new int[characterInput.length];

    for(int counter=0; counter < characterInput.length; counter++) {
      integerInput[counter] = characterMapping.get(characterInput[counter]);
    }

    int[] result = new int[P.length];

    //assuming P and Q have the same length
    for(int index =0; index<P.length; index++) {

      if (P[index]==Q[index]) {
        result[index] = integerInput[P[index]];
        break;
      }
      final int[] subArray = Arrays.copyOfRange(integerInput, P[index], Q[index]+1);
      final int minimumValue = minimum(subArray);
      result[index]= minimumValue;
    }
    return result;
    }
}

回答by Alexander M.

The php 100/100 solution:

php 100/100 解决方案:

function solution($S, $P, $Q) {
    $S      = str_split($S);
    $len    = count($S);
    $lep    = count($P);
    $arr    = array();
    $result = array();
    $clone  = array_fill(0, 4, 0);
    for($i = 0; $i < $len; $i++){
        $arr[$i] = $clone;
        switch($S[$i]){
            case 'A':
                $arr[$i][0] = 1;
                break;
            case 'C':
                $arr[$i][1] = 1;
                break;
            case 'G':
                $arr[$i][2] = 1;
                break;
            default:
                $arr[$i][3] = 1;
                break;
        }
    }
    for($i = 1; $i < $len; $i++){
        for($j = 0; $j < 4; $j++){
            $arr[$i][$j] += $arr[$i - 1][$j];
        }
    }
    for($i = 0; $i < $lep; $i++){
        $x = $P[$i];
        $y = $Q[$i];
        for($a = 0; $a < 4; $a++){
            $sub = 0;
            if($x - 1 >= 0){
                $sub = $arr[$x - 1][$a];
            }
            if($arr[$y][$a] - $sub > 0){
                $result[$i] = $a + 1;
                break;
            }
        }
    }
    return $result;
}

回答by hrajesh4u

This program has got score 100 and performance wise has got an edge over other java codes listed above!

该程序获得了 100 分,并且在性能方面比上面列出的其他 Java 代码更具优势!

The code can be found here.

代码可以在这里找到。

public class GenomicRange {

final int Index_A=0, Index_C=1, Index_G=2, Index_T=3;
final int A=1, C=2, G=3, T=4; 

public static void main(String[] args) {

    GenomicRange gen = new GenomicRange();
    int[] M = gen.solution( "GACACCATA", new int[] { 0,0,4,7 } , new int[] { 8,2,5,7 } );
    System.out.println(Arrays.toString(M));
} 

public int[] solution(String S, int[] P, int[] Q) {

    int[] M = new int[P.length];
    char[] charArr = S.toCharArray();
    int[][] occCount = new int[3][S.length()+1];

    int charInd = getChar(charArr[0]);

    if(charInd!=3) {
        occCount[charInd][1]++;
    }

    for(int sInd=1; sInd<S.length(); sInd++) {

        charInd = getChar(charArr[sInd]);

        if(charInd!=3)
            occCount[charInd][sInd+1]++;

        occCount[Index_A][sInd+1]+=occCount[Index_A][sInd];
        occCount[Index_C][sInd+1]+=occCount[Index_C][sInd];
        occCount[Index_G][sInd+1]+=occCount[Index_G][sInd];
    }

    for(int i=0;i<P.length;i++) {

        int a,c,g;

        if(Q[i]+1>=occCount[0].length) continue;

        a =  occCount[Index_A][Q[i]+1] - occCount[Index_A][P[i]];
        c =  occCount[Index_C][Q[i]+1] - occCount[Index_C][P[i]];
        g =  occCount[Index_G][Q[i]+1] - occCount[Index_G][P[i]];

        M[i] = a>0? A : c>0 ? C : g>0 ? G : T;    
    }

    return M;
}

private int getChar(char c) {

    return ((c=='A') ? Index_A : ((c=='C') ? Index_C : ((c=='G') ? Index_G : Index_T)));  
}
}

回答by Pedro

Here is a C# solution, the basic idea is pretty much the same as the other answers, but it may be cleaner:

这是一个 C# 解决方案,基本思想与其他答案几乎相同,但它可能更清晰:

using System;

class Solution
{
    public int[] solution(string S, int[] P, int[] Q)
    {
        int N = S.Length;
        int M = P.Length;
        char[] chars = {'A','C','G','T'};

        //Calculate accumulates
        int[,] accum = new int[3, N+1];
        for (int i = 0; i <= 2; i++)
        {
            for (int j = 0; j < N; j++)
            {
                if(S[j] == chars[i]) accum[i, j+1] = accum[i, j] + 1;
                else accum[i, j+1] = accum[i, j];
            }
        }

        //Get minimal nucleotides for the given ranges
        int diff;
        int[] minimums = new int[M];
        for (int i = 0; i < M; i++)
        {
            minimums[i] = 4;
            for (int j = 0; j <= 2; j++)
            {
                diff = accum[j, Q[i]+1] - accum[j, P[i]];
                if (diff > 0)
                {
                    minimums[i] = j+1;
                    break;
                }
            }
        }

        return minimums;
    }
}

回答by ncabral

Here's a simple javascript solution which got 100%.

这是一个 100% 的简单 javascript 解决方案。

function solution(S, P, Q) {
    var A = [];
    var C = [];
    var G = [];
    var T = [];
    var result = [];
    var i = 0;

    S.split('').forEach(function(a) {
        if (a === 'A') {
            A.push(i);
        } else if (a === 'C') {
            C.push(i);
        } else if (a === 'G') {
            G.push(i);
        } else {
            T.push(i);
        }

        i++;
    });

    function hasNucl(typeArray, start, end) {
        return typeArray.some(function(a) {
            return a >= P[j] && a <= Q[j];
        });
    }

    for(var j=0; j<P.length; j++) {
        if (hasNucl(A, P[j], P[j])) {
            result.push(1)
        } else if (hasNucl(C, P[j], P[j])) {
            result.push(2);
        } else if (hasNucl(G, P[j], P[j])) {
            result.push(3);
        } else {
            result.push(4);
        }
    }

    return result;
}

回答by Marcin Wojciechowski

perl 100/100 solution:

perl 100/100 解决方案:

sub solution {
    my ($S, $P, $Q)=@_; my @P=@$P; my @Q=@$Q;

    my @_A = (0), @_C = (0), @_G = (0), @ret =();
    foreach (split //, $S)
    {
        push @_A, $_A[-1] + ($_ eq 'A' ? 1 : 0);
        push @_C, $_C[-1] + ($_ eq 'C' ? 1 : 0);
        push @_G, $_G[-1] + ($_ eq 'G' ? 1 : 0);
    }

    foreach my $i (0..$#P)
    {
        my $from_index = $P[$i];
        my $to_index = $Q[$i] + 1;
        if ( $_A[$to_index] - $_A[$from_index] > 0 )
        {
            push @ret, 1;
            next;
        }
        if ( $_C[$to_index] - $_C[$from_index] > 0 )
        {
            push @ret, 2;
            next;
        }
        if ( $_G[$to_index] - $_G[$from_index] > 0 )
        {
            push @ret, 3;
            next;
        }
        push @ret, 4
    }

    return @ret;
}