JavaScript: separate integers and text in a string

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3370263/

Date: 2020-10-25 00:59:54  Source: igfitidea

separate integers and text in a string

Tags: javascript, regex

Asked by Hacker

I have strings like fullData1 up to fullData10, and I need to separate the integer part from the text part. How do I do this in JavaScript?

Answered by Aiden Bell

Split your string into an array by integer:

myArray = datastring.split(/([0-9]+)/)

Then the first element of myArray will be something like fullData and the second will be the digits, such as 1 or 10.

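If your string was fullData10foo then you would have the array ['fullData', '10', 'foo']. Note that split always returns strings, so the number part still has to be converted if you need a numeric value.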

You could also:

  • .split(/(?=\d+)/) which will yield ["fullData", "1", "0"]

  • .split(/(\d+)/) which will yield ["fullData", "10", ""]

  • Additionally, .filter(Boolean) to get rid of any empty strings ("")
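To compare the variants above side by side, here is a minimal sketch using the sample value from the question. The variable names are illustrative and not part of the original answer.

```javascript
// Sample input taken from the question; variable names are illustrative.
const datastring = "fullData10";

// A capturing group keeps the matched digits in the result, but a match at
// the very end of the string leaves a trailing empty string.
const withCapture = datastring.split(/(\d+)/);

// filter(Boolean) drops the empty strings.
const cleaned = withCapture.filter(Boolean);

// A lookahead splits before every digit, so multi-digit numbers break apart.
const perDigit = datastring.split(/(?=\d+)/);

// Every element is still a string; convert explicitly when a number is needed.
const [text, digits] = cleaned;
const value = Number(digits);

console.log(withCapture, cleaned, perDigit, text, value);
```

The capturing-group form followed by filter(Boolean) is usually the one to reach for here, because it keeps multi-digit numbers together.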

Answered by Kangkan

If the length of the text part is constant, you can simply cut it off with a substring method and keep the remainder as the number.
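As a rough sketch of this idea, assuming every input starts with the same fixed prefix "fullData" as in the question (the names PREFIX_LENGTH and splitFixedPrefix are hypothetical):

```javascript
// Assumption: every input starts with the same 8-character prefix "fullData".
const PREFIX_LENGTH = "fullData".length;

// Hypothetical helper: cut the string at the known prefix length.
function splitFixedPrefix(str) {
    const text = str.slice(0, PREFIX_LENGTH);
    const number = Number(str.slice(PREFIX_LENGTH));
    return [text, number];
}

console.log(splitFixedPrefix("fullData1"));
console.log(splitFixedPrefix("fullData10"));
```

This avoids regular expressions entirely, but it only works while the prefix length really is constant.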

Answered by CodeManX

tl;dr

If the RegExp sticky flag is supported in your JS environment, use it for optimal performance.
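For readers unfamiliar with the sticky flag (y): a sticky regex only attempts a match at exactly lastIndex instead of scanning ahead. The sketch below shows one way to feature-detect it and what the anchoring looks like. The helper names supportsSticky and readDigitsAt are hypothetical.

```javascript
// Hypothetical helper: constructing a RegExp with an unsupported flag throws,
// so a try/catch is a simple feature test.
function supportsSticky() {
    try {
        new RegExp("", "y");
        return true;
    } catch (e) {
        return false;
    }
}

// Hypothetical helper: read a run of digits starting exactly at `index`.
// The constructor form is used so that the script still parses in
// environments that would reject a /.../y literal.
function readDigitsAt(str, index) {
    const re = new RegExp("\\d+", "y");
    re.lastIndex = index;
    const match = re.exec(str);
    return match ? match[0] : null;
}

console.log(supportsSticky());
console.log(readDigitsAt("foo11bar", 3)); // digits start at index 3
console.log(readDigitsAt("foo11bar", 0)); // no digits at index 0, so no match
```

Because the engine never searches past lastIndex, a tokenizer built this way does not need to re-slice the remaining input after each token.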

Benchmark

Here are 8 different implementations to split digits from other characters:

function naturalSplit(str) {
    'use strict';
    let arr = [];
    let split = str.split(/(\d+)/);
    for (let i in split) {
        let s = split[i];
        if (s !== "") {
            if (i % 2) {
                arr.push(+s);
            } else {
                arr.push(s);
            }
        }
    }
    return arr;
}

function naturalSplit2(str) {
    'use strict';
    return str.split(/(\d+)/)
        .map((elem, i) => {
            if (i % 2) {
                return +elem;
            }
            return elem;
        })
        .filter(elem => elem !== "");
}

function naturalSplitMapFilterUnaryPlus(str) {
    'use strict';
    return str.split(/(\d+)/)
        .map((elem, i) => i % 2 ? +elem : elem)
        .filter(elem => elem !== "");
}

function naturalSplitMapFilterNumber(str) {
    'use strict';
    return str.split(/(\d+)/)
        .map((elem, i) => i % 2 ? Number(elem) : elem)
        .filter(elem => elem !== "");
}

function naturalConcat(str) {
    'use strict';
    const arr = [];
    let i = 0;
    while (i < str.length) {
        let token = "";
        while (i < str.length && str[i] >= "0" && str[i] <= "9") {
            token += str[i];
            i++;
        }
        if (token) {
            arr.push(Number(token));
            token = "";
        }
        while (i < str.length && (str[i] < "0" || str[i] > "9")) {
            token += str[i];
            i++;
        }
        if (token) {
            arr.push(token);
        }
    }
    return arr;
}

function naturalMatch(str) {
    'use strict';
    const arr = [];
    const num_re = /^(\D+)?(\d+)?(.*)$/;
    let s = str;
    while (s) {
        const match = s.match(num_re);
        if (!match) {
            break;
        }
        if (match[1]) {
            arr.push(match[1]);
        }
        if (match[2]) {
            arr.push(Number(match[2]));
        }
        s = match[3];
    }
    return arr;
}

function naturalExecSticky(str) {
    'use strict';
    const arr = [];
    const num_re = /(\D+)?(\d+)?/y;
    let match;
    do {
        match = num_re.exec(str);
        if (match[1] !== undefined) {
            arr.push(match[1]);
        }
        if (match[2] !== undefined) {
            arr.push(Number(match[2]));
        }
    } while (match[0]);
    return arr;
}

function naturalSlice(str) {
    'use strict';
    const arr = [];
    let i = 0;
    while (i < str.length) {
        let j = 0;
        while ((i + j) < str.length && str[i + j] >= "0" && str[i + j] <= "9") {
            j++;
        }
        if (j) {
            arr.push(Number(str.substr(i, j)));
            i += j;
            j = 0;
        }
        while ((i + j) < str.length && (str[i + j] < "0" || str[i + j] > "9")) {
            j++;
        }
        if (j) {
            arr.push(str.substr(i, j));
            i += j;
        }
    }
    return arr;
}

const algorithms = [
    naturalSplit,
    naturalSplit2,
    naturalSplitMapFilterUnaryPlus,
    naturalSplitMapFilterNumber,
    naturalConcat,
    naturalSlice,
    naturalMatch,
    naturalExecSticky
];

(function(){
    'use strict';

    let randomTests = [];
    for (let i = 0; i < 100000; i++) {
        randomTests.push({str: Math.random().toString(36).slice(2)});
    }

    const tests = [
        {str: "112233", expect: [112233]},
        {str: "foo bar baz", expect: ["foo bar baz"]},
        {str: "foo11bar22baz", expect: ["foo", 11, "bar", 22, "baz"]},
        {str: "11foo22bar33baz", expect: [11, "foo", 22, "bar", 33, "baz"]},
        {str: "foo11bar22baz33", expect: ["foo", 11, "bar", 22, "baz", 33]},
        {str: "11foo22bar33baz44", expect: [11, "foo", 22, "bar", 33, "baz", 44]},
        {str: "", expect: []},
        //{str: "99999999999999999999999999999999999999999999999999999999999999999999999999999999999", expect: ""}, // number too large for JS = ?
        {str: "Li Europan 0234 lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li lingues differe solmen in li 0.00 grammatica, -1e5 li pronunciation e li plu commun vocabules. Omnicos directe al desirabilite de un nov lingua franca: On refusa continuar payar custosi traductores. At solmen va 8esser necessi far uniform grammatica, pronunciation 025.35 e plu sommun paroles. Ma +234234 quande lingues coalesce, li grammatica del resultant lingue es plu simplic e 432 regulari quam ti del coalescent9 lingues. Li nov 90548 lingua franca va esser plu simplic e 23453 regulari quam li existent 234898234 Europan lingues. It va esser tam simplic23423452349819879234quam Occidental in fact, it va esser Occidental. A un Angleso it va semblar un simplificat Angles, quam un skeptic 89723894 Cambridge amico dit me que Occidental es.Li Europan lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li 3,4,5,6,7,8 lingues differe solmen in li grammatica, li 495 pronunciation e li plu commun -45345 vocabules. Omnicos directe al desirabilite de un nov lingua franca: On refusa continuar payar custosi traductores. At solmen va esser necessi far uniform grammatica, pronunciation e plu sommun paroles.",
        expect: ["Li Europan ", 234, " lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li lingues differe solmen in li ", 0, ".", 0, " grammatica, -", 1, "e", 5, " li pronunciation e li plu commun vocabules. Omnicos directe al desirabilite de un nov lingua franca: On refusa continuar payar custosi traductores. At solmen va ", 8, "esser necessi far uniform grammatica, pronunciation ", 25, ".", 35, " e plu sommun paroles. Ma +", 234234, " quande lingues coalesce, li grammatica del resultant lingue es plu simplic e ", 432, " regulari quam ti del coalescent", 9, " lingues. Li nov ", 90548, " lingua franca va esser plu simplic e ", 23453, " regulari quam li existent ", 234898234, " Europan lingues. It va esser tam simplic", 23423452349819879234, "quam Occidental in fact, it va esser Occidental. A un Angleso it va semblar un simplificat Angles, quam un skeptic ", 89723894, " Cambridge amico dit me que Occidental es.Li Europan lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li ", 3, ",", 4, ",", 5, ",", 6, ",", 7, ",", 8, " lingues differe solmen in li grammatica, li ", 495, " pronunciation e li plu commun -", 45345, " vocabules. Omnicos directe al desirabilite de un nov lingua franca: On refusa continuar payar custosi traductores. At solmen va esser necessi far uniform grammatica, pronunciation e plu sommun paroles."]}
    ];

    for (let t of tests) {
        console.log('\nTest "' + t.str.slice(0, 20) + '"');
        for (let f of algorithms) {
            console.time(f.name);
            for (let i = 0; i < 1000; i++) {
                let result = f(t.str);
            }
            console.timeEnd(f.name);
        }
    }
    console.log('\nRandom tests')
    for (let f of algorithms) {
        console.time(f.name);
        for (let r of randomTests) {
            let result = f(r.str);
        }
        console.timeEnd(f.name);
    }
})();
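Note that the benchmark loop only times the calls; the expect values in the tests array are never compared against the results. A minimal sketch of a correctness check follows, assuming a JSON.stringify comparison is good enough for arrays of strings and finite numbers. The names splitSketch, verify and sampleCases are hypothetical, and splitSketch merely stands in for any of the algorithms.

```javascript
// Stand-in for one of the benchmarked functions (same approach as the
// split/map/filter variants).
function splitSketch(str) {
    return str.split(/(\d+)/)
        .map((elem, i) => (i % 2 ? Number(elem) : elem))
        .filter(elem => elem !== "");
}

// Hypothetical helper: run fn over the cases and collect every mismatch.
function verify(fn, cases) {
    const failures = [];
    for (const { str, expect } of cases) {
        const actual = fn(str);
        // JSON comparison is sufficient for flat arrays of strings and numbers.
        if (JSON.stringify(actual) !== JSON.stringify(expect)) {
            failures.push({ str, expect, actual });
        }
    }
    return failures;
}

// A few of the shorter cases from the benchmark.
const sampleCases = [
    { str: "112233", expect: [112233] },
    { str: "foo bar baz", expect: ["foo bar baz"] },
    { str: "11foo22bar33baz", expect: [11, "foo", 22, "bar", 33, "baz"] },
    { str: "", expect: [] }
];

const failures = verify(splitSketch, sampleCases);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

Running such a check once per algorithm before timing it guards against benchmarking a function that is fast but wrong.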

My test results

Using NodeJS 5.11.0 with --harmony_regexps --regexp-optimization:

Test "112233"
naturalSplit: 2.817ms
naturalSplit2: 3.033ms
naturalSplitMapFilterUnaryPlus: 3.199ms
naturalSplitMapFilterNumber: 1.910ms
naturalConcat: 0.876ms
naturalSlice: 1.274ms
naturalMatch: 0.960ms
naturalExecSticky: 0.863ms

Test "foo bar baz"
naturalSplit: 1.072ms
naturalSplit2: 0.839ms
naturalSplitMapFilterUnaryPlus: 0.800ms
naturalSplitMapFilterNumber: 0.802ms
naturalConcat: 0.952ms
naturalSlice: 0.697ms
naturalMatch: 0.577ms
naturalExecSticky: 1.329ms

Test "foo11bar22baz"
naturalSplit: 3.410ms
naturalSplit2: 2.398ms
naturalSplitMapFilterUnaryPlus: 2.083ms
naturalSplitMapFilterNumber: 6.107ms
naturalConcat: 1.627ms
naturalSlice: 1.633ms
naturalMatch: 2.070ms
naturalExecSticky: 1.697ms

Test "11foo22bar33baz"
naturalSplit: 3.572ms
naturalSplit2: 2.805ms
naturalSplitMapFilterUnaryPlus: 2.691ms
naturalSplitMapFilterNumber: 2.570ms
naturalConcat: 1.990ms
naturalSlice: 1.983ms
naturalMatch: 2.474ms
naturalExecSticky: 1.591ms

Test "foo11bar22baz33"
naturalSplit: 3.439ms
naturalSplit2: 2.637ms
naturalSplitMapFilterUnaryPlus: 2.613ms
naturalSplitMapFilterNumber: 4.554ms
naturalConcat: 1.958ms
naturalSlice: 2.002ms
naturalMatch: 0.686ms
naturalExecSticky: 0.792ms

Test "11foo22bar33baz44"
naturalSplit: 3.916ms
naturalSplit2: 2.824ms
naturalSplitMapFilterUnaryPlus: 2.843ms
naturalSplitMapFilterNumber: 2.685ms
naturalConcat: 2.164ms
naturalSlice: 2.246ms
naturalMatch: 0.981ms
naturalExecSticky: 0.961ms

Test ""
naturalSplit: 1.579ms
naturalSplit2: 2.993ms
naturalSplitMapFilterUnaryPlus: 1.356ms
naturalSplitMapFilterNumber: 1.201ms
naturalConcat: 0.029ms
naturalSlice: 0.029ms
naturalMatch: 0.025ms
naturalExecSticky: 0.186ms

Test "Li Europan 0234 ling"
naturalSplit: 25.771ms
naturalSplit2: 14.735ms
naturalSplitMapFilterUnaryPlus: 14.905ms
naturalSplitMapFilterNumber: 13.707ms
naturalConcat: 90.956ms
naturalSlice: 54.905ms
naturalMatch: 20.436ms
naturalExecSticky: 5.915ms

Random tests
naturalSplit: 376.622ms
naturalSplit2: 293.722ms
naturalSplitMapFilterUnaryPlus: 286.914ms
naturalSplitMapFilterNumber: 281.534ms
naturalConcat: 234.996ms
naturalSlice: 233.745ms
naturalMatch: 100.181ms
naturalExecSticky: 100.647ms

naturalMatch is clearly faster than the rest in most of these runs, except for naturalExecSticky, which is on par and sometimes even superior (roughly 3.5x faster on the long input string).

BTW: the functions are named natural... because the result is useful for natural sorting ("file10" sorts after "file2" instead of directly after "file1", which would be the alphabetical order).
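To illustrate how the token arrays feed into natural sorting, here is a minimal comparator sketch. It is not part of the original answer; the names tokenize and naturalCompare are hypothetical, and any of the benchmarked functions could take the place of tokenize.

```javascript
// Split a string into text and number tokens (same idea as the
// split/map/filter variants in the benchmark).
function tokenize(str) {
    return str.split(/(\d+)/)
        .map((part, i) => (i % 2 ? Number(part) : part))
        .filter(part => part !== "");
}

// Hypothetical comparator: compare token by token, numerically when both
// tokens are numbers and as strings otherwise.
function naturalCompare(a, b) {
    const ta = tokenize(a);
    const tb = tokenize(b);
    const n = Math.min(ta.length, tb.length);
    for (let i = 0; i < n; i++) {
        const x = ta[i];
        const y = tb[i];
        if (x === y) {
            continue;
        }
        if (typeof x === "number" && typeof y === "number") {
            return x - y;
        }
        return String(x) < String(y) ? -1 : 1;
    }
    // All shared tokens are equal: the string with fewer tokens sorts first.
    return ta.length - tb.length;
}

const files = ["file10", "file2", "file1"];
console.log(files.slice().sort());               // alphabetical order
console.log(files.slice().sort(naturalCompare)); // natural order
```

The default sort puts "file10" before "file2" because it compares character by character, while the comparator above compares 10 and 2 as numbers.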
