Java String Tokenizer: split a string on commas and ignore commas inside double quotes

Question

Asked by Shashi

I have a string like the one below:

value1, value2, value3, value4, "value5, 1234", value6, value7, "value8", value9, "value10, 123.23"

If I tokenize the string above, I get one token per comma. But I would like the tokenizer to ignore commas inside double quotes when it splits. How can I do that?

Thanks in advance

Shashi

Answer 1

Accepted answer by Ravi Thapliyal

Use a CSV parser like OpenCSV to automatically take care of things like commas in quoted elements, values that span multiple lines, etc. You can also use the library to serialize your data back to CSV.

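import java.io.StringReader;
import com.opencsv.CSVReader; // older 2.x releases use au.com.bytecode.opencsv.CSVReader

String str = "value1, value2, value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";

CSVReader reader = new CSVReader(new StringReader(str));

String[] tokens;
// readNext() returns one parsed line at a time and null at end of input;
// it throws a checked exception that the caller must handle or declare
while ((tokens = reader.readNext()) != null) {
    System.out.println(tokens[0]); // value1
    System.out.println(tokens[4]); // value5, 1234
    System.out.println(tokens[9]); // value10, 123.23
}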

Answer 2

Answer by Ivan Mushketyk

You can use several approaches:

Write code that searches for commas and maintains a state recording whether a particular comma is inside quotes or not.
Tokenize by the double-quote symbol, and then tokenize the strings in the resulting array by the comma symbol (make sure you only tokenize the strings at indexes 0, 2, 4, etc., since those were not inside double quotes in the original string).
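
The first approach above can be sketched as a small reusable method. This is only a minimal illustration, not necessarily what the answer's author had in mind: the class name QuoteAwareSplitter and the method split are hypothetical, and the sketch assumes quotes are never escaped (no "" inside a quoted field) and that surrounding whitespace should be trimmed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class QuoteAwareSplitter {

    // Splits on commas that are outside double quotes.
    // Quote characters are dropped and every field is trimmed.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // toggle the state; the quote itself is not kept
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                // a comma outside quotes ends the current field
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                // any other character, including a comma inside quotes
                current.append(c);
            }
        }
        // the last field is not followed by a comma
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        String input = "value1, value2, \"value5, 1234\", value6, \"value10, 123.23\"";
        for (String field : split(input)) {
            System.out.println(field);
        }
    }
}
```

Unlike a plain split(","), this keeps commas that appear between a pair of double quotes. It also keeps empty fields, so "a,,b" yields three fields.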

Answer 3

Answer by Bohemian

You just need one line and the right regex:

String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");

This also neatly trims off the wrapping double quotes for you, including the final quote!

Note: there is an interesting edge case when the first term is quoted. It requires an extra step of trimming the leading quote using replaceAll().

Here's some test code:

String input= "\"value1, value2\", value3, value4, \"value5, 1234\", " +
    "value6, value7, \"value8\", value9, \"value10, 123.23\"";
String[] values = input.replaceAll("^\"", "").split("\"?(,|$)(?=(([^\"]*\"){2})*[^\"]*$) *\"?");
for (String s : values)
    System.out.println(s);

Output:

value1, value2
value3
value4
value5, 1234
value6
value7
value8
value9
value10, 123.23

Answer 4

Answer by Sumedh Kapoor

Without any third-party library dependency, the following code can also parse the fields as required:

import java.util.ArrayList;
import java.util.List;

public class CSVSpliter {

  public static void main (String [] args) {
    String inputStr = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";

    List<String> splitStringList = new ArrayList<String> ();
    boolean insideDoubleQuotes = false;
    StringBuilder field = new StringBuilder ();

    for (int i = 0; i < inputStr.length(); i++) {
        char c = inputStr.charAt (i);
        if (c == '"' && !insideDoubleQuotes) {
            // opening quote: a quoted field starts
            insideDoubleQuotes = true;
        } else if (c == '"') {
            // closing quote: the quoted field is complete
            insideDoubleQuotes = false;
            splitStringList.add (field.toString().trim());
            field.setLength(0);
        } else if (c == ',' && !insideDoubleQuotes) {
            // separator; the field is empty right after a closing quote
            if (field.length() > 0) {
                splitStringList.add (field.toString().trim());
            }
            // clear the field for the next word
            field.setLength(0);
        } else {
            field.append (c);
        }
    }
    // add the last field when the input does not end with a closing quote
    if (field.length() > 0) {
        splitStringList.add (field.toString().trim());
    }
    for (String str: splitStringList) {
        System.out.println ("Split fields: "+str);
    }
  }

}

This will give the following output:

Split fields: value1
Split fields: value2
Split fields: value3
Split fields: value4
Split fields: value5, 1234
Split fields: value6
Split fields: value7
Split fields: value8
Split fields: value9
Split fields: value10, 123.23

Answer 5

Answer by Reza

String delimiter = ",";

String v = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";

String[] a = v.split(delimiter + "(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)");
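
The split above leaves the surrounding whitespace and double quotes in each token. The lookahead only accepts a comma as a separator when an even number of double quotes follows it up to the end of the string, which means the comma is outside any quoted field. One possible way to wrap the same regex and clean up the tokens is sketched here; the class name LookaheadSplit and the method split are hypothetical, and the sketch assumes quotes are balanced and never escaped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper; the names are illustrative only.
class LookaheadSplit {

    // A comma counts as a separator only if an even number of
    // double quotes remains between it and the end of the string.
    private static final String SEPARATOR =
            ",(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)";

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(SEPARATOR)) {
            String field = raw.trim();
            // strip one pair of wrapping quotes, if present
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String v = "value1, value2, value3, value4, \"value5, 1234\", value6, "
                + "value7, \"value8\", value9, \"value10, 123.23\"";
        for (String field : split(v)) {
            System.out.println(field);
        }
    }
}
```

For the sample string this should produce ten fields, with "value5, 1234" and "value10, 123.23" each kept as a single field.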

Answer 6

Answer by Igor Baikalov

I'm allergic to regex; why not double-split as someone suggested?

    String str = "value1, value2, value3, value4, \"value5, 1234\", value6, value7, \"value8\", value9, \"value10, 123.23\"";
    boolean quoted = false;
    for(String q : str.split("\"")) {
        if(quoted)
            System.out.println(q.trim());
        else
            for(String s : q.split(","))
                if(!s.trim().isEmpty())
                    System.out.println(s.trim());
        quoted = !quoted;
    }
