C#是否内置了对页码字符串进行解析的支持?

时间:2020-03-05 18:46:37  来源:igfitidea点击:

Chave内置支持解析页码字符串吗?所谓页码,是指我们可能会在打印对话框中输入格式,该格式由逗号和破折号分隔。

像这样的东西:

1,3,5-10,12

真正好的解决方案是给我一些某种形式的列表,该列表包含由字符串表示的所有页码。在上面的示例中,像这样返回列表会很好:

1,3,5,6,7,8,9,10,12

如果有简单的方法,我只是想避免自己动手做。

解决方案

回答

它没有内置的方法来执行此操作,但是使用String.Split进行操作将是微不足道的。

只需在','上分割,便有一系列代表页码或者范围的字符串。遍历该系列并执行String.Split'-'。如果没有结果,那是一个普通的页码,因此请将其粘贴在页面列表中。如果有结果,则以"-"的左边和右边为界限,并使用简单的for循环将每个页码添加到该范围内的最终列表中。

只需5分钟即可完成操作,然后可能还需要10分钟来进行一些完整性检查,以在用户尝试输入无效数据(例如" 1-2-3"等)时引发错误。

回答

应该很简单:

foreach( string s in "1,3,5-10,12".Split(',') ) 
{
    // try and get the number
    int num;
    if( int.TryParse( s, out num ) )
    {
        yield return num;
        continue; // skip the rest
    }

    // otherwise we might have a range
    // split on the range delimiter
    string[] subs = s.Split('-');
    int start, end;

    // now see if we can parse a start and end
    if( subs.Length > 1 &&
        int.TryParse(subs[0], out start) &&
        int.TryParse(subs[1], out end) &&
        end >= start )
    {
        // create a range between the two values
        int rangeLength = end - start + 1;
        foreach(int i in Enumerable.Range(start, rangeLength))
        {
            yield return i;
        }
    }
}

编辑:感谢修复;-)

回答

基思的方法看起来不错。我使用列表整理了一种更幼稚的方法。这有错误检查,因此希望应该能解决大多数问题:

public List<int> parsePageNumbers(string input) {
  if (string.IsNullOrEmpty(input))
    throw new InvalidOperationException("Input string is empty.");

  var pageNos = input.Split(',');

  var ret = new List<int>();
  foreach(string pageString in pageNos) {
    if (pageString.Contains("-")) {
      parsePageRange(ret, pageString);
    } else {
      ret.Add(parsePageNumber(pageString));
    }
  }

  ret.Sort();
  return ret.Distinct().ToList();
}

private int parsePageNumber(string pageString) {
  int ret;

  if (!int.TryParse(pageString, out ret)) {
    throw new InvalidOperationException(
      string.Format("Page number '{0}' is not valid.", pageString));
  }

  return ret;
}

private void parsePageRange(List<int> pageNumbers, string pageNo) {
  var pageRange = pageNo.Split('-');

  if (pageRange.Length != 2)
    throw new InvalidOperationException(
      string.Format("Page range '{0}' is not valid.", pageNo));

  int startPage = parsePageNumber(pageRange[0]),
    endPage = parsePageNumber(pageRange[1]);

  if (startPage > endPage) {
    throw new InvalidOperationException(
      string.Format("Page number {0} is greater than page number {1}" +
      " in page range '{2}'", startPage, endPage, pageNo));
  }

  pageNumbers.AddRange(Enumerable.Range(startPage, endPage - startPage + 1));
}

回答

这是我为类似的东西烹调的东西。

它处理以下类型的范围:

1        single number
1-5      range
-5       range from (firstpage) up to 5
5-       range from 5 up to (lastpage)
..       can use .. instead of -
;,       can use both semicolon, comma, and space, as separators

它不会检查重复值,因此集合1,5,-10将产生序列1、5、1、2、3、4、5、6、7、8、9、10.

public class RangeParser
{
    public static IEnumerable<Int32> Parse(String s, Int32 firstPage, Int32 lastPage)
    {
        String[] parts = s.Split(' ', ';', ',');
        Regex reRange = new Regex(@"^\s*((?<from>\d+)|(?<from>\d+)(?<sep>(-|\.\.))(?<to>\d+)|(?<sep>(-|\.\.))(?<to>\d+)|(?<from>\d+)(?<sep>(-|\.\.)))\s*$");
        foreach (String part in parts)
        {
            Match maRange = reRange.Match(part);
            if (maRange.Success)
            {
                Group gFrom = maRange.Groups["from"];
                Group gTo = maRange.Groups["to"];
                Group gSep = maRange.Groups["sep"];

                if (gSep.Success)
                {
                    Int32 from = firstPage;
                    Int32 to = lastPage;
                    if (gFrom.Success)
                        from = Int32.Parse(gFrom.Value);
                    if (gTo.Success)
                        to = Int32.Parse(gTo.Value);
                    for (Int32 page = from; page <= to; page++)
                        yield return page;
                }
                else
                    yield return Int32.Parse(gFrom.Value);
            }
        }
    }
}

回答

这是lassevk的代码的略微修改版本,用于处理字符串.Regex匹配内部的split操作。它是作为扩展方法编写的,我们可以使用LINQ的Disinct()扩展轻松解决重复问题。

/// <summary>
    /// Parses a string representing a range of values into a sequence of integers.
    /// </summary>
    /// <param name="s">String to parse</param>
    /// <param name="minValue">Minimum value for open range specifier</param>
    /// <param name="maxValue">Maximum value for open range specifier</param>
    /// <returns>An enumerable sequence of integers</returns>
    /// <remarks>
    /// The range is specified as a string in the following forms or combination thereof:
    /// 5           single value
    /// 1,2,3,4,5   sequence of values
    /// 1-5         closed range
    /// -5          open range (converted to a sequence from minValue to 5)
    /// 1-          open range (converted to a sequence from 1 to maxValue)
    /// 
    /// The value delimiter can be either ',' or ';' and the range separator can be
    /// either '-' or ':'. Whitespace is permitted at any point in the input.
    /// 
    /// Any elements of the sequence that contain non-digit, non-whitespace, or non-separator
    /// characters or that are empty are ignored and not returned in the output sequence.
    /// </remarks>
    public static IEnumerable<int> ParseRange2(this string s, int minValue, int maxValue) {
        const string pattern = @"(?:^|(?<=[,;]))                      # match must begin with start of string or delim, where delim is , or ;
                                 \s*(                                 # leading whitespace
                                 (?<from>\d*)\s*(?:-|:)\s*(?<to>\d+)  # capture 'from <sep> to' or '<sep> to', where <sep> is - or :
                                 |                                    # or
                                 (?<from>\d+)\s*(?:-|:)\s*(?<to>\d*)  # capture 'from <sep> to' or 'from <sep>', where <sep> is - or :
                                 |                                    # or
                                 (?<num>\d+)                          # capture lone number
                                 )\s*                                 # trailing whitespace
                                 (?:(?=[,;\b])|$)                     # match must end with end of string or delim, where delim is , or ;";

        Regex regx = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

        foreach (Match m in regx.Matches(s)) {
            Group gpNum = m.Groups["num"];
            if (gpNum.Success) {
                yield return int.Parse(gpNum.Value);

            } else {
                Group gpFrom = m.Groups["from"];
                Group gpTo = m.Groups["to"];
                if (gpFrom.Success || gpTo.Success) {
                    int from = (gpFrom.Success && gpFrom.Value.Length > 0 ? int.Parse(gpFrom.Value) : minValue);
                    int to = (gpTo.Success && gpTo.Value.Length > 0 ? int.Parse(gpTo.Value) : maxValue);

                    for (int i = from; i <= to; i++) {
                        yield return i;
                    }
                }
            }
        }
    }