
[Slightly OT] Is this possible with a regex?

Author
2 Oct 2005 3:24 PM
Cool Guy
Is it possible, with a regular expression, to split a string by spaces,
*not* splitting substrings between quotation characters?

e.g.:

    one two "three three" four "five five" six

-->

    1: one
    2: two
    3: three three
    4: four
    5: five five
    6: six

Author
3 Oct 2005 3:15 AM
Roger Rabbit
Yes, it is. Check out the documentation for an explanation of how to do it.

RR

"Cool Guy" <cool***@abc.xyz> wrote in message
news:l7l1utqkp06a.dlg@cool.guy.abc.xyz...
> Is it possible, with a regular expression, to split a string by spaces,
> *not* splitting substrings between quotation characters?
>
> e.g.:
>
>     one two "three three" four "five five" six
>
> -->
>
>     1: one
>     2: two
>     3: three three
>     4: four
>     5: five five
>     6: six
Author
3 Oct 2005 4:53 AM
William Stacey [MVP]
// Test (assumes "using System;" and "using System.Collections;" at the top of the file)
  string line = @"one two ""three three"" four ""five five"" six";
  // Delimit with a space. Ignore any delimiters inside quotes.
  string[] fields = SplitQuoted(line, " ");
  foreach ( string s in fields )
  {
      Console.WriteLine(s);
  }


        /// <summary>
        /// Splits a string on the given separator characters. This differs
        /// from the string.Split method in that delimiters inside double
        /// quotes are ignored and multiple delimiters in a row are treated
        /// as one (i.e. [One     two] will split into two fields if space
        /// is one of the delimiters).
        /// Example:
        /// Delims: " \t," (space, tab, comma)
        /// Input: "one two" three four,five
        /// Returns (4 strings):
        /// one two
        /// three
        /// four
        /// five
        /// </summary>
        /// <param name="text">The string to split.</param>
        /// <param name="delimiters">The characters to split on.</param>
        /// <returns>The fields, with the enclosing quotes removed from quoted fields.</returns>
        public static string[] SplitQuoted(string text, string delimiters)
        {
            // Default delimiters are a space and tab (e.g. " \t").
            // Delimiters inside a quote pair are ignored.
            // A quote pair is two double quotes ( e.g. "..." ).
            if ( text == null )
                throw new ArgumentNullException("text", "text is null.");
            if ( delimiters == null || delimiters.Length < 1 )
                delimiters = " \t"; // Default is a space and tab.

            ArrayList res = new ArrayList();

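            // Build the pattern that searches for both quoted and unquoted
            // elements. Notice that the quoted element is captured by group #1 (g1)
            // and the unquoted element is captured by group #2 (g2).

            // Escape the delimiters so that characters such as ']', '^', '-' or '\'
            // stay literal inside the character class below. Regex.Escape does not
            // escape ']' or '-', so those two are handled separately.
            string escaped = System.Text.RegularExpressions.Regex.Escape(delimiters)
                .Replace("]", @"\]").Replace("-", @"\-");

            // Quoted element: a quote, any run of non-quote/non-backslash chars,
            // optionally followed by backslash-escaped chars, then a closing quote.
            string pattern =
                @"""([^""\\]*(?:\\.[^""\\]*)*)""" +
                "|" +
                @"([^" + escaped + @"]+)";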

            // Search the string.
            foreach ( System.Text.RegularExpressions.Match m in
                System.Text.RegularExpressions.Regex.Matches(text, pattern) )
            {
                //string g0 = m.Groups[0].Value;
                string g1 = m.Groups[1].Value;
                string g2 = m.Groups[2].Value;
                if ( g2 != null && g2.Length > 0 )
                {
                    res.Add(g2);
                }
                else
                {
                    // get the quoted string, but without the quotes in g1;
                    res.Add(g1);
                }
            }
            return (string[])res.ToArray(typeof(string));
        }
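
For the exact case in the original question (whitespace as the only delimiter), the same match-the-tokens idea can be condensed. This is only a minimal sketch, not a drop-in replacement for the method above: the names QuotedSplitSketch and Token are made up for illustration, it assumes .NET 2.0 generics are available, and an unmatched quote character is simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuotedSplitSketch
{
    // Group 1: contents of a double-quoted field (backslash escapes allowed).
    // Group 2: a run of characters that are neither whitespace nor quotes.
    static readonly Regex Token = new Regex(
        @"""([^""\\]*(?:\\.[^""\\]*)*)""|([^\s""]+)");

    public static string[] Split(string text)
    {
        if (text == null)
            throw new ArgumentNullException("text");

        List<string> fields = new List<string>();
        foreach (Match m in Token.Matches(text))
        {
            // Exactly one of the two groups takes part in each match.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value
                                           : m.Groups[2].Value);
        }
        return fields.ToArray();
    }

    static void Main()
    {
        string line = @"one two ""three three"" four ""five five"" six";
        foreach (string s in Split(line))
            Console.WriteLine(s);
    }
}
```

Checking Groups[1].Success instead of testing for an empty string keeps an empty quoted field ("") as an empty entry in the result.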

--
William Stacey [MVP]

"Cool Guy" <cool***@abc.xyz> wrote in message
news:l7l1utqkp06a.dlg@cool.guy.abc.xyz...
> Is it possible, with a regular expression, to split a string by spaces,
> *not* splitting substrings between quotation characters?
>
> e.g.:
>
>    one two "three three" four "five five" six
>
> -->
>
>    1: one
>    2: two
>    3: three three
>    4: four
>    5: five five
>    6: six
