Monday, October 16, 2006

Natural Sort in C#

Natural or Human Sort is the very useful sort where the string "10" comes after the string "9" instead of before "9" and after "1".

If you have a list of files what are numbered you'll probably notice that Windows XP sorts these using a natural sort algorithm.

Useful languages like PHP have a natural sort as part of their base libraries. Unfortunately .Net did not get a natural sort comparer.

So, using http://sourcefrog.net/projects/natsort/ and http://www.mircscripts.org/showdoc.php?type=code&id=2771 as references I have created a Natural Sort compare algorithm in C#


#region Natural Sort
/// <summary>
/// Performs a natural sort order comparison of the strings. This can be used by CompareTo methods of
/// objects that want to be sorted. See: http://sourcefrog.net/projects/natsort/
/// </summary>
/// <param name="a"></param>
/// <param name="b"></param>
/// <returns></returns>
public static int NaturalCompareTo(string a, string b)
{
// String is not an object and so this doesn't change the passed in value
a = a.ToUpper();
b = b.ToUpper();

int indexA = 0;
int indexB = 0;
int result = 0;
string aPart = " ", bPart = " ";
while (true)
{
aPart = indexA < a.Length ? a.Substring(indexA, 1) : "";
bPart = indexB < b.Length ? b.Substring(indexB, 1) : "";

while (IsSpace(aPart))
{
aPart = ++indexA < a.Length ? a.Substring(indexA, 1) : "";
}
while (IsSpace(bPart))
{
bPart = ++indexB < b.Length ? b.Substring(indexB, 1) : "";
}

if (IsDigit(aPart) && IsDigit(bPart))
{
if (aPart.Equals("0") || bPart.Equals("0"))
{
result = NatCompareLeft(a.Substring(indexA), b.Substring(indexB));
if (result != 0)
{
return result;
}
}
else
{
result = NatCompareRight(a.Substring(indexA), b.Substring(indexB));
if (result != 0)
{
return result;
}
}
}

if (IsEmpty(aPart) && IsEmpty(bPart))
{
return 0;
}

result = aPart.CompareTo(bPart);
if (result != 0)
{
return result;
}


indexA++;
indexB++;
}

}


private static bool IsDigit(string theValue)
{
if (theValue.CompareTo("0") >= 0 && theValue.CompareTo("9") <= 0 && theValue.Length == 1)
{
return true;
}
else
{
return false;
}
}

private static bool IsSpace(string theValue)
{
switch (theValue)
{
case " ":
case "\t":
case "\n":
case "\r":
case "\v":
case "\f":
return true;
default:
return false;
}
}
private static int NatCompareRight(string a, string b)
{
int bias = 0;
int index = 0;
string aPart, bPart;

while (true)
{
aPart = index < a.Length ? a.Substring(index, 1) : "";
bPart = index < b.Length ? b.Substring(index, 1) : "";

if (!IsDigit(aPart) && !IsDigit(bPart))
{
return bias;
}
else if (!IsDigit(aPart))
{
return -1;
}
else if (!IsDigit(bPart))
{
return 1;
}
else if (aPart.CompareTo(bPart) < 0)
{
if (bias == 0) bias = -1;
}
else if (aPart.CompareTo(bPart) > 0)
{
if (bias == 0) bias = 1;
}

index++;
}


}


private static int NatCompareLeft(string a, string b)
{
int index = 0;
string aPart, bPart;

while (true)
{
aPart = index < a.Length ? a.Substring(index, 1) : "";
bPart = index < b.Length ? b.Substring(index, 1) : "";

if (!IsDigit(aPart) && !IsDigit(bPart))
{
return 0;
}
else if (!IsDigit(aPart))
{
return -1;
}
else if (!IsDigit(bPart))
{
return 1;
}
else if (aPart.CompareTo(bPart) < 0)
{
return -1;
}
else if (aPart.CompareTo(bPart) > 0)
{
return 1;
}

index++;
}

}


#endregion

4 comments:

  1. Works great unless you're using decimal numbers.

    For example two strings

    "my string 2.4"
    "my string 2.11"

    will sort in the following order
    "my string 2.4"
    "my string 2.11"

    but, obviously they should be
    "my string 2.11"
    "my string 2.4"
    for a true natural order.

    ReplyDelete
  2. I hadn't thought about that sorting case. It is a good point and you are exactly right.

    The results you are getting are by design.

    It is not uncommon for books or manuals to be divided in to sections and sub-sections.

    "Section 2.1"
    .
    .
    .
    "Section 2.9"
    "Section 2.10"
    "Section 2.11"

    Natural Sort is designed with this case in mind.

    It might be possible to modify the algorithm to recognize decimals but then my sections would not sort correctly.

    So it looks like the "Natural" order depends on the context of the sort.

    In my defense, the algorithm I based my C# port on is widely accepted. I believe that it is the same algorithm used in PHP. It produces the same results as the Windows Explorer file sorting as well.

    If you used a fixed width decimal notation the sort would work properly:

    "my string 2.11"
    "my string 2.40"

    ReplyDelete
  3. Good point about the book notation. I checked the same example against the natural sort in Oracle and it behaves in the same fashion as your sort does.

    For my purpose it was exactly what I needed, at first I thought the presence of decimals was going to be an issue for my client but they are using the decimals as a sub indicator instead of a true "decimal" value.

    Thanks for this, big help!

    ReplyDelete
  4. Anonymous12:26 AM

    I can host your files. I have plenty of bandwidth for you. Please contact me at boston AT savinet.net . I really like the product, I need to download your version with the demo.

    Also is it possible to show the uploaded image like either 1 by 1 or the grid-like pattern in the original SWFUpload

    ReplyDelete