For example:

    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
            var r = new Regex(@"[A-Z]", RegexOptions.IgnoreCase);
            Console.WriteLine(r.IsMatch("\u0131")); // should print true, but prints false
        }
    }

In Turkish, I lowercases to ı (\u0131), so the above repro should print true. However, whereas Regex uses the target culture when dealing with individual characters in a set:
Lines 551 to 556 in fd82afe:

    SingleRange range = rangeList[i];
    if (range.First == range.Last)
    {
        char lower = culture.TextInfo.ToLower(range.First);
        rangeList[i] = new SingleRange(lower, lower);
    }

when it instead has a range with multiple characters, it delegates to this AddLowercaseRange function:
Line 569 in fd82afe:

    private void AddLowercaseRange(char chMin, char chMax)

which doesn't factor the target culture into its decision, instead using a precomputed table:
Line 301 in fd82afe:

    private static readonly LowerCaseMapping[] s_lcTable = new LowerCaseMapping[]

@tarekgh, @GrabYourPitchforks, am I correct that such a table couldn't possibly be right, given that different cultures case differently?
Note that if the above repro is instead changed to spell out the whole range of uppercase letters:

    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
            var r = new Regex(@"[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", RegexOptions.IgnoreCase);
            Console.WriteLine(r.IsMatch("\u0131")); // prints true
        }
    }

it then correctly prints true.
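
To make the culture dependence concrete, here is a minimal sketch that calls the same `TextInfo.ToLower` API the single-character path uses and compares the result for `I` under tr-TR and under the invariant culture. The `CultureLoweringSketch` class and its `Lower` helper are hypothetical names introduced only for illustration; they are not part of Regex. It assumes the runtime has Turkish culture data available (i.e. it is not running in invariant-globalization mode).

```csharp
using System;
using System.Globalization;

static class CultureLoweringSketch
{
    // Hypothetical helper (not part of Regex): lowercases one char
    // using the casing rules of the supplied culture.
    public static char Lower(char c, CultureInfo culture) => culture.TextInfo.ToLower(c);

    static void Main()
    {
        // Assumption: Turkish culture data is available on this machine.
        var turkish = new CultureInfo("tr-TR");
        var invariant = CultureInfo.InvariantCulture;

        // Print code points rather than glyphs to sidestep console-encoding issues.
        Console.WriteLine($"tr-TR:     U+{(int)Lower('I', turkish):X4}");
        Console.WriteLine($"invariant: U+{(int)Lower('I', invariant):X4}");
    }
}
```

Given the Turkish mapping described above, the two lines should report different code points (U+0131 for tr-TR versus U+0069 for the invariant culture), which is why one fixed table presumably cannot produce the right answer for every culture.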