17 January 2011

Custom Regular Expressions with Examples

This article explains how to use custom regular expressions in .NET.
It covers nearly 50 custom regular expressions. After reading it, you
should be able to create most kinds of custom regular expressions.
Here is the list of .NET custom regular expressions.

To match the start of the string, use the ^ character. For example:
^hello
Matches hello there, hello sam, hellotopical

To match the end of the string, use the $ character. For example:
ere$
Matches Where, and There
The ^ and $ characters are known as "Atomic Zero Width Assertions",
in case you were wondering.
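The two anchors above can be tried out with a few lines of C#. This is only a minimal sketch; the class and method names (AnchorDemo, StartsWithHello, EndsWithEre) are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorDemo
{
    // True when the input begins with "hello".
    public static bool StartsWithHello(string input) =>
        Regex.IsMatch(input, "^hello");

    // True when the input ends with "ere".
    public static bool EndsWithEre(string input) =>
        Regex.IsMatch(input, "ere$");

    public static void Main()
    {
        Console.WriteLine(StartsWithHello("hello there")); // True
        Console.WriteLine(StartsWithHello("say hello"));   // False
        Console.WriteLine(EndsWithEre("Where"));           // True
        Console.WriteLine(EndsWithEre("Thereby"));         // False
    }
}
```

Without the anchors, Regex.IsMatch would report a match anywhere inside the string.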
Character Classes
Character classes allow you to specify sets of characters or ranges. For example:
[aeiou]
Matches Hey, and Hi, but not Zzz
In other words,
the string must contain at least one of the characters in the character class.
You can also exclude characters. For example:
[^aeiou]
Matches Zzz, but not Hey or Hi.
When the ^ character is the first character in the character class, it means
"anything but the following characters".
Putting this together, we could create a pattern that matches strings that
start with a vowel:
^[aeiou]
Or, strings that don't start with a vowel:
^[^aeiou]
With character classes, you can also specify ranges. For example:
[0-9]
Matches 0, 5, 8, or any other single digit from 0 to 9.
[0-9][0-9]
Matches any two digit number (04, 13, 87, etc.), but there's
a better way to do this.
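[a-zA-Z0-9_]
Matches characters typically found in words. A shorthand syntax
for this is \w
Some other built-in classes are \W for anything other than a word character
([^a-zA-Z0-9_]), \s for any whitespace character, \S for any non-whitespace
character, \d for any decimal digit ([0-9]), and \D for any non-digit
([^0-9]).
Note that by default .NET's \w and \d also match Unicode letters and digits;
they are limited to the ASCII ranges shown here only when
RegexOptions.ECMAScript is specified.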
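Here is a small C# sketch of the character classes described above. The class and method names are hypothetical. The last method uses RegexOptions.ECMAScript to show how \d can be restricted to the ASCII digits 0-9; by default .NET's \d also accepts other Unicode decimal digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CharClassDemo
{
    // True when the string starts with a lowercase vowel.
    public static bool StartsWithVowel(string s) =>
        Regex.IsMatch(s, "^[aeiou]");

    // True when the string starts with any character other than a lowercase vowel.
    public static bool StartsWithNonVowel(string s) =>
        Regex.IsMatch(s, "^[^aeiou]");

    // \d with default options: any Unicode decimal digit.
    public static bool IsDigit(string s) =>
        Regex.IsMatch(s, @"^\d$");

    // \d under ECMAScript rules: only the ASCII digits 0-9.
    public static bool IsAsciiDigit(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);

    public static void Main()
    {
        Console.WriteLine(StartsWithVowel("apple"));   // True
        Console.WriteLine(StartsWithNonVowel("Zzz"));  // True
        Console.WriteLine(IsDigit("\u0663"));          // True  (Arabic-Indic digit three)
        Console.WriteLine(IsAsciiDigit("\u0663"));     // False
    }
}
```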

Quantifiers
Sometimes you want to specify a certain number of characters that match a
certain pattern. For example, a zip code is 5 digits. This is written as:
^[0-9]{5}$
This says: match the beginning of the string, followed by five digits, followed
by the end of the string. It matches 97211, 01293, 88460.
^[0-9]{5}-[0-9]{4}$
Matches 9 digit zip codes, such as 97211-0165.
This could also be written as ^\d{5}-\d{4}$, where \d
matches any decimal digit (the same as [0-9]).
You can also specify minimum and maximum occurrences.
For example:

^\d{1,3}$
matches 1, 15, 987. In other words, any number that
is 1 to 3 digits in length.
You can specify open ended ranges:
^\d{1,}$
matches 1 or more digits. This can also be written ^\d+$
^\d{0,}$
matches 0 or more digits. This can also be written ^\d*$
^[+-]{0,1}\d{0,}$
matches 123, +123, and -123.
This could also be written as ^[+-]?\d*$ where ?
means 0 or 1 occurrences.
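The quantifier patterns above can be checked with a short C# sketch like the one below. The class and method names are hypothetical. One deliberate deviation: IsSignedInteger uses + rather than * after the sign, so that at least one digit is required; with * the pattern would also accept an empty string or a lone sign.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class QuantifierDemo
{
    // Exactly five digits.
    public static bool IsZip5(string s) =>
        Regex.IsMatch(s, "^[0-9]{5}$");

    // Five digits, a dash, then four digits.
    public static bool IsZip9(string s) =>
        Regex.IsMatch(s, "^[0-9]{5}-[0-9]{4}$");

    // Between one and three digits.
    public static bool IsOneToThreeDigits(string s) =>
        Regex.IsMatch(s, "^[0-9]{1,3}$");

    // Optional sign followed by one or more digits (stricter than \d*).
    public static bool IsSignedInteger(string s) =>
        Regex.IsMatch(s, "^[+-]?[0-9]+$");

    public static void Main()
    {
        Console.WriteLine(IsZip5("97211"));           // True
        Console.WriteLine(IsZip9("97211-0165"));      // True
        Console.WriteLine(IsOneToThreeDigits("987")); // True
        Console.WriteLine(IsSignedInteger("-123"));   // True
        Console.WriteLine(IsSignedInteger("+"));      // False
    }
}
```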
Options
The regular expression syntax contains a number of options that you can toggle.
For example, you can enable case insensitive matching with (?i:):
(?i:[aeiou])
matches hello, and HELLO, but not Zzzz
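In .NET the same option can be switched on either inline, as shown above, or through the RegexOptions argument. The sketch below compares the two; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class OptionsDemo
{
    // Case-insensitive matching enabled inside the pattern itself.
    public static bool HasVowelInline(string s) =>
        Regex.IsMatch(s, "(?i:[aeiou])");

    // Case-insensitive matching enabled through the API.
    public static bool HasVowelViaOption(string s) =>
        Regex.IsMatch(s, "[aeiou]", RegexOptions.IgnoreCase);

    // No option at all: only lowercase vowels count.
    public static bool HasLowercaseVowel(string s) =>
        Regex.IsMatch(s, "[aeiou]");

    public static void Main()
    {
        Console.WriteLine(HasVowelInline("HELLO"));    // True
        Console.WriteLine(HasVowelViaOption("HELLO")); // True
        Console.WriteLine(HasLowercaseVowel("HELLO")); // False
        Console.WriteLine(HasVowelInline("Zzzz"));     // False
    }
}
```

The inline form is handy when only part of a pattern should ignore case; the RegexOptions form applies to the whole pattern.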

Examples:
US Phone Number:
^\(?\d{3}\)?[\s-]\d{3}-\d{4}$
matches (555) 555-5555, or 555-555-5555
Improved US Phone Number
^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$
This recognizes 1-123-456-7890, 1 (123) 456 7980, 1 123 456
7890, (123) 456-7890, 123-456-7890, and so on, and makes sure that if one
paren is present both must be present.
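To see the paired-parenthesis check at work, here is a minimal C# sketch. The class and method names are hypothetical, and the pattern assumes an alternation bar (|) between the bare and the parenthesized area code forms, which is what the description above requires.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PhoneDemo
{
    // Assumed form of the "improved" pattern, with | separating the two
    // area code alternatives: 123 or (123).
    private const string Pattern =
        @"^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$";

    public static bool IsUsPhone(string s) => Regex.IsMatch(s, Pattern);

    public static void Main()
    {
        Console.WriteLine(IsUsPhone("1-123-456-7890"));   // True
        Console.WriteLine(IsUsPhone("1 (123) 456 7980")); // True
        Console.WriteLine(IsUsPhone("(123) 456-7890"));   // True
        Console.WriteLine(IsUsPhone("(123 456-7890"));    // False: missing ")"
        Console.WriteLine(IsUsPhone("123) 456-7890"));    // False: missing "("
    }
}
```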
International Phone Number
^\d(\d|-){7,20}
matches 1-12-3123-4141.
E-Mail Address (by Lucadean)
^([a-zA-Z0-9_\-])([a-zA-Z0-9_\-\.]*)@(\[((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}|((([a-zA-Z0-9\-]+)\.)+))([a-zA-Z]{2,}|(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\])$
5 Digit Zipcode
^\d{5}$
matches 12879, 97211
9 Digit Zipcode
^\d{5}-\d{4}$
matches 97211-1234

5 or 9 Digit Zipcode
^\d{5}(-?\d{4})?$
This expression will match 12345, 123451234, or 12345-1234.
Date (as in MM-DD-YYYY or MM/DD/YYYY, by Chow). Accepts 1 or 2 digits for month
and day.
^\d{1,2}(\/|-)\d{1,2}(\/|-)\d{4}$
More sophisticated date, that accepts dates from 1/1/0001 - 12/31/9999
(mm/dd/yyyy), and validates leap years (2/29/2000 is valid, but 2/29/2001 is
not) - By Mike Akins based off work by Michael Ash
^(?:(?:(?:0?[13578]|1[02])(\/|-)31)|(?:(?:0?[13-9]|1[0-2])(\/|-)(?:29|30)))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(?:(?:0?[1-9]|1[0-2])(\/|-)(?:0?[1-9]|1\d|2[0-8]))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(0?2(\/|-)29)(\/|-)(?:(?:0[48]00|[13579][26]00|[2468][048]00)|(?:\d\d)?(?:0[48]|[2468][048]|[13579][26]))$
Here's a version of the above date expression that matches UK dates
(dd/mm/yyyy) - by Adam Carless
^(?:(?:0?[1-9]|1\d|2[0-8])(\/|-)(?:0?[1-9]|1[0-2]))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(?:(?:31(\/|-)(?:0?[13578]|1[02]))|(?:(?:29|30)(\/|-)(?:0?[13-9]|1[0-2])))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(29(\/|-)0?2)(\/|-)(?:(?:0[48]00|[13579][26]00|[2468][048]00)|(?:\d\d)?(?:0[48]|[2468][048]|[13579][26]))$

IP Address
^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$
matches 255.255.255.255, and 0.0.0.0, but
doesn't match 256.1.1.1 or 999.1.1.1.
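The IP address check can be exercised with the C# sketch below. The class and member names are hypothetical, and the pattern assumes alternation bars (|) between the five ways of writing an octet (250-255, 200-249, 100-199, 10-99, 0-9).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class IpDemo
{
    // One octet in the range 0-255, without leading zeros.
    private const string Octet =
        "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])";

    // Three "octet." groups followed by a final octet.
    private static readonly Regex Ipv4 =
        new Regex("^(" + Octet + @"\.){3}" + Octet + "$");

    public static bool IsIpv4(string s) => Ipv4.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsIpv4("255.255.255.255")); // True
        Console.WriteLine(IsIpv4("0.0.0.0"));         // True
        Console.WriteLine(IsIpv4("256.1.1.1"));       // False
        Console.WriteLine(IsIpv4("999.1.1.1"));       // False
    }
}
```

Building the pattern from a named octet fragment keeps the long expression readable; the resulting string is the same as the single-line pattern.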

Make sure a string
doesn't contain certain characters (by Chris Venus):
^[^ab]*$
matches hello, eye, fred (any string that doesn't have "a"
or "b" in it), but doesn't match bye.

UK Postal Codes
by John Dyke.
Their format is an outer part of 1 or 2 letters + 1 or 2 digits + an optional
letter (sometimes used, mainly in London), followed by an inner part of 1 digit
and two letters.

The code is normally written in capital letters with a space between the outer
and inner parts; it is still understandable if the space is omitted.

This regular expression validates upper or lower case with or without the space:
^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$
CF1 2AA matches, as does cf564fg (= CF56 4FG), but a1234d and A12 77Y would not.

Extract all the HTML tags from a web page:
In conjunction with a little .NET code that extracts all the matches, this can
be used to extract every HTML tag from a page.

<[^>]*>
Or, if you just want image tags, for example:

<img[^>]*>
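The "little .NET code" could look something like the following sketch, which collects every match with Regex.Matches. The class and method names are hypothetical, and the patterns are written with plain < and > characters. Keep in mind that such simple patterns can be fooled by a > inside an attribute value or a comment, so this is not a substitute for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class TagDemo
{
    // Returns the text of every match of the given pattern, in document order.
    public static List<string> Extract(string html, string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static List<string> AllTags(string html) => Extract(html, "<[^>]*>");

    public static List<string> ImageTags(string html) => Extract(html, "<img[^>]*>");

    public static void Main()
    {
        string html = "<p>Hi <img src=\"a.png\"> there</p>";
        foreach (string tag in AllTags(html))
        {
            Console.WriteLine(tag); // <p>, then the img tag, then </p>
        }
    }
}
```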
To get the values of a CSV (updated by Arnold Bailey), you can use this
expression:
,(?=(?:[^"]*"[^"]*")*(?![^"]*"))

In conjunction with this code:
// Requires: using System; using System.Text.RegularExpressions;
Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string s = "\"a\",b,\"c, d, e\",,f";
// Split only on commas that fall outside quoted fields.
string[] sAry = r.Split(s);
for (int i = 0; i < sAry.Length; i++)
{
    Console.WriteLine(sAry[i]);
}

Percentage (by Andres Garcia)
^(0*100{1,1}\.?((?<=\.)0*)?%?$)|(^0*\d{0,2}\.?((?<=\.)\d*)?%?)$
- Matches 0, 0.0, 99.9, 100.0, but excludes -1, 100.1, etc.
File Names (by Karl Moore)
^([a-zA-Z]\:|\\)\\([^\\]+\\)*[^\/:*?"<>|]+\.DOC(l)?$ - alter the DOC here to your "valid" file extension, use "IgnoreCase"
Sample matches: c:\data.doc, e:\whitecliff\staff\km\file.DOC, \\network\km\file.doc
Sample nonmatches: c:\, c:\myreport.txt, sitrep.doc
Sample VB.NET code:
Dim blnMatch As Boolean, strValue As String = "c:\files\report.doc"
blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
strValue, "^([a-zA-Z]\:|\\)\\([^\\]+\\)*[^\/:*?""<>|]+\.doc(l)?$", _
System.Text.RegularExpressions.RegexOptions.IgnoreCase)
Credit Card Numbers (by Sushrut Joshi)
^\d{4}((-\d{4}){3}|(\d{4}){3})$ - Matches 1234-1234-1234-1234 or 1234123412341234, but not 1234-123412341234.
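Numeric Value (by Benoit Aubuchon)
^([^.][-0-9.]+[^.-])$ This will match 0.9 and -99, but not .3, 4. or 4-. Note that the pattern requires at least three characters and does not restrict the first and last characters to digits, so 5 is rejected while a5b is accepted.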
