Saturday, 18 May 2013

Regular Expressions in VBScript with Examples

Regular Expression Basics

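Definition:
A regular expression is a pattern of characters that describes the strings to be matched. It is made up of ordinary (literal) characters and metacharacters, i.e. characters with a special meaning such as ^, $, * and +.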

General Applications of Regular Expressions:
Regular expressions are typically used for:
  1. Pattern matching in strings, for example to validate user input.
  2. Finding the occurrences of a string or pattern in a given text.
  3. Replacing the matches of a pattern with another string in a given text.
Most programming and scripting languages, including VBScript, support regular expressions, either built in or through a library.

Examples of Regular Expressions:
As I said earlier, we use regular expressions to check whether a given string matches a specified pattern.
For example, consider a scenario where you have to validate that a given string is a valid email address.

Some valid email addresses are -, etc.
Some invalid email addresses are - kjkjj@fdff (it has no domain suffix such as .com), etc.

With the help of a regular expression, you can easily validate an email address.

Syntax of Regular Expressions in VBScript:
To use a regular expression in a VBScript program, follow these steps:
  1. Create a regular expression object (RegExp).
  2. Define the pattern using the RegExp object's Pattern property (and, if needed, set the IgnoreCase and Global properties).
  3. Use the Test method to check whether a given string matches the pattern specified in step 2.
'Create the regular expression object
Set myRegEx = New RegExp

'Specify the pattern (regular expression). ^ and $ anchor it, so the whole
'string, and not just a part of it, must look like an email address.
myRegEx.Pattern = "^[a-z0-9]+@[a-z]+\.[a-z]+$"

'Turn case sensitivity off so that upper-case letters also match [a-z]
myRegEx.IgnoreCase = True

'Use the Test method to check whether the given string matches the pattern
isMatched = myRegEx.Test("")

The variable isMatched is True if the string "" matches the given pattern, and False otherwise.
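The steps above can also be wrapped in a reusable function. The sketch below is only an illustration: IsValidEmail and the sample strings are hypothetical, the pattern is the simplified one used above with ^ and $ around it so that the whole string must match, and WScript.Echo assumes the script is run under Windows Script Host (cscript.exe). In other hosts, MsgBox can be used instead.

```vbscript
' Hypothetical helper: returns True only if the WHOLE string looks like an
' email address according to the simplified pattern used above.
Function IsValidEmail(candidate)
    Dim re
    Set re = New RegExp
    re.Pattern = "^[a-z0-9]+@[a-z]+\.[a-z]+$"   ' ^ and $ anchor the match to the whole string
    re.IgnoreCase = True                        ' let [a-z] also accept upper-case letters
    IsValidEmail = re.Test(candidate)
End Function

' Assumes cscript.exe (Windows Script Host) so that WScript.Echo is available.
WScript.Echo "kjkjj@fdff         : " & IsValidEmail("kjkjj@fdff")           ' -> False (no "." and suffix)
WScript.Echo "kjkjj@fdff.com     : " & IsValidEmail("kjkjj@fdff.com")       ' -> True
WScript.Echo "hi kjkjj@fdff.com! : " & IsValidEmail("hi kjkjj@fdff.com!")   ' -> False (extra text around it)
```

Without the ^ and $ anchors, the last call would return True, because Test only needs the pattern to match somewhere inside the string.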

Another example: finding and replacing all occurrences of a word.

searchString = "Sachin Tendulkar is the master blaster. Sachin lives in Mumbai and likes to play cricket."
searchPattern = "Sachin"

Set reObject = New RegExp                      ' Create a regular expression object.
reObject.Pattern = searchPattern               ' Set the pattern.
reObject.IgnoreCase = True                     ' Make matching case-insensitive.
reObject.Global = True                         ' Find all matches, not just the first one.
Set Matches = reObject.Execute(searchString)   ' Execute the search; returns a Matches collection.

Str = ""
For Each M In Matches
    ' FirstIndex is the zero-based position of the match; Value is the matched text.
    Str = Str & M.FirstIndex & "  ->  " & M.Value & vbCrLf
Next

MsgBox Str

' Replace every match (Global = True) with "Arjun" and show the result.
MsgBox "String after replacing -> " & vbCrLf & reObject.Replace(searchString, "Arjun")
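The example above sets Global = True, which is why Execute returns both occurrences and Replace changes both of them. As a sketch of what happens otherwise, the code below contrasts the two settings. ReplaceWord and CountWord are hypothetical helper names chosen for illustration, and WScript.Echo assumes the script is run under Windows Script Host (cscript.exe).

```vbscript
' Hypothetical helper: replaces the first match only, or every match, depending on allMatches.
Function ReplaceWord(text, pattern, replacement, allMatches)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.IgnoreCase = True
    re.Global = allMatches          ' False = first match only, True = every match
    ReplaceWord = re.Replace(text, replacement)
End Function

' Hypothetical helper: counts the matches returned by Execute (Global must be True to find them all).
Function CountWord(text, pattern)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.IgnoreCase = True
    re.Global = True
    CountWord = re.Execute(text).Count
End Function

Dim sentence
sentence = "Sachin Tendulkar is the master blaster. Sachin lives in Mumbai and likes to play cricket."

WScript.Echo ReplaceWord(sentence, "Sachin", "Arjun", False)   ' only the first "Sachin" is replaced
WScript.Echo ReplaceWord(sentence, "Sachin", "Arjun", True)    ' both occurrences are replaced
WScript.Echo "Occurrences: " & CountWord(sentence, "sachin")   ' -> Occurrences: 2 (IgnoreCase = True)
```

With Global = False, Execute also stops after the first match, so its Matches collection contains at most one item.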

Below is a list of the commonly used metacharacters in VBScript regular expressions.

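\ - Marks the next character as a special character, a literal or a backreference. For example, n matches the letter "n" but \n matches a newline, \. matches a literal dot, and \1 is a backreference.

^ - Matches the position at the beginning of the input string. If the RegExp object's Multiline property is True, it also matches at the start of each line.

$ - Matches the position at the end of the input string. If the Multiline property is True, it also matches at the end of each line.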

* - Matches the preceding character or subexpression zero or more times. Same as {0,}.

+ - Matches the preceding character or subexpression one or more times. Same as {1,}.

? - Matches the preceding character or subexpression zero or one time. Same as {0,1}.

{i} - Matches the preceding character exactly i times.

{i,} - Matches the preceding character at least i times, with no upper limit.

{i,j} - Matches the preceding character at least i times and at most j times (i <= j).

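. - Matches any single character except the newline character "\n".

(pattern) - Matches pattern and captures the matched text. The captured text can be used as a backreference (\1, \2, ...) in the pattern, as $1, $2, ... in the replacement string of the Replace method, or read from the SubMatches collection of a Match object.

p|q - Matches either p or q. Note that p and q can themselves be complex regular expressions; for example, abc|xyz matches "abc" or "xyz".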

[pqr] - A character set. Matches any one of the characters inside the brackets.

[^pqr] - A negated character set. Matches any character that is not inside the brackets.

[p-z] - A range of characters. Matches any character in the specified range, i.e. p, q, r, ..., x, y, z.

[^p-z] - A negated range. Matches any character that is not in the specified range, e.g. a, b, c, ..., m, n, o, but also digits, spaces and punctuation.

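\b - Matches a word boundary, i.e. the position between a word character (\w) and a non-word character or the start or end of the string. For example, \bcat\b matches "cat" in "the cat sat" but not in "concatenate".

\B - Matches a non-word boundary, i.e. any position that is not a word boundary, such as one inside a word. For example, er\B matches the "er" in "verb" but not the "er" in "never".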

\d - Matches a digit character. Same as [0-9].

\D - Matches a non-digit character. Same as [^0-9].

\f, \n and \r - Match a form-feed, newline and carriage-return character, respectively.

\s - Matches any whitespace character, including space, tab, form feed and line breaks. Equivalent to [ \f\n\r\t\v].

\S - Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t and \v - Match a horizontal tab and a vertical tab character, respectively.

\w - Matches any word character, i.e. a letter, digit or underscore. Equivalent to [A-Za-z0-9_].

\W - Matches any non-word character. Equivalent to [^A-Za-z0-9_].

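\num - A backreference to a captured group, where num is a positive integer. For example, (.)\1 matches two identical characters in a row, such as "ll" in "hello".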
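The following sketch exercises several of the metacharacters in the list above with the RegExp Test, Replace and Execute methods. IsMatch and ReplaceAll are hypothetical helper names, the sample strings are made up for illustration, and WScript.Echo assumes the script is run with cscript.exe. Note that VBScript string literals do not treat the backslash as an escape character, so a pattern such as \d is written in the string exactly as it appears in the list.

```vbscript
' Hypothetical helper: True if pattern matches anywhere in text (case-sensitive).
Function IsMatch(pattern, text)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    IsMatch = re.Test(text)
End Function

' Hypothetical helper: replaces every match; $1, $2, ... insert the captured groups.
Function ReplaceAll(pattern, text, replacement)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.Global = True
    ReplaceAll = re.Replace(text, replacement)
End Function

' \ : "." matches any character, "\." matches only a literal dot
WScript.Echo "a.c vs abc          : " & IsMatch("a.c", "abc")              ' -> True
WScript.Echo "a\.c vs abc         : " & IsMatch("a\.c", "abc")             ' -> False

' ^ and $ : anchor the match to the start / end of the string
WScript.Echo "^cat vs concatenate : " & IsMatch("^cat", "concatenate")     ' -> False
WScript.Echo "log$ vs catalog     : " & IsMatch("log$", "catalog")         ' -> True

' ?, +, * and {i,j} : quantifiers
WScript.Echo "^colou?r$ vs color  : " & IsMatch("^colou?r$", "color")      ' -> True
WScript.Echo "^go+gle$ vs ggle    : " & IsMatch("^go+gle$", "ggle")        ' -> False
WScript.Echo "^go*gle$ vs ggle    : " & IsMatch("^go*gle$", "ggle")        ' -> True
WScript.Echo "^\d{2,4}$ vs 12345  : " & IsMatch("^\d{2,4}$", "12345")      ' -> False

' [^p-z] : any character outside the range p..z
WScript.Echo "^[^p-z]+$ vs abc    : " & IsMatch("^[^p-z]+$", "abc")        ' -> True

' \b and \B : word boundary and non-word boundary
WScript.Echo "\bcat\b vs concatenate : " & IsMatch("\bcat\b", "concatenate")  ' -> False
WScript.Echo "er\B vs verb        : " & IsMatch("er\B", "verb")            ' -> True
WScript.Echo "er\B vs never       : " & IsMatch("er\B", "never")           ' -> False

' \s and \D used with Replace
WScript.Echo ReplaceAll("\s+", "too    many   spaces", " ")                ' -> too many spaces
WScript.Echo ReplaceAll("\D", "+91 (98) 7654-3210", "")                    ' -> 919876543210

' (pattern) with $n, p|q, and a \1 backreference
WScript.Echo ReplaceAll("(\d{2})-(\d{2})-(\d{4})", "18-05-2013", "$3-$2-$1")  ' -> 2013-05-18
WScript.Echo "^(Saturday|Sunday)$ vs Sunday : " & IsMatch("^(Saturday|Sunday)$", "Sunday")  ' -> True
WScript.Echo ReplaceAll("\b(\w+)\s+\1\b", "the the master blaster", "$1")  ' -> the master blaster

' SubMatches: the captured groups of a Match object (zero-based index)
Dim dateRe, dateMatches
Set dateRe = New RegExp
dateRe.Pattern = "(\d{2})-(\d{2})-(\d{4})"
Set dateMatches = dateRe.Execute("Posted on 18-05-2013")
If dateMatches.Count > 0 Then WScript.Echo "Year: " & dateMatches(0).SubMatches(2)   ' -> Year: 2013
```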


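Some more examples of regular expressions:
  1. To match a 10-digit mobile number -> \d{10} (use ^\d{10}$ to require that the whole string is exactly 10 digits)
  2. To match an email address -> \w+@\w+\.\w+ (a simplified pattern; add ^ and $ to match the whole string, and note that \w does not include characters such as "." or "-" that real addresses may contain)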
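The Test method only requires the pattern to match somewhere in the string, so the unanchored pattern \d{10} also accepts an 11-digit string. The sketch below compares the two example patterns with and without the ^ and $ anchors. IsMatch is a hypothetical helper, the sample numbers and strings are made up, and WScript.Echo assumes the script is run with cscript.exe.

```vbscript
' Hypothetical helper: True if pattern matches anywhere in text.
Function IsMatch(pattern, text)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    IsMatch = re.Test(text)
End Function

' Mobile number: unanchored vs anchored
WScript.Echo "\d{10}   vs 98765432101 : " & IsMatch("\d{10}", "98765432101")     ' -> True (10 digits found inside)
WScript.Echo "^\d{10}$ vs 98765432101 : " & IsMatch("^\d{10}$", "98765432101")   ' -> False (11 digits)
WScript.Echo "^\d{10}$ vs 9876543210  : " & IsMatch("^\d{10}$", "9876543210")    ' -> True

' Email address: the anchored form rejects extra text around the address
WScript.Echo "\w+@\w+\.\w+   : " & IsMatch("\w+@\w+\.\w+", "kjkjj@fdff.com extra")     ' -> True
WScript.Echo "^\w+@\w+\.\w+$ : " & IsMatch("^\w+@\w+\.\w+$", "kjkjj@fdff.com extra")   ' -> False
WScript.Echo "^\w+@\w+\.\w+$ : " & IsMatch("^\w+@\w+\.\w+$", "kjkjj@fdff")             ' -> False
```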
