Regular expressions explained with the grep command

In our earlier post on standard wildcards, we saw how wildcards are used with shell commands to represent one or more characters. Although similar to a standard wildcard, a regular expression (regex) uses a sequence of characters to describe a much wider range of search patterns. In this tutorial I will try to explain regular expressions using the grep, awk and sed commands. Before that, let's look at the different regex operators and their meanings.

Operator        Description
. (dot)         Matches any single character
.*              Matches any string, including the empty string
^ (caret)       Matches the beginning of a line
$ (dollar)      Matches the end of a line
[ ]             Matches any one of the enclosed characters, e.g. [abc] means any of a, b or c
[^]             Logical NOT, e.g. [^abc] matches any character except a, b or c
|               Logical OR, e.g. a|b means either a or b
* (asterisk)    Matches the preceding item zero or more times
{x,y}           Matches the preceding element at least x and not more than y times; {n} matches it exactly n times
+               Matches the preceding element one or more times, e.g. ab+ matches ab, abb, abbb and so on
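
As a quick illustration of the anchors and repetition operators in the table, here is a minimal sketch. The sample words are made up for this illustration, and the -E option (covered later in this post) selects grep's extended syntax so that +, { and } need no backslash.

```shell
# Hypothetical sample words, one per line (not from the original post).
words=$(printf '%s\n' ct cat caat caaat scat)

# ^ and $ anchor the pattern to the whole line, so "scat" never matches.
star=$(printf '%s\n' "$words" | grep -E '^ca*t$')        # zero or more a
plus=$(printf '%s\n' "$words" | grep -E '^ca+t$')        # one or more a
bounded=$(printf '%s\n' "$words" | grep -E '^ca{2,3}t$') # two or three a

printf '%s\n' "$star"
```

Here star holds ct, cat, caat and caaat, plus drops ct, and bounded keeps only caat and caaat.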


In addition to the above operators, there are useful basic character classes that can be used to match a set of characters. A few of them are listed in the table below.

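Character class  Description
[:alpha:]        Matches alphabetic characters
[:digit:]        Matches digits
[:alnum:]        Matches alphanumeric characters
[:upper:]        Matches upper case letters
[:lower:]        Matches lower case letters
[:space:]        Matches white space characters

Note that a character class is only valid inside a bracket expression, so in a pattern it is written as, for example, [[:digit:]].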

Now let's see how regex works with Linux commands.

Examples with the grep command

.  (dot)

The . (dot) matches any single character, just like the ? (question mark) in basic wildcards. The character can be a letter, a digit, a symbol or a space.

example 1:
The grep command checks whether the search pattern is present in a line and, if so, displays the entire line; when colour output is enabled (for example with --color), the matching text is highlighted. In this example the regular expression '.' (dot) is used as the search pattern, so it matches every single character in the line, including white space.
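
The original screenshot is not available, so here is a minimal sketch of the idea. The sample line is an assumption, and -o is used in place of colour highlighting: it prints every match on its own line.

```shell
# Assumed sample line; the input used in the original post is not shown.
line='cat bat 4at at'

# '.' matches each character in turn, spaces included.
# -o prints every match on a separate line (a stand-in for highlighting).
matches=$(printf '%s\n' "$line" | grep -o '.')
printf '%s\n' "$matches"
```

With this input grep prints 14 one-character lines, three of which contain only a space.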





example 2:
In this example, instead of matching every single character, the search pattern matches any 3-character string ending with 'at'.
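
A sketch of what this could look like, again with an assumed sample line and -o standing in for the highlighting:

```shell
# Assumed sample line; it ends with " at" so the trailing match is visible.
line='cat bat 4at at'

# '.at' = any one character followed by the literal text "at".
matches=$(printf '%s\n' "$line" | grep -o '.at')
printf '%s\n' "$matches"
```

The four matches are cat, bat, 4at and " at" (a space followed by at).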





If you are wondering why the 'at' at the end of the line is highlighted, remember that '.' (dot) matches any single character, including a space, and there is a space in front of that 'at'.

[ ] (square bracket)

A bracket expression matches any single character listed inside the [ ] (square brackets). The character can be a letter, a digit, a symbol or a space.

example 3:
In this example the grep command checks whether any of the characters a, b, c or d is present in the line.





example 4:
We can also specify a character range inside the [ ] (square brackets). The previous example can be written with the range a-d, i.e. [a-d].
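
The two forms from examples 3 and 4 can be compared side by side. This is only a sketch with an assumed sample line; -o prints each match on its own line.

```shell
# Assumed sample line (not from the original post).
line='a bad cab fed'

listed=$(printf '%s\n' "$line" | grep -o '[abcd]')  # characters listed one by one
ranged=$(printf '%s\n' "$line" | grep -o '[a-d]')   # the same set as a range

printf '%s\n' "$listed"
```

Both variables end up with the same eight single-character matches.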





example 5:
We can also mix individual characters and ranges in the search criteria. For example, [a-ex-z] matches any of the characters a, b, c, d, e, x, y or z.
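
A small sketch of a mixed bracket expression; the sample text is an assumption chosen so that only some letters match.

```shell
# Assumed sample line (not from the original post).
line='lazy fox'

# [a-ex-z] combines the ranges a-e and x-z in one bracket expression.
matches=$(printf '%s\n' "$line" | grep -o '[a-ex-z]')
printf '%s\n' "$matches"
```

Only a, z, y and x are printed; l, f and o fall outside both ranges.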





example 6:
This example illustrates the case sensitivity of regular expression patterns. The regular expression [a-z] matches only the lower case letters from a to z; everything else is ignored.





example 7:
To match all letters irrespective of their case, both ranges can be combined, e.g. [a-zA-Z].
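
Examples 6 and 7 could be tried as follows. The sample line is assumed, and LC_ALL=C is my own addition: it makes ranges follow plain ASCII order, which keeps [a-z] strictly lower case on systems where the locale would otherwise affect ranges.

```shell
# Assumed sample line (not from the original post).
line='Hello World 42'

# Lower case only: H, W, the space and the digits are ignored.
lower=$(printf '%s\n' "$line" | LC_ALL=C grep -o '[a-z]')

# Both cases: every letter matches, digits and spaces still do not.
letters=$(printf '%s\n' "$line" | LC_ALL=C grep -o '[a-zA-Z]')

printf '%s\n' "$lower"
```

The first pattern yields 8 matches and the second yields 10.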

 

 

 

 

[^]

The [^] operator matches any character except the character(s) listed inside it. The listed characters can be letters, digits, symbols or spaces.

example 8:
In this example the regular expression matches every three-character string ending with 'at', except '4at' and 'cat'.
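
One way to write such a pattern is [^4c]at. The sketch below uses an assumed sample line and -o in place of highlighting.

```shell
# Assumed sample line (not from the original post).
line='cat bat 4at at'

# [^4c] = any single character except 4 or c, followed by "at".
matches=$(printf '%s\n' "$line" | grep -o '[^4c]at')
printf '%s\n' "$matches"
```

cat and 4at are skipped, leaving bat and " at" (the space also counts as "not 4 or c").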





example 9:
In this example the grep command checks the echo output for any characters other than letters.





example 10:
In the above example the regular expression matches characters other than letters, and grep highlights every character that is not a letter. Adding the '_' symbol inside the [^] operator makes grep look for characters that are neither letters nor '_'.
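
Examples 9 and 10 might look like this. The echoed text is an assumption, and LC_ALL=C is added only to keep the ranges in ASCII order.

```shell
# Assumed echo text (not from the original post).
text='user_name 42!'

# Everything that is not a letter: _, the space, 4, 2 and !.
non_letters=$(echo "$text" | LC_ALL=C grep -o '[^a-zA-Z]')

# Adding _ to the excluded set removes it from the matches.
non_letters_or_us=$(echo "$text" | LC_ALL=C grep -o '[^a-zA-Z_]')

printf '%s\n' "$non_letters"
```

The first pattern produces five matches, the second only four.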





| (pipe)

| (pipe) matches either of the expressions given before or after the | operator. In grep's default (basic) syntax the pipe must be preceded by the escape character \ (backslash); otherwise '|' is treated as a literal character.

example 11:
The following regular expression matches either cat or mat.
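
A sketch with an assumed sample line. Note that \| in the basic syntax is supported by GNU grep but is not guaranteed on every grep implementation, so the extended form is shown next to it.

```shell
# Assumed sample line (not from the original post).
line='cat bat mat'

basic=$(printf '%s\n' "$line" | grep -o 'cat\|mat')     # basic syntax, escaped pipe
extended=$(printf '%s\n' "$line" | grep -oE 'cat|mat')  # extended syntax, plain pipe

printf '%s\n' "$basic"
```

Both commands print cat and mat and skip bat.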






( )

( ) (parentheses) is the operator used to group expressions together. Like the | operator, parentheses must be preceded by the escape character in grep's basic syntax.

example 12:
This example is an alternative to example 11. The strings cat and mat share their last two characters and differ only in the first. Instead of using the | (pipe) operator between the two complete strings, we can simplify the expression using parentheses, i.e. cat|mat and (c|m)at have the same meaning.
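
The grouped form could be tried like this, again on an assumed sample line and with the GNU-style escaped operators for the basic syntax.

```shell
# Assumed sample line (not from the original post).
line='cat bat mat'

# Basic syntax: group and alternation are both escaped.
grouped=$(printf '%s\n' "$line" | grep -o '\(c\|m\)at')

# Extended syntax: the same pattern without backslashes.
grouped_ere=$(printf '%s\n' "$line" | grep -oE '(c|m)at')

printf '%s\n' "$grouped"
```

The result is the same as in example 11: cat and mat.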







The rest of the examples use a file named regexp.txt. The contents of the file are:
cat regexp.txt
REGEXP DEMO FILE
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
+-.,!@#$%^&*();\/|<>"'
12345 -98.7 3.141 .6180 9,000 +42
phone: +91-3456788080
email: foo@demo.net, bar.ba@demo.co.in
www.demo.net http://demo.co.in/
http://linuxfledglings.blogspot.com
ftp://10.124.0.239

example 13:
To display all the lines containing capital letters, with the capital letters highlighted, execute
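
The original command is not shown; one likely form uses the [:upper:] class from the table above. In this sketch a few lines copied from regexp.txt are piped in so that it runs on its own; against the real file the command would be grep '[[:upper:]]' regexp.txt.

```shell
# Three lines copied from regexp.txt, fed through a pipe instead of the file.
upper_lines=$(printf '%s\n' \
  'REGEXP DEMO FILE' \
  '0123456789' \
  'phone: +91-3456788080' | grep '[[:upper:]]')

printf '%s\n' "$upper_lines"
```

Only the line REGEXP DEMO FILE is printed.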






An alternative way to get the same output is
grep '[A-Z]' regexp.txt

example 14:
To display all the lines containing digits, with the digits highlighted, execute
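
Assuming the missing command used the [:digit:] class, a self-contained sketch looks like this (lines copied from regexp.txt and piped in; with the real file it would be grep '[[:digit:]]' regexp.txt).

```shell
# Three lines copied from regexp.txt, fed through a pipe instead of the file.
digit_lines=$(printf '%s\n' \
  'REGEXP DEMO FILE' \
  '0123456789' \
  'phone: +91-3456788080' | grep '[[:digit:]]')

printf '%s\n' "$digit_lines"
```

The header line has no digits, so only the other two lines are printed.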








An alternative way to get the same output is
grep '[0-9]' regexp.txt

example 15:
To display all the lines containing punctuation characters, with the punctuation highlighted, execute
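
The [:punct:] class is not in the table above, but it is a standard POSIX class and is probably what the missing command used; treat that as an assumption. The lines below are copied from regexp.txt.

```shell
# Three lines copied from regexp.txt, fed through a pipe instead of the file.
punct_lines=$(printf '%s\n' \
  'REGEXP DEMO FILE' \
  '0123456789' \
  'www.demo.net http://demo.co.in/' | grep '[[:punct:]]')

printf '%s\n' "$punct_lines"
```

Only the line with the dots, colon and slashes is printed.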











example 16:
To display all the lines containing a run of 26 upper case letters, with the run highlighted, execute
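
A sketch using the {n} repetition operator in basic syntax, where the braces are escaped. The exact command from the original post is not shown, so the pattern is an assumption; -o prints just the matching run.

```shell
# Two lines copied from regexp.txt, fed through a pipe instead of the file.
run=$(printf '%s\n' \
  'REGEXP DEMO FILE' \
  'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
  grep -o '[[:upper:]]\{26\}')

printf '%s\n' "$run"
```

The header line has no 26-letter run, so only the upper case alphabet is printed.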






example 17:
To display the line containing a mobile phone number in the format +XXX-NNNNNNNNNN, where XXX is the country code and NNNNNNNNNN is the 10-digit mobile number, execute
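
A possible pattern for this format, written as a sketch: it assumes a country code of one to three digits, and the input lines are copied from regexp.txt.

```shell
# Two lines copied from regexp.txt, fed through a pipe instead of the file.
phone=$(printf '%s\n' \
  '12345 -98.7 3.141 .6180 9,000 +42' \
  'phone: +91-3456788080' |
  grep -oE '\+[0-9]{1,3}-[0-9]{10}')

printf '%s\n' "$phone"
```

The \+ matches a literal plus sign; +42 on the first line is not followed by a hyphen and ten digits, so only +91-3456788080 is printed.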






Here -E is the grep option that interprets the pattern as an extended regular expression, i.e. there is no need for escape characters in front of regular expression operators like (, ), {, }, + and |.


example 18:
To match IP addresses from the file, execute
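
The simple version is not shown in the original; a plausible sketch is four groups of one to three digits separated by dots. The input lines are copied from regexp.txt.

```shell
# Two lines copied from regexp.txt, fed through a pipe instead of the file.
ip=$(printf '%s\n' \
  'www.demo.net http://demo.co.in/' \
  'ftp://10.124.0.239' |
  grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}')

printf '%s\n' "$ip"
```

This prints 10.124.0.239, but the pattern does no range checking on the individual numbers.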
However, a simple digits-and-dots pattern has a limitation: it also matches strings like 999.999.999.999. For more accurate matching, execute
grep -E "(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])" regexp.txt




example 19:
To match email IDs from the file, execute
NB: This method has certain limitations but works for most common address formats.
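
The original pattern is not shown. The sketch below uses a commonly seen approximation (my assumption, not the author's command), and like any short email regex it accepts some invalid addresses and rejects some valid ones.

```shell
# One line copied from regexp.txt, fed through a pipe instead of the file.
emails=$(printf '%s\n' 'email: foo@demo.net, bar.ba@demo.co.in' |
  grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

printf '%s\n' "$emails"
```

The pattern reads as: a local part, an @ sign, a domain, and a final dot followed by at least two letters. Both addresses on the sample line are extracted.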







