
Text Processing with Regular Expressions

Contact Hour 10: To be discussed on Tuesday 19th February, 2013.

Lecturer: Dr Chris P. Jobling.


The slides and notes for this lecture are based on Chapter 4 of Robert W. Sebesta, Programming the World-Wide Web, 3rd Edition, Addison Wesley, 2006. There is a good discussion of JavaScript regular expressions in Sections 7.2 and 9.9 of Chris Bates, Web Programming: Building Internet Applications, 3rd Edition, John Wiley, 2006. A good website that introduces this topic is Regular-Expressions.info.

Contents of this Session

Text processing with regular expressions

Learning Outcomes

At the end of this lecture you should be able to answer these questions:

Learning Outcomes (2)

At the end of this lecture you should be able to answer these questions:

Text manipulation in JavaScript

Pattern Matching


A Little History

Regular expression pattern matching is a technique that was first developed for the text editors ed and sed, which were (and still are) part of the Unix system. The ideas were extended in the program awk and eventually reached their full potential in the Perl programming language. Perl regular expressions are the inspiration for JavaScript's, and variations of the Perl form of regular expressions are to be found in many other contexts such as the text editors vi and emacs, most scripting languages, and even the standard Java library.

If you are interested, Regular expression has more to say on the subject.

Demo


The tools illustrated are both based on the JavaScript regular expression engine, whose syntax is modelled on Perl's. The same Perl-style syntax is implemented by the Perl Compatible Regular Expressions (PCRE) library that is used in many modern scripting languages, programmer's editors and even the Apache web server.

There is a version of RegexPal on the Blackboard site that you can download and which gives you access to the global search as a switchable option.

Simple patterns: characters

Normal characters (match themselves)

Simple patterns: meta-characters

Meta-characters have special meanings in patterns – they do not match themselves:

  \ | ( ) [ ] { } ^ $ * + ? .
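To match a meta-character literally, it has to be escaped with a backslash. The following is a minimal sketch of the difference, using the dot as the example; the variable names and sample strings are only illustrative.

```javascript
// An unescaped dot matches any single character except a line terminator
var anyChar = /3.14/;
// An escaped dot matches only a literal "."
var literalDot = /3\.14/;

var loose = anyChar.test("3x14");           // true: "." matched the "x"
var strict = literalDot.test("3x14");       // false: a real "." is required
var exact = literalDot.test("pi is 3.14");  // true
```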

Character classes

Character class abbreviations

Abbr.  Equiv. Pattern    Matches
\d     [0-9]             a digit
\D     [^0-9]            not a digit
\w     [A-Za-z_0-9]      a word character
\W     [^A-Za-z_0-9]     not a word character
\s     [ \r\t\n\f]       a whitespace character
\S     [^ \r\t\n\f]      not a whitespace character
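As a rough illustration of the abbreviations in the table, the sketch below applies three of them to a sample string. The string and variable names are made up for this example.

```javascript
var sample = "Room 42, Floor 7";

// \d picks out every digit (the g modifier asks for all matches)
var digits = sample.match(/\d/g);

// \w+ matches the first run of word characters
var firstWord = sample.match(/\w+/)[0];

// \s matches whitespace, so this strips every space from the string
var noSpaces = sample.replace(/\s/g, "");
```

Here digits holds the three digit characters, firstWord is "Room" and noSpaces is "Room42,Floor7".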

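JavaScript variables are not interpolated in regular expression literals (unlike Perl); to use a variable's value in a pattern, build the pattern with the RegExp constructor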
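In JavaScript a regular expression literal is taken exactly as written, so a pattern that depends on a variable is usually built with the RegExp constructor. A minimal sketch, with hypothetical variable names:

```javascript
var word = "rab";
var text = "Some rabbits are rabid";

// Inside a literal, "word" is just the four characters w, o, r, d
var literal = /word/;

// The constructor builds the pattern from the variable's value;
// the second argument carries the modifiers
var built = new RegExp(word, "g");

var literalHit = literal.test(text);  // false: the text does not contain "word"
var builtHits = text.match(built);    // ["rab", "rab"]
```

If the variable may contain meta-characters, they would need to be escaped before the pattern is built.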

Quantifiers

Quantifiers in braces

Quantifier  Meaning
{n}         exactly n repetitions
{m,}        at least m repetitions
{m,n}       at least m but not more than n repetitions (no space after the comma)
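The brace quantifiers can be tried out with a few short patterns. This is only a sketch: the patterns use the ^ and $ anchors (covered under Anchors) so that each one has to account for the whole string, and the variable names are illustrative.

```javascript
var exactlyFour = /^\d{4}$/;  // exactly four digits
var atLeastTwo = /^a{2,}$/;   // two or more a's
var twoToThree = /^a{2,3}$/;  // two or three a's

var year = exactlyFour.test("2013");    // true
var tooShort = exactlyFour.test("201"); // false
var many = atLeastTwo.test("aaaa");     // true
var tooMany = twoToThree.test("aaaa");  // false
```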

Other Quantifiers

Just abbreviations for the most commonly used quantifiers
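The three shorthand quantifiers in JavaScript are * (zero or more, the same as {0,}), + (one or more, {1,}) and ? (zero or one, {0,1}). A small sketch with made-up patterns:

```javascript
var star = /^ab*c$/;         // "a", any number of b's (including none), "c"
var plus = /^ab+c$/;         // "a", at least one b, "c"
var optional = /^colou?r$/;  // the "u" may appear once or not at all

var starNoB = star.test("ac");            // true
var plusNoB = plus.test("ac");            // false
var plusTwoB = plus.test("abbc");         // true
var american = optional.test("color");    // true
var british = optional.test("colour");    // true
```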

Anchors

The pattern can be forced to match only at the start with ^ or at the end with $
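A brief sketch of both anchors follows; the sample strings are only illustrative.

```javascript
var startsWithPearl = /^pearl/;  // must match at the very start
var endsWithGold = /gold$/;      // must match at the very end

var atStart = startsWithPearl.test("pearls are pretty");  // true
var notAtStart = startsWithPearl.test("my pearls");       // false
var atEnd = endsWithGold.test("I like gold");             // true
var notAtEnd = endsWithGold.test("golden");               // false
```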

Pattern modifiers

The i modifier tells the matcher to ignore the case of letters

The x modifier tells the matcher to ignore whitespace in the pattern (allows comments in patterns). It is a Perl feature; JavaScript regular expressions do not support it.
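In JavaScript, modifiers are written after the closing slash of the pattern. The sketch below shows the i modifier, and combines it with the g (global) modifier that the later examples in this session rely on. The sample strings are made up.

```javascript
var caseSensitive = /apple/;
var ignoreCase = /apple/i;

var strictHit = caseSensitive.test("APPLE pie");  // false
var looseHit = ignoreCase.test("APPLE pie");      // true

// g asks for every match rather than just the first one
var all = "Apple, apple, APPLE".match(/apple/gi);
// all is ["Apple", "apple", "APPLE"]
```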

Using Regular Expressions in JavaScript

In JavaScript we can use regular expressions to:

The most common use of these is in form validation.

The search function

search(pattern) returns the position in the object string of the first match of the pattern (positions are counted from zero), or -1 if the pattern does not match:

    var str = "Gluckenheimer";
    var position = str.search(/n/);
    /* position is now 6 */
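A further sketch on the same string shows a multi-character pattern and the value returned when nothing matches; the variable names are illustrative.

```javascript
var str = "Gluckenheimer";

// Position of the first character of the match
var found = str.search(/heim/);  // 7

// search returns -1 when the pattern does not occur
var missing = str.search(/z/);   // -1
```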

The replace function

replace(pattern, string)

The replace function: example

var str = "Some rabbits are rabid";
var result = str.replace(/rab/g, "tim");
// result is "Some timbits are timid"
// str itself is unchanged: replace returns a new string
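When the pattern contains parenthesised groups, the replacement string can refer to the captured text as $1, $2 and so on. A minimal sketch, with a made-up input string:

```javascript
var name = "Jobling, Chris";

// (\w+) captures each name; $2 and $1 put them back in swapped order
var swapped = name.replace(/(\w+), (\w+)/, "$2 $1");
// swapped is "Chris Jobling"; name still holds the original string
```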

The split function

split(parameter)

var str = "grapes:apples:oranges";
var fruit = str.split(/:/);
// fruit is set to ["grapes", "apples", "oranges"]
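A regular expression separator is most useful when the delimiter varies. The sketch below, with a made-up input string, splits on commas or semicolons and swallows any whitespace around them.

```javascript
var list = "grapes, apples;oranges ,  pears";

// A comma or semicolon, with optional whitespace on either side
var items = list.split(/\s*[,;]\s*/);
// items is ["grapes", "apples", "oranges", "pears"]
```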

The match function

match(pattern)

The match function: example

var str = "My 3 kings beat your 2 aces";
var matches = str.match(/[ab]/g); 
//matches is set to ["b", "a", "a"]
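Without the g modifier, match returns an array describing only the first match: element 0 is the whole match and the following elements are the parenthesised groups. A sketch with an illustrative string:

```javascript
var str = "I have 428 dollars, but I need 500";

// digits, then non-digits, then digits again
var result = str.match(/(\d+)([^\d]+)(\d+)/);

// result[0] is "428 dollars, but I need 500" (the whole match)
// result[1] is "428", result[2] is " dollars, but I need ", result[3] is "500"
// result.index is 7, the position where the match starts
```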

Form Validation

A common use of JavaScript is to check the validity of user input on forms.


Markup:

<!DOCTYPE html>
<!-- forms_check.html
A function test_phone_number is defined and tested.
This function checks the validity of phone
number input from a form
--> 
<html class="no-js" lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <title> Phone number tester </title>
    <meta name="viewport" content="width=device-width">
 
    <link rel="stylesheet" href="css/bootstrap.min.css">
    <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
    <link rel="stylesheet" href="css/main.css">
 
    <script src="js/vendor/modernizr-2.6.2-respond-1.1.0.min.js"></script>
  </head>
  <body>
    <div class="container">
      <h1>Phone Number Tester</h1>
 
      <p>An example of the use of Regular Expressions for form validation. 
         View source to see the HTML code and use your browser's development tools to view the JavaScript.</p>
      <p>Phone numbers should match the pattern three digits followed by a dash followed by four 
         digits. The regular expression for this is <code>/\d{3}-\d{4}/</code>.</p>
      <p>The example uses the DOM 0 event model which will be discussed in the next session.</p>
      <form id="phone" method="post" action="/cgi-bin/echo_form.cgi" onsubmit="return validate();">
        <label for="phone_number">Phone number: </label>
        <input id="phone_number" type="text" name="phone_number" placeholder="444-4444" 
               title="Please enter phone number using the pattern ddd-dddd."/>
        <input type="submit" name="Submit" value="Submit" />
      </form>
    </div> <!-- /container -->
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js"></script>
    <script>window.jQuery || document.write('<script src="js/vendor/jquery-1.9.0.min.js"><\/script>')</script>
    <script src="js/vendor/bootstrap.min.js"></script>
    <script src="js/plugins.js"></script>
    <!-- The actual forms_check script -->
    <script src="forms_check.js"></script>
  </body>
</html>

The script (validation function validate() will be explained later)

/* Function test_phone_number
 Parameter: A string
 Result: Returns true if the parameter has the form of a legal
 seven-digit phone number (3 digits, a dash, 4 digits)
 */
 
function test_phone_number(num) {
 
  // Use a simple pattern to check the number of digits and the dash.
  // The ^ and $ anchors make the pattern cover the whole string, so
  // extra characters before or after the number are rejected.
  var ok = num.search(/^\d{3}-\d{4}$/);
 
  // search returns 0 when the anchored pattern matches, -1 otherwise
  return ok === 0;
 
}// end of function test_phone_number
 
/* Actual form validation. Called from the form's onsubmit handler */
var validate = function() {
  var phoneNumber = document.getElementById("phone_number");
  if (test_phone_number(phoneNumber.value)) {
    return true;
  } else {
    alert("Phone number is invalid. Please use format ddd-dddd.");
    // prevent submission 
    return false;
  }
};

Test code for test_phone_number

// Test test_phone_number
var test_phone_number_test = function() {
  var tests = ["444-5432", "444-r432", "44-1234"];
  // declare i locally so that it does not leak into the global scope
  for (var i = 0; i < tests.length; i++) {
    var test = test_phone_number(tests[i]);
    if (test) {
      console.log(tests[i] + " is a legal phone number");
    } else {
      console.log(tests[i] + " is not a legal phone number");
    }
  }
};

HTML5 Pattern Attribute

A regular expression validator that is built into HTML5

<input id="phone_number" type="text" name="phone_number" 
       placeholder="444-4444" pattern="\d{3}-\d{4}" />
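The value of a pattern attribute has to match the whole of the field's value, as if the expression were wrapped in ^(?: and )$. The sketch below mimics that behaviour in JavaScript, which is one possible starting point for a fallback; the helper name matchesPattern is hypothetical and not part of any browser API.

```javascript
// Hypothetical helper: applies a pattern-attribute style expression
// to a value, anchoring it so that the entire value must match
function matchesPattern(pattern, value) {
  var anchored = new RegExp("^(?:" + pattern + ")$");
  return anchored.test(value);
}

// Backslashes are doubled because the pattern is written as a string
var good = matchesPattern("\\d{3}-\\d{4}", "444-5432");     // true
var tooLong = matchesPattern("\\d{3}-\\d{4}", "444-54321"); // false
```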


HTML5 Version of the Phone Number Validator


<!DOCTYPE html>
<!-- forms_check.html (HTML5 version)
The validity of phone number input from a form
is checked with the HTML5 pattern attribute;
no JavaScript validation function is needed
--> 
<html class="no-js" lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <title> Phone number tester (HTML5)</title>
    <meta name="viewport" content="width=device-width">
 
    <link rel="stylesheet" href="css/bootstrap.min.css">
    <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
    <link rel="stylesheet" href="css/main.css">
 
    <script src="js/vendor/modernizr-2.6.2-respond-1.1.0.min.js"></script>
  </head>
  <body>
    <div class="container">
      <h1>Phone Number Tester (HTML5)</h1>
      <p>An example of the use of Regular Expressions for form validation. View source to see the code.</p>
      <p>Phone numbers should match the pattern 3 digits followed by a dash followed by four digits. The regular expression for this is
        <code>/\d{3}-\d{4}/</code>.</p>
      <p>HTML5 provides a new form attribute <em>pattern</em> whose value is a regular expression (without the slashes). When supported,
        this can be used instead of JavaScript for form validation.</p>
      <p>In production, you would normally need to provide a JavaScript fallback for browsers that don't yet support the <em>pattern</em> attribute.</p>
      <!-- No need for onsubmit validator now -->
      <form id="phone" method="post" action="http://www.cpjobling.me/cgi-bin/echo_form.cgi">
        <label for="phone_number">Phone number: </label>
        <input id="phone_number" type="text" name="phone_number" 
               pattern="\d{3}-\d{4}" placeholder="444-4444" 
               title="Please enter phone number using the pattern ddd-dddd."/>
        <input type="submit" name="Submit" value="Submit" />
      </form>
    </div> <!-- end of container -->
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js"></script>
    <script>window.jQuery || document.write('<script src="js/vendor/jquery-1.9.0.min.js"><\/script>')</script>
    <script src="js/vendor/bootstrap.min.js"></script>
    <script src="js/plugins.js"></script>
    <!-- Look no scripts! -->
  </body>
</html>

Debugging JavaScript: IE6+

Debugging JavaScript: IE6+ (continued)

Script error in Internet Explorer

Debugging JavaScript: Firefox

Script error in Firefox

Debugging JavaScript (continued)

Debugging with Firebug

Debugging in WebKit Browsers

Summary of This Lecture

Text processing with regular expressions

Learning Outcomes

At the end of this lecture you should be able to answer these questions:

Learning Outcomes (2)

At the end of this lecture you should be able to answer these questions:

Exercises

Write, test and debug (if necessary) HTML files that include JavaScript scripts for the following problems. When required to write functions, you must include a script to test the function with at least two different data sets.

  1. Input: A text string, using prompt; Output: either legal name or Illegal name, depending on whether the input string fits the required format, which is: Last name, first name, middle initial where neither of the names can have more than 15 characters.
  2. Input: A text string, using prompt; Output: The words of the input text, in alphabetical order
  3. Function: tst_name; Parameter: a string; Returns: true if the given string has the form: string1, string2, letter where both strings must be all lowercase letters except the first letter, and letter must be uppercase; false otherwise.
  4. Use the function developed in Exercise 3 to validate a form with a text field that captures the user's name when the user presses the submit button. The form should not submit data if the name is not in the correct format. Use the example given in the session as a template.
  5. Repeat exercise 4 using the built-in HTML5 pattern attribute to validate the name as defined in Exercise 3.

More Homework Exercises

What's Next?

Manipulating web documents through the Document Object Model (DOM) and the JavaScript event model.


1) See http://www.cuneytyilmaz.com/prog/jrx/ for original version.
2) We won't have time for them all, but you can look at them yourself later.
3) form's onsubmit event