Text Processing with Regular Expressions

Contact Hour 10: To be discussed on Tuesday 19th February, 2013.

Lecturer: Dr Chris P. Jobling.

Text Processing with Regular Expressions

  • We conclude our review of the Basics of JavaScript with a discussion of text manipulation with regular expressions.
  • Regular expressions are a key idea that we shall return to again in the context of server-side scripting.

The slides and notes for this lecture are based on Chapter 4 of Robert W. Sebesta, Programming the World-Wide Web, 3rd Edition, Addison Wesley, 2006. There is a good discussion of JavaScript regular expressions in Sections 7.2 and 9.9 of Chris Bates, Web Programming: Building Internet Applications, 3rd Edition, John Wiley, 2006. A good website that introduces this topic is Regular-Expressions.info.

Contents of this Session

Learning Outcomes

At the end of this lecture you should be able to answer these questions:

  • What is a character class in a pattern?
  • What are the predefined character classes, and what do they mean?
  • What are the symbolic quantifiers, and what do they mean?
  • Describe the two end-of-line anchors.

Learning Outcomes (2)

At the end of this lecture you should be able to answer these questions:

  • What does the i pattern modifier do?
  • What exactly does the String method replace do?
  • What exactly does the String method match do?

Text manipulation in JavaScript

  • Text manipulation is a very important feature of many Web Applications.
  • Some examples:
    • search for a string in a document or form field
    • search for a string and replace it with another
    • validate the text fields of a form
  • Can use string manipulation, but it tends to be restrictive and inefficient
  • Better to use a technique called pattern matching

Pattern Matching

  • JavaScript provides two ways to do pattern matching:
    • Using RegExp objects
    • Using methods on String objects
  • A powerful pattern matcher called a regular expression matcher is provided
  • This is our first exposure to regular expressions but it is an important topic in its own right.

A Little History

Regular expression pattern matching is a technique that was first developed for the text editors ed and sed, which were (and still are) part of the Unix system. The ideas were extended in the program awk and eventually reached their full potential in the Perl programming language. Perl regular expressions are the inspiration for JavaScript's, and variations of the Perl form of regular expression are to be found in many other contexts, such as the text editors vi and emacs, most scripting languages, and even the standard Java library.

If you are interested, the article "Regular expression" has more to say on the subject.

Demo

  • We will be using a version of RegexPal to illustrate regular expressions.
  • There's another, slightly more powerful one: a clever piece of JavaScript magic 1) developed by Cüneyt Yılmaz.
  • You can use either of these tools to play with regular expressions.
  • Here's the set of examples that I will work through 2).

The tools illustrated are both based on the JavaScript regular expression engine, which is itself modelled on Perl regular expressions. The same Perl-style syntax is provided by the Perl Compatible Regular Expressions (PCRE) library that is used in many modern scripting languages, programmer's editors and even the Apache web server.

There is a version of RegexPal on the Blackboard site that you can download and which gives you access to the global search as a switchable option.

Simple patterns: characters

Normal characters (match themselves)

  • E.g: /ee/ matches need, greed, weed, but not wed or dead

Simple patterns: meta-characters

Meta-characters have special meanings in patterns – they do not match themselves:

  \ | ( ) [ ] { } ^ $ * + ? .
  • A meta-character is treated as a normal character if it is escaped (preceded with a backslash \)
  • period ( . ) is a special meta-character – it matches any character except newline
  • /c.t/ matches Ascot, cat, cut and crt but not act or cart.
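The behaviour of the period, and the effect of escaping it, can be checked in a few lines. This is a minimal sketch that reuses the words from the slide; the variable names are made up for illustration.

```javascript
// Sample words taken from the slide above
var words = ["Ascot", "cat", "cut", "crt", "act", "cart"];

// The period matches any single character except newline
var dotPattern = /c.t/;
var dotMatches = words.filter(function (w) {
  return dotPattern.test(w);
});
// dotMatches is ["Ascot", "cat", "cut", "crt"]

// Escaping the period makes it match a literal "." only
var literalDot = /c\.t/;
var matchesCat = literalDot.test("cat");   // false
var matchesDot = literalDot.test("c.t");   // true
```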

Character classes

  • Put a sequence of characters in brackets, and it defines a set of characters, any one of which matches:
    • [abcd] matches any of letters 'a', 'b', 'c', or 'd'.
  • Dashes can be used to specify spans of characters in a class:
    • [a-z] matches any lower-case letter (in the English alphabet).
  • A caret at the left end of a class definition means match anything but the characters in the class:
    • [^0-9] matches any character that is not a decimal digit.
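The three kinds of class shown above can be tried out as follows. This is only a sketch; the test strings and variable names are invented for illustration.

```javascript
var listClass = /[abcd]/;    // any one of the letters a, b, c or d
var spanClass = /[a-z]/;     // any lower-case English letter
var nonDigit = /[^0-9]/;     // any character that is not a decimal digit

var hasListed = listClass.test("deed");   // true ("d" is in the class)
var hasNone = listClass.test("mix");      // false
var hasLower = spanClass.test("ABC d");   // true ("d" is lower case)

// A string consists only of digits if no non-digit can be found in it
var allDigits = !nonDigit.test("2013");       // true
var notAllDigits = !nonDigit.test("20-13");   // false
```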

Character class abbreviations

Abbr.   Equiv. Pattern    Matches
\d      [0-9]             a digit
\D      [^0-9]            not a digit
\w      [A-Za-z_0-9]      a word character
\W      [^A-Za-z_0-9]     not a word character
\s      [ \r\t\n\f]       a whitespace character
\S      [^ \r\t\n\f]      not a whitespace character
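A short sketch of the abbreviations in use. The sample strings are invented for illustration, and the g modifier used here is described later under the replace function.

```javascript
// \d picks out the digits
var digits = "Room 101".match(/\d/g);
// digits is ["1", "0", "1"]

// \W matches anything that is not a letter, digit or underscore
var safeName = "my file_1.txt".replace(/\W/g, "-");
// safeName is "my-file_1-txt"

// \s matches spaces, tabs and newlines alike
var visible = "a b\tc".replace(/\s/g, "_");
// visible is "a_b_c"
```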

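Unlike Perl, JavaScript does not interpolate variables in pattern literals; to build a pattern from the value of a variable, use the RegExp constructor, e.g. new RegExp(word).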
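A JavaScript pattern literal is fixed text, so a pattern that depends on the value of a variable is normally built with the RegExp constructor. The sketch below assumes a hypothetical variable called word; note that backslashes have to be doubled when the pattern is written as a string.

```javascript
var word = "rab";

// The RegExp constructor builds a pattern from a string at run time;
// the second argument holds the modifiers
var pattern = new RegExp(word, "g");
var result = "Some rabbits are rabid".replace(pattern, "tim");
// result is "Some timbits are timid"

// In a string literal the backslash must itself be escaped
var threeDigits = new RegExp("\\d{3}");
var found = threeDigits.test("abc123");   // true
```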

Quantifiers

Quantifiers in braces

Quantifier   Meaning
{n}          exactly n repetitions
{m,}         at least m repetitions
{m,n}        at least m but not more than n repetitions (no space inside the braces)

Other Quantifiers

Just abbreviations for the most commonly used quantifiers

  • * means zero or more repetitions e.g., \d* means zero or more digits
  • + means one or more repetitions e.g., \d+ means one or more digits
  • ? means zero or one e.g., \d? means zero or one digit
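The quantifiers can be compared by running them over the same text. This is a sketch with invented sample data. It uses the match method and the g modifier, which are described later, to list every substring that matches.

```javascript
var text = "7 42 2013 123456";

// Quantifiers are greedy: each match takes as many digits as it is allowed
var twoOrMore = text.match(/\d{2,}/g);    // ["42", "2013", "123456"]
var twoToFour = text.match(/\d{2,4}/g);   // ["42", "2013", "1234", "56"]
var oneOrMore = text.match(/\d+/g);       // ["7", "42", "2013", "123456"]

// ? makes the preceding item optional
var american = /colou?r/.test("color");    // true
var british = /colou?r/.test("colour");    // true

// * allows the preceding item to be absent or repeated
var noB = /ab*c/.test("ac");         // true
var manyB = /ab*c/.test("abbbc");    // true
```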

Anchors

The pattern can be forced to match only at the start with ^ or at the end with $

  • Example 1: /^Lee/ matches “Lee Ann” but not “Mary Lee Ann”
  • Example 2: /Lee Ann$/ matches “Mary Lee Ann”, but not “Mary Lee Ann is nice”
  • The anchor operators (^ and $) do not match characters in the string – they match positions, at the beginning or end
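The two examples above, plus the common case of using both anchors together so that the whole string must match, can be sketched as:

```javascript
var startsWithLee = /^Lee/;
var endsWithLeeAnn = /Lee Ann$/;

var start1 = startsWithLee.test("Lee Ann");               // true
var start2 = startsWithLee.test("Mary Lee Ann");          // false
var end1 = endsWithLeeAnn.test("Mary Lee Ann");           // true
var end2 = endsWithLeeAnn.test("Mary Lee Ann is nice");   // false

// With both anchors the pattern must account for the entire string
var wholeString = /^Lee Ann$/;
var whole1 = wholeString.test("Lee Ann");        // true
var whole2 = wholeString.test("Mary Lee Ann");   // false
```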

Pattern modifiers

The i modifier tells the matcher to ignore the case of letters

  • Example: /oak/i matches “OAK” and “Oak”

The x modifier, which in Perl tells the matcher to ignore whitespace in the pattern (allowing comments in patterns), is not supported by JavaScript.
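A minimal sketch of the i modifier, on its own and combined with the g modifier that is introduced with the replace function below. The sample strings are made up.

```javascript
// Without a modifier the match is case-sensitive
var caseSensitive = /oak/.test("OAK");   // false

// The i modifier ignores the case of letters
var ignoreCase = /oak/i.test("OAK");     // true

// Modifiers can be combined: gi finds every match, whatever its case
var allOaks = "Oak, OAK and oak".match(/oak/gi);
// allOaks is ["Oak", "OAK", "oak"]
```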

Using Regular Expressions in JavaScript

In JavaScript we can use regular expressions to:

  • Search for text patterns in a string
  • Replace patterns in a string
  • Split a string based on some defined delimiter pattern
  • Match patterns found in a string and do something with each match

The most common use of these is in form validation.

The search function

search(pattern) returns the position in the object string of the first match of the pattern (position is relative to zero);

  • returns -1 if it fails
    var str = "Gluckenheimer";
    var position = str.search(/n/);
    /* position is now 6 */
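Two further cases are worth trying: search reports only the first match, and a failed search gives -1. This sketch reuses the string from the example above.

```javascript
var str = "Gluckenheimer";

// "e" occurs three times, but only the first position is reported
var firstE = str.search(/e/);    // 5

// No "z" in the string, so the search fails
var missing = str.search(/z/);   // -1

// The usual idiom for a yes/no answer
var hasDigit = str.search(/\d/) !== -1;   // false
```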

The replace function

replace(pattern, string)

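  • Finds a substring that matches the pattern and returns a new string in which it is replaced with the string; the original string is not changed
  • The g modifier means “replace globally”: all matched substrings will be replaced, not just the first.
  • Substrings matched by parenthesized parts of the pattern can be referred to in the replacement string as $1, $2, etc.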

The replace function: example

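var str = "Some rabbits are rabid";
// replace returns a new string, so the result must be assigned
str = str.replace(/rab/g, "tim");
// str is now "Some timbits are timid"
// the pattern has no parenthesized parts, so $1 and $2 are not set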
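Parenthesized groups in the pattern can be referred to in the replacement string as $1, $2 and so on. This is a minimal sketch with an invented name string; it also shows what happens when the g modifier is left off.

```javascript
// Each pair of parentheses captures part of the match
var name = "Lee, Ann";
var swapped = name.replace(/(\w+), (\w+)/, "$2 $1");
// swapped is "Ann Lee"

// Without the g modifier only the first match is replaced
var firstOnly = "Some rabbits are rabid".replace(/rab/, "tim");
// firstOnly is "Some timbits are rabid"

// The original string is never modified
// name is still "Lee, Ann"
```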

The split function

split(parameter)

  • Example:
var str = "grapes:apples:oranges";
var fruit = str.split(/:/);
// fruit is set to ["grapes", "apples", "oranges"]
  • ":" and /:/ are equivalent
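A pattern becomes more useful than a plain string when the delimiter varies. The sketch below uses invented data and splits on a comma or semicolon with any amount of surrounding whitespace.

```javascript
var list = "grapes, apples;oranges ,  pears";

// Delimiter: optional whitespace, a comma or semicolon, optional whitespace
var fruit = list.split(/\s*[,;]\s*/);
// fruit is ["grapes", "apples", "oranges", "pears"]

// Splitting on runs of whitespace breaks a sentence into words
var words = "The   quick brown\tfox".split(/\s+/);
// words is ["The", "quick", "brown", "fox"]
```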

The match function

match(pattern)

  • The most general pattern-matching method
  • Returns an array of results of the pattern-matching operation
  • With the g modifier, it returns an array of all of the substrings that matched
  • Without the g modifier, the first element of the returned array has the matched substring, and the other elements have the values of $1, $2, etc. (the substrings matched by parenthesized parts of the pattern)

The match function: example

var str = "My 3 kings beat your 2 aces";
var matches = str.match(/[ab]/g); 
//matches is set to ["b", "a", "a"]
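Without the g modifier, match reports the first match together with the contents of any parenthesized groups. This is a sketch using an invented string:

```javascript
var str = "Phone: 444-5432";
var result = str.match(/(\d{3})-(\d{4})/);

var whole = result[0];       // "444-5432", the whole match
var exchange = result[1];    // "444", the first group ($1)
var line = result[2];        // "5432", the second group ($2)
var where = result.index;    // 7, the position of the match in str

// When nothing matches, match returns null rather than an empty array
var nothing = str.match(/xyz/);   // null
```

Because a failed match gives null, it is wise to test the result before indexing into it.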

Form Validation

A common use of JavaScript is to check the validity of user inputs on forms:

  • avoids a trip to server that would result in an error page
  • error handling is kept local
  • usually triggered by submission button 3)
  • error message generated locally by writing into document object.
  • This example defines a function that could be used in a registration page to check that a phone number is valid (using US conventions!). HTML5 markup: forms_check.html. Script: forms_check.js.

Markup:

<!DOCTYPE html>
<!-- forms_check.html
A function test_phone_number is defined and tested.
This function checks the validity of phone
number input from a form
--> 
<html class="no-js" lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <title> Phone number tester </title>
    <meta name="viewport" content="width=device-width">
 
    <link rel="stylesheet" href="css/bootstrap.min.css">
    <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
    <link rel="stylesheet" href="css/main.css">
 
    <script src="js/vendor/modernizr-2.6.2-respond-1.1.0.min.js"></script>
  </head>
  <body>
    <div class="container">
      <h1>Phone Number Tester</h1>
 
      <p>An example of the use of Regular Expressions for form validation. 
         View source to see the HTML code and use your browser's development tools to view the JavaScript.</p>
      <p>Phone numbers should match the pattern 3 digits followed by a dash followed by four 
         digits. The regular expression for this is <code>/\d{3}-\d{4}/</code>.</p>
      <p>The example uses the DOM 0 event model which will be discussed in the next session.</p>
      <!-- "return" passes the result of validate() back to the browser,
           so that returning false actually cancels the submission -->
      <form id="phone" method="post" action="/cgi-bin/echo_form.cgi" onsubmit="return validate();">
        <label for="phone_number">Phone number: </label>
        <input id="phone_number" type="text" name="phone_number" placeholder="444-4444" 
               title="Please enter phone number using the pattern ddd-dddd."/>
        <input type="submit" name="Submit" value="Submit" />
      </form>
    </div> <!-- /container -->
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js"></script>
    <script>window.jQuery || document.write('<script src="js/vendor/jquery-1.9.0.min.js"><\/script>')</script>
    <script src="js/vendor/bootstrap.min.js"></script>
    <script src="js/plugins.js"></script>
    <!-- The actual forms_check script -->
    <script src="forms_check.js"></script>
  </body>
</html>

The script (validation function validate() will be explained later)

/* Function test_phone_number
 Parameter: A string
 Result: Returns true if the parameter has the form of a legal
 seven-digit phone number (3 digits, a dash, 4 digits)
 */
 
function test_phone_number(num) {
 
  // Use a simple pattern to check the number of digits and the dash.
  // The anchors ^ and $ force the whole string to match; without them
  // a value with extra trailing characters, e.g. "444-54321", would pass
 
  var ok = num.search(/^\d{3}-\d{4}$/);
 
  // search returns 0 when the pattern matches at the start, -1 otherwise
  return ok === 0;
 
}// end of function test_phone_number
 
/* Actual form validation. Called from the form's onsubmit handler */
var validate = function() {
  var phoneNumber = document.getElementById("phone_number");
  if (test_phone_number(phoneNumber.value)) {
    return true;
  } else {
    alert("Phone number is invalid. Please use format ddd-dddd.");
    // prevent submission 
    return false;
  }
};

Test code for test_phone_number

// Test test_phone_number
var test_phone_number_test = function() {
  var tests = ["444-5432", "444-r432", "44-1234"];
  // declare the loop counter with var so that it is not an implicit global
  for (var i = 0; i < tests.length; i++) {
    var test = test_phone_number(tests[i]);
    if (test) {
      console.log(tests[i] + " is a legal phone number");
    } else {
      console.log(tests[i] + " is not a legal phone number");
    }
  }
};

HTML5 Pattern Attribute

A regular expression validator that is built-in to HTML5

  • New pattern attribute can be used on some modern browsers
  • Pattern text is actually evaluated as the JavaScript expression /^pattern$/ by the JavaScript engine.
  • You may need to provide a JavaScript fallback for older browsers (see later)
  • E.g.
<input id="phone_number" type="text" name="phone_number" 
       placeholder="444-4444" pattern="\d{3}-\d{4}" />
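The anchoring described above can be imitated in script, which is one way to approach a fallback for browsers without pattern support. This is a rough sketch only: matchesPattern is a hypothetical helper, not part of any browser API, and it only approximates what a browser does with the attribute.

```javascript
// Hypothetical helper: checks a value roughly the way the pattern attribute
// does, by anchoring the pattern text at both ends. The (?: ) group keeps
// the anchors applied to the whole pattern even if it contains "|".
function matchesPattern(value, patternText) {
  var anchored = new RegExp("^(?:" + patternText + ")$");
  return anchored.test(value);
}

// In a JavaScript string the backslashes must be doubled
var phonePattern = "\\d{3}-\\d{4}";

var good = matchesPattern("444-5432", phonePattern);      // true
var tooLong = matchesPattern("444-54321", phonePattern);  // false
var prefixed = matchesPattern("x444-5432", phonePattern); // false
```

In a page, the pattern text could be read from the input element's pattern attribute rather than repeated in the script.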


HTML5 Version of the Phone Number Validator


<!DOCTYPE html>
<!-- forms_check.html
The pattern attribute is used to check the validity
of phone number input from a form; no script is needed
--> 
<html class="no-js" lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <title> Phone number tester (HTML5)</title>
    <meta name="viewport" content="width=device-width">
 
    <link rel="stylesheet" href="css/bootstrap.min.css">
    <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
    <link rel="stylesheet" href="css/main.css">
 
    <script src="js/vendor/modernizr-2.6.2-respond-1.1.0.min.js"></script>
  </head>
  <body>
    <div class="container">
      <h1>Phone Number Tester (HTML5)</h1>
      <p>An example of the use of Regular Expressions for form validation. View source to see the code.</p>
      <p>Phone numbers should match the pattern 3 digits followed by a dash followed by four digits. The regular expression for this is
        <code>/\d{3}-\d{4}/</code>.</p>
      <p>HTML5 provides a new form attribute <em>pattern</em> whose value is a regular expression (without the slashes). When supported,
        this can be used instead of JavaScript for form validation.</p>
      <p>In production, you would normally need to provide a JavaScript fallback for browsers that don't yet support the <em>pattern</em> attribute.</p>
      <!-- No need for onsubmit validator now -->
      <form id="phone" method="post" action="http://www.cpjobling.me/cgi-bin/echo_form.cgi">
        <label for="phone_number">Phone number: </label>
        <input id="phone_number" type="text" name="phone_number" 
               pattern="\d{3}-\d{4}" placeholder="444-4444" 
               title="Please enter phone number using the pattern ddd-dddd."/>
        <input type="submit" name="Submit" value="Submit" />
      </form>
    </div> <!-- end of container -->
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js"></script>
    <script>window.jQuery || document.write('<script src="js/vendor/jquery-1.9.0.min.js"><\/script>')</script>
    <script src="js/vendor/bootstrap.min.js"></script>
    <script src="js/plugins.js"></script>
    <!-- Look no scripts! -->
  </body>
</html>

Debugging JavaScript: IE6+

  • Select Internet Options from the Tools menu
  • Choose the Advanced tab
  • Uncheck the Disable script debugging box
  • Check the Display a notification about every script error box
  • Now, a script error causes a small window to be opened with an explanation of the error

Debugging JavaScript: IE6+ (continued)

Script error in Internet Explorer

Debugging JavaScript: Firefox

  • Select Tools → JavaScript Console
  • A small window appears to display script errors
  • Remember to clear the console after correcting an error message – avoids confusion

Script error in Firefox

Debugging JavaScript (continued)

Debugging with Firebug

  • Firefox only!
  • Firebug plugin provides sophisticated web page analysis tools including JavaScript debugging facilities and a console
  • Firebug Lite provides (limited) facilities for IE and other browsers.
  • Demo

Debugging in WebKit Browsers

  • Apple Safari
  • Google Chrome
  • Have built-in development tools

Summary of This Lecture

Learning Outcomes

At the end of this lecture you should be able to answer these questions:

  • What is a character class in a pattern?
  • What are the predefined character classes, and what do they mean?
  • What are the symbolic quantifiers, and what do they mean?
  • Describe the two end-of-line anchors.

Learning Outcomes (2)

At the end of this lecture you should be able to answer these questions:

  • What does the i pattern modifier do?
  • What exactly does the String method replace do?
  • What exactly does the String method match do?

Exercises

Write, test and debug (if necessary) HTML files that include JavaScript scripts for the following problems. When required to write functions, you must include a script to test the function with at least two different data sets.

  1. Input: A text string, using prompt; Output: either legal name or Illegal name, depending on whether the input string fits the required format, which is: Last name, first name, middle initial where neither of the names can have more than 15 characters.
  2. Input: A text string, using prompt; Output: The words of the input text, in alphabetical order
  3. Function: tst_name; Parameter: a string; Returns: true if the given string has the form: string1, string2, letter where both strings must be all lowercase letters except the first letter, and letter must be uppercase; false otherwise.
  4. Use the function developed in Exercise 3 to validate a form with a text field that captures the user's name when the user presses the submit button. The form should not submit data if the name is not in the correct format. Use the example given in the session as a template.
  5. Repeat exercise 4 using the built-in HTML5 pattern attribute to validate the name as defined in Exercise 3.

More Homework Exercises

What's Next?

1) See http://www.cuneytyilmaz.com/prog/jrx/ for original version.
2) We won't have time for them all, but you can look at them yourself later.
3) The form's onsubmit event.
eg-259/lecture7.txt · Last modified: 2013/02/19 12:29 by eechris