~~SLIDESHOW~~

====== Text Processing with Regular Expressions ======

**Contact Hour 10**: To be discussed on Tuesday 19th February, 2013.

**Lecturer**: [[C.P.Jobling@Swansea.ac.uk|Dr Chris P. Jobling]].

===== Text Processing with Regular Expressions =====

  * We conclude our review of the Basics of JavaScript with a discussion of text manipulation with regular expressions.
  * Regular expressions are a key idea that we shall return to in the context of server-side scripting.

----

The slides and notes for this lecture are based on Chapter 4 of Robert W. Sebesta, //Programming the World-Wide Web//, 3rd Edition, Addison Wesley, 2006. There is a good discussion of JavaScript regular expressions in Sections 7.2 and 9.9 of Chris Bates, //Web Programming: Building Internet Applications//, 3rd Edition, John Wiley, 2006. A good website that introduces this topic is [[http://www.regular-expressions.info/|Regular-Expressions.info]].

===== Contents of this Session =====

Text processing with //regular expressions//

  * [[eg-259:lecture7#text_manipulation_in_javascript|Pattern Matching with Regular Expressions]]
  * [[eg-259:lecture7#using_regular_expressions_in_javaScript|Using Regular Expressions in JavaScript]]
  * [[eg-259:lecture7#form_validation|Form Validation]]
  * [[eg-259:lecture7#html5_pattern_attribute|HTML5 Pattern Attribute]]
  * [[eg-259:lecture7#debugging_javascript|Debugging JavaScript]]

===== Learning Outcomes =====

//At the end of this lecture you should be able to answer these questions//:

  * What is a //character class// in a pattern?
  * What are the //predefined character classes//, and what do they mean?
  * What are the symbolic quantifiers, and what do they mean?
  * Describe the two end-of-line anchors.

===== Learning Outcomes (2) =====

//At the end of this lecture you should be able to answer these questions//:

  * What does the ''i'' pattern modifier do?
  * What exactly does the ''String'' method ''replace'' do?
  * What exactly does the ''String'' method ''match'' do?

===== Text manipulation in JavaScript =====

  * Text manipulation is a very important feature of many Web Applications.
  * Some examples:
    * search for a string in a document or form field
    * search for a string and replace it with another
    * validate the text fields of a form
  * Plain string manipulation can be used, but it tends to be restrictive and inefficient
  * It is better to use a technique called //pattern matching//

===== Pattern Matching =====

  * JavaScript provides two ways to do pattern matching:
    * Using ''RegExp'' objects
    * Using methods on ''String'' objects
  * A powerful pattern matcher called a //regular expression// matcher is provided
  * This is our first exposure to regular expressions, but it is an important topic in its own right.

----

**A Little History**

Regular expression pattern matching is a technique that was first developed for the text editors //ed// and //sed//, which were (and still are) part of the Unix system. The ideas were extended to the program //awk// and eventually reached their full potential in the Perl programming language. Perl regular expressions are the inspiration for JavaScript's, and variations of the Perl form of regular expression are to be found in many other contexts such as the text editors //vi// and //emacs//, most scripting languages, and even the standard Java library. If you are interested, [[wp>Regex|Regular expression]] has more to say on the subject.

===== Demo =====

  * We will be using a version of [[http://regexpal.com/|RegexPal]] to illustrate regular expressions.
  * There's another, slightly more powerful one: a [[http://www.cuneytyilmaz.com/prog/jrx/|clever piece]] of JavaScript ((See http://www.cuneytyilmaz.com/prog/jrx/ for original version.)) magic developed by Cüneyt Yılmaz
  * You can use either of these tools to play with regular expressions.
  * Here's the [[http://localhost:4567/eg-259/examples/lecture7/cheat.html|set of examples]] that I will work through((We won't have time for them all, but you can look at them yourself later)).

----

The tools illustrated are both based on the JavaScript regular expression engine, whose syntax is modelled on Perl's. The related Perl Compatible Regular Expressions (PCRE) library is used in many modern scripting languages, programmer's editors and even the Apache web server.

There is a version of RegexPal on the Blackboard site that you can download and which gives you access to the global search as a switchable option.

===== Simple patterns: characters =====

Normal characters (match themselves)

  * E.g. ''/ee/'' matches //need//, //greed//, //weed//, but **not** //wed// or //dead//

===== Simple patterns: meta-characters =====

Meta-characters have special meanings in patterns -- they do not match themselves:

''%%\ | ( ) [ ] { } ^ $ * + ? .%%''

  * A meta-character is treated as a normal character if it is escaped (preceded with a backslash ''\'')
  * The period ( ''.'' ) is a special meta-character -- it matches any character except //newline//
    * ''/c.t/'' matches //Ascot//, //cat//, //cut// and //crt// but **not** //act// or //cart//.

===== Character classes =====

  * Put a sequence of characters in brackets, and it defines a set of characters, any one of which matches:
    * ''[abcd]'' matches any of the letters 'a', 'b', 'c', or 'd'.
  * Dashes can be used to specify spans of characters in a class:
    * ''[a-z]'' matches any lower-case letter (in the English alphabet).
  * A caret at the left end of a class definition means match anything **but** the characters in the class:
    * ''[^0-9]'' matches any character that is not a decimal digit.

===== Character class abbreviations =====

^ Abbr. ^ Equiv. ^ Pattern Matches ^
| ''\d'' | ''[0-9]'' | a digit |
| ''\D'' | ''[^0-9]'' | not a digit |
| ''\w'' | ''[A-Za-z_0-9]'' | a word character |
| ''\W'' | ''[^A-Za-z_0-9]'' | not a word character |
| ''\s'' | ''[ \r\t\n\f]'' | a whitespace character |
| ''\S'' | ''[^ \r\t\n\f]'' | not a whitespace character |

Unlike Perl, JavaScript does not interpolate variables into regular expression literals; to build a pattern from a variable, pass the string to the ''RegExp'' constructor.

===== Quantifiers =====

Quantifiers in braces

^ Quantifier ^ Meaning ^
| ''{n}'' | //exactly// n repetitions |
| ''{m,}'' | //at least// m repetitions |
| ''{m,n}'' | //at least// m but //not more than// n repetitions (no space after the comma) |

===== Other Quantifiers =====

Just abbreviations for the most commonly used quantifiers

  * ''*'' means //zero or more repetitions//, e.g., ''\d*'' means zero or more digits
  * ''+'' means //one or more repetitions//, e.g., ''\d+'' means one or more digits
  * ''?'' means //zero or one//, e.g., ''\d?'' means zero or one digit

===== Anchors =====

The pattern can be forced to match only at the start with ''^'' or at the end with ''$''

  * //Example 1//: ''/^Lee/'' matches "Lee Ann" but not "Mary Lee Ann"
  * //Example 2//: ''/Lee Ann$/'' matches "Mary Lee Ann", but not "Mary Lee Ann is nice"
  * The anchor operators (''^'' and ''$'') do not match characters in the string -- they match positions, at the beginning or end

===== Pattern modifiers =====

The ''i'' modifier tells the matcher to ignore the case of letters

  * Example: ''/oak/i'' matches "OAK" and "Oak"

The ''x'' modifier, which in Perl tells the matcher to ignore whitespace in the pattern (and so allows comments in patterns), is not supported by JavaScript.

===== Using Regular Expressions in JavaScript =====

In JavaScript we can use regular expressions to:

  * Search for text patterns in a string
  * Replace patterns in a string
  * Split a string based on some defined delimiter pattern
  * Match patterns found in a string and do something with each match

The most common use of these is in form validation.

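Before looking at the individual ''String'' methods, the pattern features from the preceding slides (character classes, quantifiers, anchors and the ''i'' modifier) can be tried out with the ''test'' method of a ''RegExp'' object, which returns ''true'' when the pattern matches. The following is a minimal sketch rather than part of the lecture's example files; the variable names are chosen purely for illustration.

```javascript
// Character class abbreviation plus quantifier: one or more digits.
var hasDigits = /\d+/.test("Contact Hour 10");

// Negated class: is there any character that is not a decimal digit?
var hasNonDigit = /[^0-9]/.test("2013");

// Braces quantifier between anchors: exactly four word characters.
var fourWordChars = /^\w{4}$/.test("weed");

// Anchors match positions, not characters.
var startsWithLee = /^Lee/.test("Mary Lee Ann");
var endsWithLeeAnn = /Lee Ann$/.test("Mary Lee Ann");

// The i modifier ignores the case of letters.
var oakAnyCase = /oak/i.test("OAK");

// A pattern held in a variable can be compiled with the RegExp constructor.
var word = "oak"; // illustrative variable
var fromVariable = new RegExp(word, "i").test("Oak tree");

console.log(hasDigits, hasNonDigit, fourWordChars,
            startsWithLee, endsWithLeeAnn, oakAnyCase, fromVariable);
// prints: true false true false true true true
```
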
===== The search function =====

''search(pattern)'' returns the position in the object string of the first match of the pattern (position is relative to zero)

  * returns -1 if it fails

<code javascript>
var str = "Gluckenheimer";
var position = str.search(/n/);
/* position is now 6 */
</code>

===== The replace function =====

''replace(pattern, string)''

  * Finds a substring that matches the pattern and returns a new string in which it is replaced with the given string; the original string is not changed
  * With the ''g'' modifier ("//replace globally//") all matched substrings are replaced; without it, only the first match is replaced.
  * Substrings matched by parenthesized parts of the pattern can be referred to in the replacement string as ''$1'', ''$2'', etc.

===== The replace function: example =====

<code javascript>
var str = "Some rabbits are rabid";
var result = str.replace(/rab/g, "tim");
// result is "Some timbits are timid"; str itself is unchanged
// The pattern has no parentheses, so there are no $1 or $2 groups here
</code>

===== The split function =====

''split(parameter)''

  * Example:

<code javascript>
var str = "grapes:apples:oranges";
var fruit = str.split(/:/);
// fruit is set to ["grapes", "apples", "oranges"]
</code>

  * ''%%":"%%'' and ''/:/'' are equivalent

===== The match function =====

''match(pattern)''

  * The most general pattern-matching method
  * Returns an array of results of the pattern-matching operation, or ''null'' if nothing matches
  * With the ''g'' modifier, it returns an array of all of the substrings that matched
  * Without the ''g'' modifier, the first element of the returned array has the matched substring and the other elements have the values of the parenthesized groups (''$1'', ''...'')

===== The match function: example =====

<code javascript>
var str = "My 3 kings beat your 2 aces";
var matches = str.match(/[ab]/g);
// matches is set to ["b", "a", "a"]
</code>

===== Form Validation =====

A common use of JavaScript is to check the validity of user inputs on forms

  * avoids a trip to the server that would result in an error page
  * error handling is kept local
  * usually triggered by the //submission// button((form's ''onsubmit'' event))
  * error message generated locally by writing into the document object.
  * This example defines a function that could be used in a registration page to check that a phone number is valid (using US conventions!)

HTML5 Markup: [[http://jsfiddle.net/cpjobling/rwcce/2/|fiddle with it]] [[http://localhost:4567/eg-259/examples/lecture7/forms_check.html|forms_check.html]]

Script: [[http://localhost:4567/eg-259/examples/lecture7/forms_check.js|forms_check.js]]

----

Markup: Phone number tester

Phone Number Tester

An example of the use of Regular Expressions for form validation. View source to see the HTML code and use your browser's development tools to view the JavaScript.

Phone numbers should match the pattern 3 digits followed by a dash followed by four digits. The regular expression for this is /\d{3}-\d{4}/.

The example uses the DOM 0 event model which will be discussed in the next session.

The script (the validation function ''validate()'' will be explained later):

<code javascript>
/* Function test_phone_number
   Parameter: A string
   Result: Returns true if the parameter has the form of a legal
           seven-digit phone number (3 digits, a dash, 4 digits)
*/
function test_phone_number(num) {
    // Use a simple pattern to check the number of digits and the dash
    var ok = num.search(/\d{3}-\d{4}/);
    if (ok === 0) {
        return true;
    } else {
        return false;
    }
} // end of function test_phone_number

/* Actual form validation. Called onclick */
var validate = function () {
    var phoneNumber = document.getElementById("phone_number");
    if (test_phone_number(phoneNumber.value)) {
        return true;
    } else {
        alert("Phone number is invalid. Please use format ddd-dddd.");
        // prevent submission
        return false;
    }
};
</code>

Test code for ''test_phone_number'':

<code javascript>
// Test test_phone_number
var test_phone_number_test = function () {
    var tests = ["444-5432", "444-r432", "44-1234"];
    for (var i = 0; i < tests.length; i++) {
        var test = test_phone_number(tests[i]);
        if (test) {
            console.log(tests[i] + " is a legal phone number");
        } else {
            console.error("Error in test_phone_number: " + tests[i] + " is not a legal phone number");
        }
    }
};
</code>

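Note that ''search'' only reports where the first match starts, so a check of the form ''search(...) === 0'' also accepts input with extra characters after the number, such as ''444-54321''. A minimal sketch of a stricter check, assuming the whole field must be exactly ddd-dddd, anchors the pattern with ''^'' and ''$'' and uses the ''test'' method. The name ''is_phone_number'' is hypothetical and is not part of forms_check.js.

```javascript
// Hypothetical stricter variant of the phone number check (not in forms_check.js).
// The anchors ^ and $ force the pattern to cover the whole string.
function is_phone_number(num) {
    return /^\d{3}-\d{4}$/.test(num);
}

// Try a valid number, one with a trailing digit, one with a leading
// character, and one that is too short.
var samples = ["444-5432", "444-54321", "x444-5432", "44-1234"];
for (var i = 0; i < samples.length; i++) {
    console.log(samples[i] + " -> " + is_phone_number(samples[i]));
}
```

Only the first sample is accepted; the unanchored pattern would also have accepted the second.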
===== HTML5 Pattern Attribute =====

A regular expression validator that is built into HTML5

  * The new ''pattern'' attribute can be used in some modern browsers
  * The pattern text is actually evaluated as the JavaScript expression ''/^//pattern//$/'' by the JavaScript engine, so it must match the whole field value.
  * You may need to provide a JavaScript fallback for older browsers (see later)
  * E.g.

===== HTML5 Version of the Phone Number Validator =====

  * [[http://localhost:4567/eg-259/examples/lecture7/forms_check_html5.html|forms_check_html5.html]]

----

Phone number tester (HTML5)

Phone Number Tester (HTML5)

An example of the use of Regular Expressions for form validation. View source to see the code.

Phone numbers should match the pattern 3 digits followed by a dash followed by four digits. The regular expression for this is /\d{3}-\d{4}/.

HTML5 provides a new form attribute pattern whose value is a regular expression (without the slashes). When supported, this can be used instead of JavaScript for form validation.

In production, you would normally need to provide a JavaScript fallback for browsers that don't yet support the pattern attribute.
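
One way such a fallback might be built is sketched below. It is an illustration under stated assumptions, not the code used in forms_check_html5.html: the helper names ''supports_pattern'' and ''matches_pattern'' are hypothetical, and the sketch assumes that a supporting browser treats the pattern as anchored at both ends, which the fallback imitates by wrapping it as ''^(?:pattern)$''.

```javascript
// Hypothetical helper: does an <input> created by this document
// expose a "pattern" property (i.e. does the browser know the attribute)?
function supports_pattern(doc) {
    return "pattern" in doc.createElement("input");
}

// Hypothetical helper: imitate the pattern attribute, which has to
// match the whole field value rather than just part of it.
function matches_pattern(pattern, value) {
    return new RegExp("^(?:" + pattern + ")$").test(value);
}

// In a browser, a validation function could combine the two, e.g.
//   if (!supports_pattern(document) &&
//       !matches_pattern("\\d{3}-\\d{4}", field.value)) { /* reject */ }
console.log(matches_pattern("\\d{3}-\\d{4}", "444-5432"));
console.log(matches_pattern("\\d{3}-\\d{4}", "444-54321"));
```

Passing the document as a parameter keeps the helpers easy to exercise outside a browser; the pattern string is written without slashes, exactly as it would appear in the attribute.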

===== Debugging JavaScript: IE6+ =====

  * Select ''Internet Options'' from the ''Tools'' menu
  * Choose the ''Advanced'' tab
  * Uncheck the ''Disable script debugging'' box
  * Check the ''Display a notification about every script error'' box
  * Now, a script error causes a small window to be opened with an explanation of the error

===== Debugging JavaScript: IE6+ (continued) =====

{{eg-259:l7-ie_script_error.png|Script error in Internet Explorer}}

===== Debugging JavaScript: Firefox =====

  * Select ''Tools -> JavaScript Console''
  * A small window appears to display script errors
  * Remember to clear the console after correcting an error -- this avoids confusion

{{eg-259:l7-firefox_script_error.png|Script error in Firefox}}

===== Debugging JavaScript (continued) =====

  * If you need to trace the execution of your scripts, you need more than a JavaScript console
  * Both IE6 and Firefox have JavaScript debuggers
  * In IE6 the debugger is part of the browser. See http://www.microsoft.com/scripting/debugger/default.htm for documentation.
  * For Firefox (and other Mozilla-based browsers, including Netscape), the JavaScript debugger is called //Venkman// and is an optional plug-in available at http://www.mozilla.org/projects/venkman/.

===== Debugging with Firebug =====

  * Firefox only!
  * The [[http://www.getfirebug.com/|Firebug plugin]] provides sophisticated web page analysis tools including JavaScript debugging facilities and a console
  * [[http://www.getfirebug.com/lite.html|Firebug Lite]] provides (limited) facilities for IE and other browsers.
  * Demo

===== Debugging in WebKit Browsers =====

  * Apple Safari
  * Google Chrome
  * Both have built-in development tools

===== Summary of This Lecture =====

Text processing with //regular expressions//

  * [[eg-259:lecture7#text_manipulation_in_javascript|Pattern Matching with Regular Expressions]]
  * [[eg-259:lecture7#using_regular_expressions_in_javaScript|Using Regular Expressions in JavaScript]]
  * [[eg-259:lecture7#form_validation|Form Validation]]
  * [[eg-259:lecture7#html5_pattern_attribute|HTML5 Pattern Attribute]]
  * [[eg-259:lecture7#debugging_javascript|Debugging JavaScript]]

===== Learning Outcomes =====

//At the end of this lecture you should be able to answer these questions//:

  * What is a //character class// in a pattern?
  * What are the //predefined character classes//, and what do they mean?
  * What are the symbolic quantifiers, and what do they mean?
  * Describe the two end-of-line anchors.

===== Learning Outcomes (2) =====

//At the end of this lecture you should be able to answer these questions//:

  * What does the ''i'' pattern modifier do?
  * What exactly does the ''String'' method ''replace'' do?
  * What exactly does the ''String'' method ''match'' do?

===== Exercises =====

Write, test and debug (if necessary) HTML files that include JavaScript scripts for the following problems. When required to write functions, you must include a script to test the function with at least two different data sets.

  - //Input//: A text string, using ''prompt''; //Output//: either //Legal name// or //Illegal name//, depending on whether the input string fits the required format, which is: //Last name, first name, middle initial// where neither of the names can have more than 15 characters.
  - //Input//: A text string, using ''prompt''; //Output//: The words of the input text, in alphabetical order
  - //Function//: ''tst_name''; //Parameter//: a string; //Returns//: ''true'' if the given string has the form: ''string1, string2, letter'' where both strings must be all lowercase letters except the first letter, and //letter// must be uppercase; ''false'' otherwise.
  - Use the function developed in Exercise 3 to validate a form with a text field that captures the user's name when the user presses the submit button. The form should not submit data if the name is not in the correct format. Use the example given in the session as a template.
  - Repeat Exercise 4 using the built-in HTML5 ''pattern'' attribute to validate the name as defined in Exercise 3.

===== More Homework Exercises =====

  * Further basic JavaScript exercises, taken from Chapter 4 of Chris Bates, //Web Programming: Building Internet Applications//, 3rd Edition, John Wiley, 2006, are available. See the [[eg-259:homework:9#additional_exercises|additional exercises]] for details.
  * Watch the two instructional videos on [[http://e-texteditor.com/blog/2007/regular_expressions_tutorial|Regular Expressions]] and [[http://www.digitalmediaminute.com/screencast/firebug-js/|Debugging in Firebug]].
  * Work through the //[[eg-259:homework:9#practical_exercises|Practical Exercises]]//

===== What's Next? =====

Manipulating web documents through the Document Object Model (DOM) and the JavaScript event model.

  * [[eg-259:lecture8#javascript_execution_environment|JavaScript Execution Environment]]
  * [[eg-259:lecture8#the_document_object_model|The Document Object Model]]
  * [[eg-259:lecture8#element_access_in_javascript|Element Access in JavaScript]]
  * [[eg-259:lecture8#events_and_event_handling|Events and Event Handling]]
  * [[eg-259:lecture8#handling_events_from_body_elements|Handling Events from Body Elements]]
  * [[eg-259:lecture8#handling_events_from_button_elements|Handling Events from Button Elements]]

[[eg-259:lecture6|Previous Lecture]] | [[eg-259:home]] | [[eg-259:lecture8|Next Lecture]]