Fundamentals of Web Applications Technology
Background for Contact Hour 2: To be discussed on Tuesday 31st January, 2012.
Lecturer: Dr Chris P. Jobling.
Setting the scene for the EG-259 Web Applications Technology module.
The slides used in this lecture are based on:
Chapter 1 of Robert W. Sebesta, Programming the World-Wide Web, 3rd Edition, Addison Wesley, 2006.
Chapter 2 of James F. Kurose and Keith W. Ross, Computer Networking: A Top-down Approach Featuring the Internet, Addison-Wesley, 2005.
Additional material from Jennifer Niederst Robbins, Web Design in a Nutshell, 3rd Ed. O'Reilly, 2006.
Contents of this Lecture
Learning Outcomes for this Lecture (1)
At the end of this lecture you should be able to answer this selection of lecture review questions:
What protocol is used by all computer connections to the Internet?
What is the task of a domain name server?
In what common situation is the document returned by a Web server created after the request is received?
What is meant by the terms document root, server root, virtual host when applied to a web server?
Learning Outcomes for this Lecture (2)
At the end of this lecture you should be able to answer this selection of lecture review questions:
What is the purpose of a MIME type specification in a request/response transaction between a browser and a server?
Prior to HTTP 1.1, how long were connections between browsers and servers normally maintained?
What is the purpose of the Common Gateway Interface?
Where is the code for JavaScript, Java Applet, Java Servlet, Perl CGI Script, and PHP script interpreted?
History of the Internet (Video)
Origins of the Internet
ARPAnet – late 1960s and early 1970s
BITNET, CS net – late 1970s & early 1980s
NSF net – 1986
NSF net eventually became known as the Internet
Notes
ARPAnet:
BITNET, CS net:
NSF net:
Originally for non-DOD funded places
Initially connected five supercomputer centers
By 1990, it had replaced ARPAnet for non-military uses
Soon became the network for all (by the early 1990s)
What is the Internet?
A world-wide network of computer networks
At the lowest level, since 1982, all connections use the Internet Protocol (IP)
IP hides the differences among devices connected to the Internet
Applications and Application-Layer Protocols
Network applications: some jargon
Application-layer protocol defines
Types of messages exchanged, e.g., request and response messages
Syntax of message types: what fields in messages and how fields are delineated
Semantics of the fields, i.e., the meaning of the information in the fields
Rules for when and how processes send & respond to messages
Public-domain protocols:
Proprietary protocols:
Client-Server Paradigm
Client:
initiates contact with server (“speaks first”)
typically requests service from server,
Web: client implemented in browser
Processes communicating across network
Addressing Processes
For a process to receive messages, it must have a globally unique identifier
Every node has a unique 32-bit IP address
Every process running on the host has a unique port number
Process identifier includes both the IP address and port number associated with the process on the host
Notes
Organizations are assigned groups of IP addresses for their computers
The new standard, IPv6 (1998), uses 128 bits for host addresses.
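As a rough illustration of the addressing scheme above, the following Python sketch uses the standard `ipaddress` module to show the 32-bit and 128-bit address sizes and pairs an address with a port number to identify a process. The addresses come from ranges reserved for documentation and are not real hosts; port 80 is assumed only because it is the conventional web server port.

```python
import ipaddress

# Example addresses from the ranges reserved for documentation (not real hosts).
v4 = ipaddress.ip_address("192.0.2.10")
v6 = ipaddress.ip_address("2001:db8::1")

# max_prefixlen is the number of bits in an address of that IP version.
print(v4.version, v4.max_prefixlen)   # IPv4 addresses are 32 bits long
print(v6.version, v6.max_prefixlen)   # IPv6 addresses are 128 bits long

# A process is identified by the pair (IP address, port number).
web_server_process = (str(v4), 80)
print(web_server_process)
```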
Domain names
Form: host-name.domain-names
Fully qualified domain name
DNS servers
The last domain is controlled by naming authorities associated with the top-level ISPs.
Examples are com, org, edu and gov, assigned (mostly) to US commercial institutions, not-for-profit organizations, educational institutions, and government institutions respectively.
Examples of geographical domains are uk, fr and ie for countries in Europe, and us, za and ca for other locations.
The naming authorities assign lower level domain names to ISPs or to large institutions.
Examples of ISP domains are ac.uk (the UK Joint Academic Network), ntlworld.com (a UK cable broadband supplier) and blogspot.com (Google's blog hoster).
Examples of institutional domains are swan.ac.uk (Swansea University), microsoft.com and swansea.gov.uk (City and County of Swansea).
Within an institutional domain, the ISP or naming authority will usually assign a group of IP addresses that can be freely used. The institution acts as its own naming authority and is free to assign host names as it wishes (in fact, several host names may be assigned to a single IP address).
An institutional Domain Name System (DNS) server (called the Authoritative Name Server) is used to map local host names to IP addresses.
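The mapping performed by an authoritative name server can be pictured as a simple lookup table. The sketch below is only a toy model under stated assumptions: the `example.ac.uk` host names, the addresses and the `resolve` function are all hypothetical, and a real DNS server speaks a network protocol rather than consulting a Python dictionary.

```python
# Hypothetical zone data for an imaginary institution (example.ac.uk).
ZONE = {
    "www.example.ac.uk": "192.0.2.10",
    "eng.example.ac.uk": "192.0.2.10",   # a second host name for the same IP address
    "mail.example.ac.uk": "192.0.2.25",
}

def resolve(host_name):
    """Return the IP address for a local host name, or None if it is unknown."""
    # Domain names are case-insensitive; a trailing dot marks a fully qualified name.
    return ZONE.get(host_name.lower().rstrip("."))

print(resolve("www.example.ac.uk"))
print(resolve("ENG.example.ac.uk."))
print(resolve("nosuchhost.example.ac.uk"))
```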
Transport Services
What transport service does an application need?
Data loss
Timing
Bandwidth
Data loss:
some applications (e.g., audio) can tolerate some loss
other applications (e.g., file transfer, telnet) require 100% reliable data transfer
Timing:
Bandwidth:
some apps (e.g., multimedia) require minimum amount of bandwidth to be “effective”
other apps (“elastic apps”) make use of whatever bandwidth they get
Internet transport protocols services: TCP
Connection-oriented: setup required between client and server processes
Reliable transport between sending and receiving process
Flow control: sender won't overwhelm receiver
Congestion control: throttle sender when network overloaded
Does not provide: timing, minimum bandwidth guarantees
Web Uses TCP
Internet transport protocols services: UDP
Unreliable data transfer between sending and receiving process
Does not provide: connection setup, reliability, flow control, congestion control, timing, or bandwidth guarantee
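The difference in connection setup between the two transport services can be seen in a few lines of Python using the standard `socket` module. This is a minimal sketch that assumes the loopback interface is available; it sends one short message each way on the local machine and does not demonstrate loss, flow control or congestion control.

```python
import socket

LOOPBACK = "127.0.0.1"  # everything stays on the local machine

# TCP: a connection must be set up before any data is sent.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((LOOPBACK, 0))           # port 0 asks the OS for any free port
listener.listen(1)
client = socket.create_connection(listener.getsockname(), timeout=2)
server_side, _ = listener.accept()     # the connection is now established
server_side.settimeout(2)
client.sendall(b"hello over TCP")
tcp_data = server_side.recv(1024)

# UDP: no connection setup; each datagram is simply addressed and sent.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind((LOOPBACK, 0))
receiver.settimeout(2)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", receiver.getsockname())
udp_data, _ = receiver.recvfrom(1024)

print(tcp_data, udp_data)

# Release all sockets.
for s in (client, server_side, listener, sender, receiver):
    s.close()
```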
Origins of the Web
Problem
By the mid-1980s, several different protocols had been invented and were being used on the Internet, all with different user interfaces (Telnet, FTP, Usenet, email, Gopher)
Possible Solution
The World Wide Web
Tim Berners-Lee at CERN proposed the Web in 1989
Document form: hypertext
Objects? Pages? Documents? Resources?
Hypermedia—more than just text—images, sound, etc.
Web or Internet?
Notes
The original purpose of the world-wide web was to allow scientists to have access to many databases of scientific work through their own computers.
Objects? Pages? Documents? Resources? We'll call them documents.
Web or Internet? The Web uses the application protocol, HTTP, that runs on the Internet – there are many others (telnet, ftp, email, etc.)
Web Browsers
Mosaic – NCSA (Univ. of Illinois), in early 1993
Browsers are clients – always initiate, servers react (although sometimes servers require responses)
Most requests are for existing documents, using HyperText Transfer Protocol (HTTP)
But some requests are for program execution, with the output being returned as a document
Notes
Mosaic was the second browser to use a GUI (the first, a graphical browser that Tim Berners-Lee developed at CERN, did not have a wide distribution) and led to an explosion of Web use. It was initially written for X-Windows under UNIX, but was ported to other platforms by late 1993.
Web Servers
Provide responses to browser requests, either existing documents or dynamically built documents
Browser-server connection is now maintained through more than one request-response cycle
All communications between browsers and servers use Hypertext Transfer Protocol (HTTP)
Web servers run as background processes in the operating system
Notes
Web servers monitor a communications port on the host, accepting HTTP messages when they appear
All current Web servers came from either
The original from CERN
The second one, from NCSA
Web Server Configuration
Web servers have two main directories conventionally called the Document Root and the Server Root
Document root is accessed indirectly by clients
Virtual document trees
Virtual hosts
Proxy servers
Web servers now support other Internet protocols
Notes
Server root contains the server system software
Document root contains the servable documents
Virtual document trees are “aliases” for parts of the document tree that are not actually located in the document root.
A virtual host is a web server which appears to have a different host name from the primary host. Such a host will have its own document root.
DNS is used to map the virtual host name to the IP address of the actual web server. The web server will then recognize the virtual host name from the HTTP request and serve documents from the right place.
Proxy web servers are web servers which can serve documents which are in the document root of another web server. They are often used to reduce the traffic from an institution to and from the Internet. For example, see UWS Proxy Server.
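The notes on document roots and virtual hosts can be sketched as a small mapping from the host name in the HTTP request and the requested path to a file on disk. Everything named here is hypothetical (the host names, the directories, the `map_to_file` function and the choice of `index.html` as directory index); a real server such as Apache does this through its configuration directives.

```python
from pathlib import PurePosixPath

# Hypothetical configuration: one document root per (virtual) host name.
DOCUMENT_ROOTS = {
    "www.example.org": "/srv/www/main",
    "blog.example.org": "/srv/www/blog",   # virtual host with its own document root
}
DEFAULT_HOST = "www.example.org"
DIRECTORY_INDEX = "index.html"

def map_to_file(host_header, url_path):
    """Map the Host header and URL path of a request to a file path."""
    host = host_header.split(":")[0].lower()   # drop any ":port" suffix
    root = DOCUMENT_ROOTS.get(host, DOCUMENT_ROOTS[DEFAULT_HOST])
    if url_path.endswith("/"):
        url_path += DIRECTORY_INDEX            # a directory request gets the index file
    # NOTE: a real server must also reject paths that escape the root (e.g. "..").
    return str(PurePosixPath(root) / url_path.lstrip("/"))

print(map_to_file("blog.example.org", "/posts/"))
print(map_to_file("www.example.org:80", "/about.html"))
```

The client never sees the document root itself, which is why it is said to be accessed indirectly.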
Web servers now support other Internet protocols:
Modern Web Servers
You are Invited to Update this Page by providing a link to the January 2012 results and updating the tables
Market Share for Top Servers Across All Domains November 1995 - January 2011
January 2011 Data
Data from Netcraft Server Share Statistics (Netcraft Survey January 2011)
Developer | December 2010 | Percent | January 2011 | Percent | Change | Last Year (August 2009) |
Apache | 151,516,152 | 59.35% | 161,591,445 | 59.13% | -0.23 | 46.30% |
Microsoft | 56,723,544 | 22.22% | 57,392,351 | 21.00% | -1.22 | 21.94% |
nginx | 16,910,205 | 6.62% | 20,504,634 | 7.50% | 0.88 | 5.09% |
Google | 14,933,865 | 5.85% | 15,112,532 | 5.53% | -0.32 | 6.29% |
lighttpd | 1,308,935 | 0.51% | 1,866,872 | 0.68% | 0.17 | 0.90% |
September 2009 Data
Data from Netcraft Server Share Statistics (Netcraft Survey August 2009)
Developer | July 2009 | Percent | August 2009 | Percent | Change | Last Year |
Apache | 113,019,868 | 47.17% | 104,611,555 | 46.30% | -0.87 | 49.82% |
Microsoft | 55,918,254 | 23.34% | 49,579,507 | 21.94% | -1.39 | 34.88% |
qq.com | 30,447,369 | 12.71% | 30,278,988 | 13.40% | 0.69 | – |
Google | 14,226,904 | 5.94% | 14,213,976 | 6.29% | 0.35 | 5.94% |
nginx | 10,174,573 | 4.25% | 11,502,109 | 5.09% | 0.84 | – |
lighttpd | 2,942,469 | 0.55% | 2,025,521 | 0.90% | 0.34 | 1.65% |
September 2008 Data
Data from Netcraft Server Share Statistics (Netcraft Survey August 2008)
Developer | July 2008 | Percent | August 2008 | Percent | Change | Last Year |
Apache | 86,845,154 | 49.49% | 88,047,801 | 49.82% | 0.33 | 50.48% |
Microsoft | 62,411,537 | 35.57% | 61,646,837 | 34.88% | -0.69 | 34.94% |
Google | 10,001,763 | 5.70% | 10,502,299 | 5.94% | 0.24 | 4.90% |
lighttpd | 2,942,469 | 1.68% | 2,914,867 | 1.65% | -0.03 | 1.12% |
September 2007 Data
Data from Netcraft Server Share Statistics (September 2007)
Developer | August 2007 | Percent | September 2007 | Percent | Change | Last Year |
Apache | 65,153,417 | 50.96% | 68,228,561 | 50.48% | -0.49 | 65.52% |
Microsoft | 43,861,854 | 34.31% | 47,232,300 | 34.94% | 0.63 | 30.13% |
Google | 5,702,456 | 4.46% | 6,616,713 | 4.90% | 0.43 | - |
Sun | 2,195,495 | 1.72% | 2,212,821 | 1.64% | -0.08 | 0.37% |
lighttpd | 1,500,126 | 1.17% | 1,515,963 | 1.12% | -0.05 | - |
A lot of the change would appear to be due to the growth of blogging and community web sites such as Microsoft Live Spaces. Google used to host the Blogger service on Apache web servers, it now uses its own server. Live Spaces is hosted on the Microsoft's IIS. Google and Microsoft are also busy competing in the ISP space which is fueling a general growth in the number of web sites.
September 2006 Data
Data from Server Share Statistics (September 2006)
Server | August 2006 | Percent | September 2006 | Percent | Change | 2005 |
Apache | 57,906,817 | 62.52% | 59,699,872 | 61.64% | -0.88 | 69.15% |
Microsoft | 27,905,439 | 30.13% | 30,272,249 | 31.26% | 1.13 | 20.36% |
Zeus | 521,619 | 0.56% | 515,670 | 0.53% | -0.03 | 0.82% |
Sun | 344,862 | 0.37% | 345,834 | 0.36% | -0.01 | 2.61% |
Some Important Web Servers
Introducing the Apache Web Server
Still the most popular web server in use today
First web server was built by Tim Berners-Lee at CERN
First really popular web server was developed by NCSA and was available to all.
Apache was originally developed to fix bugs in NCSA Web Server version 1.3 in 1995.
It is open source and is developed and maintained by a group of volunteers.
Runs on most common platforms.
Notes
Apache market share:
September 2006: 62% of the market, 59.7 million hosts.
September 2007: 50% of the market, 68.2 million hosts.
August 2008: just less than 50% of the market, 88.0 million hosts.
August 2009: around 46% of the market, 104.6 million hosts.
The Apache Web Server
Official name httpd (HTTP daemon)
Open source, fast, reliable. Latest version 2.2.
Directives (operation control): ServerName, ServerRoot, ServerAdmin, DocumentRoot, Alias, Redirect, DirectoryIndex, UserDir
Apache configuration is usually done by editing configuration files with a text editor
Scheme
Object-address
For the http protocol, the object-address is: fully qualified domain name/document path
For the file protocol, only the document path is needed
URI Object Address
URIs cannot include spaces or any of a collection of other special characters (semicolons, colons, …)
The document path may be abbreviated as a partial path
If the document path ends with /, it means it is a directory
Notes
Partial paths – the rest of the path is furnished by the server configuration. Partial paths are also known as relative paths.
Directory “object”: often the / character will be mapped by the web server to a file such as index.html, index.php or Default.asp.
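The structure of a URI described above can be explored with Python's standard `urllib.parse` module. This is a small sketch: the file names `lecture notes; draft.html` and `engineering.html` are made up for illustration, and the last line shows how a client completes a relative path against a base URL, which is only one of the ways a partial path can be filled in.

```python
from urllib.parse import quote, urljoin, urlsplit

# Split a URI into its scheme and object-address parts.
parts = urlsplit("http://www.swan.ac.uk/ugcourses/")
print(parts.scheme)   # the scheme
print(parts.netloc)   # the fully qualified domain name
print(parts.path)     # the document path (ends with "/", so a directory)

# Spaces and special characters must be percent-encoded before use in a URI.
print(quote("lecture notes; draft.html"))

# A relative (partial) path is completed against a base URI.
print(urljoin("http://www.swan.ac.uk/ugcourses/", "engineering.html"))
```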
Multipurpose Internet Mail Extensions (MIME)
Originally developed for email
Used to specify to the browser the form of a file returned by the server (attached by the server to the beginning of the document)
Type specifications
Server usually gets the type from the requested file name's suffix (.html implies text/html)
Browser gets the type explicitly from the server
Experimental types: subtype begins with x-, e.g., video/x-msvideo
Notes
Examples of type specifications: text/plain, text/html, image/gif, image/jpeg.
Experimental types require the server to send a helper application or plug-in so the browser can deal with the file.
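The way a server derives a MIME type from a file name suffix can be tried out with Python's standard `mimetypes` module. This is only an illustration of the idea: the file names are invented, and a production web server uses its own suffix-to-type table rather than this module.

```python
import mimetypes

# Guess the MIME type from the file name suffix, as a web server typically does.
for name in ("index.html", "notes.txt", "logo.gif", "photo.jpeg"):
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type)
```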
The Hyper Text Transfer Protocol
The protocol used by ALL web communications
Key Facts about HTTP
Response Time Modeling
Calculation of Response Time
HTTP Request Phase
HTTP method domain part of URL HTTP version
Header fields
blank line
Message body
GET /ugcourses/ HTTP/1.1
Notes
Most commonly used request methods:
GET – Fetch a document
POST – Execute the document, using the data in body
HEAD – Fetch just the header of the document
PUT – Store a new document on the server
DELETE – Remove a document from the server
Note that although servers support all these requests, browsers only issue GET and POST. This has implications for so-called RESTful web applications which we will explore in a later lecture.
Four categories of header fields: General, request, response and entity.
Accept: text/plain
Accept: text/*
If-Modified-Since: date
telnet www.swan.ac.uk http
GET /ugcourses/ HTTP/1.1
Host: www.swan.ac.uk<Return>
<Return>
Linux users have access to a useful command-line tool, cURL, that can issue any web server request from the command line and gives full programmatic access to the header fields. cURL is a very useful addition to the web developer's toolbox and is available for Windows via Cygwin (see Practical 0). cURL supports other protocols besides HTTP.
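The request format described in this section (request line, header fields, blank line) can be assembled by hand. The following Python sketch builds the same request as the telnet session above without sending it anywhere; `build_request` is a hypothetical helper written for this illustration, not part of any library.

```python
def build_request(method, path, host, headers=None):
    """Assemble an HTTP/1.1 request with no message body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; an empty line marks the end of the header fields.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("GET", "/ugcourses/", "www.swan.ac.uk", {"Accept": "text/*"})
print(request.decode("ascii"))
```

The bytes in `request` are what a browser, telnet or cURL would write to the TCP connection.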
HTTP Response Phase
Status line
Response header fields
blank line
Response body
HTTP version status code explanation
HTTP/1.1 200 OK
Notes
1 => Informational
2 => Success
3 => Redirection
4 => Client error
5 => Server error
The header fields Content-type and Content-length are required:
Common header response fields:
Content-length: 488
Content-type: text/html
HTTP/1.1 200 OK
Date: Tue, 18 May 2004 16:45:13 GMT
Server: Apache (Red-Hat/Linux)
Last-modified: Tue, 18 May 2004 16:38:38 GMT
Etag: "841fb-4b-3d1a0179"
Accept-ranges: bytes
Content-length: 364
Connection: close
Content-type: text/html; charset=ISO-8859-1
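The response format (status line, header fields, blank line, body) can be taken apart with a few lines of Python. This is a simplified sketch: `parse_response` is a hypothetical helper, the sample response is made up, and real HTTP parsing has to handle many cases ignored here, such as repeated header fields.

```python
# The first digit of the status code gives its category.
STATUS_CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
                  4: "Client error", 5: "Server error"}

def parse_response(raw):
    """Split a raw HTTP response into version, code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")        # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip() # field names are case-insensitive
    return version, int(code), reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, STATUS_CLASSES[code // 100])
print(headers["content-type"], headers["content-length"], len(body))
```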
Web Programming
Concerned with three “layers” of the current web standards stack
The Structural Layer
HTML
HTML describes the general form and layout of documents
Tools for creating HTML documents
HTML editors – make document creation easier
Plug-ins
Filters
Notes
An HTML document is a mix of content and controls
HTML editors – make document creation easier by providing shortcuts for typing tag names and spell-checking
WYSIWYG HTML editors are useful in that developers need not know HTML to create HTML documents
Plug-ins are often integrated into tools like word processors, effectively converting them to WYSIWYG HTML editors
Filters convert documents in other formats to HTML
XML
A meta-markup language
Used to create a new markup language for a particular purpose or area
Because the tags are designed for a specific area, they can be meaningful
No presentation details
A simple and universal way of representing data of any textual kind
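To make the idea of meaningful, presentation-free tags concrete, here is a small Python sketch that reads a made-up XML vocabulary with the standard `xml.etree.ElementTree` module. The `module`, `title` and `lecturer` tag names and the `code` attribute are invented for this example and do not belong to any standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical markup language for describing university modules.
document = """
<module code="EG-259">
  <title>Web Applications Technology</title>
  <lecturer>Dr Chris P. Jobling</lecturer>
</module>
"""

module = ET.fromstring(document)
print(module.tag, module.get("code"))   # the tags describe the data, not its appearance
print(module.findtext("title"))
print(module.findtext("lecturer"))
```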
The Presentation Layer
Current CSS standards are:
Cascading Style Sheets (CSS) Level 1
Cascading Style Sheets (CSS) Level 2
Cascading Style Sheets (CSS) Level 3
Notes
CSS Level 1 has been a Recommendation since 1996 and is now fully supported by current browsers. Level 1 contains rules that control the display of text, margins and borders.
CSS Level 2 is best known for the addition of absolute positioning of web page elements. Level 2 reached Recommendation status in 1998, and the 2.1 revision is currently a Candidate Recommendation. Support for CSS 2.1 is inconsistent in current browsers.
CSS Level 3 builds on level 2 but is modularized to make future expansion simpler and to allow different devices to support logical subsets. This version is still in development but browsers are gradually supporting more and more of the standard, often with the use of browser-specific attributes.
The Behavioural Layer
Document Object Models
Document Object Model (DOM) allows scripts and applications to access and update the content, structure and style of a document.
Achieved by formally naming each part of the document, its attributes, and how the document may be manipulated.
Originally specified incompatibly by each browser, now standardized by the W3C.
Notes
Document Object Model (DOM) Level 1 (Core) covers core HTML and XML documents as well as document management and manipulation. See DOM 1.
DOM Level 2 includes a style sheet object model, making it possible to manipulate style information. See DOM 2.
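Although the DOM is usually scripted from JavaScript in the browser, the same W3C interface is available in other languages. The sketch below uses Python's standard `xml.dom.minidom` module to access and update the content and structure of a tiny document; the markup, the `intro` id and the `lead` class are invented for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString('<html><body><p id="intro">Hello</p></body></html>')

# Access: find a named part of the document.
paragraph = doc.getElementsByTagName("p")[0]

# Update content and attributes.
paragraph.firstChild.data = "Hello, DOM"
paragraph.setAttribute("class", "lead")

# Update structure: create a new element and insert it before the paragraph.
heading = doc.createElement("h1")
heading.appendChild(doc.createTextNode("Fundamentals"))
body = doc.getElementsByTagName("body")[0]
body.insertBefore(heading, paragraph)

print(body.toxml())
```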
Scripting in JavaScript
Notes
Netscape introduced its web scripting language, JavaScript, with its Navigator 2.0 browser. It was originally called “LiveScript” but was later co-branded by Sun, and “Java” was added to the name. Microsoft countered with its own JScript while supporting some level of JavaScript in its Version 3.0 browser. The need for a cross-browser standard was clear!
The W3C is developing a standardized version of JavaScript in coordination with ECMA International, an international industry association dedicated to the standardization of information and communication systems. According to the Mozilla site, Netscape's JavaScript is a superset of the ECMAScript standard scripting language, with only mild differences from the published standard. In general practice, most developers simply refer to “JavaScript” and the standard implementation is implied.
Java
General purpose object-oriented programming language
Based on C++, but simpler and safer
Client-side: Applets – compiled Java programs that are downloaded from a web server and execute in the browser
Server-side:
Servlets – Java programs that execute in the server. Essentially manipulate the HTTP request and return an HTTP response. JSP template markup allows Java code to be embedded in HTML pages (similar to Microsoft's ASP)
Sebesta's book covers both applets and servlets, but we will not have time to cover Java in this module.
Perl
Provides server-side computation for HTML documents, through CGI
Perl is good for CGI programming because:
Access to database systems
Perl is highly platform independent, and has been ported to all common platforms
Perl is not just for CGI
Perl is a useful general-purpose system administrator's tool. In fact, that was Larry Wall's intention when he originally developed the language. It can be used to manage server configuration and server logs. In many ways it is the ultimate shell programming tool, with the advantage that it works on systems other than Unix!
PHP
A server-side scripting language
An alternative to CGI
Similar (in programming style) to JavaScript
Great for form processing and database access through the Web
Ruby on Rails
Ruby is a scripting language
Rails is a web applications development framework written in Ruby
Rails exploits Ruby's features to make web app development as easy as possible
Supports a RESTful development style “out of the box”
Summary of this Lecture
Learning Outcomes for this Lecture (1)
At the end of this session you should be able to answer this selection of review questions:
What protocol is used by all computer connections to the Internet?
What is the task of a domain name server?
In what common situation is the document returned by a Web server created after the request is received?
What is meant by the terms document root, server root, virtual host when applied to a web server?
Learning Outcomes for this Lecture (2)
At the end of this lecture you should be able to answer this selection of lecture review questions:
What is the purpose of a MIME type specification in a request/response transaction between a browser and a server?
Prior to HTTP 1.1, how long were connections between browsers and servers normally maintained?
What is the great advantage of XML over XHTML for describing data?
What is the purpose of the Common Gateway Interface?
Where is the code for JavaScript, Java Applet, Java Servlet, Perl CGI Script, and PHP script interpreted?
After writing up your notes for this lecture, you also should be able to answer all the Review Questions. You should also try the Homework Exercises.
What's Next?