Archive for the ‘Graduate School’ Category

Presenting tOSU Web Server – An open source web server

March 20th, 2010 Frank No comments

I’ve just finished my Winter 2010 term for my graduate degree. I took two classes this term, SE-450 and CSC-435. Both classes were great, but taking them concurrently was not a great idea. Nevertheless, I have something to share which is ultimately a derivative of the two classes. One of the project assignments in CSC-435 – Distributed Systems I – was to create a web server. We were given the basics of how a web server and client works, but then left to our own devices to gather the HTTP response codes and other such information. My intention is to share a good portion of the basics here but then also my web server (slightly modified for this site).

Background

Web servers (and browsers) work on top of basic sockets. While this entry isn’t going to be a comprehensive introduction to the networking technologies involved, one area that is key to the web server is the idea of sockets. A socket is defined as a communication channel in which two programs can communicate. The communication takes place over ports.

Now we need to understand how to use sockets in Java, the implementation language for the tOSU Web Server. Java (SE 6) has two socket classes: ServerSocket and Socket. The former waits for incoming socket connections; essentially, it becomes the server. A Socket, on the other hand, represents an individual connection; it becomes the communication channel between the client and the server.

The high-level idea of creating a web server is to create a ServerSocket instance and wait for incoming connections. Assuming it receives a properly formatted HTTP request, we handle the request and return data (web pages) to the client. The question is: what constitutes a valid HTTP request?
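To make the two socket roles concrete, here is a minimal sketch, not code from tOSU-WebServer. The class name SocketRoundTrip and the method send are made up for illustration. It assumes a free loopback port is available and passes one line of text from a client Socket to the Socket handed out by ServerSocket.accept().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; the name is not taken from the project.
class SocketRoundTrip {

    /** Sends one line from a client Socket to a ServerSocket in the same process. */
    static String send(String line) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // The client side writes; autoflush pushes the line out on println.
            PrintWriter out = new PrintWriter(client.getOutputStream(), true);
            out.println(line);
            // The server side reads from the Socket returned by accept().
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(accepted.getInputStream()));
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling SocketRoundTrip.send("hello") returns the line as the server side read it. The ServerSocket only listens; the actual data travels over the two Socket objects.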

The HTTP Request

HTTP is nothing more than a protocol. We’ve all heard this, but I’m not sure we all know what it means. A protocol is merely a set of rules; I don’t believe that a protocol is anything more or anything less. HTTP is actually quite comprehensive, but creating the tOSU-WebServer has taught me that we do not need to implement the entire protocol for a pedagogical web server application. We simply need to provide the basics, perhaps a little more, and it’ll work. This is what my web server represents.

HttpFox results for Cat.html

I learned about the protocol in two ways, neither of which involved reading the actual published documentation. First, I used a Firefox plug-in called HttpFox to review the client/server communication with an existing web server (the Apache server for this site) serving a simple HTML page. Second, I created (as an assignment) a “Listener” / echo program. The listener program is built in (as a switch) to the tOSU-WebServer. I’ll cover utilizing this in the following sections. The screen capture shows my results for retrieving an HTML file called “cat.html”.

The top row is a single request; if you enable HttpFox for a request on this site, you’ll notice that several requests are made. Each resource (HTML, CSS, images) becomes a request. The left side is the request header (for the selected request), or what Firefox sent to the web server. The right side is what the web server sent back to Firefox. Our web server implementation must accept and read the request header, process it, and return the response header to the client along with the HTML page or other data.

As I write that, it sounds like a lot, but it really isn’t hard to do. There are only a few required items on each side. The important line in the request header (from the client) is the “(Request-Line)” and, on the server side, the “(Status-Line)”. The request line is what the browser is requesting: the file. The status line is the response. You can view a list of common status codes on Wikipedia but, again, only a small subset is pertinent to the implementation of a simple web server.

Headers as Implemented

The headers that tOSU-WebServer must read and generate are quite straightforward.

The line we must process from the client browser is the request line, which looks like GET /overview-summary.html HTTP/1.1. The GET indicates that the browser wants to get a file, /overview-summary.html is the file it wants, and HTTP/1.1 is the protocol the client is using (the format of the request). This single line is the only line we are interested in. The client sends more, but tOSU-WebServer ignores the remaining items.
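Splitting the request line into its three parts can be sketched as follows. This is an illustration only, not the project's parser: the class name RequestLine and its fields are hypothetical, and it assumes the parts are separated by whitespace as in the example above.

```java
// Hypothetical value class; tOSU-WebServer's own classes may look different.
class RequestLine {
    final String method;   // e.g. "GET"
    final String path;     // e.g. "/overview-summary.html"
    final String protocol; // e.g. "HTTP/1.1"

    private RequestLine(String method, String path, String protocol) {
        this.method = method;
        this.path = path;
        this.protocol = protocol;
    }

    /** Splits a line such as "GET /overview-summary.html HTTP/1.1" into its parts. */
    static RequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }
}
```

For the example request, RequestLine.parse(...) yields the method GET, the path /overview-summary.html and the protocol HTTP/1.1. Anything that does not have exactly three parts is rejected, which is where a server could answer with an error status.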

The web server must respond with a few more lines, but it still is not extensive. The first line, as previously mentioned, is the status line. This is formatted as HTTP/1.1 200 OK. The 200 and OK can be various numbers and statuses, but the idea holds. The HTTP/1.1 is the response protocol.

The next two lines are Content-Length: 500 (where “500” is the size of the content in bytes) and Content-Type: text/html (where text/html is the appropriate MIME type).

Each one of these must be terminated with a carriage return and then a newline. In Java, this is written as \r\n. Finally, to indicate that the headers are complete, we send one more \r\n, so the header block ends with \r\n\r\n. The browser then expects the content.
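Put together, the three header lines and the blank line that ends them can be sketched like this. The class ResponseBuilder and its methods are hypothetical names used for illustration, and the sketch assumes the body is encoded as UTF-8 so that Content-Length counts bytes rather than characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not a class from the tOSU-WebServer source.
class ResponseBuilder {
    private static final String CRLF = "\r\n";

    /** Builds the status line, the two headers and the blank line that ends them. */
    static String headers(int code, String reason, String contentType, int contentLength) {
        return "HTTP/1.1 " + code + " " + reason + CRLF
                + "Content-Length: " + contentLength + CRLF
                + "Content-Type: " + contentType + CRLF
                + CRLF; // the empty line: the header block now ends with \r\n\r\n
    }

    /** Builds a complete 200 response for an HTML body. */
    static byte[] okHtml(String html) {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        byte[] head = headers(200, "OK", "text/html", body.length)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] all = new byte[head.length + body.length];
        System.arraycopy(head, 0, all, 0, head.length);
        System.arraycopy(body, 0, all, head.length, body.length);
        return all;
    }
}
```

Measuring the length on the encoded byte array matters as soon as the page contains non-ASCII characters, because the browser reads exactly Content-Length bytes after the blank line.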

tOSU WebServer

First, where is the code? I’ve placed the code on BitBucket. The project lives at http://bitbucket.org/frankv01/tosu-webserver/overview on BitBucket, which provides a software project with various services, one of which is a Mercurial-based repository. The site also has the option to retrieve archived versions of the tip of the repository. This option currently exists on the far right of the page above and is called “get source”.

The command to clone the repository (full history) is:

hg clone http://bitbucket.org/frankv01/tosu-webserver/ tOSU-WebServer

This will give you a repository clone, in the directory where you issued the command, called tOSU-WebServer; this is essentially the project’s name (for lack of a better one). Note: This article is being written against the tag “v0.5.x”. Once you clone the repository, run “hg update v0.5.1”.

While I’d love to review everything, including the architecture, this inaugural post can only include so much information. I figure the first step is understanding the overall architecture well enough to look through the code. Then we’ll take a look at the specific code segments that process incoming requests.

Architecture

As I stated at the start of this post, this program was developed while I attended an object-oriented architecture course and a distributed computing course. This combination made this program take on an architecture that is likely more complex than it needed to be, but is strongly OO in nature. This design led to a large number of classes, each with a finite task to accomplish. I believe that this will make it easier to understand… once you can follow the design. Please feel free to ask questions. I learn by teaching and I can only improve articles like this by receiving questions.

Package Layout & Design

I’ve used packages to organize the program; understanding these should make it easier to find what you might be looking for.

  • com.theOpenSourceU.webserver.arguments : A package to handle command-line argument/flag processing and parsing.
  • com.theOpenSourceU.webserver.debugutil : A package to handle text based debug and error messages.
  • com.theOpenSourceU.webserver.http : The core of the program, this contains the code that ultimately is the web server.
  • com.theOpenSourceU.webserver.ui : Contains the main executing class; the program to launch and manage the various pieces of the web server.

What main does

Since the goal is to understand how the program works, let’s review what the program actually does. The file we are reviewing is MyWebServer.java (in the ui package), which contains a class called (surprise) MyWebServer.

What the program basically does is:

  1. Process any given arguments, setting class-level fields.
  2. Get a new instance of ServerSocket. When we construct the new instance, we give it the port (_port) and the queue size. Both values will be covered later.
  3. Next, we call servsock.accept(), which is a blocking call; it will block the program until a connection is received.
  4. Once a connection is received (via the port), the program will receive an instance of that connection and stash it in sock.
  5. Depending on the server mode, either a new Server Worker or a new listener will be created and started. Each of these is a different mode, set via the arguments. Note that each one is derived from Thread, so calling start() starts a new thread.
  6. Go back to 3 to wait for another connection.

This is the gist of the program’s flow. The details of handling the request are handled in the http package. We’ll review this package in the next section.
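The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual MyWebServer code: the class AcceptLoopSketch, its nested Worker and the demo method are hypothetical, the queue size of 6 is arbitrary, and the loop is bounded so that the sketch terminates (a real server would loop until it is shut down).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the accept loop; names are not from the project.
class AcceptLoopSketch {

    /** Hypothetical worker: a Thread that answers exactly one connection. */
    static class Worker extends Thread {
        private final Socket sock;

        Worker(Socket sock) {
            this.sock = sock;
        }

        @Override
        public void run() {
            try (Socket s = sock) {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                // Read the request line and headers up to the blank line that ends them.
                String line = in.readLine();
                while (line != null && !line.isEmpty()) {
                    line = in.readLine();
                }
                String body = "hello";
                String response = "HTTP/1.1 200 OK\r\n"
                        + "Content-Length: " + body.length() + "\r\n"
                        + "Content-Type: text/plain\r\n\r\n"
                        + body;
                OutputStream out = s.getOutputStream();
                out.write(response.getBytes(StandardCharsets.US_ASCII));
                out.flush();
            } catch (IOException e) {
                // A failed connection only affects this worker thread.
                System.err.println("Connection failed: " + e.getMessage());
            }
        }
    }

    /** Steps 3 to 6: accept a connection, start a worker thread, wait again. */
    static void serve(ServerSocket servsock, int connections) throws IOException {
        for (int i = 0; i < connections; i++) {
            Socket sock = servsock.accept(); // blocks until a client connects
            new Worker(sock).start();        // the worker handles it on its own thread
        }
    }

    /** Starts the loop on a free loopback port, makes one request, returns the status line. */
    static String demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        int queueSize = 6; // arbitrary backlog chosen for this sketch
        try (ServerSocket servsock = new ServerSocket(0, queueSize, loopback)) {
            Thread acceptor = new Thread(() -> {
                try {
                    serve(servsock, 1);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            acceptor.start();
            try (Socket client = new Socket(loopback, servsock.getLocalPort())) {
                OutputStream out = client.getOutputStream();
                out.write("GET / HTTP/1.1\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each accepted Socket is handed to its own thread, the loop returns to accept() immediately and a slow client does not hold up the next connection.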

Implementation

The http package contains various classes, of which only a small subset is actually public. We’ll review a few classes in the next few paragraphs; however, the best way to review all of the classes is to generate the javadoc files and review those.

In the earlier section, we saw the class WorkerFactory. This is a class that generates appropriate instances of the two workers contained in the package. A worker is a class derived from Thread that performs some task; in our case, it handles HTTP requests. The two concrete classes that can be generated are HttpWorker and MyListener.

The HttpWorker class is the class of interest here. This class becomes the worker thread that handles the request sent to the server. Another way to put this is that this is what the client browser is actually talking to, and not the MyWebServer instance. This is how the web server can handle several requests at once.

Since we are on it, why don’t we continue on from the HttpWorker class. The class extends Thread and we are implementing the run method. Let’s not go into detail, but this is the code that processes the request and ultimately provides the content to send back to the client browser. Inside this method, we reference another factory: HttpContentFactory. This factory can provide implementations of HttpContent for a variety of file types, including CSS, HTML and a made-up dynamic page. (Images weren’t working Status)
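One piece of choosing content by file type is picking the Content-Type from the file extension. The sketch below shows that idea only; it is not how HttpContentFactory is implemented. The class ContentTypes, its table of extensions and the application/octet-stream fallback are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table; the project's factory may work differently.
class ContentTypes {
    private static final Map<String, String> BY_EXTENSION = new HashMap<String, String>();

    static {
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("htm", "text/html");
        BY_EXTENSION.put("css", "text/css");
        BY_EXTENSION.put("txt", "text/plain");
    }

    /** Returns the MIME type for a requested path, based only on its extension. */
    static String forPath(String path) {
        int dot = path.lastIndexOf('.');
        String ext = dot < 0 ? "" : path.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = BY_EXTENSION.get(ext);
        // Unknown extensions fall back to a generic binary type.
        return type != null ? type : "application/octet-stream";
    }
}
```

With this table, ContentTypes.forPath("/cat.html") gives text/html, and a path with an unknown or missing extension gives the generic fallback.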

The counterpart to HttpContent is the HttpClientHeaders instance. This represents what becomes the server response headers. This web server only supports a few codes (recall, not all need to be supported). The class HttpClientHeadersImpl provides support for a 404 error, a 500 error (internal server error) and a 200 success status. The implementation details are not relevant to this initial introduction, but it is important to know that the HttpWorker class can’t complete its job without an instance of this to report status (success/error) to the client.
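The three supported codes map naturally onto a small enum. This is a sketch of the idea rather than the contents of HttpClientHeadersImpl; the enum name HttpStatus and the method statusLine are hypothetical, while the codes and reason phrases are the standard HTTP ones.

```java
// Hypothetical enum; the project's own class may be structured differently.
enum HttpStatus {
    OK(200, "OK"),
    NOT_FOUND(404, "Not Found"),
    INTERNAL_SERVER_ERROR(500, "Internal Server Error");

    private final int code;
    private final String reason;

    HttpStatus(int code, String reason) {
        this.code = code;
        this.reason = reason;
    }

    /** Renders the status line without its line terminator. */
    String statusLine() {
        return "HTTP/1.1 " + code + " " + reason;
    }
}
```

A worker could pick NOT_FOUND when the requested file does not exist, INTERNAL_SERVER_ERROR when reading it fails, and OK otherwise, then write statusLine() as the first line of the response.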

From here, HttpWorker renders these two instances and sends the result back to the client that made the request.

More to Come…

In the details, the program does a lot more than what I’ve outlined in the last few sections. However, I suspect that wrapping your head around these first sections can make reviewing the source code less intimidating.

If you have any questions or feel that the article above can be improved, please let me know via the comments. I hope to post more educational articles on tOSU-WebServer and I would greatly appreciate direction. If you are interested in a particular section, please let me know (again, via the comments). Oh, and don’t forget to follow the project on BitBucket.

Violet UML Editor

March 2nd, 2010 Frank No comments

I use UML to do quick brain storming and when exploring software. While I’ve not posted many write ups here (grad-school!),  I generally don’t want to invest a lot of time in my UML diagrams and only sometimes do I even want to save them.

Often times, especially lately, I’ve been drawing on a whiteboard that I keep in my office.  I find this to be efficient (even over paper because I’d end up throwing it away).

In one of my current grad-school classes, we are using “Object-Oriented Design & Patterns” (2nd Ed.) by Cay S. Horstmann as the class textbook. I’ve enjoyed the book and it provides some decent examples. I bring the book up because apparently the author of the book created a UML package called Violet UML. It is the best software tool for UML brainstorming I’ve ever found. Here are my reasons:

  • It loads quickly
  • I can efficiently draw diagrams without warnings or complex menus to navigate through.
  • The lack of UML rules enforcement means that I can draw partial diagrams; diagrams that mean nothing out of context.
  • It’s open source
  • So far, it’s more stable and reliable than ArgoUML

If you are looking for a UML package, I must recommend this. I searched and searched for a UML package a while back and I never came up with this. I looked at everything, no matter what and still never found it. So, if you like it, please spread the word (via your own blog, twitter, facebook, etc). I think it is well done software and worth some attention.

http://violet.sourceforge.net/

ORM patterns which are ‘Invisible to the eye’

February 10th, 2010 Frank No comments

The more I work with design patterns*, the more I come to respect them as a design tool. For grad-school, I’m writing a DVCS and I needed some information on ORM patterns. While I’m not sure if this is a real term, I Googled it and found a great article on Invisible to the eye.

The article can be found at: http://giorgiosironi.blogspot.com/2009/08/10-orm-patterns-components-of-object.html

For my project, I’m mainly interested in the Data Mapper and the Table Data Gateway. Both are patterns from Martin Fowler’s book Patterns of Enterprise Application Architecture.

While I’ve not read the book yet, I think I might… After classes… Anyway, I mostly wanted to share the 10 ORM patterns listed at Invisible to the eye.

*If  you’d like a good intro to design patterns, I love Head First Design Patterns

Status – Site Still Alive

January 12th, 2010 Frank No comments

I wanted to post a bit of a status update. There hasn’t been any new writing on this site for a while and I want to apologize about that — especially since I stopped right in the middle of my Firefox research.

I do intend to continue the pursuit; however, it has taken a backseat to my Graduate studies for the time being. I don’t want to go in to too much detail, but this term I’ve enrolled in two classes instead of just one.

Hang on to the RSS feed, I’ll be back to it shortly.

Categories: Graduate School, Random Tags:

Back to School?

June 10th, 2009 Frank No comments

Since I graduated from my undergraduate program, I’ve been looking forward to enrolling in a masters program. I’ve put this desire far behind other things that have come up and I’ve decided that this year is the year.

I began by looking at different masters programs in my area. Below are some of the schools and programs that I’ve looked at. Ultimately, I’ve selected DePaul. I’ve known, or known of, a lot of people who have gone there, and while it might not be the best school, it seems very good. I’m not MIT material, so I think that very good will suit me well.

DePaul [Selected]

There is no one reason that I selected DePaul. I was looking around the CDM building and it felt like a good school. I had some interaction with a professor, and that interaction was positive. They also fit most of the criteria that I was looking for in a graduate school. It wasn’t a perfect fit, but I don’t think that any school would be…

I’ve selected the Software Engineering degree because I think of myself more as a software engineer than a computer scientist. I’ve posted both links for those who are interested.

http://www.cdm.depaul.edu/academics/Pages/MSinSoftwareEngineering.aspx

http://www.cdm.depaul.edu/academics/Pages/MSinComputerScience.aspx

Loyola

Loyola is further from my home than DePaul is; this was the biggest reason that Loyola lost out to DePaul. Furthermore, they didn’t have an M.S. in Software Engineering degree. They have similar degrees, but the Software Engineering title became important to me.

http://www.luc.edu/cs/_academics_graduate.shtml

Illinois Institute Of Technology

The campus is too far from my home. It’d be difficult to attend classes and continue working.

http://www.iit.edu/graduate_admission/programs/areas_of_study/telecom_software_engineering.shtml

Northwestern

Northwestern isn’t a likely candidate. They were an early favorite but they don’t seem to put much effort in their science degrees… They are mostly known for their business programs… Their MBA program is one of the best, according to many…

http://www.scs.northwestern.edu/grad/mscis/

Roosevelt

Roosevelt is not likely for me because they are more of a business school than a technical or general university.

http://cs.roosevelt.edu/academics/compsci/degrees/csms.html

Stack Overflow

Yes, I know Stack Overflow isn’t a university. I posted some questions to gather information from the community… I wanted to share the links…

http://stackoverflow.com/questions/808546/grad-school-for-compsci-and-or-software-engineering

http://stackoverflow.com/questions/49054/computer-science-versus-software-engineering-which