Web 1: Dynamic site basics

So now that I've explained the server setup, let's understand a bit about the theory behind how the site's designed. Many sites in the independent web community are simple static sites, which are made using only HTML, CSS and perhaps JavaScript to present everything.

This site, however, is a dynamic site, meaning that content is produced on the fly programmatically on the server's backend. Let's examine some basic theory about how this is accomplished before we move on into more topics about how individual parts of the site function.

Intro To Three Tier Architecture

Dynamic web applications are typically constructed around a so-called "3 tier" architecture. This architecture consists of 3 components, which are the Presentation Tier, the Logic Tier and the Data Tier. On a single-machine server, these may be represented by different software packages -- for example, in a LAMP stack like this one, Apache serves the presentation tier content (i.e. fully assembled HTML/CSS/JS), while the raw data for the site, e.g. content like articles, news posts, forum entries, etc., is stored in MySQL. PHP, then, serves as the logic tier, which programmatically formats user input that comes in from the presentation tier to store in the data tier, or else programmatically retrieves output from the data tier in order to present it to the user via the presentation tier.

data/web1/3tier.png

Note that while this site uses a single server, and the tiers are represented merely by different programs, in large enterprise installs, the different tiers could be on entirely different hosts. For instance, your presentation tier may simply be a public facing reverse proxy that calls into an application server that serves the logic tier's role. That application server then may be making queries to a sharded database server cluster. Large, complex setups like this allow for various sorts of load balancing, which lets a site with a very large number of users keep operating at scale.

Let's go over some of the individual elements of the stack.

Data Tier

So I'm going to present these tiers from the bottom up, starting from the data tier, because the data tier is core to a dynamic web application. If I'm designing something new, designing the data tier usually has to be the first thing in order to get a working application off the ground. This is because we'll use the data tier to actually store and structure the information our site's ultimately going to present.

As mentioned in Web 0, this site is based on a LAMP stack and runs MySQL for a database solution. MySQL is a relational database system that stores data in tables, which are in turn grouped into databases.

More or less, in MySQL, there are databases, which are logical units that contain tables. Each table then contains particular data categories called columns, and a set of associated values, one for each column, makes up a row, which is a particular data entry.

A way we can understand this more analogically is like so:

Database
Represents a comprehensive software system. E.g. your CMS or that forum script you use will often utilize its own database
Table
You can think of tables as representing a discrete "thing" within your software platform. E.g. I have a news poster on the main page. There is a table called "news" to represent newsposts as a "thing".
Column
We can think of columns as representing things that describe the thing represented by the table. So for instance, in my "news" table, each news post may be described with columns representing "headline", "content", "publication date", etc.
Row
A row is a particular set of associated column values -- i.e. it's an actual entry. So continuing from the "news" example, a particular story with a headline "something happened" and a content of "there was a thing that happened that you're reading about now" with a publication date of "1/2/1977" is a row.
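To make that analogy a bit more concrete, here's a rough sketch. Note the assumptions: the site itself runs MySQL with PHP, but this uses Python's built-in sqlite3 module as a stand-in so that it runs on its own, and the table is simplified down to the three columns from the example above.

```python
import sqlite3

# An in-memory SQLite database stands in for the site's MySQL database.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets us read columns by name

# Table: the "thing" (news posts). Columns: what describes each post.
db.execute("CREATE TABLE news (headline TEXT, content TEXT, date TEXT)")

# Row: one actual entry, using the example story from above.
db.execute(
    "INSERT INTO news (headline, content, date) VALUES (?, ?, ?)",
    (
        "something happened",
        "there was a thing that happened that you're reading about now",
        "1/2/1977",
    ),
)

row = db.execute("SELECT headline, content, date FROM news").fetchone()
print(row["headline"], "|", row["date"])
```

Each name in the SELECT is a column, and what comes back from fetchone() is a single row.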

Most of the actual work in designing things in the data tier is really with table design. Again, the table should describe a type of "thing", and our job in designing the table is more or less to think about what columns we'll need to describe different things about that kind of thing. Note well that we should only be describing one "thing" with our table as well. I.e. if you're making a game or something, you shouldn't try to describe both your character's stats and your character's inventory in the same table -- the stats and the inventory are distinctly different "things" and should each get their own table.
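Here's a minimal sketch of that game example, again with Python's sqlite3 standing in for MySQL. The table names, column names and sample data are all made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# One table per "thing": stats describe the character itself...
db.execute("""
    CREATE TABLE character_stats (
        character_id INTEGER PRIMARY KEY,
        name TEXT,
        strength INTEGER,
        hit_points INTEGER
    )
""")
# ...while each inventory row describes one item a character is carrying.
db.execute("""
    CREATE TABLE inventory (
        id INTEGER PRIMARY KEY,
        character_id INTEGER REFERENCES character_stats(character_id),
        item TEXT,
        quantity INTEGER
    )
""")

db.execute("INSERT INTO character_stats VALUES (1, 'Hero', 12, 30)")
db.executemany(
    "INSERT INTO inventory (character_id, item, quantity) VALUES (?, ?, ?)",
    [(1, "potion", 3), (1, "sword", 1)],
)

# The shared character_id lets a query bring the two "things" back together.
items = db.execute("""
    SELECT s.name, i.item, i.quantity
    FROM character_stats AS s
    JOIN inventory AS i ON i.character_id = s.character_id
    ORDER BY i.item
""").fetchall()
print(items)
```

Because the inventory lives in its own table, a character can carry any number of items without us having to add columns like "item1", "item2" and so on to the stats table.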

So for instance, this is what my news poster table actually looks like:

+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int          | NO   | PRI | NULL    | auto_increment |
| headline  | varchar(128) | YES  |     | NULL    |                |
| date      | varchar(128) | YES  |     | NULL    |                |
| timestamp | int          | YES  |     | NULL    |                |
| content   | text         | YES  |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+

So we can see that I've thought about what constitutes a newspost and figured it boils down to a headline, content, a date and a timestamp. We can also see that we need to think about what datatype should store the data -- e.g. we can use integers to store the timestamp efficiently, while the headline can be stored as a character string of up to 128 characters ("varchar(128)"), but we need a text column to store the potentially quite long content of the document.

In addition to merely describing the data, there are two other things we should do in order to make sure the database makes sense:

  • make sure to use a unique "key"
    • in the news table, this is the "id" column. It's set to be non-null and to auto-increment so that these keys remain unique, even if the content, timestamps, headlines, etc are for some reason the same. This allows us to always be able to reference each row uniquely, even if they otherwise contain the same data in their actual fields. If we don't use a key, we can end up with duplicate rows, which causes problems with the relational way that MySQL makes queries.
  • Make sure the columns aren't redundant
    • Note that the news example here is not compliant with this and is thus poorly designed -- timestamp and date represent the same kind of information (when the post was published), so it's a waste of space for us to store both. I should be converting the timestamp to a date in the logic layer in PHP rather than storing redundant data. Dropping the extra column saves storage, and it means there's only one value to keep correct instead of two that could end up disagreeing with each other.
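Both rules can be sketched in a few lines. This is only an illustration under stated assumptions: Python's sqlite3 stands in for MySQL and PHP, the sample timestamp is made up, and format_date is a hypothetical helper name rather than anything from the site's actual code.

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
# "id" is the unique key, and only the timestamp is stored (no date column).
db.execute("""
    CREATE TABLE news (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        headline TEXT,
        timestamp INTEGER,
        content TEXT
    )
""")

# Two posts with identical data still get distinct ids.
for _ in range(2):
    db.execute(
        "INSERT INTO news (headline, timestamp, content) VALUES (?, ?, ?)",
        ("something happened", 221011200, "same text"),
    )

def format_date(timestamp):
    """Logic-tier helper: turn a Unix timestamp into a display date (UTC)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")

ids = [row[0] for row in db.execute("SELECT id FROM news ORDER BY id")]
print(ids, format_date(221011200))
```

The date is computed at the moment we display the post, so the database only ever holds one source of truth for when it was published.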

Now then, professionals will talk about "database normalization" and "normal forms", which are formal ways to describe properly designed databases. I never learned these concepts until recently, but if you follow the basic ideas laid out above -- that each table should describe only one discrete "thing", that we need a unique key, and that our columns aren't redundant -- then our databases will end up sufficiently normalized for most use cases anyway, even without being formally checked for whether they're in this-or-that normal form. Definitely read up on normalization to get your theory down, but for hobby projects, just follow those three principles as design rules of thumb and you should generally have functional databases.

In any case, once we have some tables built out, we can then start thinking about the next tier -- the logic tier -- and its role in retrieving and storing information in the data tier.

If you want to learn more about working with MySQL specifically, you can look into tutorials like this one from W3Schools.

Presentation Tier

The presentation tier consists of what you should already be familiar with from static site design. For more on basic HTML, CSS or JavaScript, tutorials and documentation, please see the Mozilla MDN Web Docs here.

Again as mentioned above, the Presentation tier is the logical tier where we should be either showing output to the user or else receiving input from the user. On a dynamic site, we can usually think of input coming to us via HTTP parameters. I did a bigger overview of this (mostly looking from the hacking CTF POV) here in my articles about HTTP and cURL.

On the presentation tier, we'll usually accept inputs using either HTML Forms, or for some GET parameters, by embedding them in links. Parameters like this are then used by the applications on the Logic Tier as input parameters.
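As a rough idea of what those parameters look like by the time they reach the logic tier, here's a small sketch. PHP hands you these already parsed, so this just shows the same parsing done by hand with Python's standard library. The URL and field names are made up.

```python
from urllib.parse import urlsplit, parse_qs

# A link with GET parameters embedded in it (hypothetical URL).
link = "https://example.com/news.php?page=2&sort=newest"
get_params = parse_qs(urlsplit(link).query)

# The body a browser sends when an HTML form is submitted with POST
# (application/x-www-form-urlencoded; the field names are made up).
post_body = "headline=something+happened&content=it+really+did%21"
post_params = parse_qs(post_body)

# parse_qs returns a list per name, since a name may appear more than once.
print(get_params["page"][0], post_params["headline"][0])
```

Either way, what the logic tier receives is just a set of name/value pairs, and every value arrives as a string that the user fully controls.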

Output is also handled on the presentation tier. At this level, what we should really think about is the concept of semantic markup. This means that when we draft our HTML and CSS, we should maintain a strict separation between content and presentation. To this end, we might do something like using external style sheets that apply styling to pages. Maintaining strict separation lets us easily change the presentational style of multiple pages by modifying a single stylesheet. And by ensuring that our HTML is purely structural and descriptive, we allow the site to be more easily indexed by search engines, and more easily parsed by text browsers and by accessibility technologies, e.g. for people who need to use screen readers or similar tools.

When I say that markup should be structural and descriptive, consider the following from my news poster on the homepage.

data/web1/html.png

Just as above in our description of the MySQL database, in the semantically presented output, each row of the MySQL table is distinctly represented in the markup.

The entire news post row is encompassed in a div element, an H3 heading contains our "headline" column, a span contains the date, and then the rest of the space in the div is used to present the content, which is formatted into individual paragraphs as is appropriate for making text legible. Effectively, what we've done is "describe" and present each column of the MySQL row in a logically marked up format that lets us logically address each piece of data for styling and final presentation. These individual pieces are then styled according to the CSS in /style.css, which simply makes the page pretty, what with all of the colors, text alignment, font faces, etc.
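A minimal sketch of producing that kind of markup from one row might look like the following. It's written in Python rather than the site's PHP, and the class names and the blank-line paragraph convention are assumptions for illustration, not the site's actual markup.

```python
from html import escape

def render_post(headline, date, content):
    """Wrap one news row in structural markup; styling is left to the CSS."""
    # Assumption: blank lines in the stored content separate paragraphs.
    paragraphs = [p.strip() for p in content.split("\n\n") if p.strip()]
    body = "\n".join(f"  <p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<div class="newspost">\n'
        f"  <h3>{escape(headline)}</h3>\n"
        f'  <span class="date">{escape(date)}</span>\n'
        f"{body}\n"
        "</div>"
    )

print(render_post("something happened", "1/2/1977",
                  "First paragraph.\n\nSecond paragraph."))
```

Notice that nothing in the function says anything about colors or fonts. The markup only says what each piece of data is, and the stylesheet decides how it looks.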

So that's how we can think about making presentation -- break down our data tier rows into logically marked up items on the page, and then apply styles to those individual items as desired. Before we design a web application proper in the logic tier, it may be helpful to just mock-up the output that you want the data to be structured like, and then move on to automating the production of the output in the logic tier.

Logic Tier

The logic tier is the in-between layer that allows us to actually take input from the user and place it into our database, or else take data from the database and automatically format and output it. In a LAMP stack like that used by this site, the technology implementing the logic tier will be PHP.

In this, we'll be programming. If you've never done this, that may sound intimidating, but to simplify things, programming is simply the process of designing algorithms to do tasks, and an algorithm is pretty much just a series of well-defined steps.

So for example, to take input, we might have the steps:

  1. Take the POST parameters from a user's filled out HTML form
  2. Validate the contents of the POST variables to make sure they're the right kind of data
  3. Sanitize the data for security's sake
  4. Apply formatting as needed
  5. Connect to the database
  6. Build a query to insert the user's data into the database
  7. Execute the query so that the data actually goes into the database.
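Those steps can be sketched roughly like so. This is an illustration in Python with sqlite3 standing in for PHP and MySQL. The store_post name, the field names and the validation rules are all assumptions, and a plain dict stands in for the POST parameters.

```python
import sqlite3
import time

def store_post(db, post_params):
    """Take form fields, check them, and insert one row. Returns the new id."""
    # 1. Take the submitted parameters.
    headline = post_params.get("headline", "").strip()
    content = post_params.get("content", "").strip()

    # 2. Validate: reject data of the wrong kind or size.
    if not headline or len(headline) > 128 or not headline.isprintable():
        raise ValueError("headline must be 1-128 printable characters")
    if not content:
        raise ValueError("content is required")

    # 3. Sanitize: the "?" placeholders below keep user data separate from
    #    the SQL text, so it can't be interpreted as part of the query.
    # 4. Format: collapse runs of whitespace in the headline.
    headline = " ".join(headline.split())

    # 6-7. Build the parameterized query and execute it.
    cursor = db.execute(
        "INSERT INTO news (headline, timestamp, content) VALUES (?, ?, ?)",
        (headline, int(time.time()), content),
    )
    db.commit()
    return cursor.lastrowid

# 5. Connect to the database (in-memory SQLite for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, headline TEXT, "
           "timestamp INTEGER, content TEXT)")
new_id = store_post(db, {"headline": "  something   happened ", "content": "details"})
print(new_id)
```

The order is shuffled slightly compared to the list since the connection is opened once up front, but each numbered step has a matching line or two of code.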

Conversely, for output, we might do something like this:

  1. Connect to the database
  2. Retrieve rows from the database
  3. Sanitize the output to ensure that it doesn't contain security issues like XSS, etc.
  4. Insert the data from the database into HTML tags appropriate for what we're doing on the page
  5. Print the HTML + Data combo so that it builds a web page.
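And here's a rough sketch of the output side, under the same assumptions as before (Python and sqlite3 in place of PHP and MySQL, made-up names and sample data). The sample row deliberately contains a script tag to show what the sanitizing step is for.

```python
import sqlite3
from html import escape

def render_news(db):
    """Fetch rows, escape them, and wrap them in HTML (steps 2-4)."""
    rows = db.execute(
        "SELECT headline, content FROM news ORDER BY id DESC"
    ).fetchall()
    parts = []
    for headline, content in rows:
        # Escaping turns characters like < and & into entities, so stored
        # text is displayed as text instead of being run as markup (XSS).
        parts.append(
            f"<div><h3>{escape(headline)}</h3><p>{escape(content)}</p></div>"
        )
    return "\n".join(parts)

# Step 1: connect (in-memory SQLite with one hostile sample row).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, headline TEXT, content TEXT)")
db.execute("INSERT INTO news (headline, content) VALUES (?, ?)",
           ("hello", "<script>alert(1)</script>"))

# Step 5: print the HTML + data combo.
page = render_news(db)
print(page)
```

The script tag comes out as harmless visible text, because the escaping happens at the last moment before the data is placed into the page.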

Obviously, PHP doesn't look like a mere list like this. However, most programming does in fact consist of simply using a language's specific constructs to more or less give a set of instructions to the computer. So it can help to sorta hash out what we need to do for a specific task, and then implement that more specifically in the programming language itself after we have "algorithmized" our solution.

For more specifics on how to write PHP, W3Schools' tutorial is ok, and while you're writing code, it's good to keep the reference manual open as well.

In any case, most apps we'll write will be simple scripts that either put data from the user into a database, or else retrieve data from the database and format it for viewing as HTML.

Moving forward

We'll do some deep dives on how different specific parts of this site work in the future, so that we can see real code examples and the mentality behind how I designed some of the parts of the site.

My hope with this, at least for the indy web sort of community, is to provide some concrete examples that can help those of you who are a bit newer to building things for the web kinda see how developing a dynamic site works. With a basic understanding of 3 tier architecture, hopefully we can have a useful framework for understanding my code examples as we move forward.