How the Internet works

Seeing through the magic

Is it really magic?

Arthur C. Clarke once remarked that "...any sufficiently advanced technology is indistinguishable from magic." Many of us use the Internet with increasing regularity, but still see it as largely magic. There is actually nothing wrong with that point of view, but it can be nice to know how the 'magic' really works. Although the Internet is a complex phenomenon, it is possible for ordinary people to understand much of its inner workings.

There is a lot of information in this article, but don't let that intimidate you. We don't intend for you to earn a degree in computer science here. If, by some exceptionally bizarre set of circumstances, you find yourself transported into the middle of a cocktail party full of Internet computer scientists, this will give you enough material to talk about so you can understand some of their jokes and slip through the buffet line before you get noticed as a neophyte.

Computing equals communication

The term 'computer' originally conjured up images of behemoth machines churning out complex mathematical calculations. The original computers did calculations all the time. Recently, though, we have not used computers as much for calculation. Most of us use our computers for communication rather than computation. The explosive growth of the Internet is a clear sign of this trend.

The early computers were designed to be shared by many users at the same time. This had an interesting side effect; people sharing the computer could send messages to each other through it. This communication ability turned out to be very handy, but it was largely lost when businesses and individuals turned to personal computers. The personal computers were almost as powerful as their larger cousins, but they were intended for one person at a time, so the communication aspect was lost.

The trend towards connectivity

Clever computer users started connecting personal computers together through Local Area Networks (sometimes abbreviated LAN) and through telephone connections and special hardware devices. The personal computers could then be used again as communication tools, but only with other computers they were directly tied to.

Gradually, LANs became more powerful, and were often tied together to make Wide Area Networks (WANs). Most businesses today use a combination of LAN and WAN technology.

At the same time, educational and defense institutions were working on ways to connect the large research machines. They had a special problem. During the height of the cold war, these computers were used in support of nuclear defense initiatives. It was vital that there be many paths between the computers, and that messages could get through even if some of the communications hubs were brought down by the bad guys.

An underlying protocol

The earliest form of the Internet was based on an ingenious idea called TCP/IP. This stands for Transmission Control Protocol / Internet Protocol. TCP/IP is a big name for a simple idea. Essentially, a message is automatically broken into small parts, which are called 'packets.' Each packet is labeled with its source and destination, as well as some other information. Each packet finds its own way from the starting machine to the destination, and if one route is blocked, the network can send the packet along a different path. When the packets arrive at the destination, they are pieced back together, and the message can be read.
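
To make the packet idea concrete, here is a minimal Python sketch. It is not how real TCP/IP is implemented (real packets carry binary headers and are numbered differently). The Packet class, the function names, the 8-character packet size, and the example addresses are all assumptions made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    source: str       # address of the sending machine
    destination: str  # address of the receiving machine
    sequence: int     # position of this piece within the original message
    total: int        # how many packets make up the whole message
    payload: str      # the piece of the message carried by this packet


def split_into_packets(message, source, destination, size=8):
    """Break a message into labeled packets of at most `size` characters."""
    pieces = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        Packet(source, destination, number, len(pieces), piece)
        for number, piece in enumerate(pieces)
    ]


def reassemble(packets):
    """Put packets back in order and join their payloads."""
    ordered = sorted(packets, key=lambda packet: packet.sequence)
    return "".join(packet.payload for packet in ordered)


message = "Any sufficiently advanced technology is indistinguishable from magic."
packets = split_into_packets(message, "192.0.2.1", "198.51.100.7")
random.shuffle(packets)  # simulate packets arriving out of order
print(reassemble(packets) == message)  # prints True
```

The sequence number on each packet is what lets the receiver rebuild the message no matter which order the pieces show up in.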

The entire Internet, from email to web pages and streaming video, is currently based on TCP/IP packets. Anything you see or hear on the Internet was broken into these packets and sent to you. The TCP/IP protocol is invisible and automatic. Most users never see it and never have to know it is there. This has some interesting side effects. A message that goes from one machine to another in the next room might find its way to France in the meantime (not too often, but it happens). The other side effect of this is that messages you send might temporarily reside on dozens of computers you will never see before they get to the destination.

The 'traffic cops' of the Internet

As scientists were developing TCP/IP and networking technology became more prevalent among personal machines, it became apparent that there ought to be a way to connect the two. Essentially, the solution was a special class of computer called a router. The router's job is to sit between a network and the rest of the Internet, and act as a kind of mailman to the network. Any traffic the network sends to the Internet goes through the router, and any messages destined for sites on the network only get there through the router. Routers are connected through high-speed cables to even more powerful machines, which are eventually connected to a number of special high-end machines, often referred to as the 'Internet Backbone'. (This network was originally called the NSFNET backbone, after the National Science Foundation, which provided much of the original funding. Currently, the NSF is backing a brand new version of the Internet backbone with a research focus called 'Internet2' or the 'Abilene network'.)
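
The router's basic decision can be sketched in a few lines of Python. This is only an illustration under assumptions: the local network 192.168.1.0/24 (meaning every address that starts with 192.168.1) and the function name next_hop are made up here, and real routers consult much larger routing tables.

```python
import ipaddress

# Hypothetical local network sitting behind the router.
LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")


def next_hop(destination):
    """Decide whether a packet stays on the local network or goes upstream."""
    address = ipaddress.ip_address(destination)
    if address in LOCAL_NETWORK:
        return "deliver locally"
    return "forward to the Internet"


print(next_hop("192.168.1.42"))  # prints "deliver locally"
print(next_hop("203.0.113.9"))   # prints "forward to the Internet"
```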

Hi, what's your number?

Since there are literally millions of computers connected to the Internet, it could be nearly impossible to locate just one. Fortunately, the original planners of the Internet had some clever ideas. Every machine on the Internet was assigned a number. The number would be composed of four smaller numbers between 0 and 255, separated by dots. (There are some wonderful urban legends about why the numbers don't go to 999, but the real answer is related to the vagaries of base two mathematics. Let's leave that for another session.) The number is called an IP (for Internet Protocol) number, or IP address. IP numbers work like zip codes. They are easy for computers to understand, and they make it reasonably easy for packets to be routed to the appropriate destinations.
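
For the curious, here is a small Python sketch of the base two connection. Each of the four parts fits in 8 binary digits, and 255 is the largest number 8 binary digits can hold. The function name ip_to_int and the sample address are made up for illustration.

```python
def ip_to_int(dotted):
    """Combine the four parts of an IP number into one 32-bit integer."""
    parts = [int(part) for part in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a valid IP number: {dotted!r}")
    value = 0
    for part in parts:
        value = value * 256 + part  # each part occupies 8 binary digits
    return value


print(2 ** 8 - 1)              # prints 255, the largest 8-bit value
print(ip_to_int("192.0.2.1"))  # prints 3221225985
```

To the computer, then, the four dotted numbers are just a friendlier way of writing one 32-digit binary number.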

The Domain name solution

The problem with IP numbers is they are, well, numbers. People tend not to be good with numbers. They much prefer characters and words. For this reason, computer scientists developed the Domain Name System (often called DNS). DNS is just a big database (actually several) that contains a bunch of computer names and the IP addresses associated with those names. The InterNIC (www.internic.net) is currently the organization which manages the assignment of domain names, although the process is being privatized, and others will soon have the capacity to assign domain names. There is a registration fee for a domain name, which is currently $70.00 for two years, but that may change as competition enters the marketplace. The good news is that most of us do not need to worry about a domain name. We are usually given an account by our employer or some kind of provider, and the domain name we use reflects that entity. Part of your email address is usually your domain name. For example, I used to have an email address like this: andyharris@aol.com. The part after the @ sign is the domain name of my organization.
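
The "big database" idea can be sketched as a simple lookup table in Python. This is a toy, not the real thing: the table, its names and numbers, and the lookup function are all invented for illustration, and real lookups are answered by DNS servers scattered around the network.

```python
# A toy stand-in for the DNS database. The names and numbers below are
# made up for illustration; real lookups query DNS servers.
DNS_TABLE = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
}


def lookup(name):
    """Return the IP number registered for a name, ignoring letter case."""
    try:
        return DNS_TABLE[name.lower()]
    except KeyError:
        raise LookupError(f"no such host: {name}") from None


print(lookup("WWW.Example.com"))  # prints 192.0.2.10
```

On a machine that is actually connected, Python's standard socket.gethostbyname(name) asks the real DNS the same question.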

Domain names have a number of parts, and they can actually give you a lot of information about the person or entity attached to them. They usually end with a two or three letter code. The two letter codes refer to countries, so .fr means 'France' and .ca means 'Canada.' In the United States, we generally leave off the two letter country code '.us'. The three letter code refers to the type of organization that owns the computer. These fall into a number of standard categories. Mine ends in '.com', which stands for 'commercial enterprise'. In addition, you often see domain names ending with '.gov' (government organization), '.edu' (educational institution), '.org' (non-profit organization), or '.net' (Internet service provider). The first part of a domain name (the 'aol' part in the example above) is the name of a particular computer or organization. Sometimes there are a number of intermediate words that can give you more clues. For example, 'stats.math.indiana.edu' would most likely refer to the statistics section of the math department of Indiana University. (Such a machine does exist, but its name has changed.)
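
Here is a short Python sketch of reading a domain name the way the paragraph above does. The ENDINGS table only covers the endings mentioned in this article, and the function names domain_of and describe are made up for illustration.

```python
# Endings mentioned in the text; this table is illustrative, not complete.
ENDINGS = {
    "com": "commercial enterprise",
    "gov": "government organization",
    "edu": "educational institution",
    "org": "non-profit organization",
    "net": "Internet service provider",
    "fr": "France",
    "ca": "Canada",
    "us": "United States",
}


def domain_of(email_address):
    """Return the part of an email address after the @ sign."""
    return email_address.rpartition("@")[2]


def describe(domain):
    """Split a domain name into its parts and explain the ending."""
    labels = domain.lower().split(".")
    ending = labels[-1]
    return labels, ENDINGS.get(ending, "unknown ending")


print(domain_of("andyharris@aol.com"))  # prints aol.com
print(describe("stats.math.indiana.edu"))
# prints (['stats', 'math', 'indiana', 'edu'], 'educational institution')
```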

Domain names, as you can see, are used as part of email addresses, and they also make up part of the address of a web page. When used in a web address, the domain name usually comes near the beginning. We will look more closely at how web addresses work in a moment.

When are you here? Is existence essence?

It is important to determine what it means for a person or a computer to be 'on the internet,' because there is some potential for confusion. If you can use a computer to send email, is it on the Internet? Is it on the net because it has a web browser (like Internet Explorer or Netscape) installed? Is a computer always on the Internet?

Servers and clients

Some computers stay on the Internet all the time, but these tend to be large expensive machines. The computers that store information like web pages should stay on all the time, and should always have some kind of connection to the Internet. Such machines are called servers. It can be complicated and expensive to manage a permanent connection, and even more complex to manage a server. Most ordinary people don't want to do it, and want to leave those jobs to a professional. We would usually just prefer to connect our computer to a server for short periods of time, and use the services of a professional to ensure our connection stays valid and we have all the right programs in place. For example, you probably turn your home computer off at night. What if you get an email at two o'clock in the morning, when your computer is not turned on? Likewise, you might have a small business and want to host a homepage. You will want people to be able to get to that page any time of the day, not simply when your computer is turned on and 'hooked up.'

In addition to servers, the internet is also full of clients. You will frequently hear the term 'client-server' used in Internet conversation. The good news is you already know what this means:

A client-server analogy

Imagine driving up to a fast-food restaurant. You get to the speaker and the sixteen-year-old bored kid mumbles something incomprehensible into the microphone. You then order a 'cholesto-burger supreme' special, hear something that resembles a request for some cash, and you drive to the window. You then exchange the money for your meal and drive off. The cashier eagerly leaps to his microphone awaiting the opportunity to serve another customer.

In this example, the customer is the client and the cashier is the server. The server sits around waiting for a client. A client shows up and makes a request. The client and server follow a ritualized conversation (a protocol) to make a transaction. Finally, the transaction is complete, the client moves on, and the server prepares to receive another client.

Your machine is a client. The Internet programs on your own machine (like Netscape, a telnet program, or an FTP program) are also considered clients. Clients exist to talk to servers. Servers can also be both machines and special programs. You will almost never talk directly to a server program; instead, you use a client program to communicate with it.
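
The drive-through conversation can be acted out with two tiny Python programs running on one machine. Treat this as a sketch under assumptions: the function names and the one-line "protocol" (send an order, get a reply) are invented for illustration, and both ends run on the local address 127.0.0.1 instead of across the Internet.

```python
import socket
import threading


def serve_one_customer(listener):
    """Server: wait for one client, read its order, send back a reply."""
    connection, _address = listener.accept()
    with connection:
        order = connection.recv(1024).decode()
        connection.sendall(f"Here is your {order}".encode())


def place_order(port, order):
    """Client: connect to the server, make a request, read the response."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(order.encode())
        return client.recv(1024).decode()


# Port 0 asks the operating system to pick any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

# The server waits in the background, like the cashier at the microphone.
server = threading.Thread(target=serve_one_customer, args=(listener,))
server.start()
print(place_order(port, "cholesto-burger supreme"))
server.join()
listener.close()
# prints "Here is your cholesto-burger supreme"
```

Notice that the server does nothing until a client shows up, which is exactly the cashier's role in the analogy.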

So how do I get my client talking to a server?

What most people do is subscribe to some sort of internet service provider. There are two main flavors in common use. One is the HUGE services such as America Online, Prodigy, Compuserve, and many others. These guys offer connections to the internet, and they also offer customized content only for members of the service. They can be a great choice if you are just starting out, and you have probably already gotten some software from one or more of them in the mail or when you purchased your computer. You can often get free hours to try out a service, and then you will need to pay a monthly service plan, or perhaps pay by the hour. Be very careful as you read the plan to understand its terms, particularly if you are sharing an account with members of your family. If you are unaware of an hourly service charge, you could be in for a big shock when the bill comes due.

The other main approach to connecting to the Internet is through some sort of commercial Internet Service Provider (ISP). These have sprung up all over the country, and they often offer cheaper service than the larger services, but usually without custom software or content. Many experienced Internet users prefer using an ISP, but it can often be an intimidating choice for beginners.

One other source of Internet access you might pursue is free access. Often employers, schools, or libraries will offer some kind of limited free Internet access. Most universities now include Internet access as a standard student perk, like a library card. Your employer may have free or reduced-rate Internet access available to you. Local schools, libraries, and community centers sometimes also offer some kind of access. Often these accounts are limited in some way, but they can get you started.

Is there a free lunch?

There are a few commercial ventures that get you on the Internet for free as well, but most already require you to have some kind of access to begin with. One notable exception is Juno (www.juno.com), which is a free email-only service. This service includes special software to connect your machine to the Internet. Of course, you will have to endure some advertising in order to receive this 'free' service, but it's not a bad trade-off, particularly if all you want right now is email.

The software you might need

You probably already have some Internet software (clients) on your machine. All of these programs 'know' how to speak one or more of the protocols and connect to the appropriate servers. That's all that internet programs are!!

Once you are connected, your machine has an IP number (and maybe also a domain name) assigned to it. This means that you can now send TCP/IP packets to and from your machine. Of course, most of us don't really want to deal directly with TCP/IP; we would prefer the packets to be put together in a more usable format.

TCP/IP is the most basic of the Internet protocols, but it is used to put together fancier and more powerful protocols. A protocol is simply a name for an agreement about how a communication will proceed. Formal meetings have a very different protocol than discussions on a basketball court, for example. There are a number of protocols in common use on the Internet, but you only need to know a few. In fact, you don't need to know the protocols at all, only which clients are used for them!! We'll discuss a few anyway, just in case it comes up on a quiz show ("Internet protocols for a thousand, please.")

The wild, wonderfully wacky world wide web!!

The protocol most of us know best is called HTTP (Hypertext Transfer Protocol) by the People Who Like Big Names For Simple Ideas. The rest of us call it the world-wide-web. HTTP is a truly wonderful protocol, because it allows us to have links and images, and gives us a chance to make much more interesting documents than we could have made in the old 'text-only' days. If you only have one Internet client program on your computer, you should get a good web browser. Browsers are powerful because they can also handle some other protocols (although in limited ways) and because HTTP itself is just so cool. If your computer can handle it, you should definitely have one of the latest versions of the big two browsers (Netscape 4.5 or later, or Microsoft Internet Explorer 4.0+). For ordinary personal users, both are free.

This takes us back to the idea of web addresses. Addresses on the web are also called URLs (for Uniform Resource Locator). You have probably blindly typed http:// at the beginning of every web address, and you never knew why. (It's a ritual. Throw salt over your shoulder, wave a chicken over the monitor, and type http://). Now perhaps you can see why we type this. HTTP is the name of the protocol we want to use. Since web browsers are primarily for the web, we almost always type http:// (Oooooooh!!). Occasionally you will use a web browser to use another protocol, so you sometimes see other things there (like news:// or gopher://). These things are just other protocols.
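
Here is a Python sketch that takes a web address apart and shows roughly what a browser says to the server. The splitting uses Python's standard urlsplit function. The function names describe_url and build_request and the www.example.com addresses are made up for illustration, and a real browser sends quite a few more lines than this.

```python
from urllib.parse import urlsplit


def describe_url(address):
    """Split a web address into protocol, domain name, and path."""
    parts = urlsplit(address)
    return parts.scheme, parts.hostname, parts.path or "/"


def build_request(address):
    """Build the text a browser might send to ask for an http:// address."""
    scheme, host, path = describe_url(address)
    if scheme != "http":
        raise ValueError(f"this sketch only speaks http, not {scheme!r}")
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


print(describe_url("http://www.example.com/index.html"))
# prints ('http', 'www.example.com', '/index.html')
print(describe_url("gopher://gopher.example.com/"))
# prints ('gopher', 'gopher.example.com', '/')
print(repr(build_request("http://www.example.com/index.html")))
# prints 'GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n'
```

The part before :// picks the protocol, the domain name picks the server, and the rest tells that server which document you want.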

You've got mail

Email is familiar. It actually uses a number of protocols. It is an acceptable simplification to say that email primarily uses SMTP (Simple Mail Transfer Protocol) as a protocol to send email messages and POP3 (Post Office Protocol, version 3) to receive them. (Don't worry, there won't be a quiz. I'm only telling you this because you may run across the terms some time.) Email clients (like Eudora or the email clients built into Netscape and IE) already know how to read and write the appropriate protocols, but sometimes you need to set them up so they know where your server is.
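
If you are wondering what an email client actually puts together, here is a minimal Python sketch using the standard email library. The compose function, the addresses, and the server name mail.example.com are all placeholders invented for illustration, and the sending step is left as a comment because it needs a real mail server.

```python
from email.message import EmailMessage


def compose(sender, recipient, subject, body):
    """Build an email message with the usual headers."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


# Placeholder addresses; example.com and example.org are reserved for examples.
note = compose("andy@example.com", "friend@example.org",
               "Seeing through the magic",
               "The Internet is just packets and protocols.")
print(note.as_string())

# Sending would hand the message to an SMTP server, roughly like this
# (mail.example.com is a hypothetical server name, so it stays commented out):
#   import smtplib
#   with smtplib.SMTP("mail.example.com") as server:
#       server.send_message(note)
```

The server name in that last step is the "where your server is" setting that email clients sometimes ask you to fill in.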

Don't forget newsgroups

Newsgroups are an important part of the Internet that is often overlooked. These are special communication forums that are widely distributed across the Internet. Most of the browsers have built-in capability to work with these newsgroups, but you might want to investigate a special program to do so. Newsgroups are especially wonderful for connecting to people with interests similar to yours. If you are interested in something, there is probably a global discussion going on about the subject that you can participate in.

Sometimes you want to send stuff

The File Transfer Protocol (FTP) is a protocol designed for transferring files between machines on the Internet. If you will not be doing much of this, the FTP capability of your web browser will probably be enough. Some people like to use Internet accounts as a place to back up important documents, and an FTP client is a good way to handle the transfers between two accounts you own.

A classic protocol

Telnet is one of the oldest protocols on the Internet. What it does is allow one computer to act as a 'dumb terminal' to another. In the pre-web days of the Internet, telnet was the most common way to use the Internet. It was not for the faint of heart, though, because you had to be able to use whatever machine you were connected to, which often ran arcane operating systems such as Unix or VMS. It is still common to use telnet if you are operating a web site, particularly if you are doing some web programming, but most beginners do not need to worry too much about the telnet protocol.

Summing it up

The Internet is by any account an exceptional thing. It is a complex, dynamic organism with no real head that still manages to work together pretty well. The core technology that makes the Internet possible is the TCP/IP protocol. This provides an underlying framework that can be packaged together in complex ways to form other protocols. The Internet contains two main classes of computers and software: clients and servers. Servers are the machines and programs that are on all the time and are run by professionals. Clients are the machines and programs that mere mortals use to connect to servers. Hooking up to the Internet entails enlisting the services of a server, establishing the basic TCP/IP connection, and running one or more client programs. There is still plenty of magic left, when we consider how exactly the protocols work, how the communications happen, and how all the various programs are written, but it is possible to understand the basic workings of the Internet. One of the most exciting things about technology is that when you understand the magic, it doesn't go away. The new insight and ability that you earn make you appear to be much more effective as a user of the technology. Maybe we could say that when we take some of the magic out of the Internet, we transfer that magic to the people who have learned the concepts.