The Domain Name System
How DNS Works
DNS - the Domain Name System - is the addressing system of the Internet. Using DNS, your computer figures out what IP address (for example, 216.112.23.11) corresponds to a particular computer hostname (for example, "www.ahref.com"). Your computer (through the magic of networking) knows how to get to any IP address on the Internet, and uses that IP address to figure out where it should send messages.
We non-computers could use the IP address instead of the domain name in some cases - for example, the URL http://216.112.23.11/ will take you to the ahref.com front page, and email sent to piou@[216.112.23.11] will get to piou@ahref.com. (The brackets in the email address are necessary when using IP addresses.) Mapping between domain names and IP addresses using DNS makes things much simpler for us, though. Domain names also help with portability: you don't need to retain control over a particular part of a network to keep your email address or web services; people can follow you around using your domain name.
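To make the "IP address instead of hostname" idea concrete, here is a small PHP sketch that builds both forms from a bare address. The helper names mailboxForIp() and urlForIp() are made up for illustration. It also covers IPv6, which the example above doesn't: IPv6 mail address literals carry an "IPv6:" tag inside the brackets, and IPv6 addresses in URLs go in brackets too.

```php
<?php
// Sketch: writing an IP address where a hostname would normally go.
// mailboxForIp() and urlForIp() are hypothetical helpers for illustration.

// Email: an IP address must be wrapped in square brackets (an "address
// literal"); IPv6 literals also carry an "IPv6:" tag.
function mailboxForIp(string $user, string $ip): string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return "{$user}@[{$ip}]";
    }
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        return "{$user}@[IPv6:{$ip}]";
    }
    throw new InvalidArgumentException("Not an IP address: {$ip}");
}

// Web: an IPv4 address can be used as-is; an IPv6 address needs brackets
// so its colons aren't mistaken for a port separator.
function urlForIp(string $ip, string $path = '/'): string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return "http://{$ip}{$path}";
    }
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        return "http://[{$ip}]{$path}";
    }
    throw new InvalidArgumentException("Not an IP address: {$ip}");
}

echo mailboxForIp('piou', '216.112.23.11'), "\n";
echo urlForIp('216.112.23.11'), "\n";
```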
A previous ahref.com article talked about the history of the Domain Name System, and how it was changing through ICANN and other organizations. This article will talk about how DNS actually works - how your computer figures out what address to go to for a particular machine.
Figuring Out Which Server Knows What
When you try to connect to a web page, for example:
http://www.ahref.com/guides/industry/199903/0323piou.html
your web browser splits the URL into its component parts and determines which part (in this case, www.ahref.com) is the hostname. (We'll refer to the host you're trying to reach as the target machine, or target.) If you've visited a page on the target machine recently, your computer will usually still remember the host's IP address, and it will send the request for the page straight to that IP address.
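The browser's first step can be sketched with PHP's built-in parse_url() function. It's only a stand-in for whatever parsing your browser does internally, but it shows which piece of the URL DNS actually cares about.

```php
<?php
// Sketch: splitting a URL into its parts to find the hostname (the target).
$url = 'http://www.ahref.com/guides/industry/199903/0323piou.html';

$parts  = parse_url($url);   // associative array of URL components
$scheme = $parts['scheme'];  // "http"
$host   = $parts['host'];    // "www.ahref.com": the name DNS must resolve
$path   = $parts['path'];    // "/guides/industry/199903/0323piou.html"

// Only $host goes to DNS; the path is sent later, inside the HTTP request.
echo "scheme={$scheme} host={$host} path={$path}\n";
```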
If your computer doesn't have the target's IP address, it will connect to whatever local nameserver you have configured it to use. (Actually, it will most likely connect to one of several local nameservers you've configured.) The local nameserver most likely serves multiple machines in your (network) area. If any of the machines this server serves has asked for the target machine recently (within the target's TTL, which we'll talk about later), the local nameserver will have that machine's IP address stored, and will immediately return it to your computer. If the server does not have the IP address stored, it will figure out which remote nameserver has information on the target computer, and retrieve the information from there.
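Here's a minimal sketch of that caching, assuming a simple in-memory table keyed by hostname. The TtlCache class is hypothetical (real nameservers such as BIND do far more), and the current time is passed in as a plain number of seconds so the expiry logic is easy to follow.

```php
<?php
// Sketch of how a caching nameserver might remember answers for their TTL.
// TtlCache is a hypothetical class; the clock is passed in explicitly.
class TtlCache
{
    /** @var array<string, array{ip: string, expires: int}> */
    private array $entries = [];

    // Store an answer together with the moment it stops being valid.
    public function put(string $host, string $ip, int $ttlSeconds, int $now): void
    {
        $this->entries[strtolower($host)] = ['ip' => $ip, 'expires' => $now + $ttlSeconds];
    }

    // Return the cached IP while it is within its TTL; otherwise null,
    // meaning "go ask the remote nameservers again".
    public function get(string $host, int $now): ?string
    {
        $key = strtolower($host);
        if (!isset($this->entries[$key])) {
            return null;
        }
        if ($now >= $this->entries[$key]['expires']) {
            unset($this->entries[$key]);   // expired: forget it
            return null;
        }
        return $this->entries[$key]['ip'];
    }
}

$cache = new TtlCache();
$cache->put('www.ahref.com', '216.112.23.11', 3600, 1000); // one-hour TTL
var_dump($cache->get('www.ahref.com', 2000));  // still within the TTL
var_dump($cache->get('www.ahref.com', 5000));  // past 1000 + 3600: expired
```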
The first place your local nameserver will ask for information is one of the "root" nameservers, which stand at the center of the Domain Name System. There are 13 of them, named a.root-servers.net through m.root-servers.net; today each of those names is actually served by many machines around the world. Every nameserver on the Internet (aside from those whose administrators have deliberately changed their configuration) has the IP addresses of the root servers stored in its configuration. The root nameservers know which nameservers are responsible for which Internet top-level domains (.com, .org, .gov, .au, .us, etc. - the last set of characters in any host or domain name). If you're looking for www.ahref.com, the root server your local nameserver contacts will point it to several nameservers that hold authoritative information for the .com top-level domain.
Once it has the address of a .com server, your nameserver asks it which nameservers have authority over the ahref.com domain. The nameservers returned at this point are the ones the domain owner listed when registering the domain with Network Solutions or one of the other registrars.
Now that your local nameserver knows where to find the nameserver for the target machine, it asks that nameserver for the target's IP address. The target's nameserver returns the address along with a "time to live" (TTL): the amount of time your local nameserver may keep the address stored before asking again. (The TTL is generally set fairly low, a matter of days or hours, so that if the target machine's IP address ever has to change, the old address doesn't linger in caches for long.)
Once your local nameserver has this information, it returns the information to your computer, and you are able to connect to the target machine.
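The whole chain can be modeled in a few lines of PHP. This is a toy: the "servers" are arrays, and every server name and the TTL are invented for illustration (only the target address comes from the example at the top of this article). A real local nameserver sends DNS queries over the network, but it follows referrals the same way, starting at a root server and working down until it gets an authoritative answer. Requires PHP 8 for str_ends_with().

```php
<?php
// Toy model of the referral chain: root -> .com -> ahref.com.
// Server names and TTL are invented; only the target IP is from the article.

// Each "server" either delegates zones to other servers ('delegations')
// or answers authoritatively for names it owns ('answers').
$servers = [
    'root'         => ['delegations' => ['com' => 'com-server']],
    'com-server'   => ['delegations' => ['ahref.com' => 'ahref-server']],
    'ahref-server' => ['answers' => [
        'www.ahref.com' => ['ip' => '216.112.23.11', 'ttl' => 86400], // one day
    ]],
];

// Plays the role of your local nameserver: start at the root and follow
// referrals until some server answers. $trace records who was asked.
function resolve(array $servers, string $host, array &$trace): ?array
{
    $current = 'root';
    for ($hops = 0; $hops < 10; $hops++) {       // guard against referral loops
        $trace[] = $current;
        $server = $servers[$current] ?? [];
        if (isset($server['answers'][$host])) {
            return $server['answers'][$host];     // authoritative answer: IP + TTL
        }
        // Follow the delegation for the most specific zone containing $host.
        $next = null;
        $bestLength = -1;
        foreach ($server['delegations'] ?? [] as $zone => $nameserver) {
            $zone = (string) $zone;
            $inZone = $host === $zone || str_ends_with($host, '.' . $zone);
            if ($inZone && strlen($zone) > $bestLength) {
                $next = $nameserver;
                $bestLength = strlen($zone);
            }
        }
        if ($next === null) {
            return null;                          // no server knows this name
        }
        $current = $next;
    }
    return null;
}

$trace = [];
$answer = resolve($servers, 'www.ahref.com', $trace);
echo implode(' -> ', $trace), "\n";
echo "{$answer['ip']} (may be cached for {$answer['ttl']} seconds)\n";
```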
A vs. MX Records
Typically, hosts have both "A" records and "MX" records. The "A" records work as described above, and are used to find targets for web connections, FTP, telnet, and other such services. "MX" records, though, deal with mail delivery.
MX records serve two purposes. First, they allow you to specify a different target machine for electronic mail than for other Internet services. So, you can send all FTP, web, and telnet requests for ahref.com to one machine, while sending email to another machine elsewhere. This can be helpful in splitting load among your Internet servers, and splitting administration requirements.
Second, MX records let you specify several hosts to accept email for your domain. Because email is asynchronous, you don't want it to bounce if the main target machine is temporarily unavailable; the target machine might be back in a few hours, at which point the mail should come through. By specifying multiple MX records, you can have email spooled on a backup machine somewhere while your main mail machine is down. You do this by giving each mail host an MX preference number; mail is delivered to the lowest-numbered machine that is currently available. If the lowest-numbered machine is down, the mail goes to a machine with a higher number, which will "spool" it and pass it on to the main mail machine once that machine comes back up.
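The delivery rule can be sketched like this. The records use the same 'pri' (preference) and 'target' keys that PHP's dns_get_record() returns for DNS_MX queries, but the host names and the $isUp reachability test are hypothetical; a real mail server would actually try to open an SMTP connection to each host.

```php
<?php
// Sketch: choosing where to deliver mail from a set of MX records.
// Record shape mimics dns_get_record($domain, DNS_MX); hosts are made up.

// Sort by preference number, lowest (most preferred) first.
function mailHostsInOrder(array $mxRecords): array
{
    usort($mxRecords, fn(array $a, array $b): int => $a['pri'] <=> $b['pri']);
    return array_column($mxRecords, 'target');
}

// Try each host in preference order; $isUp is a hypothetical reachability test.
function chooseMailHost(array $mxRecords, callable $isUp): ?string
{
    foreach (mailHostsInOrder($mxRecords) as $host) {
        if ($isUp($host)) {
            return $host;   // first reachable host in preference order
        }
    }
    return null;            // nobody reachable: the sender queues the mail and retries
}

$mx = [
    ['target' => 'backup.example.net', 'pri' => 20],
    ['target' => 'mail.ahref.com',     'pri' => 10],
];

// The main mail machine is down, so the backup accepts (and spools) the message.
$down = ['mail.ahref.com'];
$host = chooseMailHost($mx, fn(string $h): bool => !in_array($h, $down, true));
echo $host, "\n";
```

One detail the sketch skips: when several hosts share the same preference number, mail servers are expected to pick among them at random, which spreads the load.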
DNS Checking
The idea is simple: every page request comes from a specific IP address. When a visitor claims to be a search engine bot, your script looks up which hostname that IP address is associated with (a reverse DNS lookup), then resolves the hostname it found back to an IP address (a forward DNS lookup). For a genuine bot, the hostname belongs to the search engine's domain and the round trip yields the original requesting IP address. If it doesn't, then you have a spammer on your hands - block them!
PHP Code
So how do you do this on the server? It's very easy with PHP:
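- Check the user agent to see if it's identifying itself as a search engine bot
- If so, get the IP address requesting the page
- Reverse DNS lookup the IP address to get a hostname
- Check that the hostname belongs to the search engine's domain
- Forward DNS lookup the hostname to get an IP address
- Compare that IP address with the one that requested the page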
The code:
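<?php
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stristr($ua, 'msnbot') || stristr($ua, 'googlebot')) {
    // It's claiming to be MSN's bot or Google's bot. Verify that claim.
    $ip = $_SERVER['REMOTE_ADDR'];

    // Reverse lookup: IP address -> hostname.
    // gethostbyaddr() returns the IP unchanged if there is no reverse record.
    $hostname = gethostbyaddr($ip);

    // Hostnames are case-insensitive, hence the /i modifier.
    if (!preg_match('/\.googlebot\.com$/i', $hostname) && !preg_match('/search\.live\.com$/i', $hostname)) {
        // The hostname belongs to neither live.com nor googlebot.com.
        // Remember the UA already said it is either MSNBot or Googlebot.
        // So it's a spammer.
        echo "Please leave";
    } else {
        // Now we have a hit that half-passes the check. One last go:
        // forward lookup, hostname -> IP address(es). gethostbynamel()
        // returns every IPv4 address for the name (or false), so a host
        // with several addresses isn't wrongly rejected.
        $real_ips = gethostbynamel($hostname);
        if ($real_ips === false || !in_array($ip, $real_ips, true)) {
            // The hostname doesn't resolve back to the requesting IP: spammer!
            echo "Please leave";
        } else {
            // Real bot.
            echo "Welcome!";
        }
    }
}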
Notice that we do a case-insensitive check on the user agent, using stristr(), to see whether it's MSNBot or Googlebot. If it is, we do the DNS check and act on the results. The comments explain what's going on at each step, and php.net documents each of the functions used.
Two final comments on the preg_match checks. First, they simply check that the $hostname string ends with googlebot.com or search.live.com. If not, we've caught a spammer. If it does, we either have the genuine article or someone is messing with our DNS: whoever controls an IP address also controls its reverse DNS record, so a spammer can make their own address claim a name like crawl-1.googlebot.com. That possibility takes us to the final check in the else block. The forward lookup is much harder to fake, because only Google and Microsoft control the forward records under their own domains.
Second, we're doing a negative check, that is, checking that preg_match does NOT match live.com or googlebot.com. We could implement the code with a positive check instead (i.e., checking that preg_match does match), but of course the logic changes a bit.
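Here's one way to write the positive version, as a sketch. The function name isVerifiedCrawler() is made up, and the DNS lookups are passed in as callables so the logic can be tried without a network. The sample addresses come from the ranges reserved for documentation, and the fake reverse entry for 198.51.100.7 plays the part of a spammer who has pointed their own reverse DNS at a googlebot.com name. Requires PHP 8 for str_ends_with().

```php
<?php
// Sketch: the bot check as a positive test, with DNS lookups injected.
// isVerifiedCrawler() is hypothetical; in production pass 'gethostbyaddr'
// and 'gethostbynamel' as the two callables.
function isVerifiedCrawler(string $ip, array $allowedDomains, callable $reverse, callable $forward): bool
{
    $hostname = $reverse($ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;   // no usable reverse DNS (gethostbyaddr returns the IP on failure)
    }
    $hostname = strtolower(rtrim($hostname, '.'));

    // Positive check: the hostname must sit inside one of the allowed domains.
    $inAllowedDomain = false;
    foreach ($allowedDomains as $domain) {
        if (str_ends_with($hostname, '.' . strtolower($domain))) {
            $inAllowedDomain = true;
            break;
        }
    }
    if (!$inAllowedDomain) {
        return false;
    }

    // Forward-confirm: the hostname must resolve back to the requesting IP.
    $ips = $forward($hostname);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Fake resolvers standing in for DNS; all names and addresses are made up.
$ptr = ['192.0.2.10' => 'crawl-1.googlebot.com', '198.51.100.7' => 'crawl-1.googlebot.com'];
$a   = ['crawl-1.googlebot.com' => ['192.0.2.10']];
$reverse = fn(string $ip) => $ptr[$ip] ?? $ip;
$forward = fn(string $host) => $a[$host] ?? false;

$domains = ['googlebot.com', 'search.live.com'];
var_dump(isVerifiedCrawler('192.0.2.10', $domains, $reverse, $forward));   // genuine
var_dump(isVerifiedCrawler('198.51.100.7', $domains, $reverse, $forward)); // spoofed reverse DNS
```

On a live server you would call it as isVerifiedCrawler($_SERVER['REMOTE_ADDR'], ['googlebot.com', 'search.live.com'], 'gethostbyaddr', 'gethostbynamel'). Keep in mind that gethostbynamel() only returns IPv4 addresses, so this sketch doesn't handle IPv6 visitors.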