Understanding Web URLs


URL stands for Uniform Resource Locator, and it serves as a pointer or address to a resource that can be accessed over the internet. In other words, it's the way that web browsers locate and retrieve web pages, images, files, and other types of resources on the web.

Anatomy of a URL

Scheme or Protocol

http

The scheme of a URL tells the browser which protocol (set of rules) to use when communicating with a website’s server to send and retrieve information and access a resource on the internet.

Most URLs use http or https, but there are other schemes such as mailto: (opens the computer’s default email client with the email address given in the URL) and ftp:// (a standard protocol for transferring files between a client and a server).

You usually don't need to type the scheme when entering a web address, and many browsers hide it in the address bar, but it is always part of the URL.

After the scheme and the character pattern ://, the next part of a URL is known as the authority. If present, the authority includes the domain (e.g. www.example.com) and, optionally, the port (e.g. 80), separated by a colon.

One example of a URL that doesn't use an authority is a mail link (mailto:foobar). It contains a scheme but no authority component. Therefore, the colon is not followed by two slashes and only acts as a delimiter between the scheme and the mail address.
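To make the scheme and authority split concrete, here is a minimal sketch using Python's standard urllib.parse module. The URLs are just the illustrative ones from this article, not real endpoints.

```python
from urllib.parse import urlsplit

web = urlsplit("https://www.example.com:80/path/to/file.html")
mail = urlsplit("mailto:foobar")

# A web URL has both a scheme and an authority (called netloc in Python).
print(web.scheme)   # https
print(web.netloc)   # www.example.com:80

# A mailto: URL has a scheme but no authority; the address ends up in the path.
print(mail.scheme)  # mailto
print(repr(mail.netloc))  # ''
print(mail.path)    # foobar
```

Notice that the mailto: URL parses cleanly even though it has no // after the colon.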

Domain (Host)

It specifies the organization (server) or entity to which the URL belongs. It is the actual name of the website and points to the site's main page, from which the other pages are linked. Domain names must be unique, as they determine the address of a website. Behind every domain name is a numeric IP address; without domain names, you would have to type that IP address to reach a particular site.

For many URLs, the www subdomain can be omitted. For instance, www.google.com and google.com lead to the same page. However, other subdomains cannot be omitted. For example, all pages under images.google.com require the images subdomain in the URL.

Sub-Domain

www

A subdomain identifies a specific area of a website that the page belongs to. The most common subdomain is ‘www’.

Second Level Domain

google

It sits immediately to the left of the top-level domain. It is the name of your website, and it helps people know they’re visiting a certain brand’s site.

Top Level Domain (TLD)

.com

It indicates the type of organization the website is registered to, such as .com (intended for commercial entities), .org (intended for organizations), or .edu (intended for academic institutions). It is also known as the domain extension.
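The three parts of the domain described above can be pulled apart with a few lines of Python. This is only a naive sketch: split_host is a hypothetical helper written for this article, and it assumes the TLD is a single label. Multi-part suffixes such as co.uk would need a public suffix list, which is not handled here.

```python
from urllib.parse import urlsplit

def split_host(url):
    """Naively split a hostname into (subdomain, second-level domain, TLD).

    Assumption: the TLD is exactly one label (com, org, edu, ...).
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    if len(labels) < 2:
        return ("", host, "")
    tld = labels[-1]
    second_level = labels[-2]
    subdomain = ".".join(labels[:-2])  # empty when there is no subdomain
    return (subdomain, second_level, tld)

print(split_host("https://www.google.com/"))     # ('www', 'google', 'com')
print(split_host("https://images.google.com/"))  # ('images', 'google', 'com')
print(split_host("https://google.com/"))         # ('', 'google', 'com')
```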

Port

:80

A port number specifies which service the client is requesting, since servers often deliver multiple services. It is a reserved channel used for a specific purpose. The port always follows a colon. It is usually omitted when the web server uses the standard port for the protocol (80 for HTTP and 443 for HTTPS); otherwise, it is mandatory. Even when it is not visible in the URL, a port is always used for the connection.
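The "omitted means default" rule can be sketched like this. effective_port and DEFAULT_PORTS are hypothetical names made up for this example, and only the two defaults mentioned above are covered.

```python
from urllib.parse import urlsplit

# Standard ports for the two schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None for schemes not listed above

print(effective_port("http://www.example.com/"))       # 80
print(effective_port("https://www.example.com/"))      # 443
print(effective_port("http://www.example.com:8080/"))  # 8080
```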

Path

/path/to/file.html

The path is used to show which directory on the server stores the resources (files, videos, audio, etc.) that are being requested. It also directs the browser to a specific page on the website.

If you don’t specify a path and only enter a domain name, your browser is still loading a specific page; it’s just loading a default page, which usually will help you navigate to other pages.

Also known as the subdirectory or subfolder structure, it also helps web crawlers understand which particular section of a website they’re on.

Nowadays, the path that appears in most URLs doesn't necessarily reflect the directory structure on the server. Instead, paths identify a route in the navigational structure of the website. The path is mostly an abstraction handled by web servers, without any physical counterpart on disk.
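Here is a small sketch of how the path looks once a URL is parsed. path_segments is a hypothetical helper for illustration; how a server maps these segments to a page or file is entirely up to the server.

```python
from urllib.parse import urlsplit

def path_segments(url):
    """Return the non-empty segments of the URL's path."""
    path = urlsplit(url).path or "/"  # a missing path is treated as the root
    return [segment for segment in path.split("/") if segment]

print(path_segments("https://www.example.com/path/to/file.html"))
# ['path', 'to', 'file.html']

# With only a domain name there are no segments, so the server
# decides which default page to return.
print(path_segments("https://www.example.com"))  # []
```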

Query String (Query + Parameters)

?key1=value1&key2=value2

It follows the path component and provides a string of information that the resource can use for some purpose (for example, as parameters for a search or as data to be processed). The query string is usually a string of name and value pairs, separated from each other by an ampersand (&).

Query parameters are optional parts of a URL and help provide additional information that can be passed to the server to help find or filter a resource.
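As a quick illustration of name and value pairs, Python's standard library can both read a query string and build one. The /search path and the "value 2" value are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "https://www.example.com/search?key1=value1&key2=value2"

# parse_qs returns a dict of lists, because a key may appear more than once.
params = parse_qs(urlsplit(url).query)
print(params)  # {'key1': ['value1'], 'key2': ['value2']}

# urlencode goes the other way and escapes characters such as spaces.
query = urlencode({"key1": "value1", "key2": "value 2"})
print(query)  # key1=value1&key2=value+2
```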

Anchor (Fragments)

#main

Appearing at the very end of the URL, after the path and any query string, the anchor or fragment identifier is optional and tells the browser to scroll to or load a specific section of the page. The anchor begins with a hash sign (#) and is typically used to direct your browser to a specific part of a very long page, much like a bookmark.

It is worth noting that the part after the # is never sent to the server with the request.
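Since the fragment stays in the browser, it can help to see a URL split into "what goes to the server" and "what stays local". A minimal sketch with the standard library, using this article's example URL parts:

```python
from urllib.parse import urldefrag

full_url = "https://www.example.com/path/to/file.html?key1=value1#main"

# urldefrag separates the fragment from the rest of the URL.
requested_url, fragment = urldefrag(full_url)

print(requested_url)  # https://www.example.com/path/to/file.html?key1=value1
print(fragment)       # main
```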

Types Of URLs

Absolute URL

An absolute URL refers to a complete location on the internet, including the protocol (http or https), the domain name, and the resource path. For instance, https://www.example.com/index.html is an absolute URL.

  • Always points to the same location on the internet

  • Can be used to link to resources on other websites

Examples:

https://hashnode.com/n/web-development (Full URL)

//hashnode.com/n/web-development (Implicit Protocol: the browser reuses the scheme of the current page)

Relative URL

A relative URL refers to a location relative to the current location. In the context of the web, relative URLs are commonly used to link to other resources on the same website. They can be either relative to the root of the website or relative to the current page, depending on the URL path specified.

For instance, index.html (a file in the same directory as the current page) or ../images/logo.png (the logo is in an images directory one level above the current one).

  • Does not include the protocol or domain name, only the resource path

  • Useful when linking to resources within the same website

Examples (with the base URL https://hashnode.com/n/web-development):

/n/web-development (relative to the root; resolves to hashnode.com/n/web-development)

../n/web-development (relative to the current page; resolves to hashnode.com/n/web-development)
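To see how a browser would resolve relative URLs against a base, here is a sketch using urljoin from Python's standard library. The base URL is the one from the examples above; the relative references are illustrative, and none of the resulting pages are guaranteed to exist.

```python
from urllib.parse import urljoin

base = "https://hashnode.com/n/web-development"

# Relative to the root of the site (starts with a slash).
print(urljoin(base, "/n/web-development"))    # https://hashnode.com/n/web-development
print(urljoin(base, "/"))                     # https://hashnode.com/

# Relative to the current page's directory (/n/).
print(urljoin(base, "index.html"))            # https://hashnode.com/n/index.html
print(urljoin(base, "../n/web-development"))  # https://hashnode.com/n/web-development
print(urljoin(base, "../images/logo.png"))    # https://hashnode.com/images/logo.png

# Implicit protocol: the scheme is taken from the base URL.
print(urljoin(base, "//hashnode.com/n/web-development"))
# https://hashnode.com/n/web-development
```

Note that a lone / resolves to the site root, not to the current page.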

In general, when linking to another page on your website, it is best to use a relative URL. When linking to an external website or an asset hosted on a different domain, an absolute URL is necessary.

Conclusion

Web URLs are essential for accessing resources on the internet. Like most elements of your website, URLs are more complex than they seem at first glance. By understanding how URLs are structured and what each part means, you can navigate the web more effectively and understand what's happening behind the scenes when you visit a website.
