How to Create Human-Friendly Shortened URLs
By William Hertling
HP Developer for 3D Printing
URL shorteners are web services that take a long URL and create a tiny, unique code to make a much shorter link. URL shortening can be used to squeeze a link into a limited space (such as a 140-character tweet), to track metrics, or to make links look cleaner.
Shortened URLs often depend on creating a unique key using base 36 (10 digits + 26 alphabetic characters) or base 62 (10 digits + 26 lowercase + 26 uppercase). A base 62 URL (such as bit.ly/2oAckLK) helps keep the key length, and hence the overall URL length, as short as possible.
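To make the base 36 versus base 62 trade-off concrete, here is a minimal sketch of how a numeric ID might be turned into a key. The function names and the ordering of characters in the alphabet are assumptions for illustration; real shortening services choose their own alphabets and ID schemes.

```python
import string

# Assumed character ordering: digits, then lowercase, then uppercase.
BASE36 = string.digits + string.ascii_lowercase
BASE62 = BASE36 + string.ascii_uppercase


def encode(number: int, alphabet: str = BASE62) -> str:
    """Convert a non-negative integer into a key over the given alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    chars = []
    while number > 0:
        number, remainder = divmod(number, base)
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


def decode(key: str, alphabet: str = BASE62) -> int:
    """Convert a key back into the integer it was generated from."""
    value = 0
    for ch in key:
        value = value * len(alphabet) + alphabet.index(ch)
    return value
```

With these alphabets, the ID 1,000,000,000,000 needs eight characters in base 36 but only seven in base 62, which is the saving that makes base 62 attractive.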
When such shortened URLs are used on printed materials, such as ads, postcards, or packaging, a problem quickly arises: It is exceedingly difficult to type a shortened URL into a device. Pseudo-random base 62 strings are not easy for people to remember. Therefore, it becomes necessary to read and input one character at a time. Worse, on a mobile device, a person must switch keyboards multiple times. For the example above, they would have to press twelve keys instead of seven:
<number> 2 <alpha> o <uppercase> A c k <uppercase> L <uppercase> K
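The count above can be reproduced with a small model of a phone keyboard. This is only a sketch under simplifying assumptions: typing starts on the lowercase letter layout, switching between the letter and number layouts costs one press, every uppercase letter costs one extra press for shift, and any other character (such as a period) is available on the current layout. The function name is hypothetical.

```python
def count_key_presses(code: str) -> int:
    """Estimate key presses for a code on a simplified phone keyboard."""
    presses = 0
    layout = "alpha"  # assumed starting layout
    for ch in code:
        if ch.isdigit():
            if layout != "number":
                presses += 1  # switch to the number layout
                layout = "number"
        elif ch.isalpha():
            if layout != "alpha":
                presses += 1  # switch back to the letter layout
                layout = "alpha"
            if ch.isupper():
                presses += 1  # shift key
        presses += 1  # the character itself
    return presses
```

Under this model, `count_key_presses("2oAckLK")` gives 12, matching the sequence shown, while a six-character code such as `abc234` needs only 7.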
Short and easy
What if we want to make a shortened URL that is easy for people to enter manually? What principles can we use to guide the design of that URL?
Avoid switching between uppercase, lowercase, and numbers more than necessary. The mobile-keyboard example above demonstrates the cognitive and physical cost of this.
Avoid ambiguous characters. Depending on the font, the uppercase I, lowercase l and the numeral 1 can be confused with each other. Similarly, the uppercase O and slender numeral 0 can be confused. To a lesser extent, two consecutive letter Vs can sometimes look like a W. In some fonts, the number 7 can be confused with a 1.
It’s tempting to argue that ambiguous characters can be resolved with better font choice. However, the software that generates shortened URLs will often be far removed from design decisions around fonts, which may be dictated by corporate branding standards and other restrictions. So the safest bet is to eliminate all ambiguous characters from code generation.
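One way to eliminate ambiguous characters is to build the alphabet once and generate codes only from it. The sketch below assumes a single lowercase alphabet and drops just the most easily confused characters (0, 1, i, l, o). The names `AMBIGUOUS`, `SAFE_LETTERS`, `SAFE_DIGITS`, and `is_safe` are illustrative, and a stricter design might also drop characters such as 7 or v.

```python
import string

# Assumed exclusion set: the I/l/1 and O/0 lookalikes, in lowercase form.
AMBIGUOUS = set("01ilo")

SAFE_LETTERS = "".join(c for c in string.ascii_lowercase if c not in AMBIGUOUS)
SAFE_DIGITS = "".join(c for c in string.digits if c not in AMBIGUOUS)


def is_safe(code: str, separators: str = ".-") -> bool:
    """Return True if the code uses only safe characters and separators."""
    allowed = set(SAFE_LETTERS) | set(SAFE_DIGITS) | set(separators)
    return all(c in allowed for c in code)
```

This leaves 23 letters and 8 digits. A check like `is_safe("abc.234")` passes, while `is_safe("xyz101")` fails because of the 0 and 1.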
Separate and alternate
Group the code into sequences of 3 or 4 characters. This practice is familiar from the formatting of telephone and credit card numbers, which in turn stems from studies into short-term memory.
There are two strategies that can help break longer strings into shorter ones. We can use a separator character, such as a dash (-) or period (.), or we can alternate between letters and numbers. The period appears on both numeric and alphabetic mobile keyboards, making it an ideal separator.
These are examples of separator character grouping. They can use letters or numbers in any sub-group (shown with an example domain name below):
These are examples of alternation grouping. They are composed of one subgroup of letters and one subgroup of numbers:
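As an illustration of both groupings, here is a minimal sketch that generates codes of each kind. Everything named here is an assumption made for the example: the domain example.com, the function names, the three-character subgroups, and the reduced alphabets of 23 letters and 8 digits.

```python
import secrets

SAFE_LETTERS = "abcdefghjkmnpqrstuvwxyz"  # lowercase without i, l, o
SAFE_DIGITS = "23456789"                  # digits without 0, 1


def subgroup(alphabet: str, length: int = 3) -> str:
    """Pick a random subgroup drawn from a single character type."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def separator_code(first: str, second: str, separator: str = ".") -> str:
    """Two subgroups joined by a separator, such as letters.digits."""
    return subgroup(first) + separator + subgroup(second)


def alternation_code(letters_first: bool = True) -> str:
    """One letter subgroup and one digit subgroup with no separator."""
    parts = [subgroup(SAFE_LETTERS), subgroup(SAFE_DIGITS)]
    if not letters_first:
        parts.reverse()
    return "".join(parts)


def short_url(code: str, domain: str = "example.com") -> str:
    """Attach a code to a (hypothetical) short domain."""
    return f"{domain}/{code}"
```

Calling `short_url(separator_code(SAFE_LETTERS, SAFE_DIGITS))` produces something of the shape `example.com/abc.234`, and `short_url(alternation_code())` produces the shape `example.com/abc234`. A real service would also need to check each generated code against those already issued.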
By restricting the code in these ways:
- Using a single case and digits
- Avoiding two numbers (0, 1) and three letters (i, l, o)
- Constraining subgroups to the same type of characters (all numbers or all letters)
...we do lose a lot of combinations compared to traditional base 62 encoding. However, most websites don’t need to shorten hundreds of millions of URLs, so this mechanism will work fine.
Implementing all five of the grouping schemes shown above yields a total code space of 173 million URLs. To handle more than that, lengthening subgroups to four characters or using three subgroups will boost the code space to more than 80 billion URLs.
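The arithmetic behind figures of this size can be sketched as follows. The layouts counted here are an assumption, since the exact scheme list is not spelled out in the text: two same-type subgroups joined by a separator in every letter/digit combination, plus letters-then-digits and digits-then-letters with no separator, using 23 letters and 8 digits. Under that assumption the totals land close to the quoted numbers.

```python
LETTERS = 23  # 26 letters minus i, l, o
DIGITS = 8    # 10 digits minus 0, 1


def code_space(group_length: int) -> int:
    """Count codes made of two same-type subgroups (assumed layouts)."""
    letters = LETTERS ** group_length  # all-letter subgroups
    digits = DIGITS ** group_length    # all-digit subgroups
    # Separator layouts: letters.letters, letters.digits,
    # digits.letters, digits.digits.
    with_separator = (letters + digits) ** 2
    # Alternation layouts: lettersdigits and digitsletters.
    without_separator = 2 * letters * digits
    return with_separator + without_separator


three = code_space(3)  # 173,216,049, roughly 173 million
four = code_space(4)   # 82,912,677,441, a little under 83 billion
```

Most of the space comes from the all-letter layout (23^6 is about 148 million), which is why lengthening each subgroup by one character multiplies the total several hundred times over.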
Humans have great difficulty reading and transcribing shortened URLs that contain a random assortment of numbers and letters in upper and lower case. To give users URLs that are easy to type and transcribe, avoid switching keyboards and types of characters more than once, use a separator character to create subgroups, and eliminate ambiguous characters. Human-friendly URLs are ideal for printed materials and anywhere shortened URLs are needed.
William Hertling is a Ruby on Rails developer at HP working on 3D printing software. By night, he’s a science fiction writer, and the author of the best-selling Avogadro Corp and Kill Process. His writing has been called “chilling and compelling” by Wired, and a “must read” by Brad Feld, managing director at Foundry Group.