Introduction

Search engines play an integral role in many of our lives, sometimes without us realising it. A purchase on Amazon usually starts with a product search. A new vacation booking often begins with a hotel search. The answer to almost any question is just a web search away. Here, we look at the history behind one type of search engine: the web search engine.

Before search engines

The internet existed for many years before the first web page came online. It began as a US military research project at DARPA and was then used as a way of sharing academic and scientific knowledge. The internet as we know it runs on TCP/IP, a technology invented in 1974 by Bob Kahn and Vint Cerf (who now works for Google).[1]

Tim Berners-Lee

It was the English scientist Sir Tim Berners-Lee who invented the World Wide Web in 1989, while working at CERN in Switzerland. The web used a technology called Hypertext Transfer Protocol (HTTP) to transmit data over TCP/IP, which is why web addresses still begin with “http” (or its secure version, “https”) to this day. In 1990, Berners-Lee built the world’s first web server, along with a web browser to navigate it. He also invented HTML (based on SGML, a markup language already in use at CERN) for formatting text-based content, and made the technology available outside of CERN in 1991. Berners-Lee insisted that the technology must remain freely available, with no patents or royalty fees, so that it was accessible to everyone.

Usenet newsgroups had been the way internet users communicated for a decade before the web was invented. Even after the broad adoption of Sir Tim Berners-Lee’s technology, web pages were often shared and linked to within special-interest newsgroups. Some users began creating web pages that collated the URLs shared in newsgroups into catalogues or directories. Rather than having to download every message in a newsgroup and search for information or URLs within them, users could now visit a directory or portal and navigate through its categories to find relevant web pages.

Sir Tim Berners-Lee

Open Directory Project

One of the best-known directories was DMOZ (also known as the Open Directory Project), created in 1998 by two engineers at Sun Microsystems. For many people, the directory became the default way of finding information on the web, and it offered a search function for faster navigation.

Netscape acquired DMOZ later that same year, by which time it had indexed (listed) around 100,000 URLs. A year later, DMOZ listed over a million URLs, and it peaked at over five million before closing in 2017. The directory’s objectives became blurred after AOL acquired Netscape, and the cost of running it for free came into question.

Volunteer editors felt betrayed, having become unpaid workers for AOL rather than contributors to a noble, free and open initiative. Commercial content creators such as CNN were given editorial rights to add, edit and delete entries, calling the impartiality of the project into question. Some editors were found to be selling inclusion in the directory, after rumours spread that a DMOZ listing boosted a website’s ranking in search engines.

AOL failed to resuscitate DMOZ several times and eventually took the website offline. A directory based on the original DMOZ database and maintained by former DMOZ editors still exists today at Curlie.org.

Archie wasn't the first web search engine

A pub quiz will tell you that Archie was the first search engine, which is technically correct. But Archie, released in 1990 at McGill University, was designed to search for files on the internet (on FTP servers), not for content on the World Wide Web.

W3Catalog (originally called “Jughead”) was later launched in September 1993 by Oscar Nierstrasz at the University of Geneva. The service mostly took existing lists/catalogues of web pages and made them searchable in a standardised format.

Aliweb (Archie Like Indexing for the Web) is widely considered to be the first web search engine. Launched in November 1993, Aliweb allowed webmasters to submit their web pages along with relevant keywords and descriptions for them. The search engine was largely forgotten, though, as internet users at the time still mostly preferred to navigate websites using directories, lists and catalogues.

Money Gets Involved

WebCrawler was the first search engine to be widely used, as well as the first to index the full text of web pages, making every word and phrase searchable. It was developed at the University of Washington and launched in 1994, the same year as Lycos from Carnegie Mellon University. Both WebCrawler and Lycos became commercial ventures, with WebCrawler backed by two primary investors, one of them Microsoft co-founder Paul Allen.

Lycos invested heavily in its brand, with TV ads featuring its iconic black Labrador, and hired a vast team of volunteer and paid editors for its web directory.

1995 - The dawn of search engines

Excite, AltaVista & Yahoo were born

Two years after Aliweb, search engines became mainstream and big business. Excite and AltaVista both launched in 1995, along with the less well-known MetaCrawler, Magellan and Daum. But the most significant success was Yahoo, founded by Jerry Yang and David Filo.

“Yahoo!” started in 1994 as a traditional web directory, created by two Stanford University graduate students, and launched a search engine in 1995. To the annoyance of its lesser-known rivals, Yahoo didn’t build any significant new technology. It bought and borrowed third-party tech until it acquired Inktomi (a search engine for hire) in 2002. Yahoo’s success was all packaging: a fun brand and a user-friendly interface.

Internet-connected computers started to become widely accessible in schools, libraries and homes across the globe. A new generation began using websites more than books, and search engines more than web directories. Yahoo, AltaVista and Lycos dominated, with significant investment propping up the loss-making sites.

Who is Robin Li?
(Li Yanhong / 李彦宏)

He kept his head down

“Robin” Li Yanhong is largely unknown in the Western world, but he is one of the richest men in China, with an estimated net worth of nineteen billion dollars.

His parents were factory workers with four other children to care for. They helped Li get into Peking University, where he studied Information Management. Li then went on to the University at Buffalo in New York, where he earned a master’s degree in Computer Science. After leaving university, Li joined a New Jersey-based division of Dow Jones, where he built software to manage the online edition of The Wall Street Journal.

In 1996, Li created and patented a system called RankDex for ranking the importance of web pages in search results. For the first time, a search system used “link analysis”, judging a web page’s importance by the number of other pages linking to it.

RankDex is the basis of every major search engine’s ranking algorithm today: it predated Google’s “PageRank” by two years and is referenced in Larry Page’s first patent. Without RankDex, search engines might still be using keywords, not links, as their primary ranking factor.

Shortly after welcoming in the new millennium, Robin Li and Eric Xu co-founded and incorporated Baidu, now China’s largest search engine. Just as PageRank is the soul of Google, RankDex is the soul of Baidu and the father of modern search engine ranking algorithms.

Robin Li - Baidu

Why don’t you Ask Jeeves?

Just as Boo.com was “before its time” in e-commerce (3D rotating product images in 1998?!), Ask Jeeves was before its time in search. The search engine launched in 1997, with a unique ability to answer questions.

Until then, users had to think carefully about which “keywords” to search for on AltaVista, Yahoo and Excite to get a useful page of results back. A typical search engine gave equal weight and importance to every word in a question, often returning irrelevant results. Ask Jeeves could extract the important words and primary intent of a question, yielding much more relevant results. It became hugely popular with the growing number of non-techies surfing the web, who were not used to thinking like a computer.
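To illustrate the difference, here is a toy sketch that scores pages by counting how often each query word appears on them. The stop-word list and pages are hypothetical, and dropping stop words is a simplified stand-in for intent extraction, not a description of how Ask Jeeves actually worked. With every word weighted equally, a page padded with “what”, “is” and “the” beats the page that actually answers the question.

```python
import re

# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"what", "is", "the", "of", "a", "an", "how", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score(query: str, page: str, drop_stop_words: bool) -> int:
    """Count occurrences of each query word on the page, every word weighted equally."""
    terms = tokenize(query)
    if drop_stop_words:
        terms = [t for t in terms if t not in STOP_WORDS]
    words = tokenize(page)
    return sum(words.count(t) for t in terms)


query = "What is the capital of France?"
# Hypothetical pages: one stuffed with common words, one that answers the question.
pages = {
    "filler": "What is the best way to do it? The answer is in the title of the post.",
    "relevant": "Paris is the capital of France.",
}

for drop in (False, True):
    ranked = sorted(pages, key=lambda name: score(query, pages[name], drop), reverse=True)
    print(f"drop stop words={drop}: {ranked}")
# drop stop words=False: ['filler', 'relevant']
# drop stop words=True: ['relevant', 'filler']
```

Only once the filler words are ignored do “capital” and “France” decide the ranking.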

IAC (owner of Match.com) acquired the company in 2005, but it struggled to compete against larger rivals such as Google. It rebranded as Ask.com in 2006, to save on royalty fees to P.G. Wodehouse’s estate (Jeeves is the valet character in many of his books) and to appear more modern.[2] In 2010, Ask made its search team redundant and outsourced its search results to another provider.

Teoma, the search technology Ask had acquired, struggled to index and rank relevant pages on an ever-expanding web. Revenue generation also seemed to take priority over user experience. Adverts swamped the search result pages at a time when Google had a clean, fast and relatively ad-free appearance. Any computer-related search on Ask Jeeves would return a page covered in Dell adverts, making the organic results hard to find.

A year after Ask threw in the towel on search technology, its CEO, Doug Leeds, said it still provided search to over 100 million people every month. Perhaps if Ask had kept developing its technology, it would be a leader in voice search today, where question-based searches are key.

A monopolistic giant is born

Google

Originally named “BackRub” for its link-based ranking algorithm, Google was founded by Larry Page and Sergey Brin while they were students at Stanford University. The name is a play on the term “googol” (10 to the power of 100), and the company had big ambitions from the outset. Page and Brin received a $100,000 seed investment from Sun Microsystems co-founder Andy Bechtolsheim before the company was even incorporated. The seed round raised a total of $1 million, most of which came from three key investors. One of them was Jeff Bezos, the founder of an up-and-coming online bookstore called Amazon.

Google came relatively late to the search party, building on several existing ideas in 1996 and launching at the end of 1997. Other search engines were starting to suffer from spam and relevancy problems, and solving them was to become Google’s golden bullet and the secret to its success.

Photo Credit: Steve Jurvetson - Google's first server rack


SEO made Google a giant

SEO started as a hobby for inquisitive webmasters who wondered how search engines decided the order of web pages in their results. Older search engines relied heavily on the “keywords” meta tag on a web page to tell them which searches to list the page for. This required trusting webmasters to enter only relevant keywords. Newer search engines used the content of a web page to judge its relevance, often counting the number of times a keyword appeared on the page. Both approaches were exploited, first for fun and then for profit. Websites would list every conceivable keyword in their meta tags and content, repeating the same keywords many times. These ugly lists were often hidden at the bottom of the page or written in the same colour as the background, rendering them invisible.
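A minimal sketch of the keyword-counting approach, and why stuffing beat it. The scoring function and page texts are hypothetical and deliberately simplified; the point is only that a score based on raw repetition rewards whoever repeats the word most.

```python
import re


def keyword_count(page_text: str, keyword: str) -> int:
    """Naive relevance score: how often the keyword appears in the page text."""
    words = re.findall(r"[a-z]+", page_text.lower())
    return words.count(keyword.lower())


# Hypothetical pages for illustration.
honest = "Compare car insurance quotes from trusted insurers in minutes."
# Keyword stuffing: one word repeated 50 times, e.g. in text coloured to match the background.
stuffed = "Cheap deals here. " + "insurance " * 50

pages = {"honest": honest, "stuffed": stuffed}
ranked = sorted(pages, key=lambda name: keyword_count(pages[name], "insurance"), reverse=True)
print(ranked)  # ['stuffed', 'honest']
```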

Baidu founder Robin Li was one of the first to tackle this problem with RankDex, which treated links from other websites as votes of importance. Newer search engines moved to this link-based citation model but counted all links as equal. Some webmasters exploited this by creating web pages with thousands of links on them, all pointing to their other websites.
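The sketch below models the “all links are equal” approach on a hypothetical link graph: three ordinary pages link to news-site, while ten throwaway hub pages created by a spammer link to spam-site. It is a simplification (RankDex itself is not reproduced here), but it shows why plain in-link counting was so easy to game.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links out to.
links = {
    "bbc": ["news-site"],
    "blog-a": ["news-site"],
    "blog-b": ["news-site"],
    # A spammer's hub pages, all pointing at the spammer's own site.
    **{f"farm-{i}": ["spam-site"] for i in range(10)},
}


def inlink_counts(graph: dict[str, list[str]]) -> Counter:
    """Count the pages linking to each page, treating every link as an equal vote."""
    counts: Counter = Counter()
    for targets in graph.values():
        counts.update(set(targets))  # each linking page votes at most once per target
    return counts


counts = inlink_counts(links)
print(counts["news-site"], counts["spam-site"])  # 3 10
```

Creating ten extra hub pages costs the spammer almost nothing, yet it more than triples the spam site's score relative to the genuine news site.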

Larry Page addressed this issue with his PageRank formula, which looked not only at how many backlinks a page had, but also at how important the linking pages were. A link from the BBC was worth more than a link from a spammy hub page, because thousands of other sites would link to that BBC page, and many of those sites would themselves have thousands of relevant links. This ripple effect was difficult for webmasters to manipulate, as each link needed many layers of other links pointing at it to give it enough authority to push a web page up the rankings.[3]
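A minimal sketch of the idea, using the commonly published normalised form of PageRank, PR(p) = (1 − d)/N + d · Σ PR(q)/L(q), computed by power iteration with the damping factor d = 0.85 suggested in Brin and Page's original paper. The graph is hypothetical and this is a textbook illustration, not Google's production system. Here news-site has a single in-link and spam-site has ten, but news-site's one link comes from a page that is itself well linked.

```python
def pagerank(graph: dict[str, list[str]], d: float = 0.85, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank.

    Each round, a page's score is (1 - d) / N plus d times the sum of
    PR(q) / L(q) over every page q linking to it, where L(q) is q's number of
    outgoing links. Pages with no outgoing links spread their score evenly
    over all pages (a common textbook convention).
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    while True:
        dangling = sum(pr[p] for p in pages if not graph.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for src, targets in graph.items():
            for t in targets:
                new[t] += d * pr[src] / len(targets)
        if sum(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new


# Hypothetical graph: 20 sites link to the BBC, which links once to news-site,
# while ten spam hub pages (with no in-links of their own) all point at spam-site.
links = {
    **{f"fan-{i}": ["bbc"] for i in range(20)},
    "bbc": ["news-site"],
    **{f"farm-{i}": ["spam-site"] for i in range(10)},
}

scores = pagerank(links)
print(f"news-site: {scores['news-site']:.3f}, spam-site: {scores['spam-site']:.3f}")
# news-site: 0.221, spam-site: 0.129
```

The hub pages pass on very little authority because nothing links to them, which is the “ripple effect” described above.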

Spam-free results became increasingly important, as the internet was used for more commercial purposes. Searches for academic papers with one or two results were replaced by tens of thousands of searches a month for [car insurance], [designer clothing] and [credit cards], each with hundreds of thousands of competing pages. Google had the magic formula for weeding out the spammy SEO keyword pages and ranking more trustworthy websites first.

The stolen trillion-dollar idea

Searching for profit

If the PageRank ranking formula was Google’s golden bullet, “AdWords” was the gunpowder that fired it. The computing power needed to crawl the growing web and build Page’s elaborate link graph didn’t come cheap, and all of the investors were expecting a healthy return on their investment.

Advertising was a complex problem in 1999. Search engines needed to place banner adverts on their results pages to pay for their infrastructure, but the banner ads distracted users and slowed the pages down. Ask Jeeves and Yahoo started to experiment with paid inclusion, where websites could pay for a higher ranking in search results or a guaranteed listing.

GoTo.com was a minor player in the market, having bought World Wide Web Worm, one of the oldest web search engines. Its skill was in monetising searches: in 1998 it created the first ad auction system, in which businesses bid on keywords. Rather than paying for banner views, companies would pay up to $1 per click for a premium search result listing. The company rebranded as “Overture” in 2001 and started selling its advertising solution to MSN and Yahoo, monetising hundreds of millions of searches a day. This income vastly outweighed that of its own GoTo search engine and gave it the capital to acquire rival search engines AltaVista and AllTheWeb. Whereas other search engines were bought for status and traffic, these purchases were purely business: they increased Overture’s ad platform coverage while keeping 100% of the revenue.
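The article doesn't spell out GoTo's auction rules, so the sketch below assumes the simplest pay-for-placement model: listings for a keyword are ordered by bid, and each click costs the advertiser exactly what it bid. The advertiser names and amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str  # hypothetical advertiser name
    keyword: str
    max_cpc: float  # dollars the advertiser agrees to pay per click


def rank_listings(bids: list[Bid], keyword: str) -> list[Bid]:
    """Return the bids on one keyword, highest bid first (the top slot goes to the top bidder)."""
    return sorted((b for b in bids if b.keyword == keyword), key=lambda b: b.max_cpc, reverse=True)


def charge_for_click(bid: Bid) -> float:
    """Assumed pay-your-bid pricing: each click costs exactly the advertiser's bid."""
    return bid.max_cpc


bids = [
    Bid("acme-insurance", "car insurance", 0.75),
    Bid("budget-cover", "car insurance", 1.00),
    Bid("shoe-shop", "trainers", 0.20),
]

listings = rank_listings(bids, "car insurance")
print([b.advertiser for b in listings])  # ['budget-cover', 'acme-insurance']
print(charge_for_click(listings[0]))     # 1.0
```

The search engine earns nothing from a listing until someone clicks it, which is what made the model so much more attractive to advertisers than paying for banner views.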

Google gets taken to court

Google AdWords launched in October 2000, initially charging on a CPM (cost per thousand impressions) basis before switching to PPC (pay per click) in 2002. The PPC model was remarkably similar to Overture’s patented advertising platform, and later that year Overture filed a lawsuit claiming that Google had stolen its proprietary technology.

In 2003, Yahoo bought Overture for $1.63 billion. The deal secured the ad platform that drove most of Yahoo’s revenue and increased its market share through the portfolio of search engines Overture had acquired. Then, in 2004, Google settled the lawsuit with the now Yahoo-owned Overture, giving Yahoo 2.7 million GOOG shares as compensation. At today’s share price, and accounting for Google’s 2014 stock split, those shares would be worth $6.9 billion.[4]

Yahoo struggled in the years that followed, with poor investments, acquisitions and business priorities. Blogging, photo-sharing and auction websites were acquired and then left to wither while competing websites became billion-dollar businesses. Search users left in droves, as did the partner search engines that Yahoo had previously powered. Finally, in 2017, Verizon acquired the company for just $4.5 billion.

What's it called again?

Microsoft’s struggle for dominance

Nothing symbolises panic and indecisiveness better than Microsoft’s stumble into the search engine world. MSN Search launched in 1998, in the wake of Google, at a time when Microsoft’s Windows operating system was used by over 90% of Americans. MSN initially used search results from Inktomi (a search engine for hire), which also powered Yahoo. It tried to stand out by blending in results from LookSmart and then AltaVista, with limited success.

In 2004, Microsoft finally gave search the investment it needed, building in-house search engine technology that went live in 2005. MSN’s biggest success at the time was in the B2B world that Microsoft was comfortable in: it offered its technology to other search engines, internet providers and portals, gaining market share and a cut of advertising revenues.

Just as MSN Search started to pick up momentum (helped by MSN being the default homepage in millions of Internet Explorer browsers), Microsoft committed harakiri by renaming the service “Windows Live Search” in 2006. The reasoning behind this was vague and perplexing, beyond reinforcing the company’s “Windows” brand and making it seem modern. Just one year later, the company rebranded its search engine again, dropping the “Windows” reference and calling it “Live Search”.

Any ideas?

Former search industry leader, Danny Sullivan:

“Why not go back to MSN or Microsoft Search? Why not change it - or will it change?”

Microsoft's Kevin Johnson replied:

“There’s an opportunity for us to fix those brands. We acknowledge that we need to get that fixed. If you have suggestions, we’ll take them.”

In 2009, Microsoft announced that Live Search would be rebranded one last time, as Bing. Microsoft’s repeated rebrands and search result quality had become a point of ridicule, but Bing became a serious rival to Google in the years that followed. It rarely gained more than a 10% market share in the US, but it competed with, and often beat, Google in blind result comparisons and user satisfaction studies. In the same year that Bing launched, Microsoft signed a deal with its closest search engine rival to power Yahoo’s search results. To this day, Yahoo is still “Powered by Bing™”.[5]

Photo Credit: Si1very - Danny Sullivan and Kevin Johnson


Yandex (Яндекс in Russian)

Searching in Russia

Reported by Konstantin Kanin

Yandex launched in 1997 and became the market leader in Russia in 2001. Other Russian search engines existed at the time, such as Aport.ru and Rambler.ru, but they later pivoted into a marketplace and a media portal respectively.

Yandex’s dominance over Google in Russia stemmed from its better understanding of the morphology of the Russian language, which produced more precise search results for Russian websites. The company then released competing Maps and Market/Shopping products, which were also better adapted to the needs of Russian users. Google didn’t seem very interested in the Russian market at first, which allowed Yandex to become so dominant. As Yandex grew, search became part of a much larger ecosystem of products for the company.
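Why does morphology matter? Russian nouns change their endings by grammatical case, so a query for “книга” (book) will not literally match a page that uses “книгой”. The sketch below is a toy suffix-stripper with a hypothetical, tiny suffix list; it is not Yandex's method, and real Russian morphology needs a proper morphological analyser.

```python
# Hypothetical, tiny suffix list covering a few case endings of nouns like "книга" (book).
# The two-letter suffix comes first so "ой" is tried before "а", "и", "у" and "е".
SUFFIXES = ("ой", "а", "и", "у", "е")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query_word: str, document: str, use_stems: bool) -> bool:
    """Check whether the query word appears in the document, exactly or by stem."""
    normalise = crude_stem if use_stems else str.lower
    return normalise(query_word) in {normalise(w) for w in document.split()}


document = "Он доволен новой книгой"  # "He is pleased with the new book" (instrumental case)
print(matches("книга", document, use_stems=False))  # False
print(matches("книга", document, use_stems=True))   # True
```

An engine that only matches exact word forms misses the page entirely, while even this crude normalisation finds it.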

However, for several years now Russian internet users have been using two search engines. Thanks to the spread of Android and Chrome, Google has managed to win back a large share of search, particularly among younger and more advanced users. At one point, Google even overtook Yandex in mobile search, but then lost its lead again.

Given the current global trend towards tighter regulation of nationally important segments of the internet, I would predict that Yandex will gain an even greater market share over Google.

Some of the world’s best machine learning and speech recognition technologies are being developed in Russia. Yandex has its own voice assistant, which is also considered one of the best. Of course, in 5-10 years we will use search differently, and I am sure that, in terms of technology, Yandex will be one of the leading players, in Russia and worldwide.

I think the secret to Yandex’s success is simple: they make a first-class product that is fully adapted to the habits and needs of local users. Yandex became part of Russian internet culture, not only through its search engine, but also through its Maps, Market/Shopping, Taxi and Voice Search products.

Photo Credit: Konstantin Kanin


Baidu - The Google of China

An unstoppable force

Baidu had an impressive pedigree from the day it launched in 2000 as a Chinese-focussed search engine. The company was co-founded by Robin Li, whose RankDex work inspired Google’s Larry Page and is referenced in Page’s PageRank patent. Li’s RankDex idea of scoring pages by their link profile was published back in 1996, but now Li was playing catch-up to an already popular Google.

The search engine’s growth was powered by ad revenue and compliance with the Chinese government. Baidu launched its auction-based pay-per-click ad platform before Google did, but did not face the same patent infringement lawsuit from Overture. It also complied with Chinese government requests to censor keywords and news sources, requests that Google either refused or resisted. A leaked internal document in 2009 revealed that Baidu kept a long list of keywords, topics and websites that should return no results. It included news websites critical of the government, as well as civil rights, protests, ethnic conflicts, democracy and the names of China’s leaders.[6] This was a necessary evil for the search engine to have any chance of succeeding behind “The Great Firewall of China”.

Google launched Simplified and Traditional Chinese versions of its search engine in the same year that Baidu launched. The Chinese government intermittently blocked the .com website, so Google launched a censored version of its search engine at Google.cn. The relationship remained rocky, though, and Google stopped censoring in 2010, after being attacked by hackers linked to the Chinese government.[7] It struggled to maintain a sizeable market share, with the website repeatedly blocked and Chinese users preferring native alternatives such as Baidu.

Baidu's competition is from within China

Reported by Allen Qu (渠成)

Baidu has been the dominant player in China for a long time. But before Google left the Chinese market, it had almost a 50% market share. Sogou and 360 Search are also popular in China, getting most of their traffic through their own web browsers.

Baidu is now facing a new challenge from Toutiao Search, which is rising in popularity. It is still a long way behind, but it could challenge Baidu within the next 3-5 years.

China has a very different internet environment, from both a political and a cultural point of view. As a result, it is very difficult for international players such as Google to survive in the market. Competing as a domestic player, Baidu had many advantages over its US rivals.

Voice search is starting to become more prevalent in China, but with the Baidu Smart Speaker, not Amazon Alexa or Google Assistant.

Photo Credit: Allen Qu (渠成)


The crazy world of DuckDuckGo

DuckDuckGo After Privacy

Launched in September 2008, DuckDuckGo was hard to take seriously. The search engine was founded by Gabriel Weinberg, who had recently finished his Master’s at MIT and had failed to launch a new social networking start-up. There were no financial backers at first, and the search results were mostly stitched together from the APIs of other search engines. Why would people use it? The USP was privacy: your searches weren’t tracked or recorded, as they were on most other search engines.

Two years earlier, AOL had publicly shared a data file containing three months’ worth of search history for research purposes. It included 20 million search queries from 650,000 users. While searchers’ account details weren’t shown, each was given a random ID that allowed researchers to group searches by individual user. Some searches included PII (personally identifiable information), and the identities of some users were revealed. Search privacy stopped being a paranoid techie issue and entered the mainstream consciousness.

Over the next few years, DuckDuckGo started to attract a cult following of privacy-conscious users. Google and Facebook were beginning to show their cards as collectors and sellers of personal data. In 2011, Union Square Ventures invested in DDG, “Not because we thought it would beat Google. We invested in it because there is a need for a private search engine. We did it for the Internet anarchists, people that hang out on Reddit and Hacker News”. By 2012, the search engine announced that it was serving 1.5 million searches a day and had made $115,000 from privacy-friendly advertising.

A year later, The Guardian and The Washington Post published an exposé of an NSA operation called PRISM. PowerPoint slides leaked by Edward Snowden showed how big tech companies were handing over user data and search history to US intelligence. One slide stated that “98% of PRISM data is sourced from Yahoo, Google, and Microsoft”. DuckDuckGo hit 4 million searches a day that year.[8]

There was enough demand that the Firefox and Safari browsers added DuckDuckGo as a built-in search engine option in 2014, something Google only followed suit on in Chrome in 2019.

DuckDuckGo now answers 1.8 billion searches a month and has a US market share of 1.24% (Yahoo has 3.65%).[9]

DuckDuckGo Logo

Mobile Search overtakes Desktop

Mobile Search

People were able to search the web on their mobile devices before Google was even invented, and the search giant itself launched a “WAP”-based site for mobile users in 2000. WAP phones gave way to “feature phones” and devices such as the BlackBerry, with pre-installed apps for searching and awkward navigation using a physical keypad, or a “trackball” if you were lucky. Each device maker and telecoms company had its own deals for pre-installing a search engine on its phones, in exchange for a revenue share or contract deal.

Smartphones were the catalyst for a truly unrestricted search and browsing experience, putting a small computer in everyone’s pocket. But they also brought about the collapse of a diverse phone and software market, where dozens of hardware and software manufacturers had competed. The once-powerful Nokia and Motorola lost their edge and were eventually acquired by Microsoft and Google respectively. Microsoft couldn’t get a foothold in the phone market, losing out on hardware (Nokia), software (Windows Mobile) and search (Bing) revenue.

This shifted control over which search engine people used even further in Google’s favour. Apple (iPhone) and Google (Android) were the kings of mobile. Most Android smartphones came with Google Chrome pre-installed, with Google as the default search engine. To make matters worse for Bing and Yahoo, Google signed a deal with Apple to make its search engine the default in the iPhone’s Safari browser as well, reportedly paying $9 billion a year for the privilege.[10]

In 2015, Google announced that, for the first time, searches on mobile devices had outnumbered those on desktops in 10 countries, including the US and Japan.[11] This trend continues today, with mobiles and tablets becoming the primary search devices in people’s homes.

Voice Search

Voice search is the next phase of search, allowing users to ask their questions to a “smart device” instead of typing keywords into a web browser. At the moment, there’s more hype than substance to voice search. It suits searches that have only one answer, such as “What is the capital of France?”, but can’t compete with a browser when the searcher expects many results or detailed information. It’s a novelty for settling arguments, playing music, and saving a glance at a watch or a switch to The Weather Channel.

Microsoft must hope that voice search remains a fad, with its Cortana voice assistant struggling to venture outside of the Windows desktop app. Windows has less than a 1% market share on mobile devices in most countries, and the few underdog smart speakers that used Microsoft Cortana are struggling to sell or moving to a rival system.

Amazon is the surprise winner of the smart speaker battle, with a 28% global market share in 2019/2020 and 53% in the US. This beats Google’s 24.9% global market share and 30.9% in the US.[12] While Amazon does own a search engine called A9 (run by former WebCrawler and AltaVista execs), the technology focusses on product and enterprise search. Instead, Amazon Alexa’s web searches are powered by Bing, giving Microsoft the upper hand over Google for the first time. Sometimes it pays to be second best, especially when a trillion-dollar company is an arch-rival of the top search engine.

The problem with voice search is that, no matter how realistic synthetic voices sound, nobody wants to hear a computer read out the web pages of the top ten search results. Voice search will remain a mostly one-way conversation, used to answer simple questions, carry out tasks and sometimes complete basic transactions. They say that a picture is worth a thousand words, yet Alexa and Google would take several minutes to say those words, compared to the seconds needed to skim-read a web page and find the information required.

The Future of Search

Machine-learning-driven chatbots and answering services are likely to play a more significant part in everyone’s lives in the future. Financial institutions are already using technology to replace human staff for factual, transactional or numerical tasks (checking balances, opening accounts, overdraft increases, credit checks and chargebacks).

Search engines are starting to dip their toes into these waters as well, with intelligent flight and credit card comparison services served directly on the search results page. It’s only a matter of time before search engines take over the entire transaction process from the SERP (Search Engine Results Page). Users will be saved the hassle of navigating to and ordering from a third-party website. The products, images, content, shopping cart and checkout can all be served on the SERP. Search engines would get a cut of the overall transaction, instead of just a dollar for the click, and users may feel safer with their money stored and managed in a “Google Wallet”.

Businesses will simply be thankful that the search engines haven’t wiped them out completely (yet).

Google Glass - Wearable Tech

Glasses & Wearable Tech

Google Glass was a massive flop, largely due to marketing and PR. Google forgot that it was creating a fashion and lifestyle accessory that people would wear on their faces, not just a clever piece of tech. Fashion is driven by influencers that people aspire to be. Nobody aspires to be the tech blogger Robert Scoble or that guy who still lives in his parents’ basement. By distributing the preview devices to the Silicon Valley keyboard warriors who would ordinarily review Google’s software products, the company had an image problem from the outset. Overweight, creepy Robocop impersonators aren’t going to sell the device as well as Beyoncé or David Beckham would. Don’t worry, though; you’ll know if they’re photographing you, as they have to wink to take a picture. Wink wink, mmm… RAM.

But the wearable tech world is still booming, and smart glasses aren’t dead yet. Google Glass is finding a new market in the B2B space, helping warehouse workers locate products and doctors look up patient data. New brands such as North and Lance are popping up, and they would know how to get an Instagram celeb on board. Amazon is building its own smart glasses prototypes, with Jeff Bezos’ black book of Hollywood A-listers to support them.

Smart glasses or contact lenses are the sensible next step for search, being hands-free but also visual. More tasks become possible, and more visual searches can be answered. They could replace your desktop screen a lot better than a novelty radio alarm clock called Alexa.

Will anyone overtake Google?

With Google answering 40,000 searches a second,[13] its competitors have a steep hill to climb.
