Search – it started earlier than you think.

A very brief history of search

In this post Martin White describes the history of search. It began earlier than you think…

Intranet Focus provides information management and intranet management consulting services. They also regularly publish a Research Note packed with great stuff.

In the November issue of their Research Note, there is an interesting piece on the history of Search. Martin White, the Managing Director, has granted me permission to publish it here (see below).

By the way – Martin has recently published a book, Enterprise Search: Enhancing Business Performance.

It’s certainly on my Christmas list this year…

A very brief history of search

Search came into prominence with the advent of web search services in the 1990s, notably AltaVista, Google, Microsoft and Yahoo. However, the history of search technology goes back much further than this. Arguably the story starts with Douglas Engelbart, a remarkable electrical engineer whose main claim to fame is that he invented the mouse, now a standard control device for personal computers. In 1959 Engelbart started the Augmented Human Intellect program at the Stanford Research Institute (SRI) in Menlo Park, California. One of his research students was Charles Bourne, who worked on whether the batch search retrieval technology developed in the 1950s could be transformed into a service, based on a large mainframe computer, which users could connect to over a network.

By 1963 SRI was able to demonstrate the first ‘online’ information retrieval service, using a cathode ray tube (CRT) device to interact with the computer. It is worth remembering that the computers being used for this service had 64K of core memory. Even at this early stage of development, the facility to cope with spelling variants was implemented in the software. Other pioneers included System Development Corporation, Massachusetts Institute of Technology and Lockheed. The main focus of these online systems was to provide researchers with access to large files of abstracts of scientific literature, to support research into space technology and other large-scale scientific and engineering projects.

These services could only search short text documents, such as abstracts of scientific papers. In the late 1960s two new areas of opportunity arose which prompted work on how to search the full text of documents. One was to support the work of lawyers, who needed to search through case reports to find precedents. The second was also connected to the legal profession, and arose from the US Department of Justice deciding to break up what it regarded as monopolies in the computer industry (targeting IBM) and later the telecommunications industry, where AT&T was the target. These actions prompted IBM in particular to make a massive investment in full-text search, which by 1969 had produced STAIRS (Storage and Information Retrieval System), released in 1973 as a commercial IBM application. This was the first enterprise search application, and it remained in the IBM product catalogue until the mid-1990s.

One of the core approaches to information retrieval is the vector space model for computing relevance, developed by Professor Gerard Salton of Cornell University over a period of two decades starting in 1963. The vector space model represents both documents and queries as term vectors and uses the cosine of the angle between them to measure how similar a document is to the query. This is the basis for most enterprise search applications, with the notable exceptions of Recommind (which uses Probabilistic Latent Semantic Indexing) and Autonomy.
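
To make the cosine measure concrete, here is a minimal sketch in Python – not Salton’s SMART system, just plain term-frequency vectors with no weighting or stemming – and the documents and query are invented purely for illustration:

```python
import math
from collections import Counter

def cosine_similarity(doc_tokens, query_tokens):
    """Cosine of the angle between simple term-frequency vectors."""
    doc_tf = Counter(doc_tokens)
    query_tf = Counter(query_tokens)
    shared_terms = set(doc_tf) & set(query_tf)
    dot = sum(doc_tf[t] * query_tf[t] for t in shared_terms)
    doc_norm = math.sqrt(sum(v * v for v in doc_tf.values()))
    query_norm = math.sqrt(sum(v * v for v in query_tf.values()))
    return dot / (doc_norm * query_norm) if doc_norm and query_norm else 0.0

# Toy documents and query, invented for illustration only.
docs = {
    "doc1": "information retrieval systems rank documents by relevance".split(),
    "doc2": "the mouse is a standard control device for computers".split(),
}
query = "relevance ranking in information retrieval".split()

for name, tokens in docs.items():
    print(name, round(cosine_similarity(tokens, query), 3))
```

Real systems add term weighting (for example tf-idf) and vocabulary normalisation on top, but the similarity calculation itself is exactly this cosine.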

In 1984 Dr. Martin Porter, at the University of Cambridge, wrote Muscat for the Cambridge University MUSeum CATaloguing project. Over the ensuing decade this software was arguably the first to use probability theory in natural language querying, focusing on the relative value of a word, either in the search expression or in the document being indexed. Identifying links and correlations between significant words that co-occur across the whole document collection creates a probabilistic model of concepts. The use of a probabilistic approach to determining relevance dates back to research undertaken at the RAND Corporation in the late 1950s, and by the late 1980s there was a substantial amount of research into the use of Bayesian probability models for information retrieval.
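
As a rough illustration of the general idea – this is the classic Robertson/Sparck Jones-style term weight with no relevance information, not Muscat’s actual algorithm – a term’s evidential value can be estimated from how rare it is across the collection; the collection statistics below are hypothetical:

```python
import math

def probabilistic_term_weight(docs_containing_term, docs_total):
    """Robertson/Sparck Jones-style weight without relevance feedback:
    rare terms get high positive weights, very common terms go negative."""
    return math.log(
        (docs_total - docs_containing_term + 0.5)
        / (docs_containing_term + 0.5)
    )

# Hypothetical document frequencies in a collection of 10,000 documents.
N = 10_000
for term, df in [("the", 9_500), ("search", 1_200), ("bayesian", 40)]:
    print(term, round(probabilistic_term_weight(df, N), 2))
```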

The history of Autonomy dates back to the formation in 1991 of Cambridge Neurodynamics by Dr. Mike Lynch. Cambridge Neurodynamics used neural network and pattern recognition approaches for fingerprint recognition. In 1996 Dr. Lynch founded Autonomy together with Richard Gaunt, with $15 million in funding from investors including Apax Venture Capital, Durlacher and the English National Investment Company (ENIC). The novel step was not just the use of Bayesian statistics but the combination of these statistical approaches with the non-linear adaptive signal processing (used by Cambridge Neurodynamics for analysing fingerprint images) applied to text. For that time, the level of investment in a company with no commercial track record was quite remarkable. In 1998 the company was floated on EASDAQ, which capitalised the company at around $150 million, and its shares rose quickly from $15 in October 1999 to $120 in March 2000. This valued the company at over $5 billion.

The company was floated on the London Stock Exchange in 2000 and became the only publicly quoted search company in the world. This was important for procurements in both the corporate and public sectors, given that all the other search companies remained privately held and did not disclose earnings and profits other than under a non-disclosure agreement with a prospective customer.

Latent Semantic Indexing dates from the late 1980s, and Probabilistic Latent Semantic Indexing from the late 1990s. Among other things, both address the problems of different words having the same meaning (synonymy) and the same word having different meanings (polysemy).
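
A minimal sketch of the core step behind LSI – a truncated singular value decomposition of a term-document matrix, here applied to a made-up toy matrix: “car” and “automobile” never appear together in any document, yet they land close together in the reduced “concept” space because both co-occur with “engine”.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); the counts
# are invented purely for illustration.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [2, 0, 0, 0],   # car        - appears only in doc 0
    [0, 2, 0, 0],   # automobile - appears only in doc 1
    [1, 1, 0, 0],   # engine     - appears in docs 0 and 1
    [0, 0, 2, 1],   # flower
    [0, 0, 1, 2],   # petal
], dtype=float)

# Truncated SVD: keep k latent "concept" dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_concepts = U[:, :k] * s[:k]   # each term as a k-dimensional concept vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "car" and "automobile" never co-occur, yet both co-occur with "engine",
# so they end up close in concept space; "car" and "flower" do not.
print(terms[0], "vs", terms[1], round(cosine(term_concepts[0], term_concepts[1]), 2))
print(terms[0], "vs", terms[3], round(cosine(term_concepts[0], term_concepts[3]), 2))
```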

A big thanks to Martin for this information, and for bringing to my attention the names of Gerard Salton and Douglas Engelbart. I recommend that you click on the links below and read more about the fascinating work that these two have done.

I also highly recommend that you check out Intranet Focus’s site, and read some of the great stuff there.

Recommended Reading
  • Gerard Salton (Wikipedia)
  • Douglas Engelbart Institute (website)
  • Intranet Focus (website)
  • Martin White (Google hits)
  • Enterprise Search: Enhancing Business Performance.

Running a perfect Enterprise Search project

Search experts discussing an enterprise search project

How should a perfect Enterprise Search project be run?

In this post, based on a LinkedIn discussion, I describe a meeting of some of the key players in Search, who get into a great discussion about how to run an Enterprise Search project…

Starring

  • Charlie Hull
  • Martin White
    (Author of Enterprise Search: Enhancing Business Performance)
  • Ken Stolz
  • Otis Gospodnetić
  • Jan Høydahl
  • Stephanus van Schalkwyk
  • Helge Legernes
  • Gaston Gonzalez
  • Mike Green

A conversation about Running a perfect Enterprise Search project

It was Friday evening, and Charlie was meeting his friends for a drink. They all worked in IT and had, between them, years of experience, especially in enterprises and enterprise search, and liked to get together to catch up with what each was doing.

After a few pints and small talk, Charlie said “Guys, what do you all reckon would be the best way to build a large-scale enterprise search project?”

Martin, who had a lot of experience in this area, looked up and said, “The main thing is that you should never underestimate what is required to get the best from a search investment.”

Charlie nodded in agreement. “But how can we help the client understand what sort of a commitment is needed?”

Ken suggested using an Agile/Scrum approach for the analysis of what the client needed as well as the development of the search UI.

“Hear, hear!” called out the others. Otis took the chance to follow that up with “You need someone who really understands what search is all about”. Martin glanced at him and nodded. Otis carried on. “Someone who cares about search metrics, and knows what changes need to be made to improve them.”

Jan chimed in, “I agree with you on some points. You’ve got to make sure that you include all the stakeholders, and also educate the customer. Get everyone in the same room, and start with the big picture, narrowing it down to what is actually required. And, yes, create demos of the search system using real data. It helps the customer understand the solution better.” “However,” he continued, “I’m still careful about forcing a Scrum approach on a customer that might be unfamiliar with it.”

Stephanus put down his glass. “I’ve just finished a Phase I implementation at a client. The critical thing is to make sure that you set the client’s expectations and get buy-in from their technical people, especially around security and surfacing of content. And I agree with Jan. There are still a lot of companies that don’t use Agile, or Scrum, at the moment.”

Sitting next to Stephanus was Helge. He began to speak. “There are a few important things. Make sure you’ve got Ambassadors – people who really care about and promote the project. And ask the important question – ‘How can the search solution support the business so that they can become more competitive?’ It might be necessary to tackle this department by department. Get the business users and content owners together, but as Stephanus just said, don’t forget IT. And make sure that the governance of the system is considered.”

Stephanus smiled. “Yes – the workshop idea is a definite must.”

Gaston, who was sitting next to Charlie, said, “An Agile approach has worked for me in the past. Creating prototypes is important. Most clients don’t know what they want until they see something tangible.” “Ok,” said Charlie, “how has that worked?”

Gaston continued, “Build a small team consisting of a UI designer, a developer, a search engineer, someone from the IA team, and no more than two of the business users. Having someone there from QA is also handy. Start with a couple of day-long workshops to go over project objectives, scoping and requirements gathering. Use one-week sprints, and aim to produce workable prototypes. At the end of each week, schedule a time when the prototype can be demoed. The point is to get feedback about what is working, and what the goal for the next sprint should be.”

Mike, the last one in the group, looked around at everyone, and then back at Charlie, and said, “Charlie – there’s a lot of great advice here. One important thing to remember is that you have to work with the client to ensure that the search solution is part of their overall strategy. As the others have already mentioned, work with the client and educate them. Getting all the stakeholders together for some common education, collaboration and planning can go a long way towards getting the buy-in and commitment needed for a successful project. It is also great for setting expectations and making sure everyone is on the same page.”

Charlie was impressed. He had some pretty smart friends. “Thanks, guys. You’ve all made some excellent points. Let me buy you all another round.”

Key Takeaways – Running an Enterprise Search project

  • Don’t underestimate what is required to get the best from a search investment.
  • Lead the users through the process gently. Use demonstrations and an Agile approach when trying to understand their real requirements, and do the same for the development of the search UI.
  • Have at least one person who really understands search, and search metrics.
  • Ensure that you have buy-in from the departments involved, and especially IT.
  • Produce workable prototypes – these help the users understand what they are getting.
  • Ensure that everyone involved is on the same journey – include educating the users.

 

Martin White’s book

Martin White (who was involved in this discussion) has written a book – Enterprise Search: Enhancing Business Performance. You can check it out on Amazon here.

 

Interesting Resources

  • Why All Search Projects Fail by Martin White (CMS Wire)
  • Designing the Search Experience: The Information Architecture of Discovery by Tony Russell-Rose
  • How to Evaluate Enterprise Search Options by James A. Martin (CIO.com)
  • Developing an enterprise search strategy by Martin White (Intranet Focus)

Disclosure – some of these links are affiliate links.