The Evolution Of Enterprise Search
There is a popular misconception that search emerged in conjunction with the internet. In fact, enterprise search dates as far back as the 1960s, when IBM developed an early internal search engine. What the advent of the internet and Google did bring about was a new set of expectations surrounding enterprise search. Google set the standard, and enterprise search was expected to follow suit.
But, it didn’t.
Ironically, Google, as one of the leaders in enterprise search, is partly to blame for this failure. Google’s enterprise search offering, the Google Search Appliance (GSA), was designed to bring the power of Google search to the enterprise. The basic premise of the system was correct: enterprise content storage is, and always will be, heterogeneous.
So, what Google and other enterprise search leaders did was design a system that integrated with a range of enterprise systems and content repositories like SharePoint, wikis, intranets, portals, content management systems, and other business applications.
Users then search their enterprise content with the GSA just as they would run a search on google.com. On the surface this approach should work, but it ultimately fails to solve the enterprise search problem.
The reason is that enterprise search is a totally different ball game from internet search. The GSA and other enterprise search solutions are based on web fundamentals, an area in which Google quite obviously excels.
However, web search relies heavily on PageRank, which analyzes the links between web pages and gives more weight to pages that have many links pointing to them. Enterprise content is a different game entirely. Documents and slides generally have no notion of links to one another: a particular Office document doesn’t programmatically reference another document, or even other versions of itself.
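To make the contrast concrete, here is a minimal sketch of link-based ranking in Python. It is a simplified power-iteration version of the PageRank idea, not Google’s production algorithm; the function name, the damping factor of 0.85, the iteration count, and the toy page and file names are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict {item: [items it links to]}."""
    # Collect every item that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all items.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this item's rank along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No outgoing links: spread the rank evenly over everything.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web site: pages link to one another.
web = {"home": ["docs", "blog"], "docs": ["home"], "blog": ["home", "docs"]}

# Hypothetical file share: documents do not link to one another.
files = {"report.docx": [], "slides.pptx": [], "budget.xlsx": []}

web_rank = pagerank(web)
file_rank = pagerank(files)
```

In the web example the links separate the pages, with "home" ranked highest. In the file example there are no links to work with, so every document ends up with the same score and the ranking carries no information.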
Rule-based systems have typically been built along the lines of: if X then do Y, else if P then do Q, and so on. These types of rules are extremely easy to understand and easy to code. But things quickly get out of hand: when a system goes into operation, one starts with 100 scenarios and 100 rules to handle them. As time goes by, we encounter more and more exceptions and keep adding rules to keep those exceptions under control.
Other enterprise search vendors use a term frequency/inverse document frequency (tf-idf) model. tf-idf is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. The trouble with tf-idf is that it is driven purely by word counts, so it can be skewed by frequently occurring words, and it does not consider context. tf-idf can only rank documents at the lexical level.
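A small sketch shows what ranking at the lexical level means in practice. It assumes one common textbook variant of tf-idf (term count divided by document length, multiplied by the natural log of N over document frequency, with no smoothing); real search products may use different formulas. The function name and the sample sentences are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical corpus: "bank" means two different things here.
docs = [
    "the bank approved the loan",
    "the river bank flooded",
    "the quarterly loan report",
]
scores = tf_idf(docs)
bank_in_finance_doc = scores[0]["bank"]
bank_in_river_doc = scores[1]["bank"]
```

Both documents get a positive weight for "bank", and a query for that word would match both, even though one is about finance and the other about a river. The statistic only sees the string, not the meaning behind it.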
For enterprise search to be as effective and relevant as Google’s web search, it must consider a range of inputs and sources. Enterprises today need an approach to surfacing relevant content that leverages metadata, user activity, and content analysis to rank search results. The solution must then provide the user with a visual preview option to avoid opening and closing multiple documents.
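As a rough illustration of combining several signals, the sketch below ranks results by a weighted sum of a content score, a metadata score, and a user-activity score. Everything in it is hypothetical: the class and field names, the 0 to 1 score ranges, the weights, and the sample documents are assumptions for the example, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    content_score: float   # assumed 0..1, e.g. text relevance to the query
    metadata_score: float  # assumed 0..1, e.g. match on author, type, recency
    activity_score: float  # assumed 0..1, e.g. recent views or edits by colleagues


# Hypothetical weights; a real system would tune these.
WEIGHTS = {"content": 0.5, "metadata": 0.2, "activity": 0.3}


def combined_score(candidate):
    """Weighted sum of the three signals."""
    return (
        WEIGHTS["content"] * candidate.content_score
        + WEIGHTS["metadata"] * candidate.metadata_score
        + WEIGHTS["activity"] * candidate.activity_score
    )


def rank(candidates):
    """Return candidates ordered from highest to lowest combined score."""
    return sorted(candidates, key=combined_score, reverse=True)


results = rank([
    Candidate("Old pricing deck", content_score=0.9, metadata_score=0.2, activity_score=0.1),
    Candidate("Current pricing deck", content_score=0.7, metadata_score=0.8, activity_score=0.9),
])
top_title = results[0].title
```

Here the older deck matches the query text slightly better, but the current deck ranks first once metadata and user activity are taken into account.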