
One on One with Daniel Tunkelang of Endeca


Daniel Tunkelang is the Chief Scientist and a co-founder of enterprise search vendor Endeca. He is an advocate of dialog-oriented approaches to information retrieval, and has organized annual workshops on Human Computer Information Retrieval (HCIR) in collaboration with researchers at MIT, IBM and Microsoft. Tunkelang also publishes The Noisy Channel. We asked him about his company, the state of enterprise search and his thoughts on some enterprise search trends:

FCM: What is the differentiator for Endeca search?

DT: In web search, we expect to type a query and get an excellent response in the top handful of results. In enterprise search, that is rarely our experience. There are a variety of reasons for this breakdown, but the consequence is that we must get away from the paradigm of the search engine as a mind reader, and instead promote bi-directional communication so that users can effectively articulate their information needs and the system can satisfy them. The approach is known as human computer information retrieval (HCIR).

Endeca combines a set-oriented retrieval approach with user interaction to create an interactive dialogue, offering next steps or refinements to help guide users to the results most relevant for their unique needs. An Endeca-powered application responds to a query with not just relevant results, but with an overview of the user's current context and an organized set of options for incremental exploration.

We often use a concierge analogy to help illustrate the difference between Endeca's solution and conventional search. What happens when you ask a hotel concierge for a restaurant recommendation? Rather than suggesting one place or handing you a list of all the restaurants in the area, the concierge asks you follow-up questions: "What kind of cuisine do you like? Do you want one in walking distance? Is this a special occasion? What kind of atmosphere are you looking for?" This process helps you better understand your options while helping the concierge better understand what you are looking for. As a result, the concierge can give you an answer that meets your unique needs and preferences.
  
This bi-directional communication between the user and the system addresses the inherent limitations of today's best-match approaches to enterprise search.
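
To make the concierge analogy concrete, here is a minimal Python sketch of set-oriented retrieval with refinements. It is an illustration only, not Endeca's implementation or API: the restaurant records, the field names, and the functions `refine`, `refinements` and `respond` are all hypothetical. Each response carries the user's current context, the matching set of results, and the follow-up questions that would still narrow that set.

```python
from collections import Counter

# Hypothetical sample records, invented for illustration.
RESTAURANTS = [
    {"name": "Trattoria Roma", "cuisine": "Italian", "distance": "walking", "atmosphere": "romantic"},
    {"name": "Sushi Kaze", "cuisine": "Japanese", "distance": "walking", "atmosphere": "casual"},
    {"name": "Pasta Presto", "cuisine": "Italian", "distance": "driving", "atmosphere": "casual"},
    {"name": "Le Jardin", "cuisine": "French", "distance": "driving", "atmosphere": "romantic"},
]


def refine(records, selections):
    """Return the set of records matching every selected field value."""
    return [r for r in records
            if all(r.get(field) == value for field, value in selections.items())]


def refinements(records, selections):
    """Count the values of each not-yet-selected field across the records."""
    options = {}
    for record in records:
        for field, value in record.items():
            if field == "name" or field in selections:
                continue
            options.setdefault(field, Counter())[value] += 1
    # Offer only fields that still split the result set into more than one group.
    return {field: dict(counts) for field, counts in options.items() if len(counts) > 1}


def respond(records, selections):
    """Answer with the current context, the results, and possible next steps."""
    matched = refine(records, selections)
    return {
        "context": dict(selections),
        "results": [r["name"] for r in matched],
        "next_steps": refinements(matched, selections),
    }


first = respond(RESTAURANTS, {"cuisine": "Italian"})
second = respond(RESTAURANTS, {"cuisine": "Italian", "distance": "walking"})
```

In this sketch, choosing Italian cuisine leaves two restaurants and two open questions (distance and atmosphere). Adding "walking distance" leaves one restaurant and no further questions, which is the point at which the concierge simply gives an answer.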
 
FCM: Many enterprise search users want a Google experience. Why do you think this is, and how can you battle that?

DT: On one hand, Google gets resounding reviews for web search. On the other hand, it gets, at best, mixed reviews in the enterprise--even within Google itself! How can the "Google experience" be good for the web and, yet, bad for the enterprise?

The answer is multi-faceted. In the enterprise, we lack the redundant and highly social structure of the web that is critical for PageRank and related approaches to succeed. We also have more sophisticated information needs. Specifically, we tend to ask the kinds of informational queries that web search serves poorly--the exception being when there is a Wikipedia page that addresses our particular need. Finally, web search benefits from the fact that the most popular web sites are portals or destinations, designed to help a user shop, research specialized information, communicate with other people, etc. When a web search takes a user to a page on such a site, the site takes on the responsibility for contextualizing the user's experience.

In contrast, enterprise content often consists of a heterogeneous collection of content that has a sparse link structure and whose organization is, at best, implicit in its physical and logical arrangement. Departments within an enterprise may build user-centered portals, but it's rare to see the sort of symbiosis that occurs between web search engines and the sites they index.

So a "Google experience" in the enterprise is a misleading aspiration, since even Google is unable to transfer its success on the web to a much more demanding environment. Instead, the enterprise calls for the HCIR approach that is the foundation of Endeca's offering.

FCM: Carl Frappaolo of AIIM has said that what makes enterprise search so difficult is what he calls the "digital landfill" of information, data that is spread out across repositories in the enterprise. How does Endeca search get at the information that is locked away in a variety of repositories?

DT: Carl is right that enterprise users expect information to be consolidated and made available through a single interface. Endeca has always provided connectors to standard enterprise repositories, as well as an extensible framework to connect to custom repositories. More importantly, Endeca's flexible data model accommodates complex schemas without extensive modeling, allowing each record and document to maintain its own unique structure, similar to XML. This flexibility is essential for accommodating the heterogeneity of enterprise content without reducing it to a lowest common denominator of unstructured text.

Another consideration is that, while enterprises may seek out generic enterprise search solutions, what they often need are search applications that solve specific business problems. The flexibility of Endeca's APIs and tools makes it easy to build such applications on top of our information access platform.

FCM: What are some of the areas, in your view, that need improvement in enterprise search?

DT: Many people have raised the prospect of social search in the enterprise--specifically, the idea that people will tag content within the enterprise and benefit from each other's tagging. The reality of social search, however, has not lived up to the vision.

In order for social search to succeed, enterprise workers need to supply their proprietary knowledge in a process that is not only as painless as possible, but also demonstrates the return on investment. We believe that our work at Endeca on bootstrapping knowledge bases can help bring about effective social search in the enterprise.

The other major area that comes to mind is federation. As much as an enterprise may value its internal content, much of the content that its workers need resides outside the enterprise. An effective enterprise search tool needs to facilitate users' access to all of these content sources while preserving the value and context of each.
 
FCM: What impact will semantic search have on enterprise search, and what are you exploring in that area?

DT: Semantic search means different things to different people, but broadly falls into two categories: using linguistic and statistical approaches to derive meaning from unstructured text, and using semantic web approaches to represent meaning in content and query structure. Endeca embraces both of these aspects of semantic search.

From early on, we have developed an extensible framework for enriching content through linguistic and statistical information extraction. We have developed some groundbreaking tools ourselves, but have achieved even better results by combining other vendors' document analysis tools with our unique ability to improve their results through corpus analysis.

The growing prevalence of structured data (e.g., RDF) with well-formed ontologies (e.g., OWL) is very valuable to Endeca, since our flexible data model is ideal for incorporating heterogeneous, semi-structured content. We have done this in major applications for the financial industry, media/publishing, and the federal government.

It is also important to remember that semantic search is not just about the data. In the popular conception of semantic search, the computer is wholly responsible for deriving meaning from the unstructured input. Endeca's philosophy, as per the HCIR vision, is that humans determine meaning, and that our job is to give them clues using all of the structure we can provide.
