Caching Search Engine Results over Incremental Indices
A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this talk that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naive approaches, such as flushing the entire cache upon every index update, perform poorly and, in fact, render caching futile when updates are frequent. Solving the invalidation problem efficiently amounts to accurately predicting which queries will produce different results if re-evaluated, given the actual changes to the index.
To this end, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over several workloads shows that selective invalidation of cached search results can serve results of near-perfect freshness with small or even negligible impact on the cache's hit rate. TTL, in contrast, must severely reduce the cache's performance in order to achieve the same freshness.
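To make the contrast concrete, the sketch below sets a TTL policy beside a selective one on a toy result cache. It is only an illustration under stated assumptions, not one of the predictors presented in the talk: the names ResultCache, invalidate_by_ttl and invalidate_by_predictor are hypothetical, and the predictor uses a made-up heuristic that invalidates a query when a changed document already appears in its cached results or contains all of the query's terms.

```python
from dataclasses import dataclass


@dataclass
class CachedEntry:
    results: list           # cached result document ids
    query_terms: frozenset  # lower-cased terms of the query
    cached_at: float        # time at which the results were computed


class ResultCache:
    """Toy result cache (hypothetical, for illustration only)."""

    def __init__(self):
        self.entries = {}

    def put(self, query, results, now):
        terms = frozenset(query.lower().split())
        self.entries[query] = CachedEntry(list(results), terms, now)

    def get(self, query):
        entry = self.entries.get(query)
        return None if entry is None else entry.results

    def _evict(self, queries):
        for query in queries:
            del self.entries[query]
        return queries

    def invalidate_by_ttl(self, now, ttl):
        # TTL baseline: age alone decides, whatever changed in the index.
        expired = [q for q, e in self.entries.items()
                   if now - e.cached_at >= ttl]
        return self._evict(expired)

    def invalidate_by_predictor(self, changed_docs):
        # Assumed heuristic: a cached query is predicted stale if a changed
        # document is already among its results, or contains every query
        # term and so could enter the result list.
        # changed_docs maps a document id to the set of terms it contains.
        predicted = []
        for query, entry in self.entries.items():
            for doc_id, terms in changed_docs.items():
                if doc_id in entry.results or entry.query_terms <= terms:
                    predicted.append(query)
                    break
        return self._evict(predicted)
```

With this sketch, an index update that touches one document evicts only the queries the heuristic flags and leaves the rest of the cache available for hits. The TTL policy evicts every sufficiently old entry whether or not its results changed.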
[Joint work with Roi Blanco, Eddie Bortnikov, Flavio Junqueira, Luca Telloli, Hugo Zaragoza, and Kolman Vornovitsky]
About the Speaker
Ronny Lempel joined Yahoo! Research in October 2007 as the director of Yahoo! Israel Labs, where he oversees R&D activities at the cutting edge of Web sciences. Prior to joining Yahoo! Labs, Ronny spent 4.5 years at IBM's Haifa Research Lab with the Information Retrieval Group, where his duties included research and development in the area of enterprise search systems. Prior to joining IBM, Ronny received his BSc, MSc and PhD from the Faculty of Computer Science at Technion, Israel Institute of Technology, in 1997, 1999 and 2003, respectively. Both his MSc and PhD focused on search engine technology. During his PhD studies, Ronny completed two summer internships at the AltaVista search engine.