Index Sharding and Query Routing in Distributed Search Engines

NSF III-1718680: Index Sharding and Query Routing in Distributed Search Engines

This page describes research conducted at the NYU Tandon School of Engineering by Prof. Torsten Suel and his research group, funded by the National Science Foundation under grant III-1718680: Index Sharding and Query Routing in Distributed Search Engines, and by a grant from Amazon. We first give a high-level description of the problems addressed and the studied approaches, written for the non-expert. Then we describe our work in more detail, and list project participants, research publications and manuscripts, and available software and data sets.

1. Introduction and Motivation
2. Our Research Efforts
2.1. Faster Single-Node Query Processing
2.2. Distributed Search Methods
2.3. Load Balancing and Query Routing in Distributed Search Engines
2.4. Software Infrastructure and Reproducibility
2.5. Other Topics
3. Project Participants
4. Project Publications and Manuscripts
5. Software and Datasets

1. Introduction and Motivation

This project focuses on the efficiency of large distributed search engines. By this we mean search systems, such as Google Search, Bing and others, where the document collection is partitioned into subsets called shards that are assigned and replicated across a large number of machines in a data center. Queries in such systems are processed by (1) routing the query to machines holding the needed shard replicas, (2) processing the query locally in the machines, and (3) aggregating the results supplied by the different machines into a Search Engine Result Page (SERP) returned to the user.

When optimizing the efficiency of a search architecture, we need to balance two objectives, latency and throughput, while of course returning high quality results. This is often modeled via a Service Level Agreement (SLA) that may, e.g., specify that 90% of all queries much be answered in 100ms, and 99% in 250ms. (More complex SLAs may also specify degradations in search quality that users are willing to accept.) Thus, the goal is to maximize the number of queries per second (the throughput) that can be processed with existing hardware and energy resources, while satisfying the SLA.

Overall, the query processing efficiency of a search engine mainly depends on the following three aspects:

how efficiently queries can be processed on individual machines and shards (single-node query processing)
how documents are partitioned over shards, and how shards are selected for an incoming query (distributed search)
how queries are routed to shard replicas in a way that balances load among the machines while allowing for node failures and reducing tail latencies (load balancing and query routing)

Our research focuses on all three aspects, and on the interplay between them. For single-node query processing, we focus on index reordering methods, faster top-k query processing, the role of threshold estimation techniques, and sparse learned indexing methods. For distributed search, we study index sharding methods and selective search and index tiering approaches. For load balancing and query routing, we look at the performance of existing and new replica assignment and load balancing methods under different workloads, including selective search workloads, and at policies for reissuing queries.

In addition to publishing our research in academic venues, we also aim to create software prototypes that are useful to other researchers, and to foster reproducibility of research results. In particular, this project supports the development of a toolkit for research on indexing and query processing called PISA (Performant Indexes and Search for Academia) [1] in which we implement many of our proposed techniques. More details are provided below.

2. Our Research Efforts:

We now provide more details on our research efforts, organized into the three different thrusts discussed above. Our research involved four PhD students at NYU Tandon, Antonio Mallia, Juan Rodriguez, Michal Siedlaczek, and Qi Wang, a number of MS and undergraduate researchers, and outside collaborators at several institutions.

2.1. Faster Single-Node Query Processing:

This research thrust builds on previous research by the PI's research group on optimizing the efficiency of query processing on a single node and shard. In particular, we focus on the candidate generation problem, where the goal is to quickly find a few hundred to a few thousand documents that can then be reranked using more complex, often machine-learned, ranking functions. One part of our work looks at the role of document and index reorderings on query efficiency. In particular, work by PhD student Qi Wang in [2] proposed the first formal model for the efficiency of conjunctive queries on reordered indexes, and described a heuristic algorithm for reordering that speeds up conjunctive queries by about 10-20% over the best previously known ordering. In [3] we performed a replication study of the current state-of-the-art reordering method, the BP algorithm of Dhulipala et al. in KDD 2016. This study confirmed the results in the original work proposing the BP algorithm, and also resulted in a complete and highly optimized implementation of the BP algorithm in the PISA toolkit. (The original paper did not make any implementation available.)

Another focus are algorithms for safe disjunctive top-k search, where the goal is to retrieve the k documents with the highest score (according to a fairly simple ranking function) that contain at least one of the query terms. This problem is particularly relevant for the case of large k, in the hundreds or thousands, where it is commonly used to identify candidates for complex reranking. Our work in [4], lead by PhD students Antonio Mallia and Michal Siedlaczek, performed an extensive experimental comparison of existing algorithms under different compression mathods. It was published as a reproducibility paper at ECIR, with optimized implementations of these methods integrated into PISA. Subsequent work by Antonio and Michal proposed improved top-k algorithms that use conjunctions for faster disjunctive queries [5], and improved live-block based methods [6] that follow the approach of Dimopoulos et al. in SIGIR 2013.

Several of the above methods require, or benefit from, the use of threshold estimation techniques that estimate the score of the k-th highest-scoring result for an incoming query. In [7] we performed an experimental comparison of the four main approaches that have been proposed, and the two most promising have been integrated into PISA. In other work, PhD student Mallia proposed improvements to the VBMW algorithm for top-k query processing [8], while PhD student Juan Rodriguez explored the potential to trade off space versus speed in index pruning methods [9]. Faster algorithms for index decompression on Graphical Processing Units (GPUs) are described in [10].

A final focus in this thrust was on sparse neural indexing, where the goal is to learn sparse index structures that can approximate the result quality obtained by much more expensive deep neural rankers while preserving the efficiency of inverted indexes and similar sparse structures. Our work in [19] proposed a method called Deep Impact that learns an optimized sparse inverted index structure that basically approximates a given transformer-based ranker. Experiments show that the approach significantly outperforms traditional simple rankers such as BM25 and closely matches the retrieval quality of more complex approaches such as Colbert and Splade. Follow-up work in [24] showed how to archieve additional speedups using an approach called guided traversal.

2.2. Distributed Search Methods:

The second research focus attempts to improve methods for distributed search, with emphasis on selective search and on index tiering methods.

Selective search is an approach described by Kulkarni and Callan in TOIS 2015 that assigns documents to shards based on topical similarity. Each incoming query is then routed only to those shards deemed most relevant to the query. Selective search can lead to significant reductions in query costs, and also allows trading off quality and efficiency during peak times and under suitable SLAs. Work in [11] by PhD students Michal Siedlaczek and Juan Rodriguez showed how to improve the efficiency of selective search by exploiting global document orderings in addition to topical sharding. This allows for exhaustive processing of the most relevant shards, while doing only a limited access into marginally relevant shards. Work by MS student Saitong Zhao looked at alternative methods for creating good topical shards.

Index tiering is a widely used optimization in distributed search engines that divides documents by overall quality, into tiers of ``high quality'' and ``low quality'' documents. All queries are evaluated on the first tier of shards containing high-quality documents, but only some queries are evaluated on the second tier. Most work on index tiering assigns complete documents to tiers. Work by PhD student Juan Rodriguez evaluated a number of different tier assignment schemes for documents in [21].

2.3. Load Balancing and Query Routing in Distributed Engines:

The third research focus is on load balancing and query routing mechanisms in distributed search, where the goal is to maximize throughput while satisfying a given SLA. Our work thus far has looked at the following three issues. First, work by undergraduate researcher Mingyang Wang and PhD student Michal Siedlaczek has implemented a simulation of various load balancing policies in the DESMO-J discrete event simulator. Experiments also consider different ways of assigning shard replicas to machines, since a good assignment is important for load balance and robustness to failures.

Second, undergraduate researcher Sofiia Barysheva has studied diffusive methods for load balancing in search engines, inspired in particular by the approach for multi-commodity flow of Awerbuch and Leighton in FOCS 1993. Under such an approach, requests are routed to shard replicas with a simple local control algorithm, and under certain conditions the system will automatically converge to a new routing policy in the presence of machine failures. The challenges here are to prove convergence, and to characterize the situations when a new fully balanced policy is possible.

The third effort focuses on reissue policies for interactive services, including search. The idea underlying reissue policies is that user experience can be adversely impacted by a few requests, say to a particular service or shard replica, that take much longer than average. This problem can be addressed by reissuing such requests to a different replica of the service or shard, as proposed by Dean and Barroso in CACM 2013 (using the term ``hedged request''). However, reissuing requests too early or frequently can increase overall system load, while waiting too long is not that useful, and thus a good reissue policy has to balance these concerns.

Our work builds on previous work by Kaler et al. in SPAA 2017, which claimed that a simple class of policies that only reissues requests at a single point in time is at least as powerful as a larger class that has multiple reissue points. In work with PhD student Michal Siedlaczek and undergraduate researcher Jingxian Xu, we have shown that this is in fact not true in general, and that policies with multiple reissue points can significantly outperform the simple case in certain scenarios. Also, while finding an optimal policy is NP Complete, there are simple algorithms that can approximate the optimal policy reasonably well.

2.4. Software Infrastructure and Reproducibility:

One important goal in this project is to contribute software prototypes that can be used by other groups to replicate and extend our work. During the project, we have created and contributed to several open-source projects, in particular PISA, FXT, and CIFF.

PISA (Performant Indexing and Search for Academia) [1] is a C++-based toolkit focusing on indexing and query processing that was initiated by PhD student Antonio Mallia, with contributions from PhD student Michal Siedlaczek and Joel Mackenzie of RMIT University. PISA provides a simple platform for experimenting with new ideas in indexing, index compression, document reordering, and query processing, with focus on search-engine efficiency research. It provides implementations of many state-of-the-art index compression, index reordering, and query processing algorithms, and many of our codes for this project have been added into PISA. PISA has already been used by several other research groups. A set of Python wrappers called PyPISA is under development.

FXT [12] is a toolkit for feature extraction from large document collections, with focus on learning-to-rank applications, that is spearheaded by Shane Culpepper's group at RMIT University, with contributions from PhD students Antonio Mallia and Michal Siedlaczek. CIFF (Common Index File Format) [13] is a definition of a common format for inverted indexes that allows different search engine prototypes to exchange index data. This facilitates reproducibility as it alleviates the impact of different parsers and parsing settings on the resulting index. CIFF is a collaboration between six institutions, with Antonio and Michal contributing from NYU.

As discussed, many of our codes are available via the PISA toolkit. However, some other codes do not really fit into PISA or have not been integrated, and are distributed via separate repositories, mostly on github.

In addition, we have published several reproducibility studies as part of this project, in particular a comparison of many state-of-the-art index compression and top-k query processing algorithms [4], a reimplementation and evaluation of the BP index reordering algorithm [3], and a comparison of threshold estimation algorithms [7]. PISA was also submitted as an entry to the 2019 Open-Source IR Replicability Challenge (OSSIRC 2019).

2.5 Other Topics:

In addition, there have been other research activities under this project that do not clearly fit into the three major themes. Internships of PhD students Qi Wang and Michal Siedlaczek at Blippar Inc., a company working on augmented reality applications, resulted in a research collaboration on index compression and query processing for instance retrieval systems (systems for recognizing objects in images), published in [14][15]. Work started by REU researcher Michael Chen, and continued by MS student Yuanhao Han, proposed a new method for incremental positional index updates, while work by REU researcher Rohan Kandi explored language models for search queries. MS Thesis research by Lina Qiu proposed extensions to the open-source Luwak subscription search engine [16]. Research by undergraduate student Zishi Deng proposed and evaluated methods for sharding social network graphs [17].

3. Project Participants and Collaborations

Participants at NYU include the PI, Torsten Suel, four PhD students, Antonio Mallia, Juan Rodriguez, Michal Siedlaczek, and Qi Wang, MS students Yuanhao Han, Lina Qiu, Mengyang Sun, and Saitong Zhao, and undergraduate researchers Sofiia Barysheva, Michael Chen, Zishi Deng, Rohan Kandi, Kejian Shi, Jingxian Xu, Mingyang Wang, Yuhong Zhang, and Victor Zheng.

The project produced four PhD graduates, Antonio Mallia, Juan Rodriguez, Michal Siedlaczek, and Qi Wang, and two MS Thesis graduates, Lina Qiu and Mengyang Sun. Lina and Mengyang are now PhD students at BU and Tsinghua, while undergraduate researcher Mingyang Wang entered the PhD program at Duke. Two undergraduate researchers, Michael Chen and Rohan Kandi, received REU support through this project.

Various parts of the project also include outside collaborators from RMIT University and U. of Melbourne (Australia), Blippar Inc., U. of Waterloo (Canada), U. of Otago (New Zealand), and University of Pisa (Italy). Finally, additional funds for parts of the project were provided by a grant from Amazon.

4. Project Publications and Manuscripts

[1] PISA: Performant Indexes and Search for Academia. Antonio Mallia, Michał Siedlaczek, Joel Mackenzie, and Torsten Suel. Proceedings of the Open-Source IR Replicability Challenge (OSIRRC 2019), July 2019. PDF

[2] Document Reordering for Faster Intersection. Q. Wang and T. Suel. 45th International Conference on Very Large Data Bases, August 2019. PDF

[3] Compressing Inverted Indexes with Recursive Graph Bisection: A Reproducibility Study. J. MacKenzie, A. Mallia, M. Petri, S. Culpepper, and T. Suel. European Conference on Information Retrieval, April 2019. PDF

[4] An Experimental Study of Index Compression and DAAT Query Processing Methods. A. Mallia, S. Siedlaczek, and T. Suel. European Conference on Information Retrieval, April 2019. PDF

[5] Using Conjunctions for Faster Disjunctive Top-k Queries. S. Siedlaczek, A. Mallia, and T. Suel. 15th ACM International Conference on Web Search and Data Mining, March 2022. PDF

[6] Fast Disjunctive Candidate Generation Using Live Block Filtering. A. Mallia, S. Siedlaczek, and T. Suel. 14th ACM International Conference on Web Search and Data Mining, March 2021. PDF

[7] A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing. S. Siedlaczek, A. Mallia, T. Suel, and M. Sun. 29th International Conference on Information and Knowledge Engineering, November 2020. PDF

[8] Faster BlockMax WAND with Longer Skipping. A. Mallia and E. Porciani. European Conference on Information Retrieval, April 2019. PDF

[9] Exploring Size-Speed Trade-Offs in Static Index Pruning. J. Rodriguez and T. Suel. IEEE International Conference on Big Data, December 2018. PDF

[10] GPU-Accelerated Decoding of Integer Lists. Antonio Mallia, Michał Siedlaczek, Torsten Suel, and Mohamed Zahran. 28th International Conference on Information and Knowledge Engineering, November 2019. PDF

[11] Exploiting Global Impact Ordering for Higher Throughput in Selective Search. S. Siedlaczek, J. Rodriguez, and T. Suel. European Conference on Information Retrieval, April 2019. PDF

[12] Feature Extraction for Large-Scale Text Collections. L. Gallagher, A. Mallia, S. Culpepper, T. Suel, and B. Cambazoglu. 29th International Conference on Information and Knowledge Engineering, Resource Track, November 2020. PDF

[13] Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format. J. Lin, J. Mackenzie, C. Kamphuis, C. Macdonald, A. Mallia, M. Siedlaczek, A. Trotman, and A. de Vries. CoRR abs/2003.08276, March 2020. PDF

[14] Fast Bag-Of-Words Candidate Selection in Content-Based Instance Retrieval. M. Siedlaczek, Q. Wang, Y. Chen, and T. Suel. IEEE International Conference on Big Data, December 2018. PDF

[15] Forward Index Compression for Instance Retrieval in an Augmented Reality Application. Qi Wang, Michał Siedlaczek, Yen-Yu Chen, Michael Gormish, and Torsten Suel. IEEE International Conference on Big Data, December 2019. PDF

[16] Optimizing Subscription Processing in Luwak. L. Qiu. MS Thesis, NYU Tandon School of Engineering, May 2019. PDF

[17] Optimizing Iterative Algorithms for Graph Sharding. Z. Deng and T. Suel. IEEE International Conference on Big Data, December 2021. PDF

[18] Top-K Threshold Estimation in Information Retrieval. M. Sun. MS Thesis, NYU Tandon School of Engineering, May 2019. PDF

[19] Learning Passage Impacts for Inverted Indexes. A. Mallia, O. Khattab, T. Suel, and N. Tonellotto. 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2021. PDF

[20] Efficiency and Scalability of Large Search Architectures. M. Siedlaczek. Ph.D. Thesis, NYU Tandon School of Engineering, 2021. PDF

[21] Optimizing Search Engine Efficiency with Static Index Pruning and Tiering. J. Rodriguez. Ph.D. Thesis, NYU Tandon School of Engineering, 2021. PDF

[22] Optimizing Search Indexes Using Query Distributions. Q. Wang. Ph.D. Thesis, NYU Tandon School of Engineering, 2019. PDF

[23] Efficient Disjunctive Query Processing for Large-Scale Search Architectures. A. Mallia. Ph.D. Thesis, NYU Tandon School of Engineering, 2022. PDF

[24] Faster Learned Sparse Retrieval with Guided Traversal. A. Mallia, T. Suel, and N. Tonellotto. 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2022. PDF

5. Software and Datasets

We strive to make codes and data sets developed under this project available for easy access by other groups. Many of our algorithms are implemented as part of the PISA toolkit at https://github.com/pisa-engine, while some other codes are outside of PISA in other github repositories. Following are links to the major code bases that are currently available.

PISA engine:
Repo
Documentation
Docker

PISA entry at OSSIRC'19 [1]:
Docker

PISA implementation of BP algorithm in [3]:
Repo

PISA implementation of Taily threshold estimation used in [7] and [11]:
Repo

PISA implementation of longer skipping in VBMW [8]:
Repo

Implementation of reordering for faster intersection [2] (not in PISA) and of the BP implementation used as baseline in [2] (different from the one in PISA):
Repo
Repo for BP Code

Indexing and data analysis libraries used in [11] (not in PISA):
Indexing Repo
Rest of code Repo

GPU Decompression Methods [10]:
Repo

FXT Feature Extraction Toolkit [12]:
Repo

CIFF Common Index File Format [13]:
Repo

DeepImpact Sparse Learned Inddexing [19,24]:
Repo