Random Access to Grammar-Compressed Strings and Trees

Friday, March 25, 2011 - 11:00am - 12:00pm EDT

  • Location:Dibner Building, LC400
  • Contact:Torsten Suel


Rajeev Raman
University of Leicester, UK


Modern applications involving processing textual and semi-structured data need to do fairly complex computations on data held in the main memory of a  computer, which is a limited resource in many contexts. While data compression can reduce the size of data, it presents significant obstacles to operating on the compressed data, and new data structuring techniques need to be developed to overcome these obstacles. After an introduction to the issues, we will focus on one particular recent result.

Let S be a string of length N compressed into a context-free grammar S of size n. We present a representation of S achieving O(log N) random access time, and O(n) construction time and space on the RAM. We are able to generalize our results to navigation and other operations on grammar-compressed trees. To  achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.

About the Speaker

After defending his PhD thesis in October 1991 at the University of Rochester, Rejeev Raman took up a Postdoctoral Fellowship in the Algorithms and Complexity Group at the Max-Planck-Institut fuer Informatik, which is headed by Kurt Mehlhorn. In January 1993 he joined the University of Maryland Institute for Advanced Computer Studies as a Research Associate working with Uzi Vishkin. Crossing the Atlantic yet again, he joined the Algorithm Design Group at King's College London in 1994. He has been at Leicester since January 2001.