# Summary of Research

My research interests mainly lie in the general area of sequential decision making under uncertainty, where an agent interacts sequentially with an unknown environment in order to maximize some specified reward. As the environment is unknown, the agent is faced with the dilemma between exploitation and exploration, i.e., acting optimally based on current knowledge of the environment or selecting actions which are most informative about the environment. The goal here is to design appropriate strategies for the agent which have low regret, i.e., sub-optimality w.r.t. a hypothetical agent which acts optimally with full knowledge of the environment.

I have worked on the design and analysis of information acquisition strategies for the following problems which lie within the general framework described above:

- Sequential Model Based Optimization
- Active Learning
- Adaptive Resource Allocation for Statistical Estimation

Besides these, I have also worked on some other problems such as phylogenetic reconstruction and control of parallel queues which are described in the “Other Results" section below.

### 1. Sequential Model Based Optimization (SMBO)

In SMBO, the goal of an agent is to design an adaptive strategy to query a noisy and expensive black-box function $f:\mathcal{X}=[0,1]^d\mapsto \mathbb{R}$, guided by a surrogate model to efficiently learn about its maximizer.
Most algorithms in this area use Gaussian Processes~(GPs) as surrogates, and are analyzed either in the fully Bayesian framework~(Bayesian Optimization) or in the frequentist setting~(Kernelized Bandits).
*In my research in this area, I have proposed new algorithms with the best-known theoretical guarantees and lower computational complexity and also derived algorithm-independent impossibility results characterizing the fundamental performance limits*.

Bayesian Optimization~(BO): In (Shekhar and Javidi, 2018)

^{1}, we proposed an algorithm which adaptively partitions the input space and zooms into the near-optimal regions. The main technical result underlying this adaptive partitioning was a bound on the supremum of a class of GPs in terms of the radius of their index sets. The proposed algorithm has lower comlexity and better regret bounds than existing baselines, and can also be extended to deal with*contextual bandits*.Kernelized Bandits: Here, $f$ is assumed to be a fixed but unknown element of the RKHS (Reproducing Kernel Hilbert Space) of a given kernel $K$. In (Shekhar and Javidi, 2020a)

^{2}, we proposed an algorithm which exploits the embedding of certain RKHS into Holder spaces, to augment the global GP surrogate with local estimators over an adaptively constructed partition to guide the querying strategy. The proposed algorithm again achieves improved, and in some cases even minimax near-optimal regret bounds and was also shown to outperform several baselines in empirical benchmarking tasks.**Incorporating Gradient Information in BO.**Most prior works in BO literature only use the*zeroth-order*information about the unknown function $f$. However, in practice, the agent often has access to the noisy gradient (i.e., first-order) information as well. In (Shekhar and Javidi, 2020b)^{3}, we*provided a precise characterization of the benefits of incorporating gradient information in BO*by- deriving information theoretic lower bound of $\Omega(\sqrt{2^d n})$ for any zeroth-order algorithm, and
- proposing a first-order algorithm which, under some conditions, achieves a $\mathcal{O}(d (\log n)^2 )$ regret.

### 2. Non-parametric Active Learning

The general ideas of using adaptive partitioning and local estimators introduced in our BO and kernelized bandits work is quite versatile, and it can also be utilized in several related problems with appropriate low-level technical modifications. In (Shekhar and Javidi, 2019) ^{4} and
(Shekhar, Ghavamzadeh and Javidi, 2020) ^{5}, we showed that these ideas are well suited to the problems of active non-parametric estimation and classification.

GP Level Set Estimation. In the level-set estimation problem~(LSE), given a budget $n$, the goal of an agent is to adaptively query a black-box function $f:\mathcal{X}\mapsto \mathbb{R}$, in order to accurately estimate its $\tau-$level set of $f$, i.e., the region of $\mathcal{X}$ where $f-$value is at least $\tau$. In (Shekhar and Javidi, 2019)

^{4}, we proposed an algorithm for the LSE problem, which combines the adaptive partitioning strategy along with a novel point-selection rule to*achieve tighter theoretical bound*than existing results.Active Classification with Abstention. Classification with abstention refers to the classification problems in which the learner can also abstain from declaring a label, i.e., a

*“don’t know”*option. It models several applications such as medical diagnostic systems, voice assistant devices and content filtering. In (Shekhar, Ghavamzadeh and Javidi, 2020)^{5}, we proposed the first active learning algorithm for this problem and**precisely characterized the benefits of active over passive learning for this problem**. We first obtained information-theoretic lower bound of $ \Omega(n^{-\beta(\alpha+1)/(2\beta + d + \alpha\beta)})$ for the excess risk of any passive algorithm~(here $\alpha$ and $\beta$ are the*margin*and*smoothness*parameters resp.). Then, we showed that the corresponding minmax rate in the active setting is $\Theta\left( n^{-\beta(1+\alpha)/(d+2\beta)}\right)$.

### 3. Adaptive Resource Allocation

In the next project, I worked on the problem of adaptive resource allocation among $K$ probability distributions to estimate all of them uniformly well. This is motivated by the problem of constructing a uniformly accurate model of the transition structure of an unknown Markov Decision Process (MDP), given a finite exploration budget. We studied this problem in two settings:

Learning probability distributions with bandit feedback. In (Shekhar, Javidi and Ghavamzadeh, 2020)

^{6}, we worked under the simplifying assumption that the agent can transition between any pair of the $K$ distributions and provided a general treatment for various distance measures such as*total-variation*,*KL-divergence*and*separation distace*. Our result greatly expands the scope of this problem, referred to as*active learning in bandits*, as the prior work was restricted to learning the means of $K$ distribution in squared-error sense. Additionally, we also proved the optimality of our proposed allocation scheme by deriving information-theoretic lower bounds.Estimating Transition Structure of MDPs. In (Tarbouriech et al, 2020)

^{7}, we relaxed the assumption of access to generative model, and proposed an adaptive non-stationary policy to estimate the transition structure of an unknown MDP~(i.e., $K= SA$) in $\ell_1$ distance. Since, here we cannot sample from arbitrary distributions (i.e, the $(s,a)$ pair), the key idea was to implement policies which spend an*appropriate*fraction of time in the different states.

### 4. Other Results

Besides my work on sequential learning and optimization described above, I have also worked on the following problems:

Sample Complexity of Species-Tree Estimation. In (Shekhar, Roch and Mirarab, 2018)

^{8}we showed that $\Theta (f^{-2} \log n)$ gene trees are both necessary and sufficient for efficiently reconstructing the underlying species tree for the popular reconstruction algorithm ASTRAL. Here $f$ is the length of the shortest branch and $n$ is the number of leaves (i.e., extant species).Control of Parallel Queues with Observation Delay. In (Shekhar, 2017)

^{9}, we obtained the optimal policy for a system of parallel queues with one server under a fixed observation delay $D>0$. The optimal policy is the one which allocates the server to the queue whose occupancy stochastically dominates that of all other queues. This result generalizes the Longest Connected First (LCF) policy of Tassiulas and Ephremides~(1993) to the case of delayed observations.

#### Related Publications

Shekhar and Javidi, (2018): Gaussian Process Bandits with Adaptive Discretization.

*Electronic Journal of Statistics*link ↩︎Shekhar and Javidi, (2020): Multi-scale zero order optimization of smooth functions in an RKHS.

*preprint*link ↩︎Shekhar and Javidi, (2020): Significance of Gradient Information in Bayesian Optimization.

*in preparation*↩︎Shekhar and Javidi, (2019): Multiscale Gaussian Process Level Set Estimation.

*AISTATS*link ↩︎Shekhar, Ghavamzadeh and Javidi, (2020): Active Learning for Classification with Abstention.

*ISIT*link ↩︎Shekhar, Javidi and Ghavamzadeh, (2020): Adaptive Sampling for Learning Probability Distributions.

*ICML*link ↩︎Tarbouriech, Shekhar, Pirotta, Ghavamzadeh and Lazaric, (2020): Active Model Estimation in MDPs.

*UAI*link ↩︎Shekhar, Roch and Mirarab, (2018): Species tree estimation using ASTAL: how many genes are enough?

*IEEE TCBB*link ↩︎Shekhar: Dynamic Server Allocation with Observation Delays.

*unpublished*link ↩︎