## Generic optimization in infinitely large search domains

In 1995 and the following few years, Wolpert and Macready published a technical reports and a journal article on what has come to be called the No Free Lunch theorems. These theorems say that all optimization algorithms yield the same average performance in optimizing a completely random function. Let $D$ be a finite search domain, $V \subset \mathbb{R}$ a set of real numbers, and $f \in V^D$ a function drawn according to a uniform distribution over the all possible functions $V^D$. Let algorithm mean the sequential generation of unique points $x_{n+1} \in D$ in the search domain based on previously visited points and function values according to some function $x_{n+1} = h(x_1,f(x_1),\ldots,x_n,f(x_n))$ or probability distribution $p(x_{n+1} | x_1,f(x_1),\ldots,x_n,f(x_n))$. Given these assumptions, the probability of obtaining any particular sequence of function values $f(x_1), f(x_2), \ldots$ is independent of the algorithm used to sequentially select points $x_k$. The reason is simply that the function values at any two (or more) points are completely uncorrelated and independent, so that nothing can be learned from previously visited points.

A number of commentaries and extensions of the original analysis have been published (as of today, a Cited Reference Search at Web of Science yields more than 700 hits for Wolpert and Macready’s 1997 paper). One of the most interesting and original is an article by Griffiths and Orponen, which specifically studies binary-valued functions ($V=\{0,1\}$) and a particular performance measure—the maximum value in the sequence of function values $f(x_1), \ldots, f(x_n)$ for a fixed, arbitrary number of points $n$. The NFL theorems only hold for a rather trivial class of probability distributions over $V^D$, but the specialization to maximum-value performance allows NFL-like results to be proved for certain non-trivial distributions as well. Griffiths and Orponen’s work deserves to be more well-known and it would be interesting to see it extended to functions that are not binary-valued.

But what prompted me to write this note was another development altogether. Only elementary mathematics is required to state, prove and reason about the NFL theorems when the search domain $D$ and set of values $V$ are both finite sets. This was the case considered by Wolpert and Macready. Allowing infinite sets $V \subset \mathbb{R}$ requires care, but the real mathematical difficulties arise in the case of an infinitely large search domain $D$. These cases are explored in a new paper by Auger and Teytaud. Their analysis gets rather technical, but concludes that for countably infinite search domains, the NFL theorem holds in a weaker form than for finite domains. For continuous search domains, the NFL theorem do not hold. The reason for this is that there is no way for the function values $f(x)$ and $f(x')$ of any two points $x$ and $x'$ in a continuous domain to be uncorrelated (independent). Uncountably infinite search domains are, in a sense, too big too allow the assumption that there are absolutely no correlations. The authors remark:

The deep reason for this fact is that in “bigger” space (and continuous spaces are “very” big), random fields (and distributions of fitness functions are non-trivial random fields in the continuous case) necessarily have correlations [11].

Auger and Teytaud are also able to construct optimal algorithms, which turn out to be related to a class of algorithms called Estimation of Distribution Algorithms.

References

E. J. Griffiths and P. Orponen. Optimization, block designs and no free lunch theorems. Inform. Proc. Lett. 94:55 (2005)

A. Auger and O. Teytaud. Continuous Lunches Are Free Plus the Design of Optimal Optimization Algorithms. Algorithmica (Online First, DOI: 10.1007/s00453-008-9244-5) [Also available as eprint inria-00369788]