A priori bias in the Dembski-Marks representation

Dembski and Marks (2009b) recently published a minimalistic (and simplistic) representation of a search problem consisting of a search space \Omega and a distinguished target set T \subset \Omega (blogger reactions: 1, 2, 3, 4, 5, 7). From the discussion in the paper and two other articles it is clear that the authors’ object of study is related to, though not equivalent to, the issues raised by Wolpert and Macready’s No Free Lunch theorems. Despite its minimalistic character, the Dembski-Marks representation is not less restrictive than the Wolpert-Macready representation of a search/optimization problem. The Dembski-Marks representation, i.e., a distinguished target in a search space, can easily be introduced as an extra feature in the Wolpert-Macready representation. However, it is not possible to introduce the full Wolpert-Macready representation within the Dembski-Marks representation. Indeed, the absence of a constant, distinguished target set is a prerequisite for all No Free Lunch theorems. It is therefore interesting to estimate how much of a restriction it is to make the Wolpert-Macready representation conform to the Dembski-Marks representation.

To this end, let V \subset \mathbb{R} denote a range of fitness values. Wolpert and Macready considered the average performance of optimization algorithms over the set of all fitness functions F = V^{\Omega}. Sharpened No Free Lunch theorems apply also to non-uniform probability measures over V^{\Omega} with special permutation symmetry (Igel and Toussaint). In this representation, a fitness function is identified with a problem instance; in particular, success criteria for search algorithms cannot refer to anything else than the fitness function values. A target is obtained by first letting V_{+} \subset V and V_{-} = V\setminus V_{+} be the range of satisfactory and unsatisfactory fitness values, respectively. Second, it is necessary to restrict attention to only those fitness functions that map the target set into the satisfactory fitness range and the rest of the search space into the unsatisfactory fitness range. That is, the restriction is to a subset F_T \subset F of functions f with images f(T) \subseteq V_{+} and f(\Omega\setminus T) \subseteq V_{-}. It should be noted that the No Free Lunch theorem does not apply to the class of functions F_T, because no probability measure on F_T can have the necessary symmetry and search algorithms that visit points in T before points in \Omega\setminus T clearly have superior performance to other search algorithms. The conclusion of the No Free Lunch theorem, i.e., that all search algorithms have equivalent performance, is directly contradicted in this case—the theorem cannot apply.

How biased and severe the restriction from F = V^{\Omega} to F_T is may be measured by the ratio

\displaystyle  \frac{|F_T|}{|F|} = \frac{|{V_{+}}^T| \cdot |{V_{-}}^{\Omega\setminus T}|}{|V^{\Omega}|} = \left(\frac{|V_{+}|}{|V_{-}|}\right)^{|T|} \left(\frac{|V_{-}|}{|V|}\right)^{|\Omega|},

where |\cdot| denotes the number of elements of a finite set and at least \Omega is assumed to be finite. (V may be taken to be either finite or a continuum of finite measure, in which case |\cdot| denotes the Lebesgue measure where appropriate.)

Dembski and Marks use the word `information’ to mean certain probabilities and restrictions on logical possibilities arising from model assumptions. Their objections to model assumptions take the form of reified talk about information. Example:

“Simulations like Dawkins’s WEASEL, Adami’s AVIDA, Ray’s Tierra, and Schneider’s ev appear to support Darwinian evolution, but only for lack of clear accounting practices that track the information smuggled into them.” (Dembski and Marks, 2009b)

Assumptions are a necessary ingredient in any mathematical model or calculation. Dembski and Marks are as guilty of making model assumptions as anyone else. Adopting their terminology, the logarithm of the ratio |F|/|F_T| may be termed information and taken to measure how strong a priori assumptions Dembski and Marks rely on in their discussions of search problems. Thus,

\displaystyle J = -\ln \frac{|F_T|}{|F|} = |\Omega| \ln \frac{|V|}{|V_{-}|} + |T| \ln \frac{|V_{-}|}{|V_{+}|}

\displaystyle = |\Omega| \ln \frac{|V|}{|V|-|V_{+}|} + |T| \ln \frac{|V|-|V_{+}|}{|V_{+}|}

\displaystyle = |\Omega| \left(\beta_T \ln \frac{1-\alpha_{+}}{\alpha_{+}} - \ln(1-\alpha_{+})\right)

where \alpha_{+} = |V_{+}|/|V| is the fraction of satisfactory fitness values and \beta_T = |T|/|\Omega| is the relative size of the target. It is easily seen that, for a given \beta_T, the minimum of J is attained when \alpha_{+} = \beta_T. Furthermore, J decreases with decreasing \beta_T (assuming \alpha_{+} \leq \frac{1}{2}). The smallest possible value of \beta_T is 1/|\Omega|. Therefore a lower bound on the amount of information smuggled by Dembski and Marks is obtained by inserting \alpha_{+}=\beta_T=1/|\Omega| into the above equation, yielding

\displaystyle J \geq \ln(|\Omega|-1) - |\Omega| \ln(1 - |\Omega|^{-1}) \geq 1 + \ln |\Omega| - \frac{1}{|\Omega|}.

This lower bound is never attained in practice, at least not for large search spaces. For instance, most computer implementations make use of 64-bit floating-point numbers to represent real numbers. All numerical calculations are then carried out approximately to a relative precision of 10^{-16} so that the case \alpha_{+} < 10^{-16} hardly ever is meaningful. A more realistic estimate in practice is therefore

\displaystyle J \geq -|\Omega| \ln(1-10^{-16}) = 10^{-16} |\Omega|,

or J/|\Omega|\geq 10^{-16} nats per search space point. The below figure shows how J/|\Omega| varies with (left) \alpha_{+} for fixed \beta_T and (right) \beta_T for fixed \alpha_{+} (note the logarithmic scales).

Information as a function of (a) alpha and (b) beta.

In the case of the Weasel program used by Dawkins to illustrate the difference between cumulative natural selection and random sampling, the search space consists of all 28-character sequences that can be formed from a symbol set consisting of 26 letters and a blankspace, giving |\Omega| = 27^{28} \approx 10^{40}. The target contains a single sequence, |T|=1. The range of fitness values contains exactly |V|=28 different values since it is based on a Hamming distance and a single value is satisfactory so that |V_{+}| = 1. This gives \alpha_{+} = \frac{1}{28}, \beta_T = 27^{-28} and J = 4.35 \times 10^{38} nats. Thus, Dembski and Marks smuggle 4.35 \times 10^{38} nats (or 6.28 \times 10^{38} bits) of information into their analysis of the Weasel program. In order to reach the theoretical lower bound, which in this case is 93.28 nats (or 134.6 bits), the Weasel program must be modified so that fitness is evaluated on an absurdly biased scale with 10^{40} possible unsatisfactory fitness values for every possible satisfactory value.

The fundamental lesson here is that the Dembski-Marks approach to evaluating model assumptions is both arbitrary and a poor reflection of scientific reasoning. Model assumptions are not accepted or rejected based on a numerical measure of how many logical possibilities that are ruled out or how far probability distributions deviate from uniform measures. Rather, model assumptions are accepted or rejected based on predictive and descriptive accuracy, domain of applicability, ability to unify existing models and empirical knowledge, and so on. Merely talking about target sets requires assumptions that impose structure on the search space and processes related to it. Dembski and Marks (2009a) mention “a simple self-replicating molecule, an autocatalytic set, or a lipid membrane” as examples of biological targets. However, these chemical concepts presuppose model assumptions from physics, chemistry and biochemistry and they could hardly be independent of the chemical processes by which they are thought to arise. The same chemistry that allows us to define self-replicating molecules, autocatalytic sets and lipid membranes also has consequences for how such phenomena might arise in nature. In the language of the Dembski-Marks (or Wolpert-Macready) representation, this means that the target set (or fitness function) and the search algorithm are not independent of each other—they are both consequences of chemical model assumptions.


W. A. Dembski and R. J. Marks II. Life’s Conservation Law. (2009a) [Available at the authors’ homepage.]

W. A. Dembski and R. J. Marks II. Conservation of Information in Search: Measuring the Cost of Success. IEEE Trans. Sys. Man. Cybernetics A 39:1051 (2009b) [Also available at the authors’ homepage.]

W. A. Dembski and R. J. Marks II. The Search for a Search: Measuring the Information Cost of Higher Level Search. (2009c) [Available at the authors’ homepage.]

C. Igel and M. Toussaint. No Free Lunch Theorem for Non-uniform Distributions of Target Functions. J. Math. Model. Algorithms 3:313 (2004) [eprint (different title): cs.NE/0303032]

D. H. Wolpert and W. G. Macready. No Free Lunch Theorems for Optimization. IEEE Trans. Evol. Comp. 1:67 (1997)

Update: Manual trackback from The Panda’s Thumb.


19 Responses to A priori bias in the Dembski-Marks representation

  1. […] the Metropolis Sampler has published a more technical analysis of the paper, concluding […]

  2. Extremely well written summary.

  3. […] relevant to this thread…. A technical review of the Dembski & Marks paper in IEEE And as has been pointed out multiple times in this thread, the ID creationist argument relies on […]

  4. RBH says:

    How come the trackback comes from a goddam aggregator (with no attribution), rather than the one I threw from PT? Hmmmmm.

    • tom w says:

      Thanks for the attention at Panda’s Thumb—Wordpress tells me that most visitors followed that link. I’ll add a link as a token of my gratitude 🙂

      Pingbacks like the one from the aggregator site work like a charm. I’ve had trouble with trackbacks before, not sure what’s wrong. There seems to be no record of your trackback anywhere on my WordPress Dashboard.

  5. Andrew Freeman says:

    I’m intrigued but confused by:

    “It should be noted that the No Free Lunch theorem does not apply to the class of functions F_T, because no probability measure on F_T can have the necessary symmetry and search algorithms that visit points in T before points in \Omega\setminus T clearly have superior performance to other search algorithms”

    Could you explain further?

    • Tom English says:

      The NFL analytic framework predicates sampling without replacement. Olle Haggstrom recognized that the symmetry necessary and sufficient for NFL had an established name, exchangeability.

      Dembski and Marks permit sampling with replacement, so I don’t see the relevance of the NFL theorem. The Weasel program samples with replacement, and partitioned search samples with restricted replacement.

      • tom w says:

        I don’t think the possibility of revisiting points (sampling with replacement) matters. Dembski and Marks care about how many attempts it takes before the first target point is found.

    • tom w says:

      Well, let P(y1,…,yN|A) be the probability that the first point visited by a search algorithm A has fitness value y1, the second y2, and so on. Different No Free Lunch theorems provide conditions under which P(y1,…,yN|A) is independent of the search algorithm. (As a corollary, any two search algorithms must have the same expectation value of any function of the sampled fitness values y1,…,yN. E.g., the expected minimum, mean, median, and maximum of the sampled fitness values is independent of search algorithm.) But it’s easy to construct counter-examples if the fitness function maps a target set into one range Y+ and the rest of the search space into another, disjoint range Y-. For every target set T, there are algorithms AT and BT that sample the points in the target set first and last, respectively. The resulting distributions, P(y1,…,yN|AT) and P(y1,…,yN|BT), are clearly different.

      Does this help?

      • Andrew Freeman says:

        I think I understand.

        The technique works, it seems to me, because AT and BT are defined based on Y+ and Y-. They pay no attention to the fitness function, and thus have the same performance across all fitness functions.

        However, it would seem to still depend on Y+ and Y-. Doing something similar by averaging across all Y+ and Y- would, I suspect, give something like an NFL result.

      • tom w says:

        AT and BT are defined based on T, not Y+. For every given T, there are algorithms AT and BT with very different performance.

        The reason we can consider averages over several fitness function is that the search algorithm’s only access to the fitness function is via the sampled values. What would be the point of averaging over different choices of Y+? It corresponds to odd situations like search algorithms that don’t “know” whether to minimize or maximize the fitness function.

      • Andrew Freeman says:

        Yeah, I had Y+ and T mixed up.

        It bothers me that AT and BT are defined based on already knowing what I’m searching for.

  6. Tom English says:

    [Also at The Panda’s Thumb:]

    I’m trying to get some help in arguing against the Dembski-Marks physical interpretation of active information.

    As an unwashed non-physicist, I don’t know of an instance in nature of a uniform random process over the full range of configurations of a macroscopic system, except when there is in some sense external control of the process. I’m thinking, for instance, that a deck of cards does not shuffle itself. Please set me straight, if necessary, or help me firm up my statement.

    As a computer scientist, I know that it takes a lot of work to get a stream of apparently unbiased and uncorrelated bits from observations of quantum phenomena. Were I to be less “active” by ceasing to clean up the bit stream, a program using the stream to implement random search could gain active information. That is, reduced intervention of an “intelligent” entity in a search can introduce active information.

    In the realm of engineering, the choice of a particular search algorithm over uniform sampling with replacement is in an intuitive sense active. We know that there is an engineer doing the choosing, and it is reasonable to ask, “What did you gain by doing things the way you did, rather than in an arbitrary fashion, and how did you know what to do?” But, as I have already shown, random search in computation does not result from inactivity. Thus the loaded term “active information” is problematic from the get-go.

    When we turn to nature, there is no engineer to challenge as to choice of one process over another, unless we beg the question that nature is engineered. Deviations from uniformity in nature no more necessitate activity of some external entity than do deviations from uniformity in computation.

    Thanks in advance for contributions to the argument.

    • tom w says:

      I’m not sure what you mean by “full range of configurations”. Is it relevant to your concern to note that uniform probability is as much a feature of the choice of representation as it is of nature? Coarse-graining or fine-graining of the sample space can change a uniform probability measure to a non-uniform measure, and vice versa. A change of variables can transform common empirical distributions (Gaussian, power-law, etc.) into uniform distributions.

      I agree with what you say about the rest. IMO, you could widen the critique. Consider the example of “simple self-replicating molecules” as a target set. It is perhaps not clear what the search space is in this case, but the definition of the target set clearly depends on chemical model assumptions. For instance, assumptions about chemical bonds, chemical reactions must be balanced, reaction kinetics, etc. are required for the term “self-replicating molecule” to make sense. A “search for a search” in the Dembski-Marks sense must be restricted to processes that don’t violate the chemical assumptions used to define the target, otherwise the target becomes ill-defined.

  7. Tom English says:

    tom w:

    Thanks for helping me clarify my thinking on what’s wrong with scientific application of the active information measure.

  8. James F says:

    I would love to see this expanded into a rebuttal paper in an IEEE journal.

  9. […] A priori bias in the Dembski-Marks representation « The Metropolis … Tags: Engineer, Jobs, Metropolis Related posts: […]

  10. JimmyBean says:

    I don’t know If I said it already but …This blog rocks! I gotta say, that I read a lot of blogs on a daily basis and for the most part, people lack substance but, I just wanted to make a quick comment to say I’m glad I found your blog. Thanks, 🙂

    A definite great read..Jim Bean

  11. RobD says:

    Hey, I found your blog while searching on Google your post looks very interesting for me. I will add a backlink and bookmark your site. Keep up the good work! 🙂

%d bloggers like this: