
Occam's razor

In philosophy, Occam's razor (also spelled Ockham's razor or Ocham's razor; Latin: novacula Occami) is the problem-solving principle that recommends searching for explanations constructed with the smallest possible set of elements. It is also known as the principle of parsimony or the law of parsimony (Latin: lex parsimoniae). Attributed to William of Ockham, a 14th-century English philosopher and theologian, it is frequently cited as Entia non sunt multiplicanda praeter necessitatem, which translates as "Entities must not be multiplied beyond necessity",[1][2] although Occam never used these exact words. Popularly, the principle is sometimes paraphrased as "The simplest explanation is usually the best one."[3]

This philosophical razor advocates that, when presented with competing hypotheses about the same prediction that have equal explanatory power, one should prefer the hypothesis that requires the fewest assumptions;[4] it is not meant to be a way of choosing between hypotheses that make different predictions. Similarly, in science, Occam's razor is used as an abductive heuristic in the development of theoretical models rather than as a rigorous arbiter between candidate models.[5][6]

History

The phrase Occam's razor did not appear until a few centuries after William of Ockham's death in 1347. Libert Froidmont, in his On Christian Philosophy of the Soul, gives him credit for the phrase, speaking of "novacula occami".[7] Ockham did not invent this principle, but its fame—and its association with him—may be due to the frequency and effectiveness with which he used it.[8] Ockham stated the principle in various ways, but the most popular version, "Entities are not to be multiplied without necessity" (Non sunt multiplicanda entia sine necessitate) was formulated by the Irish Franciscan philosopher John Punch in his 1639 commentary on the works of Duns Scotus.[9]

Formulations before William of Ockham

Part of a page from John Duns Scotus's book Commentaria oxoniensia ad IV libros magistri Sententiarum, showing the words: "Pluralitas non est ponenda sine necessitate", i.e., "Plurality is not to be posited without necessity"

The origins of what has come to be known as Occam's razor are traceable to the works of earlier philosophers such as John Duns Scotus (1265–1308), Robert Grosseteste (1175–1253), Maimonides (Moses ben-Maimon, 1138–1204), and even Aristotle (384–322 BC).[10][11] Aristotle writes in his Posterior Analytics, "We may assume the superiority ceteris paribus [other things being equal] of the demonstration which derives from fewer postulates or hypotheses." Ptolemy (c. AD 90 – c. 168) stated, "We consider it a good principle to explain the phenomena by the simplest hypothesis possible."[12]

Phrases such as "It is vain to do with more what can be done with fewer" and "A plurality is not to be posited without necessity" were commonplace in 13th-century scholastic writing.[12] Robert Grosseteste, in Commentary on [Aristotle's] the Posterior Analytics Books (Commentarius in Posteriorum Analyticorum Libros) (c. 1217–1220), declares: "That is better and more valuable which requires fewer, other circumstances being equal... For if one thing were demonstrated from many and another thing from fewer equally known premises, clearly that is better which is from fewer because it makes us know quickly, just as a universal demonstration is better than particular because it produces knowledge from fewer premises. Similarly in natural science, in moral science, and in metaphysics the best is that which needs no premises and the better that which needs the fewer, other circumstances being equal."[13]

The Summa Theologica of Thomas Aquinas (1225–1274) states that "it is superfluous to suppose that what can be accounted for by a few principles has been produced by many." Aquinas uses this principle to construct an objection to God's existence, an objection that he in turn answers and refutes generally (cf. quinque viae), and specifically, through an argument based on causality.[14] Hence, Aquinas acknowledges the principle that today is known as Occam's razor, but prefers causal explanations to other simple explanations (cf. also Correlation does not imply causation).

William of Ockham

Manuscript illustration of William of Ockham

William of Ockham (circa 1287–1347) was an English Franciscan friar and theologian, an influential medieval philosopher and a nominalist. His popular fame as a great logician rests chiefly on the maxim attributed to him and known as Occam's razor. The term razor refers to distinguishing between two hypotheses either by "shaving away" unnecessary assumptions or cutting apart two similar conclusions.

While it has been claimed that Occam's razor is not found in any of William's writings,[15] one can cite statements such as Numquam ponenda est pluralitas sine necessitate ("Plurality must never be posited without necessity"), which occurs in his theological work on the Sentences of Peter Lombard (Quaestiones et decisiones in quattuor libros Sententiarum Petri Lombardi; ed. Lugd., 1495, i, dist. 27, qu. 2, K).

Nevertheless, the precise words sometimes attributed to William of Ockham, Entia non sunt multiplicanda praeter necessitatem (Entities must not be multiplied beyond necessity),[16] are absent in his extant works;[17] this particular phrasing comes from John Punch,[18] who described the principle as a "common axiom" (axioma vulgare) of the Scholastics.[9] William of Ockham himself seems to restrict the operation of this principle in matters pertaining to miracles and God's power, considering a plurality of miracles possible in the Eucharist[further explanation needed] simply because it pleases God.[12]

This principle is sometimes phrased as Pluralitas non est ponenda sine necessitate ("Plurality should not be posited without necessity").[19] In his Summa Totius Logicae, i. 12, William of Ockham cites the principle of economy, Frustra fit per plura quod potest fieri per pauciora ("It is futile to do with more things that which can be done with fewer"; Thorburn, 1918, pp. 352–53; Kneale and Kneale, 1962, p. 243.)

Later formulations

To quote Isaac Newton, "We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances. Therefore, to the same natural effects we must, as far as possible, assign the same causes."[20][21] In the sentence hypotheses non fingo ("I frame no hypotheses"), Newton affirms the success of this approach.

Bertrand Russell offers a particular version of Occam's razor: "Whenever possible, substitute constructions out of known entities for inferences to unknown entities."[22]

Around 1960, Ray Solomonoff founded the theory of universal inductive inference, the theory of prediction based on observations – for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution. This theory is a mathematical formalization of Occam's razor.[23][24][25]
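Solomonoff's full construction is uncomputable, but its core idea — give each hypothesis a prior weight of 2 raised to minus its description length, then let the hypotheses consistent with the data vote on the next symbol — can be sketched in miniature. The hypotheses, their bit lengths, and the example sequence below are illustrative inventions, not part of Solomonoff's formalism:

```python
# Toy sketch of length-weighted prediction. Each hypothesis gets prior weight
# 2**(-description length); hypotheses that reproduce the observed sequence
# vote on the next symbol, so shorter (simpler) descriptions dominate.

# Hypothetical hypotheses: (name, description length in bits, next-symbol rule).
hypotheses = [
    ("repeat 'ab'",     8, lambda s: "ab"[len(s) % 2]),
    ("repeat 'ababc'", 20, lambda s: "ababc"[len(s) % 5]),
]

def predict(seq):
    """Return a probability distribution over the next symbol."""
    votes = {}
    for name, length, rule in hypotheses:
        # Keep only hypotheses consistent with every observed symbol.
        if all(rule(seq[:i]) == seq[i] for i in range(len(seq))):
            guess = rule(seq)
            votes[guess] = votes.get(guess, 0.0) + 2.0 ** -length
    total = sum(votes.values())
    return {symbol: weight / total for symbol, weight in votes.items()}

print(predict("abab"))  # the shorter hypothesis dominates: 'a' gets ~0.9998
```

Both hypotheses fit the observed "abab", but the 8-bit description outweighs the 20-bit one by a factor of 2¹² — a quantitative version of preferring the explanation with the smallest description.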

Another technical approach to Occam's razor is ontological parsimony.[26] Parsimony means spareness and is also referred to as the Rule of Simplicity. This is considered a strong version of Occam's razor.[27][28] A variation used in medicine is called the "Zebra": a physician should reject an exotic medical diagnosis when a more commonplace explanation is more likely, derived from Theodore Woodward's dictum "When you hear hoofbeats, think of horses not zebras".[29]

Ernst Mach introduced a stronger version of Occam's razor into physics, which he called the Principle of Economy, stating: "Scientists must use the simplest means of arriving at their results and exclude everything not perceived by the senses."[30]

This principle goes back at least as far as Aristotle, who wrote "Nature operates in the shortest way possible."[27] The idea of parsimony or simplicity in deciding between theories, though not the intent of the original expression of Occam's razor, has been assimilated into common culture as the widespread layman's formulation that "the simplest explanation is usually the correct one."[27]

Justifications

Aesthetic

Prior to the 20th century, it was a commonly held belief that nature itself was simple and that simpler hypotheses about nature were thus more likely to be true. This notion was deeply rooted in the aesthetic value that simplicity holds for human thought, and the justifications presented for it often drew from theology.[clarification needed] Thomas Aquinas made this argument in the 13th century, writing, "If a thing can be done adequately by means of one, it is superfluous to do it by means of several; for we observe that nature does not employ two instruments [if] one suffices."[31]

Beginning in the 20th century, epistemological justifications based on induction, logic, pragmatism, and especially probability theory have become more popular among philosophers.[7]

Empirical

Occam's razor has gained strong empirical support in helping to converge on better theories (see Uses section below for some examples).

In the related concept of overfitting, excessively complex models are affected by statistical noise (a problem also known as the bias–variance tradeoff), whereas simpler models may capture the underlying structure better and may thus have better predictive performance. It is, however, often difficult to deduce which part of the data is noise (cf. model selection, test set, minimum description length, Bayesian inference, etc.).

Testing the razor

The razor's statement that "other things being equal, simpler explanations are generally better than more complex ones" is amenable to empirical testing. Another interpretation of the razor's statement would be that "simpler hypotheses are generally better than the complex ones". The procedure to test the former interpretation would compare the track records of simple and comparatively complex explanations. If one accepts the first interpretation, the validity of Occam's razor as a tool would then have to be rejected if the more complex explanations were more often correct than the less complex ones (while the converse would lend support to its use). If the latter interpretation is accepted, the validity of Occam's razor as a tool could possibly be accepted if the simpler hypotheses led to correct conclusions more often than not.

Even if some increases in complexity are sometimes necessary, there still remains a justified general bias toward the simpler of two competing explanations. To understand why, consider that for each accepted explanation of a phenomenon, there is always an infinite number of possible, more complex, and ultimately incorrect, alternatives. This is so because one can always burden a failing explanation with an ad hoc hypothesis. Ad hoc hypotheses are justifications that prevent theories from being falsified.

Possible explanations can become needlessly complex. It might be coherent, for instance, to add the involvement of leprechauns to any explanation, but Occam's razor would prevent such additions unless they were necessary.

For example, if a man, accused of breaking a vase, makes supernatural claims that leprechauns were responsible for the breakage, a simple explanation might be that the man did it, but ongoing ad hoc justifications (e.g., "... and that's not me breaking it on the film; they tampered with that, too") could successfully prevent complete disproof. This endless supply of elaborate competing explanations, called saving hypotheses, cannot be technically ruled out – except by using Occam's razor.[32][33][34]

Any more complex theory might still possibly be true. A study of the predictive validity of Occam's razor found 32 published papers that included 97 comparisons of economic forecasts from simple and complex forecasting methods. None of the papers provided a balance of evidence that complexity of method improved forecast accuracy. In the 25 papers with quantitative comparisons, complexity increased forecast errors by an average of 27 percent.[35]

Practical considerations and pragmatism

Mathematical

One justification of Occam's razor is a direct result of basic probability theory. By definition, all assumptions introduce possibilities for error; if an assumption does not improve the accuracy of a theory, its only effect is to increase the probability that the overall theory is wrong.
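The arithmetic behind this justification is simple to make concrete. Assuming, purely for illustration, that a theory rests on independent assumptions that each hold with probability 0.9, the chance that the whole theory is correct shrinks with every assumption added:

```python
def theory_probability(assumption_probs):
    """Probability that every (independent) assumption of a theory holds."""
    prob = 1.0
    for p in assumption_probs:
        prob *= p
    return prob

base = theory_probability([0.9, 0.9, 0.9])         # 0.9**3 = 0.729
padded = theory_probability([0.9, 0.9, 0.9, 0.9])  # 0.9**4 = 0.6561

# An extra assumption that adds no accuracy only lowers the theory's odds.
print(base, padded)
```

If the fourth assumption does not improve the theory's fit to observation, its only effect is the drop from 0.729 to 0.6561.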

There have also been other attempts to derive Occam's razor from probability theory, including notable attempts made by Harold Jeffreys and E. T. Jaynes. The probabilistic (Bayesian) basis for Occam's razor is elaborated by David J. C. MacKay in chapter 28 of his book Information Theory, Inference, and Learning Algorithms,[36] where he emphasizes that a prior bias in favor of simpler models is not required.

William H. Jefferys and James O. Berger (1991) generalize and quantify the original formulation's "assumptions" concept as the degree to which a proposition is unnecessarily accommodating to possible observable data.[37] They state, "A hypothesis with fewer adjustable parameters will automatically have an enhanced posterior probability, due to the fact that the predictions it makes are sharp."[37] The use of "sharp" here is not only a tongue-in-cheek reference to the idea of a razor, but also indicates that such predictions are more accurate than competing predictions. The model they propose balances the precision of a theory's predictions against their sharpness, preferring theories that sharply make correct predictions over theories that accommodate a wide range of other possible results. This, again, reflects the mathematical relationship between key concepts in Bayesian inference (namely marginal probability, conditional probability, and posterior probability).
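Jefferys and Berger's point can be illustrated with hypothetical numbers in a two-model Bayesian comparison: a "sharp" model that spreads its predictions uniformly over three possible outcomes, and an accommodating model that spreads them over a hundred. Both can explain the observed outcome, but the sharp model assigns it the higher likelihood and so earns the larger posterior probability, even with equal priors:

```python
def posterior(prior_a, like_a, prior_b, like_b):
    """Posterior probability of model A after one observation (Bayes' rule)."""
    ev_a, ev_b = prior_a * like_a, prior_b * like_b
    return ev_a / (ev_a + ev_b)

# Sharp model A predicts 3 outcomes uniformly; flexible model B predicts 100.
# The observed outcome is compatible with both.
p_sharp = posterior(0.5, 1 / 3, 0.5, 1 / 100)
print(f"P(sharp model | data) = {p_sharp:.3f}")  # ~0.971, favoring A
```

No prior bias toward simplicity was built in: the flexible model is penalized automatically because it "wasted" probability mass on outcomes that never occurred.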

The bias–variance tradeoff is a framework that incorporates the Occam's razor principle in its balance between overfitting (associated with lower bias but higher variance) and underfitting (associated with lower variance but higher bias).[38]

Other philosophers

Karl Popper

Karl Popper argues that a preference for simple theories need not appeal to practical or aesthetic considerations. Our preference for simplicity may be justified by his falsifiability criterion: we prefer simpler theories to more complex ones "because their empirical content is greater; and because they are better testable".[39] The idea here is that a simple theory applies to more cases than a more complex one, and is thus more easily falsifiable. This is again comparing a simple theory to a more complex theory where both explain the data equally well.

Elliott Sober

The philosopher of science Elliott Sober once argued along the same lines as Popper, tying simplicity with "informativeness": The simplest theory is the more informative, in the sense that it requires less information to answer a question.[40] He has since rejected this account of simplicity, purportedly because it fails to provide an epistemic justification for simplicity. He now believes that simplicity considerations (and considerations of parsimony in particular) do not count unless they reflect something more fundamental. Philosophers, he suggests, may have made the error of hypostatizing simplicity (i.e., endowed it with a sui generis existence), when it has meaning only when embedded in a specific context (Sober 1992). If we fail to justify simplicity considerations on the basis of the context in which we use them, we may have no non-circular justification: "Just as the question 'why be rational?' may have no non-circular answer, the same may be true of the question 'why should simplicity be considered in evaluating the plausibility of hypotheses?'"[41]

Richard Swinburne

Richard Swinburne argues for simplicity on logical grounds:

... the simplest hypothesis proposed as an explanation of phenomena is more likely to be the true one than is any other available hypothesis, that its predictions are more likely to be true than those of any other available hypothesis, and that it is an ultimate a priori epistemic principle that simplicity is evidence for truth.

— Swinburne 1997

According to Swinburne, since our choice of theory cannot be determined by data (see Underdetermination and Duhem–Quine thesis), we must rely on some criterion to determine which theory to use. Since it is absurd to have no logical method for settling on one hypothesis amongst an infinite number of equally data-compliant hypotheses, we should choose the simplest theory: "Either science is irrational [in the way it judges theories and predictions probable] or the principle of simplicity is a fundamental synthetic a priori truth."[42]

Ludwig Wittgenstein

From the Tractatus Logico-Philosophicus:

(If everything in the symbolism works as though a sign had meaning, then it has meaning.)

and on the related concept of "simplicity":

The procedure of induction consists in accepting as true the simplest law that can be reconciled with our experiences.

Uses

Science and the scientific method

Andreas Cellarius's illustration of the Copernican system, from the Harmonia Macrocosmica (1660). Future positions of the sun, moon and other solar system bodies can be calculated using a geocentric model (the earth is at the centre) or using a heliocentric model (the sun is at the centre). Both work, but the geocentric model arrives at the same conclusions through a much more complex system of calculations than the heliocentric model. This was pointed out in a preface to Copernicus' first edition of De revolutionibus orbium coelestium.

In science, Occam's razor is used as a heuristic to guide scientists in developing theoretical models rather than as an arbiter between published models.[5][6] In physics, parsimony was an important heuristic in Albert Einstein's formulation of special relativity,[43][44] in the development and application of the principle of least action by Pierre Louis Maupertuis and Leonhard Euler,[45] and in the development of quantum mechanics by Max Planck, Werner Heisenberg and Louis de Broglie.[6][46]

In chemistry, Occam's razor is often an important heuristic when developing a model of a reaction mechanism.[47][48] Although it is useful as a heuristic in developing models of reaction mechanisms, it has been shown to fail as a criterion for selecting among some selected published models.[6] In this context, Einstein himself expressed caution when he formulated Einstein's Constraint: "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience."[49][50][51] An often-quoted version of this constraint (which cannot be verified as posited by Einstein himself)[52] reduces this to "Everything should be kept as simple as possible, but not simpler."

In the scientific method, Occam's razor is not considered an irrefutable principle of logic or a scientific result; the preference for simplicity in the scientific method is based on the falsifiability criterion. For each accepted explanation of a phenomenon, there may be an extremely large, perhaps even incomprehensible, number of possible and more complex alternatives. Since failing explanations can always be burdened with ad hoc hypotheses to prevent them from being falsified, simpler theories are preferable to more complex ones because they tend to be more testable.[53][54][55] As a logical principle, Occam's razor would demand that scientists accept the simplest possible theoretical explanation for existing data. However, science has shown repeatedly that future data often support more complex theories than do existing data. Science prefers the simplest explanation that is consistent with the data available at a given time, but the simplest explanation may be ruled out as new data become available.[5][54] That is, science is open to the possibility that future experiments might support more complex theories than demanded by current data and is more interested in designing experiments to discriminate between competing theories than favoring one theory over another based merely on philosophical principles.[53][54][55]