The Trouble With Objective Bayesianism
In philosophy, “Bayesianism” has kind of become shorthand for people who think that probabilistic models of credence, evidence, and inference are good and should be used frequently-to-exclusively (credence being the mental state that assesses things like proper betting odds and how much a body of evidence supports some proposition). The name comes from Rev. Bayes’s celebrated theorem connecting the probability of a hypothesis given some evidence with the probability of that evidence given the hypothesis. We further subdivide Bayesians into two broad categories: objective Bayesians and subjective Bayesians.
Subjective Bayesians are the permissive ones. They think that credences should be probabilistic, and they endorse the rule known as conditionalization (in brief: once I learn something new, I should change my credence to reflect what my credences conditional on that proposition were, which is typically given by the ratio formula [p(a|b) = p(a & b)/p(b)] but there are some nuances here around probability 0 events) for updating credences. But those are the only constraints they put on epistemic rationality (for credences, anyway).
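To make the ratio formula concrete, here is a minimal sketch in Python; the toy worlds and numbers are my own, purely for illustration. It treats a credence function as a distribution over a handful of worlds, and implements both the ratio formula and updating by conditionalization.

```python
# A toy credence function: a probability for each of four "worlds".
# Each world settles whether it rains (R) and whether I take an umbrella (U).
credence = {
    ("R", "U"): 0.3,
    ("R", "no-U"): 0.2,
    ("no-R", "U"): 0.1,
    ("no-R", "no-U"): 0.4,
}

def prob(cr, prop):
    """Probability of a proposition, modeled as a set of worlds."""
    return sum(p for world, p in cr.items() if world in prop)

def conditionalize(cr, evidence):
    """Update by conditionalization: new p(w) = old p(w)/p(E) inside E, else 0.
    (Undefined when the evidence has probability 0; here we just raise.)"""
    pe = prob(cr, evidence)
    if pe == 0:
        raise ValueError("cannot conditionalize on a probability-0 proposition")
    return {w: (p / pe if w in evidence else 0.0) for w, p in cr.items()}

rain = {("R", "U"), ("R", "no-U")}
umbrella = {("R", "U"), ("no-R", "U")}

# The ratio formula: p(umbrella | rain) = p(umbrella & rain) / p(rain) = 0.3/0.5
print(prob(credence, umbrella & rain) / prob(credence, rain))   # 0.6

# Learning "rain" and conditionalizing yields the same credence in umbrella:
print(prob(conditionalize(credence, rain), umbrella))           # 0.6
```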
Objective Bayesians, by contrast, think that some credences are out of bounds, even if they are probabilistic and updated by conditionalization. Since this is a big and heterogeneous category (it includes anyone who adds further rules, whatever those rules happen to be), what I say will be about the typical objective Bayesian.
I think the best way to think about Objective Bayesianism is as having standards for rationally permissible ur-priors. An ur-prior is the (hypothetical) probability function that an agent has before they encounter any evidence. But because of the ratio formula, it encodes all of the conditional relations between propositions that constitute the agent’s analysis of evidential support. A perfect agent with no evidence and a rational ur-prior – a superbaby, as David Lewis put it – needs only to run around learning things and conditionalizing. The most restrictive Objective Bayesians will think that there is a unique rational ur-prior. Others (sometimes called moderate permissivists) will think that there is a range of rational ur-priors.
By far the most common Objective Bayesian norm (especially historically) is known as the Principle of Indifference. The Principle of Indifference is inspired by probability theory’s natural home: games of chance. Suppose a fair die is cast. How should I divide my expectation over the possible results? Indifference reasoning says that, because there is an evidential symmetry between the six outcomes (conditional on a fair die, the evidence pre-cast cannot favor one face over another), my credence should be 1/6 in each.
More generally, then, the Principle of Indifference says something like this: when the evidence does not favor any of a set of given alternatives, credence should be apportioned equally amongst them.
This works pretty well in the casino, but quickly falls apart in the real world. More specifically, it falls prey to Bertrand’s Paradox. Bas van Fraassen illustrates the problem with this memorable example:
This logical difficulty with the idea was expounded systematically in a series of paradoxes by Joseph Bertrand at the end of the nineteenth century…Let us turn immediately to a paradigmatic but simple example: the perfect cube factory. A precision tool factory produces iron cubes with edge length ≤ 2 cm. What is the probability that a cube has length ≤ 1 cm, given that it was produced by that factory? A naive application of the Principle of Indifference consists in choosing length l as parameter and assuming a uniform distribution. The answer is then ½. But the problem could have been stated in different, but logically equivalent, words: [for example, by focusing on the area of the cube’s faces, which is ≤ 4 cm². A naïve application of indifference to area then yields a probability of ¼ that the edge length is ≤ 1 cm (since that is the very same event as the face area being ≤ 1 cm²), even though assuming that the face area is ≤ 4 cm² just is assuming that the edge length is ≤ 2 cm.] – Laws and Symmetry, ch. 12, pp. 303–304
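Here is a quick numerical sketch of the clash (my own toy simulation, not van Fraassen’s): treating edge length as uniform and treating face area as uniform are both “indifferent” priors, but they disagree about the very same event.

```python
import random

random.seed(0)
N = 100_000

# Indifference over edge length: length uniform on (0, 2] cm.
by_length = sum(random.uniform(0, 2) <= 1 for _ in range(N)) / N

# Indifference over face area: area uniform on (0, 4] cm^2.
# "Edge length <= 1 cm" is the very same event as "face area <= 1 cm^2".
by_area = sum(random.uniform(0, 4) <= 1 for _ in range(N)) / N

print(f"P(edge <= 1 cm), indifferent over length: {by_length:.3f}")  # ~0.50
print(f"P(edge <= 1 cm), indifferent over area:   {by_area:.3f}")    # ~0.25
```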
What Bertrand’s Paradox tells us is that indifference reasoning is partition-relative. Depending on how we divide up or describe a problem, the “symmetries” over which we are tempted to be indifferent change, even though these repartitions are all logically equivalent.
The best version of an “indifference principle” is E.T. Jaynes’s “Principle of Maximum Entropy.” Entropy here is Shannon Entropy, and the basic idea is to try and encode as little information in the prior as possible so that it is maximally responsive to evidence. Setting aside whether this adequately captures some of the motivating thoughts behind objective Bayesianism (e.g. that some propositions are just inherently more plausible/reasonable to lend credence to than others), the principle still suffers from a kind of relativity: it requires a background measure, and when the problem doesn’t determine a unique background measure, we will find the same kind of behavior we do in Bertrand-style paradoxes.
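Here is a toy sketch of that relativity (my own example, not Jaynes’s): with only a normalization-plus-mean constraint, the maximum-entropy distribution relative to a background measure m has the familiar exponential-family form p_i ∝ m_i·exp(λx_i), so which prior counts as “maximally uninformative” depends on which m you start from.

```python
import numpy as np

def maxent(x, m, mu, lo=-50.0, hi=50.0):
    """Maximize entropy relative to background measure m, subject to E[x] = mu.
    The maximizer has the form p_i ∝ m_i * exp(lam * x_i); lam is found by bisection."""
    def mean_for(lam):
        w = m * np.exp(lam * x)
        return (w / w.sum()) @ x
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mean_for(mid) < mu else (lo, mid)
    w = m * np.exp(((lo + hi) / 2) * x)
    return w / w.sum()

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # a six-sided die
mu = 3.5                                          # the only constraint: mean 3.5

counting_measure = np.ones(6)                     # one background measure...
skewed_measure = np.array([1, 1, 1, 1, 1, 5.0])   # ...and another

print(maxent(x, counting_measure, mu))   # uniform: 1/6 on every face
print(maxent(x, skewed_measure, mu))     # a quite different "uninformative" prior
```

With the counting measure the answer is the uniform distribution, just as the Principle of Indifference would say; swap in another measure and the same principle recommends a different prior, which is the Bertrand problem all over again.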
There’s some reason to think that representation-dependence is unavoidable in probability (see, e.g., this paper). Authors have tried to find ways to make peace with this, but any victory is fated to be somewhat pyrrhic.
Other approaches focus not on symmetries or “filling in the gaps of uncertainty,” but on trying to identify the features of hypotheses that make them intrinsically credible. Recent work by Paul Draper, for instance (see chapter 4 of his new book on the problem of evil), has posited the following criteria as the sole determinants of intrinsic plausibility:
(1) Specificity (what he calls modesty, but since modest/immodest has a different meaning in the context of probability functions I am sticking with the classic name)
(2) Coherence
Specificity is meant to correspond roughly to “how much” a hypothesis asserts. Although there are tools available (e.g. Shannon entropy, Kolmogorov complexity, compressibility) to make this idea precise, he sticks with some examples that are enough to establish (a) that his notion of specificity is stronger than the idea of logical strength, and (b) that it is probably representation-dependent. One of his examples is that the sentence “LeBron James will be the 50th president” is less specific than “Chelsea Clinton or Ivanka Trump will be the 49th president,” presumably because the latter is a disjunction. But that is just a feature of the surface-level representation; change the language and we could presumably reverse which sentence is simple and which is disjunctive.
Coherence is meant to capture “how well the parts of the hypothesis support each other.” The thought is something like: if we take a hypothesis and divide it into subhypotheses, we can compare the probability of subhypothesis 1 given subhypothesis 2 and vice versa, and if those conditional probabilities are high, the hypothesis as a whole exhibits high coherence and therefore more intrinsic plausibility.
Of course, this will be highly partition-relative as well. If we think of a proposition as a set of worlds (which is very natural in the context of credence functions), we will always be able to find highly coherent and totally incoherent decompositions of the proposition. Without a favored description the coherence criterion is not very helpful.
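A toy illustration of that partition-relativity (again, my own construction): the very same hypothesis, viewed as a set of worlds, can be carved into subhypotheses that support each other strongly or only weakly.

```python
# Eight equiprobable worlds; propositions are sets of worlds.
worlds = set(range(8))

def p(prop):
    return len(prop & worlds) / len(worlds)

def cond(a, b):          # p(a | b), via the ratio formula
    return p(a & b) / p(b)

H = {0, 1}               # the hypothesis whose "coherence" we want to assess

# Decomposition 1: H = S1 ∩ S2, where the parts nearly coincide.
S1, S2 = {0, 1, 2}, {0, 1, 3}
print(cond(S1, S2), cond(S2, S1))    # 0.667 and 0.667: looks "coherent"

# Decomposition 2: the very same H, carved into parts that barely overlap.
T1, T2 = {0, 1, 2, 3, 4}, {0, 1, 5, 6, 7}
print(T1 & T2 == H)                  # True: still a decomposition of H
print(cond(T1, T2), cond(T2, T1))    # 0.4 and 0.4: looks "incoherent"
```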
One of the most-discussed approaches to “intrinsic plausibility” invokes the idea of simplicity, with the thought that simpler hypotheses are more plausible. This is a hard idea to make precise (which makes many authors’ proposals tricky to evaluate), but some of the better options out there (Kolmogorov complexity again springs to mind) will very straightforwardly be language-dependent or non-computable (and so less helpful in practice).
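To get a feel for the language-dependence, here is a crude sketch using compressed length as a computable stand-in for description length (which it emphatically is not; this is only illustrative): the same pair of hypotheses can swap places in the “simplicity” ordering when we rewrite them in a Goodman-style grue/bleen vocabulary.

```python
import zlib

def desc_len(s):
    """Compressed byte length: a crude, computable proxy for description length."""
    return len(zlib.compress(s.encode()))

# 200 emerald observations, reported in the green/blue vocabulary:
all_green = " ".join(["green"] * 200)                   # "all emeralds are green"
all_grue  = " ".join(["green"] * 100 + ["blue"] * 100)  # "all emeralds are grue"

# The same two hypotheses, re-expressed in a grue/bleen vocabulary
# (grue = green if examined early, blue otherwise; bleen is the mirror image):
all_green_grue_speak = " ".join(["grue"] * 100 + ["bleen"] * 100)
all_grue_grue_speak  = " ".join(["grue"] * 200)

print(desc_len(all_green), desc_len(all_grue))            # "all green" compresses better
print(desc_len(all_green_grue_speak),
      desc_len(all_grue_grue_speak))                      # now "all grue" compresses better
```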
So, it looks like some amount of partition-relativity or representation-dependence is unavoidable. To me, that undermines the motivation for objective Bayesianism. The hope was to have a rationally circumscribed starting point for uncertainty of all kinds, including uncertainty about the right partition/description/representation (in Ted Sider’s terminology: the language of the book of the world). But this hope remains unfulfilled and may in fact be unfulfillable.
But back in the corner remains subjective Bayesianism. To subjective Bayesians, all of this relativity is no problem. It works as advertised. Old Reliable. Probabilism, Conditionalization, and nothing else.

I believe in objective probability. It does not seem to be a matter of opinion what the probability is that an atom of uranium will decay in the next hour. True, rational agents observing many such decay events will converge on a common probability. But I feel like two perfectly rational agents with identical observation histories should reach the exact same answer -- the one true probability for an agent in that position.
Solomonoff induction / Kolmogorov complexity seems like the most promising way forward. If it is uncomputable, I'm okay with that. I don't need to be able to actually compute the one true probability -- I just want to know it exists! The dependence on choosing a programming language / universal Turing machine is a big problem, though. However, it's possible there are approaches that will solve this issue. Markus Mueller has done some very interesting work on this topic, with both an approach that didn't work (https://arxiv.org/abs/cs/0608095) and an approach he thinks is more promising (https://arxiv.org/abs/1712.01816).
Unless I’m missing something, the language-dependence problem being as inescapable as it is seems to imply that there was nothing especially wrong with Goodman’s own approach to the New Riddle of Induction, and that claims that a higher prior for simpler theories is needed to solve it are mistaken: if language-dependence is a fatal problem, the simplicity-prior approach is inadequate too; if it isn’t, Goodman’s approach stands and the simplicity prior is unnecessary.