Laplace and the law of small data

One of my favorite recent reads is Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. Even in the age of “big” data, uncertainty and scarce data are the norm in our daily lives, and Laplace offers us a rule of thumb for making a sensible decision even with a single observation.

Pierre-Simon, marquis de Laplace is a ubiquitous character in the annals of 1700s science history and in my undergraduate Mathematics years. The term “inverse Laplace transform” would be met with an uncontrollable shudder during finals, especially if you were allergic to solving differential equations using integral transforms.

The Marquis de Laplace

However, a few decades later, with matrix operations and linear algebra so prevalent in deep learning and my overall appreciation for advanced mathematics growing along with them, I’ve become fascinated by some of his work. Laplace was a mathematician and physicist, and his name appears all over both fields. He is known as the “French Newton”, a bona fide virtuoso with contributions like the Laplace transform, the Laplace equation and the Laplace operator, amongst other things. If that weren’t enough, he also enlightened the world with theories on black holes and gravitational collapse. He was also a marquis in the French court after the Bourbon Restoration (which, as much as it wants to, does not refer to the weekend festivities in my backyard; it actually refers to a period in French history following Napoleon’s downfall).

Laplace essentially wrote the first handbook on probability with “A philosophical essay on probabilities”, a magnificent treatise that reflects the author’s depth of knowledge and curiosity. It is a bit dense in parts but a fascinating look at 18th century French life through the eyes of a polymath. Even if you are not a deep researcher of the history of probability, the material is organized well enough to comb through for points of interest. Part 1 is a “philosophical essay on probabilities” while part 2 is an “application of the calculus of probabilities”.

Laplace’s Rule of Succession, devised primarily to solve the Sunrise problem, is an important tool for estimating probabilities when every trial is assumed to share the same unknown probability of success.

Suppose the sun has come up n days in a row: what’s the probability it will rise tomorrow? One can imagine he got ridiculed for it, since we have never experienced a day on which the sun did not rise, and if it is not going to rise tomorrow it is presumably the end of the world. More specifically, the problem does not seem realistic because it treats every day as an independent event, i.e. it models the sun rising on each day with independent random variables.

We have evidence that the sun has risen n times in a row, but we don’t know the value of P, the probability that it rises on any given day. Treating this P as unknown brings to the fore a long-standing debate in statistics between frequentists and Bayesians. From the Bayesian point of view, since P is unknown, we treat P as a random variable with its own distribution. As with Bayes’ theorem, we start with prior beliefs about P before we have any data. Once we collect data, we use Bayes’ rule to update those beliefs based on our evidence.

The integral calculus leading to deriving this rule is masterfully explained here in this lecture on moment generating functions (MGFs) by Joe Blitzstein. Amazing explanations if you can sit through the detailed derivations.
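For the all-successes case, the core of that derivation can be sketched in a few lines. This is only a sketch: it assumes a uniform prior on P between 0 and 1 (the prior that the Rule of Succession rests on) and days that are independent given P. The notation is my own: X_i is 1 if the sun rises on day i, and S_n = X_1 + ... + X_n counts the sunrises so far.

```latex
\begin{aligned}
P(X_{n+1} = 1 \mid S_n = n)
  &= \frac{\int_0^1 p \cdot p^{n} \, dp}{\int_0^1 p^{n} \, dp}
   = \frac{1/(n+2)}{1/(n+1)}
   = \frac{n+1}{n+2}
\end{aligned}
```

The numerator weights each candidate value of p by how likely n straight sunrises are under it, times the chance of one more; the denominator normalizes. With s successes out of n, the same argument with Beta integrals gives (s+1)/(n+2).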

The probability of the sun rising tomorrow is (n+1)/(n+2), or as Wikipedia puts it:

“if one has observed the sun rising 10000 times previously, the probability it rises the next day is 10001/10002 = 0.99990002. Expressed as a percentage, this is approximately 99.990002% chance.”

Pretty good odds it seems.

Essentially, per Laplace, for any drawing of w winning tickets in n attempts, the expected chance of winning on the next draw is the number of wins + 1, divided by the number of attempts + 2.

Said differently, if we run n experiments, each ending in success or failure, and s of them succeed (so n - s fail), the probability that the next repetition will succeed is (s+1)/(n+2).
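As a quick illustration, the rule fits in a one-line function. This is a minimal sketch; the function name `rule_of_succession` and the use of exact fractions are my own choices, not anything from Laplace or the book.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate that the next trial succeeds: (s + 1) / (n + 2)."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# Sunrise problem: the sun rose on each of 10000 observed days.
sunrise = rule_of_succession(10_000, 10_000)
print(sunrise, float(sunrise))

# With no observations at all, the rule falls back to an even chance.
print(rule_of_succession(0, 0))
```

Using `Fraction` keeps the result exact, so the sunrise case comes out as 10001/10002 rather than a rounded float.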

If I make 10 attempts at playing a musical piece and 8 of them succeed, then per Laplace my chance of succeeding on the next attempt is 9/12, or 75%. If I play it once and succeed, the probability is 2/3 (about 66.7%), which is intuitively more reliable than assuming I have a 100% chance of nailing it the next time.
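One way to convince yourself of the 75% figure is to simulate it. The sketch below is my own check, not something from the book: it assumes the unknown skill level p is equally likely to be anywhere between 0 and 1 (a uniform prior) and that attempts are independent given p. The function name and the `runs` and `seed` parameters are hypothetical.

```python
import random


def simulate_next_success(successes: int, trials: int,
                          runs: int = 200_000, seed: int = 0) -> float:
    """Estimate P(next trial succeeds | `successes` wins in `trials`) by simulation.

    Assumes p ~ Uniform(0, 1) and independent trials given p.
    """
    rng = random.Random(seed)
    matched = 0    # runs whose first `trials` outcomes had exactly `successes` wins
    next_wins = 0  # of those runs, how many also won the following trial
    for _ in range(runs):
        p = rng.random()
        wins = sum(rng.random() < p for _ in range(trials))
        if wins == successes:
            matched += 1
            if rng.random() < p:
                next_wins += 1
    if matched == 0:
        raise ValueError("no simulated run matched the observed record")
    return next_wins / matched


estimate = simulate_next_success(successes=8, trials=10)
print(round(estimate, 2))  # should land close to 0.75
```

Only the simulated histories that match the observed record (8 wins in 10) are kept, which is Bayes’ rule by brute force: the surviving runs are weighted towards higher values of p, but not all the way to 1.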

Some fascinating quotes:

“Man, made for the temperature which he enjoys, and for the element which he breathes, would not be able, according to all appearance, to live upon the other planets. But ought there not to be an infinity of organization relative to the various constitutions of the globes of this universe? If the single difference of the elements and of the climates make so much variety in terrestrial productions, how much greater the difference ought to be among those of the various planets and of their satellites! The most active imagination can form no idea of it; but their existence is very probable.”

(Pg. 181)

“the transcendent results of calculus are, like all the abstractions of the understanding, general signs whose true meaning may be ascertained only by repassing by metaphysical analysis to the elementary ideas which have led to them; this often presents great difficulties, for the human mind tries still less to transport itself into the future than to retire within itself. The comparison of infinitely small differences with finite differences is able similarly to shed great light upon the metaphysics of infinitesimal calculus.”

(Pg. 44)

“The day will come, when, by study pursued through several ages, the things now concealed will appear with evidence; and posterity will be astonished that truths so clear had escaped us.”

Laplace quoting Seneca

“Probability is relative, in part to this ignorance, in part to our knowledge. We know that of three or a greater number of events a single one ought to occur; but nothing induces us to believe that one of them will occur rather than the others.”

Laplace, Concerning Probability

The Rule of Succession is essentially the world’s first simple algorithm for tackling problems of small data. It holds up well when we know all the possible outcomes before observing the data. If we apply it to problems where the prior state of knowledge is different or not well understood, the results may not be useful, because the question being asked is then of a different nature, based on different prior information.
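To make that dependence on prior information concrete, here is a hedged sketch that swaps Laplace’s uniform prior for a Beta prior expressed as “pseudo-counts” of imagined wins and losses. The Beta-prior generalization is standard Bayesian statistics rather than anything in Laplace’s essay, and the function and parameter names are my own. I restrict the pseudo-counts to whole numbers only to keep the fractions exact.

```python
from fractions import Fraction


def posterior_predictive(successes: int, trials: int,
                         prior_wins: int = 1, prior_losses: int = 1) -> Fraction:
    """P(next success) under a Beta(prior_wins, prior_losses) prior.

    The default Beta(1, 1) prior is uniform and reproduces Laplace's rule.
    """
    return Fraction(successes + prior_wins,
                    trials + prior_wins + prior_losses)


# One success in one attempt, under three different states of prior knowledge.
print(posterior_predictive(1, 1))        # uniform prior: Laplace's 2/3
print(posterior_predictive(1, 1, 1, 9))  # prior leaning heavily towards failure
print(posterior_predictive(1, 1, 9, 1))  # prior leaning heavily towards success
```

The same single observation leads to very different answers depending on what we believed beforehand, which is why the rule should be applied with care when the uniform prior is not a fair description of what we know.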
