Laplace and the law of small data

One of my recent favorite reads is Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. Even in the age of “big” data, facing uncertainty with very little data is the norm in our daily lives, and Laplace offers us a rule of thumb for making an optimized decision even with a single observation.

Pierre-Simon, marquis de Laplace is a ubiquitous character both in the annals of eighteenth-century science and in my undergraduate mathematics years. The term “inverse Laplace transform” was met with an uncontrollable shudder during finals, especially if you had an allergy to solving differential equations with integral transforms.

The Marquis de Laplace

However, a few decades later, with the prevalence of matrix operations and linear algebra in deep learning and, by extension, my overall appreciation for advanced mathematics, I’ve been fascinated by some of his work. Laplace was a mathematician and physicist and appears all over the field. Known as the “French Newton”, he was a bona fide virtuoso with contributions like the Laplace transform, the Laplace equation and the Laplace operator, among other things. If that weren’t enough, he also enlightened the world with theories on black holes and gravitational collapse. He was also a marquis in the French court after the Bourbon Restoration (which, as much as it wants to, does not refer to the weekend festivities in my backyard; it refers to the period in French history following Napoleon’s downfall).

Laplace essentially wrote the first handbook on probability with “A Philosophical Essay on Probabilities” – a magnificent treatise that reflects the author’s depth of knowledge and curiosity. It is a bit dense in parts but a fascinating look at 18th-century French life through the eyes of a polymath. Even if you are not a deep researcher in the history of probability, the material is organized well enough to comb through for points of interest. Part 1 is a “philosophical essay on probabilities” while Part 2 is an “application of the calculus of probabilities”.

Laplace’s Rule of Succession, formulated primarily to address the sunrise problem, is extremely useful for computing probabilities when the possible outcomes are assumed equally likely a priori and we have very few observations.

Given that the sun has come up n days in a row, what is the probability it will rise tomorrow? One can imagine he got ridiculed for the question, since no one has ever known or experienced a day when the sun failed to rise, and if it does not rise tomorrow it is the end of the world anyway. More specifically, the setup does not seem realistic since it treats each day as an independent event, i.e. a separate random variable for the sun rising on each day.

We have evidence that the sun has risen n times in a row, but we don’t know the value of P, the underlying probability. Treating P as unknown brings to the fore a long-standing debate in statistics between frequentists and Bayesians. From the Bayesian point of view, since P is unknown, we treat it as a random variable with a distribution. As with Bayes’ theorem, we start with a prior belief about P before we have any data; once we collect data, we use Bayes’ rule to update that belief based on the evidence.

The integral calculus behind the derivation of this rule is masterfully explained in this lecture on moment generating functions (MGFs) by Joe Blitzstein. Amazing explanations if you can sit through the detailed derivations.

The probability of the sun rising tomorrow is (n+1)/(n+2) or, as Wikipedia puts it:

“if one has observed the sun rising 10000 times previously, the probability it rises the next day is 10001/10002 = 0.99990002. Expressed as a percentage, this is approximately 99.990002% chance.”

Pretty good odds it seems.

Essentially, per Laplace, for any possible drawing of w winning tickets in n attempts, the expectation is the number of wins + 1 divided by the number of attempts + 2, i.e. (w+1)/(n+2).

Said differently, if we run n experiments of which s are successes and n − s are failures, the probability that the next repetition will succeed is (s+1)/(n+2).

If I make 10 attempts at playing a musical piece and 8 of them succeed, then per Laplace my overall chance at this endeavor is 9/12, or 75%. If I play it once and succeed, the probability is 2/3 (66.6%), which is intuitively more sensible than assuming I have a 100% chance of nailing it the next time.
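
To make the arithmetic concrete, here is a minimal Python sketch of the rule versus the naive relative-frequency estimate; the function names are mine, purely for illustration.

```python
# Minimal sketch of Laplace's Rule of Succession vs. the naive estimate.
# Function names are illustrative, not from any particular library.

def rule_of_succession(successes: int, trials: int) -> float:
    """Laplace's estimate of the probability that the next trial succeeds."""
    return (successes + 1) / (trials + 2)

def naive_estimate(successes: int, trials: int) -> float:
    """Plain relative frequency, which is overconfident for tiny samples."""
    return successes / trials

if __name__ == "__main__":
    # The sunrise problem: 10000 observed sunrises in a row
    print(rule_of_succession(10_000, 10_000))  # 10001/10002 ≈ 0.99990002

    # Ten attempts at a musical piece, eight successes
    print(rule_of_succession(8, 10))           # 9/12 = 0.75

    # One attempt, one success: 2/3 rather than a rash 100%
    print(rule_of_succession(1, 1))            # ≈ 0.667
    print(naive_estimate(1, 1))                # 1.0
```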

Some fascinating quotes –

“Man, made for the temperature which he enjoys, and for the element which he breathes, would not be able, according to all appearance, to live upon the other planets. But ought there not to be an infinity of organization relative to the various constitutions of the globes of this universe? If the single difference of the elements and of the climates make so much variety in terrestrial productions, how much greater the difference ought to be among those of the various planets and of their satellites! The most active imagination can form no idea of it; but their existence is very probable.”

(Pg. 181)

“the transcendent results of calculus are, like all the abstractions of the understanding, general signs whose true meaning may be ascertained only by repassing by metaphysical analysis to the elementary ideas which have led to them; this often presents great difficulties, for the human mind tries still less to transport itself into the future than to retire within itself. The comparison of infinitely small differences with finite differences is able similarly to shed great light upon the metaphysics of infinitesimal calculus.”

(Pg. 44)

“The day will come, when, by study pursued through several ages, the things now concealed will appear with evidence; and posterity will be astonished that truths so clear had escaped us”

Laplace quoting Seneca

“Probability is relative, in part to this ignorance, in part to our knowledge. We know that of three or a greater number of events a single one ought to occur; but nothing induces us to believe that one of them will occur rather than the others.”

Laplace, Concerning Probability

The Rule of Succession is essentially the world’s first simple algorithm for the problems of small data. It holds up well when all possible outcomes are known before observing the data. If we apply it to problems where the prior state of knowledge is not well defined, the results may not be useful, because the question being asked is then of a different nature, based on different prior information.

ToneWood amp review

I rarely break out my fleet of electric guitars anymore, so my usual go-to is my trusty old Cordoba Iberia that’s usually within reach. Having instruments lying around the house is a huge part of getting to practice more. The ToneWood amp caught my eye immediately, as I’ve been looking for simple amplification while playing outdoors or in places with poor acoustics, where a little echo/reverb or delay can go a long way in justifying the piece I’m trying to play and even serve to feed some creativity.

3 essential knobs

It’s essentially a lightweight effects unit that mounts on the back of an acoustic guitar to give you amplification and a few effects. Magnets installed inside the guitar hold the ToneWood against the back; the unit picks up the signal from the acoustic’s pickup and sends it back through a “vibrating driver”, so the sound coming from the body is augmented with the effects. It essentially blends the natural guitar sound with the effects, which come out of the sound hole as a unified sonic experience. The patent explains the concept well and is pretty ingenious.

Magnetic attachment to the back

The natural sound of the unamplified guitar coming from the soundboard seamlessly blends with the effects radiating outward via the sound hole, creating a larger-than-life soundscape. All that’s required is some type of pickup installed in the guitar to provide a signal to the device. You can connect the ToneWood to an external amp/PA via the 1/4″ output port, and it has an iDevice interface that is great if you are in the Apple ecosystem. It also has the standard 1/4″ guitar input and a 1/8″ TRRS I/O for iDevices. The unit runs on 3 AA batteries for an average of about 8 hours.

The installation took me about 10 minutes. It required slackening the strings and placing an X-brace unit inside the guitar, positioning the magnets so that the ToneWood amp could attach itself to the outside back of the guitar. This took some adjusting and I’m not sure I’ve dialed in the optimal position, but it’s close enough. Once you stick the batteries in, it’s showtime. The display screen and knobs are intuitive and the barrier to entry here is phenomenally low.

From an effects perspective, it’s really everything you need considering you are playing an acoustic guitar. All the effects come with Gain and Volume settings.

  • Hall Reverb with Decay, Pre-delay and Hi-cut settings. These settings are accessed by pressing on the knobs on the ToneWood
  • Room Reverb with Decay, Pre-delay and Hi-cut settings
  • Plate Reverb with Decay, Pre-delay and Hi-cut settings
  • Delay with Speed, Feedback and Reverb. (Note: you are not going to sound like The Edge on a Skrydstrup switching system anytime soon with this)
  • Tremolo with Rate, Depth and Delay
  • Leslie-style tremolo with Rate, Depth and Reverb
  • Auto-Wah with Sensitivity, Envelope and Reverb
  • Overdrive with Drive, Filter and Reverb
  • DSP Bypass to mute the processor
  • Notch filters (Notch Low and Notch High) to filter out specific frequencies

There is also the ability to save effect settings based on the tweaks you make which seems useful though I’ve not really played around with it.

I’ve largely played around with the Hall and Room effects for my purposes. You can tweak these plenty, but I’d like to make sure I’m not in “wall of sound Spector mode” on my Cordoba for every track.

All in all, a great addition to enhance the acoustic and, more than anything else, the convenience factor is amazing. It’s much easier to optimize practice time now without switching guitars or hooking up effects racks to my Ibanez for a 10-minute session. If you want more control over ambience and soundscapes with minimal setup or complexity, this is it.

I recorded a quick demo with the Hall Reverb (Decay and Hi-cut set to default) and no audio edits, straight off the iPhone camera. The audio needs to be enhanced and it doesn’t fully do justice to the ToneWood sound. The jam is me noodling on S&G’s cover of Anji by Davey Graham. The nylon strings don’t lend themselves to much slack for bending at all, but the point was to capture a small moment of a few hours testing this wonderful amp.

Note – I don’t have any affiliation with ToneWood.

Spark AI Summit 2020 Notes

Spark + AI Summit - Databricks

Spark AI Summit just concluded this week and, as always, there were plenty of great announcements. (Note: I was one of the speakers at the event, but this post is more about the announcements and areas of my personal interest in Spark. The whole art of virtual public speaking is another topic.) The ML enhancements and their impact are a bigger topic, probably for another day, as I catch up with all the relevant conference talks and try out the new features.

Firstly, I think the online format worked for this instance. This summit (and I’ve been to it every year since its inception) was way more relaxing and didn’t leave me exhausted physically and mentally with information overload. Usually held at the Moscone in San Francisco, the event is a great opportunity to network with former colleagues, friends and industry experts, which is the most enjoyable part yet taxing in many ways with limited time to manage. The virtual interface was way better than most of the online events I’ve been to before – engaging and convenient. The biggest drawback was the networking aspect; the online networking options just don’t cut it. The video conferencing fatigue probably didn’t hit since it was only 3 days, and the videos were available instantly online, so plenty of them are in my “Watch Later” list. (Note: the talks I refer to below are only the few I watched, so there are plenty of other interesting ones.)

The big announcement was the release of Spark 3.0 – hard to believe, but it’s been 10 years of evolution. I remember 2013 as the year I was adapting to the Hadoop ecosystem, writing map-reduce using Java/Pig/Hive for large-scale data pipelines, when Spark started emerging as a fledgling competitor with an interesting distributed computational engine built on Resilient Distributed Datasets (RDDs). Fast-forward to 2020 and Spark is the foundation of large-scale data implementations across the industry, and its ecosystem has evolved to frameworks and engines like Delta and MLflow which are also gaining a foothold as foundational to the enterprise across cloud providers. More importantly, smart investment into its DataFrames API has lowered the barrier to entry with SQL access patterns.

There were tons of new features introduced, but I’m focusing on the ones I paid attention to. There has not been a major release of Spark for years, so this is pretty significant (2.0 was in 2016).

Spark 3.0

  • Adaptive Query Execution: At the core, this helps change the number of reducers at runtime. It divides the SQL execution plan into stages earlier, instead of the usual RDD graph, so that later stages, which have the full picture of the entire query plan and all shuffle dependencies, can inject optimizations before the remaining queries get executed. The execution plans can be auto-optimized at runtime, for example changing a SortMergeJoin to a BroadcastJoin where applicable. This is huge in large-scale implementations, where I see tons of poorly formed queries eating a lot of compute thanks to skewed joins. More specifically, settings like the number of shuffle partitions (spark.sql.shuffle.partitions, which has defaulted to 200 since inception) can now be tuned automatically based on the reducers required for the map-stage output – i.e. set high for larger data and lower for smaller data. (A minimal config sketch appears after this list.)

  • Dynamic partition pruning: Enables filter pushdown instead of full table scans by adding a partition-pruning filter. At the core, if you consider a broadcast hash join between a fact and a dimension table, the enhancement intercepts the results of the broadcast and plugs them in as a filter on the fact table, as opposed to the earlier approach of pushing the broadcast hash table derived from the dimension table out to every worker to evaluate the join against the fact. This is huge for avoiding scans of irrelevant data. This session explains it well.

  • Accelerator-aware scheduler: Traditionally, the bottlenecks have been small data in partitions that GPUs find hard to handle, cache-processing inefficiencies, slow I/O on disk, UDFs that need CPU processing and a lot more. But GPUs are massively useful for high-cardinality datasets, matrix operations, window operations and transcoding situations. Originally termed Project Hydrogen, this feature helps Spark be GPU-aware. The cluster managers now have GPU support that schedulers can request from; the schedulers understand GPU allocations to executors and assign GPUs appropriately to tasks. The GPU resources still need to be configured to assign the appropriate amounts, and resources can be requested at the executor, driver and task level. This also allows resources to be discovered on the nodes along with their assignments. This is supported in YARN, Kubernetes and Standalone modes.
  • Pandas UDF overhaul: Extensive use of Python type annotations – this becomes more and more imperative as codebases scale and newer engineers would otherwise take longer to understand and maintain the code effectively, instead of writing hundreds of test cases or, worse, finding out about problems from irate users. Great documentation and examples here (see the typed UDF example after this list).

  • PySpark UDF: Another feature I’ve looked forward to is enabling PySpark to handle pandas vectorized UDFs as an array. In the past, we needed to jump through god-awful hoops like writing Scala helper functions and then switching over to Python in order to read these as arrays. ML engineers will welcome this.

  • Structured Streaming UI: Great to see more focus on the UI and additional features appearing in the workspace interface, which frankly had gotten pretty stale over the last few years. The new tab shows more statistics for running and completed queries and, more importantly, will help developers debug exceptions quickly rather than poring through log files.

  • Proleptic Gregorian calendar: Spark switched to this from the previous hybrid (Julian + Gregorian) calendar. It uses Java 8 API classes from the java.time packages that are based on ISO chronology. The “proleptic” part comes from extending the Gregorian calendar backward to dates before 1582, when it was officially introduced.

    Fascinating segue here –

The Gregorian calendar (named after Pope Gregory XIII, not the guy who gave us the awesome Gregorian chants – that was Gregory I) is what we use today as part of ISO 8601:2004. The Gregorian calendar replaced the Julian calendar due to the latter’s inaccuracies in determining the actual length of a year, plus its inability to handle the complexities of adding a leap year roughly every 4 years. Catholics liked this and adopted it, while Protestants held out with suspicion for 200 years (!) before England and the colonies switched over, advancing the date from September 2 to September 14, 1752. Would you hand over 11 days of your life as a write-off? In any case, you can thank Gregory XIII for playing a part in this enhancement.

  • There was also a whole lot of talk on better ANSI SQL compatibility that I need to look at more closely. Working with a large user base of SQL users, this can only be good news.

  • A few smaller super useful enhancements:
    • “Show Views” command
    • “Explain” output formatted for better usability instead of a jungle of text
    • Better documentation for SQL
    • Enhancements on MLlib, GraphX
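
For reference, a rough PySpark sketch of how some of these Spark 3.0 knobs get switched on, assuming a Spark 3.0+ session; the values are illustrative rather than a tuned production config, and the GPU settings are cluster-level so they appear only as comments.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark3-features-sketch")
    # Adaptive Query Execution: re-optimize plans at runtime
    .config("spark.sql.adaptive.enabled", "true")
    # Coalesce shuffle partitions instead of relying on the static 200 default
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    # Automatically handle skewed join partitions
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # Dynamic partition pruning for fact/dimension-style joins
    .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
    .getOrCreate()
)

# Accelerator-aware scheduling is configured at submit/cluster level, e.g.:
#   --conf spark.executor.resource.gpu.amount=1
#   --conf spark.task.resource.gpu.amount=0.25
#   --conf spark.executor.resource.gpu.discoveryScript=/path/to/getGpusResources.sh
```

And the new typed pandas UDF style, where the UDF type is inferred from the Python type hints:

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def fahrenheit_to_celsius(f: pd.Series) -> pd.Series:
    # Vectorized: operates on a whole pandas Series per batch
    return (f - 32) * 5.0 / 9.0

df = spark.createDataFrame([(32.0,), (212.0,)], ["temp_f"])
df.select(fahrenheit_to_celsius("temp_f").alias("temp_c")).show()
```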

Useful talks:

Delta Engine/Photon/Koalas

Being a big Delta proponent, this was important to me, especially as adoption grows and large-scale implementations need continuous improvements in the product to justify rising storage costs on cloud providers as scale increases.

The Delta Engine now has an improved query optimizer and a native vectorized execution engine written in C++. This builds on the optimized reads and writes (and faster seek times) of today’s NVMe SSDs, which eclipse the SATA SSDs found in previous generations. Gaining these efficiencies out of the CPU at the bare-metal level is significant, especially as data teams deal with more and more unstructured and high-velocity data. The C++ implementation helps exploit data-level and instruction-level parallelism, as explained in detail in the keynote by Reynold Xin. There were some interesting benchmarks on strings using regex to demonstrate the faster processing. Looking forward to more details on how the optimization works under the hood and to implementation guidelines.

Koalas 1.0 now implements 80% of the pandas APIs, and we can invoke accessors to use the PySpark APIs from Koalas. Better type hinting and a ton of enhancements on DataFrames, Series and Indexes, along with support for Python 3.8, make this another value proposition on Spark.
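
A tiny illustrative example of the pandas-style API, assuming the databricks.koalas package is installed on a Spark cluster; the data here is made up.

```python
import pandas as pd
import databricks.koalas as ks

# Start from a plain pandas DataFrame and move it onto Spark via Koalas
pdf = pd.DataFrame({"artist": ["Beatles", "Beatles", "Stones"], "plays": [3, 5, 2]})
kdf = ks.from_pandas(pdf)

# Familiar pandas-style operations, executed on Spark under the hood
print(kdf.groupby("artist")["plays"].sum())

# Drop down to a native Spark DataFrame when needed
sdf = kdf.to_spark()
sdf.show()
```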

The focus on Lakehouse in ancillary meetings was encouraging and augurs well for data democratization on a single linear stack versus fragmenting data across data warehouses and data lakes. The Redash acquisition will provide another option for large-scale enterprises for easy-to-use dashboarding and visualization capabilities on these curated data lakes. I hope to see more public announcements on that topic.

MLflow

There were more announcements around the MLflow model-serving aspects with Model Registry (announced in April), which lets data scientists track model lifecycle across versions such as Staging, Production or Archived. With MLflow in the Linux Foundation, a vendor-independent non-profit manages the project, which helps evangelize it to a larger audience.

  • Autologging: Enables automatic logging of Spark datasource information at read time, without the need for explicit log statements. mlflow.spark.autolog() will enable autologging for Spark datasources and records the relevant paths and versions when using Delta Lake, so the managed Databricks implementation definitely looks slicker with the UI. Implementation is as easy as attaching the mlflow-spark JAR and then calling mlflow.spark.autolog() (a minimal sketch follows this list). More significantly, this enables the cloning of models.
  • On Azure – the updated mlflow.azureml.deploy API for deploying MLflow models to AzureML. This now uses the up-to-date Model.package() and Model.deploy() APIs.
  • Model schemas for input and output, plus custom metadata tags for tracking, which means more metadata to track – which is great.
  • Model Serving: The ability to deploy models via a REST endpoint on hosted MLflow, which is great. I would have loved to see more turnkey methods to deploy to an agnostic endpoint, say a managed Kubernetes service – the current implementation targets Databricks clusters from what I noticed.

  • Lots of cool UI fixes, including highlighting different parameter values when comparing runs and UI plot updates that scale to thousands of points.
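
A minimal sketch of the Spark autologging flow, assuming an MLflow 1.x setup with the mlflow-spark JAR attached to the cluster; the Delta path below is hypothetical.

```python
import mlflow
import mlflow.spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mlflow-autolog-sketch").getOrCreate()

# Record Spark datasource info (path, format, version for Delta) on active runs
mlflow.spark.autolog()

with mlflow.start_run():
    df = spark.read.format("delta").load("/tmp/events")  # hypothetical path
    mlflow.log_param("row_count", df.count())
```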

Useful talks:

Looking forward to trying out the new MLflow features which will go on public preview later in July.

Let it read – A character-based RNN generating Beatles lyrics

The Bob Spitz biography stares at me all day, taunting me to finish it one of these days, as I spend most of my waking hours at my home-office desk. It probably subliminally influenced me to write a quick app to generate lyrics using a recurrent neural net. The Databricks Community Edition or Google Colab makes it a breeze to train these models at zero or a reasonable price point using GPUs.

All the buzz around GPT-3 and my interest in Responsible AI have been inspiring some late-night coding sessions, and I’ve been blown away by the accuracy of some of these pretrained language models. I also wanted to play around with Streamlit, which saves me from tinkering with JavaScript frameworks and whatnot while trying to deploy an app. The lack of a WSGI option limits some deployment options, but I found it relatively easy to containerize and deploy on Azure.

The official TensorFlow tutorial is great for boilerplate, which in turn is based on this magnum opus. The tutorial is pretty straightforward and Google Colab is great for training on GPUs.

Plenty of libraries are available to scrape data – this one comes to mind. The model is character-based and predicts the next character given a sequence. I tweaked around with training on small batches of text and finally settled on around 150 characters to start seeing some coherent structures. My source data could do better; it stops after Help!, I believe. On my to-do list is to embellish it with the full discography to make the model better.

The model summary is as defined below:

I didn’t really have to tweak the sequential model much to start seeing decent output. Three layers were enough, along with a Gated Recurrent Unit (GRU). The GRU seemed to give me better output than an LSTM, so I let it be…
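
For reference, a sketch along the lines of the official TensorFlow text-generation tutorial that this follows; vocab_size, embedding_dim and rnn_units are placeholders I picked, not necessarily the values used here.

```python
import tensorflow as tf

vocab_size = 60      # unique characters in the lyrics corpus (placeholder)
embedding_dim = 256
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    # Embedding -> GRU -> Dense: the three layers described above
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer="glorot_uniform"),
        tf.keras.layers.Dense(vocab_size),  # logits over the next character
    ])

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=64)
model.summary()
```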

As per the documentation, for each character the model looks up the embedding, runs the GRU one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character.

The standard tf.keras.losses.sparse_categorical_crossentropy loss function is my usual go-to in this case because the classes are all mutually exclusive. I couldn’t get past 50 epochs on my MacBook Pro without the IDE hanging, so I had to shift to a Databricks instance, which got the job done with no sweat on a 28 GB memory, 8-core machine.

All things must pass, and 30 minutes later we had a trained model, thanks to an early-stopping callback monitoring the loss.
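
Continuing the sketch above, the compile-and-train step with the early-stopping callback looks roughly like this; the toy dataset is a stand-in for the real batched (input, target) character dataset.

```python
import numpy as np

# Toy stand-in for the batched (input, target) character dataset
x = np.random.randint(0, vocab_size, size=(64, 100))
y = np.roll(x, -1, axis=1)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(64, drop_remainder=True)

# Loss on integer character labels, computed from the raw logits
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)

model.compile(optimizer="adam", loss=loss)

early_stop = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=3)
history = model.fit(dataset, epochs=50, callbacks=[early_stop])
```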

The coolest part of the RNN is the text-generation function from the documentation, where you can configure the number of characters to generate. It uses the start string and the RNN state to get the next character: the next character is sampled from the categorical distribution over the predicted logits, and the sampled character is fed back into the model as the next input, with the model’s state carried forward from step to step. This is the magical mystery of the model. Plenty more training to do as I wait for the Peter Jackson release.
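
The generation loop from the tutorial boils down to something like this; char2idx and idx2char are the character/index lookups built from the corpus, and in the tutorial the trained weights are loaded into a copy of the model rebuilt with batch_size=1 before calling it.

```python
def generate_text(model, start_string, char2idx, idx2char,
                  num_generate=300, temperature=1.0):
    # Vectorize the seed string
    input_eval = tf.expand_dims([char2idx[c] for c in start_string], 0)
    generated = []

    model.reset_states()
    for _ in range(num_generate):
        # Logits for every position; sample from the last timestep
        predictions = tf.squeeze(model(input_eval), 0) / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # Feed the sampled character back in as the next input
        input_eval = tf.expand_dims([predicted_id], 0)
        generated.append(idx2char[predicted_id])

    return start_string + "".join(generated)
```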

Link to app: https://stream-web-app.azurewebsites.net/

Github: https://github.com/vishwanath79/let_it_read

Alesis V49

I finally bit the bullet and added the Alesis V49 to my motley collection of instruments. 49 keys is enough for a novice like me. I was around MIDI keyboards a lot during the years I actively played guitar, but never really felt the inclination to own one. The piano and guitar combination is still my favourite duo when it comes to making music that feels good – think massive drum fills, shredding guitars with minor arpeggios, edgy bass, FM bells and a moody atmosphere, and you’ve got my code.

The main goal here is to get better at programming accompaniments to instrumental guitar and at arranging compositions. The 8 LED backlit pads respond OK, though I haven’t really used them a lot. The sensitivity seems to vary across the pads, so they’re a bit of a bother to use, especially for fast passages. The 4 assignable knobs are super convenient to program. I really like the Pitch and Mod wheels, which let you wail, so I can channel my inner Sherinian.

The 49 keys are full-sized and semi-weighted. Compared to a legit piano, they size up pretty well and feel great. The keys are slightly harder to press down if you do a direct comparison with a full-fledged digital keyboard, but it’s not too far off. Alesis also allows you to alter the channel, transposition, octave and velocity curve for the keys – I did that immediately after unboxing, as most of my earlier research suggested it was best to fix that early, before I got too comfortable with the “stiffness”.

Overall build quality seems solid. The 37-inch width of the unit is where its appeal lies for me: I can easily place it on my work desk without needing separate storage space for it. The package also comes with Ableton Live Lite Alesis Edition and MIDI editor software. The activation and account creation steps were a breeze – a firmware upgrade and I was good to go in 15 minutes.

Demo