2020 in Books


I usually set reading goals at the beginning of the year to counter the array of distractions that have dented my reading habit over the last few decades. Goodreads is great for maintaining some accountability, and I’m in awe of some of my friends who seem to knock out 80-100 books easily every year despite their hectic work lives. I tried to steer myself away from my usual mix of management/leadership books this year to reignite my fiction reading. The results were mixed thanks to massive mood swings through the year, and I ended up with a random mix of tech, biographies, music, non-fiction and horror/sci-fi tomes that helped distract me from a depressing year.

Complete list of books here.

Some notable (and not so notable) reads this year:

Greetings from Bury Park by Sarfraz Manzoor: A Springsteen tribute along with an immigrant experience? Sign me up! After enjoying Blinded by the Light with my better half, I had to get my hands on Sarfraz Manzoor’s ode to the Boss set against the backdrop of immigrant life in 80s Luton. The book follows a “non-linear” timeline which may put off some readers, but I found it a wonderful read, albeit with some cliched moments that were probably dramatized. It also sent me on a pilgrimage through the Boss’ older catalog and some of his late 80s/early 90s work that I love (think Tunnel of Love/Human Touch era). As a huge music fan and an immigrant who traveled halfway around the world to chase dreams built on the foundational goals of exposing myself to new cultures/thinking and seeing my favorite musicians live in the flesh, the book resonated with me on the universality of music across cultures.

The Billionaire Raj: A Journey Through India’s New Gilded Age: India is a land where the gap between the haves and have-nots is skewed at an unprecedented scale. You don’t understand it till you see it, and even when you see it, your understanding is peripheral at best when confronted with the magnitude of the problem. James Crabtree’s detailed tome on the lifestyles of the rich and famous in India helps decrypt the inequalities and the “crony capitalism” that ensures the system stays that way. The nexus between politics, Bollywood and business tycoons is deciphered and connected to explain the irony of situations like a ~2 billion dollar personal home in Mumbai towering over the squalor of a million people in a nearby slum.

Tesseract by Alex Garland: I’m a big fan of Alex Garland’s works: 28 Days Later, The Beach, Sunshine and Ex Machina. On top of it all, he slam dunked the best version of Dredd on us innocent fans and immortalized Karl Urban in that role. I had huge hopes for Tesseract, but it ended up being a random story of disparate characters linked together by the thinnest of chances and a shallow plot. Not his finest hour, and I could not wait to finish it as it felt as slow as the drug-induced slow-motion sequences in Dredd, which were way more enjoyable. Still a huge AG fan regardless.

Seven Years in Tibet by Heinrich Harrer: Heinrich Harrer’s discovery of Tibet deserves its own post.

Devolution by Max Brooks: Anyone who has read World War Z knows the doom-laden nightmarish scenarios that the author can generate, and this one is no exception. Obnoxious characters dealing with first-world problems in their isolated eco-friendly community encounter an even more ominous situation when Mt. Rainier erupts and cuts them off. Hell breaks loose in the form of rampaging sasquatches who (thankfully in some cases) start taking out the characters one by one. A personal journal left behind serves as the narrative, and tons of interesting sasquatch legends abound, including this one of Roosevelt’s own “encounter” with the sasquatch.

The Long Walk by Stephen King: Big fan of the King. The plot and premise were great here, but it did slow down towards the end. In hindsight, this book is probably best enjoyed as an audiobook while on the treadmill. Overall an OK read.

Ibn Fadlan and the Land of Darkness: Arab Travellers in the Far North: Travel log of Ibn Fadlan, a tenth-century diplomat who, in 922 AD, was sent on a mission from Baghdad to the far north by the caliph Muqtadir. His journal serves as an important account of life in what is now Russia and the Middle East and the areas in between. Repetitive and slow moving in places, the book abounds with interesting details on the trading routes, strange customs, Viking raids, savage rituals, food habits, wealth management, religion and a disconnected world at a strange dark time.

Bloodcurdling Tales of Horror and the Macabre: The Best of H. P. Lovecraft by H.P. Lovecraft: Lovecraft (despite his controversial views) is the master of horror fiction, and I’ve always enjoyed the predictable nature of a classic Lovecraft pulp, more so for the constant reminders that much of what there is to know is unknown and we are all inconsequential specks of dust in the vastness of time. (To somewhat paraphrase Carl Sagan, if the entire history of earth was compressed into 365 days, humans would have existed for ~30 seconds.) Usually a Halloween ritual, this year was no different, with Lovecraft distracting me from the horrors raging outside. This collection has all the usual ones including The Call of Cthulhu, The Dunwich Horror and The Shadow over Innsmouth.

Idea Makers: Personal Perspectives on the Lives & Ideas of Some Notable People by Stephen Wolfram: Short biographies of giants in the field of mathematics/computer science. Despite the author’s inclination to insert the use of Mathematica into a “what-if” scenario in every biography, this was still an enjoyable read. The author is a giant in his field and I suppose some level of braggadocio is expected. The fascinating backstories of characters like Leibniz, Babbage, Feynman and Ramanujan are written powerfully by someone with an awe-inspiring grasp of the minutiae of their research areas. My highlights here.


The tech books followed a predictable pattern of excellent reads this year helping me keep abreast of work-related topics mostly focused on Spark, Databases and ML.

Database Internals: A deep-dive into how distributed data systems work by Alex Petrov: Informative, but I would have preferred more examples with practical scenarios. There is no code and the material is mostly conceptual. Some good references to papers for subsequent reading. The first part of the book deals primarily with storage and covers an in-depth discussion of B-trees and their variants. The second half is focused on distributed systems and has useful sections on consensus protocols. Concepts like “2-phase commits” are explained well with figures. However, the lack of practical examples/code and the overall dry subject matter made this a laborious read. Good book for referencing theoretical concepts.

Practical Deep Learning for Cloud, Mobile, and Edge: Real-World AI & Computer-Vision Projects Using Python, Keras & Tensorflow: Plenty of examples and links for more research. The material is too vast to cover in one all-encompassing book, but this delivers in terms of practical tips that will find a place in any serious ML engineer’s repertoire. The consolidated list of tips is worth the price of the book alone. Excellent comparisons of Raspberry Pi, Jetson Nano, and Google Coral. The reinforcement learning sections could have used some more practical examples in areas like Q-learning, but overall a great read and reference material.


The music books this year included Van Halen (Eddie R.I.P.), Megadeth and Glyn Johns amongst others.

Hard to Handle: The Life and Death of the Black Crowes–A Memoir by Steve Gorman: Great read start to finish. The Black Crowes were one of the many soundtracks of my teenage years in the 90s. The first three albums are seminal works that stand out despite having to contend with a changing musical climate, with the rise of grunge and the decline of hair metal. The book is a page-turner for anyone even vaguely familiar with the Black Crowes. It made me go back and re-immerse myself in the catalog, especially albums like By Your Side, an underrated gem produced by the great Kevin Shirley, and Live at the Greek, their successful but short-lived collaboration with Jimmy Page. Gorman’s insight as a founding member and the frank admission of all the dysfunction make for a great story. I’m a huge fan of Rich Robinson’s use of open G tuning and this book has led to more inspired practicing in that vein.

Overall 35 books for the year which I can hopefully better in 2021. Current reading list here.

ONNX for ML Interoperability

Having been a Keras user since I read the seminal Deep Learning with Python, I’ve been experimenting with export formats for different frameworks to be more framework-agnostic.

ONNX (Open Neural Network Exchange) is an open format for representing traditional and deep learning ML models. Its key goal is to promote interoperability between a variety of frameworks and target environments. ONNX lets you export a fully trained model into its format and target diverse environments without manual optimization or painful rewrites of the model to accommodate each environment.
It defines an extensible computation graph model along with built-in operators and standard data types to allow for a compact, cross-platform representation for serialization. A typical use case is transfer learning, where you want to reuse the weights of a model, possibly built in another framework, in your own model. For example, if you build a model in Tensorflow, you get a protobuf (PB) file as output, and it would be great to have one universal format that you can convert to the PT format to load and reuse in Pytorch, or run with its own hardware-agnostic runtime.

For high-performance inference requirements in varied frameworks, this is great with platforms like NVIDIA’s TensorRT supporting ONNX with optimizations aimed at the accelerator present on their devices like the Tesla GPUs or the Jetson embedded devices.


The ONNX file is a protobuf-encoded tensor graph. The supported operators are documented here, and versioned sets of operators are referred to as “opsets”, i.e. operator sets. Opsets are defined for different runtimes in order to enable interoperability. The operators are a growing list of widely used linear operations, functions and other primitives used to deal with tensors.

The operations include most of the typical deep learning primitives, linear operations, convolutions and activation functions. The model is mapped to the ONNX format by executing the model, often with just random input data, and tracing the execution. The operations executed are mapped to ONNX operations so that the entire model graph is mapped into the ONNX format. The ONNX model is then saved as a .onnx protobuf file, which can be read and executed by a wide and growing range of ONNX runtimes.

Note – Opsets are fast evolving and with fast release cycles of competing frameworks, it may not always be easy to upgrade to the latest ONNX version if it breaks compatibility with other frameworks. The file format consists of the following:

  • Model: Top level construct
    • Associates version Info and Metadata with a graph
  • Graph: describes a function
    • Set of metadata fields
    • List of model parameters
    • List of computation nodes – Each node has zero or more inputs and one or more outputs.
  • Nodes: used for computation
    • Name of node
    • Name of an operator that it invokes
    • List of named inputs
    • List of named outputs
    • List of attributes

More details here.
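To make the Model / Graph / Node outline above a bit more concrete, here is a toy sketch using plain Python dataclasses. These are hypothetical stand-ins that only mirror the outline; they are not the classes from the real onnx package, and the version numbers and operator names are illustrative placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Toy stand-ins mirroring the outline above. NOT the real `onnx` API;
# every class and field name here is hypothetical.

@dataclass
class Node:
    name: str
    op_type: str                  # name of the operator the node invokes
    inputs: List[str]             # named inputs (zero or more)
    outputs: List[str]            # named outputs (one or more)
    attributes: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Graph:
    name: str
    parameters: Dict[str, List[float]]  # model parameters, e.g. weights
    nodes: List[Node]                   # computation nodes

@dataclass
class Model:
    ir_version: int               # placeholder version info
    opset_version: int
    metadata: Dict[str, str]
    graph: Graph

def describe(model: Model) -> List[str]:
    """Return one line per node in the form 'Op(inputs) -> outputs'."""
    return [
        f"{n.op_type}({', '.join(n.inputs)}) -> {', '.join(n.outputs)}"
        for n in model.graph.nodes
    ]

toy = Model(
    ir_version=7,
    opset_version=12,
    metadata={"producer": "example"},
    graph=Graph(
        name="linear_relu",
        parameters={"W": [0.5, -1.0], "b": [0.1]},
        nodes=[
            Node("n0", "MatMul", ["x", "W"], ["xw"]),
            Node("n1", "Add", ["xw", "b"], ["z"]),
            Node("n2", "Relu", ["z"], ["y"]),
        ],
    ),
)

for line in describe(toy):
    print(line)
```

Each node only refers to its inputs and outputs by name, which is what lets a runtime wire the graph together without knowing which framework produced it.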


The ONNX model can be run with ONNX runtime, which uses a variety of hardware accelerators for optimal performance. The promise of ONNX runtime is that it abstracts the underlying hardware so that developers can use a single set of APIs for multiple deployment targets. Note: the ONNX runtime is a separate project and aims to perform inference for any prediction function converted to the ONNX format.

This has advantages over dockerized pickle models, which are the usual approach in a lot of production deployments, where there are runtime restrictions (i.e. can run only in .NET or JVM), memory and storage overhead, version dependencies, and batch prediction requirements.

ONNX runtime has been integrated into WinML and Azure ML, with MSFT as its primary backer. Some of the new enhancements include INT8 quantization, which replaces floating point numbers with 8-bit integers to reduce model size and memory footprint and to increase efficiency, benchmarked here.

The usual path to proceed:

  • Train models with frameworks
  • Convert into ONNX with ONNX converters
  • Use onnx-runtime to verify correctness and Inspect network structure using netron (https://netron.app/)
  • Use hardware-accelerated inference with ONNX runtime ( CPU/GPU/ASIC/FPGAs)


To convert Tensorflow models, the easiest way is to use the tf2onnx tool from the command line. This converts the saved model to a model representation that includes the inference graph.

Here is an end-to-end example of saving a simple Tensorflow model , converting it to ONNX and then running the predictions using the ONNX model and verifying the predictions match.


However, one thing to consider while using this format is the lack of “official support” from frameworks like Tensorflow. For example, Pytorch does provide the functionality to export models into ONNX (torch.onnx); however, I could not find any function to import an ONNX model and output a Pytorch model. Considering that Caffe2, which is a part of PyTorch, fully supports ONNX import/export, it may not be totally unreasonable to expect an official importer (there is a proposal already documented here).

The Tensorflow converters seem to be part of the ONNX project, i.e. not an official/out-of-the-box Tensorflow implementation. The supported Tensorflow ops are documented here. The github repo is a treasure trove of information on the computation graph model and the operators/data types that power the format. However, as indicated earlier, depending on the complexity of the model (especially in transfer learning scenarios), you are likely to encounter conversion issues during function calls that may cause the ONNX converter to fail. In such cases, you may need to modify the graph in order to fit the format. I’ve run into a few issues with StatefulPartitionedCall ops, especially in transfer learning situations with larger encoders in language models.

I have also had to convert Tensorflow to PyTorch by first converting Tensorflow to ONNX, then the ONNX models to Keras using onnx2keras, and then converting to Pytorch using MMdnn, with mixed results, a lot of debugging and many abandoned attempts. However, I think using ONNX runtime for inference, rather than framework-to-framework conversion, will be a better use of ONNX.

The overall viability of a universal format like ONNX, though well intentioned and highly sought after, may never fully come to fruition with so many divergent interests and priorities amongst the major contributors, though the need for it cannot be disputed.


Replay is a collaboration track and part of an evolving experiment with multi-tracked guitars revolving around cyclic patterns. More collaborations and sounds to follow.

Arjun on Bass

Sunder – Drums (Instagram and Facebook – @onlysunder)

Mini-glossary of terms in audio production

I’ve used a lot of audio engineering terms over the years and realized that a lot of them were not exactly what I was referring to/meant. While talking to a lot of experienced audio engineers, I’ve always found the below glossary useful to convey my objectives effectively. Hopefully this serves as starter boilerplate for more research with more terms to be added on.  A lot of these and more are covered in Coursera’s excellent course on the Technology of Music Production.

Nature of sound


Tracks, Files and Editing



Dynamic Effects

Filter and Delay Effects

Nature of sound

Amplitude: Size of the vibration of sound. Larger vibrations indicate higher amplitude (louder sound). Measured in decibels. There are multiple places in the signal flow where we measure amplitude:

  • In the air: dBSPL or decibels of sound pressure level
  • In the digital domain: dBFS or decibels full scale

Compression: Compression is one of the most commonly used types of dynamic processing. It is used to control uneven dynamics in individual tracks in a multitrack mix and also in creative ways, such as shaping the decay of notes and getting fatter sounds. Compressors provide gain reduction, the amount of which is set by controls like Ratio.

  • For example, a ratio of 4:1 means audio that goes 4 dB above the threshold will be reduced so that it only goes 1 dB above it.
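The 4:1 example can be worked through in a few lines. This is a minimal sketch of a static, hard-knee downward compressor operating on levels in dB; real compressors also have attack, release and knee settings that are ignored here, and the function name is just for illustration.

```python
def compress_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static hard-knee downward compression of a level given in dB."""
    if level_db <= threshold_db:
        return level_db  # below the threshold the signal is untouched
    # Above the threshold, the overshoot is divided by the ratio.
    return threshold_db + (level_db - threshold_db) / ratio

# Threshold at -20 dB, ratio 4:1. A signal 4 dB over the threshold (-16 dB)
# comes out only 1 dB over it (-19 dB).
print(compress_db(-16.0, -20.0, 4.0))  # -19.0
print(compress_db(-30.0, -20.0, 4.0))  # -30.0 (unchanged)
```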

Decibel: The bel and the decibel are units of measurement of sound intensity. “Bel” is a shortening of the name of inventor Alexander Graham Bell (1847-1922).

  • A bel is equivalent to ten decibels and is used to compare two levels of power in an electrical circuit.
  • The normal speaking range of the human voice is about 20-50 decibels.
  • Noise becomes painful at 120 dB. Sounds above 132 dB lead to permanent hearing damage and eardrum rupture.
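Since the decibel compares two power levels on a logarithmic scale, a short sketch helps show how the numbers behave. The helper names below are illustrative; the formula used is the standard power form, dB = 10 · log10(P / P_ref).

```python
import math

def power_ratio_to_db(power: float, reference: float) -> float:
    """Express the ratio of two power levels in decibels."""
    return 10.0 * math.log10(power / reference)

def db_to_bels(db: float) -> float:
    """One bel is ten decibels."""
    return db / 10.0

# 100x the power is 20 dB (2 bels); doubling the power is roughly 3 dB.
print(power_ratio_to_db(100.0, 1.0))
print(db_to_bels(power_ratio_to_db(100.0, 1.0)))
print(round(power_ratio_to_db(2.0, 1.0), 2))
```

Because the scale is logarithmic, every extra 10 dB means ten times the power, which is why the jump from a 50 dB voice to a 120 dB noise is far larger than the raw numbers suggest.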

Frequency: Speed of the vibration which determines the pitch of the sound. Measured as the number of wave cycles that occur in one second.

Propagation: Sequence of waves of pressure (sound) moving through a medium such as water, solids or air.

Timbre: Term used to indicate distinguished characteristics of a sound. For example a falsetto versus a vibrato.

Transducer: A device that converts one energy type to another. A microphone is a common example: it converts sound pressure variations in the air into voltage variations in a wire.

Digital Audio Workstation (DAW)

Bit Rate: Product of sampling rate and sampling depth (and the number of channels), measured in bits per second. Higher bit rates indicate higher quality. Compressed audio formats (mp3) have lower bit rates than uncompressed ones (wave).
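As a quick worked example of the bit rate arithmetic, the sketch below computes the bit rate of uncompressed CD-quality stereo audio and compares it with a 128 kbit/s MP3. The function name is illustrative; the CD figures (44.1 kHz, 16-bit, 2 channels) are the standard values.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd_bps = pcm_bit_rate(44_100, 16, channels=2)
print(cd_bps)             # 1411200 bits per second, i.e. about 1411 kbit/s
print(cd_bps / 128_000)   # 11.025, so a 128 kbit/s MP3 is roughly 1/11th the size
```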

Buffer Size: Amount of time allocated to the DAW for processing audio. Used to balance the delay between the audio input (say a guitar plugged in) and the sound playback, and to minimize any delay. It usually works best to set the buffer size to a lower amount to reduce latency for more accurate monitoring. However, this puts more load on the computer’s processing power and could cause crashes or interruptions.
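A rough way to see the buffer size tradeoff is to convert a buffer into milliseconds of delay. This sketch assumes the buffer size is given in samples (which is how most DAWs expose it) and only accounts for a single buffer, so real round-trip latency will be higher once the audio interface and drivers are included.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time taken to fill one buffer, in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0

# Smaller buffers mean less delay but more work for the CPU.
for size in (64, 256, 1024):
    print(size, round(buffer_latency_ms(size, 44_100), 2), "ms")
```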

Sampling Rate: Rate at which samples of an analog signal are taken to be converted to digital form. Expressed in samples per second (hertz). Higher sampling rates indicate better sound as they capture more samples per second. An analogy could be FPS, i.e. frames per second, in video. Some of the values we come across are 8 kHz, 44.1 kHz and 48 kHz. 44.1 kHz is the standard sampling rate for audio CDs.

Sampling Depth: Measured in bits per sample, it indicates the number of distinct amplitude values available for each audio sample. An 8-bit sample depth gives 2^8 = 256 distinct amplitudes for each audio sample. The higher the sample depth, the better the quality. This is analogous to image processing, where a higher number of bits indicates higher quality.
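The 2^8 = 256 arithmetic extends to other bit depths, and each extra bit adds roughly 6 dB of theoretical dynamic range. A small sketch with illustrative helper names:

```python
import math

def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude values available per sample."""
    return 2 ** bits

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range, about 6.02 dB per bit."""
    return 20.0 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1), "dB")
```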

Sine wave: Curve representing periodic oscillations of constant amplitude. Considered the most fundamental building block of sound. A sine wave can be easily recognized by the ear. Since sine waves consist of a single frequency, they are used to depict/test audio.

In 1822, French mathematician Joseph Fourier discovered that sinusoidal waves can be used as simple building blocks to describe and approximate any periodic waveform, including square waves. Fourier used it as an analytical tool in the study of waves and heat flow. It is frequently used in signal processing and the statistical analysis of time series.
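Fourier's idea can be tried out directly: adding odd sine harmonics with amplitudes 1, 1/3, 1/5, ... gets closer and closer to a square wave. The sketch below evaluates the partial sum (4/π) · Σ sin((2k+1)x) / (2k+1); the function name is illustrative.

```python
import math

def square_wave_partial_sum(x: float, n_terms: int) -> float:
    """Approximate a unit square wave at angle x using n_terms odd harmonics."""
    total = 0.0
    for k in range(n_terms):
        harmonic = 2 * k + 1          # 1, 3, 5, ...
        total += math.sin(harmonic * x) / harmonic
    return 4.0 / math.pi * total

# At x = pi/2 the ideal square wave equals 1. A single sine overshoots
# (4/pi is about 1.273); adding more harmonics pulls the value towards 1.
for n in (1, 5, 50):
    print(n, round(square_wave_partial_sum(math.pi / 2, n), 3))
```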


  • Wave: Uncompressed at chosen bit rate and sampling speed. Takes up memory and space.
  • AIFF: Audio Interchange File Format. Uncompressed file format (originally from Apple). High level of device compatibility and used for mastering files of audio captured live digitally.
  • MP3: Compressed audio layer of the larger MPEG video file format. Smaller sizes and poorer quality than the formats above. Compressing at a 128 kbit/s setting results in a file about 1/11th the size of the original data.
  • MIDI: Musical Instrument Digital Interface, commonly defined as a set of instructions telling the computer’s sound card how to create music. MIDI files are small in size and control the notes of each instrument, loudness, scale, pitch etc.

Tracks, Files and Editing

  • Cycling: Usually refers to musical cycles formed by a group of cycles. Useful for arrangements and re-arrangements.
  • Comping: Process where you use the best parts of multiple takes and piece them together into one take. DAWs such as ProTools allow multiple takes to be stored in a folder in a single track.
  • Destructive editing: Editing in which changes are permanently written to the audio file, though these can usually be undone in reverse order using the DAW’s undo history. Helps when you have less processing power and need to see changes applied immediately, and in cases where you know you don’t want to repeat that change again. Non-destructive editing, by contrast, uses computer processing power to make changes on the fly.
  • Fades: Fades are progressive increases (fade-in) or decreases (fade-out) of audio signals. Most commonly used when a song has no obvious ending. Crossfades are transitional regions that bridge two regions so the ending of one fades into the beginning of the other.


  • Controllers: Hardware or software that generates and transmits MIDI data to MIDI-enabled devices, typically to trigger sounds and control parameters of an electronic music performance.
  • Quantization: One of the more important concepts. Quantization has many meanings based on the task to be performed, but in this context it’s about making music with precision with respect to the timing of notes. To compensate for human error in precision, quantization can help nail the right note at the mathematically perfect time. While great for MIDI note data, it does become challenging but a worthwhile effort to quantize MIDI tracks. Most DAWs have this built in, but it is not a magic wand to blow away all your problems. Quantization in my experience works best when I’ve performed a track with an acceptable level of timing.
  • Velocity: Force with which a note is played, used to make MIDI sounds more human (or more mechanical if that’s the intent). This typically controls the volume of the note and can be used to control dynamics, filters, multiple samples and other functions.
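Timing quantization boils down to snapping each note's start time to the nearest grid line. Here is a minimal sketch, assuming note start times measured in beats and a sixteenth-note grid of 0.25 beats; the `strength` parameter (how far to pull a note towards the grid) is a hypothetical knob modelled on what many DAWs offer, not any specific DAW's API.

```python
from typing import List

def quantize(times_in_beats: List[float], grid: float = 0.25,
             strength: float = 1.0) -> List[float]:
    """Pull each note start time towards the nearest multiple of `grid`."""
    quantized = []
    for t in times_in_beats:
        snapped = round(t / grid) * grid            # nearest grid line
        quantized.append(t + (snapped - t) * strength)
    return quantized

played = [0.02, 0.27, 0.49, 1.01]          # slightly off-grid performance
print(quantize(played))                    # fully snapped to the grid
print(quantize(played, strength=0.5))      # moved only halfway, keeps some feel
```

Partial strength is one way to keep a human feel: a performance that is already close to the grid tightens up, while a sloppy take still sounds sloppy, just less so.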


  • Automation: Process where we can program the arrangement, levels and EQ to change based on a pre-determined pattern. For example, automation to increase the reverb just before the chorus or add delays to a particular part in the mix.
  • Auxiliary sends: Type of output used in mixers while recording. Allows the producer to create an “auxiliary” mix where you can control each input channel on the mixer. This helps route multiple input channels to a single output send. A mixer can choose how much of a signal needs to be sent to the aux channel. In Ableton, two Aux channels (titled A and B) are created by default. Aux channels are great for blending in effects such as reverb and delay.
  • Channel strip – Type of preamp with additional signal processing units, similar to an entire channel in a mixing console (example).
  • Bus: Related to Aux sends above, a bus is a point in the signal flow where multiple channels are routed into the same output. In Ableton, this is the Master channel – where all the tracks merge together before being exported.
  • Unbalanced cables: Pick up noise (from electrical, radio and power interference from nearby cables) and are best used for short distances, for example a short cable to connect different analog pedals with each other. Quarter-inch TS (tip, sleeve) cables are unbalanced.
  • Balanced cables: Have a ground wire and carry two copies of the same signal with opposite polarities. Any noise picked up along the cable is added equally to both copies, with the same polarity. At the other end of the cable, the polarity of the negative copy is reversed so both copies are back in sync; the noise on that copy is now reversed too, so it cancels out against the noise on the other copy, effectively eliminating it.
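The noise cancellation in a balanced cable is easy to check numerically. In this toy sketch the "hot" wire carries the signal, the "cold" wire carries the inverted signal, and the same noise is added to both; the sample values are made up for illustration.

```python
from typing import List

def balanced_receive(signal: List[float], noise: List[float]) -> List[float]:
    """Simulate sending a signal down a balanced cable and recovering it."""
    hot = [s + n for s, n in zip(signal, noise)]     # signal + noise
    cold = [-s + n for s, n in zip(signal, noise)]   # inverted signal + same noise
    # The receiver flips the cold wire and sums: (s + n) - (-s + n) = 2s.
    return [(h - c) / 2.0 for h, c in zip(hot, cold)]

signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
noise = [0.1, -0.2, 0.05, 0.3, -0.1, 0.2]
print(balanced_receive(signal, noise))  # matches `signal` up to float rounding
```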

Dynamic Effects

  • Downward compressor: Same as a regular compressor, which reduces the level of louder material. When explicitly called out, “upward compressors” bring up the volume of the quiet material. One of the most important effects in audio engineering. Compressors reduce (compress) the dynamic range of the signal.
  • Expander: Expands dynamic range. Louder parts become louder, quieter parts become quieter. Making it louder means amplifying the signal that passes the threshold; it is the opposite of a compressor.
  • Gate: Provides a floor level for the signal to cross to get through. If the signal is below the gate level, it will be treated as silence. Used to cut out the audio when it’s quiet.
  • Limiter: Serves as a ceiling above which the signal cannot pass. It’s essentially a compressor with a very high ratio: as the ratio increases, the compressor behaves more and more like a limiter.
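The floor and ceiling ideas can be sketched on levels expressed in dB. This is a simplified, static view with illustrative function names; it ignores attack, release and hold times that real gates and limiters have.

```python
def gate_db(level_db: float, floor_db: float) -> float:
    """Anything below the floor is treated as silence (-inf dB)."""
    return level_db if level_db >= floor_db else float("-inf")

def limit_db(level_db: float, ceiling_db: float) -> float:
    """Nothing gets past the ceiling."""
    return min(level_db, ceiling_db)

def compress_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static hard-knee compressor, included for comparison with the limiter."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

print(gate_db(-50.0, -40.0))             # -inf: too quiet, gated out
print(gate_db(-30.0, -40.0))             # -30.0: passes through
print(limit_db(-4.0, -10.0))             # -10.0: clamped to the ceiling
print(compress_db(-4.0, -10.0, 100.0))   # very close to -10: high ratio acts like a limiter
```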

Filter and Delay Effects

  • Convolution reverb: Convolution reverbs digitally simulate the reverberation of a physical or virtual space. They are based on the mathematical convolution operation and use a pre-recorded audio sample of the impulse response (IR) of the space being modeled. An impulse response is a representation of how a signal changes as it goes through a system; here it is the recording of a real space that we apply to our audio with convolution. The advantage of a convolution reverb is its ability to accurately simulate reverb for natural-sounding effects. The disadvantage is that it can be computationally expensive. Most convolution plugins come with a wide variety of audio files representing a large number of real spaces, so DAWs have large selections where we can simulate different places, say a small club versus a stadium.
  • Algorithmic reverb: Algorithmic reverbs are based on the settings we set in our DAW rather than on a recorded impulse response. They use delay lines, loops and filters to simulate the general effects of a reverb environment. All non-convolution reverbs can be considered algorithmic. Algorithmic reverbs are kind of like synthesizers, since we are creating the impression of a space with an algorithm, i.e. some sort of mathematical representation. They create echoes using mathematical algorithms to simulate the delays that occur in reverb. The tradeoff is that these may sound less natural than convolution reverbs.
  • Comb filtering: Occurs when two copies of the same audio signal arrive at the listener’s ears at different times due to a delay. The resulting frequency response looks like a comb when graphed out.
  • Dry/wet: Dry sounds have no effects or modifications of any kind; they are raw, unprocessed sound. Wet sounds are processed sounds with effects that are added while recording or after mixing.
  • Low Shelf filter: Low shelf filters cut or boost signals at frequencies below a threshold. These usually use “cutoff frequencies” to cut/boost lower frequencies, mostly to ensure instruments don’t interfere with each other. Used a lot when EQing guitars and vocals in a mix.
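To show what "applying an impulse response with convolution" means in practice, here is a bare-bones sketch of direct convolution on tiny made-up sample lists. Real convolution reverbs work on much longer impulse responses and typically use FFT-based methods for speed; this version is only meant to show the arithmetic.

```python
from typing import List

def convolve(dry: List[float], impulse_response: List[float]) -> List[float]:
    """Direct (time-domain) convolution of a dry signal with an impulse response."""
    wet = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            wet[i + j] += x * h   # every input sample triggers a scaled copy of the IR
    return wet

dry = [1.0, 0.0, 0.0, 0.5]    # two short hits
ir = [1.0, 0.5, 0.25]         # toy "room": direct sound plus two decaying echoes
print(convolve(dry, ir))      # [1.0, 0.5, 0.25, 0.5, 0.25, 0.125]
```

The nested loop is also why convolution reverbs are computationally expensive: the work grows with the length of the audio times the length of the impulse response.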

Deep Learned Shred Solo

Music generation with Recurrent Neural Nets has been of great interest to me, with projects like Magenta displaying amazing feats of ML-driven creativity. AI is increasingly being used to augment human creativity, and this trend will lay creative blocks to rest in the future. As someone who is usually stuck in a musical rut, this is great for spurring creativity.

With a few covid-induced reconnects with old friends (some of whom are professional musicians) and some inspired late night midi programming on Ableton, I decided to modify some scripts/tutorials that have been lying around on my computer to blend deep learning into the mix and compose music around it, as I research the best ways to integrate Deep Learning into original guitar music compositions.

There are plenty of excellent blogs and code on the web on LSTMs, including this one and this one on generating music using Keras. There is plenty of boilerplate code on github that demonstrates LSTMs and GRUs for creating music. For this project, I was going for recording a guitar solo based on artists I like and setting up a template for future experimentation for research purposes. A few mashed-up solos of Yngwie served as the source data, but the source data could have been pretty much anything in the midi format, and it helps to know how to manipulate these files in the DAW, which in my case was Ableton. Most examples on the web have piano midi files that generate music in isolation. However, I wanted to combine the generated music with minimal accompaniment so as to make it “real”.

With the track being trained on being in the key of F minor, I also needed to make sure I had some accompaniment in F minor, for which I recorded a canned guitar part with some useful drum programming thanks to EZDrummer.

Tracks in Ableton

Note: this was for research purposes only and for further research into composing pieces that actually make sense based on the key being fed into the model. 

Music21 is invaluable for manipulating midi via code. Its utility is that it lets us manipulate starts, durations and pitch. I used Ableton to plug the generated midi notes into an instrument along with programmed drums and rhythm guitars.

Step 1:

Find the midi file(s) you want to base your ML solo on. In this case, I’m going for generating a guitar solo to layer over a backing track. This could be pretty much anything as long as it’s midi that can be processed by Music21.

Step 2:

Preprocessing the midi file(s): The original midi file had guitars over drums, bass and keyboards, so the goal was to extract the list of notes first and save them. The instrument.partitionByInstrument() function separates the stream into different parts according to the instrument. If we have multiple files, we can loop over them and partition each by individual instrument. This returns a list of notes and chords in the file.

from glob import glob

from music21 import chord, converter, instrument, note
from tqdm import tqdm

songs = glob(' /ml/vish/audio_lstm/YJM.mid')  # this could be any midi file to be trained
notes = []
for file in tqdm(songs):
    midi = converter.parse(file)  # convert all supported data formats to music21 objects
    notes_parser = None
    try:
        # partition parts for each unique instrument
        parts = instrument.partitionByInstrument(midi)
    except Exception:
        parts = None
        print("No uniques")

    if parts:
        notes_parser = parts.parts[0].recurse()
    else:
        notes_parser = midi.flat.notes  # flatten to get all the notes in the stream
        print("parts == None")

    for element in notes_parser:
        if isinstance(element, note.Note):  # single note: store its pitch name, e.g. 'C4'
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):  # chord: store pitch classes joined by '.', e.g. '0.4.7'
            notes.append('.'.join(str(n) for n in element.normalOrder))
print("notes:", notes)

Step 3:

Creating the model inputs: Convert the items in the notes list to integers so they can serve as model inputs. We create arrays for the network input and output to train the model. We have 5741 notes in our input data and have defined a sequence length of 50 notes. The input sequence will be 50 notes and the output array will store the 51st note for every input sequence that we enter. Then we reshape and normalize the input vector sequence. We also one-hot encode the integer outputs so that the number of columns equals the number of categories, giving a network output shape of (5691, 92). I’ve added some of the output as comments so the results are easier to follow.

import numpy as np
from tensorflow.keras.utils import to_categorical

pitch_names = sorted(set(notes))   # ['0', '0.3.7', '0.4.7', '0.5', '1', '1.4.7', '1.5.8', '1.6', '10', '10.1.5',..]
note_to_int = dict((name, number) for number, name in enumerate(pitch_names))  # {'0': 0, '0.3.7': 1, '0.4.7': 2, '0.5': 3, '1': 4, '1.4.7': 5,..}
sequence_length = 50
len(pitch_names) # 92
range(0, len(notes) - sequence_length, 1) # range(0, 5691)
# Define input and output sequences
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i: i + sequence_length]
    sequence_out = notes[i + sequence_length]
    network_input.append([note_to_int[char] for char in sequence_in])
    network_output.append(note_to_int[sequence_out])  # the note that follows the window
print("network_input shape (list):", (len(network_input), len(network_input[0]))) # network_input shape (list): (5691, 50)
print("network_output:", len(network_output)) # network_output: 5691
patterns = len(network_input)
print("patterns , sequence_length", patterns, sequence_length) # patterns , sequence_length 5691 50
network_input = np.reshape(network_input, (patterns, sequence_length, 1)) # reshape to array of (5691, 50, 1)
print("network input", network_input.shape) # network input (5691, 50, 1)
n_vocab = len(set(notes))
print('unique notes length:', n_vocab) # unique notes length: 92
network_input = network_input / float(n_vocab)  # normalize the integer indices
# one hot encode the output vectors: to_categorical(y, num_classes=None)
network_output = to_categorical(network_output, num_classes=n_vocab)
network_output.shape # (5691, 92)
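
To make the windowing concrete, here is a toy version of the same idea in plain Python, using a made-up six-note melody and a sequence length of 3 instead of 50. The melody and the helper name make_sequences are purely illustrative assumptions and not part of the project code.

```python
def make_sequences(notes, sequence_length):
    """Slide a window over notes; each window predicts the note that follows it."""
    pitch_names = sorted(set(notes))
    note_to_int = {name: idx for idx, name in enumerate(pitch_names)}
    inputs, targets = [], []
    for i in range(len(notes) - sequence_length):
        window = notes[i:i + sequence_length]
        inputs.append([note_to_int[n] for n in window])
        targets.append(note_to_int[notes[i + sequence_length]])
    # one-hot encode the targets: one column per distinct note
    one_hot = [[1 if col == t else 0 for col in range(len(pitch_names))]
               for t in targets]
    return inputs, one_hot


toy_notes = ['E4', 'G4', 'A4', 'G4', 'E4', 'A4']  # hypothetical melody
inputs, one_hot = make_sequences(toy_notes, sequence_length=3)
print(inputs)   # [[1, 2, 0], [2, 0, 2], [0, 2, 1]]
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

With 5741 notes and a window of 50, the same loop yields 5741 - 50 = 5691 input/target pairs, which is where the (5691, 50) and (5691, 92) shapes come from.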

Step 4:

Model: We invoke Keras to build out the model architecture using LSTM layers. Each input sequence of notes is used to predict the next note. The code below uses a standard model architecture from tutorials without too many tweaks. There are plenty of tutorials online that explain the model far better than I can, such as this: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Training on the midi input can be expensive and time consuming, so I suggest setting a high epoch number with callbacks defined on the metric you want to monitor. In this case, I monitored loss, stopped early when it plateaued, and created checkpoints for recovery that save the model as ‘weights.musicout.hdf5’. Also note that I trained this on the community edition of Databricks for convenience.

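def create_model():
  from tensorflow.keras.models import Sequential
  from tensorflow.keras.layers import Activation, Dense, LSTM, Dropout, Flatten

  model = Sequential()
  model.add(LSTM(128, input_shape=network_input.shape[1:], return_sequences=True))
  model.add(Dropout(0.2))  # dropout rates and the Dense(256) width are assumed, typical tutorial values
  model.add(LSTM(128, return_sequences=True))
  model.add(Flatten())  # collapse the per-timestep outputs into one vector
  model.add(Dense(256))
  model.add(Dropout(0.3))
  model.add(Dense(n_vocab))  # one output per distinct note/chord
  model.add(Activation('softmax'))  # probabilities that match the one-hot targets
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
  return model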

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
model = create_model()

save_early_callback = EarlyStopping(monitor='loss', min_delta=0,
                                    patience=3, verbose=1)
epochs = 5000
filepath = 'weights.musicout.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=0, save_best_only=True)
model.fit(network_input, network_output, epochs=epochs, batch_size=32, callbacks=[checkpoint,save_early_callback])

Step 5:

Predict: Once we have the model trained, we can start generating notes from the trained model weights by feeding the model a sequence of notes. We pick a random index into the input sequences as a starting point. In my case, it involved calling the model.predict function for 1000 notes that can then be converted to a midi file. The results might vary at this stage; for some reason I saw some degradation after 700 notes, so some tuning is required here.

start = np.random.randint(0, len(network_input)-1)  # randomly pick an index into the input sequences as a starting point
print("start:", start)
int_to_note = dict(enumerate(pitch_names))  # reverse mapping: integer -> note/chord string
# network_input was normalized above, so scale the seed back to integer note indices
pattern = np.rint(network_input[start].flatten() * n_vocab).astype(int)
prediction_output = [] # store the generated notes
print("pattern.shape:", pattern.shape)
pattern[:10] # peek at the first 10 values of the seed

# generating 1000 notes

for note_index in range(1000):
    prediction_input = np.reshape(pattern, (1, len(pattern), 1))
    prediction_input = prediction_input / float(n_vocab)

    prediction = model.predict(prediction_input, verbose=0) # predict a vector of probabilities over the vocabulary
    predict_index = np.argmax(prediction)  # index of the most probable next note
    #print("Prediction in progress..", predict_index, prediction)
    result = int_to_note[predict_index]
    prediction_output.append(result)

    # Next input to the model: append the prediction and drop the oldest note
    pattern = np.append(pattern, predict_index)
    pattern = pattern[1:]

print('Notes generated by model...')
prediction_output[:25] # Out[30]: ['G#5', 'G#5', 'G#5', 'G5', 'G#5', 'G#5', 'G#5',...
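
One possible tuning knob, not used for the track in this post, is to sample from the predicted probabilities instead of always taking the argmax, which may reduce long runs of the same note. The sketch below is an assumption rather than the method used here: sample_with_temperature is a hypothetical helper and the probabilities are made up.

```python
import math
import random


def sample_with_temperature(probabilities, temperature=1.0, rng=random):
    """Pick an index from a probability vector; low temperature approaches argmax."""
    # rescale log-probabilities by the temperature, guarding against log(0)
    scaled = [math.log(max(p, 1e-12)) / temperature for p in probabilities]
    highest = max(scaled)
    # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - highest) for s in scaled]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # fallback for floating point round-off


toy_prediction = [0.1, 0.7, 0.2]  # hypothetical softmax output over three notes
# a very low temperature behaves like argmax and picks index 1
print(sample_with_temperature(toy_prediction, temperature=0.01, rng=random.Random(0)))
```

In the generation loop this would stand in for np.argmax, for example predict_index = sample_with_temperature(prediction[0], temperature=0.8), with the temperature value itself being something to experiment with.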

Step 6:

Convert to Music21: Now that we have our prediction_output list with the predicted notes, it’s time to convert it back into a format that Music21 can recognize, with the objective of converting that back to a midi file.

from music21 import stream

offset = 0
output_notes = []

# create note and chord objects based on the values generated by the model
# convert to Note objects for music21
for pattern in prediction_output:
    if ('.' in pattern) or pattern.isdigit():  # pattern is a chord (dot-joined pitch classes)
        notes_in_chord = pattern.split('.')
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    else:  # pattern is a single note
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

# Convert to midi
midi_output = stream.Stream(output_notes)
print('Saving Output file as midi....')
midi_output.write('midi', fp=' /ml/vish/audio_lstm/yjmout.midi')

Step 7:

Once we have the midi file with the generated notes, the next step was to load the midi track into Ableton. The next steps were standard recording processes one would follow to record a track in the DAW.

a) Compose and Record the Rhythm guitars, drums and Keyboards.

Instruments/software I used:


b) Insert the midi track into the DAW and quantize and sequence accordingly. This can take significant time depending on the precision wanted. In my case, this was just a quick fun project not really destined for the charts so a quick rough mix and master sufficed.

The track is on soundcloud here. The solo kicks in around the 16 second mark. Note that I did have to adjust the pitch to C to blend in with the rhythm track, though the model was originally trained on a track in F minor.

There are other ways of dealing with more sophisticated training, like using different activation functions or normalizing inputs. GRUs are another way to get past this problem, and I plan to iterate on more complex pieces blending deep learning with my compositions. This paper gives a great primer on the difference between LSTMs and GRUs: https://www.scihive.org/paper/1412.355

TFDV for Data validation

As I work with my teams to build out a robust feature store these days, it’s becoming even more imperative to ensure the quality of feature engineering data. The models that gain efficiency out of a performant feature store are only as good as the underlying data.

TensorFlow Data Validation (TFDV) is a Python package from the TF Extended ecosystem. The package has been around for a while but has now evolved to the point of being extremely useful for machine learning pipelines, both as part of feature engineering and for detecting data drift. Its main functionality is to compute descriptive statistics, infer a schema, and detect data anomalies. It’s well integrated with the Google Cloud Platform and Apache Beam. The core API uses Apache Beam transforms to compute statistics over input data.

I end up using it in cases where I need quick checks on data to validate and identify drift scenarios before starting expensive training workflows. This post is a summary of some of my notes on the usage of the package. Code is here.

Data Load

TFDV accepts CSV, Dataframes or TFRecords as input.

The CSV integration and the built-in visualization function make it relatively easy to use within Jupyter notebooks. The library takes input feature data and analyzes it feature by feature to visualize it. This makes it easy to get a quick understanding of the distribution of values, and helps identify anomalies and training/test/validation skew. It is also a great way to discover bias in the data, since you can see aggregates of values that are skewed towards certain features.

As will be evident, with a trivial amount of code you can spot issues immediately: missing columns, inconsistent distributions, and data drift scenarios where a newer dataset has different statistics compared to the data used for earlier training.

I used a dataset from Kaggle to quickly illustrate the concept:

import tensorflow_data_validation as tfdv
TRAIN = tfdv.generate_statistics_from_csv(data_location='Data/Musical_instruments_reviews.csv', delimiter=',')
# Infer schema
schema = tfdv.infer_schema(TRAIN)

This generates a data structure that stores summary statistics for each feature.

TFDV Schema

Schema Inference

The schema properties describe every feature present in the 10261 reviews. Example:

  • their type (STRING)
  • Uniqueness of features – for example 1429 unique reviewer IDs.
  • the expected domains of features.
  • the min/max of the number of values for a feature in each example.

The statistics also record the most frequent values. For example, the reviewer ID A2EZWZ8MBEDOLN has 36 occurrences:

top_values {
        value: "A2EZWZ8MBEDOLN"
        frequency: 36.0
datasets {
  num_examples: 10261
  features {
    type: STRING
    string_stats {
      common_stats {
        num_non_missing: 10261
        min_num_values: 1
        max_num_values: 1
        avg_num_values: 1.0
        num_values_histogram {
          buckets {
            low_value: 1.0
            high_value: 1.0
            sample_count: 1026.1
          buckets {
            low_value: 1.0
            high_value: 1.0
            sample_count: 1026.1

Schema inference is usually tedious but becomes a breeze with TFDV. This schema is stored as a protocol buffer.

schema = tfdv.infer_schema(TRAIN)

The schema also generates definitions like “Valency” and “Presence”. I could not find too much detail in the documentation but I found this useful paper that describes it well.

  • Presence: The expected presence of each feature, in terms of a minimum count and fraction of examples that must contain the feature.
  • Valency: The expected valency of the feature in each example, i.e., minimum and maximum number of values.
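
As a rough illustration of what these two definitions measure, here is a toy calculation in plain Python. It is only a sketch of the idea, not how TFDV computes it internally; the helper name presence_and_valency and the example records are made up.

```python
def presence_and_valency(examples, feature):
    """Summarise how often a feature appears and how many values it carries."""
    counts = [len(ex[feature]) for ex in examples if feature in ex and ex[feature]]
    return {
        'count': len(counts),                     # examples containing the feature
        'fraction': len(counts) / len(examples),  # share of examples containing it
        'valency': (min(counts), max(counts)) if counts else (0, 0),
    }


# hypothetical examples: each feature maps to a list of values
toy_examples = [
    {'reviewerName': ['Vish'], 'tags': ['guitar', 'amp']},
    {'reviewerName': ['Sam'], 'tags': ['strings']},
    {'tags': ['picks', 'capo', 'strap']},
    {'reviewerName': ['Alex'], 'tags': ['tuner']},
]
print(presence_and_valency(toy_examples, 'reviewerName'))
# {'count': 3, 'fraction': 0.75, 'valency': (1, 1)}
print(presence_and_valency(toy_examples, 'tags'))
# {'count': 4, 'fraction': 1.0, 'valency': (1, 3)}
```

Here reviewerName is present in 3 of 4 examples and always carries exactly one value, while tags is always present but carries between one and three values.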

TFDV has inferred reviewerName as STRING, and the universe of values around it is termed the Domain. Note: TFDV can also encode your fields as BYTES. I’m not seeing any function call in the API to update the column type as-is, but you could easily update it externally if you want to explicitly specify a string. The documentation explicitly advises reviewing the inferred schema and refining it as required, so as to embellish this auto-inference with our domain knowledge of the data. You could also update the feature’s data type to BYTES, INT, FLOAT or STRUCT.

# Convert to BYTES
from tensorflow_metadata.proto.v0 import schema_pb2
tfdv.get_feature(schema, 'helpful').type = schema_pb2.BYTES  # BYTES has the enum value 1

Once loaded, you can generate the statistics from the csv file.
For a comparison, and to simulate a dataset validation scenario, I cut down Musical_instruments_reviews.csv to 100 rows to compare with the original, and also added an extra feature called ‘Internal’ with the values A, B and C randomly interspersed across the rows.

Visualize Statistics

After this you can use the ‘visualize_statistics’ call to first visualize the two datasets based on the schema of the first dataset (TRAIN in the code). Even though this is limited to two datasets, it is a powerful way to identify issues immediately. For example, it can right off the bat identify missing features, such as over 99.6% of values missing in the feature “reviewerName”, and it splits the visualization into numerical and categorical features based on its inference of the data type.

# Load test data to compare
TEST = tfdv.generate_statistics_from_csv(data_location='Data/Musical_instruments_reviews_100.csv', delimiter=',')
# Visualize both datasets
tfdv.visualize_statistics(lhs_statistics=TRAIN, rhs_statistics=TEST, rhs_name="TEST_DATASET",lhs_name="TRAIN_DATASET")

A particularly nice option is the ability to choose a log scale for validating categorical features. The ‘Percentages’ option can show quartile percentages.


Anomalies can be detected with the validate_statistics call and shown using the display_anomalies call. The long and short descriptions allow easy visual inspection of the issues in the data. However, for large-scale validation this may not be enough, and you will need tooling that can handle a stream of defects being presented.

# Display anomalies
anomalies = tfdv.validate_statistics(statistics=TEST, schema=schema)
tfdv.display_anomalies(anomalies)

The various kinds of anomalies that can be detected and their invocation are described here. Some especially useful ones are:


Schema Updates

Another useful feature here is the ability to update the schema and values to make corrections. For example, to insert a particular value into a domain:

# Insert Values
names = tfdv.get_domain(schema, 'reviewerName').value
names.insert(6, "Vish") # inserts "Vish" at index 6 (the 7th position) of the reviewerName domain

You can also adjust the minimum fraction of values that must be present in the domain, and choose to drop the feature if it is below a certain threshold.

# Relax the minimum fraction of values that must come from the domain for feature reviewerName
name = tfdv.get_feature(schema, 'reviewerName')
name.distribution_constraints.min_domain_mass = 0.9
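
To show what that threshold is checking, here is a toy calculation in plain Python. It reflects my reading of min_domain_mass as the minimum fraction of observed values that must come from the domain; domain_mass is a hypothetical helper and the values are made up, so treat it as a sketch rather than TFDV's implementation.

```python
def domain_mass(values, domain):
    """Fraction of observed values that belong to the schema's domain."""
    in_domain = sum(1 for v in values if v in domain)
    return in_domain / len(values)


domain = {'A', 'B', 'C'}  # hypothetical domain for a feature
observed = ['A', 'B', 'C', 'A', 'D', 'B', 'A', 'C', 'B', 'A']  # one unexpected 'D'
mass = domain_mass(observed, domain)
print(mass)         # 0.9
print(mass >= 0.9)  # True: acceptable with a threshold of 0.9
print(mass >= 1.0)  # False: would be flagged with a threshold of 1.0
```

Relaxing the threshold to 0.9 therefore tolerates up to 10% of values falling outside the known domain before an anomaly is raised.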


The ability to split data into ‘Environments’ helps indicate the features that are not necessary in certain environments. For example, we may want the ‘Internal’ column to be in the TEST data but not the TRAIN data. Features in the schema can be associated with a set of environments using:

  •  default_environment
  •  in_environment
  •  not_in_environment
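
# schema2 is assumed here to be the schema inferred from the TEST statistics, which include 'Internal'
schema2 = tfdv.infer_schema(TEST)
# All features are by default in the TESTING environment.
schema2.default_environment.append('TESTING')

# Specify that the 'Internal' feature is not in the TESTING environment.
tfdv.get_feature(schema2, 'Internal').not_in_environment.append('TESTING')

tfdv.validate_statistics(TEST, schema2, environment='TESTING')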

Sample anomaly output:

string_domain {
  name: "Internal"
  value: "A"
  value: "B"
  value: "C"
}
default_environment: "TESTING"
anomaly_info {
  key: "Internal"
  value {
    description: "New column Internal found in data but not in the environment TESTING in the schema."
    severity: ERROR
    short_description: "Column missing in environment"
    reason {
      short_description: "Column missing in environment"
      description: "New column Internal found in data but not in the environment TESTING in the schema."
    }
    path {
      step: "Internal"
    }
  }
}
anomaly_name_format: SERIALIZED_PATH

Skews & Drifts

The ability to detect data skews and drifts is invaluable. However, drift here does not indicate a divergence from the mean; it refers to the “L-infinity” norm of the difference between the summary statistics of the two datasets. We can specify a threshold which, if exceeded for the given feature, flags the drift.

Let’s say we have two vectors [2,3,4] and [-4,-7,8]. The L-infinity norm of their difference is the maximum absolute value of the element-wise differences, so in this case the maximum absolute value in [6,10,-4], which is 10.
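
The same arithmetic can be checked in a few lines of plain Python. The second half applies the idea to normalized value counts of a categorical feature, which is my understanding of what the comparator looks at; the helper names and the counts are made up for illustration.

```python
def linf_distance(a, b):
    """L-infinity distance: the largest absolute element-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


print(linf_distance([2, 3, 4], [-4, -7, 8]))  # 10


def normalize(counts):
    """Turn raw value counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def categorical_linf(counts_a, counts_b):
    """L-infinity distance between two normalized value-count distributions."""
    pa, pb = normalize(counts_a), normalize(counts_b)
    keys = set(pa) | set(pb)
    return max(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in keys)


train_counts = {'A': 50, 'B': 30, 'C': 20}  # hypothetical value counts
test_counts = {'A': 20, 'B': 30, 'C': 50}
distance = categorical_linf(train_counts, test_counts)
print(distance > 0.01)  # True: this shift would exceed a 0.01 threshold
```

Because only the single largest difference matters, one value whose share changes a lot is enough to trip the threshold even if every other value is stable.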

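# Skew comparison (the TEST statistics are assumed to play the role of serving data here)
tfdv.get_feature(schema, 'helpful').skew_comparator.infinity_norm.threshold = 0.01
skew_anomalies = tfdv.validate_statistics(statistics=TRAIN, schema=schema, serving_statistics=TEST)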

Sample Output:

anomaly_info {
  key: "helpful"
  value {
    description: "The Linfty distance between training and serving is 0.187686 (up to six significant digits), above the threshold 0.01. The feature value with maximum difference is: [0, 0]"
    severity: ERROR
    short_description: "High Linfty distance between training and serving"
    reason {
      short_description: "High Linfty distance between training and serving"
      description: "The Linfty distance between training and serving is 0.187686 (up to six significant digits), above the threshold 0.01. The feature value with maximum difference is: [0, 0]"
    }
    path {
      step: "helpful"
    }
  }
}
anomaly_name_format: SERIALIZED_PATH

The drift comparator is useful in cases where the same data is loaded on a frequent basis and you need to watch for anomalies in order to re-engineer features. The validate_statistics call combined with the drift_comparator threshold can be used to monitor for any changes that you need to act on.

#Drift comparator
tfdv.get_feature(schema,'helpful').drift_comparator.infinity_norm.threshold = 0.01
drift_anomalies = tfdv.validate_statistics(statistics=TRAIN,schema=schema,previous_statistics=TRAIN)

Sample Output:

anomaly_info {
  key: "reviewerName"
  value {
    description: "The feature was present in fewer examples than expected."
    severity: ERROR
    short_description: "Column dropped"
    reason {
      short_description: "Column dropped"
      description: "The feature was present in fewer examples than expected."
    }
    path {
      step: "reviewerName"
    }
  }
}
You can easily save the updated schema in the format you want for further processing.

Overall, this has been useful to me mainly for models within the TensorFlow ecosystem, and the documentation indicates that using components like StatisticsGen with TFX makes it a breeze to use in pipelines, with out-of-the-box integration on a platform like GCP.

Using TFDV to identify anomalies from feature drift and inference decay before time-consuming preprocessing/training steps is a no-brainer; however, defect handling is up to the developer to incorporate. It’s also important to consider that one’s domain knowledge of the data plays a huge role in optimizing data according to your needs, so an auto-fix on all data anomalies may not really work in cases where a careful review is unavoidable.

This can also be extended for overall general data quality by applying to any validation cases where you are constantly getting updated data for the features. The application of TFDV could even be post-training for any data input/output scenario to ensure that values are as expected.

Official documentation is here.

Autoencoders for Data Anomalies

With more and more emphasis on data anomaly detection and the proliferation of build/buy options, I’ve been exploring autoencoders for a few projects. In a nutshell, autoencoders are a type of neural network that takes an input (image, data), reduces it down to its core features and then reverses the process to recreate the input. The key aspect is that the encoding is learned in an unsupervised manner, hence the ‘auto’.

For example, think of dismantling a picture of an automobile, taking out every part and representing (encoding) the parts as representative components such as the chassis and wheels, and then reassembling the picture (decoding) from the encoding while minimizing some expected amount of reconstruction error.

Autoencoders use an encoder that learns a concise representation of the input data, and a decoder that reconstructs the input from that compressed representation. A lot of the literature online calls this compressed vector the “latent space representation”.

The seminal paper on the subject that shows the benefits of autoencoders has been dissected many times. It demonstrates the use of Restricted Boltzmann Machines (a 2-layer autoencoder consisting of a visible and a hidden layer) that learn the difference between the hidden and visible layers using a metric called K-L divergence, and it provides a greater dimensionality reduction than Principal Component Analysis. Thankfully the implementation is much more approachable than some of the background math used to prove the model!

These are feedforward, non-recurrent neural networks with an input layer, an output layer and one or more hidden layers, with the number of output nodes matching the number of input nodes. They minimize “noise” instead of predicting a target variable as we do in supervised learning implementations. Hence, they don’t require labels, which qualifies them as unsupervised.

In a market rife with products offering “data quality” solutions, using autoencoders to detect anomalies has the potential to be a low-cost, easy-to-use solution built in house to add to existing options.

My focus has been more on exploring this for analyzing data anomalies in structured data. In terms of cost/benefit, one could argue that a neural network is overkill compared to rule-based checks on the data, which is a valid point; rule-based checks are used extensively in large enterprises instead of neural net deployments. However, squashing the input data into a smaller representative vector helps in cases where we deliberately need dimensionality reduction and need to recognize outliers at scale. There is tons of material on the web on image processing using autoencoders for use cases such as image compression, image denoising and medical imaging. For example, fascinating results by converting THIS to THIS make colorizing an engrossing endeavor. There are also tons of applications in the Natural Language Processing field for understanding text, word embeddings and learning the semantic meaning of words.

Autoencoders, unlike GANs, can’t generate new data points, since their core goal is to learn an identity function using compression. Also, if the goal is just to achieve compression, they are poor general-purpose image compressors.

There are a few types of an Autoencoder well described here:

  • Denoising autoencoder
  • Sparse Autoencoder
  • Deep Autoencoder
  • Contractive Autoencoder
  • Undercomplete Autoencoder
  • Convolutional Autoencoder
  • Variational Autoencoder

Most of the examples I found online dealt with images, so for exploration I used Faker to generate a million records to simulate a data scenario of regular versus non-regular coffee consumers. The irregulars were determined by an arbitrary rule, say those who spent less than a specified threshold.

The objective was to have the Autoencoder learn from the fabricated data examples on what the values for the “regular” customers were, test against a holdout dataset from the “regular” group and then use the model to identify anomalies post reconstruction to identify cases of irregularities. Essentially, have the autoencoder achieve reasonable compression on the data and then identify anomalous inputs while reading out data with irregular values that do not match the original representation.

Below is a simple gist I created for a walkthrough of the process for a possible implementation with comments inline that should be self-explanatory.

  • Customer Test Score : 0.014549643226719535
  • Customer Validation Score : 0.014477944187138688
  • Irregular Customer Validation Score : 3.541257450764963

The scores reflected the anomaly for a synthetic dataset consisting of a million records, and I was able to use Spark to scale this to well over 10 million records. As you can tell, the irregular customer validation score (about 3.54) is well over the customer validation score (about 0.014), which is what separates the two groups. The next step is to try some of these approaches against more “production”-type data at scale and implement some alerting against this data to make this more actionable.

There are tons of considerations that make a quality data anomaly solution work for particular use cases, including but not limited to statistical analysis use cases, storage considerations, UI/UX for test case development, the right orchestration tools, database/data lake operability, scaling and development costs, and security audit requirements. Hence, the methodology for detection is just one piece of a much larger puzzle.
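
To turn scores like these into a yes/no decision, one simple option is to derive a threshold from the reconstruction errors of the regular holdout set and flag anything above it. The sketch below assumes mean squared error as the score and a mean plus three standard deviations cut-off; both choices, the helper names and all the numbers are illustrative assumptions, not taken from the gist.

```python
import statistics


def reconstruction_error(original, reconstructed):
    """Mean squared error between a record and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def fit_threshold(regular_errors, num_std=3.0):
    """Cut-off placed num_std standard deviations above the regular mean."""
    return statistics.mean(regular_errors) + num_std * statistics.pstdev(regular_errors)


# hypothetical per-record errors from a holdout set of regular customers
regular_errors = [0.012, 0.015, 0.014, 0.016, 0.013]
threshold = fit_threshold(regular_errors)

# hypothetical scaled record and the autoencoder's attempt at rebuilding it
record = [0.9, 0.1, 0.8]
rebuilt = [0.2, 0.3, 0.1]
error = reconstruction_error(record, rebuilt)
print(error > threshold)  # True: poorly reconstructed, so flagged as irregular
```

The cut-off is a policy decision: a tighter threshold catches more irregular records at the cost of more false alarms on regular ones.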

Some interesting reads/videos:

Laplace and the law of small data

One of my recent favorite reads is Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. In the age of “big” data, encountering uncertainty and little data is also the norm in our daily lives, and Laplace offers us a rule of thumb to make an optimized decision even with one observation.

Pierre-Simon, marquis de Laplace is a ubiquitous character in the annals of Science history in the 1700s and my undergraduate Mathematics years. The term “inverse Laplace transform” would be met with an uncontrollable shudder during finals especially if you had allergies towards solving differential equations using integral transforms.

The Marquis de Laplace

However, a few decades later, with the prevalence of matrix operations and linear algebra in deep learning and, by extension, my overall appreciation for advanced mathematics, I’ve been fascinated by some of his work. Laplace was a mathematician and physicist and appears all over the place in the field. He is known as the “French Newton”, a bona fide virtuoso with contributions like the Laplace transform, Laplace’s equation and the Laplace operator, amongst other things. If that weren’t enough, he also enlightened the world with theories on black holes and gravitational collapse. He was also a marquis in the French court after the Bourbon Restoration (which, as much as it wants to, does not refer to the weekend festivities in my backyard; it actually refers to a period in French history following Napoleon’s downfall).

Laplace essentially wrote the first handbook on probability with “A philosophical essay on probabilities”, a magnificent treatise that reflects the author’s depth of knowledge and curiosity. It is a bit dense in parts but a fascinating look at 18th century French life through the eyes of a polymath. Unless you are a deep researcher in the history of probability, the material is organized well enough to comb through for points of interest. Part 1 is a “philosophical essay on probabilities” while part 2 is an “application of the calculus of probabilities”.

Laplace’s Rule of Succession, primarily devised to solve the Sunrise problem, is extremely important for computing probabilities when the originating events have the same probability.

If the sun has come up n times in a row, what’s the probability it will rise tomorrow? One can imagine he got ridiculed for it, since we have never known or experienced a day when the sun did not rise, and it would be the end of the world if it did not rise the next day. More specifically, the problem does not seem realistic considering it assumes every day is an independent event, i.e. independent random variables for the sun rising on each day.

We have evidence that the sun has risen n times in a row, but we don’t know the value of P, the probability of it rising. Treating this P as unknown brings to the fore a long-standing debate in statistics between frequentists and Bayesians. From the Bayesian point of view, since P is unknown, we treat P as a random variable with a distribution. As with Bayes’ theorem, we start with prior beliefs about P before we have any data. Once we collect data, we use Bayes’ rule to update those beliefs based on our evidence.

The integral calculus leading to deriving this rule is masterfully explained here in this lecture on moment generating functions (MGFs) by Joe Blitzstein. Amazing explanations if you can sit through the detailed derivations.
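
For reference, a compact version of that derivation can be written down directly. It assumes a uniform prior on P (the assumption the rule rests on) and s successes in n independent trials; the notation is mine, with B denoting the Beta function.

```latex
\begin{aligned}
f(P \mid s, n) &\propto P^{s}(1-P)^{n-s}, \qquad 0 \le P \le 1 \\
\Pr(\text{next success} \mid s, n)
  &= \frac{\int_0^1 P \cdot P^{s}(1-P)^{n-s}\,dP}{\int_0^1 P^{s}(1-P)^{n-s}\,dP}
   = \frac{B(s+2,\; n-s+1)}{B(s+1,\; n-s+1)} \\
  &= \frac{(s+1)!\,(n-s)!}{(n+2)!} \cdot \frac{(n+1)!}{s!\,(n-s)!}
   = \frac{s+1}{n+2}
\end{aligned}
```

For the sunrise problem every trial is a success, so s = n and the expression reduces to (n+1)/(n+2).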

The probability of the sun rising tomorrow is (n+1)/(n+2) or, as Wikipedia puts it:

“if one has observed the sun rising 10000 times previously, the probability it rises the next day is 10001/10002 = 0.99990002. Expressed as a percentage, this is approximately 99.990002% chance.”

Pretty good odds it seems.

Essentially per Laplace, for any possible drawing of w winning tickets in n attempts, the expectation is the number of wins + 1, divided by the number of attempts + 2.

Said differently, if we have n experiments, each of which results in success or failure, with s successes and n - s failures, the probability that the next repetition will succeed is (s+1)/(n+2).

If I make 10 attempts at playing a musical piece and 8 of them succeed, then per Laplace my overall chance at this endeavor is 9/12, or 75% of the time. If I play it once and succeed, the probability is 2/3 (about 66.7%), which is intuitively more reliable than assuming I have a 100% chance of nailing this the next time.
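
The arithmetic is simple enough to check in a few lines of Python. The function name rule_of_succession is just an illustrative choice, and exact fractions are used so the results match the numbers above.

```python
from fractions import Fraction


def rule_of_succession(successes, attempts):
    """Laplace's estimate that the next attempt succeeds: (s + 1) / (n + 2)."""
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return Fraction(successes + 1, attempts + 2)


print(rule_of_succession(8, 10))  # 3/4, the musical piece example
print(rule_of_succession(1, 1))   # 2/3, one attempt and one success
print(rule_of_succession(0, 0))   # 1/2, no data at all
print(round(float(rule_of_succession(10000, 10000)), 8))  # 0.99990002, the sunrise example
```

Note that with no observations the rule falls back to 1/2, which is the uniform prior showing through.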

Some fascinating quotes –

“Man, made for the temperature which he enjoys, and for the element which he breathes, would not be able, according to all appearance, to live upon the other planets. But ought there not to be an infinity of organization relative to the various constitutions of the globes of this universe? If the single difference of the elements and of the climates make so much variety in terrestrial productions, how much greater the difference ought to be among those of the various planets and of their satellites! The most active imagination can form no idea of it; but their existence is very probable.”

(Pg. 181)

“the transcendent results of calculus are, like all the abstractions of the understanding, general signs whose true meaning may be ascertained only by repassing by metaphysical analysis to the elementary ideas which have led to them; this often presents great difficulties, for the human mind tries still less to transport itself into the future than to retire within itself. The comparison of infinitely small differences with finite differences is able similarly to shed great light upon the metaphysics of infinitesimal calculus.”

(Pg. 44)

“The day will come, when, by study pursued through several ages, the things now concealed will appear with evidence; and posterity will be astonished that truths so clear had escaped us”

Laplace quoting Seneca

Probability is relative, in part to this ignorance, in part to our knowledge. We know that of three or a greater number of events a single one ought to occur; but nothing induces us to believe that one of them will occur rather than the others-

Laplace, Concerning Probability

The Rule of Succession is essentially the world’s first simple algorithm for tackling problems of small data. It holds up well when we know all the possible outcomes before observing the data. If we apply it to problems where the prior state of knowledge is not well known, the results may not be useful, as the question being asked is then of a different nature, based on different prior information.

ToneWood amp review

I rarely break out my fleet of electric guitars anymore, so my usual go-to is my trusty old Cordoba Iberia that’s usually within reach. Having instruments lying around the house is a huge part of getting to practice more. The ToneWood amp caught my eye immediately, as I’ve been looking for simple amplification while playing outdoors or in places with absolutely poor acoustics, where a little echo/reverb or delay can go a long way in doing justice to the piece I’m trying to play and can even serve to feed some creativity.

3 essential knobs

It’s essentially a lightweight effects unit that can be mounted on the back of an acoustic guitar to give you amplification and a few effects. Magnets that are part of the install hold the ToneWood on the back of your guitar. The amp picks up sound from the pickup on the acoustic and sends it back into the body via a “vibrating driver”, so the sound becomes augmented with the effects. It essentially blends the natural guitar sound with the effects, and the result comes out of the sound hole as a unified sonic experience. The patent explains the concept well, and it is pretty ingenious.

Magnetic attachment to the back

The natural sound of the unamplified guitar coming from the soundboard seamlessly blends with the effects radiating outward via the sound hole, creating a larger than life soundscape.  All that’s required is some type of pickup installed in the guitar to provide signal to the device. You can connect the ToneWood to an external Amp/PA via the 1/4″ output port and it is iDevice interface that is great if you are on the Mac ecosystem. It also has the 1/4″ standard guitar input and a 1/8″ TRRS I/O for iDevice. The processor takes in 3 AA batteries for an average of 8 hours.

The installation took me about 10 minutes. It required me to slacken the strings and place an X-brace unit inside the guitar, positioning the magnets so that the ToneWood amp could attach itself to the outside back of the guitar. This took some adjusting and I’m not sure I’ve dialed in the most optimal position, but it’s close enough. Once you stick the batteries in, it’s showtime. The display screen and knobs are intuitive and the barrier to entry here is phenomenally low.

From an effects perspective, it’s really everything you need considering you are playing an acoustic guitar. All the effects come with Gain and Volume settings.

  • Hall Reverb with Decay, Pre-delay and Hi-cut settings. These settings are accessed by pressing on the knobs on the ToneWood
  • Room Reverb with Decay, Pre-delay and Hi-cut settings
  • Plate Reverb with Decay, Pre-delay and Hi-cut settings
  • Delay with Speed, Feedback and Reverb. (Note: you are not going to sound like The Edge on the Skrydstrup switching system anytime soon with this)
  • Tremolo with Rate, Depth and Delay
  • Leslie-style tremolo with Rate, Depth and Reverb
  • Auto-Wah with Sensitivity, Envelope and Reverb
  • Overdrive with Drive, Filter and Reverb
  • DSP Bypass to mute the processor
  • Notch Filters (Notch Low and Notch High) to filter based on frequency

There is also the ability to save effect settings based on the tweaks you make, which seems useful, though I’ve not really played around with it.

I’ve largely played around with the Hall and Room effects for my purposes. You can tweak these plenty, but I’d like to make sure I’m not sounding “wall of sound Spector-mode” on my Cordoba for every track.

All in all, a great addition to enhance the acoustic and, more than anything else, the convenience factor is amazing. It’s much easier to optimize practice time now without switching guitars or hooking up effects racks to my Ibanez for a 10-minute session. If you want more control over ambience and soundscapes with minimal setup or complexity, this is it.

I recorded a quick demo with the Hall Reverb, with Decay and Hi-cut set to default, straight off the iPhone camera with no audio edits. The audio needs to be enhanced and it doesn’t fully do justice to the ToneWood sound. The jam is me noodling on S&G’s cover of Anji by Davey Graham. The nylon strings don’t lend themselves to much bending at all, but the point was to capture a small moment from a few hours of testing this wonderful amp.

Note – I don’t have any affiliation with ToneWood.