ONNX for ML Interoperability

Having been a Keras user since I read the seminal Deep Learning with Python, I’ve been experimenting with exporting models to different frameworks and formats to be more framework-agnostic.

ONNX (Open Neural Network Exchange) is an open format for representing traditional and deep learning ML models. Its key goal is to promote interoperability between a variety of frameworks and target environments. ONNX lets you export a fully trained model into its format and target diverse environments without manual optimization and painful rewrites of the model to accommodate each environment.
It defines an extensible computation graph model along with built-in operators and standard data types, allowing a compact, cross-platform representation for serialization. A typical use case is transfer learning, where you want to reuse the weights of a model possibly built in another framework: if you build a model in Tensorflow, you get a protobuf (PB) file as output, and it would be great if there were one universal format you could convert to the PT format to load and reuse in Pytorch, or run directly on a hardware-agnostic runtime.

For high-performance inference requirements across varied frameworks this is great, with platforms like NVIDIA’s TensorRT supporting ONNX with optimizations aimed at the accelerators present on their devices, like the Tesla GPUs or the Jetson embedded devices.


The ONNX file is a protobuf-encoded tensor graph. The list of supported operators is documented here, and operations are versioned together as “opsets” i.e. operation sets. Opsets are defined for different runtimes in order to enable interoperability. The operations are a growing list of widely used linear operations, functions and other primitives used to deal with tensors.

The operations include most of the typical deep learning primitives: linear operations, convolutions and activation functions. A model is mapped to the ONNX format by executing it, often with just random input data, and tracing the execution. The operations executed are mapped to ONNX operations, so the entire model graph is mapped into the ONNX format. The ONNX model is then saved as a .onnx protobuf file, which can be read and executed by a wide and growing range of ONNX runtimes.

Note – Opsets evolve fast, and with the rapid release cycles of competing frameworks it may not always be easy to upgrade to the latest ONNX version if it breaks compatibility with other frameworks. The file format consists of the following:

  • Model: Top level construct
    • Associates version Info and Metadata with a graph
  • Graph: describes a function
    • Set of metadata fields
    • List of model parameters
    • List of computation nodes – Each node has zero or more inputs and one or more outputs.
  • Nodes: used for computation
    • Name of the node
    • Name of the operator it invokes
    • List of named inputs
    • List of named outputs
    • List of attributes

More details here.


The ONNX model can be inferenced with the ONNX runtime, which uses a variety of hardware accelerators for optimal performance. The promise of the ONNX runtime is that it abstracts the underlying hardware, enabling developers to use a single set of APIs for multiple deployment targets. Note – the ONNX runtime is a separate project and aims to perform inference for any prediction function converted to the ONNX format.

This has advantages over the dockerized pickle models that are the usual approach in a lot of production deployments, where there are runtime restrictions (i.e. can run only in .NET or the JVM), memory and storage overhead, version dependencies, and batch prediction requirements.

The ONNX runtime has been integrated into WinML and Azure ML, with MSFT as its primary backer. Some of the new enhancements include INT8 quantization, which replaces floating point numbers with 8-bit integers to reduce model size and memory footprint and to increase efficiency, benchmarked here.

The usual path to proceed:

  • Train models with frameworks
  • Convert into ONNX with ONNX converters
  • Use onnx-runtime to verify correctness and inspect the network structure using netron (https://netron.app/)
  • Use hardware-accelerated inference with the ONNX runtime (CPU/GPU/ASIC/FPGAs)


To convert Tensorflow models, the easiest way is to use the tf2onnx tool from the command line. This converts the saved model to a model representation that includes the inference graph.

Here is an end-to-end example of saving a simple Tensorflow model, converting it to ONNX, running the predictions using the ONNX model and verifying that the predictions match.


However, one thing to consider while using this format is the lack of “official support” from frameworks like Tensorflow. For example, Pytorch does provide functionality to export models to ONNX (torch.onnx), however I could not find any function to import an ONNX model to output a Pytorch model. Considering that Caffe2, which is part of PyTorch, fully supports ONNX import/export, it may not be totally unreasonable to expect an official conversion importer (there is a proposal already documented here).

The Tensorflow converters are part of the ONNX project, i.e. not an official/out-of-the-box Tensorflow implementation. The list of supported Tensorflow ops is documented here. The github repo is a treasure trove of information on the computation graph model and the operators/data types that power the format. However, as indicated earlier, depending on the complexity of the model (especially in transfer learning scenarios), you are likely to encounter conversion issues during function calls that may cause the ONNX converter to fail. In that case, you may need to modify the graph in order to fit the format. I’ve had a few issues running into StatefulPartitionedCall ops, especially in transfer learning situations with larger encoders in language models.

I have also had to convert Tensorflow to PyTorch by first converting Tensorflow to ONNX, then the ONNX models to Keras using onnx2keras, and then converting to Pytorch using MMdnn, with mixed results, a lot of debugging and many abandoned attempts. However, I think ONNX runtime for inference, rather than framework-to-framework conversions, will be a better use of ONNX.

Though well intentioned and highly sought, a universal format like ONNX may never fully come to fruition with so many divergent interests and priorities amongst the major contributors, though its need cannot be disputed.


Replay is a collaboration track and part of an evolving experiment with multi-tracked guitars revolving around cyclic patterns. More collaborations and sounds to follow.

Arjun on Bass

Sunder – Drums (Instagram and Facebook – @onlysunder)

Mini-glossary of terms in audio production

I’ve used a lot of audio engineering terms over the years and realized that a lot of them were not exactly what I was referring to or meant. While talking to experienced audio engineers, I’ve always found the below glossary useful to convey my objectives effectively. Hopefully this serves as starter boilerplate for more research, with more terms to be added on. A lot of these and more are covered in Coursera’s excellent course on the Technology of Music Production.

  • Nature of sound
  • Digital Audio Workstation (DAW)
  • Tracks, Files and Editing
  • Dynamic Effects
  • Filter and Delay Effects

Nature of sound

Amplitude: Size of the vibration of sound; larger vibrations produce louder sounds. Measured in decibels. There are multiple places in the signal flow where we measure amplitude:

  • In the air: dBSPL or decibels of sound pressure level
  • In the digital domain: dBFS or decibels full scale

Compression: Compression is one of the most commonly used types of dynamic processing. It is used to control uneven dynamics in individual tracks of a multitrack mix, and also in creative ways, like shaping the decays of notes and producing fatter sounds. Compressors provide gain reduction, which is governed by controls like the Ratio.

  • For example, a ratio of 4:1 means that audio that goes 4 dB above the threshold will be reduced so it only goes 1 dB above it.
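The ratio arithmetic above can be sketched in a few lines (a hypothetical helper for illustration, not part of any DAW API):

```python
def compress(level_db, threshold_db, ratio):
    """Output level in dB after simple downward compression."""
    if level_db <= threshold_db:
        return level_db  # below threshold: signal passes unchanged
    # above threshold: every `ratio` dB of input becomes 1 dB of output
    return threshold_db + (level_db - threshold_db) / ratio

# 4:1 ratio, threshold at -10 dB: input 4 dB above threshold exits 1 dB above it
print(compress(-6, -10, 4))   # -> -9.0
print(compress(-20, -10, 4))  # -> -20 (below threshold, untouched)
```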

Decibel: The words bel and decibel are units of measurement of sound intensity. “Bel” is a shortening of the name of inventor Alexander Graham Bell (1847–1922).

  • A bel is equivalent to ten decibels and is used to compare two levels of power in an electrical circuit.
  • The normal speaking range of the human voice is about 20–50 decibels.
  • Noise becomes painful at 120 dB. Sounds above 132 dB lead to permanent hearing damage and eardrum rupture.
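As a quick illustration of the decibel math behind these figures (power ratios use 10·log10, amplitude/pressure ratios use 20·log10):

```python
import math

def power_db(p, p_ref):
    # decibels from a power ratio
    return 10 * math.log10(p / p_ref)

def amplitude_db(a, a_ref):
    # decibels from an amplitude (e.g. sound pressure) ratio
    return 20 * math.log10(a / a_ref)

print(round(power_db(2, 1), 1))      # doubling power     -> 3.0 dB
print(round(amplitude_db(2, 1), 1))  # doubling amplitude -> 6.0 dB
print(power_db(10, 1))               # 10x power -> 10.0 dB, i.e. one bel
```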

Frequency: Speed of the vibration which determines the pitch of the sound. Measured as the number of wave cycles that occur in one second.

Propagation: Sequence of waves of pressure (sound) moving through a medium such as water, solids or air.

Timbre: Term used to indicate the distinguishing characteristics of a sound. For example, a falsetto versus a vibrato.

Transducer: A device that converts one energy type into another; in audio, usually a microphone, which converts sound pressure variations in the air into voltage variations in a wire.

Digital Audio Workstation (DAW)

Bit Rate: Product of sampling rate and sampling depth, measured in bits per second. Higher bit rates indicate higher quality. Compressed audio formats (mp3) have lower bit rates than uncompressed ones (wave).

Buffer Size: Amount of time allocated to the DAW for processing audio. Used to balance the delay between the audio input (say, a guitar plugged in) and the sound playback, and to minimize latency. It usually works best to set the buffer size to a lower amount to reduce latency for more accurate monitoring. However, this puts more load on the computer’s processor and could cause crashes or interruptions.

Sampling Rate: Rate at which samples of an analog signal are taken to be converted into digital form, expressed in samples per second (hertz). Higher sampling rates indicate better sound, as they capture more samples per second; an analogy is FPS, i.e. frames per second in video. Some of the values we come across are 8 kHz, 44.1 kHz and 48 kHz; 44.1 kHz is the most common sampling rate for audio CDs.

Sampling Depth: Measured in bits per sample, this indicates the resolution of each audio sample. An 8-bit sample depth allows 2^8 = 256 distinct amplitudes for each audio sample; the higher the sample depth, the better the quality. This is analogous to image processing, where a higher number of bits indicates higher quality.
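The arithmetic behind sample depth and bit rate can be checked directly, using CD-quality stereo audio as an example:

```python
sample_rate = 44_100    # samples per second (44.1 kHz, CD audio)
bit_depth = 16          # bits per sample
channels = 2            # stereo

levels = 2 ** bit_depth                        # distinct amplitudes per sample
bit_rate = sample_rate * bit_depth * channels  # bits per second

print(2 ** 8)      # -> 256 distinct amplitudes at 8-bit depth
print(levels)      # -> 65536 distinct amplitudes at 16-bit depth
print(bit_rate)    # -> 1411200 bits per second for CD audio
```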

Sine wave: Curve representing periodic oscillation of constant amplitude, considered the most fundamental sound. A sine wave can be easily recognized by the ear, and since sine waves consist of a single frequency, they are used to depict and test audio.

In 1822, French mathematician Joseph Fourier discovered that sinusoidal waves can be used as simple building blocks to describe and approximate any periodic waveform, including square waves. Fourier used it as an analytical tool in the study of waves and heat flow. It is frequently used in signal processing and the statistical analysis of time series.
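Fourier’s idea can be demonstrated in a few lines by summing odd sine harmonics to approximate a square wave (an illustrative sketch, not from the course material):

```python
import math

def square_approx(t, harmonics=50):
    """Fourier-series approximation of a unit square wave at time t (period 1)."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * (2 * k - 1) * t) / (2 * k - 1)
        for k in range(1, harmonics + 1))

# The partial sum hugs +1 on the top half of the cycle and -1 on the bottom half
print(square_approx(0.25))  # close to +1
print(square_approx(0.75))  # close to -1
```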


  • Wave: Uncompressed at the chosen bit depth and sampling rate. Takes up a lot of memory and space.
  • AIFF: Audio Interchange File Format: Uncompressed file format (originally from Apple). High level of device compatibility; used when mastering files for audio captured live digitally.
  • MP3: Compressed audio layer of the larger MPEG video file format. Smaller sizes and poorer quality than the formats above. At a 128 kbit/s setting, compression results in a file about 1/11th the size of the original data.
  • MIDI: Musical Instrument Digital Interface – commonly defined as a set of instructions telling the computer’s sound card how to create music. Small in size; controls the notes of each instrument, loudness, scale, pitch etc.

Tracks, Files and Editing

  • Cycling: Usually refers to looping a musical section repeatedly. Useful for arrangements and re-arrangements.
  • Comping: Process where you use the best parts of multiple takes and piece them together into one take. DAWs such as ProTools allow multiple takes that are stacked in a folder in a single track.
  • Destructive editing: Editing in which changes are permanently written to the audio file, though these can usually be undone via the DAW undo history in reverse order. Helps when you have less processing power and need changes applied immediately, and when you know you won’t want to revert the change. Non-destructive editing, by contrast, uses computer processing power to make changes on the fly.
  • Fades: Progressive increases (fade-in) or decreases (fade-out) of audio signals, most commonly used when a song has no obvious ending. Crossfades are transitional regions that bridge two regions so the ending of one fades into the other.


  • Controllers: Hardware or software that generates and transmits MIDI data to MIDI-enabled devices, typically to trigger sounds and control parameters of an electronic music performance.
  • Quantization: One of the more important concepts. Quantization has many meanings depending on the task, but in this context it is about making music with precise note timing. To compensate for human error in precision, quantization can help nail the right note at the mathematically perfect time. While great for MIDI note data, quantizing recorded audio tracks is more challenging but worthwhile. Most DAWs have this built in, but it is not a magic wand to blow away all your problems; quantization in my experience works best when I’ve performed a track with an acceptable level of timing.
  • Velocity: Force with which a note is played, used to make MIDI sounds more human (or more mechanical, if that’s the intent). This typically controls the volume of the note and can be used to control dynamics, filters, multiple samples and other functions.


  • Automation: Process where we program the arrangement, levels, EQ etc. to change based on a pre-determined pattern. For example, automation to increase the reverb just before the chorus or to add delays to a particular part of the mix.
  • Auxiliary sends: Type of output used in mixers while recording. Allows the producer to create an “auxiliary” mix in which each input channel on the mixer can be controlled, routing multiple input channels to a single output send. The mixer can choose how much of a signal is sent to the aux channel. In Ableton, two aux channels (titled A and B) are created by default. Aux channels are great for feeding in effects such as reverb and delay.
  • Channel strip: Type of preamp with additional signal processing units, similar to an entire channel in a mixing console (example).
  • Bus: Related to aux sends above, a bus is a point in the signal flow where multiple channels are routed into the same output. In Ableton, this is the Master channel – where all the tracks merge together before being exported.
  • Unbalanced cables: Pick up noise (from electrical, radio and power interference from nearby cables) and are best used for short distances, for example a short cable connecting analog pedals to each other. Quarter-inch TS (tip, sleeve) connectors are used for unbalanced cables.
  • Balanced cables: Have a ground wire and carry two copies of the same signal with reversed polarities down the cable. At the receiving end, the polarity of the inverted signal is flipped back so both signals are in sync. Any noise picked up along the way is identical on both conductors, so when one copy is flipped back the noise cancels out, effectively eliminating it.
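The polarity trick behind balanced cables can be illustrated numerically (toy values chosen so the arithmetic is exact):

```python
# Toy sketch of balanced-cable noise rejection
signal = [0.5, -0.25, 0.75]
noise  = [0.125, 0.0625, -0.25]   # interference hits both conductors equally

hot  = [s + n for s, n in zip(signal, noise)]   # original polarity + noise
cold = [-s + n for s, n in zip(signal, noise)]  # inverted copy + same noise

# At the receiving end the cold leg is flipped back and the legs are summed:
recovered = [(h - c) / 2 for h, c in zip(hot, cold)]
print(recovered)  # -> [0.5, -0.25, 0.75]  (the noise cancels exactly)
```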

Dynamic Effects

  • Downward compressor: Same as a compressor, reducing the level of louder material; when explicitly called out, “upward compressors” instead bring up the volume of quiet material. One of the most important effects in audio engineering: compressors reduce dynamic range by compressing the signal.
  • Expander: Expands dynamic range – louder parts become louder, quieter parts become quieter. Making it louder means amplifying the signal that passes the threshold; it is the opposite of a compressor.
  • Gate: Provides a floor level for the signal to cross to get through – if the signal is below the gate level it will be treated as silence. Used to cut out the audio when it’s quiet.
  • Limiter: Serves as a ceiling above which the signal cannot pass. It’s essentially a compressor with a very high ratio – as the ratio increases, a compressor behaves more and more like a limiter.

Filter and Delay Effects

  • Convolution reverb: Digitally simulates the reverberation of a physical or virtual space. It is based on the mathematical convolution operation and uses a pre-recorded impulse response (IR) of the space being modeled – a recording of how the real space responds to a signal. The advantage of a convolution reverb is its ability to accurately simulate reverb for natural-sounding effects; the disadvantage is that it can be computationally expensive. Most convolution plugins ship with a wide variety of impulse responses representing real spaces, so DAWs can simulate different places, say a small club versus a stadium.
  • Algorithmic reverb: Simulates impulse responses based on the settings we choose in our DAW, using delay lines, loops and filters to create the echoes that occur in a reverberant environment. All non-convolution reverbs can be considered algorithmic. Algorithmic reverbs are kind of like synthesizers, since we create the impression of a space with some mathematical representation. The tradeoff is that they may sound less natural than convolution reverbs.
  • Comb filtering: Occurs when two copies of the same signal arrive at the listener’s ears at different times due to a delay. The resulting frequency response looks like a comb when graphed.
  • Dry/wet: Dry sound has no effects or modifications of any kind – raw, unprocessed sound. Wet sounds are processed with effects added while recording or after mixing.
  • Low shelf filter: Low shelf filters cut or boost signals below a threshold frequency, usually via a “cutoff frequency”, mostly to ensure instruments don’t interfere with each other. Used a lot when EQing guitars and vocals in a mix.
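The core of a convolution reverb described above is just the discrete convolution of the dry signal with the impulse response; a toy sketch with made-up sample values:

```python
def convolve(dry, ir):
    """Discrete convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, d in enumerate(dry):
        for j, h in enumerate(ir):
            out[i + j] += d * h  # each dry sample triggers a scaled copy of the IR
    return out

dry = [1.0, 0.0, 0.5]    # dry signal: two plucks
ir  = [1.0, 0.5, 0.25]   # impulse response of a (tiny) room: direct sound + decaying echoes
print(convolve(dry, ir)) # -> [1.0, 0.5, 0.75, 0.25, 0.125]
```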

Deep Learned Shred Solo

Music generation with recurrent neural nets has been of great interest to me, with projects like Magenta displaying amazing feats of ML-driven creativity. AI is increasingly being used to augment human creativity, and this trend may lay creativity blocks to rest in the future. As someone who is usually stuck in a musical rut, this is great for spurring creativity.

With a few covid-induced reconnects with old friends (some of whom are professional musicians) and some inspired late-night MIDI programming in Ableton, I decided to modify some scripts and tutorials that have been lying around on my computer to blend deep learning into composing music, as I research the most optimal ways to integrate deep learning into original guitar compositions.

There are plenty of excellent blogs and code samples on the web about LSTMs, including this one and this one on generating music using Keras, and plenty of boilerplate on github demonstrating LSTMs and GRUs for creating music. For this project, I was going for recording a guitar solo based on artists I like, and setting up a template for future experimentation for research purposes. A few mashed-up solos of Yngwie served as the source data, but the source could have been pretty much anything in MIDI format; it helps to know how to manipulate these files in the DAW, which in my case was Ableton. Most examples on the web generate music in isolation from piano MIDI files. However, I wanted to combine the generated music with minimal accompaniment so as to make it “real”.

With the track being trained on being in the key of F minor, I also needed to make sure I had some accompaniment in F minor, for which I recorded a canned guitar part with some useful drum programming thanks to EZDrummer.

Tracks in Ableton

Note: this was for research purposes only and a starting point for further research into composing pieces that actually make sense based on the key being fed into the model.

Music21 is invaluable for manipulating MIDI via code: it lets us manipulate starts, durations and pitch. I used Ableton to plug an instrument into the generated MIDI notes, along with programmed drums and rhythm guitars.

Step 1:

Find the MIDI file(s) you want to base your ML solo on. In this case, I’m generating a guitar solo to layer over a backing track. This could be pretty much anything, as long as it’s MIDI that can be processed by Music21.

Step 2:

Preprocessing the MIDI file(s): The original MIDI file had guitars over drums, bass and keyboards, so the goal was to extract the list of notes first and save them. The instrument.partitionByInstrument() function separates the stream into different parts according to the instrument. If we have multiple files, we can loop over them and partition each by instrument. This returns a list of notes and chords in the file.

from glob import glob
from tqdm import tqdm
from music21 import converter, instrument, note, chord

songs = glob(' /ml/vish/audio_lstm/YJM.mid') # this could be any midi file to be trained
notes = []
for file in tqdm(songs):
    midi = converter.parse(file) # convert all supported data formats to music21 objects
    parts = None
    try:
        # partition parts for each unique instrument
        parts = instrument.partitionByInstrument(midi)
    except Exception:
        print("No uniques")

    if parts:
        notes_parser = parts.parts[0].recurse()
    else:
        notes_parser = midi.flat.notes # flatten notes to get all the notes in the stream
        print("parts == None")

    for element in notes_parser:
        if isinstance(element, note.Note): # check if element is in the note class
            notes.append(str(element.pitch)) # Pitch objects collected as a Python list
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))
print("notes:", notes)

Step 3:

Creating the model inputs: Convert the items in the notes list to integers so they can serve as model inputs, then create arrays for the network input and output to train the model. We have 5741 notes in our input data and have defined a sequence length of 50 notes: each input sequence is 50 notes, and the output array stores the 51st note for every input sequence that we enter. Then we reshape and normalize the input vector sequence. We also one-hot encode the output integers, so that the number of columns equals the number of categories, giving a network output shape of (5691, 92). I’ve commented out some of the output so the results are easier to follow.

pitch_names = sorted(set(item for item in notes))   # ['0', '0.3.7', '0.4.7', '0.5', '1', '1.4.7', '1.5.8', '1.6', '10', '10.1.5',..]
note_to_int = dict((note, number) for number, note in enumerate(pitch_names))  #{'0': 0,'0.3.7': 1, '0.4.7': 2,'0.5': 3, '1': 4,'1.4.7': 5,..]
sequence_length = 50
len(pitch_names) # 92
range(0, len(notes) - sequence_length, 1) #range(0, 5691)
# Define input and output sequences
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i: i + sequence_length]
    sequence_out = notes[i + sequence_length]
    network_input.append([note_to_int[char] for char in sequence_in])
    network_output.append(note_to_int[sequence_out])
print("network_input shape (list):", (len(network_input), len(network_input[0]))) #network_input shape (list): (5691, 50)
print("network_output:", len(network_output)) #network_output: 5691
patterns = len(network_input)
print("patterns , sequence_length", patterns, sequence_length) #patterns , sequence_length 5691 50
network_input = np.reshape(network_input, (patterns, sequence_length, 1)) # reshape to array of (5691, 50, 1)
print("network input", network_input.shape) #network input (5691, 50, 1)
n_vocab = len(set(notes))
print('unique notes length:', n_vocab) #unique notes length: 92
network_input = network_input / float(n_vocab)
# one hot encode the output vectors with to_categorical(y, num_classes=None)
network_output = to_categorical(network_output)
network_output.shape #(5691, 92)

Step 4:

Model: We invoke Keras to build out the model architecture using LSTMs. Each input sequence is used to predict the next note. The code below uses a standard model architecture from tutorials without too many tweaks. There are plenty of tutorials online that explain the model far better than I can, such as this one: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Training on the MIDI input can be expensive and time consuming, so I suggest setting a high epoch number with callbacks defined based on the metric to monitor. In this case I used loss, and also created checkpoints for recovery, saving the model as ‘weights.musicout.hdf5’. Also note, I trained this on community edition Databricks for convenience.

def create_model():
  from tensorflow.keras.models import Sequential
  from tensorflow.keras.layers import Activation, Dense, LSTM, Dropout, Flatten

  model = Sequential()
  model.add(LSTM(128, input_shape=network_input.shape[1:], return_sequences=True))
  model.add(Dropout(0.2))
  model.add(LSTM(128, return_sequences=True))
  model.add(Flatten())
  model.add(Dense(n_vocab)) # one output unit per unique note
  model.add(Activation('softmax')) # probabilities over the 92 note classes
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
  return model

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
model = create_model()

save_early_callback = EarlyStopping(monitor='loss', min_delta=0,
                                    patience=3, verbose=1)
epochs = 5000
filepath = 'weights.musicout.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=0, save_best_only=True)
model.fit(network_input, network_output, epochs=epochs, batch_size=32, callbacks=[checkpoint,save_early_callback])

Step 5:

Predict: Once the model is trained, we can start generating notes based on the trained model weights by feeding the model a sequence of notes. We pick a random integer, and hence a random sequence from the input sequences, as a starting point. In my case, it involved calling the model.predict function for 1000 notes that can be converted to a MIDI file. The results might vary at this stage; for some reason I saw some degradation after 700 notes, so some tuning is required here.

start = np.random.randint(0, len(network_input)-1)  # randomly pick an integer from input sequence as starting point
print("start:", start)
int_to_note = dict((number, note) for number, note in enumerate(pitch_names))
pattern = network_input[start]
prediction_output = [] # store the generated notes
print("pattern.shape:", pattern.shape)
pattern[:10] # check shape

# generating 1000 notes

for note_index in range(1000):
    prediction_input = np.reshape(pattern, (1, len(pattern), 1))
    prediction_input = prediction_input / float(n_vocab)

    prediction = model.predict(prediction_input, verbose=0) # call the model predict function to predict a vector of probabilities
    predict_index = np.argmax(prediction)  # argmax finds the index of the array with the largest predicted value
    #print("Prediction in progress..", predict_index, prediction)
    result = int_to_note[predict_index]
    prediction_output.append(result)

    pattern = np.append(pattern, predict_index)
    # Next input to the model: drop the oldest note to keep the window at 50
    pattern = pattern[1:len(pattern)]

print('Notes generated by model...')
prediction_output[:25] # Out[30]: ['G#5', 'G#5', 'G#5', 'G5', 'G#5', 'G#5', 'G#5',...

Step 6:

Convert to Music21: Now that we have our prediction_output array with the predicted notes, it’s time to convert it back into a format Music21 can recognize, with the objective of writing it back out as a MIDI file.

offset = 0
output_notes = []

# create note and chord objects based on the values generated by the model
# convert to Note objects for music21
for pattern in prediction_output:
    if ('.' in pattern) or pattern.isdigit():  # pattern is a chord
        notes_in_chord = pattern.split('.')
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    else:  # pattern is a single note
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

#Convert to midi
midi_output = music21.stream.Stream(output_notes)
print('Saving Output file as midi....')
midi_output.write('midi', fp=' /ml/vish/audio_lstm/yjmout.midi')

Step 7:

Once we have the MIDI file with the generated notes, the next step was to load the MIDI track into Ableton. From there it was the standard recording process one would follow to record a track in the DAW.

a) Compose and Record the Rhythm guitars, drums and Keyboards.

Instruments/software I used:


b) Insert the MIDI track into the DAW, then quantize and sequence accordingly. This can take significant time depending on the precision wanted. In my case, this was just a quick fun project not really destined for the charts, so a quick rough mix and master sufficed.

The track is on Soundcloud here; the solo kicks in around the 16-second mark. Note I did have to adjust the pitch to C to blend in with the rhythm track, though it was originally trained on a track in F minor.

There are other ways of dealing with more sophisticated training, like using different activation functions or normalizing inputs. GRUs are another way to get past this problem, and I plan to iterate on more complex pieces blending deep learning with my compositions. This paper gives a great primer on the difference between LSTMs and GRUs: https://www.scihive.org/paper/1412.355

Book Review – The Great Mughals & their India

As someone who lived through learning (and forgetting) Mughal history consumed via dull and biased textbooks in middle school growing up in India, Dirk Collier’s The Great Mughals and their India is a captivating look at the lives of the kings of the Mughal dynasty, which rose from the ashes of the Delhi Sultanate and disintegrated spectacularly through a thousand cuts gradually inflicted by its own regional enemies and the British.


An added benefit of visiting Delhi to see family is always the getaways to tourist attractions, with the captivating tombs, forts and monuments that remain as remnants of the Mughals. This book has been on my wish-list for a while, and it’s been a wonderful ride through Mughal influence on Indian architecture, politics, philosophy, culture and outlook, albeit through the eyes of a Belgian.

The book primarily covers the chaotic, brilliant and pathetic conquests and defeats of the Mughal rulers in the order of their appearance, starting with the descendant of Timur and Genghis Khan – Babur.

Babur, on the run from Uzbeks, sought refuge in India and destiny ensured that his descendants never left.

The author’s unbiased commentary on the subject is a refreshing change from the usual divisive literature I have come across. In the current political climate of rising nationalism and divisive politics, the Mughal era, in certain periods, seems to be the epitome of tolerant views and harmony amongst people of different religions, at least as described here.

The book takes you on a galloping ride through the highs and glory of Akbar to the lows and bigotry of Aurangzeb, from the magnificence of Shah Jahan’s imagination to the pitiful incompetence of Shah Alam. 

The earlier Hindu dynasties (Mauryas, Guptas etc.), including Ashoka's, predominantly covered northern India and cannot really be seen as ruling the entire subcontinent. This was also largely the trend in the Mughal empire, except under Akbar, whose rule over 100M inhabitants (1/5th of the world's population) covered vast territories across India.

I found myself reflecting on the remarkable fact that the Mughal dynasty was all but a blip in the annals of Indian history, which kicked off in the Indus Valley around 3300 BCE. The book does an amazing job of putting these 331 years into context while being cognizant of their impact on future generations.

  • Homo sapiens in India: Around 75,000 years ago.
  • Indus Valley (Harappa) Civilization: c. 3300–1300 BCE
  • Vedic civilization: c. 1500–500 BCE
  • Spread of Buddhism and Jainism: 500–200 BCE
  • Maurya Empire: 297–250 BCE
  • Ashoka the Great: 304–232 BCE
  • Hindu revival and classical Hindu civilization: 200 BCE–CE 1100
  • Gupta Empire/golden age of Hinduism: CE 320–550
  • Late classical civilization: CE 650–1100
  • The Hindu-Islamic period (early sultanates plus trading colonies): 1100–1857
  • Mughal Empire: 
    • Babur: 1526–1530
    • Humayun: 1530–1556
    • Akbar the Great: 1556–1605
    • Jahangir: 1605–1627
    • Shah Jahan: 1627–1658
    • Aurangzeb: 1658–1707
    • The ‘Lesser Mughals’: 1707–1857
  • Their rivals and successors – the Maratha Empire: 1713–1818
  • Sikh Empire: 1799–1849
  • Afghan Empire: 1747–1862
  • British East India Company: 1757–1858
  • British Raj: 1858–1947
  • Independence, partition and beyond: August 1947 to the present

The last book I read on this subject, years ago, was William Dalrymple's seminal study "The Last Mughal", which seemed to reach the heights of authoritative study on this subject. However, Dirk Collier's easy style of writing and his genuine reflections on the state of affairs through every stage of the empire make this a much more endearing read for me.

Babur :

The founder of the dynasty was a stranger in a strange land, torn between grandiose ambition to rule large swathes of territory and nostalgia for his Central Asian home. Excerpts from his memoir are filled with longing for Central Asia and Kabul. Considering he was a forced immigrant fleeing Central Asia, he had no special affection for the climate, food or people. India was more of a consolation prize when faced with the reality of Uzbeks occupying his beloved Samarqand.


Humayun :

Humayun, who was born in Kabul, had a life full of contradictions: he spent years of incompetence losing his inheritance, wandering about in exile with warriors of questionable quality, and then regaining his throne with Persian help. A life spent in harems and opium addiction. Strangely, he was also a voracious reader, a builder of contraptions, a patron of scholars and artists, and highly knowledgeable in arcane matters like plants, herbs and metals. It was his misfortune that his reign coincided with the rise of Sher Shah Suri, whose competence in government and military matters eclipsed anything Humayun could throw at him. His innocuous death, tripping on a library staircase, epitomizes his life. This is well described in the book with a quote from the British orientalist and historian Stanley Edward Lane-Poole (1854–1931):

‘his end was of a piece with his character. If there was a possibility of falling, Humayun was not the man to miss it. He tumbled through life, and he tumbled out of it.’


Akbar :

The first of the Mughal emperors born in India, Akbar built an empire that eventually stretched from the heartland of northern India down to central India. Akbar lies at the center of Mughal achievement in India (barring the Taj Mahal) due to his impact on military, cultural, political and economic development on a scale that had never been seen before.

A micro-manager with a real interest in his royal duties; the book mentions charming stories of him wandering in disguise through the streets to gauge the efficacy of his rule. His universal tolerance of different forms of worship led to the concept of 'Din-e-Ilahi', emphasizing one god without divisive religion and combining the best of Hinduism and Islam. In some ways this was a failure that did not outlast him, but it is reflective of his forward-thinking and rationalist views, driven by an obsession to find the truth. Interestingly, his interest in organized monetization led to developments in the design of the royal coin.

His cultural impact was astutely planned by forming alliances with non-Mughals, and his shrewd military acumen inspired his forces with his own daring on the battlefield. The book also has interesting anecdotes of his encounters with the Portuguese, who landed on Indian shores ostensibly to set up trade and to evangelize. The accolades go on, and the book offers many details and insights into this glorious period while also inspecting the motivations behind the actions that cemented his place.


Jahangir :

Known for drunken depravity, cruelty and excesses, Akbar's successor was at the opposite end of the spectrum. The book treats this reign almost as a placeholder between Akbar's and Shah Jahan's, with no notable achievements apart from the constant struggle to keep the inherited empire intact. The reign was characterized by the kindling of religious difference, orthodoxy and the wanton destruction of non-Islamic religious places. This notably set the stage for the absolute division of the empire that would, decades later, help the British pick apart the fragmented remains. There are also some contradictions, and some historians disagree on his achievements and contributions, which would be an interesting read in itself. Some great quotes here:

 ‘I never saw any man keep so constant a gravity,’ affirms Sir Thomas Roe, the first English ambassador to the Mughal court.

“the only emotions apparent on his stone-cold face were extreme pride and utter contempt for others. “


Shah Jahan :

References to Shah Jahan are always accompanied by the Taj Mahal, and his reign rightfully gets credit for the finest example of Mughal architecture and a symbol of India's rich history. However, he is also reported to have been another self-centered, humorless fundamentalist, though to a lesser degree than his predecessor. After the death of Mumtaz Mahal, he seems to have delved deeper into orthodoxy and bigotry. He abolished Akbar's solar Din-e-Ilahi calendar, replacing it with the conventional lunar (Hijri) calendar. From a civil and military administration point of view, most of the empire seems to have been squandered. Myths around the blinding of the builders of the Taj Mahal are debunked by the author and shredded for their lack of authenticity.


Aurangzeb :

His reign was characterized by some expansionism through the subcontinent, but the hold was precarious at best, with revolts all across the empire. Another religious bigot, his life was consumed warring against his brothers (the notable one being Dara Shikoh). Another walking contradiction, he expressed genuine interest in other religions while doing nothing to unite his own empire. He is universally reviled by his detractors while being depicted as a pious servant of god by his apologists. His acts against non-Muslims were yet another assault on the unified fabric his great-grandfather had worked so diligently for. In many parts of India, he is known for his role in destroying religious structures that had stood for centuries before his reign. Some great observations made by the author for this period include the rise of militant Sikhism, thanks to his hounding of Sikh religious leaders. The rise of Shivaji, the great Maratha, is well documented here, with the various stories of legend picked apart and debated. Another descriptive anecdote that does not disguise the author's contempt for this period:

“In 1666, it was proudly announced that the emperor’s invincible armies had conquered ‘Tibet’; in actual fact, it merely meant that a petty local chief in the stony wastelands of Ladakh had been bullied into building a mosque and minting coin with Aurangzeb’s name – hardly worthy of a ‘universe-conquering’ monarch.”

Post-Aurangzeb :

This period saw a succession of Mughal kings characterized by mismanagement, corruption and the squandering of their empire. Notable incidents include the ruler Farrukhsiyar's imperial firman of 1717, granting duty-free trading and territorial rights to the British East India Company, which opened the gates for what was to follow. Barely thirty years later, the Persians under Nadir Shah would sack Delhi, plundering and murdering its citizens and carting away the Kohinoor diamond to Persia. This was essentially the death knell of the empire and further worsened the anarchy.

Overall, this was a great, impartial summary of the Mughal chronology and the impact it had on future generations. The author's credentials reinforce an objective view of the entire period without it being unnecessarily romanticized by the majesty of certain phases and cultural folklore. Interestingly, the Mughal empire was at its zenith when its most tolerant and just rulers were in power, which is a lesson to be learnt even in modern times of regionalism and divisive politics that break apart societies.

TFDV for Data validation

Working with my teams to build out a robust feature store these days, it's becoming ever more imperative to ensure feature engineering data quality. The models that gain efficiency from a performant feature store are only as good as the underlying data.

Tensorflow Data Validation (TFDV) is a Python package from the TF Extended (TFX) ecosystem. The package has been around for a while but has now evolved to the point of being extremely useful for machine learning pipelines, as part of feature engineering and for detecting data drift scenarios. Its main functionality is to compute descriptive statistics, infer a schema, and detect data anomalies. It's well integrated with the Google Cloud Platform and Apache Beam; the core API uses Apache Beam transforms to compute statistics over input data.

I end up using it in cases where I need quick checks on data to validate and identify drift scenarios before starting expensive training workflows. This post is a summary of some of my notes on the usage of the package. Code is here.

Data Load

TFDV accepts CSV, DataFrames or TFRecords as input.

The CSV integration and the built-in visualization function make it relatively easy to use within Jupyter notebooks. The library takes input feature data and analyzes it feature by feature to visualize it. This makes it easy to get a quick understanding of the distribution of values, and helps identify anomalies and training/test/validation skew. It is also a great way to discover bias in the data, since you can infer aggregates of values skewed towards certain features.

As is evident, with a trivial amount of code you can spot issues immediately: missing columns, inconsistent distributions, and data drift scenarios where a newer dataset has different statistics compared to earlier training data.

I used a dataset from Kaggle to quickly illustrate the concept:

import tensorflow_data_validation as tfdv

# Generate statistics from the CSV file
TRAIN = tfdv.generate_statistics_from_csv(data_location='Data/Musical_instruments_reviews.csv', delimiter=',')
# Infer schema
schema = tfdv.infer_schema(statistics=TRAIN)

This generates a data structure that stores summary statistics for each feature.
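Conceptually, the statistics proto boils down to per-feature summaries. Here is a hedged, pure-Python sketch of the same idea; the toy rows and field names are illustrative only, not TFDV's actual implementation (which computes far richer statistics via Apache Beam):

```python
from collections import Counter

# A few toy review rows standing in for the CSV data.
rows = [
    {"reviewerID": "A2EZWZ8MBEDOLN", "overall": 5.0},
    {"reviewerID": "A2EZWZ8MBEDOLN", "overall": 4.0},
    {"reviewerID": "A14VAT5EAX3D9S", "overall": None},
]

def summarize(rows, feature):
    """Compute TFDV-style per-feature summary statistics."""
    values = [r.get(feature) for r in rows]
    present = [v for v in values if v is not None]
    counts = Counter(present)
    top_value, top_freq = counts.most_common(1)[0]
    return {
        "num_examples": len(values),       # total examples seen
        "num_non_missing": len(present),   # examples where the feature exists
        "num_unique": len(counts),         # distinct values
        "top_value": top_value,            # most frequent value
        "top_frequency": top_freq,
    }

print(summarize(rows, "reviewerID"))
```

TFDV stores these summaries (plus histograms, means and more) per feature inside the statistics protocol buffer shown below.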

TFDV Schema

Schema Inference

The schema properties describe every feature present in the 10261 reviews. Examples:

  • Their type (STRING)
  • The uniqueness of features – for example, 1429 unique reviewer IDs
  • The expected domains of features
  • The min/max number of values for a feature in each example – for example, A2EZWZ8MBEDOLN is a reviewerID with 36 occurrences:
datasets {
  num_examples: 10261
  features {
    type: STRING
    string_stats {
      common_stats {
        num_non_missing: 10261
        min_num_values: 1
        max_num_values: 1
        avg_num_values: 1.0
        num_values_histogram {
          buckets {
            low_value: 1.0
            high_value: 1.0
            sample_count: 1026.1
          }
          buckets {
            low_value: 1.0
            high_value: 1.0
            sample_count: 1026.1
          }
          ...
        }
      }
      top_values {
        value: "A2EZWZ8MBEDOLN"
        frequency: 36.0
      }
      ...

Schema inference is usually tedious, but becomes a breeze with TFDV. The schema is stored as a protocol buffer:

schema = tfdv.infer_schema(train)

The schema also generates definitions like "Valency" and "Presence". I could not find much detail in the documentation, but I found this useful paper that describes them well.

  • Presence: The expected presence of each feature, in terms of a minimum count and fraction of examples that must contain the feature.
  • Valency: The expected valency of the feature in each example, i.e., minimum and maximum number of values.

TFDV has inferred reviewerName as a STRING, with the universe of values around it termed the Domain. Note – TFDV can also encode your fields as BYTES. I'm not seeing any function call in the API to update the column type as such, but you could easily update it externally if you want to explicitly specify a string. The documentation explicitly advises reviewing the inferred schema and refining it per the requirements, so as to embellish this auto-inference with domain knowledge of the data. You can also update a feature's data type to BYTES, INT, FLOAT or STRUCT.

# Convert to BYTES
tfdv.get_feature(schema, 'helpful').type = 1

Once loaded, you can generate the statistics from the CSV file.
For comparison, and to simulate a dataset validation scenario, I cut down Musical_instruments_reviews.csv to 100 rows to compare with the original, and also added an extra feature called 'Internal' with the values A, B, C randomly interspersed across the rows.

Visualize Statistics

After this, you can use the visualize_statistics call to visualize the two datasets based on the schema of the first dataset (TRAIN in the code). Even though this is limited to two datasets, it is a powerful way to identify issues immediately. For example, it can right off the bat identify "missing features", such as the feature "reviewerName" with values present in over 99.6% of examples, and it splits the visualization into numerical and categorical features based on its inference of the data types.

# Load test data to compare
TEST = tfdv.generate_statistics_from_csv(data_location='Data/Musical_instruments_reviews_100.csv', delimiter=',')
# Visualize both datasets
tfdv.visualize_statistics(lhs_statistics=TRAIN, rhs_statistics=TEST, rhs_name="TEST_DATASET",lhs_name="TRAIN_DATASET")

A particularly nice option is the ability to choose a log scale for validating categorical features. The ‘Percentages’ option can show quartile percentages.


Anomalies can be detected using the display_anomalies call. The long and short descriptions allow easy visual inspection of the issues in the data. However, for large-scale validation this may not be enough, and you will need tooling that handles a stream of defects being presented.

# Validate and display anomalies
anomalies = tfdv.validate_statistics(statistics=TEST, schema=schema)
tfdv.display_anomalies(anomalies)

The various kinds of anomalies that can be detected, and their invocations, are described here.


Schema Updates

Another useful feature is the ability to update the schema and values to make corrections. For example, to insert a particular value:

# Insert values
names = tfdv.get_domain(schema, 'reviewerName').value
names.insert(6, "Vish")  # inserts "Vish" at index 6 of the reviewerName domain

You can also adjust the minimum fraction of values that must come from the domain, and relax the constraint if it falls below a certain threshold.

# Relax the minimum fraction of values that must come from the domain for feature reviewerName
name = tfdv.get_feature(schema, 'reviewerName')
name.distribution_constraints.min_domain_mass = 0.9


The ability to split data into 'Environments' helps indicate features that are not necessary in certain environments – for example, if we want the 'Internal' column to be in the TEST data but not the TRAIN data. Features in the schema can be associated with a set of environments using:

  •  default_environment
  •  in_environment
  •  not_in_environment
# All features are by default in both TRAINING and SERVING environments.
# schema2 here is a copy of the inferred schema with environments added.

# Specify that the 'Internal' feature is not in the TESTING environment.
tfdv.get_feature(schema2, 'Internal').not_in_environment.append('TESTING')

tfdv.validate_statistics(TEST, schema2, environment='TESTING')

Sample anomaly output:

string_domain {
  name: "Internal"
  value: "A"
  value: "B"
  value: "C"
}
default_environment: "TESTING"
...
anomaly_info {
  key: "Internal"
  value {
    description: "New column Internal found in data but not in the environment TESTING in the schema."
    severity: ERROR
    short_description: "Column missing in environment"
    reason {
      short_description: "Column missing in environment"
      description: "New column Internal found in data but not in the environment TESTING in the schema."
    }
    path {
      step: "Internal"
    }
  }
}
anomaly_name_format: SERIALIZED_PATH

Skews & Drifts

The ability to detect data skews and drifts is invaluable. However, "drift" here does not indicate a divergence from the mean, but refers to the L-infinity norm of the difference between the summary statistics of the two datasets. We can specify a threshold which, if exceeded for the given feature, flags the drift.

Let's say we have two vectors [2,3,4] and [-4,-7,8]. The L-infinity norm is the maximum absolute value of the difference between the two vectors, so in this case the maximum absolute value of [6,10,-4], which is 10.
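Checking that arithmetic with a quick numpy snippet:

```python
import numpy as np

a = np.array([2, 3, 4])
b = np.array([-4, -7, 8])

# Difference between the two vectors: [6, 10, -4]
diff = a - b
# L-infinity norm: the largest absolute component of the difference
linf = np.max(np.abs(diff))
print(linf)  # 10
```

TFDV applies the same norm to the summary statistics of a feature across the two datasets, flagging the feature when the value exceeds the configured threshold.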

# Skew comparison
tfdv.get_feature(schema, 'helpful').skew_comparator.infinity_norm.threshold = 0.01
skew_anomalies = tfdv.validate_statistics(statistics=TRAIN, schema=schema, serving_statistics=TEST)

Sample Output:

anomaly_info {
  key: "helpful"
  value {
    description: "The Linfty distance between training and serving is 0.187686 (up to six significant digits), above the threshold 0.01. The feature value with maximum difference is: [0, 0]"
    severity: ERROR
    short_description: "High Linfty distance between training and serving"
    reason {
      short_description: "High Linfty distance between training and serving"
      description: "The Linfty distance between training and serving is 0.187686 (up to six significant digits), above the threshold 0.01. The feature value with maximum difference is: [0, 0]"
    }
    path {
      step: "helpful"
    }
  }
}
anomaly_name_format: SERIALIZED_PATH

The drift comparator is useful in cases where you have the same data being loaded on a frequent basis and need to watch for anomalies to re-engineer features. The validate_statistics call combined with the drift_comparator threshold can be used to monitor for any changes that you need to act on.

# Drift comparator
tfdv.get_feature(schema, 'helpful').drift_comparator.infinity_norm.threshold = 0.01
drift_anomalies = tfdv.validate_statistics(statistics=TEST, schema=schema, previous_statistics=TRAIN)

Sample output:

anomaly_info {
  key: "reviewerName"
  value {
    description: "The feature was present in fewer examples than expected."
    severity: ERROR
    short_description: "Column dropped"
    reason {
      short_description: "Column dropped"
      description: "The feature was present in fewer examples than expected."
    }
    path {
      step: "reviewerName"
    }
  }
}

You can easily save the updated schema in the format you want for further processing.

Overall, this has been useful to me mainly for models within the TensorFlow ecosystem, and the documentation indicates that using components like StatisticsGen with TFX makes this a breeze to use in pipelines, with out-of-the-box integration on a platform like GCP.

The use case of avoiding time-consuming preprocessing/training steps by using TFDV to identify anomalies for feature drift and inference decay is a no-brainer; however, defect handling is up to the developer to incorporate. It's also important to consider that one's domain knowledge of the data plays a huge role in these scenarios, so an auto-fix of all data anomalies may not work in cases where a careful review is unavoidable.

This can also be extended to general data quality, by applying it to any validation case where you are constantly receiving updated data for the features. TFDV could even be applied post-training, for any data input/output scenario, to ensure that values are as expected.

Official documentation is here.

Autoencoders for Data Anomalies

With more and more emphasis on data anomaly detection and the proliferation of build/buy options, I've been exploring autoencoders for a few projects. In a nutshell, autoencoders are a type of neural network that take an input (an image, data), minimize it down to core features, and then reverse the process to recreate the input. A key aspect is that the encoding is actually done in an unsupervised manner, hence the 'auto'.

For example: dismantling a picture of an automobile, taking out every part and representing (encoding) the chassis, wheels and so on as representative components, and then reassembling them (decoding) from the encoding, minimizing some amount of expected reconstruction error.

Autoencoders use an encoder that learns a concise representation of the input data, while the decoder reconstructs the input from that compressed representation. A lot of the literature online calls this compressed vector the "latent space representation".

The seminal paper on the subject, which shows the benefits of autoencoders, has been dissected many times. It demonstrates the use of Restricted Boltzmann Machines (a 2-layer autoencoder consisting of a visible and a hidden layer) that learn the difference between the hidden and visible layers using a metric called K-L divergence, and it provides greater dimensionality reduction than Principal Component Analysis. Thankfully, the implementation is much more approachable than some of the background math used to prove the model!

These are feedforward, non-recurrent neural networks with an input layer, an output layer and one or more hidden layers, the count of output nodes matching the input nodes, minimizing "noise" instead of predicting a target variable as we do in supervised learning. Hence, they don't require labels, which qualifies them as unsupervised.

In a market rife with products offering "data quality" solutions, using autoencoders to detect anomalies has the potential to be a low-cost, easy-to-use solution built in-house to add to existing options.

My focus has been more on exploring this for analyzing data anomalies in structured data. In terms of cost/benefit, one could argue this might be overkill: using a neural network instead of more rule-based checks on the data, which are entirely valid and extensively used in large enterprises in place of neural net deployments. However, the benefit of squashing the input data into a smaller representative vector helps in cases where we deliberately need dimensionality reduction and to recognize outliers at scale. There is a ton of material on the web on image processing with autoencoders, for use cases such as image compression, image denoising and medical imaging. For example, fascinating results by converting THIS to THIS make colorizing an engrossing endeavor. There are also tons of applications in the Natural Language Processing field for understanding text, word embeddings and learning the semantic meaning of words.

Autoencoders – unlike GANs – can't generate newer datapoints, since their core goal is to determine an identity function using compression. Also, if the goal is just compression, they are poor general-purpose image compressors.

There are a few types of autoencoder, well described here:

  • Denoising autoencoder
  • Sparse Autoencoder
  • Deep Autoencoder
  • Contractive Autoencoder
  • Undercomplete Autoencoder
  • Convolutional Autoencoder
  • Variational Autoencoder

Most of the examples I found online dealt with images, so for exploration I used Faker to generate a million records to simulate a data scenario of regular versus non-regular coffee consumers. The irregulars were determined by a simple rule, say those who spent less than a specified threshold.

The objective was to have the autoencoder learn from the fabricated examples what the values for the "regular" customers look like, test against a holdout dataset from the "regular" group, and then use the model to identify anomalies post-reconstruction to flag cases of irregularity. Essentially: have the autoencoder achieve reasonable compression on the data, then identify anomalous inputs by reading out data with irregular values that do not match the learned representation.
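That flow can be sketched with a tiny linear autoencoder in plain numpy. To be clear, this is a minimal stand-in, not the gist from the post: the feature distributions, network size and training settings below are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "regular" customers: spend and visit count are correlated.
spend = rng.normal(50, 5, size=1000)
visits = 0.2 * spend + rng.normal(0, 0.5, size=1000)
X = np.column_stack([spend, visits])

# Standardize using regular-customer statistics only.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sigma

# "Irregular" customers break the spend/visits correlation.
spend_i = rng.normal(50, 5, size=200)
visits_i = rng.normal(2, 0.5, size=200)  # visits unrelated to spend
Xi = (np.column_stack([spend_i, visits_i]) - mu) / sigma

# Minimal linear autoencoder: 2 -> 1 -> 2, trained with full-batch
# gradient descent on the mean squared reconstruction error.
W1 = rng.normal(0, 0.1, size=(2, 1))  # encoder weights
W2 = rng.normal(0, 0.1, size=(1, 2))  # decoder weights
lr = 0.05
for _ in range(2000):
    Z = Xn @ W1              # encode to the 1-d latent space
    Xhat = Z @ W2            # decode back to 2 features
    err = Xhat - Xn
    gW2 = Z.T @ err / len(Xn)
    gW1 = Xn.T @ (err @ W2.T) / len(Xn)
    W1 -= lr * gW1
    W2 -= lr * gW2

def recon_error(A):
    """Per-example mean squared reconstruction error."""
    return np.mean((A @ W1 @ W2 - A) ** 2, axis=1)

print(recon_error(Xn).mean())  # regular holdout: low
print(recon_error(Xi).mean())  # irregular customers: markedly higher
```

Thresholding the reconstruction error then separates the two groups, which is the same mechanism behind the validation scores reported below.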

Below is a simple gist I created for a walkthrough of the process for a possible implementation with comments inline that should be self-explanatory.

  • Customer Test Score : 0.014549643226719535
  • Customer Validation Score : 0.014477944187138688
  • Irregular Customer Validation Score : 3.541257450764963

The scores reflect the anomaly for a synthetic dataset of a million records, and I was able to use Spark to scale this to well over 10 million records. Essentially, as you can tell, the Irregular Customer validation score (≈3.54) is orders of magnitude higher than the Customer validation score (≈0.014) over the regular data. The next step is to try some of these approaches against more "production"-type data at scale and implement some alerting against this data to make it more actionable.

There are tons of considerations that make a quality data-anomaly solution work for particular use cases, not limited to: statistical analysis needs, storage considerations, UI/UX for test case development, the right orchestration tools, database/data lake interoperability, scaling and development costs, and security audit requirements. Hence, the detection methodology is just one piece of a much larger puzzle.

Some interesting reads/videos:

Laplace and the law of small data

One of my recent favorite reads is Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. In the age of "big" data, encountering uncertainty and little data is also the norm in our daily lives, and Laplace offers us a rule of thumb to make an optimized decision even with one observation.

Pierre-Simon, marquis de Laplace is a ubiquitous character in the annals of science history of the 1700s and of my undergraduate mathematics years. The term "inverse Laplace transform" would be met with an uncontrollable shudder during finals, especially if you had an allergy to solving differential equations using integral transforms.

The Marquis de Laplace

However, a few decades later, with the prevalence of matrix operations and linear algebra in deep learning and by extension my overall appreciation for advanced mathematics, I've been fascinated by some of his work. Laplace was a mathematician and physicist and appears all over the place in the field. Known as the "French Newton", he was a bona fide virtuoso with contributions like the Laplace transform, the Laplace equation and the Laplace operator, amongst other things. If that weren't enough, he also enlightened the world with theories on black holes and gravitational collapse. He was also a marquis in the French court after the Bourbon Restoration (which, as much as it wants to, does not refer to the weekend festivities in my backyard; it refers to a period in French history following Napoleon's downfall).

Laplace essentially wrote the first handbook on probability with "A Philosophical Essay on Probabilities" – a magnificent treatise that reflects the author's depth of knowledge and curiosity. A bit dense in parts, but a fascinating look at 18th-century French life through the eyes of a polymath. Unless you are a deep researcher of probabilistic history, the material is organized well enough to comb through points of interest. Part 1 is the "philosophical essay on probabilities", while part 2 is an "application of the calculus of probabilities".

Laplace's Rule of Succession, originally devised to solve the sunrise problem, is extremely useful for computing probabilities when the originating events are equally likely.

Given that the sun has come up n days in a row, what is the probability it will rise tomorrow? One can imagine he got ridiculed for the question, since we have never known or experienced a day the sun did not rise, and it would be the end of the world if it were not going to rise the next day. More specifically, the problem does not seem realistic considering it assumes every day is an independent event, i.e., a random variable for the sun rising on each day.

We have evidence that the sun has risen n times in a row, but we don't know the value of p, the probability. Treating p as unknown brings to the fore a long-standing debate in statistics between frequentists and Bayesians. From the Bayesian point of view, since p is unknown, we treat it as a random variable with a distribution. As with Bayes' theorem, we start with prior beliefs about p before we have any data. Once we collect data, we use Bayes' rule to update the prior based on the evidence.

The integral calculus leading to this rule is masterfully explained in this lecture on moment generating functions (MGFs) by Joe Blitzstein. Amazing explanations, if you can sit through the detailed derivations.

The probability of the sun rising tomorrow is (n+1)/(n+2), or as Wikipedia puts it:

"if one has observed the sun rising 10000 times previously, the probability it rises the next day is 10001/10002 ≈ 0.99990002. Expressed as a percentage, this is approximately a 99.990002% chance."

Pretty good odds it seems.

Essentially per Laplace, for any possible drawing of w winning tickets in n attempts, the expectation is the number of wins + 1, divided by the number of attempts + 2.

Said differently, if we have n experiments, each resulting in success (s of them) or failure (n − s), the probability that the next repetition will succeed is (s+1)/(n+2).

If I make 10 attempts at playing a musical piece and 8 of them succeed, per Laplace my overall chance at this endeavor is 9/12, or 75%. If I play it once and succeed, the probability is 2/3 (66.6%), which is intuitively more reliable than assuming I have a 100% chance of nailing it the next time.
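The rule is a one-liner in code. Here's a quick sketch using exact fractions (the helper name is mine):

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate of the probability that the next trial succeeds."""
    return Fraction(successes + 1, trials + 2)

print(rule_of_succession(8, 10))         # 3/4, the 75% from the example above
print(rule_of_succession(1, 1))          # 2/3
print(rule_of_succession(10000, 10000))  # 10001/10002, the sunrise problem
```

Note how the estimate never reaches 0 or 1: even a perfect record leaves room for a future failure.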

Some fascinating quotes –

“Man, made for the temperature which he enjoys, and for the element which he breathes, would not be able, according to all appearance, to live upon the other planets. But ought there not to be an infinity of organization relative to the various constitutions of the globes of this universe? If the single difference of the elements and of the climates make so much variety in terrestrial productions, how much greater the difference ought to be among those of the various planets and of their satellites! The most active imagination can form no idea of it; but their existence is very probable.”

(Pg. 181)

“the transcendent results of calculus are, like all the abstractions of the understanding, general signs whose true meaning may be ascertained only by repassing by metaphysical analysis to the elementary ideas which have led to them; this often presents great difficulties, for the human mind tries still less to transport itself into the future than to retire within itself. The comparison of infinitely small differences with finite differences is able similarly to shed great light upon the metaphysics of infinitesimal calculus.”

(Pg. 44)

“The day will come, when, by study pursued through several ages, the things now concealed will appear with evidence; and posterity will be astonished that truths so clear had escaped us”

Laplace quoting Seneca

“Probability is relative, in part to this ignorance, in part to our knowledge. We know that of three or a greater number of events a single one ought to occur; but nothing induces us to believe that one of them will occur rather than the others.”

Laplace, Concerning Probability

The Rule of Succession is essentially the world’s first simple algorithm for reasoning about problems with small data. It holds well when all possible outcomes are known before observing the data. If we apply it to problems where the prior state of knowledge is not well established, the results may not be useful, since the question being asked is then of a different nature, based on different prior information.

ToneWood amp review

I rarely break out my fleet of electric guitars anymore, so my usual go-to is my trusty old Cordoba Iberia that’s usually within reach. Having instruments lying around the house is a huge aspect of getting to practice more. The ToneWood amp caught my eye immediately as I’ve been looking for simple amplification while playing outdoors or in places with absolutely poor acoustics, where a little echo/reverb or delay can go a long way in justifying the piece I’m trying to play and even serve to feed some creativity.

3 essential knobs

It’s essentially a lightweight effects unit that can be mounted on the back of an acoustic guitar to give you amplification and a few effects. Magnets installed on the guitar hook the ToneWood onto its back; the amp picks up sound from the pickup on the acoustic and sends it back via a “vibrating driver”, so the sound becomes augmented with the effects. It essentially blends the natural guitar sound with the effects, which come out of the sound hole as a unified sonic experience. The patent explains the concept well and is pretty ingenious.

Magnetic attachment to the back

The natural sound of the unamplified guitar coming from the soundboard seamlessly blends with the effects radiating outward via the sound hole, creating a larger-than-life soundscape. All that’s required is some type of pickup installed in the guitar to provide signal to the device. You can connect the ToneWood to an external amp/PA via the 1/4″ output port, and its iDevice interface is great if you are in the Apple ecosystem. It also has the standard 1/4″ guitar input and a 1/8″ TRRS I/O for iDevices. The unit runs on 3 AA batteries for an average of 8 hours.

The installation took me about 10 minutes. It required me to slacken the strings and place an X-brace unit inside the guitar, pointing the magnets so that the ToneWood amp could attach itself to the outside back of the guitar. This took some adjusting and I’m not sure I’ve dialed in the most optimal position, but it’s close enough. Once you stick the batteries in, it’s showtime. The display screen and knobs are intuitive and the barrier to entry here is phenomenally low.

From an effects perspective, it’s really everything you need considering you are playing an acoustic guitar. All the effects come with Gain and Volume settings.

  • Hall Reverb with Decay, Pre-delay and Hi-cut settings. These settings are accessed by pressing on the knobs on the ToneWood
  • Room Reverb with Decay, Pre-delay and Hi-cut settings
  • Plate Reverb with Decay, Pre-delay and Hi-cut settings
  • Delay with Speed, Feedback and Reverb. (Note: you are not going to sound like The Edge on the Skrydstrup switching system anytime soon with this)
  • Tremolo with Rate, Depth and Delay
  • Leslie style tremolo with rate, depth and reverb
  • Auto-Wah with Sensitivity, Envelope , Reverb
  • Overdrive with Drive, Filter and Reverb
  • DSP Bypass to mute the processor
  • Notch filters (Notch Low and Notch High) to filter based on frequency

There is also the ability to save effect settings based on the tweaks you make which seems useful though I’ve not really played around with it.

I’ve largely played around with the Hall and Room effects for my purposes. You can tweak this plenty, but I’d like to make sure I’m not sounding “wall of sound Spector-mode” on my Cordoba for every track.

All in all, a great addition to enhance the acoustic and, more than anything else, the convenience factor is amazing. It’s much easier to optimize practice time now without switching guitars or hooking up effects racks to my Ibanez for a 10-minute session. If you want more control over ambience and soundscapes with minimal setup or complexity, this is it.

I recorded a quick demo with the Hall Reverb (Decay and Hi-cut set to default) and no audio edits, straight off the iPhone camera. The audio needs to be enhanced and doesn’t fully do justice to the ToneWood sound. The jam is me noodling on S&G’s cover of Anji by Davey Graham. The nylon strings don’t lend themselves to much bending at all, but the point was to capture a small moment of a few hours testing this wonderful amp.

Note – I don’t have any affiliation with ToneWood.

Spark AI Summit 2020 Notes


Spark AI Summit just concluded this week and as always, plenty of great announcements. (Note: I was one of the speakers at the event but this post is more about the announcements and areas of my personal interest in Spark. The whole art of virtual public speaking is another topic). The ML enhancements and impact is a bigger topic probably for another day as I catch up with all the relevant conference talks and try out the new features.

Firstly, I think the online format worked for this instance. This summit (and I’ve been to it every year since its inception) was way more relaxing and didn’t leave me exhausted physically and mentally with information overload. Usually held at the Moscone in San Francisco, the event is a great opportunity to network with former colleagues, friends and industry experts, which is the most enjoyable part yet taxing in many ways with limited time to manage. The virtual interface was way better than most of the online events I’ve been to before – engaging and convenient. The biggest drawback was the networking aspect; the online networking options just don’t cut it. The video conferencing fatigue probably didn’t hit since it was 3 days and the videos were available instantly online, so plenty of them are in my “Watch Later” list. (Note: the talks I refer to below are only the few I watched; there are plenty more interesting ones.)

The big announcement was the release of Spark 3.0 – hard to believe, but it’s been 10 years of evolution. I remember 2013 as the year I was adapting to the Hadoop ecosystem, writing map-reduce using Java/Pig/Hive for large-scale data pipelines, when Spark started emerging as a fledgling competitor with an interesting distributed computational engine built on Resilient Distributed Datasets (RDDs). Fast-forward to 2020 and Spark is the foundation of large-scale data implementations across the industry, and its ecosystem has evolved to include frameworks and engines like Delta and MLflow, which are also gaining a foothold as foundational to the enterprise across cloud providers. More importantly, smart investment in its DataFrames API has reduced the barrier to entry with SQL access patterns.

There were tons of new features introduced, but I’m focusing on the ones I paid attention to. There has not been a major release of Spark in years, so this is pretty significant (2.0 was in 2016).

Spark 3.0

  • Adaptive Query Execution: At the core, this helps change the number of reducers at runtime. It divides the SQL execution plan into stages earlier, instead of the usual RDD graph. New stages allow injecting optimizations before queries execute, since later stages have the full picture of the entire query plan and a global view of all shuffle dependencies. Execution plans can be auto-optimized at runtime, for example changing a SortMergeJoin to a BroadcastJoin where applicable. This is huge in large-scale implementations, where I see tons of poorly formed queries eating a lot of compute thanks to skewed joins. More specifically, settings like the number of shuffle partitions set using spark.sql.shuffle.partitions, which has defaulted to 200 since inception, can now be automatically tuned based on the reducers required for the map stage output – i.e. set high for larger data and lower for smaller data.

  • Dynamic partition pruning: Enables the ability to perform filter pushdowns versus table scans by adding a partition pruning filter. At the core if you consider a broadcast hash join between a fact and dimension table, the enhancement intercepts the result of the broadcast and plugs them as a filter on top of the dynamic filter on the fact table as opposed to the earlier approach of pushing out the broadcast hash table derived from the dimension table to every worker to determine the value of the join with the fact. This is huge to avoid scanning irrelevant data. This session explains it well.

  • Accelerator-aware scheduler: Traditionally, the bottlenecks have been small data in partitions that GPUs find hard to handle, cache-processing inefficiencies, slow I/O on disk, UDFs that need CPU processing, and more. But GPUs are massively useful for high-cardinality datasets, matrix operations, window operations and transcoding situations. Originally termed Project Hydrogen, this feature helps Spark be GPU-aware. The cluster managers now have GPU support that schedulers can request from. The schedulers can now understand GPU allocations to executors and assign GPUs appropriately to tasks. The GPU resources still need to be configured using the configs to assign the appropriate resources. We can request resources at the executor, driver and task level. This also allows resources on the nodes to be discovered, along with their assignments. This is supported in YARN, Kubernetes and Standalone modes.
  • Pandas UDF overhaul: Extensive use of Python type annotations – this becomes more and more imperative as codebases scale and newer engineers take longer to understand and maintain the code effectively. Type hints surface problems early, instead of writing hundreds of test cases or, worse, finding out about issues from irate users. Great documentation and examples here.

  • PySpark UDF: Another feature I’ve looked forward to is enabling PySpark to handle Pandas vectorized UDFs as an array. In the past, we needed to jump through god-awful hoops like writing Scala helper functions and then switching over to Python in order to read these as arrays. ML engineers will welcome this.

  • Structured Streaming UI: Great to see more focus on the UI and additional features appearing in the workspace interface, which frankly had gotten pretty stale over the last few years. The new tab shows more statistics for running and completed queries and, more importantly, will help developers debug exceptions quickly rather than poring through log files.

  • Proleptic Gregorian calendar: Spark switched to this from the previous hybrid (Julian + Gregorian) calendar. It uses Java 8 API classes from the java.time packages that are based on ISO chronology. The “proleptic” part comes from extending the Gregorian calendar backward to dates before 1582, when it was officially introduced.

    Fascinating segue here –

The Gregorian calendar (named after Pope Gregory XIII – not the guy who gave us the awesome Gregorian chants, that was Gregory I) is what we use today as part of ISO 8601:2004. The Gregorian calendar replaced the Julian calendar due to its inaccuracies in determining the actual year, plus its inability to handle the complexities of adding a leap year almost every 4 years. Catholics adopted it readily, while Protestants held out with suspicion for 200 years (!) before England and the colonies switched over, advancing the date from September 2 to September 14, 1752! Would you hand over 11 days of your life as a write-off? In any case, you can thank Gregory XIII for playing a part in this enhancement.

  • Also a whole lot of talk on better ANSI SQL compatibility that I need to look at more closely. Working with a large base of SQL users, this can only be good news.

  • A few smaller super useful enhancements:
    • “Show Views” command
    • “Explain” output formatted for better usability instead of a jungle of text
    • Better documentation for SQL
    • Enhancements on MLlib, GraphX
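Several of the Spark 3.0 features above are driven by plain configuration. A minimal sketch of turning them on when building a session (the config keys are from the Spark 3.0 documentation; the session itself and the GPU amounts are illustrative, and the GPU settings additionally need a discovery script on YARN/Standalone):

```python
from pyspark.sql import SparkSession

# Configuration sketch only -- requires a Spark 3.0+ environment to run.
spark = (
    SparkSession.builder
    .appName("spark3-feature-sketch")
    # Adaptive Query Execution: re-optimize plans at runtime
    # (e.g. coalesce shuffle partitions, switch join strategies)
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    # Dynamic partition pruning: push a runtime filter derived from the
    # dimension side of a join onto the partitioned fact-table scan
    .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
    # Accelerator-aware scheduling: one GPU per executor, one per task
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    .getOrCreate()
)
```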

Useful talks:

Delta Engine/Photon/Koalas

Being a big Delta proponent, this was important to me, especially as adoption grows and large-scale implementations need continuous improvements in the product to justify rising storage costs on cloud providers as the scale grows.

The Delta Engine now has an improved query optimizer and a native vectorized execution engine written in C++. This builds on the optimized reads and writes of today’s NVMe SSDs, which eclipse the SATA SSDs of previous generations and offer faster seek times. Gaining these efficiencies out of the CPU at the bare-metal level is significant, especially as data teams deal with more and more unstructured, high-velocity data. The C++ implementation helps exploit data-level and instruction-level parallelism, as explained in detail in the keynote by Reynold Xin. There were some interesting benchmarks on strings using regex to demonstrate faster processing. Looking forward to more details on how the optimization works under the hood, along with implementation guidelines.

Koalas 1.0 now implements 80% of the Pandas APIs. We can invoke accessors to use the Pyspark APIs from Koalas. Better type hinting and a ton of enhancements on DataFrames, Series and Indexes with support for Python 3.8 make this another value proposition on Spark.

A lot of focus on Lakehouse in ancillary meetings was encouraging and augurs well for data democratization on a single linear stack, versus fragmenting data across data warehouses and data lakes. The Redash acquisition will provide another option for large-scale enterprises for easy-to-use dashboarding and visualization capabilities on these curated data lakes. I hope to see more public announcements on that topic.


More announcements around the MLflow model serving aspects with Model Registry (announced in April), which lets data scientists track the model lifecycle across stages such as Staging, Production, or Archived. With MLflow in the Linux Foundation, it will be easier to evangelize it to a larger audience, with a vendor-independent non-profit managing the project.

  • Autologging: Enables automatic logging of Spark datasource information at read time, without the need for explicit log statements. mlflow.spark.autolog() will enable autologging for Spark datasources if you provide the relevant data and versions using Delta Lake, so the managed Databricks implementation definitely looks slicker with the UI. Implementation is as easy as attaching the mlflow-spark JAR and then calling mlflow.spark.autolog(). More significantly, this enables the cloning of models.
  • On Azure – the updated mlflow.azureml.deploy API for deploying MLflow models to AzureML. This now uses the up-to-date Model.package() and Model.deploy() APIs.
  • Model schemas for input and output, plus custom metadata tags for tracking: more metadata to track, which is great.
  • Model Serving: Ability to deploy models via a REST endpoint on hosted MLflow, which is great. I would have loved to see more turnkey methods for deploying to an agnostic endpoint, say a managed Kubernetes service – the current implementation targets Databricks clusters from what I noticed.

  • Lots of cool UI fixes including highlighting different parameter values when comparing runs, UI plot updates with scaling to thousands of points.

Useful talks:

Looking forward to trying out the new MLflow features which will go on public preview later in July.