The usual Synchronous versus Asynchronous versus Concurrent versus Parallel question is a technical interview topic that often expands into a broader conversation about a candidate's overall competency with scaling, and it tends to lead down interesting rabbit holes and examples. While keeping the conversation open-ended, I’ve noticed it’s a great sign when candidates bring up techniques to speed up parallelism, enhance concurrency, or otherwise accelerate processing.
It’s also important to distinguish between CPU-bound and IO-bound tasks in such situations, since parallelism is effective for CPU-bound tasks (e.g., preprocessing data, running an ensemble of models) while concurrency works best for IO-bound tasks (web scraping, database calls). CPU-bound tasks are great candidates for parallelization across multiple CPUs, where a task can be split into multiple subtasks, while IO-bound tasks are not CPU dependent and instead spend most of their time waiting on reads and writes to disk or the network.
Key standard libraries in Python for concurrency include:
- AsyncIO – for concurrency with coroutines and event loops. This is most similar to a pub-sub model.
- concurrent.futures – for concurrency via thread pools
Both are limited by the global interpreter lock (GIL) and run in a single process, multi-threaded at most.
Note: for parallelism, the library I’ve usually used is multiprocessing, which deserves a post of its own.
AsyncIO is a great tool for executing tasks concurrently and an easy way to add asynchronous calls to your program. However, the right use case matters, as it can also lead to unpredictability since tasks can start, run and complete at overlapping times through cooperative context switching between tasks (all on a single thread). When a task blocks on IO, the event loop picks up the next ready task and runs it until it completes or blocks in turn. The key here is that there is no manual polling to check whether a task has freed up: the event loop resumes the task when the awaited result actually becomes available.
This post has a couple of quick examples of asyncio using the async/await syntax for event-loop management. Plenty of other libraries are available, but asyncio is usually sufficient for most workloads, and a couple of simple examples go a long way in explaining the concept.
The key calls here are –
- async – marks a function as a coroutine, which the event loop can run asynchronously
- await – suspends execution of the current coroutine and passes control back to the event loop, letting it run other things until the awaited result is returned
- run – runs a coroutine as the program's entry point, creating and closing the event loop (Python 3.7+). In earlier versions (3.5/3.6) you can get an event loop with get_event_loop() and call run_until_complete() on it instead
- sleep – suspends execution of the current task for the given time, yielding control so other tasks can run
- gather – runs multiple coroutines concurrently and waits until all of them finish before resuming the current context, returning the list of responses from each coroutine in order
Example of using async for multiple tasks:
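Below is a minimal sketch of the pattern, using asyncio.sleep as a stand-in for real IO; the fetch_data name and the delays are purely illustrative:

```python
import asyncio
import time

async def fetch_data(task_id: int, delay: float) -> str:
    # asyncio.sleep stands in for a real IO wait (network call, disk read, etc.)
    await asyncio.sleep(delay)
    return f"task {task_id} done after {delay}s"

async def main() -> None:
    start = time.perf_counter()
    # gather schedules all three coroutines concurrently and returns
    # their results in the same order once all of them have finished
    results = await asyncio.gather(
        fetch_data(1, 1.0),
        fetch_data(2, 2.0),
        fetch_data(3, 3.0),
    )
    print(results)
    print(f"total: {time.perf_counter() - start:.1f}s")  # ~3s (longest task), not 6s (sum)

asyncio.run(main())
```

The total runtime is roughly the longest single delay rather than the sum of all three, which is the whole point of running the tasks concurrently.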
A trivial example like the one above goes a long way in explaining the core usage of the asyncio library, especially when bolting it onto long-running Python processes that are slow primarily due to IO.
aiohttp is another great resource for asynchronous HTTP requests, which by nature are a great use case for asynchronicity: while requests wait for servers to respond, other tasks can run. It basically works by creating a client session that can be reused across multiple individual requests and can hold connections to up to 100 different servers at the same time.
Non-async example
A quick example to handle requests from a website (https://api.covid19api.com/total/country/{country}/status/confirmed) that provides a JSON string based on the specific request. The specific endpoint is not important here and is used only to demonstrate the async functionality.
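A sketch of the synchronous version, assuming the requests library and an illustrative list of countries; the timing print is just to make the comparison with the async version below concrete:

```python
import time
import requests  # assumed installed; any blocking HTTP client behaves the same way

COUNTRIES = ["canada", "germany", "france", "india", "japan"]  # illustrative list
URL = "https://api.covid19api.com/total/country/{country}/status/confirmed"

def fetch_all() -> list:
    results = []
    for country in COUNTRIES:
        # Each request blocks until the server responds before the next one starts
        response = requests.get(URL.format(country=country))
        results.append(response.json())
    return results

start = time.perf_counter()
data = fetch_all()
print(f"fetched {len(data)} responses in {time.perf_counter() - start:.1f}s")
```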
Async example using aiohttp, which is needed in order to call the same endpoint asynchronously.
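A sketch of the asynchronous version using aiohttp, with the same illustrative country list; a single ClientSession is shared across all the requests:

```python
import asyncio
import time
import aiohttp  # assumed installed

COUNTRIES = ["canada", "germany", "france", "india", "japan"]  # same illustrative list
URL = "https://api.covid19api.com/total/country/{country}/status/confirmed"

async def fetch(session: aiohttp.ClientSession, country: str):
    # While this request waits on the server, the await hands control
    # back to the event loop so the other requests can make progress
    async with session.get(URL.format(country=country)) as response:
        return await response.json()

async def fetch_all() -> list:
    # One ClientSession is reused for every request (connection pooling)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, c) for c in COUNTRIES))

start = time.perf_counter()
data = asyncio.run(fetch_all())
print(f"fetched {len(data)} responses in {time.perf_counter() - start:.1f}s")
```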
The example clearly shows the time difference: the async version roughly halves the time taken for the same calls. Granted, it’s a trivial example, but it shows the benefit of non-blocking async calls, and the same pattern can be applied to any situation that involves multiple requests and calls to different servers.
Key Points
- AsyncIO is usually a great fit for IO-bound problems
- Putting async before every function will not be beneficial, as blocking calls can still slow down the code, so validate the use case first
- async/await works only with libraries that support it, so for specific calls (say, to databases) the Python wrapper library you use will need to support async/await; see the sketch after this list
- https://github.com/timofurrer/awesome-asyncio is the go-to place for higher-level async APIs, along with https://docs.python.org/3/library/asyncio-task.html
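For instance, here is a minimal sketch using aiosqlite, one async-aware wrapper around the standard sqlite3 module; the database file and the users table are hypothetical:

```python
import asyncio
import aiosqlite  # an async-aware wrapper around the standard sqlite3 module

async def count_users(db_path: str) -> int:
    # The driver itself has to expose awaitable calls; wrapping a blocking
    # sqlite3 call in "async def" would still block the event loop.
    async with aiosqlite.connect(db_path) as db:
        async with db.execute("SELECT COUNT(*) FROM users") as cursor:  # hypothetical table
            (count,) = await cursor.fetchone()
            return count

# Assumes app.db exists with a users table:
# print(asyncio.run(count_users("app.db")))
```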