Having been a Keras user since I read the seminal Deep Learning with Python, I’ve been experimenting with exporting models to different formats and frameworks to be more framework-agnostic.


ONNX (Open Neural Network Exchange) is an open format for representing traditional machine learning and deep learning models. Its key goal is interoperability between a variety of frameworks and target environments. ONNX lets you export a fully trained model into its format and target diverse environments without manual optimization or painful rewrites of the model to accommodate each environment.
It defines an extensible computation graph model, built-in operators, and standard data types, allowing a compact, cross-platform representation for serialization. A typical use case is transfer learning, where you want to reuse the weights of a model built in another framework: if you build a model in Tensorflow, you get a protobuf (PB) file as output, and it would be great if there were one universal format you could convert to the PT format to load and reuse in Pytorch, or run directly on a hardware-agnostic runtime.

For high-performance inference across frameworks this works well: platforms like NVIDIA’s TensorRT support ONNX, with optimizations aimed at the accelerators on their devices such as the Tesla GPUs or the Jetson embedded devices.

Format

The ONNX file is a protobuf-encoded tensor graph. The supported operators are documented here, and they are versioned in groups referred to as “opsets”, i.e. operator sets. Opsets are defined for different runtimes in order to enable interoperability. The operators form a growing list of widely used linear operations, functions and other primitives for working with tensors.

The operations include most of the typical deep learning primitives: linear operations, convolutions and activation functions. A model is mapped to the ONNX format by executing it, often with just random input data, and tracing the execution. The operations executed are mapped to ONNX operations, and so the entire model graph is mapped into the ONNX format. The result is then saved as a .onnx protobuf file which can be read and executed by a wide and growing range of ONNX runtimes.
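Pytorch’s exporter is a concrete illustration of this trace-based approach: a dummy input is pushed through the model, the executed operations are recorded, and the traced graph is written out as ONNX. A minimal sketch, with the model, input shape and file name purely illustrative:

```python
import torch
import torch.nn as nn

# A toy model standing in for whatever network you actually trained.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# Random input only defines shapes/dtypes for tracing; its values don't matter.
dummy_input = torch.randn(1, 4)

# Trace the forward pass and serialize the recorded graph as ONNX.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["score"],
    opset_version=13,
)
```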

Note – opsets evolve quickly, and with the fast release cycles of competing frameworks it may not always be easy to upgrade to the latest ONNX version if it breaks compatibility with other frameworks. The file format consists of the following:

  • Model: top-level construct
    • Associates version info and metadata with a graph
  • Graph: describes a function
    • Set of metadata fields
    • List of model parameters
    • List of computation nodes – each node has zero or more inputs and one or more outputs
  • Nodes: used for computation
    • Name of the node
    • Name of the operator it invokes
    • List of named inputs
    • List of named outputs
    • List of attributes

More details here.
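These pieces are easy to see by loading a .onnx file with the onnx Python package and walking the protobuf; a quick sketch, assuming an exported file named model.onnx:

```python
import onnx

# Load the serialized protobuf (file name is illustrative).
model = onnx.load("model.onnx")

# Model: top-level construct carrying version info, metadata and the opset imports.
print(model.ir_version, model.producer_name)
print([(imp.domain, imp.version) for imp in model.opset_import])

# Graph: parameters (initializers) plus the list of computation nodes.
graph = model.graph
print(len(graph.initializer), "parameters,", len(graph.node), "nodes")

# Nodes: each names the operator it invokes and its named inputs/outputs/attributes.
for node in graph.node[:5]:
    print(node.name, node.op_type, list(node.input), list(node.output))

# Validate the structure against the ONNX spec.
onnx.checker.check_model(model)
```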

Runtime


The ONNX model can then be run with ONNX Runtime, which uses a variety of hardware accelerators for optimal performance. The promise of ONNX Runtime is that it abstracts the underlying hardware so developers can use a single set of APIs for multiple deployment targets. Note – ONNX Runtime is a separate project and aims to perform inference for any prediction function converted to the ONNX format.
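A minimal inference sketch with the onnxruntime package: the execution providers list is the hardware abstraction, falling back to CPU when a GPU provider is unavailable (the file name and input shape are assumptions):

```python
import numpy as np
import onnxruntime as ort

# One API regardless of the deployment target; providers are tried in order.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Input/output names come from the exported graph.
input_name = session.get_inputs()[0].name

# Dummy batch shaped like the model's expected input.
x = np.random.rand(1, 4).astype(np.float32)
preds = session.run(None, {input_name: x})[0]
print(preds)
```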

This has advantages over the dockerized pickled-model approach that is common in production deployments, where there are runtime restrictions (i.e. can run only in .NET or the JVM), memory and storage overhead, version dependencies, and batch prediction requirements.


ONNX Runtime has been integrated into WinML and Azure ML, with Microsoft as its primary backer. Some of the newer enhancements include INT8 quantization, which replaces floating-point weights with 8-bit integers to reduce model size and memory footprint and to increase efficiency, benchmarked here.
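ONNX Runtime exposes this through its quantization utilities; a small sketch of dynamic INT8 quantization, with input and output file names as placeholders:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Rewrite the model with weights stored as 8-bit integers (dynamic quantization:
# activations are quantized on the fly at inference time).
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,
)
```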


The usual path is:

  • Train models with your framework of choice
  • Convert them to ONNX with the ONNX converters
  • Use ONNX Runtime to verify correctness and inspect the network structure using netron (https://netron.app/) – see the sketch after this list
  • Use hardware-accelerated inference with ONNX Runtime (CPU/GPU/ASIC/FPGA)
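For the inspection step, netron also ships as a Python package that serves the interactive graph view locally; a two-line sketch with an illustrative file name:

```python
import netron

# Opens a browser tab rendering the ONNX graph (layers, shapes, attributes).
netron.start("model.onnx")
```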

Tensorflow


The easiest way to convert Tensorflow models is the tf2onnx tool, either from the command line or via its Python API. It converts the saved model to a model representation that includes the inference graph.


Here is an end-to-end example of saving a simple Tensorflow model, converting it to ONNX, running predictions with the ONNX model, and verifying that the predictions match.
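A minimal sketch of that flow using the tf2onnx Python API; the tiny Keras model, shapes and file names are purely illustrative:

```python
import numpy as np
import tensorflow as tf
import tf2onnx
import onnxruntime as ort

# A toy Keras model standing in for your real one.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,), name="features"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Convert straight from the Keras model; the input signature pins down the graph input.
spec = (tf.TensorSpec((None, 4), tf.float32, name="features"),)
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=13)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Run the same batch through Keras and ONNX Runtime and compare predictions.
x = np.random.rand(3, 4).astype(np.float32)
keras_preds = model.predict(x)

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
onnx_preds = session.run(None, {input_name: x})[0]

np.testing.assert_allclose(keras_preds, onnx_preds, rtol=1e-5, atol=1e-5)
print("Predictions match")
```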


Challenges


However, one thing to consider while using this format is the lack of “official support” from frameworks like Tensorflow. For example, Pytorch does provide functionality to export models to ONNX (torch.onnx), but I could not find any function to import an ONNX model and produce a Pytorch model. Considering that Caffe2, which is part of Pytorch, fully supports ONNX import/export, it may not be totally unreasonable to expect an official importer (there is a proposal already documented here).

The Tensorflow converters are part of the ONNX project, i.e. not an official, out-of-the-box Tensorflow implementation. The list of supported Tensorflow ops is documented here. The github repo is a treasure trove of information on the computation graph model and the operators/data types that power the format. However, as indicated earlier, depending on the complexity of the model (especially in transfer learning scenarios), you are likely to encounter conversion issues during function calls that cause the ONNX converter to fail. In these cases you may need to modify the graph to fit the format. I’ve had a few issues with StatefulPartitionedCall ops, especially in transfer learning situations with larger encoders in language models.


I have also had to convert Tensorflow models to Pytorch by first converting Tensorflow to ONNX, then converting the ONNX model to Keras using onnx2keras, and finally converting to Pytorch using MMdnn, with mixed results, a lot of debugging, and many abandoned attempts. I think ONNX Runtime for inference, rather than framework-to-framework conversion, is the better use of ONNX.



A universal format like ONNX, though well intentioned and highly sought after, may never fully come to fruition given the divergent interests and priorities of its major contributors, but the need for it cannot be disputed.