Enhancing the scalability of the Axyon Platform
Allowing fintech AI applications to deal with large customer datasets through GPU-powered supercomputers.
Axyon AI is an Italian fintech start-up whose current applications mainly focus on financial time-series analysis with machine learning algorithms. More specifically, Axyon AI partners with financial institutions (asset managers, hedge funds, trading desks) to improve the performance and risk profiles of investment strategies. The main objective of the pilot was to work with the EOSC DIH on a proof of concept, using EOSC infrastructure and competences to raise the TRL of the company's services.
The key result of the ESAX project was bringing the computational scalability of the Axyon Platform to a new level, almost quadrupling the previous peak number of jobs executed in parallel, with no issues in terms of system management load or network utilization. Moreover, the Platform now supports the optimization of large deep neural network models on multi-GPU, multi-node HPC clusters, drastically reducing execution times and paving the way to more complex models and next-generation (e.g. exascale) computing capabilities.
How they used EOSC-hub services
ESAX used the CINECA HPC Tier-0 system Marconi100 (M100), an HPC cluster composed of 980 nodes, each equipped with 4 NVIDIA V100 GPUs and an IBM POWER9 AC922 processor at 3.1 GHz. Axyon's AI engine is based on the TensorFlow framework, which can natively exploit the NVIDIA multi-GPU technology available on the cluster. M100 also enabled the testing of multi-node distributed training thanks to the Horovod framework, which supports TensorFlow. For the project, the Axyon Platform consumed approximately 300k core-hours.
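Conceptually, Horovod-style distributed training is data-parallel: each worker (one per GPU, possibly across nodes) holds a full copy of the model, computes gradients on its own data shard, and an allreduce averages those gradients before every synchronized update. The pure-Python toy below illustrates that scheme on a one-parameter least-squares model; it is a conceptual sketch, not the actual TensorFlow/Horovod API, and the worker count, learning rate, and toy data are illustrative assumptions.

```python
# Conceptual sketch of data-parallel training (the scheme Horovod
# implements): each simulated worker computes a gradient on its shard,
# and an averaging "allreduce" synchronizes the update across workers.

def local_gradient(w, shard):
    """Gradient of the mean squared error 0.5*(w*x - y)^2 on one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for the ring-allreduce that averages across workers."""
    return sum(values) / len(values)

def train(shards, w=0.0, lr=0.01, steps=100):
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # one per "GPU"
        w -= lr * allreduce_mean(grads)                 # synchronized step
    return w

if __name__ == "__main__":
    # Data drawn from y = 3x, split across 4 simulated workers
    data = [(x, 3.0 * x) for x in range(1, 17)]
    shards = [data[i::4] for i in range(4)]
    print(round(train(shards), 3))  # → 3.0
```

In the real setting each worker is a separate process pinned to one GPU, launched across nodes by an MPI-style runner, and the allreduce runs over the interconnect rather than in-process.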
The value proposition
At the core of the Axyon Platform sits an automatic meta-optimization engine that executes many parallel jobs, exploring a multitude of different deep neural network morphologies together with automatic feature engineering and selection. This requires a large amount of computational power: a typical meta-optimization run takes approximately one week on a small GPU cluster. With ESAX, the scalability of the Axyon Platform was dramatically improved: a much greater number of jobs can now run in parallel, while the runtime of each single job was also reduced by parallelising it over multiple GPUs or even multiple nodes. This allows Axyon to expand its offering by training AI models on substantially larger datasets, which may for instance include a wider array of financial assets at a much higher granularity level, together with additional explanatory variables (e.g. sentiment data).
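The meta-optimization pattern described above can be sketched as a search over candidate model configurations evaluated concurrently. The snippet below is a minimal illustration using Python's standard library; the search space, scoring function, and worker count are invented stand-ins for Axyon's proprietary engine, where each evaluation would be a full training job dispatched to a GPU node.

```python
# Minimal sketch of a meta-optimization loop: evaluate many candidate
# network "morphologies" concurrently and keep the best scorer. The toy
# objective below stands in for training + validating a real model.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(config):
    """Stand-in for training one candidate model; returns (config, score)."""
    layers, units = config
    score = 1.0 / (1 + abs(layers * units - 96))  # toy objective, peak at 96
    return config, score

def meta_optimize(search_space, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, search_space))
    return max(results, key=lambda r: r[1])

if __name__ == "__main__":
    space = list(product([2, 3, 4], [16, 32, 64]))  # (layers, units) grid
    best_config, best_score = meta_optimize(space)
    print(best_config, best_score)  # → (3, 32) 1.0
```

On the cluster, each call to the evaluation function corresponds to an independent job submitted to the workload manager, which is why raising the peak number of parallel jobs directly shortens a meta-optimization run.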
How EOSC-hub helped
The design and computing power of CINECA HPC Tier-0 system Marconi100 made it an ideal infrastructure for the ESAX project, as the Axyon Platform heavily relies on GPU computing through TensorFlow. The high parallelism capacity of M100 allowed running stress tests on the Platform workload management system to assess and improve its performance and scalability, with invaluable insights provided by CINECA consultants.