Artificial Intelligence for rare disease diagnosis.

Assessing the probability that patients with Gaucher disease develop further diseases.

The Spanish Foundation for the Study and Treatment of Gaucher Disease and other Lysosomal Diseases (FEETEG) promotes scientific research into Gaucher disease and its treatment. The Foundation is interested in predicting the probability that patients with Gaucher disease develop further diseases, such as neoplasms or Parkinson’s disease (correlations between diseases). For this purpose, FEETEG contacted Kampal Data Solutions to develop an advanced analytical model based on Artificial Intelligence, using the information available in the Gaucher Spanish Disease Registry.


Kampal Data Solutions has developed a machine learning model able to cope with big data samples by using the cloud infrastructure provided by EOSC DIH. The case study was based on a medical data set provided by FEETEG containing information on patients with Gaucher disease. In addition, extra data was generated following what the current literature considers normal values. This made it possible to obtain a big data sample that loosely resembles the natural proportion of patients with Gaucher disease. Handling the larger problem size required parallelizing the code, which benefited from cloud computing.
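The idea of generating extra records from literature normal values can be illustrated with a minimal sketch. It is not the pilot's actual procedure, which the source does not describe, and it is written in Python rather than the R used in the pilot. The marker names and ranges below are hypothetical placeholders, not clinical values, and treating each range as the mean plus or minus two standard deviations is an assumption made only for illustration.

```python
import random

# Hypothetical reference ranges (low, high). Placeholders only, not clinical values.
REFERENCE_RANGES = {
    "marker_a": (10.0, 20.0),
    "marker_b": (0.5, 1.5),
}


def synthetic_record(rng, ranges):
    """Draw one value per marker, treating each range as mean +/- 2 SD (assumption)."""
    record = {}
    for name, (low, high) in ranges.items():
        mean = (low + high) / 2
        sd = (high - low) / 4
        record[name] = rng.gauss(mean, sd)
    return record


def synthetic_cohort(n, seed=0):
    """Generate n synthetic records; the seed makes the sample reproducible."""
    rng = random.Random(seed)
    return [synthetic_record(rng, REFERENCE_RANGES) for _ in range(n)]
```

Calling `synthetic_cohort(1000)` returns a list of 1000 dictionaries, one per synthetic record. A real data set would also need the correlations between variables and the proportion of affected patients to be modelled, which this sketch leaves out.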

How they used EOSC-hub services

The pilot required extra computational resources to cope with the problem size (1 million samples). For that, Kampal Data Solutions used the EOSC DIH cloud infrastructure, with 16 vCPUs and 32 GB of RAM. To speed up the process and benefit from all the cores, the code was parallelized. This way, different operations can run simultaneously on each core, using only a fraction of the sequential computation time. The parallelization of the code was greatly simplified by using the R packages parallel and dplyr.
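The general pattern described above (split the samples into chunks, process one chunk per core, combine the results) can be sketched as follows. The pilot used R; this sketch uses Python's standard library as a stand-in, and the function names and the toy per-sample computation are hypothetical, chosen only to show the structure.

```python
import math
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(n_items, n_chunks):
    """Return (start, stop) index pairs that cover range(n_items) in order."""
    if n_items <= 0:
        return []
    size = math.ceil(n_items / n_chunks)
    return [(start, min(start + size, n_items)) for start in range(0, n_items, size)]


def score_chunk(bounds):
    """Placeholder for the per-sample work: sums a toy score over one chunk."""
    start, stop = bounds
    return sum(math.sqrt(i) for i in range(start, stop))


def parallel_total(n_items, n_workers):
    """Process the chunks in separate worker processes and combine the results."""
    chunks = split_into_chunks(n_items, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(score_chunk, chunks))


if __name__ == "__main__":
    n_workers = os.cpu_count() or 1  # e.g. 16 on the machine described above
    print(parallel_total(1_000_000, n_workers))
```

The speed-up depends on the chunks being independent of each other. Work that needs results from other chunks has to be combined in a final step, as the outer `sum` does here.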

The value proposition of the pilot

Although the results obtained have no medical value, this proof of concept shows that the chosen model is scalable and could be applied efficiently to other conditions or illnesses for which more data is available. The challenge now is to identify business opportunities to exploit the model.

How EOSC-hub helped

EOSC-hub has provided Kampal Data Solutions with powerful cloud infrastructure to support the scaled-up analytics required to validate the proof of concept. Using the computing power of the EOSC-hub services, Kampal Data Solutions could experiment with and test its new models for disease prediction.

The technical support provided by the EOSC DIH team helped Kampal Data Solutions to access and manage the cloud and gave the company a better understanding of the EOSC computing infrastructure. Meanwhile, the visibility service enhanced the pilot's exposure across different European communities.


Supporting project