Received: June 2, 2019 / Revised: August 22, 2019 / Accepted: August 23, 2019 / Published: August 25, 2019

Leukemia is a deadly cancer with two main types, acute and chronic, each of which is further divided into two subtypes, lymphoid and myeloid; altogether, there are four subtypes of leukemia. This study presents a new method to diagnose all leukemia subtypes from microscopic blood cell images using a convolutional neural network (CNN). Because a CNN requires a large training dataset, we also investigated the effect of data augmentation, which synthetically increases the number of training samples. We used two publicly available leukemia data sources, ALL-IDB and the ASH Image Bank, and applied seven different image transformation methods to generate additional data. We developed a CNN architecture capable of detecting all subtypes of leukemia. In addition, we compared it with several popular machine learning algorithms: Naive Bayes, Support Vector Machine, k-nearest neighbor, and decision tree. To evaluate our approach, we performed a series of experiments using 5-fold cross-validation. The experimental results showed that our CNN model achieves an accuracy of 88.25% for binary classification (leukemia versus healthy) and 81.74% for multiclass classification of all subtypes. Finally, we showed that the CNN model outperforms the other well-known machine learning algorithms.
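The evaluation protocol mentioned above, 5-fold cross-validation with classical baselines such as k-nearest neighbor, can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code: it assumes each image has already been reduced to a fixed-length feature vector, it uses plain shuffled (non-stratified) folds, and the function names `k_fold_indices`, `knn_predict`, and `cross_validate` are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    return np.array_split(indices, k)

def knn_predict(X_train, y_train, X_test, k_neighbors=3):
    """Predict each test label by majority vote among its nearest training samples."""
    # Squared Euclidean distances, shape (n_test, n_train); same ordering as Euclidean
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k_neighbors]
    preds = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

def cross_validate(X, y, k=5, k_neighbors=3, seed=0):
    """Return the per-fold accuracies of k-NN under k-fold cross-validation."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        preds = knn_predict(X[train_idx], y[train_idx], X[test_idx], k_neighbors)
        accuracies.append(np.mean(preds == y[test_idx]))
    return accuracies
```

Each sample lands in exactly one test fold, so every image is used for evaluation once and for training four times; the reported figure is typically the mean of the per-fold accuracies.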

Leukemia is a malignant disease of the white blood cells (WBCs) that affects the bone marrow and blood. It can severely damage the body's immune system. Depending on how quickly it progresses, leukemia is divided into two main types: acute and chronic. In acute leukemia, the affected WBCs neither look nor behave like normal WBCs, whereas in chronic leukemia they can still function like normal WBCs. Chronic leukemia can therefore be difficult to detect, because the affected cells are hard to distinguish from normal WBCs. In addition, depending on the size and shape of the WBCs, each main type is further divided into two subtypes: lymphoid and myeloid. In total, there are four subtypes of leukemia, as shown in Figure 1: Acute Lymphoblastic Leukemia (ALL), Acute Myeloid Leukemia (AML), Chronic Lymphocytic Leukemia (CLL), and Chronic Myeloid Leukemia (CML) [1, 2]. Identifying leukemia and its specific subtype is important for hematologists to avoid medical risks and to choose an appropriate treatment. Intelligent diagnostic methods can help detect leukemia subtypes quickly from blood samples.

Microscopic blood tests are considered the primary method of diagnosing leukemia [2]. A blood test is the most common way to diagnose leukemia, but it is not the only one. Interventional radiology is another option. However, radiological techniques such as percutaneous aspiration, biopsy, and catheter drainage suffer from inherent limitations in the sensitivity of the imaging modality and the resolution of the radiological images [3]. Other methods, such as molecular cytogenetics, long-distance inverse polymerase chain reaction (LDI-PCR), and array-based comparative genomic hybridization (aCGH), require more work and time to identify the type of leukemia [4]. Because these techniques are time-consuming and costly, invasive blood and bone marrow tests remain the most common methods for identifying leukemia subtypes.

A machine learning (ML) algorithm can help distinguish leukemic blood cells from healthy cells when a large training set is available. The ALL-IDB leukemia image database [5] is one of the datasets used as a reference by several medical researchers [1, 6, 7]. Another leukemia dataset is provided by the American Society of Hematology (ASH) and is available online on its website [8]. Thanh et al. [7] used the ASH database to identify AML in their research. Google is another source of leukemia images, but these images are unlabeled and collected from arbitrary websites. Karthikeyan et al. [9] used a small set of images collected through Google to identify leukemia and labeled the images themselves. Successful ML-based leukemia diagnosis therefore depends on well-defined image datasets.

Distinguishing leukemia subtypes from healthy samples is challenging. In the literature, many researchers have only analyzed binary classification between a single subtype and healthy samples [1, 7, 9, 10, 11, 12]. These studies report high accuracy, in some cases above 96%. In addition, Shafique et al. [6] further classified ALL samples into subtypes based on cell size and nucleus shape. However, identifying all subtypes of leukemia is a more difficult task than simple binary classification [13]. To the best of our knowledge, no automatic detection method yet covers all subtypes of leukemia.
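The difference between the binary task and the all-subtype task is essentially a difference in label space. The sketch below assumes, for illustration only, that the multiclass task uses five classes (the four subtypes plus a HEALTHY class) and encodes the two-by-two taxonomy of acute/chronic and lymphoid/myeloid as a lookup table; the names `SUBTYPES`, `binary_label`, and `multiclass_label` are hypothetical.

```python
# Hypothetical label scheme: (progression, lineage) -> subtype abbreviation
SUBTYPES = {
    ("acute", "lymphoid"): "ALL",
    ("acute", "myeloid"): "AML",
    ("chronic", "lymphoid"): "CLL",
    ("chronic", "myeloid"): "CML",
}
# Assumed multiclass label space: four subtypes plus healthy samples
MULTICLASS_LABELS = sorted(SUBTYPES.values()) + ["HEALTHY"]

def binary_label(label):
    """Collapse any leukemia subtype to 1 and a healthy sample to 0."""
    return 0 if label == "HEALTHY" else 1

def multiclass_label(label):
    """Map a label string to an integer class index in 0..4."""
    return MULTICLASS_LABELS.index(label)

samples = ["ALL", "HEALTHY", "CML", "AML", "CLL"]
print([binary_label(s) for s in samples])      # [1, 0, 1, 1, 1]
print([multiclass_label(s) for s in samples])  # [0, 4, 3, 1, 2]
```

Collapsing the four subtypes into one positive class yields the binary task; the multiclass task must additionally separate the subtypes from one another.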

Several ML algorithms have been used to classify and diagnose leukemia from microscopic images. Paswan et al. [10] used a support vector machine (SVM) and k-nearest neighbor (k-NN) to classify the AML subtype and reported an accuracy of 83%. Patel et al. [1] used SVM to classify the ALL subtype and obtained 93% accuracy. Karthikeyan et al. [9] also used SVM, together with c-means clustering to separate WBCs from the background, and reported 90% accuracy. Although deep learning (DL) methods appear to be more effective, their performance depends on the quantity and quality of the data used [6]. The convolutional neural network (CNN) is one of the neural network architectures most commonly used for image recognition problems. Shafique et al. [6] used a CNN to identify ALL subtypes. They reported 99% accuracy for binary classification between ALL and healthy samples and 96% accuracy for the further classification of ALL subtypes alone. Thanh et al. [7] also developed a CNN model with five convolutional layers for binary classification of the ALL subtype and obtained 96.6% accuracy. Unfortunately, the classification power of this type of neural network depends on a large amount of training data, from which it learns to recognize the important features of each image. Creating a large dataset, however, is time-consuming and labor-intensive. To avoid this problem, we propose enlarging a small set of samples through image augmentation. Training on an insufficient number of image samples can lead to overfitting [14]. Therefore, many researchers rely on image transformation methods to increase the number of training samples and avoid overfitting. Patel et al. [1] applied median and low-pass filters to remove noise and blur. Many image transformation methods have been used in the literature, such as image rotation and mirroring, histogram equalization, image translation, grayscale transformation, image blurring, and image cropping [6, 9, 10]. Image augmentation thus makes it feasible to use DL methods, which require a large training dataset.
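The seven transformations listed above (rotation, mirroring, histogram equalization, translation, grayscale conversion, blurring, and cropping) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the pipeline used in this study or in [6, 9, 10]: images are H x W x 3 `uint8` arrays, the shift, blur size, and crop size used in `augment` are arbitrary example values, and all function names are hypothetical.

```python
import numpy as np

def rotate90(img):
    """Rotate an H x W (x C) image by 90 degrees counter-clockwise."""
    return np.rot90(img, k=1, axes=(0, 1))

def mirror(img):
    """Flip the image horizontally (left-right)."""
    return img[:, ::-1]

def translate(img, dy, dx):
    """Shift by (dy, dx) pixels, filling uncovered pixels with 0; assumes |dy| < H and |dx| < W."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def to_grayscale(img):
    """Convert an RGB uint8 image to grayscale with ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (img[..., :3] @ weights).round().astype(np.uint8)

def equalize_histogram(gray):
    """Spread the intensities of a uint8 grayscale image over the full 0-255 range."""
    cdf = np.bincount(gray.ravel(), minlength=256).cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:  # constant image: nothing to equalize
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).clip(0, 255).astype(np.uint8)
    return lut[gray]

def box_blur(img, size=3):
    """Average each pixel over its size x size neighborhood (odd size, edge-padded)."""
    pad = size // 2
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img.astype(np.float64), pad_width, mode="edge")
    h, w = img.shape[:2]
    acc = np.zeros(img.shape, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + h, dx:dx + w]
    return (acc / size**2).round().astype(img.dtype)

def center_crop(img, crop_h, crop_w):
    """Cut a crop_h x crop_w window out of the center of the image."""
    h, w = img.shape[:2]
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return img[top:top + crop_h, left:left + crop_w]

def augment(img):
    """Return the original image plus seven transformed copies (hypothetical parameters)."""
    gray = to_grayscale(img)
    return [img, rotate90(img), mirror(img), translate(img, 5, -5), box_blur(img),
            center_crop(img, img.shape[0] // 2, img.shape[1] // 2),
            gray, equalize_histogram(gray)]
```

Note that `augment` returns arrays of different shapes (the crop is smaller and the grayscale images have a single channel); a real pipeline would resize and re-stack them before feeding a CNN, and would normally rely on an established image library rather than hand-written filters.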

In this study, we present a new method that uses a deep learning CNN architecture to diagnose all four types of leukemia (i.e., ALL, AML, CLL, and CML) from blood micrographs. To our knowledge, this is the first study to address all four leukemia subtypes.
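For readers unfamiliar with CNNs, the core computation of one convolutional block (convolution, ReLU activation, max pooling) followed by a dense softmax layer can be written out directly. This toy forward pass is our own illustration and not the architecture proposed in this study: the 10 x 10 input, the four 3 x 3 filters, the random weights, and the five output classes are all assumptions, and no training is performed.

```python
import numpy as np

def conv2d(image, kernels, bias):
    """Valid (no padding, stride 1) convolution as used in CNNs, i.e. cross-correlation.

    image: H x W x C_in, kernels: K x K x C_in x C_out, bias: C_out
    returns: (H-K+1) x (W-K+1) x C_out
    """
    k = kernels.shape[0]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((h, w, kernels.shape[3]))
    for i in range(h):
        for j in range(w):
            patch = image[i:i + k, j:j + k, :]
            # Sum over the K x K x C_in window for every output channel at once
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping size x size max pooling; leftover rows/columns are dropped."""
    h, w, c = x.shape[0] // size, x.shape[1] // size, x.shape[2]
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size, c).max(axis=(1, 3))

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()

def forward(image, params):
    """One conv block, then a dense layer and softmax (hypothetical toy network)."""
    x = max_pool(relu(conv2d(image, params["conv_w"], params["conv_b"])))
    logits = x.ravel() @ params["dense_w"] + params["dense_b"]
    return softmax(logits)

rng = np.random.default_rng(0)
image = rng.random((10, 10, 3))                       # toy 10 x 10 RGB input
params = {
    "conv_w": rng.normal(size=(3, 3, 3, 4)) * 0.1,    # four 3 x 3 filters
    "conv_b": np.zeros(4),
    "dense_w": rng.normal(size=(4 * 4 * 4, 5)) * 0.1, # 8x8 conv map pooled to 4x4x4, 5 classes
    "dense_b": np.zeros(5),
}
probs = forward(image, params)  # one probability per class
```

A practical model stacks several such blocks and learns the weights by backpropagation, typically with a deep learning framework rather than explicit loops.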

