Research Article | Peer-Reviewed

A Machine Learning-Based Prediction of Malaria Occurrence in Kenya

Received: 20 July 2024     Accepted: 9 August 2024     Published: 20 August 2024
Abstract

For many years, malaria has been a public health concern in Kenya, as well as in many other parts of Africa and the world. The purpose of this study is to develop and evaluate supervised machine learning models to predict malaria occurrence (final malaria test results) in Kenya. The study investigated twelve predictor variables for the outcome variable (malaria test results), and five machine learning models, namely k-nearest neighbors, support vector machines, random forest, tree bagging, and boosting, were estimated. During model evaluation, random forest emerged as the best overall model for the classification and prediction of final malaria test results. The model attained a classification accuracy of 97.33%, a sensitivity of 71.1%, a specificity of 98.4%, a balanced accuracy of 84.7%, and an area under the curve of 98.3%. In the final model, the presence of Plasmodium falciparum emerged as the most important feature, followed by region, endemic zone, and anemia level. The feature with the least importance in predicting final malaria test results was having mosquito nets. In conclusion, employing machine learning algorithms can enhance early detection, optimize resource allocation for interventions, and ultimately reduce the incidence and impact of malaria in Kenya. The study recommends allocating resources and funds to areas with the presence of Plasmodium falciparum, regions susceptible to malaria, endemic zones, and anemia-prone areas.

Published in American Journal of Theoretical and Applied Statistics (Volume 13, Issue 4)
DOI 10.11648/j.ajtas.20241304.11
Page(s) 65-72
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Machine Learning, Accuracy, Sensitivity, Specificity, Feature, Balanced Accuracy, Malaria

1. Introduction
Malaria is a deadly disease that poses a serious threat in many regions, especially tropical regions and endemic zones. Although the disease can be fatal, it is curable. Unlike many other diseases, malaria is caused by Plasmodium parasites that are spread through the bites of infected female Anopheles mosquitoes; it is not transmitted directly from one person to another. Malaria infection is accompanied by a range of signs and symptoms, some mild and some life-threatening. Mild signs and symptoms include fever, headache, and chills, whereas life-threatening ones include confusion, seizures, jaundice, dark urine, and difficulty breathing, among others. The threat posed by malaria infection varies considerably from one group to another. In a 2022 report, the World Health Organization noted that children under five years, pregnant women, and travelers are at particularly high risk. Malaria is not caused by a single type of parasite: five Plasmodium species transmitted by female Anopheles mosquitoes cause malaria in humans, two of which are Plasmodium falciparum and Plasmodium vivax. These species occur in different regions; the most prevalent species in most parts of Africa, and the most dangerous, is Plasmodium falciparum. Plasmodium vivax is predominant in most countries outside sub-Saharan Africa. The other three species are Plasmodium malariae, Plasmodium ovale, and Plasmodium knowlesi.
In its 2022 report, the World Health Organization (WHO) reported 249 million malaria cases and approximately 608,000 malaria-related deaths in 2022, compared with 610,000 malaria-related deaths in 2021. Despite this decrease, malaria remains life-threatening and requires continuous, proactive measures to prevent its resurgence and manage its transmission effectively. These statistics were reported from 85 countries. Nearly half of the malaria-related deaths reported globally occurred in four African countries: Nigeria, Uganda, the Democratic Republic of Congo (DRC), and Mozambique. A previous study reported that the share of malaria cases is disproportionately high in African countries compared with other regions. Approximately 95% of global malaria deaths, close to 580,000, occur in Africa. Children under five years of age are at particularly high risk of malaria infection and malaria-related death; WHO reports that about 80% of malaria deaths in the region occur among children under five.
Kenya, as a sub-Saharan African country, faces the same threat from malaria infection as other tropical African countries. Many countries lying between latitudes 35° S and 35° N have tropical or subtropical climates. Because the equator runs nearly through the middle of Africa, with the Tropic of Cancer to the north and the Tropic of Capricorn to the south, Africa is the most tropical continent, which makes it highly susceptible to malaria. Africa's tropical climate, characterized by warm temperatures, high humidity, and sufficient rainfall, creates favorable conditions for the breeding of mosquitoes, the primary vectors of malaria. The equator passes through Kenya, placing the country in the tropical region, where warm temperatures and high humidity are ideal for the breeding and survival of the Anopheles mosquitoes that transmit malaria. These climatic conditions make malaria infection a major public health concern in the country. Several initiatives have been put in place to reduce malaria cases and deaths; however, the reductions achieved have not been significant. Malaria cases and deaths in Kenya remain relatively high, and robust action is needed to address and mitigate the infection. According to a 2022 report by the US President's Malaria Initiative (PMI), the malaria-related mortality rate among children under five years fell from 11.5% in 2003 to 4.1% in 2022, a reduction of 7.4 percentage points, or an average decrease of about 0.389 percentage points per year. This decrease was attributed to allocating more funds and resources to areas such as high endemic zones and lake endemic zones, in line with PMI initiatives.
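The PMI figures can be checked with simple arithmetic. The short Python sketch below assumes that the 0.389 figure is the average annual decline, in percentage points, over the 19 years from 2003 to 2022; this reading is an interpretation of the reported numbers, not a calculation stated in the report.

```python
# Under-five malaria mortality rates (percent), as quoted from the PMI 2022 report.
start_year, end_year = 2003, 2022
start_rate, end_rate = 11.5, 4.1

# Absolute decline, in percentage points.
absolute_drop = start_rate - end_rate
# Relative decline, as a percentage of the 2003 level.
relative_drop = absolute_drop / start_rate * 100
# Average decline per year in percentage points (assumed meaning of "0.389").
annual_drop = absolute_drop / (end_year - start_year)

print(f"Absolute decrease: {absolute_drop:.1f} percentage points")
print(f"Relative decrease: {relative_drop:.1f}%")
print(f"Average annual decrease: {annual_drop:.3f} percentage points per year")
```

Under this assumption the script reports a decrease of 7.4 percentage points (about 64% of the 2003 level), or 0.389 points per year.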
Much has been done to address malaria in Kenya and in many other parts of the world; this paper seeks to complement these efforts by developing machine learning models to predict malaria occurrence in Kenya. Machine learning is well suited to binary and multi-class classification because its algorithms can analyze large amounts of data to uncover patterns and insights that traditional methods may miss. Despite reports of a reduction in malaria-related mortality, malaria infection remains a public health concern in Kenya. This study used demographic, environmental, and health-related factors to predict malaria occurrence. The predictive accuracy achieved reflects the ability of machine learning models to capture both linear and non-linear relationships between features, and this ability to derive insights from complex data is vital when designing intervention programs against malaria, especially for children under five years, pregnant women, and travelers. Furthermore, the models can be retrained as more data become available, which may improve their predictive performance over time. The integration of machine learning in this study therefore aims to complement existing efforts and provide a robust, data-driven approach to predicting and mitigating malaria occurrence, improving public health outcomes in Kenya.
The purpose of this study is thus to develop and evaluate machine learning models to predict malaria occurrence in Kenya, with the objectives of enhancing early detection, optimizing resource allocation for interventions, and ultimately reducing the incidence and impact of malaria in the country.
2. Methodology
2.1. Data Collection
The data used in this study were obtained from the Kenya National Data Archive (KeNADA) website at https://statistics.knbs.or.ke/nada/index.php/catalog/111/related-materials. The data were well documented, accurate, and relevant to the research objectives. The dataset had 31,302 observations and 223 variables. After the data were cleaned to retain the most relevant information, thirteen variables remained: twelve predictors, including region, endemic zone, anemia level, number of mosquito bed nets, mother's educational level, and the presence of various Plasmodium species, together with the outcome variable. All predictors were categorical and were coded accordingly. The outcome variable is the final malaria test result, either positive or negative, indicating that an individual is or is not infected, respectively.
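The paper states only that the categorical predictors were coded appropriately. One common option, shown here as a hedged Python sketch, is one-hot (indicator) encoding, which suits distance- and kernel-based models such as K-NN and SVM. The column names and records below are hypothetical stand-ins for the study's variables, not the KeNADA field names.

```python
from typing import Dict, List


def one_hot_encode(records: List[Dict[str, str]], columns: List[str]) -> List[Dict[str, int]]:
    """Replace each categorical column with one 0/1 indicator column per observed level."""
    # Sorted levels observed for each column, so the output columns have a fixed order.
    levels = {col: sorted({rec[col] for rec in records}) for col in columns}
    encoded = []
    for rec in records:
        row = {}
        for col in columns:
            for level in levels[col]:
                row[f"{col}={level}"] = int(rec[col] == level)
        encoded.append(row)
    return encoded


# Hypothetical records for two of the study's predictors.
records = [
    {"endemic_zone": "Lake Endemic", "anemia_level": "Moderate"},
    {"endemic_zone": "Low Risk", "anemia_level": "Not anemic"},
    {"endemic_zone": "Lake Endemic", "anemia_level": "Not anemic"},
]
encoded = one_hot_encode(records, ["endemic_zone", "anemia_level"])
print(encoded[0])
```

Tree-based models such as random forest, bagging, and boosting can often use integer-coded or factor variables directly, so this step matters most for K-NN and SVM.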
2.2. Data Analysis
The paper adopted supervised machine learning algorithms for the binary classification and prediction of malaria occurrence in Kenya. Five machine learning algorithms were used: Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), Random Forest, Tree Bagging, and Boosting.
1) Support Vector Machines
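Training an SVM involves solving an optimization problem that can be written in a primal or a dual form, where $y_i \in \{-1, +1\}$ are the class labels, $C$ is the regularization (cost) parameter, $\xi_i$ are slack variables, and $K(\cdot,\cdot)$ is the kernel function. The primal and dual problems are expressed as follows.
Primal form:
$$\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\xi_i \tag{1}$$
The primal problem is solved subject to the following constraints:
$$y_i\left(w^{\top}x_i + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n \tag{2}$$
Dual form:
$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{3}$$
The solution to the dual problem is subject to the following constraints:
$$\sum_{i=1}^{n}\alpha_i y_i = 0,\qquad 0 \le \alpha_i \le C,\qquad i = 1,\dots,n \tag{4}$$
The fitted model is expressed in terms of the support vectors, that is, the training points with $\alpha_i > 0$:
$$f(x) = \sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b \tag{5}$$
For a new input $x$ (from the test set), the model predicts the class label (Positive or Negative) from the sign of $f(x)$, as given in Equation 6:
$$\text{Predicted class} = \operatorname{sign}\left(\sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b\right) \tag{6}$$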
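Equations 5 and 6 can be illustrated with a short Python sketch that evaluates a fitted SVM's decision function. The support vectors, multipliers $\alpha_i$, and bias $b$ below are a hypothetical two-point toy problem (for which $\alpha_1 = \alpha_2 = 0.5$ and $b = 0$ solve the hard-margin dual with a linear kernel), not values from the study. Labels are coded as $+1$ (Positive) and $-1$ (Negative), and a score of exactly zero is assigned to the positive class by convention.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """K(a, b) = a . b"""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def svm_decision(x: Vector, support_vectors: Sequence[Vector], alphas: Sequence[float],
                 labels: Sequence[int], b: float, kernel: Kernel) -> float:
    """Equation 5: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


def svm_predict(x: Vector, support_vectors: Sequence[Vector], alphas: Sequence[float],
                labels: Sequence[int], b: float, kernel: Kernel) -> int:
    """Equation 6: +1 (Positive) if f(x) >= 0, otherwise -1 (Negative)."""
    return 1 if svm_decision(x, support_vectors, alphas, labels, b, kernel) >= 0 else -1


# Hypothetical toy model: one support vector per class.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
alphas = [0.5, 0.5]
labels = [1, -1]
b = 0.0

print(svm_predict((2.0, 3.0), support_vectors, alphas, labels, b, linear_kernel))   # 1
print(svm_predict((-0.5, 1.0), support_vectors, alphas, labels, b, linear_kernel))  # -1
```

Passing `rbf_kernel` instead changes only $K$; the $\alpha_i$ and $b$ would then have to come from solving the dual problem (3) and (4) with that kernel.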
2) K-Nearest Neighbors
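The K-NN algorithm classifies a new observation according to the classes of the $k$ training observations closest to it, its $k$ nearest neighbors $N_k$. Closeness is measured by a distance metric, here the Euclidean distance between observations $X_a$ and $X_b$ with $m$ features:
$$d(X_a, X_b) = \sqrt{\sum_{j=1}^{m}\left(x_{ja} - x_{jb}\right)^{2}} \tag{7}$$
The neighbors' outputs are aggregated as follows:
$$\hat{y} = \operatorname{mode}\left(y_i : i \in N_k\right) \tag{8}$$
That is, the predicted class $\hat{y}$ for the test instance $x$ is the class that appears most frequently among the $k$ selected neighbors:
$$\hat{y} = \arg\max_{c \in \mathcal{C}}\sum_{i \in N_k} I\left(y_i = c\right) \tag{9}$$
where $\mathcal{C}$ is the set of classes and $I(\cdot)$ is the indicator function.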
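A minimal pure-Python sketch of Equations 7 to 9 follows, assuming the categorical predictors have already been converted to numbers. The training points are hypothetical, and $k = 3$ is an arbitrary choice; in practice $k$ is a tuning parameter selected by cross-validation.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Equation 7: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(x: Sequence[float], train_X: List[Sequence[float]],
                train_y: List[Hashable], k: int = 3) -> Hashable:
    """Equations 8 and 9: majority class among the k nearest training points."""
    # Indices of the training points sorted by distance to x.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(x, train_X[i]))
    neighbour_labels = [train_y[i] for i in order[:k]]
    # most_common(1) returns the mode; an odd k avoids ties between two classes.
    return Counter(neighbour_labels).most_common(1)[0][0]


# Hypothetical, already-encoded training data with two clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["Negative"] * 3 + ["Positive"] * 3

print(knn_predict((0.5, 0.5), train_X, train_y, k=3))  # Negative
print(knn_predict((5.5, 5.5), train_X, train_y, k=3))  # Positive
```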
3) Random Forest
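This algorithm is an ensemble of decision trees, each grown on a bootstrap sample of the training data with a random subset of features considered at each split, that combines the trees' predictions by majority voting to increase accuracy and reduce overfitting. Let $T$ be the number of trees in the fitted random forest and $\hat{y}_t(x)$ the class predicted by the $t$-th tree for an instance $x$. The predicted class is the one receiving the most votes:
$$\hat{y}(x) = \arg\max_{c \in \mathcal{C}}\sum_{t=1}^{T} I\left(\hat{y}_t(x) = c\right) \tag{10}$$
where $\mathcal{C}$ is the set of classes. Equivalently, the final prediction is the mode of the individual tree predictions:
$$\hat{y}(x) = \operatorname{mode}\left\{\hat{y}_t(x)\right\}_{t=1}^{T} \tag{11}$$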
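The aggregation step of Equations 10 and 11 is sketched below in Python. The three "trees" are hypothetical hand-written rules used only to show the vote; a real random forest would grow each tree on a bootstrap sample and consider a random subset of features at each split.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, object]
Tree = Callable[[Record], str]


def forest_predict(trees: List[Tree], x: Record) -> str:
    """Equations 10 and 11: the class predicted by the largest number of trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical hand-written rules standing in for fitted decision trees.
trees: List[Tree] = [
    lambda x: "Positive" if x["falciparum"] == 1 else "Negative",
    lambda x: "Positive" if x["endemic_zone"] == "Lake Endemic" else "Negative",
    lambda x: "Positive" if x["anemia_level"] in ("Moderate", "Severe") else "Negative",
]

case_a = {"falciparum": 1, "endemic_zone": "Seasonal", "anemia_level": "Severe"}
case_b = {"falciparum": 0, "endemic_zone": "Lake Endemic", "anemia_level": "Mild"}
print(forest_predict(trees, case_a))  # 2 of 3 votes: Positive
print(forest_predict(trees, case_b))  # 1 of 3 votes: Negative
```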
4) Tree Bagging
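Given a dataset $\mathcal{D}$ of $n$ samples $(X_i, Y_i)$, the algorithm creates $B$ bootstrap samples $\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_B$, each obtained by drawing $n$ samples from $\mathcal{D}$ with replacement, and trains a decision tree $\hat{f}_b$ on each bootstrap sample $\mathcal{D}_b$. For a new input $X$, the predicted class is the majority vote of the $B$ trees, as shown in Equation 12:
$$\text{Classification: } \hat{y} = \operatorname{mode}\left(\hat{y}_1, \hat{y}_2, \dots, \hat{y}_B\right),\qquad \hat{y}_b = \hat{f}_b(X) \tag{12}$$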
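The following compact Python sketch illustrates tree bagging. To keep it short, each "tree" is a decision stump (a tree of depth one) rather than a fully grown tree, and the data are a hypothetical one-feature example; the parts being illustrated are the bootstrap sampling and the majority vote of Equation 12.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def fit_stump(X: List[Row], y: List[int]) -> Model:
    """Fit a depth-1 decision tree (stump) that minimises training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    _, j, t, left_label, right_label = best
    return lambda row: left_label if row[j] <= t else right_label


def bagging_fit(X: List[Row], y: List[int], n_estimators: int = 25, seed: int = 0) -> List[Model]:
    """Train one stump per bootstrap sample D_b (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models: List[Model], row: Row) -> int:
    """Equation 12: mode of the individual predictions."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


# Hypothetical one-feature data: 0 = Negative, 1 = Positive.
X = [[0], [1], [2], [3], [10], [11], [12], [13]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(X, y)
print(bagging_predict(models, [0]), bagging_predict(models, [13]))
```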
5) Boosting
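AdaBoost is an ensemble method that combines multiple weak learners, typically shallow decision trees, into a single strong classifier, with class labels coded as $y_i \in \{-1, +1\}$. The algorithm starts by assigning equal weights to all training samples, $w_i^{(1)} = 1/N$ for $i = 1, 2, \dots, N$, where $N$ is the number of training samples. The model is then built from a sequence of weak learners, $m = 1, 2, \dots, M$, where each weak learner $h_m(x)$ is fitted to the weighted training samples. Each weak learner has a weighted error rate and a learner weight, given by Equations 13 and 14:
$$\text{Weighted error rate: } \epsilon_m = \frac{\sum_{i=1}^{N} w_i^{(m)}\, I\left(y_i \ne h_m(x_i)\right)}{\sum_{i=1}^{N} w_i^{(m)}} \tag{13}$$
$$\text{Learner weight: } \alpha_m = \frac{1}{2}\ln\left(\frac{1-\epsilon_m}{\epsilon_m}\right) \tag{14}$$
The sample weights are then updated iteratively, as shown in Equation 15, so that misclassified samples receive more weight in the next round:
$$\text{Updated sample weight: } w_i^{(m+1)} = w_i^{(m)}\exp\left(-\alpha_m y_i h_m(x_i)\right) \tag{15}$$
The final step in each round is weight normalization:
$$w_i^{(m+1)} \leftarrow \frac{w_i^{(m+1)}}{\sum_{j=1}^{N} w_j^{(m+1)}} \tag{16}$$
The final AdaBoost classifier is the sign of the weighted vote of the weak learners:
$$H(x) = \operatorname{sign}\left(\sum_{m=1}^{M}\alpha_m h_m(x)\right) \tag{17}$$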
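Equations 13 to 17 translate almost line for line into code. The Python sketch below uses decision stumps as the weak learners and a hypothetical one-feature dataset; clipping $\epsilon_m$ away from 0 and 1 is an implementation safeguard that is not described in the paper.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, row: Row) -> int:
    j, t, polarity = stump
    return polarity if row[j] <= t else -polarity


def fit_weighted_stump(X: List[Row], y: List[int], w: List[float]) -> Stump:
    """Weak learner h_m: the stump with the smallest weighted error."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, t, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w) if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best = err, stump
    return best


def adaboost_fit(X: List[Row], y: List[int], n_rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Labels must be coded as -1 (Negative) and +1 (Positive)."""
    n = len(X)
    w = [1.0 / n] * n                                   # equal initial weights, 1/N
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_weighted_stump(X, y, w)
        preds = [stump_predict(stump, row) for row in X]
        eps = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi) / sum(w)  # Eq. 13
        eps = min(max(eps, 1e-10), 1 - 1e-10)           # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)          # Eq. 14
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]  # Eq. 15
        total = sum(w)
        w = [wi / total for wi in w]                     # Eq. 16
        ensemble.append((alpha, stump))
    return ensemble


def adaboost_predict(ensemble: List[Tuple[float, Stump]], row: Row) -> int:
    """Eq. 17: sign of the weighted vote (a score of zero goes to +1)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-feature data.
X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost_fit(X, y, n_rounds=5)
print([adaboost_predict(ensemble, row) for row in X])
```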
Steps in Machine Learning Modeling
Machine learning modeling involves several steps. The initial step in developing the ML models was problem formulation, followed by data acquisition and then data preprocessing. In the preprocessing step, the data were cleaned to retain the most relevant information. The data were then partitioned into a training set and a test set; the training set was used to estimate the models and the test set to evaluate their performance.
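The paper does not report the split ratio or whether the split was stratified. With only about 4% positive cases (Table 1), a stratified split, which keeps the class proportions similar in both sets, is a common safeguard. The Python sketch below assumes a hypothetical 70/30 split and uses class counts matching Table 1 only for illustration.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Tuple


def stratified_split(labels: List[Hashable], test_fraction: float = 0.3,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with class proportions preserved per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # same fraction taken from every class
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# 0 = Negative, 1 = Positive, with counts matching Table 1.
labels = [0] * 3149 + [1] * 131
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```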
3. Results and Discussions
3.1. Descriptive Statistics
Table 1. Distribution of Malaria Test Results.
| Malaria test result | N = 3,280 |
|---|---|
| Negative | 3,149 (96%) |
| Positive | 131 (4%) |

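A total of 3,280 households were enrolled in the study. The majority of households, 96.0% (n = 3,149), tested negative for malaria and 4.0% (n = 131) tested positive, as shown in Table 1. The dataset is therefore highly imbalanced: a classifier that labeled every household negative would already achieve 96% accuracy while detecting no positive cases, which is why sensitivity and balanced accuracy are emphasized when evaluating the models.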
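The effect of this imbalance on accuracy can be checked with the counts in Table 1. The Python snippet below evaluates a trivial rule that predicts "Negative" for every household; it is a baseline for comparison only, not one of the study's models.

```python
# Class counts from Table 1.
negatives, positives = 3149, 131
total = negatives + positives

# A trivial "classifier" that labels every household Negative.
tp, fn = 0, positives   # every positive case is missed
tn, fp = negatives, 0   # every negative case is classified correctly

baseline_accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
baseline_balanced_accuracy = (sensitivity + specificity) / 2
imbalance_ratio = negatives / positives

print(f"Accuracy of always predicting Negative: {baseline_accuracy:.1%}")
print(f"Balanced accuracy of the same rule: {baseline_balanced_accuracy:.2f}")
print(f"Negatives per positive: {imbalance_ratio:.1f}")
```

The rule scores about 96% accuracy but only 0.50 balanced accuracy, so the reported 97.33% accuracy should be read against this baseline.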
Table 2. Two-Way Table Showing the Distribution of Malaria Test Results Across Various Factors.
| Characteristic | Negative, N = 3,149 | 95% CI | Positive, N = 131 | 95% CI | p-value |
|---|---|---|---|---|---|
| Endemic Zones | | | | | <0.001 |
| Highland Epidemic | 540 (17%) | 16%, 19% | 4 (3.1%) | 0.98%, 8.1% | |
| Lake Endemic | 1,088 (35%) | 33%, 36% | 107 (82%) | 74%, 88% | |
| Coastal Endemic | 353 (11%) | 10%, 12% | 14 (11%) | 6.2%, 18% | |
| Seasonal | 743 (24%) | 22%, 25% | 6 (4.6%) | 1.9%, 10% | |
| Low Risk | 425 (13%) | 12%, 15% | 0 (0%) | 0.00%, 3.6% | |
| Number of Children Slept Under Net Last Night | | | | | 0.008 |
| None | 1,202 (38%) | 36%, 40% | 40 (31%) | 23%, 39% | |
| One | 1,269 (40%) | 39%, 42% | 70 (53%) | 45%, 62% | |
| Two | 585 (19%) | 17%, 20% | 17 (13%) | 8.0%, 20% | |
| Three | 82 (2.6%) | 2.1%, 3.2% | 2 (1.5%) | 0.26%, 6.0% | |
| Four | 11 (0.3%) | 0.18%, 0.64% | 2 (1.5%) | 0.26%, 6.0% | |
| Anemia Level | | | | | <0.001 |
| Severe | 69 (2.2%) | 1.7%, 2.8% | 8 (6.1%) | 2.9%, 12% | |
| Moderate | 774 (25%) | 23%, 26% | 66 (50%) | 42%, 59% | |
| Mild | 749 (24%) | 22%, 25% | 32 (24%) | 18%, 33% | |
| Not anemic | 1,557 (49%) | 48%, 51% | 25 (19%) | 13%, 27% | |
| Mother's Highest Educational Level | | | | | <0.001 |
| No education | 525 (17%) | 15%, 18% | 10 (7.6%) | 3.9%, 14% | |
| Primary | 1,428 (45%) | 44%, 47% | 89 (68%) | 59%, 76% | |
| Secondary | 866 (28%) | 26%, 29% | 26 (20%) | 14%, 28% | |
| Higher | 330 (10%) | 9.4%, 12% | 6 (4.6%) | 1.9%, 10% | |

Table 2 shows the two-way distribution of malaria test results across endemic zone, number of children who slept under a net the previous night, anemia level, and mother's highest educational level. The results show a statistically significant association between each of these factors and the malaria test result (p < 0.01 in all four cases).
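The surviving text does not name the test behind the p-values in Table 2. Assuming a Pearson chi-squared test of independence, the Python sketch below recomputes the statistic for the endemic-zone rows, where all expected counts exceed 5; with 4 degrees of freedom the p-value has a closed form, so no statistics library is needed. The result is a check under that assumption, not the authors' computation.

```python
import math
from typing import Sequence, Tuple


def chi_square_independence(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf_even_dof(x: float, dof: int) -> float:
    """P(X > x) for chi-squared with even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!."""
    if dof % 2 != 0 or dof <= 0:
        raise ValueError("this closed form requires a positive even number of dof")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Endemic zone counts from Table 2: [Negative, Positive] per zone.
endemic = [
    [540, 4],     # Highland Epidemic
    [1088, 107],  # Lake Endemic
    [353, 14],    # Coastal Endemic
    [743, 6],     # Seasonal
    [425, 0],     # Low Risk
]
stat, dof = chi_square_independence(endemic)
p_value = chi_square_sf_even_dof(stat, dof)
print(f"chi-squared = {stat:.1f}, dof = {dof}, p = {p_value:.2e}")
```

Under this assumption the statistic is far beyond any conventional critical value, consistent with the reported p < 0.001.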
3.2. Model Estimation and Validation
Models were estimated using 10-fold cross-validation repeated five times (repeated cross-validation). In addition, the two-class summary was adopted as the summary function, since the outcome variable is binary (0 = Negative, 1 = Positive). The performance of the ML models is reported in Table 3.
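Repeated 10-fold cross-validation can be sketched as follows: the row indices are shuffled, dealt into 10 folds, and each fold serves once as the validation set; the whole procedure is repeated five times with a fresh shuffle, giving 50 train/validation splits. This Python version is a hypothetical illustration that does not stratify the folds by class, since the paper does not say whether its folds were stratified; the sample size of 3,280 is taken from Table 1 only for illustration.

```python
import random
from typing import List, Tuple


def repeated_kfold(n: int, k: int = 10, repeats: int = 5,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) pairs for repeated k-fold CV."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                       # fresh random order for each repeat
        folds = [idx[f::k] for f in range(k)]  # deal indices into k folds of near-equal size
        for f in range(k):
            held_out = set(folds[f])
            train = [i for i in range(n) if i not in held_out]
            splits.append((train, sorted(held_out)))
    return splits


splits = repeated_kfold(n=3280, k=10, repeats=5)
print(len(splits), len(splits[0][1]))  # 50 splits, 328 validation rows each
```

Performance metrics are computed on each validation fold and then averaged over the 50 splits.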
3.3. Models Evaluation
3.3.1. Models Performance Metrics
Based on the performance metrics in Table 3, random forest emerges as the best overall model. It achieves the highest sensitivity (0.711, tied with tree bagging) together with a strong specificity (0.984), indicating that it effectively identifies both positive and negative cases. Its precision (0.643) and F1-score (0.675, the highest among the models) reflect a good balance between precision and recall, supporting reliable positive predictions. Moreover, random forest has the highest balanced accuracy (0.847), suggesting that it copes with the class imbalance better than the other models. Boosting also performs well, with high specificity (0.987) and precision (0.657), but its lower sensitivity (0.605) and balanced accuracy (0.796) make it less suitable than random forest. Hence, random forest stands out as the most robust and balanced model for this classification task. These results are consistent with a previous study that found random forest to be the best overall model for malaria prediction after applying SMOTE. However, they differ from the findings of another study, in which Naïve Bayes outperformed kNN, SVM, and logistic regression in predicting malaria outbreaks.
Table 3. Models' Performance Evaluation.
| Performance metric | Support Vector Machines | K Nearest Neighbors | Random Forest | Tree Bagging | Boosting |
|---|---|---|---|---|---|
| Sensitivity | 0.447 | 0.053 | 0.711 | 0.711 | 0.605 |
| Specificity | 0.985 | 0.999 | 0.984 | 0.974 | 0.987 |
| Precision | 0.548 | 0.667 | 0.643 | 0.529 | 0.657 |
| F1-Score | 0.493 | 0.098 | 0.675 | 0.607 | 0.63 |
| Balanced Accuracy | 0.716 | 0.526 | 0.847 | 0.842 | 0.796 |
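Two rows of Table 3 are derived from the others: the F1-score is the harmonic mean of precision and sensitivity (recall), and balanced accuracy is the mean of sensitivity and specificity. The Python snippet below recomputes them from the reported values as a consistency check; the results agree with Table 3 to within 0.001, with the small differences due to rounding of the inputs.

```python
# Sensitivity, specificity and precision as reported in Table 3.
table3 = {
    "Support Vector Machines": (0.447, 0.985, 0.548),
    "K Nearest Neighbors": (0.053, 0.999, 0.667),
    "Random Forest": (0.711, 0.984, 0.643),
    "Tree Bagging": (0.711, 0.974, 0.529),
    "Boosting": (0.605, 0.987, 0.657),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the true-positive and true-negative rates."""
    return (sensitivity + specificity) / 2


recomputed = {}
for name, (sens, spec, prec) in table3.items():
    recomputed[name] = (f1_score(prec, sens), balanced_accuracy(sens, spec))
    print(f"{name}: F1 = {recomputed[name][0]:.3f}, "
          f"balanced accuracy = {recomputed[name][1]:.3f}")
```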

3.3.2. Receiver Operating Characteristic (ROC) and Area under the Curve (AUC)
The receiver operating characteristic (ROC) curve and the area under the curve (AUC) are important performance measures in machine learning, especially for binary classification. The ROC curves and AUC values estimated for the ML models in this study are reported in Figure 1.
Considering the performance metrics in Table 3 together with the results in Figure 1, random forest shows a stronger ability to classify positive cases and better overall classification performance. Although boosting has the highest AUC and strong performance metrics, random forest's higher sensitivity and balanced accuracy make it the most robust and reliable model for this classification task.
Figure 1. ROC and AUC for the ML Models.
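The AUC has a convenient interpretation: it equals the probability that the model assigns a higher score to a randomly chosen positive case than to a randomly chosen negative one (the Mann-Whitney formulation). A brute-force Python sketch with hypothetical predicted probabilities is shown below; for large test sets a rank-based computation would be preferable.

```python
from typing import Sequence


def auc_from_scores(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities of "Positive".
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2, 0.1]
print(auc_from_scores(positives, negatives))  # 11 of 12 pairs ranked correctly
```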
3.4. Relative Feature Importance
Relative feature importance indicates how much each feature contributes to the model's classification of the outcome, expressed on a scale from 0% to 100%. Figure 2 shows the relative feature importance from the random forest model, which was the best overall ML model in this study. The results show that the presence of Plasmodium falciparum is the most important feature for classifying and predicting malaria occurrence, with a relative importance of 100%. Having or not having a mosquito net had a relative importance of 0%, making it the least important feature for classifying and predicting the occurrence of malaria in Kenya.
Figure 2. Feature’s Relative Importance Plot.
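The figure does not state how the importances were scaled. If, as is common, raw importance scores are min-max scaled so that the top feature receives 100% and the bottom feature 0%, then a 0% score marks the least important feature rather than one with literally no contribution. The Python sketch below shows this scaling; the raw scores and feature names are hypothetical.

```python
from typing import Dict


def scale_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale raw importances to 0-100 (most important = 100, least = 0)."""
    lo, hi = min(raw.values()), max(raw.values())
    return {name: 100 * (value - lo) / (hi - lo) for name, value in raw.items()}


# Hypothetical raw importance scores (e.g. mean decrease in Gini impurity).
raw = {
    "falciparum": 52.0,
    "region": 20.0,
    "endemic_zone": 15.0,
    "anemia_level": 9.0,
    "mosquito_net": 1.5,
}
for name, score in sorted(scale_importance(raw).items(), key=lambda kv: -kv[1]):
    print(f"{name:>14}: {score:5.1f}%")
```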
3.5. Confusion Matrix
The confusion matrix in Figure 3 shows the correctly classified and misclassified malaria test results. The matrix is used to calculate the model's accuracy, as shown in Equation 18:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Figure 3. The Random Forest Confusion Matrix.
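Equation 18 can be computed directly from labeled predictions. The Python sketch below tallies the four cells of a binary confusion matrix and applies Equation 18; the labels are hypothetical and do not reproduce Figure 3.

```python
from collections import Counter
from typing import Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "Positive") -> Counter:
    """Tally TP, FP, TN and FN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def accuracy(c: Counter) -> float:
    """Equation 18: (TP + TN) / (TP + TN + FP + FN)."""
    return (c["TP"] + c["TN"]) / (c["TP"] + c["TN"] + c["FP"] + c["FN"])


# Hypothetical test-set labels and predictions.
actual = ["Positive", "Positive", "Negative", "Negative", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Positive", "Negative"]
counts = confusion_counts(actual, predicted)
print(dict(counts), accuracy(counts))
```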
4. Conclusion
Malaria remains a deadly disease globally, and Kenya is not exempt from its threat. Prevention and mitigation strategies therefore need to be put in place to address malaria incidence in Kenya and reduce malaria-related deaths among children under five years, the most affected group. Of the five ML models estimated in this study to classify and predict final malaria test results, random forest emerged as the preferred model because of its high classification accuracy and better overall performance. The model attained a classification accuracy of approximately 97.33%, with a sensitivity of approximately 71.1% and a specificity of approximately 98.4%. In addition, the random forest model had a relatively high balanced accuracy of approximately 84.7% and an area under the curve of 95.6%. The presence of Plasmodium falciparum was the most important feature in classifying final malaria test results, followed by region, endemic zone, and anemia level. In conclusion, employing machine learning algorithms can enhance early detection, optimize resource allocation for interventions, and ultimately reduce the incidence and impact of malaria in Kenya. Based on these results, targeted interventions, resources, and funds should be channeled to areas with the presence of Plasmodium falciparum, regions susceptible to malaria, endemic zones, and areas with higher anemia severity.
Abbreviations

AUC: Area Under the Curve
ROC: Receiver Operating Characteristic
KNBS: Kenya National Bureau of Statistics
SVM: Support Vector Machine
RF: Random Forest
KNN: K-Nearest Neighbors
ML: Machine Learning

Acknowledgments
We would like to express our sincere gratitude to the Kenya National Bureau of Statistics (KNBS) and the Kenya National Data Archive (KeNADA) for providing the data used in this study.
Author Contributions
Dennis Muriithi: Conceptualization, Data curation, Formal Analysis, Methodology, Writing – original draft, Writing – review & editing
Victor Wandera Lumumba: Conceptualization, Data curation, Formal Analysis, Methodology, Writing – original draft, Writing – review & editing
Mark Okongo: Conceptualization, Data curation, Formal Analysis, Methodology, Writing – original draft, Writing – review & editing
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflicts of interest.
References
[1] Capili, B. (2021). Cross-Sectional Studies. American Journal of Nursing, 121(10), 59–62.
[2] Chapelle, O. (2007). Training a Support Vector Machine in the Primal. Neural Computation, 19(5), 1155–1178.
[3] Adeyemo, A. O., Aborode, A. T., Bello, M. A., Obianuju, A. F., Hasan, M. M., Kehinde, D. O., Hossain, M. S., Bardhan, M., Imisioluwa, J. O., & Akintola, A. A. (2022). Malaria vaccine: The lasting solution to malaria burden in Africa. Annals of Medicine and Surgery, 79, 104031.
[4] Agapaki, E., & Nahangi, M. (2020). Scene understanding and model generation. Elsevier EBooks, 65–167.
[5] Al-Obaidi, K. M., Ismail, M., & Malek, A. (2014). A study of the impact of environmental loads that penetrate a passive skylight roofing system in Malaysian buildings. Frontiers of Architectural Research, 3(2), 178–191.
[6] Galal, A., Marwa Talal, & Moustafa, A. A. (2022). Applications of machine learning in metabolomics: Disease modeling and classification. Frontiers in Genetics, 13.
[7] Giesen, C., Roche, J., Redondo-Bravo, L., Ruiz-Huerta, C., Gomez-Barroso, D., Benito, A., & Herrador, Z. (2020). The impact of climate change on mosquito-borne diseases in Africa. Pathogens and Global Health, 114(6), 1–15.
[8] Ileperuma, K., Jampani, M., Sellahewa, U., Panjwani, S., & Amarnath, G. (2023). Predicting Malaria Prevalence with Machine Learning Models Using December 2023 Colombo, Sri Lanka.
[9] Lee, Y. W., Choi, J. W., & Shin, E.-H. (2021). The machine learning model for predicting malaria using clinical information. Computers in Biology and Medicine, 129, 104151.
[10] Oladipo, H. J., Tajudeen, Y. A., Oladunjoye, I. O., Yusuff, S. I., Yusuf, R. O., Oluwaseyi, E. M., AbdulBasit, M. O., Adebisi, Y. A., & El-Sherbini, M. S. (2022). Increasing challenges of malaria control in sub-Saharan Africa: Priorities for public health research and policymakers. Annals of Medicine and Surgery, 81(104366).
[11] Popkin, Z. R., Seth, M. D., Madebe, R. A., Rule Budodo, Bakari, C., Francis, F., Dativa Pereus, Giesbrecht, D. J., Mandara, C. I., Mbwambo, D., Aaron, S., Abdallah Lusasi, Lazaro, S., Bailey, J. A., Juliano, J. J., Gutman, J. R., & Ishengoma, D. S. (2023). Malaria species prevalence among asymptomatic individuals in four regions of Mainland Tanzania. MedRxiv (Cold Spring Harbor Laboratory).
[12] Sato, S. (2021). Plasmodium—a Brief Introduction to the Parasites Causing Human Malaria and Their Basic Biology. Journal of Physiological Anthropology, 40(1).
[13] Stavropoulos, G., Voorstenbosch, R. van, Schooten, F.-J. van, & Smolinska, A. (2020). Random Forest and Ensemble Methods. Elsevier EBooks, 661–672.
[14] Takken, W. (2021). The mosquito and malaria. Routledge EBooks, 109–122.
[15] Trampuz, A., Jereb, M., Muzlovic, I., & Prabhu, R. M. (2003). Clinical review: Severe Malaria. Critical Care, 7(4), 315.
[16] WHO. (2024). Malaria. WHO | Regional Office for Africa.
[17] Cunningham, P., & Delany, S. J. (2007, April 27). k-Nearest neighbor classifiers. ResearchGate; Association for Computing Machinery.
[18] Kazeem, I., & Adebanji, S. (2021, November 22). A model for predicting malaria outbreak using machine learning technique. ResearchGate; Scientific Annals of Computer Science.
[19] World Health Organization. (2023, December 4). Malaria. Who.int.
[20] Owoko, L. (2024, June 11). Kenya’s child malaria deaths fall three-fold on campaigns. Business Daily.
Cite This Article
  • APA Style

    Muriithi, D., Lumumba, V. W., Okongo, M. (2024). A Machine Learning-Based Prediction of Malaria Occurrence in Kenya. American Journal of Theoretical and Applied Statistics, 13(4), 65-72. https://doi.org/10.11648/j.ajtas.20241304.11


    ACS Style

    Muriithi, D.; Lumumba, V. W.; Okongo, M. A Machine Learning-Based Prediction of Malaria Occurrence in Kenya. Am. J. Theor. Appl. Stat. 2024, 13(4), 65-72. doi: 10.11648/j.ajtas.20241304.11


    AMA Style

    Muriithi D, Lumumba VW, Okongo M. A Machine Learning-Based Prediction of Malaria Occurrence in Kenya. Am J Theor Appl Stat. 2024;13(4):65-72. doi: 10.11648/j.ajtas.20241304.11


  • @article{10.11648/j.ajtas.20241304.11,
      author = {Dennis Muriithi and Victor Wandera Lumumba and Mark Okongo},
      title = {A Machine Learning-Based Prediction of Malaria Occurrence in Kenya
    },
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {13},
      number = {4},
      pages = {65-72},
      doi = {10.11648/j.ajtas.20241304.11},
      url = {https://doi.org/10.11648/j.ajtas.20241304.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20241304.11},
     year = {2024}
    }
    


  • TY  - JOUR
    T1  - A Machine Learning-Based Prediction of Malaria Occurrence in Kenya
    
    AU  - Dennis Muriithi
    AU  - Victor Wandera Lumumba
    AU  - Mark Okongo
    Y1  - 2024/08/20
    PY  - 2024
    N1  - https://doi.org/10.11648/j.ajtas.20241304.11
    DO  - 10.11648/j.ajtas.20241304.11
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 65
    EP  - 72
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20241304.11
    VL  - 13
    IS  - 4
    ER  - 


Author Information