MACHINE LEARNING, MACHINE BIAS: A SYSTEMATIC SURVEY ON ML-BASED FLOOD MODEL PREDICTION

by

CLIO TSAO

A THESIS

Presented to the Department of Math and Computer Science and the Robert D. Clark Honors College in partial fulfillment of the requirements for the degree of Bachelor of Science

November 2025

An Abstract of the Thesis of

Clio Tsao for the degree of Bachelor of Science in the Department of Math and Computer Science to be taken November 2025

Title: Machine Learning, Machine Bias: A Systematic Survey on ML-Based Flood Model Prediction

Approved: Daniel Lowd, Ph.D., Associate Professor
Primary Thesis Advisor

Every year, floods devastate communities with great social and economic costs. Flood modeling can allow for greater anticipation of and response to these natural disasters. Artificial intelligence (AI) and machine learning (ML) are rapidly evolving fields with increasing capacities, holding potential for use in processing historical data to create more effective flood model predictions. However, the risk of bias remains ever present when applying ML models. In this systematic survey, I examine recent studies surrounding the applications of ML-based flood prediction and bias through four central guiding questions regarding the most prevalent bias concerns, debiasing methods, and the efficacy of these methods, as well as the suggested direction for future research. Studies tend to define and address bias in technical terms, seeking to mitigate bias in datasets and models. Suggested directions for future research focus on improving generalizability, expanding datasets, and integrating different models to improve performance. However, there exists a gap in proposing concrete policy actions, as well as in discussing ethics-based handling of bias beyond a technical definition. Nonetheless, ML-based flood models still hold strong potential for enhancing existing flood prediction frameworks.
Acknowledgements

I am grateful for all the support that I have received from my thesis committee and the Clark Honors College community throughout all the twists and turns of the thesis process. Thank you to my Primary Thesis Advisor, Daniel Lowd, for providing guidance and direction from the early stages of this project. Thank you to my CHC Representative, Trond Jacobsen, for instruction on the many facets that go into a thesis. Thank you to Miriam Alexis Castellón Jordan, for coordinating all the logistics behind this thesis process. Thank you to my CHC Advisor, Kristen Rahilly, for addressing my scheduling questions and concerns. And thank you to my family and friends for lending a listening ear and providing a patient presence. I am truly lucky to have such a wonderful community.

Table of Contents

Introduction
Question and Methods
Motivation
Research Methods
Research Questions
Search Strategy
Search Resources
Search String
Inclusion and Exclusion Criteria
Results
RQ1: Bias Concerns
RQ2: Debiasing Methods
RQ3: Efficacy of Methods
RQ4: Future Research
Discussion
Limitations and Future Research
Conclusions
Bibliography
Supporting Materials
Slides: Thesis Prospectus Oral Presentation

List of Tables

Table 1: Publication channel and number of selected papers
Table 2: Types of data extracted from studies
Table 3: Quality assessment values
Table 4: Data summary of bias concerns
Table 5: Data summary of debiasing methods
Table 6: Data summary of future research

Introduction

Artificial intelligence (AI) is a rapidly evolving field, driven by machine learning (ML): algorithms that learn from data without being explicitly programmed. ML can process large datasets, with applications in summarizing data, analyzing trends, and generating reports for all kinds of fields. However, as researchers explore new technical opportunities, practical applications introduce critical ethical considerations. Fairness, bias, and accountability are issues being defined in the discourse over what it means to create ethical AI systems [29] [42] [43] [39] [33], alongside considerations of data privacy, sovereignty, and costs. These considerations led to the focus of this thesis: the ethical use of ML technology in addressing bias.

I began by searching for existing literature reviews on the topic of interest, which was machine learning and bias. These reviews focus on identifying biases commonly addressed in research, finding that many studies examine the impact of race and gender bias in hiring, criminal justice, and healthcare applications; ML algorithms can perpetuate existing inequalities, even leading to feedback loops [48] [49]. The preliminary search identified gaps in research directed towards disability, religious, and political bias, as well as the influence of machine bias in climate and environmental modeling. Data bias, such as collection gaps in historical data, can influence how models are developed and used for specific communities in climate adaptation or disaster prediction with AI. When decision makers use these predictions to inform policy implementations, which groups are disenfranchised? This question guided me to narrow the research focus to studies regarding ML-based predictive models for flood prediction and bias.
Flooding devastates communities with great social and economic costs, and climate change has only exacerbated the unpredictability and impact of these floods [27]. Thus, leveraging machine learning advancements to anticipate and ameliorate flood disasters is a worthwhile field of research that serves communities around the world. ML has been used in historical flood modeling, but recent research in ML-based methods offers advantages over traditional methods [8]. Traditional physics-based models face limitations such as the need for extensive hydro-geomorphological monitoring datasets, which may be difficult to collect and update [11]. The complexity of these traditional models also means that real-time flood prediction is infeasible, especially with limited computational resources [5]. ML-based models can process data with higher variation and peaks from a variety of sources [12], and enhance existing flood prediction by integrating real-time data, allowing for greater live forecast capacities and early warning systems [28]. Thus, the advantages of using AI in flood prediction over traditional physics-based models include improved data processing capacities, reduced need for highly specialized technical expertise, and the potential for real-time response and warning applications.

With greater, more accurate computational potential, ML-based flood prediction can aid in mitigating flood impacts through functions like identification of vulnerabilities, early warning systems, advising of resource allocation, and beyond. Nonetheless, data bias, model bias, and other ethical concerns such as data privacy endure in this application of ML; it is crucial for researchers to critically engage with what that means for the models they develop and the most vulnerable populations those models may impact.
There are barriers to using AI for informing decision making and risk management, including inflated expectations that lead to overestimating the performance of AI, and overreliance on AI without critical human examination of the implications before taking action. Another risk is the black-box nature of ML, which makes it difficult for users to examine and understand the results generated [14]. Thus, additional inspection is imperative for identifying how we can use ML-based flood prediction techniques ethically.

Question and Methods

Motivation

Without any warning, floods can cause devastation and displacement of vulnerable communities. The implementation of timely policies informed by predictive models could allow for preemptive evacuation or reinforcement against the disaster. On the other hand, should predictive models fail, the failure would decrease trust in future predictions or limit the community’s ability to react in advance. By addressing bias in predictive models, researchers could produce more cohesive and effective forecasts, which may in turn help lessen the social and economic costs of floods and other natural disasters.

The potential applications of ML in climate and disaster predictions could serve communities in informing preemptive action against natural disasters, but social and technical biases such as data gaps or poor generalizability still pose risks to utilizing this technology effectively. This review aims to summarize current research, identify gaps in literature, and collect a cohesive view on the next step for policy action and research. While regulations can rarely keep up with the rate at which modern technology and information develop, a clear internal ethical guideline for handling bias, informed by the current state of research, would allow researchers to proceed with an informed and conscientious baseline.
Research Methods

The goal of this systematic literature review (SLR) is to perform exploratory research on ML-based flood prediction models, with a specific focus on how researchers address bias in the models. To achieve this goal, I follow the guidelines suggested for the major steps of an SLR as outlined by Kitchenham and Charters (2004) and reference the methodology of other literature review studies published on machine learning topics [19] [20] [29]. The first step is planning the review, which contains the relevant stages of 1) identification of the need for a review, 2) specifying the research question(s), and 3) developing a review protocol. The next step is conducting the review, which has the associated stages of 1) identification of research, 2) selection of primary studies, 3) study quality assessment, 4) data extraction and monitoring, and 5) data synthesis.

Research Questions

I established four guiding research questions:

RQ1: What bias concerns are raised regarding the use of machine learning in flood prediction models?
RQ2: What are the most frequently implemented methods for debiasing flood prediction models?
RQ3: Based on the findings from RQ1 and RQ2, how closely do proposed solutions address key ethical concerns in the research and development of ML-based flood prediction models?
RQ4: What are the recommended directions of future research for addressing bias in ML-based flood prediction modeling?

By addressing these main research questions, I aim to gain deeper insight into the current trajectory of machine learning and flood prediction modeling, as well as synthesize potential next steps for addressing bias during applications of ML-based flood modeling.

Search Strategy

Search Resources

I searched for articles in the following three digital libraries: IEEE Xplore, Science Direct, and Springer Nature Link.
I chose these publication channels because they are well-established platforms for peer-reviewed journals and conference papers in fields including computer science and climate science. For each channel, I searched for the target search terms using their “Advanced Search” option and filtered according to my inclusion criteria, outlined in a later section. I used Zotero to save and manage the library of relevant literature.

Search String

For the scope of this literature review, I wanted to focus on studies that combine disaster prediction (specifically flood prediction) and machine learning, with discussion on addressing or mitigating machine bias when using AI predictive models. Thus, after running some pilot searches, I refined the search string to:

(“flood* prediction” OR “flood* model*”) AND “machine learning” AND “data bias”

I use the wildcard character (*) in “flood*” to capture the variations “flood”, “floods”, and “flooding”; tying it to “prediction” or “model*” returns results that contain the target phrase in the context of predicting floods, the natural disaster, thereby reducing false positives. (This was quite crucial, as simply searching “flood” can return false positives such as “information flooding a system”.) Using (AND) with “machine learning” as well as “data bias” filters more specifically for studies that include machine learning techniques and discuss data bias. This allowed me to cast a search net with more precise filtering without overly restrictive target search terms.

Inclusion and Exclusion Criteria

I defined criteria to include or exclude a paper in this review. I used these criteria to carefully screen each result by title and abstract to remove ones that were not relevant to the research questions.

Inclusion criteria:
• The study is written in English.
• The study is published in a peer-reviewed publication channel: IEEE Xplore, Springer Nature Link, or Science Direct.
• The study was published between 2020 and 2025.
• The study is related to flood prediction and bias in machine learning predictive models.

Exclusion criteria:
• The study is not relevant to any research questions.
• The full text of the study is not available in the search database and accessible using institutional accounts.
• The study is related to disaster prediction but not machine learning.
• The study is related to machine learning but not to disaster prediction.
• The study is in a language other than English.
• The study is not identified as peer reviewed.
• The study is duplicated.

After screening, I selected 26 studies to include in the primary analysis of the SLR, as documented in Table 1.

Table 1: Publication channel and number of selected papers.

PUBLICATION CHANNEL      SELECTED PAPERS
IEEE Xplore              15
Science Direct           9
Springer Nature Link     2
Total                    26

Data Collection

I documented the data extracted from the final selected papers in an Excel spreadsheet. Table 2 details the types of data recorded.

Table 2: Types of data extracted from studies

TYPE                         DATA
Standard Information         Title, authors, DOI, publication year, publication or conference name, paper type, reason for exclusion (if applicable)
Research Questions           RQ1: bias concerns, RQ2: debiasing methods, RQ3: efficacy of methods, RQ4: future research
Machine Learning Related     ML model(s), evaluation metrics, recommended model/application, data period, data source
Flood Modeling Related       Region, risk/prediction factor
Study Quality Assessment     Values for: data sourcing, data validation, model selection, model validation, bias assessment, generalizability

After all the studies included in the primary analysis were reviewed, the final findings were summarized and documented in a separate table to identify the major themes for each research question.

I also performed a quality assessment of the reviewed studies, developing a set of quality assessment criteria [25].
Table 3: Quality assessment values

Data Sourcing
  HIGH (3): Data sources clearly documented and critically assessed, appropriate temporal and spatial resolution
  MEDIUM (2): Data sources partially documented and assessed or lack of appropriate temporal and spatial resolution
  LOW (1): Data source description entirely lacking

Data Validation
  HIGH (3): Data gaps/outliers are addressed and handled appropriately, data sets are cleaned
  MEDIUM (2): Data validation has been somewhat performed
  LOW (1): No data validation

Model Selection
  HIGH (3): ML model is clearly described and justified
  MEDIUM (2): ML model is somewhat described and justified
  LOW (1): No discussion of ML model or rationale for selection

Model Validation
  HIGH (3): Model validation is thoroughly performed and justified
  MEDIUM (2): Model validation somewhat performed
  LOW (1): No model validation

Bias Assessment
  HIGH (3): Biases and potential causes are defined and discussed
  MEDIUM (2): Some definition and discussion of biases and potential causes
  LOW (1): No discussion of biases and potential causes

Generalizability
  HIGH (3): There is a proper use-case to evaluate results, results are general enough to be expanded to other situations
  MEDIUM (2): Minimal use-case for testing, results are somewhat generalizable
  LOW (1): Highly specific results constrained to the study, lacking a use-case to evaluate the results

Based on the values each study received in each category, I assigned it a final quality rating, which is the average of the applicable categories.

Results

RQ1: Bias Concerns

What bias concerns are raised regarding the use of machine learning in flood prediction models?
Table 4: Data summary of bias concerns

BIAS CONCERNS                                                COUNT
Insufficient data, limited monitoring                        8
Model bias                                                   5
Data imbalance                                               4
Data privacy                                                 2
Computational constraints                                    2
Algorithmic transparency, accountability                     1
Unequal access to climate prediction tools                   1
Interpretability barriers                                    1
Overburdening of certain socio-demographic populations       1

The major bias concerns raised in the reviewed studies include data imbalance, data privacy, bias from insufficient data or limited monitoring, and model bias. ML-based flood prediction models are prone to underestimating the upper ranges of flood occurrences because of data imbalance: extreme flood occurrences have fewer recorded instances. Additionally, computational constraints are an area of concern limiting the effectiveness of ML-based flood prediction models. Issues of algorithmic transparency and accountability, unequal access to climate prediction tools, interpretability barriers, and overburdening of certain socio-demographic populations are also mentioned, although with much less frequency. These less frequently mentioned concerns relate to failures in the representativeness of data and the resulting impact on vulnerable populations. For example, crowdsourced data carries biases related to the systemic exclusion of certain populations through uneven coverage and limited access to internet services; these “digitally invisible populations” may be misgauged or overlooked [9], and high-risk regions with fewer data points may be classified as lower risk [34].

Most studies discuss bias from a technical perspective, focusing on algorithmic methods that can mitigate bias in the data or the model. Definitions of bias that focus on societal impact are lacking; some papers only discuss bias in passing as a model metric (percent bias value), listed purely as a technical term. Reviews that I reference beyond my primary SLR analysis tend to provide a greater ethics-based perspective.
Papers written from an ethics-based perspective discuss concepts like overburdening vulnerable populations [24] in more depth than the technical explorations of ML for climate modeling. This lack of discussion in the literature suggests that there is an insufficient framework for defining and critically addressing bias when it comes to ML-based flood modeling.

RQ2: Debiasing Methods

What are the most frequently implemented methods for debiasing flood prediction models?

Table 5: Data summary of debiasing methods

DEBIASING METHODS                  COUNT
Statistical bias correction        8
Model development                  5
Data augmentation                  4
Crowdsourced/open-source data      3
Federated learning                 1
Post-hoc interpretation methods    1

The main debiasing methods utilized by studies in this review fall under the categories of data augmentation, statistical bias correction (e.g., BCSD, downscaling, Multivariate Hawkes process), and model development. Use of crowdsourced and open-source data, federated learning, and post-hoc interpretation methods are also mentioned as potential methods to address bias.

Specifically, data augmentation methods such as SMOTE, undersampling, and oversampling are frequently utilized to address bias stemming from data imbalances, where specific classes of data ranges are underrepresented. Statistical bias correction is used to account for systematic bias in the model predictions.

Oversampling increases the number of instances in the minority class. SMOTE (Synthetic Minority Over-sampling Technique), one such technique, selects an instance in the minority class, identifies its k-nearest neighbors, and generates synthetic samples in a similar range by interpolating between that instance and its neighbors. This rebalances the data and reduces bias against the minority class. Undersampling, on the other hand, randomly removes instances from the majority class, achieving a similar effect, although oversampling tends to yield more accurate models than undersampling [2].
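The two resampling ideas described above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not the implementation used in any of the reviewed studies: the function names, the toy feature pairs (rainfall in mm, river level in m), and the parameter values are all hypothetical, and a real study would more likely use an established library and scale its features before computing distances.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points, SMOTE-style.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: every minority sample except the base itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Hypothetical toy data: (rainfall_mm, river_level_m) per observation.
flood = [(120.0, 4.1), (135.0, 4.6), (150.0, 5.0), (128.0, 4.3)]
no_flood = [(float(10 + i), 1.0 + 0.05 * i) for i in range(20)]

# Oversampling: grow the flood class until it matches the non-flood class.
synthetic_flood = smote_like_oversample(flood, n_new=len(no_flood) - len(flood))
balanced_by_oversampling = flood + synthetic_flood

# Undersampling: shrink the non-flood class down to the flood class size.
balanced_by_undersampling = random_undersample(no_flood, n_keep=len(flood))
```

Because every synthetic point is an interpolation between two observed flood events, oversampling of this kind cannot produce values more extreme than those already recorded, while undersampling discards observed non-flood data.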
Other methods under the category of “model development” include integrating different ML models to improve their performance against key metrics like accuracy, precision, ROC, and percent bias. Ensemble machine learning techniques combine multiple models to reduce the bias of individual models, minimize overfitting, and enhance overall model robustness. Random Forest (RF) is a technique where the algorithm creates multiple decision trees based on random parts of the data, picks random features so that each tree is different, and combines the predictions that each tree produces. It is one of the most frequently used ensemble techniques; it outperforms single models and helps with identifying key features and avoiding overfitting [1] [2] [3] [7] [14] [23] [32] [40] [46]. Presence-only models are another approach to dealing with data gaps, since they only focus on locations where flood data has been observed, thus circumventing the issue of conflating lack of historical data with true non-flood zones [3] [10].

Federated learning is one method proposed to mitigate data privacy concerns [6] by training a model on multiple clients, keeping their data decentralized. Furthermore, using crowdsourced data to address areas with historical data gaps or limited monitoring may allow for more data coverage of otherwise data-sparse areas, though the risk of overlooking digitally invisible populations remains [9].

RQ3: Efficacy of Methods

Based on the findings from RQ1 and RQ2, how closely do proposed solutions address key ethical concerns in the research and development of ML-based flood prediction models?

As summarized in the previous section, many of the studies focused on technical definitions of bias, and the methods they employed to mitigate these biases were also technical and algorithmic. The high frequency of data bias concerns aligns well with the correspondingly frequent use of data augmentation methods.
Ensemble methods and statistical bias correction help reduce systematic bias in individual models. Concerns of data privacy are also addressed with methods like federated learning.

However, focusing on mitigating data bias may not actually be the most valuable direction for reducing the burden on vulnerable communities [24]. The greatest challenge remains for researchers to audit for and mitigate representational bias. This may require more field studies of affected populations, as well as cross-border collaboration and management of data collection.

Furthermore, studies propose the use of explainable AI (XAI) to address concerns of hidden bias due to the “black-box” nature of ML [14], helping to establish greater trust in AI systems. However, XAI frameworks are not widely implemented or developed in the studies that focus on technical methods for debiasing. Thus, even though proposed and tested methods for addressing bias are successful, they are still not always applied, nor is there a cohesive method for integrating all methods into ML-based flood prediction models for widespread, organized applications. Studies still frequently mention limited knowledge about representational bias as a limitation and direction for future research.

RQ4: Future Research

What are the recommended directions of future research for addressing bias in ML-based flood prediction modeling?

Table 6: Data summary of future research

FUTURE RESEARCH                                COUNT
Generalizability                               9
Expanding datasets                             9
Compare against alternatives                   8
Integration (of other techniques, models)      5
Validation/Follow-up monitoring                4
Scaling model application/capacity             3
Improve interpretability                       2
Enhancement/Further development                2
Qualitative research of communities            1

Most further improvements fall under the broad categories of integrating models or methods, expanding datasets, and improving generalizability.
Many suggestions for future research focus on improving the performance and accuracy of the prediction model, such as expanding the generalizability of models by testing them with different data or in different climate conditions, collecting more data or implementing greater monitoring to expand datasets for model training, and integrating machine learning with other models to improve performance.

Another important direction for future research is to improve the interpretability and transparency of ML-based models. This includes continued development of explainable AI (XAI), such as exploring how XAI tools can identify inherent data bias that causes model inaccuracies, comparing models with XAI tools to check for consistency, and addressing sources of discrepancies in XAI results compared to historical models [14]. This will allow future model developments to identify whether different predictions from ML-based models arise from model bias, data bias, or discovery of new patterns.

Furthermore, studies call for additional research to audit for representational bias, such as socio-economic bias. This includes auditing existing and novel data sources, such as geospatial data [13] or crowdsourced data channels [11], to further improve training datasets. How transferable are existing models to underrepresented areas? The topics that studies suggest for future research show a consciousness of broader issues like interpretability, scalability, and impact on local communities. However, significant continual research efforts are still required to better identify and address them.

Discussion

The key findings from the results of the SLR are:
• Most studies focus on bias concerns of data imbalance, data privacy, insufficient data or limited monitoring, and model bias.
• Frequently implemented techniques for debiasing are data augmentation, statistical bias correction, and model development.
• Studies focus on technical sources of bias and debiasing, with less discussion of ethical considerations and bias definitions. While the efficacy of technical debiasing methods is high for technical bias, it is lower for representational biases.
• Future research directions include improving generalizability, expanding datasets for training, further improving model performance, and increasing model interpretability.
• There is strong potential for effective applications of ML-based technologies for flood prediction and recommendation for further research, but direct policy suggestions are minimal.

With flood modeling research, it is key that the studies can translate into something that helps policymakers and other relevant decision makers take meaningful action towards mitigating negative flood impacts. However, there exists a gap among the reviewed studies in recommending a governance policy or action plan based on their findings. Many of the studies conclude with a recommendation for flood policy experts to leverage their findings or technology to make more informed decisions for flood response but lack direct actionable suggestions. This may be because of the knowledge gap between areas of subject expertise.

What is the recommended governance, policy action, or real-world response given the results of ML-based flood prediction modeling? Studies recommend using their predictions to guide development and infrastructure away from high-risk zones [31]. However, this may still be insufficiently helpful to decision makers because they still lack information on how to define these zones. One way to address this gap could be to include guidelines for response to model predictions. For example, what should be the response policy for a “high” flood risk but “low” flood impact? What about “low” flood risk but “high” flood impact? Or something in the middle? What predictions fall under each category of flood risk?
Management of resource allocation, another frequently mentioned way to leverage flood model predictions for serving communities [9] [34], could also benefit from similar collaboration with experts knowledgeable about natural and human resource management.

For example, the risk of data imbalances where models underestimate upper flood values is that policymakers underestimate the impact of predicted floods and react insufficiently. If the perceived risk is lower than the real risk, the response would be inadequate. Therefore, it would be key for model reports to highlight the chance that models underestimate upper ranges and to provide a metric that details the possible “high end” of flood risk.

Addressing these data imbalances with data augmentation faces limitations as well. Data augmentation methods call for strata or classes of data ranges to be defined, which requires sufficient prior information to identify the ranges as well as proficient understanding to properly identify the extreme ranges. For example, models are often validated with methods such as stratified k-fold cross validation, which splits the dataset into k subsets, trains the model on k-1 subsets, and tests it against the remaining subset. This process is repeated k times, with each subset serving once as the test set, and the results are averaged to estimate the generalization ability of the model; stratification allows the original distribution of each class to be retained in each of these k subsets. This is an effective method for evaluating a model, particularly with imbalanced data. However, the application of this type of validation hinges on knowing the ranges. With shifting climate conditions and precipitation patterns that deviate from historical norms, it may become more difficult to accurately stratify data ranges.

Another key consideration for studies regarding AI fairness, bias, and accountability is for researchers to critically define what kind of fairness they aim for through debiasing.
With certain algorithms, sufficient data diversity can mitigate undesirable side effects of varied data [43]. Thus, increasing data sources and collection to gain a more diverse dataset may help promote more “fair” ML outputs. At the same time, the definition of “fairness” holds certain assumptions about a system. The two main conflicting definitions of fairness are individual fairness and group fairness. Individual fairness assumes that individuals who are similar should be treated similarly; individuals should receive consistent and equal treatment based on their features, regardless of what demographic group they belong to. Group fairness, or non-discrimination, assumes that demographic groups should be treated similarly as a whole; protected groups should receive equal outcomes to address historical and structural biases [42]. Both definitions are valid and applicable in different cases, but the adoption of a definition should be a conscious choice, since it defines the assumption at the core of a model’s worldview. When debiasing a model, researchers should also state which standard of fairness is being used. Is the goal individual fairness or group fairness? This tension may help explain why papers “fail to engage critically with what constitutes ‘bias’ in the first place” [29].

The ML research community has worked to identify strategies to mitigate unwanted bias and promote fairness in the outputs of ML flood models. However, researchers are still working to fill a gap with regards to defining and articulating the assumptions that ML systems inherently encode from their creators, as well as the limitations of those systems. Nonetheless, ML methods can complement and enhance existing flood prediction models.

Limitations and Future Research

Due to financial and time constraints, I only assessed studies that were open access or accessible to students with institutional accounts from three publication channels.
Thus, this review may omit relevant studies with restricted access. Likewise, although many of the papers included in this review focus on technical bias, this may be due to the limited scope of my search string. However, my findings align with the results of other systematic literature reviews in similar fields and corroborate that there are still studies that lack a critical discussion of bias. This review also focuses on recent studies in order to track the latest developments in ML-based flood modeling, although older applications of ML exist. Future in-depth reviews could include older publications to trace the evolution of the field. Another future research topic may be to investigate how ethical uses of ML are regulated in other well-established fields, both to gain a better interdisciplinary understanding of ML use and to draw from shared terminology and regulatory policies.

Another crucial consideration in applying ML-based technologies is whether the computational demands of AI would put additional strain on some communities while serving others. This ethical decision about AI use affects vulnerable communities. One area I would like to investigate beyond the scope of this thesis is the possibility of using AI models in a way that is conscious of the demands that powering AI places on communities’ power grids and water infrastructure. Does this put unfair demands on certain communities just to provide benefits to others? Based purely on resources spent and resources saved, for example energy, would it be possible to offset the net cost? Even if there is a net gain, how would it be allocated and distributed, and who would benefit? Systems analysis research could provide insight into these questions.

Conclusions

Bias is inherent in AI systems.
Following best practices to debias data and models can allow for better predictions, but it may be impossible to eliminate bias from any human-based system. Thus, it remains imperative that researchers and policymakers work together to define protocols that acknowledge the existence of biases and make the most of the valuable insight that AI can provide while maintaining human judgment. Gaps between technical ML research and ethical governance policy are still apparent in the way that bias is discussed and addressed in the studies reviewed. These gaps signal areas for decision making and suggest a need for more interdisciplinary committees to collaborate and determine ethical policies. Effective risk management and decision-making techniques when faced with uncertainty are at the core of developing a strategy that maximizes the utility of ML-based flood predictions and minimizes the shortcomings of inherent biases. After all, an algorithm can generate a near-perfect model, but it takes the understanding of human interactions to bring real change to a community.

Bibliography

[1] S. A. Rufus, N. A. Ahmad, N. Abdullah, and Z. Abdul-Malek, “A Comparative Analysis Using Machine Learning Approach for Thunderstorm Prediction in Southern Region of Peninsular Malaysia,” in 2023 International Symposium on Lightning Protection (XVII SIPDA), Oct. 2023, pp. 1–6. doi: 10.1109/SIPDA59763.2023.10349193.
[2] Md. A. Rahman, A. Akter, F. S. Richi, A. Shoud, and T. Ahmed, “A Comparative Study of Undersampling and Oversampling Methods for Flood Forecasting in Bangladesh using Machine Learning,” in 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Jul. 2023, pp. 1–7. doi: 10.1109/ICCCNT56998.2023.10306368.
[3] A. Haghizadeh, R. Fathiganji, E. Sohrabi, A. Lotfi, and L. Ghasemi, “A framework for flood risk zoning and prioritization combining maximum entropy and game theory,” Sci Rep, vol. 15, no. 1, p. 24153, Jul.
2025, doi: 10.1038/s41598-025-08220-x.
[4] L. Addison, A. Hosang, T.-A. Tuitt, K. Manohar, and P. Hosein, “A LLM-Based Platform for Flood Risk Education and Weather Alerts in SIDS,” in 2024 IEEE International Conference on Technology Management, Operations and Decisions (ICTMOD), Nov. 2024, pp. 1–6. doi: 10.1109/ICTMOD63116.2024.10878134.
[5] W. Al-Sabhan, M. Mulligan, and G. A. Blackburn, “A real-time hydrological model for flood prediction using GIS and the WWW,” Computers, Environment and Urban Systems, vol. 27, no. 1, pp. 9–32, Jan. 2003, doi: 10.1016/S0198-9715(01)00010-2.
[6] S. H. Mahir et al., “Advanced Hydro-Informatic Modeling Through Feedforward Neural Network, Federated Learning, and Explainable AI for Enhancing Flood Prediction,” IEEE Open Journal of the Computer Society, vol. 6, pp. 726–738, 2025, doi: 10.1109/OJCS.2025.3556424.
[7] S. Karim, T. S. Nova, and S. Tasneem, “Advancing Rainfall Seasonality Detection via Bias Correction and Spatial Disaggregation (BCSD) with CHIRPS-v2 for Climate Modeling in Bangladesh,” in 2025 International Conference on Electrical, Computer and Communication Engineering (ECCE), Feb. 2025, pp. 1–6. doi: 10.1109/ECCE64574.2025.11013178.
[8] P. Wang, X. Wu, and Yichen, “AI-driven approaches to flood risk management: overcoming data bias and enhancing decision-making,” Climate Risk Management, vol. 50, p. 100752, Jan. 2025, doi: 10.1016/j.crm.2025.100752.
[9] N. Coleman, A. Clarke, M. Esparza, and A. Mostafavi, “Analyzing common social and physical features of flash-flood vulnerability in urban areas,” International Journal of Disaster Risk Reduction, vol. 122, p. 105437, May 2025, doi: 10.1016/j.ijdrr.2025.105437.
[10] M. El Baida, F. Boushaba, M. Chourak, and M. Hosni, “Are Presence-Only Machine Learning Models Reliable Enough for Hazard Mapping? Insights from Flood Hazard in the Maghreb,” in 2025 International Conference for Artificial Intelligence, Applications, Innovation and Ethics (AI2E), Feb. 2025, pp. 1–4.
doi: 10.1109/AI2E64943.2025.10982848.
[11] Z. Liu, N. Coleman, F. I. Patrascu, K. Yin, X. Li, and A. Mostafavi, “Artificial intelligence for flood risk management: A comprehensive state-of-the-art review and future directions,” International Journal of Disaster Risk Reduction, vol. 117, p. 105110, Feb. 2025, doi: 10.1016/j.ijdrr.2024.105110.
[12] F.-J. Chang, L.-C. Chang, and J.-F. Chen, “Artificial Intelligence Techniques in Hydrology and Water Resources Management,” Water, vol. 15, no. 10, p. 1846, Jan. 2023, doi: 10.3390/w15101846.
[13] C. M. Gevaert, T. Buunk, and M. J. C. van den Homberg, “Auditing Geospatial Datasets for Biases: Using Global Building Datasets for Disaster Risk Management,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 12579–12590, 2024, doi: 10.1109/JSTARS.2024.3422503.
[14] F. Huang, W. Shangguan, Q. Li, L. Li, and Y. Zhang, “Beyond prediction: An integrated post-hoc approach to interpret complex model in hydrometeorology,” Environmental Modelling & Software, vol. 167, p. 105762, Sep. 2023, doi: 10.1016/j.envsoft.2023.105762.
[15] R. C. Hales, G. P. Williams, E. James Nelson, R. B. Sowby, D. P. Ames, and J. L. S. Lozano, “Bias correcting discharge simulations from the GEOGloWS global hydrologic model,” Journal of Hydrology, vol. 626, p. 130279, Nov. 2023, doi: 10.1016/j.jhydrol.2023.130279.
[16] A. Ramesh, Z.
Gulmira, A. H. Shnain, R. Ramya, P. Nagaveni, and A. S. K, “Big Data and Machine Learning for Climate Change Prediction: An Integrated Approach to Environmental Monitoring,” in 2025 International Conference on Automation and Computation (AUTOCOM), Mar. 2025, pp. 1384–1389. doi: 10.1109/AUTOCOM64127.2025.10956420.
[17] F. AlZaatiti, J. Halwani, and M. R. Soliman, “Climate change impacts on flood risks in the Abou Ali River Basin, Lebanon: A hydrological modeling approach,” Results in Engineering, vol. 25, p. 104186, Mar. 2025, doi: 10.1016/j.rineng.2025.104186.
[18] E. Wallace, T. Z. Zhao, S. Feng, and S. Singh, “Concealed Data Poisoning Attacks on NLP Models,” Apr. 12, 2021, arXiv: arXiv:2010.12563. doi: 10.48550/arXiv.2010.12563.
[19] M. A. Zaidi, “Conceptual Modeling Interacts with Machine Learning – A Systematic Literature Review,” in Computational Science and Its Applications – ICCSA 2021, O. Gervasi, B. Murgante, S. Misra, C. Garau, I. Blečić, D. Taniar, B. O. Apduhan, A. M. A. C. Rocha, E. Tarantino, and C. M. Torre, Eds., Cham: Springer International Publishing, 2021, pp. 522–532. doi: 10.1007/978-3-030-87013-3_39.
[20] A. Khakpour and R. Colomo-Palacios, “Convergence of Gamification and Machine Learning: A Systematic Literature Review,” Tech Know Learn, vol. 26, no. 3, pp. 597–636, Sep. 2021, doi: 10.1007/s10758-020-09456-4.
[21] B.-C. Jhong, F.-W. Chen, and C.-P. Tung, “Development of a real-time dynamic inundation risk assessment approach on paddy fields during typhoons: Exploration of adaptation strategies and quantification of risks,” Journal of Environmental Management, vol. 380, p. 124981, Apr. 2025, doi: 10.1016/j.jenvman.2025.124981.
[22] T. Atmaja and K. Fukushi, “EMPOWERING GEO-BASED AI ALGORITHM TO AID COASTAL FLOOD RISK ANALYSIS: A REVIEW AND FRAMEWORK DEVELOPMENT,” ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. V-3–2022, pp. 517–523, May 2022, doi: 10.5194/isprs-annals-V-3-2022-517-2022.
[23] F.
Taromideh, R. Fazloula, B. Choubin, M. Masoodi, and A. Mosavi, “Ensemble Machine Learning for Urban Flood Hazard Assessment,” in 2024 IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics (SAMI), Jan. 2024, pp. 000525–000530. doi: 10.1109/SAMI60510.2024.10432902.
[24] C. M. Gevaert, M. Carman, B. Rosman, Y. Georgiadou, and R. Soden, “Fairness and accountability of AI in disaster risk management: Opportunities and challenges,” Patterns, vol. 2, no. 11, p. 100363, Nov. 2021, doi: 10.1016/j.patter.2021.100363.
[25] K. S. Khan, R. Kunz, J. Kleijnen, and G. Antes, “Five steps to conducting a systematic review,” J R Soc Med, vol. 96, no. 3, pp. 118–121, Mar. 2003, doi: 10.1258/jrsm.96.3.118.
[26] B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” Technical report, EBSE Technical Report EBSE-2007-01, 2007. Accessed: Nov. 05, 2025. [Online]. Available: https://docs.edtechhub.org/lib/EDAG684W
[27] C. Wasko et al., “Incorporating climate change in flood estimation guidance,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 379, no. 2195, p. 20190548, Mar. 2021, doi: 10.1098/rsta.2019.0548.
[28] N.
Fatima et al., “Integrating Machine Learning Models With Probability Distribution Methods for Extreme Flood Risk Assessment,” IEEE Access, vol. 13, pp. 160922–160938, 2025, doi: 10.1109/ACCESS.2025.3598121.
[29] S. L. Blodgett, S. Barocas, H. Daumé III, and H. Wallach, “Language (Technology) is Power: A Critical Survey of ‘Bias’ in NLP,” May 29, 2020, arXiv: arXiv:2005.14050. doi: 10.48550/arXiv.2005.14050.
[30] J. Salas, A. Saha, and S. Ravela, “Learning inter-annual flood loss risk models from historical flood insurance claims,” Journal of Environmental Management, vol. 347, p. 118862, Dec. 2023, doi: 10.1016/j.jenvman.2023.118862.
[31] R. M. A. Ikram, M. Wang, H. Moayedi, and A. A. Dehrashid, “Management and prediction of river flood utilizing optimization approach of artificial intelligence evolutionary algorithms,” Sci Rep, vol. 15, no. 1, p. 22787, Jul. 2025, doi: 10.1038/s41598-025-04290-z.
[32] J. S. Navarro, R. Zhuang, C. Albertini, and S. Manfreda, “Mapping flood susceptibility using Random Forest exploiting satellite observations and geomorphic features,” Science of The Total Environment, vol. 1002, p. 180592, Nov. 2025, doi: 10.1016/j.scitotenv.2025.180592.
[33] B. H. Zhang, B. Lemoine, and M. Mitchell, “Mitigating Unwanted Biases with Adversarial Learning,” in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans LA USA: ACM, Dec. 2018, pp. 335–340. doi: 10.1145/3278721.3278779.
[34] Z. Zhou and M. Sun, “Multivariate Hawkes Processes for Incomplete Biased Data,” in 2021 IEEE International Conference on Big Data (Big Data), Dec. 2021, pp. 968–977. doi: 10.1109/BigData52589.2021.9672043.
[35] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event Canada: ACM, Mar. 2021, pp. 610–623. doi: 10.1145/3442188.3445922.
[36] N.
Carlini et al., “Poisoning Web-Scale Training Datasets is Practical,” in 2024 IEEE Symposium on Security and Privacy (SP), May 2024, pp. 407–425. doi: 10.1109/SP54263.2024.00179.
[37] K. Khaing Kyaw et al., “Private sensors and crowdsourced rainfall data: Accuracy and potential for modelling pluvial flooding in urban areas of Oslo, Norway,” Journal of Hydrology X, vol. 25, p. 100191, Dec. 2024, doi: 10.1016/j.hydroa.2024.100191.
[38] N. Rathnayake, U. Rathnayake, I. Chathuranika, T. L. Dang, and Y. Hoshino, “Projected Water Levels and Identified Future Floods: A Comparative Analysis for Mahaweli River, Sri Lanka,” IEEE Access, vol. 11, pp. 8920–8937, 2023, doi: 10.1109/ACCESS.2023.3238717.
[39] J. J. Smith, S. Amershi, S. Barocas, H. Wallach, and J. Wortman Vaughan, “REAL ML: Recognizing, Exploring, and Articulating Limitations of Machine Learning Research,” in 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul Republic of Korea: ACM, Jun. 2022, pp. 587–597. doi: 10.1145/3531146.3533122.
[40] K. He, W. Zhao, L. Brocca, P. Quintana-Seguí, and X. Chen, “SMPD-MERG: A Hybrid Downscaling Model for High-Resolution Daily Precipitation Estimation via Merging Surface Soil Moisture and Multisource Precipitation Data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–16, 2025, doi: 10.1109/TGRS.2025.3561253.
[41] J. C. Dizon, I. Aryal, and I. Benitez, “Streamflow Prediction of Cañas River Watershed, Cavite, Philippines using Long Short-Term Memory,” in 2024 International Conference on IT Innovation and Knowledge Discovery (ITIKD), Apr. 2025, pp. 1–6. doi: 10.1109/ITIKD63574.2025.11005233.
[42] S. A. Friedler, C. Scheidegger, and S. Venkatasubramanian, “The (Im)possibility of fairness: different value systems require different mechanisms for fair decision making,” Commun. ACM, vol. 64, no. 4, pp. 136–143, Mar. 2021, doi: 10.1145/3433949.
[43] M. Raghavan, A. Slivkins, J. V. Wortman, and Z. S. Wu, “The externalities of exploration and how data diversity helps exploitation,” in Conference on Learning Theory, PMLR, 2018, pp. 1724–1738. Accessed: Apr. 17, 2025. [Online]. Available: http://proceedings.mlr.press/v75/raghavan18a.html
[44] T. Tiggeloven et al., “The Role of Artificial Intelligence for Early Warning Systems: Status, Applicability, Guardrails and Ways Forward,” iScience, p. 113689, Oct. 2025, doi: 10.1016/j.isci.2025.113689.
[45] L. J. Vlaming, “The Tension Between Modern Technology and the Legal Foundations of Privacy,” Dec. 2017, Accessed: Apr. 17, 2025. [Online]. Available: https://hdl.handle.net/1794/24131
[46] S. A. Rufus, N. A. Ahmad, Z. Abdul-Malek, and N. Abdullah, “Thunderstorm Prediction Model Using SMOTE Sampling and Machine Learning Approach,” in 2023 12th Asia-Pacific International Conference on Lightning (APL), Jun. 2023, pp. 1–5. doi: 10.1109/APL57308.2023.10182046.
[47] G. Peery, “Vision Transformers Under Data Poisoning Attacks,” 2023, Accessed: Apr. 17, 2025. [Online].
Available: https://hdl.handle.net/1794/28707
[48] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, “A Survey on Bias and Fairness in Machine Learning,” Jan. 25, 2022, arXiv: arXiv:1908.09635. doi: 10.48550/arXiv.1908.09635.
[49] E. Ferrara, “Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies,” Sci, vol. 6, no. 1, p. 3, Mar. 2024, doi: 10.3390/sci6010003.