MACHINE LEARNING, MACHINE BIAS:  

A SYSTEMATIC SURVEY ON ML-BASED FLOOD MODEL 

PREDICTION 

 
by 

CLIO TSAO 

 
A THESIS 

 
Presented to the Department of Math and Computer Science  

and the Robert D. Clark Honors College  
in partial fulfillment of the requirements for the degree of  

Bachelor of Science 
 

November 2025 

 

An Abstract of the Thesis of 

Clio Tsao for the degree of Bachelor of Science 
in the Department of Math and Computer Science to be taken November 2025 

 
Title: Machine Learning, Machine Bias: A Systematic Survey on ML-Based  
Flood Model Prediction 

 
Approved: Daniel Lowd, Ph.D., Associate Professor  
Primary Thesis Advisor 

 
Every year, floods devastate communities with great social and economic costs. Flood 

modeling can allow for greater anticipation and response to these natural disasters. Artificial 

intelligence (AI) and machine learning (ML) are rapidly evolving fields with increasing 

capacities, holding potential for use in processing historical data to create more effective flood 

model predictions. However, the risk of bias remains ever present when applying ML

models. In this systematic survey, I examine recent studies surrounding the

applications of ML-based flood prediction and bias through four central guiding questions 

regarding the most prevalent bias concerns, debiasing methods, and the efficacy of these 

methods, as well as the suggested direction for future research. Studies tend to focus on a

technical definition and treatment of bias, seeking to mitigate bias in datasets and models.

Suggested directions for future research include improving generalizability, expanding datasets,

and integrating different models to improve performance. However, there is a gap in proposing

concrete policy actions and in discussing ethics-based handling of bias beyond a technical definition.

Nonetheless, ML-based flood models still hold strong potential for enhancing existing flood 

prediction frameworks. 

  

Acknowledgements 

 
I am grateful for all the support that I have received from my thesis committee and the Clark

Honors College community throughout all the twists and turns of the thesis process. Thank you 

to my Primary Thesis Advisor, Daniel Lowd, for providing guidance and direction from the early 

stages of this project. Thank you to my CHC Representative, Trond Jacobsen, for instruction on 

the many facets that go into a thesis. Thank you to Miriam Alexis Castellón Jordan, for 

coordinating all the logistics behind this thesis process. Thank you to my CHC Advisor, Kristen 

Rahilly, for addressing my scheduling questions and concerns. And thank you to my family and 

friends for lending a listening ear and providing a patient presence. I am truly lucky to have such 

a wonderful community. 

 

Table of Contents 

Introduction
Question and Methods

Motivation
Research Methods
Research Questions
Search Strategy

Search Resources
Search String
Inclusion and Exclusion Criteria

Results

RQ1: Bias Concerns
RQ2: Debiasing Methods
RQ3: Efficacy of Methods
RQ4: Future Research

Discussion
Limitations and Future Research
Conclusions
Bibliography
Supporting Materials  

Slides: Thesis Prospectus Oral Presentation  



List of Tables  

Table 1: Publication channel and number of selected papers.
Table 2: Types of data extracted from studies
Table 3: Quality assessment values
Table 4: Data summary of bias concerns
Table 5: Data summary of debiasing methods
Table 6: Data summary of future research

 

Introduction 

Artificial intelligence (AI) is a rapidly evolving field, driven by machine learning (ML), 

algorithms that learn from data without explicit programming. ML can process large datasets, 

with applications in summarizing data, analyzing trends, and generating reports for all kinds of 

fields. However, as researchers explore new technical opportunities, practical applications 

introduce critical ethical considerations. Fairness, bias, and accountability are issues being 

defined in the discourse over what it means to create ethical AI systems [29] [42] [43] [39] [33], 

as well as considerations for data privacy, sovereignty, and costs. This led to the focus of this 

thesis on the ethical use of ML technology in addressing bias. 

I began by searching for existing literature reviews on the topic of interest, machine

learning and bias. These reviews focus on identifying biases commonly addressed in

research, finding that many studies examine the impact of race and gender bias in hiring, 

criminal justice, and healthcare applications; ML algorithms can perpetuate existing inequalities, 

even leading to feedback loops [48] [49]. The preliminary search identified gaps in research 

directed towards disability, religious, and political bias, as well as the influence of machine bias 

in climate and environmental modeling. Data bias, such as collection gaps in historical data, can

influence how models are developed and used for specific communities in climate adaptation or 

disaster prediction with AI. When decision makers use these predictions to inform policy 

implementations, which groups are disenfranchised? This question guided me to narrow down 

the research focus to studies regarding ML-based predictive models for flood prediction and bias. 

Flooding devastates communities with great social and economic costs, and climate 

change has only exacerbated the unpredictability and impact of these floods [27]. Thus, 

leveraging machine learning advancements to anticipate and ameliorate flood disasters is a

worthwhile field of research that serves communities around the world. ML has been used in

historical flood modeling, but recent research in ML-based methods offers advantages over

traditional methods [8]. Traditional physics-based models face limitations such as the need for 

extensive hydro-geomorphological monitoring datasets, which may be difficult to collect and 

update [11]. The complexity of these traditional models also means that real-time flood 

prediction is infeasible, especially with limited computational resources [5]. ML-based models 

can process data with higher variation and peaks from a variety of sources [12] and enhance

existing flood prediction by integrating real-time data, allowing for greater live forecast 

capacities and early warning systems [28]. Thus, the advantages of using AI in flood prediction 

over traditional physics-based models include improved data processing capacities, reduced need 

for highly specialized technical expertise, and the potential for real-time response and warning 

applications. With greater, more accurate computational potential, ML-based flood prediction 

can aid in mitigating flood impacts through functions like identification of vulnerabilities, early 

warning systems, advising of resource allocation, and beyond. 

Nonetheless, data bias, model bias, and other ethical concerns such as data privacy 

endure in this application of ML; it is crucial for researchers to critically engage with what that 

means for the models they develop and the most vulnerable populations those models may

impact. There are barriers to using AI for informing decision making and risk management, 

including inflated expectations leading to overestimating the performance of AI and overreliance 

with a lack of critical human examination of implications before turning to action. Another risk 

is the black-box nature of ML, which makes it difficult for users to examine and understand the 

results generated [14]. Thus, additional inspection is imperative for identifying how we can use 

ML-based flood prediction techniques ethically. 



Question and Methods 

Motivation 

Without any warning, floods can cause devastation and displacement of vulnerable 

communities. The implementation of timely policies informed by predictive models could allow 

for preemptive evacuation or reinforcement against the disaster. On the other hand, should 

predictive models fail, it would decrease trust in future predictions or limit the community’s 

ability to react in advance. By addressing bias in predictive models, researchers could produce 

more cohesive and effective forecasts, which may in turn help lessen the social and economic 

costs of floods and other natural disasters. 

The potential applications of ML in climate and disaster predictions could serve 

communities in informing preemptive action against natural disasters, but social and technical 

biases such as data gaps or poor generalizability still pose risks to utilizing this technology 

effectively. This review aims to summarize current research, identify gaps in literature, and 

form a cohesive view on the next step for policy action and research. While regulations can

rarely keep up with the rate at which modern technology and information develop, a clear 

internal ethical guideline for handling bias, informed by the current state of research, would 

allow researchers to proceed with an informed and conscientious baseline.  

Research Methods 

The goal of this systematic literature review (SLR) is to perform exploratory research on

ML-based flood prediction models, with a specific focus on how researchers address bias in

the models. To achieve this goal, I followed the major steps of an SLR as outlined in the

guidelines by Kitchenham and Charters (2004) and referenced the methodology of

other literature review studies published on machine learning topics [19] [20] [29]. The first step

is planning the review, which contains the relevant stages of 1) identification of the need for a 

review, 2) specifying the research question(s), and 3) developing a review protocol. The next 

step is conducting the review, which has the associated stages of 1) identification of research, 2) 

selection of primary studies, 3) study quality assessment, 4) data extraction and monitoring, and 

5) data synthesis. 

Research Questions 

I established four guiding research questions: 

RQ1: What bias concerns are raised regarding the use of machine learning in flood 

prediction models? 

RQ2: What are the most frequently implemented methods for debiasing flood prediction 

models? 

RQ3: Based on the findings from RQ1 and RQ2, how closely do proposed solutions 

address key ethical concerns in the research and development of ML-based flood prediction 

models? 

RQ4: What are the recommended directions of future research for addressing bias in

ML-based flood prediction modeling?

By addressing these main research questions, I aim to gain deeper insight into the current 

trajectory of machine learning and flood prediction modeling, as well as synthesize potential next 

steps for addressing bias during applications of ML-based flood modeling. 



Search Strategy 

Search Resources 

I searched for articles in the following three digital libraries: IEEE Xplore, Science 

Direct, and Springer Nature Link. I chose these publication channels as well-established 

platforms for peer reviewed journals and conference papers in fields including computer science 

and climate science. For each channel, I searched for the target search terms using their 

“Advanced Search” option and filtered according to my inclusion criteria, outlined in a later 

section. I used Zotero to save and manage the library of relevant literature. 

Search String 

For the scope of this literature review, I wanted to focus specifically on studies regarding 

the topic of disaster prediction, specifically flood prediction, and machine learning, with 

discussion on addressing or mitigating machine bias when using AI predictive models. Thus, 

after implementing some pilot searches, I refined the search term as: 

(“flood* prediction” OR “flood* model*”) AND “machine learning” AND “data bias” 

I use the wildcard character (*) with the phrase “flood*” to capture any variations of 

“flood”, “floods”, “flooding”; tying it to “prediction” or “model*” returns results that more 

specifically contain the target phrase in the context of predicting floods, the natural disaster, 

thereby reducing false positives of the search term. (This was quite crucial as simply “flood” can 

result in false positives such as “information flooding a system”.) Using (AND) with “machine 

learning” as well as “data bias” filters more specifically for studies that include machine learning 

techniques and discuss data bias. This allowed me to cast a search net with more precise filtering 

without overly restrictive target search terms. 
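As an illustration only, the logic of the search string can be approximated with regular expressions in Python. This sketch is not the matching logic of any of the three digital libraries, whose wildcard and phrase handling may differ; the function name and the sample sentences are hypothetical.

```python
import re

# Hypothetical emulation of the boolean search string on a block of text.
# "flood*" is read as "flood" plus any trailing letters; "model*" likewise.
FLOOD_PHRASE = re.compile(r"\bflood\w*\s+(?:prediction|model\w*)\b", re.IGNORECASE)
ML_PHRASE = re.compile(r"\bmachine\s+learning\b", re.IGNORECASE)
BIAS_PHRASE = re.compile(r"\bdata\s+bias\b", re.IGNORECASE)


def matches_search_string(text: str) -> bool:
    """Return True only if all three AND-ed clauses are found in the text."""
    return all(p.search(text) for p in (FLOOD_PHRASE, ML_PHRASE, BIAS_PHRASE))


relevant = "We address data bias in a machine learning flood prediction pipeline."
off_topic = "Machine learning detects data bias in requests flooding a system."

print(matches_search_string(relevant))   # True
print(matches_search_string(off_topic))  # False
```

The second sentence contains "flooding" but not as part of a "flood* prediction" or "flood* model*" phrase, so it is rejected, which mirrors the false-positive reduction described above.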



Inclusion and Exclusion Criteria 

I defined the criteria to include or exclude a paper in this review. I used these criteria to 

carefully screen each result by title and abstract to remove ones that were not relevant to the 

research question. 

Inclusion criteria: 

• The study is written in English. 

• The study is published in a peer-reviewed publication channel: IEEE Xplore, 

Springer Nature Link, or Science Direct. 

• The study was published between 2020 and 2025.

• The study is related to flood prediction and bias in machine learning predictive 

models. 

Exclusion criteria: 

• The study is not relevant to any research questions. 

• The full text of the study is not available in the search database or not accessible

using institutional accounts.

• The study is related to disaster prediction but not machine learning. 

• The study is related to machine learning but not to disaster prediction. 

• The study is in a language other than English.

• The study is not identified as peer reviewed. 

• The study is duplicated. 
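The criteria above can be read as a single screening predicate. The following Python sketch shows one way to express that, assuming each search result has already been annotated by hand; the `Study` fields and the `screen` function are hypothetical names, and the two relevance flags stand in for the reviewer's own judgment of title and abstract rather than any automated check.

```python
from dataclasses import dataclass

ALLOWED_CHANNELS = {"IEEE Xplore", "Springer Nature Link", "Science Direct"}


@dataclass
class Study:
    # Field names are hypothetical, chosen only for this sketch.
    title: str
    language: str
    channel: str
    year: int
    peer_reviewed: bool
    full_text_available: bool
    about_flood_prediction: bool  # reviewer's judgment, not computed
    about_machine_learning: bool  # reviewer's judgment, not computed


def screen(studies):
    """Keep studies that pass every criterion, dropping duplicate titles."""
    seen, kept = set(), []
    for s in studies:
        key = s.title.strip().lower()
        passes = (
            s.language == "English"
            and s.channel in ALLOWED_CHANNELS
            and 2020 <= s.year <= 2025
            and s.peer_reviewed
            and s.full_text_available
            and s.about_flood_prediction
            and s.about_machine_learning
        )
        if passes and key not in seen:
            seen.add(key)
            kept.append(s)
    return kept


candidates = [
    Study("A", "English", "IEEE Xplore", 2022, True, True, True, True),
    Study("a ", "English", "IEEE Xplore", 2022, True, True, True, True),   # duplicate
    Study("B", "English", "IEEE Xplore", 2018, True, True, True, True),    # too old
    Study("C", "English", "Science Direct", 2023, True, True, True, False),  # no ML
]
print([s.title for s in screen(candidates)])  # ['A']
```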

After screening, I selected 26 studies for inclusion in the primary analysis of the SLR, as

documented in Table 1. 

Table 1: Publication channel and number of selected papers. 

PUBLICATION CHANNEL     SELECTED PAPERS
IEEE Xplore             15
Science Direct          9
Springer Nature Link    2
Total                   26



Data Collection 

I documented extracted data from the final selected papers for review in an Excel 

spreadsheet. Table 2 details the types of data recorded. 

Table 2: Types of data extracted from studies 

TYPE                        DATA
Standard Information        Title, authors, DOI, publication year, publication or conference
                            name, paper type, reason for exclusion (if applicable)
Research Questions          RQ1: bias concerns, RQ2: debiasing methods, RQ3: efficacy of
                            methods, RQ4: future research
Machine Learning Related    ML model(s), evaluation metrics, recommended model/application,
                            data period, data source
Flood Modeling Related      Region, risk/prediction factor
Study Quality Assessment    Values for: data sourcing, data validation, model selection,
                            model validation, bias assessment, generalizability
 

 After all the studies included in the primary analysis were reviewed, the final findings 

were summarized and documented in a separate table to identify the major themes among each 

research question. 

 

I also performed quality assessment of the reviewed studies, developing a set of quality 

assessment criteria [25]. 

Table 3: Quality assessment values 

QUALITY CATEGORIES, each rated HIGH (3), MEDIUM (2), or LOW (1)

Data Sourcing
  HIGH (3):   Data sources clearly documented and critically assessed, appropriate temporal and spatial resolution
  MEDIUM (2): Data sources partially documented and assessed or lack of appropriate temporal and spatial resolution
  LOW (1):    Data source description entirely lacking

Data Validation
  HIGH (3):   Data gaps/outliers are addressed and handled appropriately, data sets are cleaned
  MEDIUM (2): Data validation has been somewhat performed
  LOW (1):    No data validation

Model Selection
  HIGH (3):   ML model is clearly described and justified
  MEDIUM (2): ML model is somewhat described and justified
  LOW (1):    No discussion of ML model or rationale for selection

Model Validation
  HIGH (3):   Model validation is thoroughly performed and justified
  MEDIUM (2): Model validation somewhat performed
  LOW (1):    No model validation

Bias Assessment
  HIGH (3):   Biases and potential causes are defined and discussed
  MEDIUM (2): Some definition and discussion of biases and potential causes
  LOW (1):    No discussion of biases and potential causes

Generalizability
  HIGH (3):   There is a proper use-case to evaluate results, results are general enough to be expanded to other situations
  MEDIUM (2): Minimal use-case for testing, results are somewhat generalizable
  LOW (1):    Highly specific results constrained to the study, lacking a use-case to evaluate the results

 
 Based on the values it received in each category, I assigned each study a final quality

rating, which is the average of its values across the applicable categories.
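The rating calculation can be sketched in a few lines of Python. The dictionary keys, the use of None to mark a category that does not apply, and the example scores are assumptions made for illustration; they are not values taken from any reviewed study.

```python
from statistics import mean

SCALE = {"high": 3, "medium": 2, "low": 1}
CATEGORIES = (
    "data_sourcing",
    "data_validation",
    "model_selection",
    "model_validation",
    "bias_assessment",
    "generalizability",
)


def quality_rating(scores):
    """Average the numeric values of the applicable categories.

    A category mapped to None (or missing) is treated as not applicable.
    """
    values = [SCALE[scores[c]] for c in CATEGORIES if scores.get(c) is not None]
    if not values:
        raise ValueError("no applicable categories")
    return mean(values)


# Hypothetical study: generalizability is marked as not applicable.
example = {
    "data_sourcing": "high",
    "data_validation": "medium",
    "model_selection": "high",
    "model_validation": "medium",
    "bias_assessment": "low",
    "generalizability": None,
}
print(quality_rating(example))  # (3 + 2 + 3 + 2 + 1) / 5 = 2.2
```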

 

Results 

RQ1: Bias Concerns 

What bias concerns are raised regarding the use of machine learning in flood prediction 

models? 

Table 4: Data summary of bias concerns 

BIAS CONCERNS                                             COUNT
Insufficient data, limited monitoring                     8
Model bias                                                5
Data imbalance                                            4
Data privacy                                              2
Computational constraints                                 2
Algorithmic transparency, accountability                  1
Unequal access to climate prediction tools                1
Interpretability barriers                                 1
Overburdening of certain socio-demographic populations    1

 
The major bias concerns raised in the reviewed studies include data imbalance, data 

privacy, bias from insufficient data or limited monitoring, and model bias. ML-based flood 

prediction models are prone to underestimating the upper ranges of flood occurrences due to the 

data imbalance, in which extreme flood occurrences have fewer recorded instances. Additionally,

computational constraints are an area of concern limiting the effectiveness of ML-based flood 

prediction models. Issues of algorithmic transparency and accountability, unequal access to

climate prediction tools, interpretability barriers, and overburdening of certain

socio-demographic populations are also mentioned, although much less frequently. These

concerns relate to failures in the representativeness of data and their impact on vulnerable

populations. For example, crowdsourced data has data biases related to systemic exclusions of 

certain populations from uneven coverage and limited access to internet services; these “digitally 

invisible populations” may be misgauged or overlooked [9] and high-risk regions with fewer 

data points may be classified as lower risk [34]. 

Most studies discuss bias from a technical perspective, focusing on algorithmic methods that

can mitigate bias in the data or the model. Definitions of bias that focus on societal impact

are lacking; some papers mention bias only in passing as a model metric (percent

bias value), listed purely as a technical term. Reviews that I reference beyond my primary SLR

analysis tend to provide a stronger ethics-based perspective. These ethics-focused

papers discuss concepts like overburdening vulnerable populations [24] in more depth than

the technical explorations of ML for climate modeling. This lack of discussion in the literature

suggests that there is an insufficient framework for defining and critically addressing bias when

it comes to ML-based flood modeling.
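To make the purely technical sense of "percent bias" concrete, here is a minimal Python sketch of the metric as it is commonly defined in hydrology. The sign convention (simulated minus observed) is an assumption, since some authors flip it, and the sample values are invented for illustration.

```python
def percent_bias(observed, simulated):
    """Percent bias: 100 * sum(sim - obs) / sum(obs).

    With this sign convention (an assumption; some authors use obs - sim),
    negative values mean the model underestimates on average.
    """
    if len(observed) != len(simulated):
        raise ValueError("series must have equal length")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("observed values sum to zero")
    return 100.0 * sum(s - o for o, s in zip(observed, simulated)) / total_obs


# Invented discharge values: the model is 10% low at every time step.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [9.0, 18.0, 27.0, 36.0]
print(percent_bias(obs, sim))  # about -10
```

Because the metric sums errors over the whole series, a value near zero can hide offsetting over- and underestimates, and it says nothing about which communities those errors fall on.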

  

RQ2: Debiasing Methods 

What are the most frequently implemented methods for debiasing flood prediction models? 

Table 5: Data summary of debiasing methods 

DEBIASING METHODS                  COUNT
Statistical bias correction        8
Model development                  5
Data augmentation                  4
Crowdsourced/open-source data      3
Federated learning                 1
Post-hoc interpretation methods    1

 
The main debiasing methods utilized by studies in this review fall under the categories of 

data augmentation, statistical bias correction (e.g., BCSD, downscaling, Multivariate Hawkes 

process), and model development. Use of crowdsourced and open-source data, federated

learning, and post-hoc interpretation methods are also mentioned as potential methods to

address bias. Specifically, data augmentation methods such as SMOTE, undersampling, and 

oversampling are frequently utilized to address bias stemming from data imbalances, where 

specific classes of data ranges are underrepresented. Statistical bias correction is used to account

for systematic bias in the model predictions.
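One common building block of statistical bias correction is empirical quantile mapping, which adjusts a model value so that it sits at the same quantile of the observed distribution as it did in the model's own historical distribution. The Python sketch below illustrates that idea only; it is not the procedure of any reviewed study, and the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def quantile_position(sorted_values, x):
    """Quantile (0..1) of x within sorted_values, interpolating linearly."""
    if x <= sorted_values[0]:
        return 0.0
    if x >= sorted_values[-1]:
        return 1.0
    i = bisect_right(sorted_values, x) - 1  # sorted_values[i] <= x < sorted_values[i + 1]
    span = sorted_values[i + 1] - sorted_values[i]
    return (i + (x - sorted_values[i]) / span) / (len(sorted_values) - 1)


def value_at(sorted_values, q):
    """Value at quantile q (0..1), interpolating between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def quantile_map(x, model_hist, obs_hist):
    """Map a model value onto the observed distribution at the same quantile."""
    if len(model_hist) < 2 or len(obs_hist) < 2:
        raise ValueError("need at least two historical values in each series")
    q = quantile_position(sorted(model_hist), x)
    return value_at(sorted(obs_hist), q)


# Invented history in which the model reports half of what was observed.
model_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_hist = [2.0, 4.0, 6.0, 8.0, 10.0]
print(quantile_map(3.0, model_hist, obs_hist))  # 6.0
print(quantile_map(7.0, model_hist, obs_hist))  # 10.0, clamped to the observed maximum
```

The clamping in the last line shows a limitation that matters for floods: values beyond the historical record cannot be corrected past the largest observed event without an additional extrapolation rule.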

Oversampling techniques like SMOTE (Synthetic Minority Over-sampling Technique) select an

instance in the minority class, identify its k nearest neighbors, and generate synthetic samples

that lie between that instance and its neighbors. This rebalances the data and reduces bias

against the minority class. Undersampling, on the other hand, randomly removes instances from

the majority class, achieving a similar effect, although oversampling tends to yield more accurate

models than undersampling [2].
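The two resampling strategies can be sketched in plain Python as follows. This is a simplified illustration of the SMOTE idea rather than a reference implementation; the function names, the default k, and the sample points are assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the chosen instance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority samples and discard the rest."""
    return random.Random(seed).sample(majority, n_keep)


# Invented feature vectors: 4 extreme-flood samples versus 20 ordinary ones.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.3)]
majority = [(float(i), float(i % 5)) for i in range(10, 30)]

synthetic = smote(minority, n_synthetic=6)
kept = random_undersample(majority, n_keep=10)
print(len(minority) + len(synthetic), len(kept))  # 10 10
```

Every synthetic point lies on a line segment between two real minority samples, so oversampling adds no information about extremes that were never recorded.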

Other methods under the category of “model development” include integrating different ML

models to improve performance on key metrics like accuracy, precision, ROC, and percent

bias. Ensemble machine learning techniques combine

multiple models to reduce the bias of individual models, minimize overfitting, and enhance 

overall model robustness. Random Forest (RF) is a technique where the algorithm creates 

multiple decision trees based on random parts of the data, picks random features so that each tree 

is different, and combines the predictions that each tree produces. It is one of the most frequently 

used ensemble techniques, outperforming single models, and helps with identifying key features 

and avoiding overfitting [1] [2] [3] [7] [14] [23] [32] [40] [46]. Presence-only models are another 

alternative for dealing with data gaps since they focus only on locations where flood data has been

observed, thus circumventing the issue of conflating lack of historical data with true non-flood 

zones [3] [10]. Federated learning is one method proposed to mitigate data privacy concerns [6] 

by training a model on multiple clients, keeping their data decentralized. Furthermore, using 

crowdsourced data to address areas with historical data gaps or limited monitoring may allow for 

more data coverage of otherwise data-sparse areas, though the risk of overlooking digitally 

invisible populations remains [9]. 
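The federated learning idea mentioned above, training on multiple clients while their data stays decentralized, can be illustrated with a minimal federated averaging sketch. The one-parameter model y = weight * x, the learning rate, and the client datasets are all assumptions chosen to keep the example small; none of them come from the cited study.

```python
def local_update(weight, data, lr=0.05, epochs=20):
    """Gradient descent on one client's private data for the model y = weight * x."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(clients, rounds=10, init=0.0):
    """Clients train locally; the server only sees and averages their weights."""
    weight = init
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [(local_update(weight, d), len(d)) for d in clients]
        # Weighted mean by sample count; raw (x, y) pairs never leave a client.
        weight = sum(w * n for w, n in updates) / total
    return weight


# Invented client datasets that all follow y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
print(round(federated_average(clients), 3))  # close to 2.0
```

Only model weights cross the client boundary, which is the property that addresses the data privacy concern; the sketch does not address representational gaps among the clients themselves.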

  

RQ3: Efficacy of Methods 

Based on the findings from RQ1 and RQ2, how closely do proposed solutions address key ethical 

concerns in the research and development of ML-based flood prediction models? 

As summarized in the previous section, many of the studies focused on technical 

definitions of bias, and the methods they employed to mitigate these biases were also technical 

and algorithm-based. The high frequency of data bias concerns aligns well with the corresponding

data augmentation methods. Ensemble methods and statistical bias correction help reduce

systematic bias in individual models. Concerns of data privacy are also addressed with

methods like federated learning. However, focusing on mitigating data bias may not actually be

the most valuable direction for reducing the burden on vulnerable communities [24]. The greatest

challenge remains for researchers to audit for and mitigate representational bias. This may 

require more field studies of affected populations, as well as cross-border collaboration and 

management of data collection.  

Furthermore, studies propose use of explainable AI (XAI) to address concerns of hidden 

bias due to the “black-box” nature of ML [14], helping to establish greater trust in AI systems.

However, XAI frameworks are not widely implemented or developed in the studies that focus on 

technical methods for debiasing. Thus, even though proposed and tested methods for addressing 

bias are successful, they are still not always applied, nor is there a cohesive method for 

integrating all methods into ML-based flood prediction models for widespread, organized 

applications. Studies still frequently mention limited knowledge about representational bias as a 

limitation and direction for future research.  



RQ4: Future Research 

What are the recommended directions of future research for addressing bias in ML-based flood 

prediction modeling? 

Table 6: Data summary of future research 

FUTURE RESEARCH                              COUNT
Generalizability                             9
Expanding datasets                           9
Compare against alternatives                 8
Integration (of other techniques, models)    5
Validation/Follow-up monitoring              4
Scaling model application/capacity           3
Improve interpretability                     2
Enhancement/Further development              2
Qualitative research of communities          1

 
Most further improvements fall under the broad categories of integrating models or 

methods, expanding datasets, and improving generalizability. Many suggestions for future 

research focus on improving the performance and accuracy of the prediction model, such as 

expanding the generalizability of models by testing them with different data or in different 

climate conditions, collecting more data or implementing greater monitoring to expand datasets 

for model training, and integration of machine learning with other models to improve 

performance.  

Another important direction for future research is to improve interpretability and 

transparency of ML-based models. This includes continued development of explainable AI (XAI) in

future research, such as exploring how XAI tools can identify inherent data bias that causes

model inaccuracies, comparing models with XAI tools to check for consistency, and addressing 

sources of discrepancies in XAI results compared to historical models [14]. This will allow 

future model developments to identify if different predictions from ML-based models arise from 

model bias, data bias, or discovery of new patterns.  

Furthermore, studies call for additional research to audit for representational bias, such as 

socio-economic bias. This includes auditing existing and novel data sources, such as geospatial 

data [13] or crowdsourced data channels [11] to further improve training datasets. How are 

existing models transferable to underrepresented areas? From the topics that studies suggest for 

future research, there exists a consciousness of broader issues like interpretability, scalability, 

and impact on local communities. However, significant continual research efforts are still

required to better identify and address them. 

 

Discussion 

From the results of the SLR, a summary of key findings is that: 

• Most studies focus on bias concerns of data imbalance, data privacy, insufficient data 

or limited monitoring, and model bias. 

• Frequently implemented techniques for debiasing are data augmentation, statistical 

bias correction, and model development. 

• Studies focus on technical sources of bias and debiasing, with less discussion of 

ethical considerations and bias definitions. While the efficacy of technical debiasing

methods is high for technical bias, it is lower for representational biases. 

• Future research directions include improving generalizability, expanding datasets for 

training, further improving model performance, and increasing model interpretability. 

• There is strong potential for effective applications of ML-based technologies for 

flood prediction and recommendation for further research, but direct policy 

suggestions are minimal. 

With flood modeling research, it is key that the studies can translate into something that 

can help policymakers and other relevant decision makers take meaningful action towards

mitigating negative flood impacts. However, there exists a gap among the reviewed studies in 

recommending some sort of governance policy or action plan based on their findings. Many of 

the studies conclude with a recommendation for flood policy experts to leverage their findings or 

technology to make more informed decisions for flood response but lack direct actionable 

suggestions. This may be because of the knowledge gap between subject areas of expertise.

What is the recommended governance, policy action, or real-world response given the 

results of ML-based flood prediction modeling? Studies recommend using their

predictions to guide development and infrastructure away from high-risk zones [31]. However,

this may still be insufficiently helpful to decision makers because they still lack information on 

how to define these zones. One way to address this gap could be to include guidelines for 

response to model predictions. For example, what should be the response policy for a “high” 

flood risk but “low” flood impact? What about “low” flood risk but “high” flood impact? Or 

something in the middle? What predictions fall under each category of flood risk? Management 

of resource allocation, another frequently mentioned way to leverage flood model predictions for 

serving communities [9] [34], could also use similar collaboration with experts knowledgeable 

about natural and human resource management. 

For example, the risk of data imbalances where models underestimate upper flood values 

is that policymakers underestimate the impact of predicted floods and react insufficiently. If 

there is lower perceived risk than real risk, there would be inadequate response. Therefore, it 

would be key for model reports to highlight the chance for models to underestimate upper ranges 

and provide a metric that details the possible “high end” of flood risk. 

Addressing these data imbalances with data augmentation faces limitations as well. Data 

augmentation methods call for strata or classes of data ranges to be defined, which requires 

sufficient prior information to identify the ranges as well as proficient understanding to properly 

identify the extreme ranges. The same limitation applies when models are validated with methods

such as stratified k-fold cross validation, which splits the dataset into k subsets, trains the model

on k-1 subsets, and tests it against the remaining subset. This process is repeated so that each

subset serves once as the test set, and the results are averaged over the k iterations to estimate

the generalization ability of the model; stratification allows the original distribution of each

class to be retained in each of these k subsets. This is an effective method for evaluating a model,

particularly with imbalanced data. However, the application of this type of validation hinges on

knowing the ranges. With shifting climate conditions and precipitation patterns that deviate from

historical norms, it may become more difficult to accurately stratify data ranges.
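A minimal Python sketch of stratified k-fold splitting is given below. It is an illustration under simplifying assumptions: the function name is hypothetical, the class labels "normal" and "extreme" are invented, and no shuffling is performed, whereas library implementations typically offer it.

```python
from collections import Counter, defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full dataset."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Invented labels: 16 ordinary records and 4 extreme-flood records.
labels = ["normal"] * 16 + ["extreme"] * 4
for train, test in stratified_k_fold(labels, k=4):
    print(len(train), Counter(labels[i] for i in test))
# each test fold holds 4 "normal" and 1 "extreme" record
```

The split depends entirely on the labels passed in, which is the dependency discussed above: if the boundary between "normal" and "extreme" is drawn from outdated historical ranges, the folds preserve the wrong distribution.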

Another key point for consideration for studies regarding AI fairness, bias, and 

accountability is for researchers to critically define what kind of fairness they aim for through 

debiasing. With certain algorithms, sufficient data diversity can mitigate undesirable side effects 

of varied data [43]. Thus, increasing data sources and collection to gain a diverse dataset may help

promote more “fair” ML outputs. At the same time, the definition of “fairness” holds certain 

assumptions about a system. The two main conflicting definitions of fairness are individual 

fairness and group fairness. Individual fairness assumes that individuals who are similar should 

be treated similarly; individuals should receive consistent and equal treatment based on their 

features, regardless of what demographic group they belong to. Group fairness, or non-

discrimination, assumes that demographic groups should be treated similarly as a whole; 

protected groups should receive equal outcomes to address historical and structural biases [42]. 

Both definitions are valid and applicable in different cases, but the adoption of a definition 

should be a conscious choice, which defines the assumption that goes into the core of a model’s 

worldview. When debiasing a model, researchers should also state which standard of fairness

is being used. Is the goal individual fairness or group fairness? This tension between definitions may help

explain why papers “fail to engage critically with what constitutes ‘bias’ in the first place” [29]. 

The ML research community has worked to identify strategies to mitigate unwanted bias and 

promote fairness in the outputs of ML flood models. However, researchers are still working to 

fill a gap with regard to defining and articulating the assumptions that ML systems inherently

encode from their creators, as well as the limitations of those systems. Nonetheless, ML methods can complement and

enhance existing flood prediction models. 
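To make the distinction between the two fairness definitions discussed above more concrete, the following Python sketch computes one simple check for each. The checks are assumptions chosen for illustration and are not drawn from the reviewed studies or from [42]: the group check compares the rate at which each group is flagged (often called demographic parity), and the individual check counts pairs of cases with similar features but dissimilar scores. The data, tolerances, and function names are all hypothetical.

```python
def positive_rate_by_group(flags, groups):
    """Share of cases flagged positive within each group."""
    totals, positives = {}, {}
    for flag, group in zip(flags, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + flag
    return {g: positives[g] / totals[g] for g in totals}


def group_parity_gap(flags, groups):
    """Group-fairness check: largest difference in flag rates between groups."""
    rates = positive_rate_by_group(flags, groups)
    return max(rates.values()) - min(rates.values())


def individual_violations(features, scores, feature_tol, score_tol):
    """Individual-fairness check: count pairs that are similar in every
    feature (within feature_tol) but receive scores more than score_tol apart."""
    violations = 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            distance = max(abs(a - b) for a, b in zip(features[i], features[j]))
            if distance <= feature_tol and abs(scores[i] - scores[j]) > score_tol:
                violations += 1
    return violations


# Hypothetical neighborhoods: (elevation in m, distance to river in km).
features = [(1.0, 0.5), (1.1, 0.5), (5.0, 3.0), (5.2, 3.1)]
groups = ["A", "B", "A", "B"]        # hypothetical demographic group labels
scores = [0.9, 0.4, 0.2, 0.2]        # hypothetical model risk scores
flags = [1 if s >= 0.5 else 0 for s in scores]  # flagged as high risk

gap = group_parity_gap(flags, groups)
violations = individual_violations(features, scores, feature_tol=0.25, score_tol=0.1)
print(gap, violations)
```

The two functions answer different questions about the same outputs, so a model can satisfy one check while failing the other. Which check to report, and which tolerances to use, is exactly the kind of assumption that should be stated explicitly.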


Limitations and Future Research 

Due to financial and time constraints, I only assessed studies that were open access or 

accessible to students with institutional accounts from three publication channels. Thus, this 

review may omit relevant studies with restricted access. Moreover, while many of the papers included

in this review focus on technical bias, this may be due to the limited scope of my search string. 

However, my findings align with the results of other systematic literature reviews in similar 

fields and corroborate that there are still studies that lack a critical discussion of bias. 

Additionally, this review focuses on recent studies to track the most recent developments in the 

application of ML-based flood modeling. However, older applications of ML exist. For 

more in-depth future reviews, it may be valuable to include older publications to trace

the evolution of the field. Another future research topic may be to investigate how ethical uses of 

ML are regulated in other well-established fields to gain a better interdisciplinary understanding 

of ML use and draw from shared terminology and regulatory policies. 

Another crucial consideration in the application of ML-based

technologies is whether the computational demands of AI would put additional strain on certain

communities while serving others. This ethical decision for AI use impacts vulnerable

communities. One area I would like to investigate beyond the scope of this thesis is the 

possibility of using AI models in a way that accounts for the demands that powering AI can put

on power grids and water infrastructures of communities. Does this put unfair demands on 

certain communities just to provide benefits to others? Based purely on resources spent and 

resources saved, for example energy, would it be possible to offset the net cost? Even if there is a 

net gain, how would that gain be allocated and distributed, and who would benefit? Systems analysis

research could provide insight into these questions.  


Conclusions 

Bias is inherent in AI systems. Following best practices to debias data and models can 

allow for better predictions, but it may be impossible to eliminate bias from any human-based 

system. Thus, it remains imperative that researchers and policymakers work together to define 

protocols that acknowledge the existence of biases and make the most of the valuable insight that 

AI can provide while maintaining human judgment. Gaps between technical ML research and 

ethical governance policy are still apparent in the way that bias is discussed and addressed in the 

studies reviewed. These gaps signal areas for decision-making and suggest a need for greater

interdisciplinary committees to collaborate and determine ethical policies. Effective risk 

management and decision-making techniques when faced with uncertainty are at the core of 

developing a strategy that maximizes the utility of ML-based flood predictions and minimizes 

the shortcomings of inherent biases. After all, an algorithm can generate a near-perfect model, 

but it takes the understanding of human interactions to bring real change to a community. 

 
Bibliography 

[1] S. A. Rufus, N. A. Ahmad, N. Abdullah, and Z. Abdul-Malek, “A Comparative Analysis Using Machine 
Learning Approach for Thunderstorm Prediction in Southern Region of Peninsular Malaysia,” in 2023 International 
Symposium on Lightning Protection (XVII SIPDA), Oct. 2023, pp. 1–6. doi: 10.1109/SIPDA59763.2023.10349193. 

[2] Md. A. Rahman, A. Akter, F. S. Richi, A. Shoud, and T. Ahmed, “A Comparative Study of Undersampling and 
Oversampling Methods for Flood Forecasting in Bangladesh using Machine Learning,” in 2023 14th International 
Conference on Computing Communication and Networking Technologies (ICCCNT), Jul. 2023, pp. 1–7. doi: 
10.1109/ICCCNT56998.2023.10306368. 

[3] A. Haghizadeh, R. Fathiganji, E. Sohrabi, A. Lotfi, and L. Ghasemi, “A framework for flood risk zoning and 
prioritization combining maximum entropy and game theory,” Sci Rep, vol. 15, no. 1, p. 24153, Jul. 2025, doi: 
10.1038/s41598-025-08220-x. 

[4] L. Addison, A. Hosang, T.-A. Tuitt, K. Manohar, and P. Hosein, “A LLM-Based Platform for Flood Risk 
Education and Weather Alerts in SIDS,” in 2024 IEEE International Conference on Technology Management, 
Operations and Decisions (ICTMOD), Nov. 2024, pp. 1–6. doi: 10.1109/ICTMOD63116.2024.10878134. 

[5] W. Al-Sabhan, M. Mulligan, and G. A. Blackburn, “A real-time hydrological model for flood prediction using 
GIS and the WWW,” Computers, Environment and Urban Systems, vol. 27, no. 1, pp. 9–32, Jan. 2003, doi: 
10.1016/S0198-9715(01)00010-2. 

[6] S. H. Mahir et al., “Advanced Hydro-Informatic Modeling Through Feedforward Neural Network, Federated 
Learning, and Explainable AI for Enhancing Flood Prediction,” IEEE Open Journal of the Computer Society, vol. 6, 
pp. 726–738, 2025, doi: 10.1109/OJCS.2025.3556424. 

[7] S. Karim, T. S. Nova, and S. Tasneem, “Advancing Rainfall Seasonality Detection via Bias Correction and 
Spatial Disaggregation (BCSD) with CHIRPS-v2 for Climate Modeling in Bangladesh,” in 2025 International 
Conference on Electrical, Computer and Communication Engineering (ECCE), Feb. 2025, pp. 1–6. doi: 
10.1109/ECCE64574.2025.11013178. 

[8] P. Wang, X. Wu, and Yichen, “Ai-driven approaches to flood risk management: overcoming data bias and 
enhancing decision-making,” Climate Risk Management, vol. 50, p. 100752, Jan. 2025, doi: 
10.1016/j.crm.2025.100752. 

[9] N. Coleman, A. Clarke, M. Esparza, and A. Mostafavi, “Analyzing common social and physical features of 
flash-flood vulnerability in urban areas,” International Journal of Disaster Risk Reduction, vol. 122, p. 105437, May 
2025, doi: 10.1016/j.ijdrr.2025.105437. 

[10] M. El Baida, F. Boushaba, M. Chourak, and M. Hosni, “Are Presence-Only Machine Learning Models Reliable 
Enough for Hazard Mapping? Insights from Flood Hazard in the Maghreb,” in 2025 International Conference for 
Artificial Intelligence, Applications, Innovation and Ethics (AI2E), Feb. 2025, pp. 1–4. doi: 
10.1109/AI2E64943.2025.10982848. 

[11] Z. Liu, N. Coleman, F. I. Patrascu, K. Yin, X. Li, and A. Mostafavi, “Artificial intelligence for flood risk 
management: A comprehensive state-of-the-art review and future directions,” International Journal of Disaster Risk 
Reduction, vol. 117, p. 105110, Feb. 2025, doi: 10.1016/j.ijdrr.2024.105110. 

[12] F.-J. Chang, L.-C. Chang, and J.-F. Chen, “Artificial Intelligence Techniques in Hydrology and Water 
Resources Management,” Water, vol. 15, no. 10, p. 1846, Jan. 2023, doi: 10.3390/w15101846. 

[13] C. M. Gevaert, T. Buunk, and M. J. C. van den Homberg, “Auditing Geospatial Datasets for Biases: Using 
Global Building Datasets for Disaster Risk Management,” IEEE Journal of Selected Topics in Applied Earth 
Observations and Remote Sensing, vol. 17, pp. 12579–12590, 2024, doi: 10.1109/JSTARS.2024.3422503. 

[14] F. Huang, W. Shangguan, Q. Li, L. Li, and Y. Zhang, “Beyond prediction: An integrated post-hoc approach to 
interpret complex model in hydrometeorology,” Environmental Modelling & Software, vol. 167, p. 105762, Sep. 
2023, doi: 10.1016/j.envsoft.2023.105762. 

[15] R. C. Hales, G. P. Williams, E. James Nelson, R. B. Sowby, D. P. Ames, and J. L. S. Lozano, “Bias correcting 
discharge simulations from the GEOGloWS global hydrologic model,” Journal of Hydrology, vol. 626, p. 130279, 
Nov. 2023, doi: 10.1016/j.jhydrol.2023.130279. 

[16] A. Ramesh, Z. Gulmira, A. H. Shnain, R. Ramya, P. Nagaveni, and A. S. K, “Big Data and Machine Learning 
for Climate Change Prediction: An Integrated Approach to Environmental Monitoring,” in 2025 International 
Conference on Automation and Computation (AUTOCOM), Mar. 2025, pp. 1384–1389. doi: 
10.1109/AUTOCOM64127.2025.10956420. 

[17] F. AlZaatiti, J. Halwani, and M. R. Soliman, “Climate change impacts on flood risks in the Abou Ali River 
Basin, Lebanon: A hydrological modeling approach,” Results in Engineering, vol. 25, p. 104186, Mar. 2025, doi: 
10.1016/j.rineng.2025.104186. 

[18] E. Wallace, T. Z. Zhao, S. Feng, and S. Singh, “Concealed Data Poisoning Attacks on NLP Models,” Apr. 12, 
2021, arXiv: arXiv:2010.12563. doi: 10.48550/arXiv.2010.12563. 

[19] M. A. Zaidi, “Conceptual Modeling Interacts with Machine Learning – A Systematic Literature Review,” in 
Computational Science and Its Applications – ICCSA 2021, O. Gervasi, B. Murgante, S. Misra, C. Garau, I. Blečić, 
D. Taniar, B. O. Apduhan, A. M. A. C. Rocha, E. Tarantino, and C. M. Torre, Eds., Cham: Springer International 
Publishing, 2021, pp. 522–532. doi: 10.1007/978-3-030-87013-3_39. 

[20] A. Khakpour and R. Colomo-Palacios, “Convergence of Gamification and Machine Learning: A Systematic 
Literature Review,” Tech Know Learn, vol. 26, no. 3, pp. 597–636, Sep. 2021, doi: 10.1007/s10758-020-09456-4. 

[21] B.-C. Jhong, F.-W. Chen, and C.-P. Tung, “Development of a real-time dynamic inundation risk assessment 
approach on paddy fields during typhoons: Exploration of adaptation strategies and quantification of risks,” Journal 
of Environmental Management, vol. 380, p. 124981, Apr. 2025, doi: 10.1016/j.jenvman.2025.124981. 

[22] T. Atmaja and K. Fukushi, “EMPOWERING GEO-BASED AI ALGORITHM TO AID COASTAL FLOOD 
RISK ANALYSIS: A REVIEW AND FRAMEWORK DEVELOPMENT,” ISPRS Annals of the Photogrammetry, 
Remote Sensing and Spatial Information Sciences, vol. V-3–2022, pp. 517–523, May 2022, doi: 10.5194/isprs-
annals-V-3-2022-517-2022. 

[23] F. Taromideh, R. Fazloula, B. Choubin, M. Masoodi, and A. Mosavi, “Ensemble Machine Learning for Urban 
Flood Hazard Assessment,” in 2024 IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics 
(SAMI), Jan. 2024, pp. 000525–000530. doi: 10.1109/SAMI60510.2024.10432902. 

[24] C. M. Gevaert, M. Carman, B. Rosman, Y. Georgiadou, and R. Soden, “Fairness and accountability of AI in 
disaster risk management: Opportunities and challenges,” Patterns, vol. 2, no. 11, p. 100363, Nov. 2021, doi: 
10.1016/j.patter.2021.100363. 

[25] K. S. Khan, R. Kunz, J. Kleijnen, and G. Antes, “Five steps to conducting a systematic review,” J R Soc Med, 
vol. 96, no. 3, pp. 118–121, Mar. 2003, doi: 10.1258/jrsm.96.3.118. 

[26] B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software 
engineering,” Technical report, EBSE Technical Report EBSE-2007-01, 2007. Accessed: Nov. 05, 2025. [Online]. 
Available: https://docs.edtechhub.org/lib/EDAG684W 

[27] C. Wasko et al., “Incorporating climate change in flood estimation guidance,” Philosophical Transactions of 
the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 379, no. 2195, p. 20190548, Mar. 2021, 
doi: 10.1098/rsta.2019.0548. 

[28] N. Fatima et al., “Integrating Machine Learning Models With Probability Distribution Methods for Extreme 
Flood Risk Assessment,” IEEE Access, vol. 13, pp. 160922–160938, 2025, doi: 10.1109/ACCESS.2025.3598121. 

[29] S. L. Blodgett, S. Barocas, H. Daumé III, and H. Wallach, “Language (Technology) is Power: A Critical Survey of 
‘Bias’ in NLP,” May 29, 2020, arXiv: arXiv:2005.14050. doi: 10.48550/arXiv.2005.14050. 

[30] J. Salas, A. Saha, and S. Ravela, “Learning inter-annual flood loss risk models from historical flood insurance 
claims,” Journal of Environmental Management, vol. 347, p. 118862, Dec. 2023, doi: 
10.1016/j.jenvman.2023.118862. 

[31] R. M. A. Ikram, M. Wang, H. Moayedi, and A. A. Dehrashid, “Management and prediction of river flood 
utilizing optimization approach of artificial intelligence evolutionary algorithms,” Sci Rep, vol. 15, no. 1, p. 22787, 
Jul. 2025, doi: 10.1038/s41598-025-04290-z. 

[32] J. S. Navarro, R. Zhuang, C. Albertini, and S. Manfreda, “Mapping flood susceptibility using Random Forest 
exploiting satellite observations and geomorphic features,” Science of The Total Environment, vol. 1002, p. 180592, 
Nov. 2025, doi: 10.1016/j.scitotenv.2025.180592. 

[33] B. H. Zhang, B. Lemoine, and M. Mitchell, “Mitigating Unwanted Biases with Adversarial Learning,” in 
Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans LA USA: ACM, Dec. 
2018, pp. 335–340. doi: 10.1145/3278721.3278779. 

[34] Z. Zhou and M. Sun, “Multivariate Hawkes Processes for Incomplete Biased Data,” in 2021 IEEE International 
Conference on Big Data (Big Data), Dec. 2021, pp. 968–977. doi: 10.1109/BigData52589.2021.9672043. 

[35] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the Dangers of Stochastic Parrots: Can 
Language Models Be Too Big? 🦜,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and 
Transparency, Virtual Event Canada: ACM, Mar. 2021, pp. 610–623. doi: 10.1145/3442188.3445922. 

[36] N. Carlini et al., “Poisoning Web-Scale Training Datasets is Practical,” in 2024 IEEE Symposium on Security 
and Privacy (SP), May 2024, pp. 407–425. doi: 10.1109/SP54263.2024.00179. 

[37] K. Khaing Kyaw et al., “Private sensors and crowdsourced rainfall data: Accuracy and potential for modelling 
pluvial flooding in urban areas of Oslo, Norway,” Journal of Hydrology X, vol. 25, p. 100191, Dec. 2024, doi: 
10.1016/j.hydroa.2024.100191. 

[38] N. Rathnayake, U. Rathnayake, I. Chathuranika, T. L. Dang, and Y. Hoshino, “Projected Water Levels and 
Identified Future Floods: A Comparative Analysis for Mahaweli River, Sri Lanka,” IEEE Access, vol. 11, pp. 8920–
8937, 2023, doi: 10.1109/ACCESS.2023.3238717. 

[39] J. J. Smith, S. Amershi, S. Barocas, H. Wallach, and J. Wortman Vaughan, “REAL ML: Recognizing, 
Exploring, and Articulating Limitations of Machine Learning Research,” in 2022 ACM Conference on Fairness, 
Accountability, and Transparency, Seoul Republic of Korea: ACM, Jun. 2022, pp. 587–597. doi: 
10.1145/3531146.3533122. 

[40] K. He, W. Zhao, L. Brocca, P. Quintana-Seguí, and X. Chen, “SMPD-MERG: A Hybrid Downscaling Model 
for High-Resolution Daily Precipitation Estimation via Merging Surface Soil Moisture and Multisource 
Precipitation Data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–16, 2025, doi: 
10.1109/TGRS.2025.3561253. 

[41] J. C. Dizon, I. Aryal, and I. Benitez, “Streamflow Prediction of Cañas River Watershed, Cavite, Philippines 
using Long Short-Term Memory,” in 2024 International Conference on IT Innovation and Knowledge Discovery 
(ITIKD), Apr. 2025, pp. 1–6. doi: 10.1109/ITIKD63574.2025.11005233. 

[42] S. A. Friedler, C. Scheidegger, and S. Venkatasubramanian, “The (Im)possibility of fairness: different value 
systems require different mechanisms for fair decision making,” Commun. ACM, vol. 64, no. 4, pp. 136–143, Mar. 
2021, doi: 10.1145/3433949. 

[43] M. Raghavan, A. Slivkins, J. Wortman Vaughan, and Z. S. Wu, “The externalities of exploration and how data 
diversity helps exploitation,” in Conference on Learning Theory, PMLR, 2018, pp. 1724–1738. Accessed: Apr. 17, 
2025. [Online]. Available: http://proceedings.mlr.press/v75/raghavan18a.html 

[44] T. Tiggeloven et al., “The Role of Artificial Intelligence for Early Warning Systems: Status, Applicability, 
Guardrails and Ways Forward,” iScience, p. 113689, Oct. 2025, doi: 10.1016/j.isci.2025.113689. 

[45] L. J. Vlaming, “The Tension Between Modern Technology and the Legal Foundations of Privacy,” Dec. 2017, 
Accessed: Apr. 17, 2025. [Online]. Available: https://hdl.handle.net/1794/24131 

[46] S. A. Rufus, N. A. Ahmad, Z. Abdul-Malek, and N. Abdullah, “Thunderstorm Prediction Model Using SMOTE 
Sampling and Machine Learning Approach,” in 2023 12th Asia-Pacific International Conference on Lightning 
(APL), Jun. 2023, pp. 1–5. doi: 10.1109/APL57308.2023.10182046. 

[47] G. Peery, “Vision Transformers Under Data Poisoning Attacks,” 2023, Accessed: Apr. 17, 2025. [Online]. 
Available: https://hdl.handle.net/1794/28707 

[48] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, “A Survey on Bias and Fairness in Machine 
Learning,” Jan. 25, 2022, arXiv: arXiv:1908.09635. doi: 10.48550/arXiv.1908.09635. 

[49] E. Ferrara, “Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation 
Strategies,” Sci, vol. 6, no. 1, p. 3, Mar. 2024, doi: 10.3390/sci6010003. 

 