DECODING STYLE: LEVERAGING MACHINE LEARNING FOR 
FASHION TREND DETECTION 
 
 
 
 
 
 
 
by 
KORA DUMPERT 
 
 
 
 
 
 
 
 
 
 
A THESIS 
 
Presented to the Department of Data Science  
and the Robert D. Clark Honors College  
in partial fulfillment of the requirements for the degree of  
Bachelor of Science 
 
May 2024 
 
 
 
 
An Abstract of the Thesis of 
Kora Dumpert for the degree of Bachelor of Science 
in the Department of Data Science to be taken June 2024 
 
 
Title: Decoding Style: Leveraging Machine Learning for Fashion Trend Detection 
 
 
 
Approved:            Dr. Thanh Nguyen                            
               Primary Thesis Advisor 
 
Clothing, once a necessity for human survival, has evolved into a powerful means of self-
expression and social identification. Today, the fashion industry stands as a multi-trillion-dollar 
global force, shaping economies and cultures. However, its volatile nature and environmental 
footprint necessitate innovative approaches to trend identification and inventory management. 
This research explores the fusion of machine learning techniques with fashion trend analysis to 
offer an accessible solution. By leveraging image clustering algorithms, this study identifies key 
patterns and trends within fashion apparel. The research unveils significant insights into 
Spring/Summer 2024 color preferences, item types, and material/print trends. Examples of the 
trends identified include red colors, dresses, and stripes. Notably, the identified trends are 
cross-referenced with traditional fashion publications to assess accuracy. While the study 
acknowledges certain limitations, particularly in item type differentiation, it proposes avenues 
for future research. Ultimately, this research not only offers a glimpse into the future of fashion 
trend analysis but also presents a pathway towards more sustainable and efficient inventory 
planning. By harnessing the power of machine learning, the fashion industry can align 
production with consumer preferences, minimizing waste and environmental impact while 
maximizing economic efficiency.  
 
Acknowledgements 
 
I would like to express my deepest gratitude to my advisors, Dr. Thanh Nguyen and Dr. 
Trond Jacobsen, for their unwavering support and patience throughout the thesis process. Dr. 
Nguyen, your instrumental role in guiding me from initial discussions to project implementation 
has been invaluable. Without your expertise and guidance, this thesis would not have come to 
fruition. Dr. Jacobsen, your support during my time at the Clark Honors College has been truly 
indispensable. I would also like to extend my thanks to Dr. Stephen Fickas for his valuable input 
on my methodology and experimental design, which played a crucial role in turning my ideas 
into reality. 
To my family and friends, thank you for being my rock and unwavering support system. 
To my parents, thank you for instilling in me the value of hard work and for your constant 
encouragement throughout my life. To my sister, Elise, your light-hearted perspective and 
infectious laughter have provided much-needed relief during the challenges of this process. To 
my best friend Lauren, your support and guidance in integrating personal elements into this 
academic endeavor have been invaluable. And to all my friends, thank you for reminding me to 
strike a balance between academic achievement and savoring the moments of joy during my final 
year of college. 
  
Table of Contents 
Chapter 1: Introduction 
Chapter 2: Literature Review 
2.1 Machine Learning for image analysis 
2.2 Types of Machine Learning 
2.3 Current Work 
Chapter 3: Methodology 
3.1 Dataset 
3.2 Image Pre-Processing 
3.3 Clustering 
3.3.1 DBSCAN 
3.3.2 OPTICS 
3.3.3 Agglomerative 
3.3.4 K-means 
3.4 Implementation 
3.4.1 DBSCAN 
3.4.2 OPTICS 
3.4.3 Agglomerative 
3.4.4 K-means 
3.5 Choosing a Clustering Method 
3.6 Identifying Trends 
3.7 Assessing Accuracy 
Chapter 4: Experiment 
4.1 K-means Cluster Image Examples 
4.2 Cluster Contents and Image Makeup 
4.3 Results 
4.4 Accuracy & Evaluation 
Chapter 5: Conclusion 
References 
Supporting Materials 
Python File: make_combined_img_folder.py 
Python File: rename_img.py 
Python File: save_no_background.py 
Python File: DBSCAN_clustering.py 
Python File: Kmeans_clustering.py 
Python File: agglomerative_clustering.py 
Python File: OPTICS_clustering.py 
  
List of Figures  
Figure 1: John Sebastian Woodstock 1969 
Figure 2: Refined Tie Dye in Vogue Italia 1970 
Figure 3: OperaSport Runway Image 
Figure 4: OperaSport Runway Image with Background Removed 
Figure 5: Example of image discarded from dataset. 
Figure 6: DBSCAN Knee Curve Plot 
Figure 7: DBSCAN Clustering Visualization 
Figure 8: Reachability Plot 
Figure 9: Agglomerative Clustering Dendrogram 
Figure 10: Elbow Plot 
Figure 11: Example image for trend identification 
Figure 12: Cluster 0 Example Images 
Figure 13: Cluster 1 Example Images 
Figure 14: Cluster 2 Example Images 
Figure 15: Cluster 3 Example Images 
Figure 16: Cluster 4 Example Images 
Figure 17: Cluster 5 Example Images 
Figure 18: Cluster 6 Example Images 
Figure 19: Cluster 7 Example Images 
Figure 20: Cluster 8 Example Images 
Figure 21: Cluster 9 Example Images 
Figure 22: Cluster 10 Example Images 
Figure 23: Cluster 0 Color Frequency Chart 
Figure 24: Cluster 0 Item Type Frequency Chart 
Figure 25: Cluster 0 Material/Print Frequency Chart 
Figure 26: Cluster 1 Color Frequency Chart 
Figure 27: Cluster 1 Item Type Frequency Chart 
Figure 28: Cluster 1 Material/Print Frequency Chart 
Figure 29: Cluster 2 Color Frequency Chart 
Figure 30: Cluster 2 Item Type Frequency Chart 
Figure 31: Cluster 2 Material/Print Frequency Chart 
Figure 32: Cluster 3 Color Frequency Chart 
Figure 33: Cluster 3 Item Type Frequency Chart 
Figure 34: Cluster 3 Material/Print Frequency Chart 
Figure 35: Cluster 4 Color Frequency Chart 
Figure 36: Cluster 4 Item Type Frequency Chart 
Figure 37: Cluster 4 Material/Print Frequency Chart 
Figure 38: Cluster 5 Color Frequency Chart 
Figure 39: Cluster 5 Item Frequency Chart 
Figure 40: Cluster 5 Material/Print Frequency Chart 
Figure 41: Cluster 6 Color Frequency Chart 
Figure 42: Cluster 6 Item Type Frequency Chart 
Figure 43: Cluster 6 Material/Print Frequency Chart 
Figure 44: Cluster 7 Color Frequency Chart 
Figure 45: Cluster 7 Item Type Frequency Chart 
Figure 46: Cluster 7 Material/Print Frequency Chart 
Figure 47: Cluster 8 Color Frequency Chart 
Figure 48: Cluster 8 Item Type Frequency Chart 
Figure 49: Cluster 8 Materials/Print Frequency Chart 
Figure 50: Cluster 9 Color Frequency Chart 
Figure 51: Cluster 9 Item Type Frequency Chart 
Figure 52: Cluster 9 Material/Print Frequency Chart 
Figure 53: Cluster 10 Color Frequency Chart 
Figure 54: Cluster 10 Item Type Frequency Chart 
Figure 55: Cluster 10 Material/Print Frequency Chart 
  
List of Tables  
Table 1: Cluster Size & Number Breakdown 
Table 2: Cluster 0 Frequency Breakdown 
Table 3: Cluster 1 Frequency Breakdown 
Table 4: Cluster 2 Frequency Breakdown 
Table 5: Cluster 3 Frequency Breakdown 
Table 6: Cluster 4 Frequency Breakdown 
Table 7: Cluster 5 Frequency Breakdown 
Table 8: Cluster 6 Frequency Breakdown 
Table 9: Cluster 7 Frequency Breakdown 
Table 10: Cluster 8 Frequency Breakdown 
Table 11: Cluster 9 Frequency Breakdown 
Table 12: Cluster 10 Frequency Breakdown 
Table 13: Key Trend Table Summary 
Table 14: Trends Seen in Publications 
 
Chapter 1: Introduction 
 
About 170,000 years ago, clothing emerged as an environmental necessity for humans, 
protecting them from extreme weather (Viegas 2011). Since then, however, choice of dress has 
evolved into a way for people to express their interests, affiliations, and often their socioeconomic 
status. This choice of dress is often described as fashion, which is “a style that is popular at a 
particular time, especially in clothes” (Cambridge Dictionary, s.v. “fashion”). Additionally, 
fashion itself is cyclical and is informed through a combination of art, culture, and social norms. 
Today, fashion is a 2.5-trillion-dollar global industry (Maloney 
2019). In 2019, Americans spent over 380 million dollars on fashion apparel and footwear, and 
the industry employed over 1.8 million Americans (Maloney 2019). Fashion has significant 
economic impacts for the United States and around the globe, especially for ecommerce. In 
2024, the fashion industry is projected to grow between 2 and 4 percent over the year 
(Balchandani et al. 2024). Another characteristic of the fashion industry is fluctuating 
demand. Supply chains are often the most affected and experience a phenomenon called the 
“bullwhip effect,” in which consumer demand varies so rapidly that many supply chains cannot 
keep up (Balchandani et al. 2024). 
Because demand tends to be highly volatile in the fashion industry, forecasting is incredibly 
important to suppliers and retailers alike. Forecasting that is either inaccurate or too slow can 
lead to serious business and environmental problems. First, if forecasted demand falls short of 
the true demand of consumers, businesses lose possible revenue from sales that they could have 
completed. Conversely, if demand is predicted to be greater than it turns out to be, retailers and 
suppliers overproduce, resulting in excess inventory and further financial losses. Environmentally, 
accurate demand forecasting is increasingly important as sustainability practices are a high 
priority amidst global warming and pollution. According to the Natural Resources Defense 
Council, clothing contributes to one-fifth of the annual 300 million tons of plastic that pollutes 
the earth and its waterways (Greenfield 2023). It is imperative that forecasting for fashion 
apparel is tailored to reduce any negative environmental impacts of this industry. 
Fashion trends are often defined as apparel styles that have been adopted by a large group 
of people as a form of social behavior (James Hillman Fashion Consultancy 2021). These trends 
will often emerge from a fashion designer’s seasonal fashion show and are adopted by 
consumers, or vice versa. Microtrends are apparel patterns that might shift every few months or 
so, while macrotrends tend to last multiple years (Gaddamadugu 2023). One example of a trend 
associated with a time period is the emergence of tie dye clothing in the 1960s and early 1970s. 
Tie dye, which rose in popularity through its use in protest art and pop fashion, became a 
fashion trend for the general public and appeared in high fashion magazines. Figure 1 below 
depicts John Sebastian at Woodstock in 1969, sporting a tie dye shirt. Figure 2 shows a more 
refined version of the trend in Vogue Italia in 1970. Thus, social issues such as protest against the 
Vietnam War found their way into a fashion trend that is easily associated with this time period. 
  
Figure 1: John Sebastian Woodstock 1969 
(Dye Happy, 2024) 
 
Figure 2: Refined Tie Dye in Vogue Italia 1970  
(Vogue Italia, 1970) 
Traditionally, fashion trend identification has fallen to individuals who attend 
fashion shows and bring their takeaways to fashion magazines like Vogue or to product 
developers at apparel brands (MasterClass 2021). However, this approach has the potential to 
shift in the era of big data and the internet. 
Machine learning techniques offer a unique solution to fashion trend identification. 
Machine learning is a subset of artificial intelligence that aims to use a machine to solve 
problems as a human might. This learning is generally a combination of computational 
algorithms that use data and often feedback to determine patterns, predict values, and classify 
items. Because fashion trends are cyclical and pattern based, machine learning algorithms are 
well suited for this kind of application. Additionally, as machine learning becomes more 
accessible, it is a way for manufacturers, brands, and consumers alike to understand trends and 
determine what will happen next. 
Thesis Outline 
This thesis focuses on developing a process for trend analysis that would be 
accessible to anyone interested in identifying fashion trends. First, a method for finding 
patterns using machine learning cluster analysis will be identified. Second, a process for 
obtaining and pre-processing the images for the trend analysis will be developed. Next, the 
process for assigning images into patterns or trend groups will be determined. Then, the results 
of the trend analysis will be summarized, and a way to determine accuracy of the process will 
follow. Finally, there will be a discussion of limitations and possible applications for this trend 
analysis process. 
Research Questions 
Q1: How can machine learning techniques such as cluster analysis be used to identify trends in 
fashion apparel? 
Q2: What are the preprocessing steps for images used in this trend identification? 
Q3: What are the trends that are found from this analysis? 
Q4: How does this process compare to traditional methods when evaluated? 
  
 
 
Chapter 2: Literature Review  
2.1 Machine Learning for image analysis 
One of the promising applications of machine learning is image analysis. Image analysis 
allows for the collection of extensive data but has historically relied on significant human labor 
(Belcher et al. 2023). By utilizing machine learning techniques, human labor and human error 
can both be reduced. The role of machine learning in image analysis is to “compress high 
dimensional data into much lower-dimensional summaries relevant to a particular task or study 
objective” (Belcher et al. 2023). While humans can perform this task naturally, there are 
significant benefits to using a machine learning algorithm, such as reduction of cost and time 
necessary for processing (Norouzzadeh et al. 2018). 
2.2 Types of Machine Learning 
Machine learning is often divided into two categories: supervised and 
unsupervised learning. Supervised machine learning refers to the use of training data that is 
labeled. Training data is used to teach the algorithm to recognize when something matches a 
label or does not match a label. For example, a supervised machine learning algorithm may 
learn how to differentiate between pictures of cats and pictures of dogs. This kind of 
approach is called classification, where the algorithm assigns already known labels to a set of 
images. The training data set would be composed of various images of cats and dogs, with the 
correct label for each animal. Once the algorithm learns how to differentiate between these 
training images, a test data set is used. The test data set would be composed of images of 
cats and dogs that do not have labels associated with them. The goal of the supervised machine 
learning algorithm would then be to determine which animal each picture shows. 
In contrast, unsupervised machine learning models are trained with data that is unlabeled. 
This approach focuses on identifying patterns in datasets. One example of this is cluster analysis. 
Clustering identifies similarities and differences, and subsets data based on these results (Han et 
al. 2021). For example, if an unsupervised machine learning algorithm is given images of dogs 
and cats, it will find the qualities that separate the images from one another. The result could be any 
number of clusters made up of images. There might be two clusters, one with dogs and one 
with cats, or three clusters: one with big dogs, one with small dogs, and one with cats. However, the 
algorithm would not determine that these are dogs or cats, only that they form one cluster or another. 
2.3 Current Work 
Current literature aimed at identifying fashion trends and product forecasting is available; 
however, the approaches taken vary widely. One 2024 paper out of the University of Verona, 
Italy, uses a neural-network approach in combination with Google Trends to identify the popularity 
of textual terms and relate this to the predicted demand for a new fashion item (Skenderi et al. 
2024). Notably, this paper forecasts the sales of a specific item, rather than identifying 
overarching trends. The research is also limited to Nunalie, an Italian fast fashion brand, and 
is therefore potentially not representative globally. 
Another article, published in 2019, delves into the forecasting of fashion items using 
historical purchase patterns and generalizing this to abstract designs. This approach, using 
machine learning methods and a Python implementation, addresses the issue of predicting future 
demand for items not yet created. While the authors successfully forecasted demand for items, 
once again, general trends were not noted, and there is an absence of image analysis (Singh et al. 
2019). 
Recent literature has also shown how clustering can be applied to fashion research. 
Cluster analysis aims to maximize the similarity between samples in each cluster, while at the 
same time maximizing the dissimilarity between clusters (Peng and Li 2023). It is often used in 
exploratory data analysis before further research is performed. Cluster analysis frequently 
appears in literature across fields that use multivariate data. Some applications include biological 
fields such as genomics (Oyelade et al. 2016), marketing (Benslama and Jallouli 2020), and materials 
science (Vincent et al. 2023), to name a few. 
In fashion trend research, a 2019 article called “Unsupervised Deep Clustering for 
Fashion Images” develops a unique clustering model based on real world data from Amazon 
(Yan et al. 2019). Another article, published in 2015, utilizes cluster analysis to group images of 
clothing and accessory items that are related to one another. The authors found that their method 
of clustering items together might predict what people would purchase and pair together 
(McAuley, Targett, and van den Hengel 2015).  
Several studies pull images from the internet and create a cleaned fashion dataset from 
which they categorize trends (Huang, Lu, and Hsu 2021; Vittayakorn et al. 2015). These 
provide an opportunity for image analysis as applied to fashion images, with many using 
clustering or other machine learning techniques. However, because of the cyclical nature of 
fashion, these datasets leave little room for determining future trends as time goes on and the 
datasets age.  
Perhaps the most promising research on fashion trend analysis is a system called “Neo-
Fashion”, created in 2021 (Zhao, Li, and Sun 2024). This system utilizes clustering, along with 
object detection methods to split runway images into multiple different pieces. This is helpful for 
fashion images, as fashion usually is made up of a variety of items. From here, the study 
prepared a training dataset and used this, along with a Region Convolutional Neural Network to 
categorize each fashion image component. Finally, Neo-Fashion identifies trends in colors, 
combinations of clothing, and styles. 
This research will be working with clustering image analysis to identify trends in fashion. 
While much less technical than some machine learning techniques, cluster analysis is easily 
understood and offers an opportunity for accessibility to the public. It also opens doors for 
integration into inventory planning that could help small retailers worldwide. 
 
Chapter 3: Methodology 
3.1 Dataset 
To effectively identify future trends, a dataset of images for a season after Winter 2024 
needed to be identified. Brands generally show their upcoming collections a season or two in 
advance, at large runway shows across the globe. Perhaps some of the most notable shows are in 
Milan, Paris, and New York. For this research, we contacted firstVIEW, a team of photographers 
that attend runway events and photograph full length pictures of each look in a show. On their 
website, they have images available for free download categorized by designer, season, and year. 
After contacting their team directly for permission to use these photos, we chose 26 
different brands and their Spring/Summer 2024 collections. The brands were chosen by letter of 
the alphabet. For example, our “B” designer images were from Balmain. If there was no designer 
for a letter A-Z, a designer was randomly chosen. Figure 3 below is an example of an image 
from designer OperaSport, one of the images used in this study. 
Figure 3: OperaSport Runway Image 
(firstVIEW, 2024) 
 
After the images were downloaded from firstVIEW, they were compiled into a single 
folder containing 1,053 images. 
3.2 Image Pre-Processing 
1. Renaming folders and images 
To obtain better results, we first created an efficient labeling system 
for the folders of brand images and for the individual images. This makes it possible to see 
which brands are in each cluster. The brand folders were renamed to just the brand, resulting in 
folders with names “Balmain” and “Chanel”. The images were named for their brand, along with 
an arbitrary number for labeling purposes. For example, an image in the Chanel folder could be 
called “Chanel8.png”. The code snapshot below depicts the function used to relabel the folders 
and images. 
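A minimal sketch of such a renaming function is shown here. It illustrates the naming scheme described above and is not the script used in this study: the function name rename_images, the two-pass renaming (used so that a new name never collides with a file that has not been renamed yet), and the choice to keep each file's original extension are all assumptions.

```python
from pathlib import Path


def rename_images(brand_folder):
    """Rename every file in a brand folder to '<Brand><n><ext>', e.g. 'Chanel8.png'."""
    brand_folder = Path(brand_folder)
    brand = brand_folder.name  # the folder is assumed to be named after the brand
    images = sorted(p for p in brand_folder.iterdir() if p.is_file())

    # Pass 1: move every file to a temporary name so final names cannot clash.
    staged = []
    for i, path in enumerate(images):
        tmp = path.with_name(f"__tmp_{i}{path.suffix}")
        path.rename(tmp)
        staged.append(tmp)

    # Pass 2: assign the final '<Brand><n>' names, numbering from 1.
    new_names = []
    for i, tmp in enumerate(staged, start=1):
        target = tmp.with_name(f"{brand}{i}{tmp.suffix}")
        tmp.rename(target)
        new_names.append(target.name)
    return new_names
```

Calling rename_images on a folder named "Chanel" would produce files such as "Chanel1.png" and "Chanel2.png", with the numbers carrying no meaning beyond labeling.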
2. Combined Image Folder 
In this step, we combined all images from each brand folder into one folder for ease of 
access. In addition, to allow this code to be rerun multiple times, we also included code to 
remove existing content in the designated folder before adding new content. 
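The step above can be sketched as follows. This is an illustrative version only: the function name make_combined_folder and its parameters are hypothetical, and it assumes that image names are already unique across brands (as they are after the renaming step).

```python
import shutil
from pathlib import Path


def make_combined_folder(brands_root, combined):
    """Copy every image from each brand subfolder into one folder; return the count."""
    brands_root, combined = Path(brands_root), Path(combined)

    # Remove anything left over from a previous run so reruns start clean.
    if combined.exists():
        shutil.rmtree(combined)
    combined.mkdir(parents=True)

    copied = 0
    for brand_folder in sorted(p for p in brands_root.iterdir() if p.is_dir()):
        if brand_folder.resolve() == combined.resolve():
            continue  # skip the output folder if it lives inside brands_root
        for image in sorted(p for p in brand_folder.iterdir() if p.is_file()):
            shutil.copy2(image, combined / image.name)
            copied += 1
    return copied
```

Clearing the destination first means the combined folder always reflects the current brand folders, at the cost of recopying every image on each run.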
 
 
 
 
 
 
3. Removing Backgrounds 
 
Next, since each background was different, we decided to isolate just the model in each 
photo by removing the backgrounds altogether. Figure 4 depicts Figure 3 without a background. 
This was achieved using assistance from ChatGPT in developing a script to remove all image 
backgrounds and save the new images with new names. The Python packages used were PIL, 
rembg, and os. The function ‘remove_and_save_with_white_background’ opens the image 
using PIL’s Image module and then uses the rembg library’s remove function to take out the 
background. Then, the function uses Image to convert the image from RGBA to RGB format 
because further steps must use the RGB image format. A new image is then created with a white 
background, ensuring the size matches the background-removed output image. Next, 
Image.alpha_composite adds the first image to the white background image, which is then 
converted back to RGB format. Finally, the function saves the new image as 
“imagename_WBR.png” to differentiate it from the original image. An example of an image that 
has its background removed is shown below. This process took about 10 seconds per photo, but 
the results were quite good. 
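A minimal sketch of this kind of background-removal function is given below. It is an approximation, not the script used in this study. In this sketch the compositing is done in RGBA, since Image.alpha_composite requires two RGBA images of the same size, and the result is converted to RGB only at the end. The helper composite_on_white and the parameter names are hypothetical.

```python
from pathlib import Path

from PIL import Image


def composite_on_white(cutout):
    """Place a (possibly transparent) cutout on a white canvas and return an RGB image."""
    cutout = cutout.convert("RGBA")  # alpha_composite needs RGBA inputs
    white = Image.new("RGBA", cutout.size, (255, 255, 255, 255))
    return Image.alpha_composite(white, cutout).convert("RGB")


def remove_and_save_with_white_background(src, dst_dir):
    """Remove the background of one image and save it as '<name>_WBR.png'."""
    # Imported here because rembg loads a segmentation model, which is slow.
    from rembg import remove

    src, dst_dir = Path(src), Path(dst_dir)
    with Image.open(src) as img:
        cutout = remove(img)  # returns an image with a transparent background
    out_path = dst_dir / f"{src.stem}_WBR.png"
    composite_on_white(cutout).save(out_path)
    return out_path
```

Separating the compositing step from the rembg call keeps the white-background logic easy to check on its own, without running the segmentation model.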
 
Figure 4: OperaSport Runway Image with Background Removed 
4. Final check of images 
Background removal worked better for some images than for others. Specifically, some 
images that had people in the background did not have their backgrounds fully removed. To ensure 
these were not included in the final analysis, a manual review of photos was conducted. Any 
photos that had issues with background removal were discarded from the final image collection. 
In total, 73 images were manually discarded, leaving 980 images for the final analysis. An example of 
an image that was discarded is below. 
 
Figure 5: Example of image discarded from dataset. 
3.3 Clustering 
There are various methods of cluster analysis. Some of the most well-known algorithms 
are K-means, OPTICS, DBSCAN, and Agglomerative clustering. We wanted to explore various 
methods of clustering for this project.  
3.3.1 DBSCAN 
DBSCAN stands for Density Based Spatial Clustering of Applications with Noise. 
Density based clustering groups data points to reflect how close they are to one another. This is 
an example of clustering that does not require a predefined number of clusters. DBSCAN 
estimates the density of a dataset by identifying points that lie within a certain neighborhood or 
radius, Epsilon, and will “consider two points connected if they lie within each other’s 
neighborhood” (Aggarwal and Reddy 2014). DBSCAN also marks points as core points if 
at least MinPts points lie within their radius, Epsilon. 
DBSCAN can also identify points as noise if they are not density-reachable. According to 
Aggarwal and Reddy (2014), “a point q is directly density-reachable from a core point p if q is 
within the Epsilon-neighborhood of p.” This algorithm must have predefined Epsilon 
and MinPts values. DBSCAN is generally able to deal with noise and outliers well. DBSCAN is 
implemented using the sklearn.cluster.DBSCAN class. 
DBSCAN’s algorithm is as follows (adapted from Aggarwal and Reddy, 2014): 
1. Beginning with an arbitrary point, p, identify all points density-reachable from p, 
considering both Epsilon and MinPts. 
2. If p is a core point, a cluster has been identified. 
3. If p is not a core point, DBSCAN assigns p to noise and moves on to the next point in 
the dataset. 
4. The algorithm ends when all points are categorized as belonging to a cluster or as 
noise. 
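The steps above can be illustrated with a small from-scratch implementation. This is a teaching sketch and not the sklearn.cluster.DBSCAN implementation used in this study. It assumes Euclidean distance, counts a point as part of its own neighborhood when comparing against MinPts, and computes all pairwise distances at once, which is only practical for small datasets. The function name dbscan_sketch is hypothetical.

```python
import numpy as np

NOISE = -1


def dbscan_sketch(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or NOISE (-1)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        if len(neighbors[p]) < min_pts:
            labels[p] = NOISE  # not a core point; may later become a border point
            continue
        # p is a core point, so it starts a new cluster.
        cluster += 1
        labels[p] = cluster
        frontier = list(neighbors[p])
        while frontier:
            q = frontier.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            if len(neighbors[q]) >= min_pts:
                frontier.extend(neighbors[q])  # q is also core: keep expanding
    return labels
```

For example, two tight groups of three points plus one distant point yield two clusters and one noise label when eps and min_pts are chosen to match the group spacing.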
3.3.2 OPTICS 
OPTICS stands for Ordering Points to Identify the Clustering Structure. This clustering 
method is closely related to DBSCAN; however, it does not require an exact Epsilon value (MinPts 
is still required) and instead tries an “infinite number of distance parameters” that are smaller than a distance Epsilon 
(Aggarwal and Reddy 2014). OPTICS can help differentiate between clusters with varying levels 
of density. OPTICS does not directly produce a clustering of the data, but instead outputs a cluster 
ordering. A cluster ordering is a linear list of all objects, which represents their density-based 
clustering structure (Han et al. 2021). The algorithm is used to create a reachability plot, which 
shows how densely points are packed. This plot allows the user to decide on clusters based on their 
level of density. OPTICS is implemented using the sklearn.cluster.OPTICS class. 
 
 
 
The OPTICS algorithm works as follows (adapted from ArcGIS Pro): 
1. Beginning with an arbitrary point, p, search all neighbor distances less than or equal to 
Epsilon. 
2. If p has a neighbor distance less than Epsilon, p is assigned that distance as its 
reachability distance. If all neighbor distances are greater than Epsilon, the smallest is 
assigned as the reachability distance. 
3. When no further points are within Epsilon, OPTICS moves to another point and 
repeats steps 1 and 2. 
4. After each iteration, reachability distances are calculated and ordered. 
5. Finally, the reachability plot is created from each reachability distance and is used to 
detect clusters. 
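To make the ordering and reachability distances concrete, a small from-scratch sketch follows. It is not the sklearn.cluster.OPTICS implementation used in this study. It assumes the common definitions from the OPTICS literature: the core distance of a point is the distance to its MinPts-th nearest point (counting the point itself), and the reachability distance of a point q from p is the larger of p's core distance and the distance between p and q. The function name optics_order and the max_eps parameter are hypothetical.

```python
import numpy as np


def optics_order(points, min_pts, max_eps=np.inf):
    """Return the OPTICS processing order and each point's reachability distance."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Core distance: distance to the min_pts-th nearest point, counting the point itself.
    core = np.sort(dists, axis=1)[:, min_pts - 1]
    core[core > max_eps] = np.inf  # not a core point within max_eps

    reach = np.full(n, np.inf)
    processed = np.zeros(n, dtype=bool)
    order = []
    for start in range(n):
        if processed[start]:
            continue
        current = start
        while current is not None:
            processed[current] = True
            order.append(current)
            if np.isfinite(core[current]):
                # Update reachability of unprocessed neighbors of a core point.
                candidates = np.flatnonzero(~processed & (dists[current] <= max_eps))
                new_reach = np.maximum(core[current], dists[current, candidates])
                reach[candidates] = np.minimum(reach[candidates], new_reach)
            # Continue with the unprocessed point that is currently easiest to reach.
            pending = np.flatnonzero(~processed & np.isfinite(reach))
            current = int(pending[np.argmin(reach[pending])]) if len(pending) else None
    return order, reach
```

Plotting reach in the order given by order produces a reachability plot: valleys correspond to dense clusters, and tall bars mark the jump from one cluster to the next.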
3.3.3 Agglomerative 
This clustering method is a bottom-up hierarchical method that first labels each data point 
as its own cluster. From here, a dissimilarity matrix is created. The algorithm merges the two 
closest clusters (based on chosen computed distances) and updates the dissimilarity matrix 
accordingly until the final cluster contains all data points. In this paper, the chosen distance 
computation follows Ward’s criterion. Ward’s method minimizes the variance within clusters and 
defines the distance between two clusters as the increase in the sum of squared errors (SSE) when 
they are merged, similar to the SSE objective used by K-means. Ward’s method intends to keep this 
growth as small as possible (Aggarwal and Reddy 2014). For clusters $C_a$ and $C_b$ with 
cardinalities $N_a$ and $N_b$ and centroids $c_a$ and $c_b$, the SSE increase when $C_a$ and 
$C_b$ merge is measured as follows: 

$$W(C_{a \cup b}, c_{a \cup b}) - W(C, c) = \frac{N_a N_b}{N_a + N_b} \sum_{v=1}^{M} (c_{av} - c_{bv})^2 = \frac{N_a N_b}{N_a + N_b} \, d(c_a, c_b)$$

Here $M$ is the number of dimensions of the data, and $d(c_a, c_b)$ is the squared Euclidean 
distance between the two centroids. 

 
The agglomerative clustering algorithm is as follows (adapted from Aggarwal and Reddy, 2014): 
1. Compute the dissimilarity matrix between all points. 
2. Repeat 
a. Merge the two closest clusters as $C_{a \cup b} = C_a \cup C_b$. Set the new cluster’s cardinality as 
$N_{a \cup b} = N_a + N_b$. 
b. Insert a row and column containing the distances between the new cluster 
$C_{a \cup b}$ and the remaining clusters. 
3. Until only one maximal cluster remains. 
Further, agglomerative clustering commonly employs a dendrogram, which visualizes 
the hierarchy of clusters. Each level of the dendrogram corresponds to certain clusters. To 
implement this method, we used the sklearn.cluster.AgglomerativeClustering class. 
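As a small numerical check of Ward’s criterion, the sketch below computes the merge cost from the cluster sizes and centroids and compares it with the SSE increase measured directly. The function names sse and ward_merge_cost are hypothetical, and squared Euclidean distance between centroids is assumed.

```python
import numpy as np


def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    cluster = np.asarray(cluster, dtype=float)
    return float(((cluster - cluster.mean(axis=0)) ** 2).sum())


def ward_merge_cost(cluster_a, cluster_b):
    """SSE increase caused by merging two clusters, using sizes and centroids only."""
    a = np.asarray(cluster_a, dtype=float)
    b = np.asarray(cluster_b, dtype=float)
    n_a, n_b = len(a), len(b)
    gap = a.mean(axis=0) - b.mean(axis=0)  # difference between the two centroids
    return float(n_a * n_b / (n_a + n_b) * (gap ** 2).sum())


cluster_a = [[0, 0], [2, 0]]
cluster_b = [[10, 0], [10, 4], [10, 8]]
cost = ward_merge_cost(cluster_a, cluster_b)
direct = sse(cluster_a + cluster_b) - sse(cluster_a) - sse(cluster_b)
```

Both cost and direct come out to the same value for these two clusters, which is why Ward’s method can choose the next merge without recomputing the SSE of every candidate merged cluster from scratch.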
 
3.3.4 K-means 
K-means is one of the most popular and easily understood methods for clustering. It is not 
a hierarchical method, but rather a partitioning method in which the number of clusters K must be 
predefined prior to use (Han et al. 2021). The algorithm begins by choosing K data points 
randomly, which become centroids. Each data point is then assigned to the nearest centroid using 
a computed distance. In this study, as in many others, the Euclidean distance is used. 
$$\text{Euclidean Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
The Euclidean distance measures the distance from one point to another in Euclidean space. When 
all points are assigned to their respective clusters, the centroids, or centers of each cluster, must 
be updated. This is done by finding the mean of all points that belong to each cluster. From here, 
the process repeats until either the predetermined number of iterations is reached or the centroid 
values stop changing. 
The K-means algorithm works as follows (adapted from Aggarwal and Reddy, 2014): 
1. Select K points as initial centroids. 
2. Repeat 
a. Form K clusters by assigning each point to its closest centroid based on 
chosen method. 
b. Update the centroid of each cluster. 
3. Until convergence criterion is met. 
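The loop above can be sketched in a few lines of NumPy. This is an illustrative implementation and not the sklearn-based script used in this study. It assumes Euclidean distance, random selection of K data points as the initial centroids, and convergence when the centroids stop moving. The function name kmeans_sketch and its parameters are hypothetical.

```python
import numpy as np


def kmeans_sketch(data, k, max_iter=100, seed=0):
    """Return (labels, centroids, sse) for a basic K-means run."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: choose K distinct data points as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2a: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2b: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 3: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Sum of squared distances from each point to its assigned centroid.
    sse = float(((data - centroids[labels]) ** 2).sum())
    return labels, centroids, sse
```

The returned sse value is the quantity K-means tries to minimize, so it can be used to compare runs with different seeds or different values of K.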
 
Primarily, the K-means algorithm employs the Sum of Squared Errors (SSE) function and 
works to minimize the function by finding the ideal clustering. Below is the mathematical 
formulation for the SSE function. Given dataset $D = \{x_1, x_2, \ldots, x_N\}$, and the clustering after 
using K-means represented as $C = \{C_1, C_2, \ldots, C_K\}$, the SSE formula is below. Additionally, note 
that $c_k$ is the centroid of cluster $C_k$. 

$$SSE(C) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - c_k \rVert^2$$

$$c_k = \frac{\sum_{x_i \in C_k} x_i}{|C_k|}$$
 
While K-means requires a predefined number of clusters, there are ways to find the optimal number 
of clusters. One method is called the Elbow Method. This method produces a graph of the 
within-cluster sum of squares (WCSS), also known as inertia, against the number of clusters. At a 
certain point, the WCSS begins to decrease more slowly. This point is called the “elbow” and 
indicates the optimal number of clusters for the given data. 
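A minimal sketch of the Elbow Method using scikit-learn is shown below. The WCSS values come from the inertia_ attribute of a fitted KMeans model. Picking the elbow automatically as the K with the largest second difference of the WCSS curve is an assumption made for this illustration; in practice the elbow is usually read from the plot by eye. The function names wcss_curve and pick_elbow are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def wcss_curve(features, k_values, seed=0):
    """Fit K-means for each K and return the within-cluster sum of squares."""
    return [
        float(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features).inertia_)
        for k in k_values
    ]


def pick_elbow(k_values, wcss):
    """Return the K where the WCSS curve bends most sharply (largest second difference)."""
    second_diff = np.diff(wcss, n=2)  # entry i describes the bend at k_values[i + 1]
    return k_values[int(np.argmax(second_diff)) + 1]
```

Plotting the values from wcss_curve against k_values gives the elbow plot itself, and pick_elbow offers a rough numerical cross-check of the visual choice.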
To see which clustering method we wanted to choose, we decided to implement all four 
above and compare the clustering results. 
 
3.4 Implementation 
Four scripts were created, one for each clustering method, as follows. All 
methods required the functions load_images, preprocess_images, extract_features, 
create_clusters_dict, and move_to_clusters_folder. 
 
1. The load_images method loads in images from a folder path, creating a list of all 
file names (image_files), and stores all images in a list (images) using OpenCV’s 
cv2.imread(). It returns both the list of names, and the list of loaded images. 
2. The preprocess_images method resizes all loaded images to a 100 x 100 pixel 
size with cv2.resize() and returns the list of resized_images. 
3. The extract_features method extracts features from each image, flattening each 
image by converting the image array into a 1D array. Then, Principal Component 
Analysis (PCA) with 2 principal components is used to reduce the data to a lower 
dimension. This allows the pixel values to be used as features, which are 
the data points that will be clustered. The pixel values contain information about 
shapes, patterns, textures, and colors that can be captured from each image. 
4. The create_clusters_dict function takes in the cluster labels and the list of image 
names in image_files. It creates a dictionary where each key is a cluster label and 
each value is the list of image file names assigned to that cluster. 
5. The move_to_clusters_folder function moves images into folders based 
on their cluster. Its parameters are the cluster_dict created before, along with 
the source and destination folders. This lets users see at a glance which images 
belong to each cluster. 
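The last three helper functions can be sketched as follows. This is a minimal sketch rather than the original script: it assumes the images have already been loaded and resized as described in steps 1 and 2, uses random arrays as stand-ins for photos, and assumes a hypothetical folder naming scheme (cluster_<label>) for the destination folders.

```python
import shutil
from collections import defaultdict
from pathlib import Path

import numpy as np
from sklearn.decomposition import PCA

def extract_features(resized_images, n_components=2):
    """Flatten each image into a 1D pixel vector, then reduce it with PCA."""
    flat = np.array([img.reshape(-1) for img in resized_images], dtype=np.float64)
    return PCA(n_components=n_components).fit_transform(flat)

def create_clusters_dict(labels, image_files):
    """Map each cluster label to the list of file names assigned to it."""
    clusters = defaultdict(list)
    for label, name in zip(labels, image_files):
        clusters[int(label)].append(name)
    return dict(clusters)

def move_to_clusters_folder(cluster_dict, source_folder, destination_folder):
    """Move every image into destination_folder/cluster_<label>/ (assumed layout)."""
    for label, names in cluster_dict.items():
        target = Path(destination_folder) / f"cluster_{label}"
        target.mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.move(str(Path(source_folder) / name), str(target / name))

# Stand-in data (assumed): six random 100 x 100 RGB arrays instead of real photos.
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, (100, 100, 3), dtype=np.uint8) for _ in range(6)]
features = extract_features(fake_images)
example_dict = create_clusters_dict([0, 1, 0, 2, 1, 0], [f"img{i}.jpg" for i in range(6)])
print(features.shape, example_dict)
```

Note that a 100 x 100 color image flattens to 30,000 values, so PCA with two components is a very strong reduction; the number of components is exposed as a parameter for that reason.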
 
 
3.4.1 DBSCAN 
 
1. DBSCAN’s cluster_images function uses the sklearn.cluster.DBSCAN class to 
initialize a DBSCAN object with the necessary parameters, epsilon and 
min_samples, and with Euclidean distance as the distance metric. It then 
fits the DBSCAN algorithm to the features extracted previously and returns the 
cluster labels.  
2. To find the epsilon value, we use a method called the Knee Curve Plot (Yadav 
2020). This method relies on two functions, 
compute_distances_to_nearest_neighboors and plot_knee_curve. The function 
compute_distances_to_nearest_neighboors uses the 
sklearn.neighbors.NearestNeighbors class to compute the distance between 
each feature point and its nearest neighbor. Then, plot_knee_curve plots these 
distances in sorted order to visualize the knee curve, as seen in Figure 6 below. 
With this method, epsilon is the distance at the “knee” of the curve, that is, 
the point where the second derivative is highest. The next function, 
called find_epsilon, computes this second derivative and returns the 
corresponding epsilon value. For our data, epsilon was about 95.25. 
Figure 6: DBSCAN Knee Curve Plot 
3. Additionally, the minimum number of points, min_points (DBSCAN’s min_samples 
parameter), is set to 2 * the dimension of the data (Mullin 2020). Since PCA made 
our data 2D, we chose min_points = 4. 
4. The visualize_clusters function produces a plot (Figure 7) of the feature data 
points, colored according to their assigned clusters. 
Figure 7: DBSCAN Clustering Visualization 
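The DBSCAN steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis script: the helper name k_distances, its parameter k, and the toy data are hypothetical, and the knee is located with a simple discrete second difference.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def k_distances(features, k=1):
    """Sorted distance from every point to its k-th nearest neighbor."""
    # k + 1 neighbors are requested because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    distances, _ = nn.kneighbors(features)
    return np.sort(distances[:, -1])

def find_epsilon(sorted_distances):
    """Return the distance at which the discrete second derivative peaks."""
    second_diff = np.diff(sorted_distances, n=2)
    return float(sorted_distances[int(np.argmax(second_diff)) + 1])

def cluster_images(features, eps, min_samples):
    """Fit DBSCAN with Euclidean distance; the label -1 marks noise points."""
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean")
    return model.fit_predict(features)

# Toy 2D features (assumed), standing in for the PCA output.
rng = np.random.default_rng(0)
toy_features = np.vstack([
    rng.normal(0.0, 1.0, (60, 2)),
    rng.normal(10.0, 1.0, (60, 2)),
])
min_points = 2 * toy_features.shape[1]  # 2 * dimension of the data
eps = find_epsilon(k_distances(toy_features, k=1))
labels = cluster_images(toy_features, eps=eps, min_samples=min_points)
print(f"epsilon = {eps:.3f}, labels found: {sorted(set(labels.tolist()))}")
```

A raw second difference is sensitive to noise in the sorted distances, so in practice the curve is usually also inspected visually, as in Figure 6.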
 
3.4.2 OPTICS 
1. OPTICS’s cluster_images function begins by scaling the features to normalize 
them with sklearn.preprocessing.StandardScaler. Then, the function uses the 
sklearn.cluster.OPTICS class to initialize an OPTICS object with the necessary 
parameters: xi, min_samples, and min_cluster_size. The xi parameter sets the 
minimum steepness on the reachability plot that marks a cluster boundary. The 
min_samples parameter sets how many neighboring points a point needs in order 
to count as a core point, and min_cluster_size sets the minimum number of 
images per cluster. The function then fits the OPTICS algorithm to the 
normalized features and returns the cluster labels. It also produces the 
reachability plot seen below by retrieving the reachability distances and 
plotting them in the order in which OPTICS processed the data points. 
Figure 8: Reachability Plot 
Points with similar reachability distances are more likely to belong to the same cluster. Spikes in 
reachability distance mark separations between clusters, while valleys often 
represent clusters. The above plot does not differentiate well between 
clusters, as it shows no clear spikes and valleys. 
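The OPTICS procedure described in this section can be sketched as follows. The parameter values and toy data are assumptions chosen for illustration, and the plotting step is left out; the returned reachability array is what would be drawn in a plot such as Figure 8.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.preprocessing import StandardScaler

def cluster_images(features, xi=0.05, min_samples=5, min_cluster_size=5):
    """Scale the features, fit OPTICS, and return labels plus ordered reachability."""
    scaled = StandardScaler().fit_transform(features)
    model = OPTICS(xi=xi, min_samples=min_samples, min_cluster_size=min_cluster_size)
    labels = model.fit_predict(scaled)
    # Reachability distances in the order in which OPTICS visited the points.
    reachability = model.reachability_[model.ordering_]
    return labels, reachability

# Toy 2D features (assumed), standing in for the PCA output.
rng = np.random.default_rng(1)
toy_features = np.vstack([
    rng.normal(0.0, 1.0, (40, 2)),
    rng.normal(8.0, 1.0, (40, 2)),
])
labels, reachability = cluster_images(toy_features)
print(sorted(set(labels.tolist())), reachability[:5])
```

The first entry of the ordered reachability array is infinite, because the first point visited has no predecessor to be reached from.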
 
 
3.4.3 Agglomerative 
 
1. The hierarchical_clustering function within the agglomerative clustering script 
computes the hierarchical clustering with the Ward approach, using the linkage 
function from scipy.cluster.hierarchy. 
2. This function then plots the linkage matrix in the form of a dendrogram. The 
dendrogram depicts how clusters are arranged and allows the user to choose 
the number of clusters from the graph. The number of clusters is chosen based 
on the number of visible splits in the dendrogram. Based on Figure 9, we chose 17. 
Figure 9: Agglomerative Clustering Dendrogram 
Each level of the dendrogram represents a clustering of the data. 17 clusters were chosen as the 
cutoff point because this is where the clusters began to merge closer to one another. 
3. The cluster_images function creates an AgglomerativeClustering object from 
sklearn.cluster with the parameters num_clusters, Ward linkage, and 
Euclidean distance as the distance metric. It then fits the model and returns 
the cluster labels. 
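The agglomerative steps above can be sketched as follows. This is a minimal sketch on toy data, not the thesis script. The dendrogram is computed with no_plot=True so that the example runs without a plotting backend, and the distance metric is left at scikit-learn's default because Ward linkage only works with Euclidean distance.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

def hierarchical_clustering(features):
    """Build the Ward linkage matrix and the dendrogram layout."""
    linkage_matrix = linkage(features, method="ward")
    # no_plot=True returns the dendrogram structure without drawing it.
    tree = dendrogram(linkage_matrix, no_plot=True)
    return linkage_matrix, tree

def cluster_images(features, num_clusters):
    """Cut the hierarchy into num_clusters groups using Ward linkage."""
    model = AgglomerativeClustering(n_clusters=num_clusters, linkage="ward")
    return model.fit_predict(features)

# Toy 2D features (assumed): three separated blobs of 30 points each.
rng = np.random.default_rng(2)
toy_features = np.vstack([
    rng.normal(0.0, 1.0, (30, 2)),
    rng.normal(10.0, 1.0, (30, 2)),
    rng.normal(20.0, 1.0, (30, 2)),
])
linkage_matrix, tree = hierarchical_clustering(toy_features)
labels = cluster_images(toy_features, num_clusters=3)
print(linkage_matrix.shape, sorted(set(labels.tolist())))
```

The third column of the linkage matrix holds the merge distances; large jumps in that column correspond to the tall vertical gaps that guide the choice of a cutoff in a dendrogram.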
 
3.4.4 K-Means 
1. As discussed previously, K-means clustering requires a pre-defined number of 
clusters. To find a suitable number, the Elbow Method was performed within 
the find_opimal_clusters function. This function creates an elbow plot, from 
which we must choose the number of clusters. Figure 10 depicts the plot 
produced by our data. We identified the optimal number of clusters as 11, 
the point at which the decrease in inertia slows. 
2. The cluster_images function creates a K-means object from 
sklearn.cluster.KMeans and fits the model to the extracted features, returning 
the cluster labels. 
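The two K-means functions described above can be sketched as follows. The function name find_optimal_clusters, the parameters max_k and seed, and the toy data are assumptions made for this example. The sketch returns the inertia values rather than drawing them; plotting them against K gives an elbow plot like Figure 10.

```python
import numpy as np
from sklearn.cluster import KMeans

def find_optimal_clusters(features, max_k=10, seed=0):
    """Return the inertia (WCSS) for K = 1..max_k, ready to be plotted."""
    inertias = []
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
        inertias.append(float(model.inertia_))
    return inertias

def cluster_images(features, num_clusters, seed=0):
    """Fit K-means and return one cluster label per image."""
    model = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features)

# Toy 2D features (assumed): three separated blobs of 40 points each.
rng = np.random.default_rng(3)
toy_features = np.vstack([
    rng.normal(0.0, 1.0, (40, 2)),
    rng.normal(10.0, 1.0, (40, 2)),
    rng.normal(20.0, 1.0, (40, 2)),
])
inertias = find_optimal_clusters(toy_features, max_k=6)
labels = cluster_images(toy_features, num_clusters=3)
for k, value in enumerate(inertias, start=1):
    print(k, round(value, 1))
```

Choosing the elbow remains a visual judgment: the inertia keeps falling as K grows, so the aim is to find where the curve flattens rather than where it reaches a minimum.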
 
 
 
 
Figure 10: Elbow Plot 
 
 
3.5 Choosing a Clustering Method 
To choose a clustering method, we will examine all clusters created by each clustering 
algorithm and identify which best represents the dataset. 
 
 
3.6 Identifying Trends 
To identify trends in each cluster, the following procedure is used: 
1. Look at each image in a cluster. Identify types of clothing, major colors, material 
and pattern. Tally up the totals of each color, type, and pattern/material and 
identify the top 3 of each category for each cluster.  
2. For example, the image below is tallied as shown: 
 
Figure 11: Example image for trend identification 
Colors:  Red, Black 
Material/print: Polka dot, Leather, Floral 
Type: Skirt, Blouse 
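The tallying procedure can be sketched with a counter. In this thesis the tags were assigned by looking at each image; the sketch below only shows the counting step. The function name top_three is hypothetical, and apart from the first entry (the colors of Figure 11), the tags are invented for illustration.

```python
from collections import Counter

def top_three(image_tags, total_images):
    """Count how many images carry each tag and return the three most common.

    Each result is (tag, frequency, proportion of the cluster in percent).
    """
    counts = Counter(tag for tags in image_tags for tag in set(tags))
    return [
        (tag, n, round(100 * n / total_images, 2))
        for tag, n in counts.most_common(3)
    ]

# Color tags for a hypothetical four-image cluster.
color_tags = [
    ["Red", "Black"],    # Figure 11
    ["Black"],           # invented
    ["White", "Black"],  # invented
    ["Red"],             # invented
]
top_colors = top_three(color_tags, total_images=len(color_tags))
print(top_colors)
```

Because one image can carry several tags, the proportions within a category do not have to add up to 100%.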
3.7 Assessing Accuracy 
To assess the accuracy of this trend identification mechanism, we will consult popular 
fashion magazines such as Vogue, Glamour and The Cut to see whether our identifications match 
or diverge from their Spring/Summer 2024 trend identifications.  
 
Chapter 4: Experiment 
Each clustering method was performed, and subsequent clusters were analyzed. Below is 
a breakdown of cluster size and amount for each method.  
 
| Clustering Method | Total Number of Clusters | Cluster Sizes |
|---|---|---|
| DBSCAN | 1 | 980 |
| OPTICS | 3 | 967, 8, 5 |
| Agglomerative | 17 | 90, 77, 85, 8, 70, 97, 21, 32, 56, 35, 88, 52, 162, 22, 43, 22, 20 |
| K-means | 11 | 124, 132, 94, 54, 110, 122, 18, 48, 101, 100, 77 |
 
Table 1: Cluster Size & Number Breakdown 
Based on the table above, we see that OPTICS and DBSCAN are not the best options for 
this application because they produced too few clusters, with nearly all images falling 
into a single cluster. One, or even three, clusters would not be enough to break down 
trends from the overall dataset. That leaves Agglomerative and K-means. Given the 
distribution of cluster sizes and the total number of clusters, we chose K-means as our 
primary method of clustering and used its clusters for further analysis. 
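The comparison in Table 1 can be summarized numerically. The sketch below uses only the cluster sizes reported in the table; the choice of summary statistics is an assumption made for illustration and was not part of the original analysis.

```python
from statistics import pstdev

# Cluster sizes as reported in Table 1.
cluster_sizes = {
    "DBSCAN": [980],
    "OPTICS": [967, 8, 5],
    "Agglomerative": [90, 77, 85, 8, 70, 97, 21, 32, 56, 35, 88, 52, 162, 22, 43, 22, 20],
    "K-means": [124, 132, 94, 54, 110, 122, 18, 48, 101, 100, 77],
}

def summarize(sizes):
    """Basic descriptive statistics for one method's cluster sizes."""
    return {
        "clusters": len(sizes),
        "images": sum(sizes),
        "smallest": min(sizes),
        "largest": max(sizes),
        "largest_share": round(max(sizes) / sum(sizes), 3),
        "std": round(pstdev(sizes), 1),
    }

summary = {method: summarize(sizes) for method, sizes in cluster_sizes.items()}
for method, stats in summary.items():
    print(method, stats)
```

The largest_share value shows how much of the dataset the biggest cluster absorbs, which is the property that rules out DBSCAN and OPTICS here.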
 
4.1 K-means Cluster Image Examples 
Cluster 0 
 
Figure 12: Cluster 0 Example Images 
Cluster 1 
 
Figure 13: Cluster 1 Example Images 
Cluster 2 
 
Figure 14: Cluster 2 Example Images 
 
Cluster 3 
 
Figure 15: Cluster 3 Example Images 
Cluster 4 
 
Figure 16: Cluster 4 Example Images 
Cluster 5 
 
Figure 17: Cluster 5 Example Images 
 
Cluster 6 
 
 
Figure 18: Cluster 6 Example Images 
Cluster 7 
 
Figure 19: Cluster 7 Example Images 
Cluster 8 
 
Figure 20: Cluster 8 Example Images 
 
Cluster 9 
 
Figure 21: Cluster 9 Example Images 
Cluster 10 
 
Figure 22: Cluster 10 Example Images 
 
 
 
 
 
 
 
 
 
4.2 Cluster Contents and Image Makeup 
Cluster 0 
Total images: 124 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| White | 28 | 22.58% | Dress | 31 | 25% | Mesh/Sheer | 20 | 16.12% |
| Yellow | 19 | 15.32% | Jacket | 30 | 24.19% | Floral | 16 | 12.9% |
| Grey | 16 | 12.9% | Matching Set* | 23 | 18.55% | Abstract Designs | 7 | 5.64% |
Table 2: Cluster 0 Frequency Breakdown 
*A matching set is a top and bottom of an outfit that match in color and material; made to be 
worn together  
Brands Represented (21/26): 7 Days Active, Balmain, Chanel, Erdem, Feben, Fendi, Ganni, 
Hermes, Isabel Marant, Jil Sander, KNWL, Miu Miu, Numero21, OperaSport, Rotate, Salvatore 
Ferragamo, Tory Burch, Versace, Wood Wood, Zimmermann 

[Pie chart “Colors”: White 23%, Yellow 15%, Grey 13%, Other 49%]
Figure 23: Cluster 0 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”.  
[Pie chart “Item Types”: Dress 25%, Jacket 24%, Matching Set* 19%, Other 32%]
Figure 24: Cluster 0 Item Type Frequency Chart  
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
 
[Pie chart “Material/Print”: Mesh/Sheer 16%, Floral 13%, Abstract Designs 6%, Other 65%]
Figure 25: Cluster 0 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
Cluster 1 
Total images: 132 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| Grey | 22 | 16.67% | Dress | 16 | 12.12% | Leather | 27 | 20.45% |
| Black | 28 | 21.21% | Jacket | 21 | 15.9% | Stripes | 13 | 9.85% |
| Red | 18 | 13.63% | Matching Set* | 13 | 9.85% | Mesh/Sheer | 13 | 9.85% |
Table 3: Cluster 1 Frequency Breakdown 
 
*A matching set is a top and bottom of an outfit that match in color and material; made to be 
worn together 
 
Brands Represented (22/26): 7 Days Active, Acne, Chanel, Dolce & Gabbana, Erdem, Feben, 
Ganni, Hermes, Isabel Marant, Jil Sander, KNWL, Miu Miu, Numero21, OperaSport, Prada, 
Rotate, Salvatore Ferragamo, Tory Burch, Undercover, Versace, Wood Wood, Yves Saint 
Laurent  

[Pie chart “Colors”: Grey 17%, Black 21%, Red 14%, Other 48%]
Figure 26: Cluster 1 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
 
[Pie chart “Item Types”: Dress 12%, Jacket 16%, Matching Set* 10%, Other 62%]
Figure 27: Cluster 1 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not 
in the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Leather 20%, Stripes 10%, Mesh/Sheer 10%, Other 60%]
Figure 28: Cluster 1 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
 
Cluster 2 
Total images: 94 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| Black | 51 | 54.26% | Dress | 24 | 25.53% | Mesh/Sheer | 15 | 15.96% |
| Purple | 12 | 12.77% | Pants | 14 | 14.89% | Floral | 6 | 6.38% |
| Blue | 11 | 11.7% | Jacket | 14 | 14.89% | Leather | 4 | 4.26% |
Table 4: Cluster 2 Frequency Breakdown 
Brands Represented (20/26): 7 Days Active, Acne, Balmain, Chanel, Dolce & Gabbana, Feben, 
Ganni, Hermes, Isabel Marant, Loewe, Miu Miu, Numero21, OperaSport, Prada, Rotate, 
Salvatore Ferragamo, Tory Burch, Undercover, Versace, Yves Saint Laurent  

[Pie chart “Colors”: Black 54%, Purple 13%, Blue 12%, Other 21%]
Figure 29: Cluster 2 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
[Pie chart “Item Types”: Dress 25%, Pants 15%, Jacket 15%, Other 45%]
Figure 30: Cluster 2 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Mesh/Sheer 16%, Floral 6%, Leather 4%, Other 74%]
Figure 31: Cluster 2 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
Cluster 3 
Total images: 54 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| Black | 17 | 31.48% | Dress | 16 | 29.63% | Leather | 5 | 9.25% |
| Red | 9 | 16.67% | Pants | 9 | 16.67% | Animal Print | 4 | 7.4% |
| Blue | 6 | 11.11% | Jacket | 13 | 24.07% | Mesh/Sheer | 4 | 7.4% |
Table 5: Cluster 3 Frequency Breakdown 
 
Brands Represented (15/26): 7 Days Active, Balmain, Erdem, Feben, Ganni, Hermes, Isabel 
Marant, Jil Sander, KNWL, Loewe, Miu Miu, Numero21, Tory Burch, Undercover, Yves Saint 
Laurent 

[Pie chart “Colors”: Black 31%, Red 17%, Blue 11%, Other 41%]
Figure 32: Cluster 3 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
 
[Pie chart “Item Types”: Pants 17%, Jacket 24%; the Dress and Other slices are labeled 29% and 30%]
Figure 33: Cluster 3 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Leather 9%, Other 76%; the Animal Print and Mesh/Sheer slices are labeled 8% and 7%]
Figure 34: Cluster 3 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
 
 
Cluster 4 
Total images: 100 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| Black | 42 | 42% | Dress | 30 | 30% | Leather | 7 | 7% |
| Grey | 12 | 12% | Pants | 12 | 12% | Sparkles | 6 | 6% |
| Blue | 10 | 10% | Jacket | 11 | 11% | Mesh/Sheer | 15 | 15% |
Table 6: Cluster 4 Frequency Breakdown 
Brands Represented (22/26): 7 Days Active, Acne, Chanel, Dolce & Gabbana, Erdem, Ganni, 
Hermes, Isabel Marant, Jil Sander, KNWL, Loewe, Miu Miu, Numero21, OperaSport, Prada, 
Rotate, Salvatore Ferragamo, Tory Burch, Undercover, Versace, Wood Wood, Yves Saint 
Laurent  

[Pie chart “Colors”: Black 42%, Grey 12%, Blue 10%, Other 36%]
Figure 35: Cluster 4 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
[Pie chart “Item Types”: Dress 30%, Pants 12%, Jacket 11%, Other 47%]
Figure 36: Cluster 4 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Leather 7%, Sparkles 6%, Mesh/Sheer 15%, Other 72%]
Figure 37: Cluster 4 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
Cluster 5 
Total images: 110 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| Black | 37 | 33.64% | Dress | 29 | 26.36% | Leather | 10 | 9.1% |
| Grey | 19 | 17.27% | Shorts | 12 | 10.91% | Floral | 10 | 9.1% |
| Green | 14 | 12.72% | Jacket | 12 | 10.91% | Mesh/Sheer | 10 | 9.1% |
Table 7: Cluster 5 Frequency Breakdown 
 
Brands Represented (21/26): 7 Days Active, Acne, Balmain, Chanel, Dolce & Gabbana, Feben, 
Hermes, Isabel Marant, KNWL, Loewe, Miu Miu, OperaSport, Prada, Rotate, Salvatore 
Ferragamo, Tory Burch, Undercover, Versace, Wood Wood, Yves Saint Laurent  

[Pie chart “Colors”: Black 34%, Grey 17%, Green 13%, Other 36%]
Figure 38: Cluster 5 Color Frequency Chart  
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
[Pie chart “Item Types”: Dress 26%, Shorts 11%, Jacket 11%, Other 52%]
Figure 39: Cluster 5 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not 
in the top 3 are classified under “Other”. 
 
[Pie chart “Material/Print”: Leather 9%, Floral 9%, Mesh/Sheer 9%, Other 73%]
Figure 40: Cluster 5 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
Cluster 6 
Total images: 122 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| White | 60 | 49.18% | Dress | 60 | 49.18% | Floral | 25 | 20.49% |
| Yellow | 28 | 22.95% | Pants | 12 | 9.84% | Stripes | 4 | 3.28% |
| Orange | 17 | 13.93% | Matching Set* | 11 | 9.02% | Mesh/Sheer | 18 | 14.75% |
Table 8: Cluster 6 Frequency Breakdown 
 
*A matching set is a top and bottom of an outfit that match in color and material; made to be 
worn together 
Brands Represented (21/26): 7 Days Active, Acne, Balmain, Dolce & Gabbana, Feben, Fendi, 
Ganni, Hermes, Loewe, Miu Miu, Numero21, OperaSport, Prada, Rotate, Salvatore Ferragamo, 
Tory Burch, Undercover, Versace, Wood Wood, Yves Saint Laurent, Zimmermann 

[Pie chart “Colors”: White 49%, Yellow 23%, Orange 14%, Other 14%]
Figure 41: Cluster 6 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
 
 
[Pie chart “Item Types”: Dress 49%, Pants 10%, Matching Set* 9%, Other 32%]
Figure 42: Cluster 6 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Floral 21%, Stripes 3%, Mesh/Sheer 15%, Other 61%]
Figure 43: Cluster 6 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
Cluster 7 
Total images: 18 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| Red | 7 | 38.89% | Dress | 7 | 38.89% | Floral | 2 | 11.11% |
| Black | 3 | 16.67% | Pants | 9 | 50% | Leather | 6 | 33.33% |
| Brown | 3 | 16.67% | Shorts | 2 | 11.11% | Mesh/Sheer | 2 | 11.11% |
Table 9: Cluster 7 Frequency Breakdown 
*A matching set is a top and bottom of an outfit that match in color and material; made to be 
worn together 
 
Brands Represented (10/26): 7 Days Active, Hermes, Jil Sander, KNWL, Miu Miu, Rotate, Tory 
Burch, Wood Wood, Yves Saint Laurent, Zimmermann 

[Pie chart “Colors”: Red 39%, Other 28%; the Black and Brown slices are labeled 17% and 16%]
Figure 44: Cluster 7 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
 
[Pie chart “Item Types”: Dress 39%, Pants 50%, Shorts 11%, Other 0%]
Figure 45: Cluster 7 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Floral 11%, Leather 33%, Mesh/Sheer 11%, Other 45%]
Figure 46: Cluster 7 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
Cluster 8 
Total images: 101 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| White | 22 | 21.78% | Dress | 20 | 19.8% | Floral | 10 | 9.9% |
| Brown | 18 | 17.82% | Jacket | 17 | 16.83% | Stripes | 16 | 15.84% |
| Blue | 18 | 17.82% | Matching Set* | 22 | 21.78% | Leather | 14 | 13.86% |
Table 10: Cluster 8 Frequency Breakdown 
*A matching set is a top and bottom of an outfit that match in color and material; made to be 
worn together 
 
Brands Represented (24/26): 7 Days Active, Acne, Balmain, Chanel, Dolce & Gabbana, Erdem, 
Feben, Ganni, Hermes, Isabel Marant, Jil Sander, KNWL, Loewe, Miu Miu, Numero21, 
OperaSport, Rotate, Salvatore Ferragamo, Tory Burch, Undercover, Wood Wood, Yves Saint 
Laurent, Zimmermann 

[Pie chart “Colors”: White 22%, Brown 18%, Blue 18%, Other 42%]
Figure 47: Cluster 8 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
 
[Pie chart “Item Types”: Dress 20%, Jacket 17%, Matching Set* 22%, Other 41%]
Figure 48: Cluster 8 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Floral 10%, Stripes 16%, Leather 14%, Other 60%]
Figure 49: Cluster 8 Materials/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
 
Cluster 9 
Total images: 77 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| White | 25 | 32.47% | Dress | 32 | 41.56% | Floral | 14 | 18.18% |
| Green | 19 | 24.68% | Blouse | 9 | 11.69% | Stripes | 9 | 11.69% |
| Grey | 15 | 19.48% | Matching Set* | 8 | 10.39% | Mesh/Sheer | 11 | 14.29% |
 
Table 11: Cluster 9 Frequency Breakdown 
*A matching set is a top and bottom of an outfit that match in color and material; made to be 
worn together 
 
Brands Represented (15/26): Acne, Balmain, Chanel, Dolce & Gabbana, Fendi, Hermes, Isabel 
Marant, Loewe, Prada, Rotate, Salvatore Ferragamo, Tory Burch, Yves Saint Laurent, 
Zimmermann 

[Pie chart “Colors”: White 32%, Green 25%, Grey 20%, Other 23%]
Figure 50: Cluster 9 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
[Pie chart “Item Types”: Dress 42%, Blouse 12%, Matching Set* 10%, Other 36%]
Figure 51: Cluster 9 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
 
 
[Pie chart “Material/Print”: Floral 18%, Stripes 12%, Mesh/Sheer 14%, Other 56%]
Figure 52: Cluster 9 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
Cluster 10 
Total images: 48 
| Top Colors | Frequency | Proportion | Top Item Types | Frequency | Proportion | Top Material/Print | Frequency | Proportion |
|---|---|---|---|---|---|---|---|---|
| Grey | 15 | 31.25% | Dress | 25 | 52.08% | Floral | 8 | 16.67% |
| Black | 9 | 18.75% | Jacket | 10 | 20.83% | Stripes | 4 | 8.33% |
| Pink | 8 | 16.67% | Blouse | 7 | 14.58% | Animal Print | 4 | 8.33% |
Table 12: Cluster 10 Frequency Breakdown 
 
*A matching set is a top and bottom of an outfit that match in color and material; made to be 
worn together 
 
Brands Represented (13/26): Chanel, Erdem, Feben, Fendi, Ganni, Hermes, Jil Sander, 
Numero21, OperaSport, Salvatore Ferragamo, Tory Burch, Undercover, Zimmermann 

[Pie chart “Colors”: Grey 31%, Black 19%, Pink 17%, Other 33%]
Figure 53: Cluster 10 Color Frequency Chart 
This pie chart depicts the top three colors in terms of their frequency among the entire cluster. 
Colors not in the top 3 are collected under “Other”. 
 
 
 
 
 
[Pie chart “Item Types”: Dress 52%, Jacket 21%, Blouse 15%, Other 12%]
Figure 54: Cluster 10 Item Type Frequency Chart 
This figure depicts the top three item styles in terms of proportion to the entire cluster. Items not in 
the top 3 are classified under “Other”. 
[Pie chart “Material/Print”: Floral 17%, Stripes 8%, Animal Print 8%, Other 67%]
Figure 55: Cluster 10 Material/Print Frequency Chart 
This figure is a pie chart that displays the proportion of images that include the top three 
material/prints within the cluster. Those materials/prints that did not make up the top three 
materials/prints are included under “Other”. 
 
4.3 Results 
Cluster 0 is represented mostly by the color white, dresses, and mesh or sheer materials. 
Cluster 1 consists mostly of the color black, jackets, and leather materials. Cluster 2 is made 
up of items in the color black, dresses, and mesh or sheer materials. Cluster 3 is represented by 
the color black, dresses, and leather materials. Cluster 4 consists of the color black, dresses, and 
mesh or sheer materials. Cluster 5 contains mostly black colors and dresses, with a three-way tie 
among leather, mesh/sheer, and floral for material/print. Cluster 6 is made up mostly of white 
colors, dresses, and floral patterns. Cluster 7 is represented by the color red, pants, and leather 
materials. Cluster 8 is characterized by white colors, matching sets, and stripes. Cluster 9 
consists of white colors, dresses, and floral patterns. Cluster 10 contains grey colors, dresses 
(the most frequent item type in Table 12), and floral patterns. 
Outside of black, white, and grey, the most common colors were blue, which appeared in 
the top three colors of four clusters, and red, which appeared in the top three colors of three 
clusters. Green, brown, and yellow each appeared in the top colors of two clusters. 
Pink, orange, and purple each appeared in one cluster’s top three colors. 
Dresses appeared in every cluster’s top three item types. Counting from Tables 2 through 12, 
jackets appeared in the top three item types of eight clusters. Pants and matching sets were 
each identified in five clusters’ top three item types. Blouses and shorts were each identified 
in two clusters’ top three item types. 
The most common materials/prints in Tables 2 through 12 were mesh/sheer, which appeared 
in the top three materials/prints of nine clusters, and floral, which appeared in eight. Leather 
followed, appearing in the top three of seven clusters. Stripes appeared in five clusters’ top 
three materials/prints. Animal print was present in two clusters’ top three materials/prints. 
Abstract designs and sparkles each appeared in one cluster’s top three materials/prints. 
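The appearance counts in this section can be cross-checked by counting directly from the frequency tables. The sketch below transcribes the top-three entries of Tables 2 through 12 (one list per cluster, in cluster order) and normalizes spelling variants such as “Mesh/ Sheer” and “Animal print”; the data layout and the function name appearances are assumptions made for this example.

```python
from collections import Counter

# Top-three entries per cluster (Cluster 0 to Cluster 10), transcribed from Tables 2-12.
top_three_by_cluster = {
    "color": [
        ["White", "Yellow", "Grey"], ["Grey", "Black", "Red"],
        ["Black", "Purple", "Blue"], ["Black", "Red", "Blue"],
        ["Black", "Grey", "Blue"], ["Black", "Grey", "Green"],
        ["White", "Yellow", "Orange"], ["Red", "Black", "Brown"],
        ["White", "Brown", "Blue"], ["White", "Green", "Grey"],
        ["Grey", "Black", "Pink"],
    ],
    "item type": [
        ["Dress", "Jacket", "Matching Set"], ["Dress", "Jacket", "Matching Set"],
        ["Dress", "Pants", "Jacket"], ["Dress", "Pants", "Jacket"],
        ["Dress", "Pants", "Jacket"], ["Dress", "Shorts", "Jacket"],
        ["Dress", "Pants", "Matching Set"], ["Dress", "Pants", "Shorts"],
        ["Dress", "Jacket", "Matching Set"], ["Dress", "Blouse", "Matching Set"],
        ["Dress", "Jacket", "Blouse"],
    ],
    "material/print": [
        ["Mesh/Sheer", "Floral", "Abstract Designs"], ["Leather", "Stripes", "Mesh/Sheer"],
        ["Mesh/Sheer", "Floral", "Leather"], ["Leather", "Animal Print", "Mesh/Sheer"],
        ["Leather", "Sparkles", "Mesh/Sheer"], ["Leather", "Floral", "Mesh/Sheer"],
        ["Floral", "Stripes", "Mesh/Sheer"], ["Floral", "Leather", "Mesh/Sheer"],
        ["Floral", "Stripes", "Leather"], ["Floral", "Stripes", "Mesh/Sheer"],
        ["Floral", "Stripes", "Animal Print"],
    ],
}

def appearances(top_threes):
    """Count in how many clusters each label appears among the top three."""
    return Counter(label for trio in top_threes for label in trio)

counts = {category: appearances(trios) for category, trios in top_three_by_cluster.items()}
for category, counter in counts.items():
    print(category, counter.most_common())
```

Counting appearances treats every cluster equally regardless of its size, so a trend in the 18-image Cluster 7 carries the same weight as one in the 132-image Cluster 1.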
| Key Color Trends Identified | Key Item Type Trends Identified | Key Material/Print Trends Identified |
|---|---|---|
| Black | Dress | Mesh/Sheer |
| White | Jacket | Floral |
| Grey | Matching Set | Leather |
| Red | Pants | Stripes |
| Blue | Blouse | Animal Print |
Table 13: Key Trend Table Summary 
This table identifies the top five trends from all the clusters combined. 
4.4 Accuracy & Evaluation 
A key aspect of applying any technical approach to a traditional practice is assessing the 
accuracy of the new approach. In this scenario, accuracy is best evaluated by comparing our 
results against a traditional method: magazine articles identifying spring/summer trends. We 
first selected Glamour Magazine’s February 2024 article, “Meet the Spring 2024 Fashion Trends 
You Should Know (and Shop) Now”. Author Jake Henry Smith highlights 14 trends in the article, 
several of which match our identifications. Among the 14 trends, Glamour’s article mentioned 
burgundy, animal print, sky blue, stripes, and sheer layering (Smith 2024).  
We then evaluated British Vogue’s March 2024 article “The Key Spring/Summer 2024 
Trends To Know Now” by Ellie Pithers. Pithers notes that this season’s “palette was muted, with 
black and white blotting out” the typically brighter colors associated with summer fashion but 
noted red as one of the “few tones” to make it in the collections (Pithers 2024). The article 
highlights shorts, white dresses, roses/florals and sheer skirts among the ten mentioned trends.  
Finally, The Cut published an article in February 2024 called “Five Spring 2024 Trends 
that Depict the Season” by fashion market editor Cortne Bonilla. Bonilla identifies white dresses, 
baby blue, the color black, and transparent (or sheer) materials in her writing (Bonilla 2024). 
 
Trends also identified in above publications: ✓ 
Trends NOT identified in above publications: ✗ 

| Key Color Trends Identified | Key Item Type Trends Identified | Key Material/Print Trends Identified |
|---|---|---|
| Black ✓ | Dress ✓ | Mesh/Sheer ✓ |
| White ✓ | Jacket ✗ | Floral ✓ |
| Grey ✗ | Matching Set ✗ | Leather ✗ |
| Red ✓ | Pants ✗ | Stripes ✓ |
| Blue ✓ | Blouse ✗ | Animal Print ✓ |
Table 14: Trends Seen in Publications  
This table identifies which of our identified trends also appeared in publications. 
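The comparison against the publications amounts to a set overlap, sketched below. The mapping from each publication’s wording to our trend labels is an assumption made for this example (burgundy is counted as Red, sky blue and baby blue as Blue, roses/florals as Floral, sheer or transparent materials as Mesh/Sheer, and white dresses as both White and Dress); a different mapping would change the result.

```python
# Our key trends, as listed in Table 13.
our_trends = {
    "color": ["Black", "White", "Grey", "Red", "Blue"],
    "item type": ["Dress", "Jacket", "Matching Set", "Pants", "Blouse"],
    "material/print": ["Mesh/Sheer", "Floral", "Leather", "Stripes", "Animal Print"],
}

# Trends mentioned by each publication, mapped onto our labels (assumed mapping).
publication_mentions = {
    "Glamour": {"Red", "Animal Print", "Blue", "Stripes", "Mesh/Sheer"},
    "British Vogue": {"Black", "White", "Red", "Shorts", "Dress", "Floral", "Mesh/Sheer"},
    "The Cut": {"White", "Dress", "Blue", "Black", "Mesh/Sheer"},
}

# A trend counts as corroborated if at least one publication mentions it.
mentioned = set().union(*publication_mentions.values())
corroborated = {
    category: [trend for trend in trends if trend in mentioned]
    for category, trends in our_trends.items()
}
missing = {
    category: [trend for trend in trends if trend not in mentioned]
    for category, trends in our_trends.items()
}

total_trends = sum(len(trends) for trends in our_trends.values())
total_corroborated = sum(len(trends) for trends in corroborated.values())
print(f"{total_corroborated} of {total_trends} trends corroborated")
for category in our_trends:
    print(category, "missing:", missing[category])
```

Shorts appears in a publication but not in our key trends, so this sketch measures only how many of our trends were corroborated, not how many published trends we missed.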
 
 
 
 
Chapter 5: Conclusion 
 
Of the fifteen trends that our method identified, nine were also highlighted in at least 
one of the three fashion publications above. This left six trends that were not 
corroborated by traditional methods. Of those six, four were in the Item Type trend 
category, alongside one color trend and one material/print trend. 
This suggests that our cluster analysis struggled mainly to cluster and identify item 
types. Given that K-means clustering on pixel values does not identify specific items, and 
instead relies upon features such as colors and patterns, this result is understandable. 
However, the K-means clustering method was successful in grouping images in a way that 
helped to identify trends in colors and materials/prints. 
Limitations of this method lie in differentiation between articles of clothing. In further 
research, we would suggest adopting a method that segments images into different clothing 
pieces prior to extracting features and clustering. This would ensure that each individual item of 
clothing is compared rather than each image. An example of methodology to consider would be 
“Neo-Fashion”, created in 2021 by Li Zhao, Muzhen Li, and Peng Sun as referred to in the 
previously mentioned literature review (Zhao, Li, and Sun 2024). Additionally, we might 
consider more detailed clothing Item Type labels, such as high-waisted pants or capris 
instead of only pants, to allow for more varied identification. 
Another suggestion for further research would be to include images from social media 
sites. As social media usage grows and platforms such as Instagram and TikTok 
implement shopping options for users, clothing trends are likely to emerge from, or at least 
be influenced by, these sites. To approach this, we would recommend creating a data set 
that combines runway fashion images with social media images to build a more 
comprehensive trend identification method. 
We hope that this research can be applied to inventory management and purchasing for 
clothing retailers of any size. By understanding which trends are coming in future seasons, 
retailers can purchase inventory ahead of time that more accurately reflects what consumers 
will be interested in. Additionally, ordering inventory that aligns with trends should 
reduce the amount of clothing that is either discounted or ends up in landfills as textile 
waste. This approach has the potential to save retailers money and reduce pollution around 
the world. Further, this trend identification method can be adapted to other consumer 
purchasing trends. For example, if similar trend identification methods are used for home 
products, makeup, or technology, production and inventory can be planned accordingly.  
Overall, clustering methods such as K-Means are a highly accessible and simple way 
to introduce machine learning techniques into industry and inventory management. They are easily 
understood and implemented, even by those with limited experience in machine learning. Our 
hope is that this research will promote better planning to reduce consumer product waste and 
encourage others to apply traditionally academic methods to real-world issues or interests such as 
fashion trends. 
 
 
 
 
 
 
 
 
References 
Aggarwal, Charu C., and Chandan K. Reddy. 2014. Data Clustering: Algorithms and 
Applications. 
ArcGIS Pro. n.d. ‘How Density-Based Clustering Works’. ArcGIS Pro. Accessed 4 May 
2024. https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/how-
density-based-clustering-works.htm. 
Balchandani, Anita, Ewa Starzynska, David Barrelet, Gemma D’Auria, Felix Rolkens, and 
Imran Amed. 2024. ‘The State of Fashion 2024’. McKinsey & Company. 2024. 
https://www.mckinsey.com/industries/retail/our-insights/state-of-fashion. 
Belcher, Byron T., Eliana H. Bower, Benjamin Burford, Maria Rosa Celis, Ashkaan K. 
Fahimipour, Isabela L. Guevara, Kakani Katija, et al. 2023. ‘Demystifying Image-
Based Machine Learning: A Practical Guide to Automated Analysis of Field 
Imagery Using Modern Machine Learning Tools’. Frontiers in Marine Science 10 
(June): 1157370. https://doi.org/10.3389/FMARS.2023.1157370. 
Benslama, Teissir, and Rim Jallouli. 2020. ‘Clustering of Social Media Data and Marketing 
Decisions’. Lecture Notes in Business Information Processing 395: 53–65. 
https://doi.org/10.1007/978-3-030-64642-4_5. 
Bonilla, Cortne. 2024. ‘Five Spring 2024 Trends You Can Shop Now’. 2024. 
https://www.thecut.com/article/spring-summer-2024-trends-buy-now.html. 
Gaddamadugu, Indu. 2023. ‘Applications of Technology in Fashion Trend Forecasting’. 
Illumin Magazine, 2023. https://illumin.usc.edu/applications-of-technology-in-
fashion-trend-forecasting/. 
Greenfield, Nicole. 2023. ‘New York Is Exposing the Fashion Industry for What It Is: A 
Climate Nightmare’. Natural Resources Defense Council. 2023. 
Huang, Fu Hsien, Hsin Min Lu, and Yao Wen Hsu. 2021. ‘From Street Photos to Fashion 
Trends: Leveraging User-Provided Noisy Labels for Fashion Understanding’. IEEE 
Access 9: 49189–205. https://doi.org/10.1109/ACCESS.2021.3069245. 
James Hillman Fashion Consultancy. 2021. ‘What Is a Fashion Trend?’ James Hillman 
Fashion Consultancy. 2021. https://www.jameshillman.co.uk/blog/2021/3/4/what-is-
a-fashion-trend. 
Maloney, Carolyn B. 2019. ‘The Economic Impact of the Fashion Industry’. United States 
Joint Economic Committee. 2019. 
https://www.jec.senate.gov/public/index.cfm/democrats/2019/2/the-economic-
impact-of-the-fashion-industry. 
MasterClass. 2021. ‘Fashion Trend Forecasting: How Brands Predict New Styles’. 2021. 
https://www.masterclass.com/articles/fashion-trend-forecasting-guide. 
McAuley, Julian, Christopher Targett, and Anton van den Hengel. 2015. ‘Image-Based 
Recommendations on Styles and Substitutes’. 
https://doi.org/10.1145/2766462.2767755. 
Norouzzadeh, Mohammad Sadegh, Anh Nguyen, Margaret Kosmala, Alexandra Swanson, 
Meredith S. Palmer, Craig Packer, and Jeff Clune. 2018. ‘Automatically Identifying, 
Counting, and Describing Wild Animals in Camera-Trap Images with Deep 
Learning’. Proceedings of the National Academy of Sciences of the United States of 
America 115 (25): E5716–25. 
https://doi.org/10.1073/PNAS.1719367115. 
Oyelade, Jelili, Itunuoluwa Isewon, Funke Oladipupo, Olufemi Aromolaran, Efosa 
Uwoghiren, Faridah Ameh, Moses Achas, and Ezekiel Adebiyi. 2016. 
‘Clustering Algorithms: Their Application to Gene Expression Data’. Bioinformatics 
and Biology Insights 10 (November): 237–53. 
https://doi.org/10.4137/BBI.S38316. 
Peng, Feng, and Kai Li. 2023. ‘Deep Image Clustering Based on Label Similarity and 
Maximizing Mutual Information across Views’. Applied Sciences 13 (1): 674. 
https://doi.org/10.3390/APP13010674. 
Pithers, Ellie. 2024. ‘The 10 Key Spring/Summer 2024 Fashion Trends To Know Now’. 
British Vogue. March 2024. https://www.vogue.co.uk/article/spring-summer-2024-
fashion-trends. 
Singh, Pawan Kumar, Yadunath Gupta, Nilpa Jha, and Aruna Rajan. 2019. ‘Fashion Retail: 
Forecasting Demand for New Items’. https://doi.org/10.1145/1122445. 
Skenderi, Geri, Christian Joppi, Matteo Denitto, and Marco Cristani. 2024. ‘Well Googled 
Is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-
Based Google Trends’. https://github.com/HumaticsLAB/GTM-Transformer. 
Smith, Jake Henry. 2024. ‘12 Spring 2024 Fashion Trends to Start Shopping Now’. 
Glamour. April 2024. https://www.glamour.com/story/2024-fashion-trends. 
Viegas, Jennifer. 2011. ‘Humans First Wore Clothing 170,000 Years Ago’. NBC News. 
2011. https://www.nbcnews.com/id/wbna40965564. 
Vincent, Tom, Kenji Kawahara, Vladimir Antonov, Hiroki Ago, and Olga Kazakova. 2023. 
‘Data Cluster Analysis and Machine Learning for Classification of Twisted Bilayer 
Graphene’. Carbon 201 (January): 141–49. 
https://doi.org/10.1016/J.CARBON.2022.09.021. 
Vittayakorn, Sirion, Kota Yamaguchi, Alexander C. Berg, and Tamara L. Berg. 2015. 
‘Runway to Realway: Visual Analysis of Fashion’. Proceedings - 2015 IEEE Winter 
Conference on Applications of Computer Vision, WACV 2015, February, 951–58. 
https://doi.org/10.1109/WACV.2015.131. 
Yadav, Mrinal. 2020. ‘DBSCAN ALGORITHM’. Medium. 2020. 
https://mrinalyadav7.medium.com/dbscan-algorithm-c894701306d5. 
Yan, Cairong, Umar Subhan Malhi, Yongfeng Huang, and Ran Tao. 2019. ‘Unsupervised 
Deep Clustering for Fashion Images’. Communications in Computer and 
Information Science 1027: 85–96. https://doi.org/10.1007/978-3-030-21451-7_8. 
Zhao, Li, Muzhen Li, and Peng Sun. 2024. ‘Neo-Fashion: A Data-Driven Fashion Trend 
Forecasting System Using Catwalk Analysis’. Clothing and Textiles Research 
Journal. https://doi.org/10.1177/0887302X211004299. 
  
 
 
 
 