AN ALGORITHM FOR CLIPPING POLYGONS OF LARGE GEOGRAPHICAL DATA

by

AREEJ SALEH ALGHAMDI

A THESIS

Presented to the Department of Computer and Information Science and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Master of Science

March 2017

THESIS APPROVAL PAGE

Student: Areej Saleh Alghamdi

Title: An Algorithm for Clipping Polygons of Large Geographical Data

This thesis has been accepted and approved in partial fulfillment of the requirements for the Master of Science degree in the Department of Computer and Information Science by:

Hank Childs, Chairperson

and

Scott L. Pratt, Dean of the Graduate School

Original approval signatures are on file with the University of Oregon Graduate School.

Degree awarded March 2017

© 2017 Areej Saleh Alghamdi

THESIS ABSTRACT

Areej Saleh Alghamdi

Master of Science

Department of Computer and Information Science

March 2017

Title: An Algorithm for Clipping Polygons of Large Geographical Data

We present an algorithm for overlaying polygonal data with regular grids and calculating the percentage overlap for each cell in the regular grid. Our algorithm is able to support self-intersecting polygons, meaning that some spatial regions may be covered by two or more polygons. Our algorithm is able to identify these cases and eliminate redundant polygons, preventing erroneous results. We also present an optimized version of our algorithm that uses spatial sorting through interval trees, and provide a performance comparison between the optimized and unoptimized versions. Finally, we apply our algorithm to geography data, specifically of bark beetle infestation.

CURRICULUM VITAE

NAME OF AUTHOR: Areej Saleh Alghamdi

GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene, OR

DEGREES AWARDED:
Master of Science, Computer and Information Science, 2017, University of Oregon
Bachelor of Science, Computer and Information Science, 2014, University of Oregon

AREAS OF SPECIAL INTEREST:
Scientific Visualization
High Performance Computing

GRANTS, AWARDS, AND HONORS:
King Abdullah Scholarship for a Bachelor Degree, Ministry of Higher Education, Saudi Arabia, March 2009 – March 2014
King Abdullah Scholarship for a Master Degree, Ministry of Higher Education, Saudi Arabia, April 2014 – March 2017

ACKNOWLEDGMENTS

First and foremost, I would like to express my gratitude to Allah (God) for giving me the ability and strength to complete this work successfully. I cannot begin to express my gratitude to my family for their genuine love and encouragement all through this journey. To my Father and Mother, for their unconditional and endless love and care throughout my life, who raised me to be the person I am today, one who always aspires to the best, and whom I consider my greatest role models in life. To my husband, my soul mate, for his constant love, care, motivation, and understanding. Also, to my siblings, for always cheering me up and making me feel loved, appreciated, and proud of whatever I do, small or big. Also, to my two little angels: Maya, who is still too young to understand any of this, and my baby boy, who is yet to be born. I love you all.

My greatest appreciation and sincere thanks go to my advisor and mentor, Dr. Hank Childs, for his willingness to offer his support, valuable feedback, and especially for his confidence in me. Dr. Childs, a great and valuable portion of my knowledge and love for the field of computer science came from you, and I am so honored to have been one of your students.
This thesis is dedicated to my parents, my husband, and my siblings.

TABLE OF CONTENTS

I. INTRODUCTION
II. RELATED WORK
III. DATA DESCRIPTION
IV. ALGORITHM
    Algorithm Initialization
    Algorithm 1 (Unoptimized)
        Pseudocode
        Full Description of Code
            Finding Cells Covered by Every Triangle in the Data Set
            Clipping Routine Algorithm
            Detailed Description of the Clipping Routine Functions
                Processing Triangles for Clipping against the Maximum X Limit of a Cell
                Clipping a Triangle against the Maximum X Limit of a Cell
            Calculating Area Covered and Removing Overlap between Polygons in the Data
    Algorithm 2 (Optimized)
        Pseudocode
        Full Description of Code
            Interval Tree Construction and Usage
            Detailed Description of Finding the Area of Overlap between Two Triangles Functions
                Finding the Intersection Points between the Edges of Two Triangles
                Testing if any of the Two Triangle Vertices are inside of the Other Triangle
                Calculating the Area of Overlap between the Two Triangles Using Found Points of Intersections
V. TESTING AND RESULTS
    Experiments Overview
    Factors
        Optimization Settings
        Grid Sizes
        Data Sets
    Results and Performance Analysis
VI. CONCLUSION

REFERENCES CITED

LIST OF FIGURES

1. Data Overlap Illustration
2. 4 Million Triangles of the Intermountain (IM) Data with Polygon Overlap
3. 4 Million Triangles of IM Data without Polygon Overlap
4. Australia with Overlap among Polygons
5. Australia without Overlap among Polygons
6. The Unoptimized Algorithm Pseudocode
7. The Process of Finding the Cells that Overlap with a Triangle
8. The Process of Clipping a Triangle against a Cell's Boundaries
9. Example to Show the Process of Clipping a Triangle against the Maximum X Value of a Cell
10. The Output after Clipping a Triangle against the Maximum X Value of a Cell
11. Example to Illustrate the Process of Calculating the Area Covered of a Cell and Removing the Overlap
12. The Optimized Algorithm Pseudocode
13. Performance Trend Comparison between the Optimized and Unoptimized Algorithm

LIST OF TABLES

1. Execution Times Comparison between the Optimized and Unoptimized Code
2. Execution Times for Building the Interval Trees

CHAPTER I
INTRODUCTION

The use of interactive and scientific visualization and data analysis techniques has become a necessity, and scientific researchers in various fields are becoming more inclined to use them. These techniques help researchers understand their collected data, simulate experiments, or propose methodologies to gain a better perspective on their data. In this thesis, we explore visualization techniques to assist with problems from the geography domain, specifically those coming from Professor Chris Bone of the Department of Geography.

The data that inspired this project is a large scientific data set collected over a period of 18 years. It has been collected aerially from tree-covered areas in the United States. It reflects the loss and "mortality over millions of hectares of forests" caused by the mountain pine beetle, which is "a native insect that occurs in pine forests over much of western North America, extending from northern Mexico to northwestern British Columbia (BC) and from the Pacific Ocean east to the Black Hills of South Dakota (Wood 1982)" [1]. This collected data is important because it captures the harmful effect of the mountain pine beetle on pine forests. Professor Bone is studying and analyzing this data to further understand the history and effect of the bark beetle on pine trees, which are among our greatest national treasures.
The research in this thesis was inspired by the idea of helping him to visualize his data. The data is divided into six geographical regions of the United States map: Northern, Intermountain, Pacific Northwest, Pacific Southwest, Rocky Mountain, and Southwest. For every region, there is a shapefile that contains the aerially collected data on the areas of trees killed by the bark beetle. Each such area is stored as a polygon. Even though each geographical region's data is independent of the others, within a single region the polygons may overlap with one another. This overlap causes miscalculations and produces erroneous results.

This thesis describes an approach to calculate the percentage of trees attacked in a certain area while dealing with the problem of the overlap within the data. The goal is to create 2D histograms with a "percent filled" value for each bin. The percentage for a bin is calculated by computing the area of the bin covered by the data that overlaps with it and then dividing that area by the whole area of the bin. However, a naive approach would lead to over-counting of the area covered, due to the overlap. We solve this problem by efficiently identifying overlapping data and clipping away the redundant regions to get the correct answer.

The contribution of this thesis is twofold. First, it describes an algorithm for efficiently and correctly producing 2D histograms that enable geographers to understand the impact of the bark beetle on pine trees. This approach can be applied to other types of data as well. The second contribution is a study of the efficiency of the approach, varying over several configurations and choices of search structures.

In Chapter II, we survey previous work in polygon clipping and finding intersections between polygons. Chapter III describes the data that inspired this thesis, how it is configured, and the specific problems with it, and provides samples of the data processed and visualized before and after the problems are addressed. In Chapter IV, we present our approach and describe the algorithms we developed for mitigating the problems within the data. Finally, in Chapter V we present our experiments and results.

CHAPTER II
RELATED WORK

In this chapter, we survey previous work in polygon clipping and finding intersections between polygons.

Weiler and Atherton [2] present an algorithm for removing hidden lines and surfaces. The algorithm can clip two general concave polygons with holes. It outputs polygons of the same form as the input polygons, which means the output polygons can be fed back into the system for further processing. The processing time is related to the complexity of the final produced image.

Huang and Liu [3] describe a new line clipping algorithm that clips lines against general polygons. The algorithm transforms the line segment to be clipped into a horizontal line using shearing transformations, and, using the same parameters as those used to transform the line, every edge of the polygon is transformed with the same shearing transformation. These transformations simplify the clipping and result in faster processing because of the reduced number of calculations compared to previous clipping algorithms.

Sproull and Sutherland [4] developed a clipping divider. It is a technology that allows a display system to present a magnified portion of any large image, and it frees the programmer from the hassle of dealing with the difficulties of bit-packing and resolution in displays.
Cyrus and Beck [5] describe a clipping algorithm and how it was implemented in 2D and 3D. The algorithm checks for intersection between a line and a convex planar polyhedron. Their implementation of the algorithm can run in parallel, which results in a reduction in execution time.

Sutherland and Hodgman [6] introduce reentrant polygon clipping. It uses 2D polygon clipping to render images in 3D by removal of hidden surfaces. The algorithm applies to convex, planar or non-planar polygons. The polygons in this algorithm are defined by their vertices only, not their edges. This approach proved to simplify the process of clipping polygons. An extension to the algorithm can deal with concave and self-intersecting polygons by producing degenerate edges.

Barsky and Liang [7] present a new method of line clipping in 2D, 3D, and 4D against a rectangular clipping window. This algorithm maps the line segment to be clipped into a parametric representation. The main advantages claimed for this algorithm are that it rejects invisible segments, i.e., segments outside the boundary of the clipping window, and that it can be generalized for clipping against any convex polygon. The authors claim that this algorithm is faster than the algorithm developed by Sutherland and Hodgman [6] but requires floating point computations. The main limitation is that the clip polygon must always be a rectangle with its sides parallel to the axes.

Nicholl et al. [8] developed an algorithm for clipping lines in 2D against a rectangular window. The algorithm performs better than the Barsky-Liang and Cohen-Sutherland algorithms [7], [9:113], and the performance comparisons between their algorithm and the two older ones are machine independent. The algorithm finds the points of intersection between the line segment and the boundaries of the window, defined by x-left and x-right and y-top and y-bottom. If an intersection exists, the coordinates of the endpoints of the line segment intersecting the window are calculated.

Maillot [10] describes a 2D algorithm for clipping polygons that adopts the Cohen-Sutherland line clipping algorithm. The author chose this algorithm rather than the Weiler and Atherton [2] or Barsky and Liang [7] algorithms because it has an efficient process for acceptance and rejection cases and also because it covers more applications, as it can handle both integer and floating point computations depending on how it is implemented.

Day [11] presents a new algorithm for line clipping that can be used with both floating point arithmetic and integer arithmetic, i.e., in object and image spaces respectively. The clipping is done against rectangular windows. In comparison to other algorithms, this algorithm uses fewer operations for the cases where the line segment being clipped resides completely within the clipping window. The algorithm also performed better with smaller window sizes.

Möller [12] introduces intersection tests for determining whether two triangles intersect; the tests can be used in developing algorithms for collision detection. The algorithm computes plane equations for the triangles and uses rejection of trivial cases as well as computation of intersection intervals to test whether the two subject triangles intersect or are co-planar.

Greiner and Hormann [13] present an algorithm for clipping 2D arbitrary closed polygons. The algorithm handles the general case of arbitrary polygons, as do the Weiler-Atherton and Vatti algorithms [2], [14].
The general case of arbitrary polygons is the case where the intersecting polygons are not convex and might self-intersect. This algorithm, like the similar previous algorithms it is compared against [2], [14], finds the points of intersection between the two subject intersecting polygons. In this algorithm, doubly linked lists are used to represent all the polygons, i.e., the intersecting polygons as well as the polygons resulting from the intersection of the two original ones are represented as doubly linked lists. Their results showed that their algorithm is faster than Vatti's algorithm [14].

Liu et al. [15] propose a polygon clipping algorithm for concave and convex polygons with holes. The algorithm uses entry/exit classification for intersection points and a singly linked list data structure. The algorithm is based on three main phases: first, the intersection points between the clip and subject polygons are found, classified by their entry/exit status, and added to the linked list; second, the polygon's edges are clipped using line clipping; and lastly, the list of intersection points is traversed and the resulting polygon is constructed. The line clipping against an arbitrary polygon is done with the algorithm of Liu, the first author of the paper.

Held [16] presents ERIT, a set of intersection tests for finding the intersection between pairs consisting of a polygon and a polygon, or a polygon and a line segment. Its 3D intersection tests work with line segments, triangles, cylinders, cones, and a concave or convex circular arc rotated around an axis. In 2D, an algorithm for finding points of intersection between a triangle and a line segment is provided, as well as a small set of other intersection tests. A triangle/triangle test in 3D is provided, but the approach is to reduce it to a 2D triangle/line-segment intersection test.

Žalik [17] introduces two new algorithms for finding intersection points between polygons. Both algorithms are sweep-line algorithms and can handle polygons with holes. One strikes a middle ground between time complexity and difficulty of implementation, and the second is faster but harder to implement. The first is called a set-based intersection algorithm and the second is called a binary search tree intersection algorithm.

References [18], [19], [2], and [20] are convex hull algorithms for finding convex hull points and sorting point sets. [2] is a variation of Graham's algorithm [18] with improved efficiency.

We have not fully adopted or based our algorithm on any of these previous methods, because our problem is quite different. Our data contains polygons of different types, but we use a filter to transform all polygons into a single type: triangles. The clipping of triangles is done against a square clipping window. Moreover, the intersection is found between two triangles at a time. The intersection area can be of any shape and is calculated after sorting the points of intersection in clockwise order.

CHAPTER III
DATA DESCRIPTION

The data covers the United States, divided into six geographical regions: Northern, Intermountain, Pacific Northwest, Pacific Southwest, Rocky Mountain, and Southwest. Each region has its own data file. Areas of lost trees are represented by enclosed regions, each defined by a list of points (2D coordinates). From a computer scientist's point of view, the data is stored in Shapefiles, one for every region.
"A shapefile stores nontopological geometry and attribute information for the spatial features in a data set. The geometry of a feature is stored as a shape comprising a set of vector coordinates" [21]. The area of lost trees in a specific region is represented by enclosed regions of polygons. The polygons are of different types: triangles, quadrilaterals, etc. When the polygons overlap, an overcounting occurs when performing certain calculations. What we mean by data overlapping is that multiple polygons intersect. In this thesis specifically, we are concerned about the overlap of polygons because the overlap happens within bins. The polygons cover a certain percentage of a bin, and the goal is to calculate that percentage. Polygon overlaps cause overcalculation when an area of intersection between any number of polygons is counted more than once.

Figure 1 shows two triangles overlapping. Within the highlighted bin, T1 and T2 intersect. The intersection area, shown in red, is considered twice when naively calculating the area that the two small triangles cover in the blue bin. Figure 2 shows a 2D histogram of 4 million triangles of the Intermountain (IM) region of the bark beetle data in a 1000 by 1000 grid. This histogram is of data processed without dealing with the overlap within the data. The percentage covered in the bins ranges from 0 to 277.7%. This shows that overlap among the polygons within the bins causes overcalculation of the area covered and, subsequently, of the percentage covered of the cell.

FIGURE 1. Data Overlap Illustration.

Figure 3 shows a 2D histogram of 4 million triangles of the bark beetle data in the IM region in a 1000 by 1000 grid. In this case, the overlap among the polygons has been dealt with by our algorithm. Figure 4 shows a different data set that is smaller than the bark beetle data set, but it produces 2D histograms that make it easier to see the difference in the results when the overlap between the polygons is dealt with and when it is not. Figure 4 is the 2D histogram of the Australia map where overlap among polygons is ignored, and Figure 5 is of the same data set but with overlaps among polygons removed.

FIGURE 2. 4 Million Triangles of the Intermountain (IM) Data with Polygon Overlap.

FIGURE 3. 4 Million Triangles of IM Data without Polygon Overlap.

FIGURE 4. Australia with Overlap among Polygons.

FIGURE 5. Australia without Overlap among Polygons.

CHAPTER IV
THE ALGORITHM

In this chapter, we present our algorithm's initialization step, its pseudocode, and detailed descriptions of its routines.

4.1. Algorithm Initialization

The input to the algorithm is a user-specified number of bins in the 2D histogram. For example, if the user specified 100x200, then a histogram with 100 bins in width and 200 bins in height would be created. There is also an option for the user to specify the spatial range over which to place the histogram. If no spatial range is specified, then the algorithm calculates the bounding box of all input data and uses that bounding box as the spatial range.

The program first reads the data from disk. The data is then converted from polygonal data to triangle data. This conversion makes subsequent overlap calculations easier to perform. The conversion itself is done by a module in the VTK library (vtkTriangleFilter). Next, an array is created for storing the cumulative percentage overlap between triangles and the 2D histogram. A rough sketch of this initialization step is shown below.
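The following is a minimal sketch of the initialization just described, not the thesis's actual code: the real implementation runs as a VisIt filter and reads Shapefiles, whereas this standalone example assumes, purely for illustration, a VTK polydata file with the hypothetical name "region.vtp" and hard-coded bin counts.

```cpp
// Minimal initialization sketch (not the thesis's VisIt filter code).
// Assumptions: input is a .vtp file named "region.vtp"; bin counts are hard-coded.
#include <vector>
#include <vtkSmartPointer.h>
#include <vtkXMLPolyDataReader.h>
#include <vtkTriangleFilter.h>
#include <vtkPolyData.h>

int main()
{
    const int numX = 1000, numY = 1000;            // user-specified number of bins

    // Read the polygonal data from disk.
    auto reader = vtkSmartPointer<vtkXMLPolyDataReader>::New();
    reader->SetFileName("region.vtp");             // hypothetical input file

    // Convert every polygon into triangles (the thesis uses vtkTriangleFilter for this).
    auto triFilter = vtkSmartPointer<vtkTriangleFilter>::New();
    triFilter->SetInputConnection(reader->GetOutputPort());
    triFilter->Update();
    vtkPolyData *triangles = triFilter->GetOutput();

    // Spatial range: user-specified, or else the bounding box of all input data.
    double bounds[6];                              // xmin, xmax, ymin, ymax, zmin, zmax
    triangles->GetBounds(bounds);

    // One accumulator per bin of the 2D histogram, initialized to zero.
    std::vector<double> areaCoveredOfCells(static_cast<size_t>(numX) * numY, 0.0);

    // ... the clipping and area-accumulation passes described next would go here ...
    return 0;
}
```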
This array is referred to as "areaCoveredOfCells." The size of the array is the same as the number of bins in the 2D histogram, and each element of the array is initialized to zero.

There are two algorithms: one is unoptimized, without search structures, and the other is optimized, with search structures.

4.2. Algorithm 1 (Unoptimized)

The main code runs over all triangles. For every triangle, the algorithm finds the cells that are covered, performs clipping while taking into consideration the removal of redundancy caused by overlap among triangles, and updates the bins with their overlap. The problem of overlap within the data and its effect on the accuracy of the calculations is explained in Chapter III.

4.2.1. Pseudocode

In the pseudocode in Figure 6, we use the following short variable names: C is the total number of cells, T is the number of triangles in the data set, AC is the area of any single cell in the grid of size numX * numY, TAC is the number of triangles produced by clipping the original triangle t overlapping with some cell, NTC is the number of triangles in the current cell, and DataStats is the array that holds the dimensions of the grid (numX and numY). The boundaries of a cell are denoted by minX, maxX, minY, and maxY for minimum X, maximum X, minimum Y, and maximum Y, respectively.

FIGURE 6. The Unoptimized Algorithm Pseudocode.

4.2.2. Full Description of Code

4.2.2.1. Finding Cells Covered by Every Triangle in the Data Set

In the "finding cells covered by every triangle" section of the pseudocode (lines 6-20), the algorithm begins by retrieving the triangles from the Shapefile. For every triangle, we first find the bounding box of the triangle and then calculate the normalized bounding box, that is, the bounding box of the triangle mapped to the output mesh, which runs from 0 to numX on the x-axis and from 0 to numY on the y-axis. We then loop from the minX of the triangle's normalized bounding box to its maxX, and from its minY to its maxY, to locate all the cells overlapping with this triangle. For each of these cells, the triangle clipping routine is performed.

To illustrate this process, look at the example in Figure 7 alongside the pseudocode. We have a triangle with the vertices (1,3), (1.5, 2), and (2.5, 4). At line 6 of the pseudocode, we first get the normalized bounding box of the triangle, that is, the bounding box of the original triangle mapped to the mesh. For this example, for simplicity, the normalized bounding box is the same as the original bounding box of the triangle. So, minXNormalized = 1, maxXNormalized = 3, minYNormalized = 2, and maxYNormalized = 4. To find the cell numbers that overlap with the triangle, we loop from minXNormalized to maxXNormalized, with an inner loop from minYNormalized to maxYNormalized. The cell number is calculated as x + y*numX. For this example, numX = 3, numY = 4, minX = minY = 0, and maxX = maxY = 3. So, the first cell that possibly overlaps with the triangle is cell number 1 + (2*3) = 7; the remaining cells are 8, 10, and 11.

FIGURE 7. The Process for Finding Cells that Overlap with a Triangle.

4.2.2.2. Clipping Routine Algorithm

In the "clipping the triangle" section of the pseudocode (lines 14-15), the clipping routine starts with an empty vector, and the triangle to be clipped is pushed into this vector. The routine consists of four functions, each of which takes a vector of triangles and the cell limit being clipped against; a sketch of how these passes chain together is given below.
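The following sketch shows one way to realize the four clipping passes; it is not the thesis's code. The thesis's routines keep the geometry as triangles after every pass, whereas this sketch clips the triangle as a convex polygon through the four half-planes and fan-triangulates once at the end, which produces the same clipped area. The Pt/Tri types are assumptions made for this listing.

```cpp
// Sketch only: clip one triangle against one cell of the histogram grid.
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };
using Tri = std::array<Pt, 3>;

double coord(const Pt &p, int axis) { return axis == 0 ? p.x : p.y; }

// Keep the part of a convex polygon with coordinate <= limit (keepBelow) or >= limit.
std::vector<Pt> clipHalfPlane(const std::vector<Pt> &poly, int axis, double limit, bool keepBelow)
{
    std::vector<Pt> out;
    for (size_t i = 0; i < poly.size(); ++i)
    {
        const Pt &cur = poly[i];
        const Pt &nxt = poly[(i + 1) % poly.size()];
        bool curIn = keepBelow ? coord(cur, axis) <= limit : coord(cur, axis) >= limit;
        bool nxtIn = keepBelow ? coord(nxt, axis) <= limit : coord(nxt, axis) >= limit;
        if (curIn) out.push_back(cur);
        if (curIn != nxtIn)   // the edge crosses the boundary: interpolate a new vertex
        {
            double t = (limit - coord(cur, axis)) / (coord(nxt, axis) - coord(cur, axis));
            out.push_back({cur.x + t * (nxt.x - cur.x), cur.y + t * (nxt.y - cur.y)});
        }
    }
    return out;
}

double triArea(const Tri &t)
{
    return 0.5 * std::fabs((t[1].x - t[0].x) * (t[2].y - t[0].y) -
                           (t[2].x - t[0].x) * (t[1].y - t[0].y));
}

// The four passes of the clipping routine: maxX, minX, maxY, minY.
std::vector<Tri> clipTriangleToCell(const Tri &t, double minX, double maxX,
                                    double minY, double maxY)
{
    std::vector<Pt> poly(t.begin(), t.end());
    poly = clipHalfPlane(poly, 0, maxX, true);    // maxXClip
    poly = clipHalfPlane(poly, 0, minX, false);   // minXClip
    poly = clipHalfPlane(poly, 1, maxY, true);    // maxYClip
    poly = clipHalfPlane(poly, 1, minY, false);   // minYClip

    std::vector<Tri> tris;                        // fan-triangulate the convex remainder
    for (size_t i = 1; i + 1 < poly.size(); ++i)
    {
        Tri piece = {{poly[0], poly[i], poly[i + 1]}};
        tris.push_back(piece);
    }
    return tris;
}

int main()   // usage: the running example's triangle, against a cell assumed to span [1,2] x [2,3]
{
    Tri t = {{{1.0, 3.0}, {1.5, 2.0}, {2.5, 4.0}}};
    double area = 0.0;
    for (const Tri &piece : clipTriangleToCell(t, 1.0, 2.0, 2.0, 3.0))
        area += triArea(piece);
    std::printf("area of the triangle inside the cell = %g\n", area);   // prints 0.5
    return 0;
}
```

The four clipHalfPlane calls play the roles of maxXClip, minXClip, maxYClip, and minYClip in the prose; the cell bounds used in main are an assumption for the example.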
The four functions are maxXClip, minXClip, maxYClip, and minYClip. We clip the triangle first with respect to the maxX of the cell. The maxXClip routine takes the vector with the original triangle and returns the clipped triangle or triangles in a new vector. The triangles produced from clipping with respect to maxX are clipped with respect to the minX of the cell, the resulting vector of triangles from minXClip is processed, and the triangles are clipped with respect to maxY. The same process is repeated with minYClip. After clipping the original triangle against the boundaries of the cell, we add the triangles in the resulting vector to the cell's vector of triangles for use in the later stage of calculating the area of this cell covered by the triangles in it.

To illustrate the clipping routine, let's look at the triangle with the vertices (1,3), (1.5, 2), and (2.5, 4) in Figure 8. The clip routine creates a new empty vector and pushes the triangle to be clipped into it. The vector with the single triangle is passed to the maxXClip function along with the maxX limit of the cell that the triangle overlaps. Inside the maxXClip function, the points of the triangle are sorted clockwise, and every vertex of the triangle is checked against the maxX limit of the cell. For this triangle, the vertex (2.5, 4) is beyond the maxX limit of cell C1, which is equal to 2. Two points are interpolated on the x = 2 axis, and the two triangles T1 and T2 are produced as a result of the clip. The resulting vector of the maxXClip function is passed to the minXClip function. The minXClip function takes the first triangle (T1) in the vector and clips it against the minX limit of the cell C1, which is equal to 1. This clip produces the same triangle (T1). The minXClip function then takes the second triangle (T2) from the vector and clips it against the minX limit of the cell. The clipping of the second triangle (T2) against x = 1 also produces the same triangle (T2). The vector produced by the minXClip function is thus of size two, with the same triangles, T1 and T2, produced by the maxXClip routine. The resulting vector is then passed to be clipped against the maxY limit and the minY limit of the cell C1. The resulting vector after the clipping is a vector with a single triangle with the same vertices as T2.

FIGURE 8. The Process of Clipping a Triangle against a Cell's Boundaries.

4.2.2.3. Detailed Description of the Clipping Routine Functions

To illustrate the process of the clipping routine functions, consider how the first two work. The remaining functions follow the same concept.

4.2.2.3.1. Processing Triangles for Clipping against the Maximum X Limit of a Cell

This is done with a function named maxXClip. It takes a vector of triangles represented by double arrays, and a double maxX value that represents the maximum X limit of the cell that the triangles in the vector are being clipped against. For every triangle in the vector, we call the function maxXClipTri.

4.2.2.3.2. Clipping a Triangle against the Maximum X Limit of a Cell

In this function, named maxXClipTri, the triangle's vertices are first ordered clockwise by their X values. After that, every triangle vertex is checked against the maxX value.
If the x value of a vertex is greater than the maxX limit value, the x value of the vertex is updated to the maxX limit value, and the y value of the vertex is changed to a value interpolated along the edge between the original vertex and the other vertex of the triangle whose x value is less than the maxX value. If the vertex is less than the maxX limit value, then the vertex's x and y values stay the same.

FIGURE 9. Example to Show the Process of Clipping a Triangle against the Maximum X Value of a Cell.

To illustrate the exact process of the clipping, look at Figure 9 as an example. The first vertex is the point P0; since the x value of P0 is greater than the maxX limit of the cell C1, P0's x value is updated to the maxX limit value. Because P1 is less than the maxX limit of the cell C1, we interpolate the new y value for the point P0 along the edge P0-P1. The new x and y values of P0 produce a new point, P0A, and because the other vertex connected to P0, i.e., P2, is also less than the maxX limit of the cell, we get a second new point, P0B, with an x value equal to the maxX limit of the cell and a y value interpolated for the maxX value along the edge P0-P2. The clipping of the original triangle P0 P1 P2 produces two triangles, P1→P2→P0A and P2→P0A→P0B, shown in Figure 10. The two triangles are pushed into the new vector and returned to the Clip function to be processed by the next clipping routine, minXClip, and so forth.

FIGURE 10. The Output after Clipping a Triangle against the Maximum X Value of a Cell.

4.2.2.4. Calculating Area Covered and Removing Overlap Between Polygons in the Data

In the "calculate area covered of every cell" section of the algorithm, we calculate the area of each cell covered by the triangles residing within its boundaries. During this process, we consider the overlap between triangles within the cell and remove it to prevent the erroneous area calculation it would cause. To illustrate how the area covered of a cell is calculated, let's look at the cell C1 in Figure 11. Refer to line 24 of the algorithm: we first retrieve the vector containing the triangles in the cell. The vector for C1 will be of size 4. In line 25, we initialize the variable that accumulates the area covered of the cell (AreaOfCurrentCellCovered) to zero. After that, for every triangle in the cell, we add the area of the triangle to the cell's total and subtract the area of intersection between this triangle and every other one in the cell. For C1, we first add the area of T1 to AreaOfCurrentCellCovered and subtract the area of overlap between T1 and the remaining triangles in the cell, which are T2, T3, and T4. For the second iteration of the loop, we add the area of T2 to AreaOfCurrentCellCovered and subtract the areas of overlap between T2 and T3, and between T2 and T4. For every triangle, we find the overlap between the triangle at hand and only the triangles following it in the vector, because possible intersections with triangles before the current one have already been considered with the previous triangles. In this specific example, since the overlap between T1 and T2 has been handled, when it is T2's turn to be processed, we do not bother subtracting the overlap with T1, because it was already subtracted when T1 was processed. After T2, T3's area is added to AreaOfCurrentCellCovered and the overlap between T3 and T4 is subtracted. Lastly, the area of the triangle T4 is added to the cell's area, and there are no remaining triangles in the cell to subtract the overlap with. A compact sketch of this per-cell accumulation is given below.
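The sketch below illustrates the per-cell accumulation just described; the names AreaOfCurrentCellCovered and areaOfOverlap follow the prose, while the Pt/Tri types are assumptions for this listing. The pairwise overlap is computed here by clipping one triangle against the three half-planes of the other, a standard alternative to the thesis's explicit enumeration of edge intersections and interior vertices (described later in Section 4.3.2.2); both approaches yield the area of the intersection of two triangles.

```cpp
// Sketch of the "calculate area covered of every cell" step (not the thesis's code).
#include <array>
#include <cmath>
#include <utility>
#include <vector>

struct Pt { double x, y; };
using Tri = std::array<Pt, 3>;

double polyArea(const std::vector<Pt> &p)        // shoelace formula
{
    if (p.size() < 3) return 0.0;
    double a = 0.0;
    for (size_t i = 0; i < p.size(); ++i)
    {
        const Pt &c = p[i], &n = p[(i + 1) % p.size()];
        a += c.x * n.y - n.x * c.y;
    }
    return 0.5 * std::fabs(a);
}

// Area of the intersection of triangles a and b: clip a by each inward half-plane of b.
double areaOfOverlap(const Tri &a, const Tri &b)
{
    std::vector<Pt> poly(a.begin(), a.end());
    double signedB = (b[1].x - b[0].x) * (b[2].y - b[0].y) -
                     (b[2].x - b[0].x) * (b[1].y - b[0].y);   // orientation of b
    for (int e = 0; e < 3 && !poly.empty(); ++e)
    {
        Pt p = b[e], q = b[(e + 1) % 3];
        if (signedB < 0) std::swap(p, q);                     // make this edge counter-clockwise
        std::vector<Pt> out;
        for (size_t i = 0; i < poly.size(); ++i)
        {
            const Pt &cur = poly[i], &nxt = poly[(i + 1) % poly.size()];
            double d1 = (q.x - p.x) * (cur.y - p.y) - (q.y - p.y) * (cur.x - p.x);
            double d2 = (q.x - p.x) * (nxt.y - p.y) - (q.y - p.y) * (nxt.x - p.x);
            if (d1 >= 0.0) out.push_back(cur);                // cur lies on b's interior side
            if ((d1 >= 0.0) != (d2 >= 0.0))                   // edge of a crosses this edge of b
            {
                double t = d1 / (d1 - d2);
                out.push_back({cur.x + t * (nxt.x - cur.x), cur.y + t * (nxt.y - cur.y)});
            }
        }
        poly = out;
    }
    return polyArea(poly);
}

// The accumulation described in the text: add each triangle's area, then subtract its
// overlap with every triangle that follows it in the cell's vector.
double areaCoveredOfCell(const std::vector<Tri> &trianglesInCell)
{
    double AreaOfCurrentCellCovered = 0.0;
    for (size_t i = 0; i < trianglesInCell.size(); ++i)
    {
        std::vector<Pt> t(trianglesInCell[i].begin(), trianglesInCell[i].end());
        AreaOfCurrentCellCovered += polyArea(t);
        for (size_t j = i + 1; j < trianglesInCell.size(); ++j)
            AreaOfCurrentCellCovered -= areaOfOverlap(trianglesInCell[i], trianglesInCell[j]);
    }
    return AreaOfCurrentCellCovered;
}
```

The percentage covered of the cell then follows by dividing this value by the cell's area and multiplying by 100, as described in the next section.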
In the "populate the arrays with the result" section of the algorithm, once the processing of every triangle in the cell's vector is done, we deposit the result of the current cell, that is, the calculated area covered of the current cell, in the areaCoveredOfCells array, and compute the percentageCoveredOfCells array by dividing AreaOfCurrentCellCovered by the area of a cell in the current grid and multiplying the fraction by 100.

FIGURE 11. Example to Illustrate the Process of Calculating the Area Covered of a Cell and Removing the Overlap.

4.3. Algorithm 2 (Optimized)

Optimizing our clipping and area calculation algorithm was a necessity in this project. The number of calculations and steps performed on the large data set resulted in huge execution times and in running out of memory. Optimizing the algorithm using interval trees cut the time spent searching for overlaps, reduced the execution time, and allowed bigger portions of the data to be processed.

4.3.1. Pseudocode

The optimized version of our algorithm is shown in Figure 12 and described in detail in the following section.

FIGURE 12. The Optimized Algorithm Pseudocode.

4.3.2. Full Description of Code

The optimized algorithm's "initialization and finding cells covered by every triangle" section (lines 1-21) is exactly the same as in the unoptimized version.

4.3.2.1. Interval Tree Construction and Usage

In the section for calculating the area covered of every cell (lines 22-55), we loop over every cell in the grid, retrieving the triangles that lie inside each of them. For every cell, an interval tree is constructed whose size equals the number of triangles in the cell. For every triangle in the cell, we calculate the bounding box of the triangle and add its extents as intervals, i.e., [minX, maxX] and [minY, maxY], to the constructed interval tree of the cell. When all the intervals of the triangles have been added, the interval tree is built, and it is ready to find which of the triangles of the cell overlap with the triangle under investigation, which is the following step.

In the section for finding overlapping triangles (lines 37-43), for every triangle in the cell, we first add the area of the current triangle to AreaOfCurrentCellCovered, and then remove the redundancy, i.e., the area of intersection with any other triangle in the cell. The removal of this redundancy is done in the "find overlapping triangles" section as follows. We first calculate the bounding box of the current triangle. After that, we use the interval tree search structure to find all the triangles that overlap with the current triangle's intervals. A pool of triangle identifiers is returned, and for each of these triangles, we calculate the areaOfOverlapToSubtract between it and the current triangle and subtract it from AreaOfCurrentCellCovered. This is done in the "remove intersection" section of the pseudocode.

In the "populate the arrays with the result" section of the code, the processing of every triangle in the cell's vector is done. We deposit the result of the current cell, that is, the calculated area covered of the current cell, in the areaCoveredOfCells array. Then the percentage of the current cell is deposited in the percentageCoveredOfCells array after dividing AreaOfCurrentCellCovered by the area of a cell in the current grid and multiplying the fraction by 100. A sketch of the per-cell interval tree construction and querying described in this section is given below.
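The thesis does not list its interval tree implementation, so the sketch below is one common formulation, a static tree over the triangles' x-extents augmented with the maximum high endpoint of each subtree, meant only to illustrate the build-once, query-many pattern described above. The names Interval, Node, build, and query are hypothetical.

```cpp
// One possible per-cell interval tree (illustrative; not the thesis's implementation).
#include <algorithm>
#include <cstdio>
#include <memory>
#include <vector>

struct Interval { double lo, hi; int id; };   // a triangle's x-extent and its index

struct Node
{
    Interval iv;
    double maxHi;                             // largest hi in this subtree (the augmentation)
    std::unique_ptr<Node> left, right;
};

// Build a balanced tree from intervals sorted by lo (built once per cell).
std::unique_ptr<Node> build(const std::vector<Interval> &v, int b, int e)
{
    if (b >= e) return nullptr;
    int m = (b + e) / 2;
    auto n = std::make_unique<Node>();
    n->iv = v[m];
    n->left = build(v, b, m);
    n->right = build(v, m + 1, e);
    n->maxHi = n->iv.hi;
    if (n->left)  n->maxHi = std::max(n->maxHi, n->left->maxHi);
    if (n->right) n->maxHi = std::max(n->maxHi, n->right->maxHi);
    return n;
}

// Collect the ids of all stored intervals that overlap the query interval [qlo, qhi].
void query(const Node *n, double qlo, double qhi, std::vector<int> &out)
{
    if (!n || n->maxHi < qlo) return;          // nothing in this subtree can reach qlo
    query(n->left.get(), qlo, qhi, out);
    if (n->iv.lo <= qhi && n->iv.hi >= qlo) out.push_back(n->iv.id);
    if (n->iv.lo <= qhi) query(n->right.get(), qlo, qhi, out);
}

int main()   // toy usage: three triangles' x-extents stored for one cell
{
    std::vector<Interval> xExtents = {{0.0, 1.0, 0}, {0.8, 2.0, 1}, {3.0, 4.0, 2}};
    std::sort(xExtents.begin(), xExtents.end(),
              [](const Interval &a, const Interval &b) { return a.lo < b.lo; });
    auto tree = build(xExtents, 0, static_cast<int>(xExtents.size()));

    std::vector<int> candidates;
    query(tree.get(), 0.9, 1.5, candidates);   // a triangle whose x-extent is [0.9, 1.5]
    for (int id : candidates)
        std::printf("candidate triangle %d\n", id);   // prints ids 0 and 1
    return 0;
}
```

In the filter, one such tree would be built per cell and then queried once for each triangle in that cell; the fixed costs that this per-cell construction and querying introduce are what the performance analysis in Chapter V examines.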
4.3.2.2. Detailed Description of Finding the Area of Overlap between Two Triangles Functions

The areaOfOverlapToSubtract function takes two triangles as its parameters, and a pointer to a vector that holds the points of intersection between the two triangles. The first step is to find the points of intersection between the two triangles using the two functions findIntersectionsBetweenEdges and isInside, and the second step is to calculate the area of intersection between the two triangles using the points of intersection found by the previous step.

4.3.2.2.1. Finding the Intersection Points between the Edges of Two Triangles

The findIntersectionsBetweenEdges function takes an edge from the first triangle and an edge from the second triangle. This function is called on every pair of edges from the two triangles, that is, nine times. Specifically, we find the points of intersection of the first edge of the first triangle with the three edges of the second triangle, then the points of intersection of the second edge of the first triangle with the three edges of the second triangle, and lastly the points of intersection of the third edge of the first triangle with the three edges of the second triangle.

4.3.2.2.2. Testing if any of the Two Triangle Vertices are Inside of the Other Triangle

The isInside function takes a triangle vertex and a triangle represented by a double array, and checks whether the vertex from the first triangle is inside the second triangle. This function handles the case where an edge from the first triangle does not intersect any of the edges of the second triangle, but one of that edge's endpoints lies inside the second triangle; in that case, a point of the intersection region exists, and it is the endpoint of that edge that lies within the interior of the triangle. The function is called on every vertex of the two triangles, so it is called six times; it checks whether any of the first triangle's vertices is within the second triangle, and vice versa. The isInside function works as follows: it checks whether the areas of the triangles formed by the vertex we are testing together with the triangle's vertices sum up to the original triangle's area; if not, we can conclude that the tested vertex is not inside the triangle. If it is within the triangle, we check whether the point has already been detected by findIntersectionsBetweenEdges, that is, whether the vertex is already in the pointsOfIntersection vector; if not, we push the vertex into the pointsOfIntersection vector; if it is already in the vector, we ignore it.

4.3.2.2.3. Calculating the Area of Overlap between the Two Triangles Using Found Points of Intersections

The CalculateAreaOfIntersection function takes the pointsOfIntersection vector as a parameter, after all possible points of intersection have been found. The number of vertices of the area of intersection must be at least three; if there are fewer than three points of intersection between the two triangles, then there is no overlap between the areas of the two triangles. If there are more than two points of intersection, the points are ordered in a counter-clockwise direction to ensure the correct calculation of the area of the resulting polygon.

CHAPTER V
TESTING AND RESULTS

5.1. Experiments Overview

Our study was designed to test the correctness and performance of both the optimized and unoptimized versions of our algorithm. The algorithm runs as a filter in VisIt, an open-source visualization tool for 2D and 3D meshes.
Through its GUI, we can easily choose different data sets and vary the grid sizes for testing and experimenting. The tests varied three factors: optimization setting, grid size, and data set.

5.2. Factors

5.2.1. Optimization Settings

The experiments were run with two different optimization settings: optimized or unoptimized.

5.2.2. Grid Sizes

The grid size is the number of elements in width by the number of elements in height. In this study, our tests used grids with equal dimensions; that is, the number of X elements equaled the number of Y elements. The grid sizes reported in the results tables below range from 50 by 50 to 2000 by 2000.

5.2.3. Data Sets

The performance results presented here are for the Intermountain data's first million triangles, processed using the optimized and unoptimized versions of the filter. The Intermountain data's X values range from -2045100 to -1040320 and its Y values range from 266110 to 1560084, and the full data set contains almost 10.75 million triangles.

FIGURE 13. Performance Trend Comparison between the Optimized and Unoptimized Algorithm.

5.3. Results and Performance Analysis

The performance of the optimized version of the filter is better when the number of triangles in a single cell is large, that is, when the interval tree built over the triangles' ranges is queried enough times to reap the benefits of this efficient search structure. The numbers in Table 1 are execution times in seconds for the different grid sizes. Figure 13 shows the performance trend comparison between the optimized and unoptimized algorithms.

TABLE 1. Execution Times Comparison between the Optimized and Unoptimized Code (seconds).

Grid Size      Optimized    Unoptimized   % Optimized Faster
50 x 50        22.579536    776.66        97.09%
100 x 100      22.418807    271.26        91.73%
200 x 200      22.141781    106.50        79.2%
300 x 300      23.080292     73.99        68.8%
400 x 400      24.052942     57.14        57.9%
500 x 500      24.526272     48.68        48.9%
800 x 800      29.526272     43.28        31.78%
1000 x 1000    31.799353     43.19        26.37%
2000 x 2000    50.79175      51.77         1.88%

TABLE 2. Execution Times for Building the Interval Trees (seconds).

Grid Size      Total Execution Time   Interval Tree Construction Time   Interval Tree Usage Time (Total minus Construction)
50 x 50        22.579536              9.791923                          12.787613
100 x 100      22.418807              10.053468                         12.365339
200 x 200      22.141781              10.333979                         11.807802
300 x 300      23.080292              10.409227                         12.671065
400 x 400      24.052942              10.804534                         13.248408
500 x 500      24.526272              11.072370                         13.80017
800 x 800      29.526272              12.436397                         17.089875
1000 x 1000    31.799353              13.268166                         18.531187
2000 x 2000    50.79175               19.904474                         30.819699

By observing the numbers in Table 1, we can see that the total execution time of the optimized code increases as the number of bins in the grid increases, while the execution time of the unoptimized code decreases until we reach the 2000 by 2000 grid size. The optimized code runs faster than the unoptimized code at every grid size tested, although by the 2000 by 2000 grid the two are nearly equal, even though we kept the same number of triangles across the different codes and different grid sizes. The average density of triangles, that is, the number of triangles per bin, gets smaller as the number of bins increases. Therefore, we can see that the unoptimized code runs faster as the number of bins increases.
But the same is not true of the optimized code, whose performance drops as the number of bins increases. We provide the analysis for this trend below.

The optimized code uses the interval tree search structure. The cost of constructing a new interval tree is O(n log n), and the query cost is O(log n + k), where k is the number of segments reported as overlapping [22:220]. In this problem, every bin in the grid constructs its own tree from the triangles residing within its boundaries; for every triangle, its bounding box is inserted into the interval tree as an interval. Let N be the total number of triangles in the data and B the number of bins in the grid, so that the average number of triangles per bin is n = N/B. For each step of processing, i.e., building an interval tree and using it, there is also a fixed overhead cost. Constructing an interval tree therefore takes O(n log n + C1), where C1 is a fixed overhead cost. As the number of bins in the grid increases, this fixed cost is multiplied, which increases the overall execution time.

For example, suppose we have 9 million triangles and 900 bins. Then the average number of triangles per bin is n = 9000000 / 900 = 10000 triangles. The cost of constructing the interval trees is B * (n log n + C1); in this example (taking logarithms base 10), it is 900 * (10000 log 10000 + C1) = 36 million + 900 * C1. When the number of bins is increased to B = 9000, the new average number of triangles per bin is n = 9000000 / 9000 = 1000, and the cost of building the interval trees is 9000 * (1000 log 1000 + C1) = 27 million + 9000 * C1. So as the number of bins increases, the n log n portion of the work does not grow (it even shrinks slightly as log n decreases), while the total fixed overhead cost B * C1 grows in proportion to the number of bins. The same analysis applies to the increasing cost of using the built interval trees.

Thus, even though the asymptotic cost of constructing an interval tree, O(n log n), might seem larger than the recurring cost of using it, O(log n), in our code the fixed cost associated with using the interval trees is much larger than the fixed cost associated with constructing them. From our previous example, B = 9000 and n = 1000. The query and usage cost is O(log n + C2), where C2 is the fixed cost associated with using the built interval trees:

9000 * (log 1000 + C2) = 27,000 + 9000 * C2

Referring back to Table 2, we observe that the time for using the interval trees is always larger than the time for constructing them. Therefore, we can form the following inequality:

The cost of constructing the interval trees < The cost of using the interval trees
27,000,000 + 9000 * C1 < 27,000 + 9000 * C2
26,973,000 + 9000 * C1 < 9000 * C2
2997 + C1 < C2

We can therefore conclude that the optimized version of the algorithm is efficient and is helping the performance, but there is an overhead cost that may come from the required initialization steps or from memory allocation.

CHAPTER VI
CONCLUSION

This thesis was inspired by the need of Professor Chris Bone of the Department of Geography for a way to visualize a big data set. The data, collected over a span of many years, records the loss of trees in the United States caused by the bark beetle. We developed an algorithm to process and visualize the data accurately and efficiently.

There were two main goals for this thesis. The first was to develop an algorithm that reads and processes the data and that can be applied to other types of data that need to be processed and visualized in a similar way. The second goal was to implement the algorithm, test it, and also optimize it.
The algorithm successfully calculates the percentage of a cell covered by the triangles residing within its boundaries and removes redundant intersection areas, i.e., the overlap among triangles, which would otherwise cause erroneous results. The algorithm runs as a filter in VisIt; it reads Shapefiles or VTK files, processes the data according to our developed algorithms, and produces a 2D histogram of the data. We performed extensive testing and experiments for accuracy and performance, discovered our implementation's limitations, and arrived at insights with regard to the performance of our code's optimization.

Our future work includes developing the algorithm further to make it run faster and able to handle larger data sets. Furthermore, we would like to produce a parallelized version of our algorithm and experiment with different techniques. Also, we would like to make use of the power of supercomputers and run our algorithms on them with larger data sets, with no limit on the number of polygons that can be processed before running out of memory.

REFERENCES CITED

[1] L. Carroll and L. Safranyik, "The bionomics of the mountain pine beetle in lodgepole pine forests: establishing a context," in Mountain Pine Beetle Symposium: Challenges and Solutions, 2003, pp. 21–32.

[2] K. Weiler and P. Atherton, "Hidden Surface Removal Using Polygon Area Sorting," ACM SIGGRAPH Computer Graphics, vol. 11, no. 2, pp. 214–222, Jan. 1977.

[3] Y. Q. Huang and Y. K. Liu, "An Algorithm for Line Clipping against a Polygon Based on Shearing Transformation," Computer Graphics Forum, vol. 21, no. 4, pp. 683–688, Dec. 2002.

[4] R. F. Sproull and I. E. Sutherland, "A Clipping Divider," in Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, Part I (AFIPS '68), Dec. 1968.

[5] M. Cyrus and J. Beck, "Generalized Two- and Three-Dimensional Clipping," Computers & Graphics, vol. 3, no. 1, pp. 23–28, 1978.

[6] I. E. Sutherland and G. W. Hodgman, "Reentrant Polygon Clipping," Communications of the ACM, vol. 17, no. 1, pp. 32–42, Jan. 1974.

[7] B. A. Barsky and Y. D. Liang, "A New Concept and Method for Line Clipping," ACM Transactions on Graphics, vol. 3, no. 1, pp. 1–22, Jan. 1984.

[8] T. M. Nicholl, D. T. Lee, and R. A. Nicholl, "An Efficient New Algorithm for 2-D Line Clipping: Its Development and Analysis," ACM SIGGRAPH Computer Graphics, vol. 21, no. 4, pp. 253–262, Aug. 1987.

[9] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice, 2nd ed. Reading, MA: Addison-Wesley, 1990.

[10] P. G. Maillot, "A New, Fast Method for 2D Polygon Clipping: Analysis and Software Implementation," ACM Transactions on Graphics, vol. 11, no. 3, pp. 276–290, 1992.

[11] J. D. Day, "A New Two-Dimensional Line Clipping Algorithm for Small Windows," Computer Graphics Forum, vol. 11, no. 4, pp. 241–245, Aug. 1992.

[12] T. Möller, "A Fast Triangle-Triangle Intersection Test," Journal of Graphics Tools, vol. 2, no. 2, pp. 25–30, 1997.

[13] G. Greiner and K. Hormann, "Efficient Clipping of Arbitrary Polygons," ACM Transactions on Graphics, vol. 17, no. 2, pp. 71–83, 1998.

[14] B. R. Vatti, "A Generic Solution to Polygon Clipping," Communications of the ACM, vol. 35, no. 7, pp. 56–63, 1992.

[15] Y. K. Liu, X. Q. Wang, S. Z. Bao, M. Gomboši, and B. Žalik, "An Algorithm for Polygon Clipping, and for Determining Polygon Intersections and Unions," Computers & Geosciences, vol. 33, no. 5, pp. 589–598, 2007.

[16] M. Held, "ERIT—A Collection of Efficient and Reliable Intersection Tests," Journal of Graphics Tools, vol. 2, no. 4, pp. 25–44, 1997.

[17] B. Žalik, "Two Efficient Algorithms for Determining Intersection Points between Simple Polygons," Computers & Geosciences, vol. 26, no. 2, pp. 137–151, Mar. 2000.

[18] R. L. Graham, "An Efficient Algorithm for Determining the Convex Hull of a Finite Planar Set," Information Processing Letters, vol. 1, no. 4, pp. 132–133, 1972.

[19] T. M. Chan, "Optimal Output-Sensitive Convex Hull Algorithms in Two and Three Dimensions," Discrete & Computational Geometry, vol. 16, no. 4, pp. 361–368, 1996.

[20] C. B. Barber, D. P. Dobkin, and H. Huhdanpaa, "The Quickhull Algorithm for Convex Hulls," ACM Transactions on Mathematical Software, vol. 22, no. 4, pp. 469–483, Dec. 1996.

[21] ESRI Shapefile: A Technical Description. ESRI, 1998.

[22] M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars, Computational Geometry: Algorithms and Applications, 3rd ed. Springer, 2008.

[23] Aerial Survey Geographic Information System Handbook: Sketchmaps to Digital Geographic Information. USDA Forest Service, Nov. 2005.