NETWORK OPTIMIZATION USING LINEAR PROGRAMMING AND REGRESSION

by

JASON LEE

A THESIS

Presented to the Department of Mathematics and the Robert D. Clark Honors College in partial fulfillment of the requirements for the degree of Bachelor of Science

June 2016

An Abstract of the Thesis of Jason Lee for the degree of Bachelor of Arts in the Department of Mathematics to be taken June 2016

Title: Network Optimization Using Linear Programming and Regression

Approved: _______________________
Professor Hao Wang

The purpose of this research is to explore the synergistic application of linear programming, regression, and computer science to solve practical economic problems. In particular, this research focuses on network optimization problems in which the aim is to maximize revenue and minimize production, transportation, and other costs by using historical information to predict future market behavior. The first half of the thesis provides background information on linear programming and regression for readers who may not be familiar with the subjects. The remaining sections cover the modeling process and computer program design: the variables of interest are identified, arranged into an appropriate form, and MATLAB is used to carry out the linear programming and regression operations that produce an optimized solution for the network from the given historical data and market conditions.

Acknowledgments

I would like to thank Professor Wang for giving me the inspiration to start this project, Professor Sinclair and Professor Prazniak for serving on my defense committee, and all my advisors and peers for giving me feedback along the way. Without all their support none of this would have been possible.

Table of Contents

1 Introduction
2 Linear Programming
  2.1 Brief History of Linear Programming
  2.2 The General Linear Programming Problem
  2.3 Properties of Linear Programming Problems
  2.4 The Simplex Method
  2.5 Degeneracy
3 Regression
  3.1 Example
4 MATLAB Implementation
  4.1 Basic Model
  4.2 Annotated Code
  4.3 Regression Application
5 Appendix

List of Figures

1 Graph showing the feasible region for the example problem
2 Graph showing data with a fitted line and residuals
3 Plots of the residuals for two different fitted lines
4 Scatter Plot of Sample Data
5 Sample Data with Fitted Linear Model
6 Plot of the Residuals for the Linear Model
7 Sample Data with Fitted Polynomial Model
8 An example network with n = 3, m = 2

1 Introduction

The goal of this work is to create a program which will utilize mathematical methods to determine an efficient production and distribution plan for a network of production, storage, and distribution centers given available data.
This is valuable because it eliminates the need for manual calculation, which becomes impractical once a network includes hundreds of different nodes. The first half of this paper provides a basic introduction to linear programming and regression as background for those who may not be familiar with the topics, and covers a complete example of each technique. The second half lays out the problem of interest and includes a walk-through of the MATLAB code used to produce the solutions.

2 Linear Programming

Linear programming, also known as linear optimization, is a field of mathematics that deals with finding efficient solutions to systems defined by multiple linear equalities and inequalities. An efficient solution is one where a specific value is minimized or maximized, such as minimum cost or maximum profit. Linear programming is commonly used to solve management problems since the solutions it provides focus on maximizing efficiency. The focus of my work is transportation networks and production schemes, though the same techniques can be applied to any situation which requires allocating a finite amount of resources in an optimal way, including investing, route planning, and work shift assignment.

2.1 Brief History of Linear Programming

Linear programming has a relatively short history compared to other fields of mathematics, largely because of the difficulty of solving most problems without the aid of computer calculation. One of the first attempts to provide a viable solution to linear programming problems was made by Joseph Fourier, who published a method in 1827. However, his algorithm was inefficient for larger systems since it required at least exponential time to carry out, meaning that problems with a hundred or more variables would still require an impossible amount of time to solve. A few other attempts were made to provide a solution, but they had limited success. Major developments in the field would not be made until the mid-20th century.

The most well known and significant contribution to the field of linear programming was made by George Dantzig in 1947, when he developed the simplex method for solving linear programming problems while working as a military advisor at the Pentagon. The simplex method was the first practical, time-efficient algorithm for solving linear programming problems. However, at the time of its creation there were still limits on the ability to solve large problems. In fact, one of the first implementations of the algorithm on a large system required 8 hours of feeding cards into a Card Programmable Calculator (CPC) [3]. Despite these limitations, the simplex algorithm allowed people to solve linear programming problems in a much more efficient manner than any previous method. Interestingly enough, the name linear programming has little to do with actual computer programs, and actually has its origins in the military use of the word program to refer to logistical and deployment plans [5].

Though the simplex method was initially intended to improve various aspects of military logistics, it was quickly adapted and used in commercial applications such as oil refining and blending [5]. The decade that followed also saw improvements to the simplex method and the development of related fields like nonlinear programming and integer programming. In the present, linear programming is more useful than ever before for management and logistical challenges due to easy access to large amounts of data.
Combined with improved algorithms and increases in computing power, linear programming can now be applied to larger and more complicated problems. The primary challenge now is accurately modeling applicable situations and using available data efficiently.

2.2 The General Linear Programming Problem

In linear programming problems, the primary goal is to maximize or minimize a linear function, which we will call z, subject to a finite set of linear constraints. The function z is known as the objective function and is a linear combination of the variables (x1, x2, ..., xn) with the general form z = c1x1 + c2x2 + ... + cnxn, where each ci is a constant. The linear constraints can be either equality constraints or inequality constraints, and take the general form

    a1x1 + a2x2 + ... + anxn  ≤ b,   = b,   or   ≥ b

where the ai and b are constants. A feasible solution is any combination of the variables (x1, x2, ..., xn) that satisfies the constraints, and the set of these n-tuples is known as the feasible region. A problem which has no solution satisfying all of the constraints is infeasible. The full form of the general linear programming problem is as follows:

    Maximize     c1x1 + c2x2 + ... + cnxn = z
    Subject to   a11x1 + a12x2 + ... + a1nxn ≤ b1
                 a21x1 + a22x2 + ... + a2nxn ≤ b2
                 ...
                 am1x1 + am2x2 + ... + amnxn ≤ bm
                 xi ≥ 0,  i = 1, 2, ..., n
                 bj ≥ 0,  j = 1, 2, ..., m

Notice that all the linear constraints are written using ≤ rather than a mixture of equality and inequality constraints, which is possible since we can easily convert constraints from one form to another. An equality constraint can be converted to inequality constraints by creating two separate inequalities which converge on the same point; for example, x1 + x2 = 5 is equivalent to the pair x1 + x2 ≤ 5 and x1 + x2 ≥ 5. This allows us to impose the restriction that all xi and bj must be nonnegative and makes solving the problem easier.

Additionally, there can be situations where we would want to minimize the objective function rather than maximize it, such as a setup where the goal is to minimize costs. Though these objectives may seem like polar opposites, the distinction between minimizing and maximizing the objective function is actually trivial, since minimizing z is the same thing as maximizing −z. Therefore, when we want to minimize the objective function we simply maximize −z instead, so the difference can be eliminated by a simple sign change.

The general linear programming problem is also commonly written in matrix form:

    z = cᵀx,   Ax ≤ b,   x ≥ 0,   b ≥ 0

where

    c = (c1, c2, ..., cn)ᵀ,   x = (x1, x2, ..., xn)ᵀ,   b = (b1, b2, ..., bm)ᵀ

and A is the m × n matrix of constraint coefficients

    A = [ a11  a12  ...  a1n
          a21  a22  ...  a2n
          ...
          am1  am2  ...  amn ]

2.3 Properties of Linear Programming Problems

Linear programming problems have several important properties which guarantee that a solution exists as long as the feasible region is bounded. The feasible region is said to be bounded if there exists a number M such that |xi| ≤ M holds for every feasible point (x1, x2, ..., xn) and each i = 1, 2, ..., n [8].

1. If the set of linear constraints defines a bounded feasible region, then there exists a point in that region that maximizes the objective function and a point that minimizes the objective function.

2. If a maximum occurs in the feasible region, it is attained at a vertex point of the feasible region.

3. If a minimum occurs in the feasible region, it is attained at a vertex point of the feasible region.

2.4 The Simplex Method

The simplex method was the first algorithm which successfully solved large linear programming problems.
It works by first finding a basic feasible point, if one is available, and checking whether it gives the maximum value of the objective function. If not, it searches for a new point in the feasible region that yields a higher value of the objective function, and repeats this process until a maximum is obtained. The following example shows the basic steps of the simplex method.

2.4.1 Example Problem

    Maximize     x1 + x2 = z
    Subject to   2x1 + x2 ≤ 4
                 x1 + 2x2 ≤ 3
                 x1, x2 ≥ 0

Figure 1: Graph showing the feasible region for the example problem

The first step in the simplex method is to introduce new variables for each constraint, known as slack variables, which are used to turn the inequalities into equalities. In this case, we define the slack variables yi as the nonnegative difference between the two sides of the inequalities. We also rewrite the objective function with all the variables on one side to allow for easier calculation. So we now have the following equations, with the rewritten objective function placed on the bottom for the sake of convention.

    2x1 + x2 + y1 = 4
    x1 + 2x2 + y2 = 3
    −x1 − x2 + z = 0
    x1, x2, y1, y2 ≥ 0

We can write these equations in an augmented matrix form which allows us to perform row operations to obtain a solution in the form of (x1, x2, y1, y2). This form is known as the simplex tableau.

     x1   x2   y1   y2   z
      2    1    1    0   0  |  4
      1    2    0    1   0  |  3
     −1   −1    0    0   1  |  0

Now that the equations are in matrix form, we can look for our initial solution. The first step is to look for all the variables whose columns consist of 0's and a single 1. These variables are known as basic variables. The initial solution consists of setting the value of all the non-basic variables to zero and solving for the basic variables. For this problem, this gives us the solution (0, 0, 4, 3) and a value of 0 for z. From here, we can check to see if this is the optimized solution by looking at the bottom row of the matrix. The negative entries indicate that we do not have an optimized solution, since increasing either of the corresponding variables would lead to an increase in z. Therefore, we must continue with the algorithm.

The next step in the process is to find the variable in the last row which is the most negative and choose one of the entries in that column as a pivot. If there is a tie, we pick the leftmost column. To determine which entry to use as a pivot, we pick the row for which the ratio of the value in the right-most column to the candidate entry is smallest; this is known as the minimum ratio test. In this case the last row has a tie between x1 and x2 at −1, so we take the leftmost column, x1, and pivot on its entry in the first row since 4/2 < 3/1.

     x1   x2    y1    y2   z
      1   1/2   1/2    0   0  |  2
      0   3/2  −1/2    1   0  |  1
      0  −1/2   1/2    0   1  |  2

Now we repeat the same process for the new basic and non-basic variables, and arrive at a solution of (2, 0, 0, 1) with a value of 2 for z. While this new solution has a larger value for the objective function, it still is not the most efficient solution, since we can still increase its value by increasing x2. Therefore, we repeat the same process of finding the most negative value in the last row and using its column as the pivot column over and over until all the entries in the last row are nonnegative, since that indicates that we have an efficient solution. In this case, we only need to do this one more time before we get

     x1   x2    y1    y2   z
      1    0   2/3  −1/3   0  |  5/3
      0    1  −1/3   2/3   0  |  2/3
      0    0   1/3   1/3   1  |  7/3

This gives us a solution of (5/3, 2/3, 0, 0) and z = 7/3, which we know is optimal, since the last row says the objective function can be written as

    z = 7/3 − (1/3)y1 − (1/3)y2

and since the values of y1 and y2 can only be nonnegative, the value of z cannot exceed 7/3.
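As a quick check on the arithmetic, the same problem can be handed directly to a linear programming solver. The short sketch below (an illustration, assuming MATLAB's Optimization Toolbox is available) uses the linprog function that Section 4 relies on; because linprog minimizes its objective, the coefficients of x1 + x2 are negated and the sign of the result is flipped back at the end.

% Verify the worked example with linprog (Optimization Toolbox)
f = [-1; -1];                        % maximize x1 + x2 by minimizing -(x1 + x2)
A = [2 1; 1 2];                      % inequality constraint coefficients
b = [4; 3];                          % right-hand sides of the constraints
lb = [0; 0];                         % x1, x2 >= 0
[x, fval] = linprog(f, A, b, [], [], lb, []);
x                                    % approximately [5/3; 2/3]
-fval                                % approximately 7/3, matching the tableau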
General Steps of the Simplex Algorithm

1. Add slack variables to each inequality to change them into equalities.

2. Format the equations into an augmented matrix form, also known as the simplex tableau.

3. For the initial solution, set the value of all non-basic variables to 0 and solve for the value of the basic variables.

4. Check if there are any negative values in the last row for any of the non-basic variables. If there are, then the solution is not the most efficient. Otherwise, the solution is optimal and you are done.

5. Find the most negative value in the last row, picking the leftmost value in the case of a tie.

6. Find the element in that column for which the ratio of the rightmost element of its row to that element is smallest. That element becomes the new pivot. Perform the row operations needed to reflect this.

7. Return to step 4.

2.5 Degeneracy

There is a special case, known as degeneracy, that can sometimes occur when carrying out the simplex algorithm. Degeneracy occurs when the value of one of the coefficients in the rightmost column is 0. This causes the value of the objective function to not change during an iteration of the simplex algorithm despite the variables being different. Degeneracy is not a major issue in linear programming problems since its primary drawback is that it causes the simplex method to carry out extra iterations [11]. There is a rare case where degeneracy can cause cycling, in which the simplex method gets caught cycling through the same feasible solutions. However, cycling can be avoided by applying Bland's rule when carrying out the algorithm, which determines which value is chosen as the next pivot.

3 Regression

Regression analysis is a set of techniques used to make predictions about the relationship between variables. Like linear programming, regression makes extensive use of matrices and linear algebra. Regression has applications in almost every field, whether it is education or weather, physics or biology. Though there are many different regression techniques available, we will only make use of a few of them related to linear models and checking their validity. We can make use of regression analysis to find equations that fit data on demand, shipping prices, and other variables. Using those equations, we can make predictions for future values of those variables and use that information in our model. The regression techniques that we will make use of are linear regression, residual analysis, and forward/backward selection.

One of the most basic linear regression techniques is ordinary least squares. This method seeks to fit a line to a set of data by choosing the line which minimizes the sum of the squared errors (SSE). The errors, also known as residuals, are the differences between the predicted values and the actual values. The predicted values are the y values given by the fitted line for any particular x. The fit also produces estimates of the coefficients, each of which has an associated p-value. A small p-value (p < 0.05) means that the term has a statistically significant contribution to the model. If the p-value is large, we can remove that coefficient from the model to improve it. In this work, we use linear regression to create and test how various models fit the data and use them to make predictions for future data points.
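To make the least squares idea concrete, the sketch below computes a fitted line directly from the normal equations on a small made-up data set (the values here are illustrative only, not the demand data used later), then forms the residuals and the SSE that the fit minimizes. MATLAB's fitlm, used throughout the rest of this thesis, performs the same calculation and additionally reports standard errors and p-values.

% Minimal sketch of ordinary least squares on made-up data
x = (1:10)';                     % hypothetical predictor values
y = 3 + 2*x + randn(10,1);       % hypothetical response with random noise
X = [ones(10,1) x];              % design matrix: intercept column plus x
beta = (X'*X) \ (X'*y);          % normal equations give the fitted coefficients
yhat = X*beta;                   % predicted values from the fitted line
res = y - yhat;                  % residuals: actual minus predicted
SSE = sum(res.^2);               % sum of squared errors minimized by the fit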
If the p-value is large, we can remove that coefficient 10 from the model to improve it. In this work, we use linear regression to create and test how various models fit the data and use them to make predictions for future data points. Figure 2: Graph showing data with a fitted line and residuals Regression also includes many different ways to test if the models are reliable. Ex- amining the plots of the residuals is one way to ensure the model fits the data well. If the model is appropriate for the data, then we would expect the residuals to be normally distributed as shown in the first plot in figure 3. However, some times the model chosen is not appropriate to the data and we get residual plots like the second one where the residuals form a parabolic shape. That suggests that the data would be better described by a polynomial model. In order to compare different models to determine which one fits the data the best, we will use Akiake’s Information Criterion (AIC) which measures model quality based of the number of parameters and statistical goodness of fit. Similarly the Bayesian Information Criterion (BIC) also measures model quality based on the same criteria, but tends to favor models with fewer parameters when compared with the AIC. Ideally, the best model 11 Figure 3: Plots of the residuals for two different fitted lines will have both the smallest AIC and BIC value. The formulas for AIC and BIC are AIC = −2 log(L) + 2K BIC = −2 log(L) +K log(n) where L is the value of the likelihood function, K is the number of parameters, and n is the number of observations. 3.1 Example In this example we will generate a dataset and demonstrate how to evaluate the degree to which a particular model fits the data. The procedure used to generate the data and all relevant MATLAB code are included in the Appendix. The first step when attempting to create a model for the data is to plot the data and try to determine a potential model based on the shape of the graph. Looking at the graph, there is a clear positive correlation between X and Y, which appears to be roughly linear, so our initial model will be Y = β0 + β1X. Using the fitlm function using the basic linear model yields the result 12 . 0 N 0 N q ... .. ...--------1'-:~~~~~-o:~ ... .. .. i I 10 0 . :i! !al ~ 0 ~ . . 0 50 100 150 F.ued- Figure 4: Scatter Plot of Sample Data Esitmate SE tStat pValue (Intercept) 22.235 4.2922 5.1804 1.1849e− 06 X 1.4401 0.078442 18.359 1.7449e− 33 Figure 5: Sample Data with Fitted Linear Model Both p-values for the coefficients are extremely small, which means that both terms have a statistically significant effect on the model. This model also has a root mean 13 squared error (RMSE) of 21.5 and and R2 value of 0.775. Though this may appear to be a good fit for the data, however, it is always a good idea to test other models to see if a better fit can be achieved. For this example, we can look at a plot of the residuals to see if there are any trends which would suggest that a better model exists. If the current model is a good fit, then the residuals will display a normal distribution. Figure 6: Plot of the Residuals for the Linear Model Looking at the plot of the residuals versus the fitted values, there is a small but definite curved shape to them which indicates that a polynomial model might be a better fit. The first polynomial model we will try will be Y = β0 + β1X + β2X 2. 
                 Estimate    SE           tStat     pValue
    (Intercept)  42.878      5.7545       7.4512    3.8595e-11
    X            0.13419     0.27851      0.48182   0.63102
    X^2          0.013769    0.0028403    4.8478    4.7436e-06

Though this model has a smaller RMSE of 19.4 and a larger R^2 of 0.819, the X coefficient is not significant with a p-value of 0.63102, so we remove that term and try a reduced model with just the X^2 and intercept terms.

                 Estimate    SE           tStat     pValue
    (Intercept)  45.273      2.8896       15.667    1.9427e-28
    X^2          0.015093    0.00071858   21.004    4.6156e-38

Figure 7: Sample Data with Fitted Polynomial Model

In addition to both coefficients having significant p-values, this model has the best RMSE of 19.3 and an R^2 value of 0.818 that is essentially unchanged despite dropping a parameter, which indicates that it fits the data better than the other two models. The graph of the residuals has a normal distribution of values, and comparing the fitted coefficients to the original function used to generate the data shows that the model is rather accurate.

4 MATLAB Implementation

In this section I will walk through the steps of creating and implementing both the linear optimization and regression programs. The basic situation I want to model is the production and distribution of one or more products. The image below depicts a very basic model where three distribution nodes are each connected to two destination nodes.

Figure 8: An example network with n = 3, m = 2

4.1 Basic Model

The first step in the process is to identify the variables we want to consider and represent the problem in mathematical terms. In this basic model, for the sake of simplicity we will only consider a system of factories and stores, leaving the inclusion of distribution/storage centers to a later model. The variables of interest are:

- F - the number of factories
- S - the number of stores
- P - the number of products
- D_ps - the demand for product p at store s
- PL_fp - the production limit for product p at factory f
- TC_p - the transportation cost of product p per unit distance
- PC_fp - the production cost of product p at factory f
- DI_fs - the distance between factory f and store s

The goal is to minimize the total costs while having the demand for every product at every store satisfied. Therefore the objective function for this problem is the total production cost for all of the products plus the total transportation cost. In the objective function, x(f, p, s) represents the amount of product p that factory f sends to store s.

    z = Σ_{f=1}^{F} Σ_{p=1}^{P} Σ_{s=1}^{S} x(f, p, s) · (PC_fp + TC_p · DI_fs)

For the constraints, we already have the condition that all the demands need to be satisfied, and the only other condition is that the production limits are not exceeded.

    Σ_{s=1}^{S} x(f, p, s) ≤ PL_fp,    f = 1, 2, ..., F and p = 1, 2, ..., P

    Σ_{f=1}^{F} x(f, p, s) = D_ps,     p = 1, 2, ..., P and s = 1, 2, ..., S

Now that we have the objective function and all the constraints defined, we need to set up the problem so that it can be solved in MATLAB. For this problem, we will be using the linprog function from MATLAB's Optimization Toolbox with the dual-simplex algorithm.

4.2 Annotated Code

In order to solve a linear programming problem in MATLAB using linprog, we need to create and fill several matrices containing the coefficients for the objective function, inequalities, equalities, and any lower or upper bounds that exist.
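One implementation detail worth noting before the full program: linprog expects the decision variables as a single column vector, so the program flattens the three-dimensional quantity x(f, p, s) into a vector with MATLAB's colon operator before solving and reshapes the answer back into a factory-by-product-by-store array afterwards. A minimal sketch of that round trip, using small hypothetical sizes:

% Flattening and recovering the 3-D decision variable array (hypothetical sizes)
F = 2; P = 3; S = 4;              % small example network
X3 = rand(F,P,S);                 % stand-in for the x(f,p,s) values
x1 = X3(:);                       % column vector of length F*P*S, as linprog expects
X3back = reshape(x1,[F,P,S]);     % recover the factory/product/store layout
isequal(X3, X3back)               % true: the ordering is preserved both ways

The same colon-and-reshape pattern appears in the program below as Z1 = Z(:) for the objective coefficients and reshape(x,[F,P,S]) for the final allocation.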
%=========================================================================
%This program provides an efficient solution to the system of factories and
%stores with multiple products
%=========================================================================

% Set the rng to default to ensure code works, change later for tests
rng('default')

% Number of factories, stores, and products
Size = 8;
Size2 = Size*Size;
F = floor(.1 *Size2);
S = floor(.12 *Size2);
P = 3;

% Randomly determine constraints for the problem for testing data
d = round(40*rand(P,S) + 10);    %Demand at stores (10-50)
pl = round(35*rand(F,P) + 40);   %Production limit at factories (40-75)
tc = round(2*rand(1,P) + 1);     %Transportation cost per unit distance (1-3)
pc = round(10*rand(F,P) + 5);    %Production cost (5-15)

% Place the factories and stores onto a grid
xy = randperm(Size2,F+S);
[x,y] = ind2sub([Size Size],xy);

% Plot the data on a graph for a visual representation
h = figure;
plot(x(1:F),y(1:F),'^',x(F+1:F+S),y(F+1:F+S),'rs')
legend('Factory','Store','Location','WestOutside')
xlim([0 Size+1]); ylim([0 Size+1])

% Determine the distance between each pair of F and S, no diagonals
dist = zeros(F,S);
for a = 1:F
    for b = 1:S
        dist(a,b) = abs(x(a)-x(F+b))+abs(y(a)-y(F+b));
    end
end

% Create the objective function
Z = zeros(F,P,S);
for a = 1:F
    for b = 1:P
        for c = 1:S
            Z(a,b,c) = pc(a,b) + dist(a,c)*tc(b);
        end
    end
end
Z1 = Z(:);
w = length(Z1);    %Size of constraints determined by length of Z

% Preallocate the empty inequality matrices
A = spalloc(F*P,w,F*P*S);
b = zeros(F*P,1);

% Preallocate the empty equality matrices
Aeq = spalloc(P*S,w,F*P*S);
beq = zeros(P*S,1);

% Create an empty array to fill in with data
empty = zeros(size(Z));

% Fill in the matrix A and matrix b
count = 1;
for m = 1:F
    for n = 1:P
        entry = empty;
        entry(m,n,:) = 1;
        entry = sparse(entry(:));
        A(count,:) = entry';
        b(count) = pl(m,n);
        count = count+1;
    end
end

% Fill matrices Aeq and beq, assume demand is always met
count = 1;
for q = 1:P
    for r = 1:S
        entry = empty;
        entry(:,q,r) = 1;
        entry = sparse(entry(:));
        Aeq(count,:) = entry';
        beq(count) = d(q,r);
        count = count+1;
    end
end

lb = zeros(w,1);    %sets the lower bound as 0
ub = [];            %no upper bound for values
opts = optimoptions('linprog','Display','final','Algorithm','dual-simplex');
[x,fval,exitflag,output] = linprog(Z1,A,b,Aeq,beq,lb,ub,[],opts);

%Shows allocation recommendations, rows are different factories,
%columns are different products
FIN = reshape(x,[F,P,S])
fval,exitflag,output;

4.3 Regression Application

For most realistic production networks, the amount of demand for a particular product is never a constant value. Therefore, in this system I plan to use regression to predict future values for the demand. For the purposes of testing, and since I do not have access to demand data for specific products, I will model demand with the assumption that it follows one of the following forms with a single variable: constant Y = β0, linear Y = β0 + β1X, or polynomial Y = β0 + β1X + β2X^2. It is definitely possible and not very difficult to include more variables. However, the main goal of this section is to lay out a basic system for comparing models and determining the best fit.

Given the data set of quarterly demand (generated through a process which can be found in the Appendix), we first need a method to read the data and convert it into variables that can be used in the program.
The dlmread command allows for easy conversion of data in a text file into a usable matrix, regardless of whether the data is separated by commas, spaces, or some other delimiter.

The next step is to test the models and compare their AIC and BIC values to determine which one is superior. Though we could use the F test to compare the linear and polynomial models in this example, since the polynomial model contains all the terms of the linear model, AIC and BIC will be used because they are also applicable in cases where the models are not nested, which leaves open the possibility of including more models in the comparison if desired. Ideally, this program should select the best model automatically and then give an estimate for the value of the demand for the coming quarter. Therefore, we need to create a program which finds the lowest AIC and BIC values, chooses that model, and then uses that model to estimate demand. The fitlm command already calculates the AIC and BIC for each model, so we can simply call on those values for comparison. In the case that the AIC and BIC do not agree on the best model, we will default to the choice indicated by the AIC; in the majority of cases, however, that should not be an issue. Finally, we will have the program take the model which gives the best fit and automatically output its prediction to a text file, which can then be read by the linear optimization program.

%=========================================================================
%This program reads data from a text file and tests several different
%polynomial and linear models for fit, chooses the best one, and prints the
%results to a text file.
%=========================================================================

%Read the tab-delimited data file, skipping the header row
A = dlmread('file.txt','\t',1,0)

%modify the quarter data so q1 2013 is 2013, q2 2013 is 2013.25, q3 2013 is
%2013.5, and q4 2013 is 2013.75
D = [A(:,1) + .25*(A(:,2)-1) A(:,3)]
tbl = table(D(:,1),D(:,2),'VariableNames',{'X','Y'});

%test each of the models using fitlm
mdl1 = fitlm(tbl,'Y~1')
MC1 = [mdl1.ModelCriterion.AIC,mdl1.ModelCriterion.BIC]
mdl2 = fitlm(tbl,'Y~X')
MC2 = [mdl2.ModelCriterion.AIC,mdl2.ModelCriterion.BIC]
mdl3 = fitlm(tbl,'Y~X^2-X')
MC3 = [mdl3.ModelCriterion.AIC,mdl3.ModelCriterion.BIC]
mdl4 = fitlm(tbl,'Y~X^2')
MC4 = [mdl4.ModelCriterion.AIC,mdl4.ModelCriterion.BIC]
comp = [ MC1 ; MC2 ; MC3 ; MC4]
[M,I] = min(comp)

%select the best model based on the lowest AIC (first column of comp)
select = I(1)

%give a warning if the two measurements do not agree
if I(1) ~= I(2)
    disp('Warning: AIC and BIC do not agree on solution')
end

%Set the value for the prediction to be one quarter later than the most
%recent demand value provided
Xnew = D(end,1) + .25

%display the value of the estimate for each model, then the selected one
p = []
p(1) = predict(mdl1,Xnew)
p(2) = predict(mdl2,Xnew)
p(3) = predict(mdl3,Xnew)
p(4) = predict(mdl4,Xnew)
p(select)

%Change the prediction time period into the old form with the prediction
%attached
Z1 = [mod((Xnew/.25),4)+1,floor(Xnew),p(select)]

%Print the results to a text file.
fileID = fopen('Results.txt','w');
formatSpec = ...
'The predicted demand in Quarter %1.0f of %4.0f is %8.4f units.\n';
fprintf(fileID,formatSpec,Z1);
fclose(fileID);

5 Appendix

Regression Example Code

%=========================================================================
%This code was used for solving the example problem in the regression
%section
%=========================================================================

%Generate 100 numbers from 0-100 from a uniform distribution and order them
X = 0 + (100)*rand(100,1)
X = sort(X)

%Generate Y values based on the function Y = 45 + .015X^2
Y2 = []
for a = 1:100;
    Add = 45 + (20)*randn + .015*X(a)^2;
    Y2 = [Y2;Add];
end
Y = Y2

%Create scatter plot for data
scatter(X,Y,'x')

%Test the model Y = aX + b and create appropriate plots
tbl = table(X,Y)
mdl = fitlm(tbl,'Y~X')
plot(mdl)
plotResiduals(mdl,'fitted')
mdl.ModelCriterion.AIC

%Test the model Y = aX^2 + bX + c
mdl = fitlm(tbl,'Y~X^2')

%Test the model Y = aX^2 + b and create appropriate plots
mdl = fitlm(tbl,'Y~X^2-X')
plot(mdl)
plotResiduals(mdl,'fitted')

%Compare result to original function
hold on
plot(X,45+.015*X.^2)
hold off

Data Generation Code

%=========================================================================
%This program generates data that will be used to test the effectiveness of
%the regression program in choosing the best model for the data.
%=========================================================================

%sets the random seed for consistency when testing and for replicability
rng('default')

%Create demand data for quarters from 2003 to 2015
A = 2003:1:2015
B = 1:1:4
[B,A] = meshgrid(B,A)
C = cat(2,A',B')
D = reshape(C,[],2)
Z = [D(:,1) + .25*(D(:,2)-1)]

%Demand follows 200 + .5*(Z-2000) + 2.4*(Z-2000)^2 plus Gaussian noise
for num = 1:52
    E(num) = ( 200 + .5*(Z(num)-2000) + 2.4*(Z(num)-2000)^2 ) + 50*randn;
end
F = [D E']

%Writes the data into a text file called file.txt
name = {'Year';'Quarter';'Demand'}
T = table(F(:,1),F(:,2),F(:,3),'VariableNames',name)
writetable(T,'file.txt','Delimiter','\t','WriteRowNames',true)

References

[1] Arsham, H. (2015). Deterministic Modeling: Linear Optimization with Applications. Retrieved May 6, 2015.

[2] Attaway, S. (2013). MATLAB: A Practical Introduction to Programming and Problem Solving. Waltham, MA: Elsevier.

[3] Bixby, R. E. (2012). A brief history of linear and mixed-integer programming computation. Documenta Mathematica, pp. 107-121.

[4] Dantzig, G. B. (1951). Application of the Simplex Method to a Transportation Problem. In T. C. Koopmans (Ed.), Activity Analysis of Production and Allocation, pp. 359-373. New York: John Wiley & Sons, Inc.

[5] Dantzig, G. B. (2002). Linear Programming. Operations Research 50(1): pp. 42-47.

[6] Sierksma, G. (2001). Linear and Integer Programming: Theory and Practice. New York, NY: CRC Press.

[7] Klee, V., & Minty, G. J. (1972). How good is the simplex algorithm? In O. Shisha (Ed.), Inequalities III, pp. 159-175. New York-London: Academic Press.

[8] Nicholson, K. W. (1990). Elementary Linear Algebra. Boston, MA: PWS-Kent.

[9] Reeb, J. E., & Leavengood, S. A. (2002). Transportation Problem: A Special Case for Linear Programming Problems. Corvallis, OR: Extension Service, Oregon State University.

[10] Vanderbei, R. J. (2008). Linear Programming: Foundations and Extensions. New York, NY: Springer.

[11] Xie, F. (2010). An Example of Degeneracy in Linear Programming. Retrieved May 6, 2015.