Abstract:
In using the theory of linear regression to estimate a dependent variate from the information contained in an independent variate, one is frequently faced with an independent variable that is given in a non-quantitative manner. In such cases the independent variable is usually classified into ordered groups, and in order to apply the theory of regression one must assign a numerical weight to each of these groups. The purpose of this paper is to consider the problem of determining these weights. The data considered in this paper are assumed to be bivariate, with the dependent variable measured quantitatively and the independent variable classified into ordered groups. Numerical weights are to be determined such that the resulting regression equation gives the best estimate of the dependent variable. At present the usual practice is to obtain these weights in a more or less subjective manner. In considering this problem, Frank A. Pearson states that since no numerical value is given to the classes, "... a unit rate of change cannot be calculated for a relationship in which the independent variable is non-numerical", while Ezekiel, in the recent edition of his book, makes the following statement: "In case a non-quantitative factor is a very important one, so that ignoring it in determining the net linear regressions may seriously impair their accuracy, it may be roughly included by designating successive groups by a numerical code which approximate the expected influence of the variable." The literature contains very little direct reference to this problem as it arises in connection with regression; one does, however, find many references to the problem of estimating the correlation coefficient from qualitative data, and in some of these an assignment of weights is made incidentally to the estimation of the correlation coefficient.
For this reason, and since the problem of correlation is so closely related to that of regression, the principal methods of determining the correlation coefficient for non-quantitative data will be given before the actual problem of this paper is considered. Following the discussion of these methods, a method of assigning weights to the ordered classes of an independent variable, based on minimizing the standard error of estimate, is developed. A numerical example then illustrates this method of determining weights, and the results obtained are compared with those obtained for several other choices of weights. The paper concludes with a discussion of other definitions of what might be considered the best estimate of the dependent variable, together with a few summarizing remarks.
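To make the criterion concrete, the following is a minimal sketch, not the paper's own procedure: every observation in ordered group k is given the same weight x = w_k, y is regressed on x by ordinary least squares, and the standard error of estimate (taken here as the square root of the residual sum of squares divided by n, an assumed convention) is computed for any choice of weights. The helper names `standard_error_of_estimate` and `class_mean_weights` and the small data set are hypothetical. Because the fitted value is constant within each group, the residual sum of squares can never fall below the within-group sum of squares, and that bound is reached when each weight equals its group's mean of y. The sketch ignores the ordering constraint on the weights; how the paper handles group means that fall out of order is not stated here.

```python
import math


def standard_error_of_estimate(groups, weights):
    """Regress y on x = weights[k] for every y in group k (least squares);
    return sqrt(residual SS / n). The divisor n is an assumed convention."""
    xs = [weights[k] for k, ys in enumerate(groups) for _ in ys]
    ys = [y for g in groups for y in g]
    n = len(ys)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("weights must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope of the fitted regression line
    a = my - b * mx        # intercept
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(rss / n)


def class_mean_weights(groups):
    """Hypothetical choice of weights: the mean of y within each group."""
    return [sum(g) / len(g) for g in groups]


# Hypothetical data: y values observed in three ordered classes.
groups = [[2.0, 3.0, 4.0], [3.0, 5.0, 4.0], [9.0, 8.0, 10.0]]

equal = [1.0, 2.0, 3.0]              # equally spaced codes, a common subjective choice
means = class_mean_weights(groups)   # [3.0, 4.0, 9.0]

see_equal = standard_error_of_estimate(groups, equal)
see_means = standard_error_of_estimate(groups, means)
print(f"equally spaced codes: {see_equal:.4f}")
print(f"class-mean weights:   {see_means:.4f}")
```

With these numbers the equally spaced codes give sqrt(14/9), about 1.247, while the class-mean weights give sqrt(6/9), about 0.816, the within-group lower bound. This mirrors, on invented data, the kind of comparison among several choices of weights that the abstract describes.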