The strong and generally similar-looking trends suggest that we will get a very high value of R-squared if we regress sales on income, and indeed we do. Here is the summary table for that regression:

However, a result like this is to be expected when regressing a strongly trended series on any other strongly trended series, regardless of whether the two are logically related.
Here are the line fit plot and residuals-vs-time plot for the model:

The residuals-vs-time plot indicates that the model has some terrible problems. First, there is very strong positive autocorrelation in the errors, i.e., consecutive errors tend to have the same sign. In fact, the lag-1 autocorrelation of the errors is extremely high. It is clear why this happens: the two curves do not have exactly the same shape. The trend in the auto sales series tends to vary over time while the trend in income is much more consistent, so the two variables get out of sync with each other.
This is typical of nonstationary time series data. Second, the local variance of the errors increases steadily over time.
The reason for this is that random variations in auto sales, like those in most other measures of macroeconomic activity, tend to be consistent over time in percentage terms rather than absolute terms, and the absolute level of the series has risen dramatically due to a combination of inflationary growth and real growth. As the level has grown, the variance of the random fluctuations has grown with it.
Confidence intervals for forecasts in the near future will therefore be way too narrow, being based on average error sizes over the whole history of the series. So, despite the high value of R-squared, this is a very bad model.
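To make this diagnosis concrete, here is a minimal sketch in Python with statsmodels. The file name and column names (sales, income) are hypothetical stand-ins for the article's data, not its actual source:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data file and column names, for illustration only.
df = pd.read_csv("autosales_income.csv", parse_dates=["month"], index_col="month")

# Regress auto sales on personal income, both in current dollars.
fit = sm.OLS(df["sales"], sm.add_constant(df["income"])).fit()
print(fit.summary())

# Lag-1 autocorrelation of the residuals: a value near 1 signals the
# strongly trended, out-of-sync behavior described above.
resid = fit.resid
print("lag-1 residual autocorrelation:", resid.autocorr(lag=1))

# Rough check for growing error variance: compare the residual standard
# deviation in the first and last thirds of the sample.
n = len(resid)
print("early residual std:", resid.iloc[: n // 3].std())
print("late residual std: ", resid.iloc[-(n // 3):].std())
```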
One way to try to improve the model would be to deflate both series first. This would at least eliminate the inflationary component of growth, which hopefully will make the variance of the errors more consistent over time. Here is a time series plot showing auto sales and personal income after they have been deflated by dividing them by the U.S. Consumer Price Index at each point in time:
This does indeed flatten out the trend somewhat, and it also brings out some fine detail in the month-to-month variations that was not so apparent on the original plot.
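The deflation step itself is just an element-wise division. A minimal sketch, under the same hypothetical file and column names as above, with an assumed cpi column holding the monthly price index:

```python
import pandas as pd

df = pd.read_csv("autosales_income.csv", parse_dates=["month"], index_col="month")

# "cpi" is an assumed column holding the monthly price index. Rescale it
# so the last month is the base period, then divide it out of both series.
base = df["cpi"] / df["cpi"].iloc[-1]
df["real_sales"] = df["sales"] / base
df["real_income"] = df["income"] / base
df[["real_sales", "real_income"]].plot(subplots=True)
```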
In particular, we begin to see some small bumps and wiggles in the income data that roughly line up with larger bumps and wiggles in the auto sales data. If we fit a simple regression model to these two variables, the following results are obtained:

Adjusted R-squared is noticeably lower than it was for the model fitted to the undeflated series. Does that mean this is a worse model? Well, no.
Because the dependent variables are not the same, it is not appropriate to do a head-to-head comparison of R-squared. Arguably this is a better model, because it separates out the real growth in sales from the inflationary growth, and also because the errors have a more consistent variance over time. The latter issue is not the bottom line, but it is a step in the direction of fixing the model assumptions.
Most interestingly, the deflated income data shows some fine detail that matches up with similar patterns in the sales data. However, the error variance is still a long way from being constant over the full two-and-a-half decades, and the problems of badly autocorrelated errors and a particularly bad fit to the most recent data have not been solved.
Another statistic that we might be tempted to compare between these two models is the standard error of the regression, which normally is the best bottom-line statistic to focus on. But wait… these two numbers cannot be directly compared, either, because they are not measured in the same units.
The standard error of the first model is measured in units of current dollars, while the standard error of the second model is measured in units of constant (deflated) dollars. Those were decades of high inflation, and a dollar at the end of the sample period was not worth nearly as much as a dollar was worth in the earlier years; in fact, it was only worth about one-quarter as much. The slope coefficients in the two models are also of interest.
Because the units of the dependent and independent variables are the same within each model (current dollars in the first model, deflated dollars in the second), the slope coefficient can be interpreted as the predicted increase in dollars spent on autos per dollar of increase in income.
The slope coefficients in the two models are nearly identical. Suppose we now go a step further and difference the deflated sales series, i.e., fit a mean model to its month-to-month changes. Notice that we are now 3 levels deep in data transformations: seasonal adjustment, deflation, and differencing! This sort of situation is very common in time series analysis. This model merely predicts that each monthly difference will be the same, i.e., it predicts constant growth relative to the previous month's value. Adjusted R-squared has dropped to zero!
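Here is a sketch of this step under the same assumed data layout as before. The mean model is just an intercept-only regression on the differenced series, so its R-squared is zero by construction:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("autosales_income.csv", parse_dates=["month"], index_col="month")
real_sales = df["sales"] / (df["cpi"] / df["cpi"].iloc[-1])  # deflate as before

# First difference: the month-to-month change in deflated sales. The first
# observation is lost, which is why this model's sample is one smaller.
d_sales = real_sales.diff().dropna()

# Mean model = intercept-only OLS: it predicts the same change every month.
mean_model = sm.OLS(d_sales, np.ones(len(d_sales))).fit()
print("mean monthly change:", mean_model.params)
print("R-squared:", mean_model.rsquared)                 # 0 by construction
print("regression std error:", np.sqrt(mean_model.scale))

# Lag-1 autocorrelation of the residuals; differencing often pushes this
# slightly negative, as discussed below.
r = np.asarray(mean_model.resid)
print("lag-1 residual autocorrelation:", np.corrcoef(r[:-1], r[1:])[0, 1])
```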
We should look instead at the standard error of the regression. The units and sample of the dependent variable are the same for this model as for the previous one, so their regression standard errors can be legitimately compared. The sample size for the second model is actually one less than that of the first model, due to the lack of a period-zero value for computing a period-1 difference, but this is insignificant in such a large data set.
The regression standard error of this model is markedly smaller than that of the previous one. The residuals-vs-time plots for this model and the previous one have the same vertical scaling: look at them both and compare the size of the errors, particularly those that have occurred recently. It is often the case that the best information about where a time series is going to go next is where it has been lately. There is no line fit plot for this model, because there is no independent variable, but here is the residuals-vs-time plot:
These residuals look quite random to the naked eye, but they actually exhibit negative autocorrelation, i.e., a tendency for a positive error to be followed by a negative one, and vice versa. The lag-1 autocorrelation here is negative. This often happens when differenced data are used, but overall the errors of this model are much closer to being independently and identically distributed than those of the previous two, so we can have a good deal more confidence in any confidence intervals for forecasts that may be computed from it.
Of course, this model does not shed light on the relationship between personal income and auto sales. So, what is the relationship between auto sales and personal income? That is a complex question, and it will not be further pursued here except to note that there are some other simple things we could do besides fitting a regression model. For example, we could compute the percentage of income spent on automobiles over time, i.e., the ratio of auto sales to personal income, and see whether it has been stable.
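Under the same assumed data layout, that computation is a one-liner; note that the price index cancels out of the ratio, so no deflation is needed:

```python
import pandas as pd

df = pd.read_csv("autosales_income.csv", parse_dates=["month"], index_col="month")

# Share of personal income spent on automobiles, in percent. The price
# index cancels out of the ratio, so deflation is unnecessary here.
share = 100.0 * df["sales"] / df["income"]
share.plot(title="Auto sales as a percentage of personal income")
```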
In some fields, it is entirely expected that your R-squared values will be low. Humans are simply harder to predict than, say, physical processes.
Furthermore, if your R-squared value is low but you have statistically significant predictors, you can still draw important conclusions about how changes in the predictor values are associated with changes in the response value. Regardless of the R-squared, the significant coefficients still represent the mean change in the response for one unit of change in the predictor while holding other predictors in the model constant. Obviously, this type of information can be extremely valuable.
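A quick simulation (illustrative only, not from the original post) makes the point: even when noise swamps the signal and R-squared is tiny, the estimated slope recovers the true effect and is highly significant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=10.0, size=n)   # true slope is 2, noise is huge

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(f"R-squared: {fit.rsquared:.3f}")        # low, around 0.04
print(f"slope: {fit.params[1]:.2f}, p-value: {fit.pvalues[1]:.1e}")
```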
See a graphical illustration of why a low R-squared doesn't affect the interpretation of significant variables. A low R-squared is most problematic when you want to produce predictions that are reasonably precise (i.e., that have a sufficiently small prediction interval).
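Continuing that simulation (the setup is repeated so the sketch runs on its own), here is one way to see the cost directly: the width of a prediction interval is driven by the residual standard deviation, which is exactly what a low R-squared says is large:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(scale=10.0, size=1000)
fit = sm.OLS(y, sm.add_constant(x)).fit()

# 95% prediction interval for a new observation at x = 1.0. Its width is
# set by the residual std (about 10 here), so it spans roughly +/- 20
# around the point prediction even though the slope is highly significant.
new_X = sm.add_constant(np.array([[1.0]]), has_constant="add")
frame = fit.get_prediction(new_X).summary_frame(alpha=0.05)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])
```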
How high should the R-squared be for prediction? Well, that depends on your requirements for the width of a prediction interval and how much variability is present in your data. A high R-squared does not necessarily indicate that the model has a good fit. That might be a surprise, but look at the fitted line plot and residual plot below. The fitted line plot displays the relationship between semiconductor electron mobility and the natural log of the density for real experimental data.
The fitted line plot shows that these data follow a nice tight function and the R-squared is very high. However, look more closely and you can see how the regression line systematically over- and under-predicts the data (bias) at different points along the curve. You can also see patterns in the Residuals versus Fits plot, rather than the randomness that you want to see.
This indicates a bad fit, and it serves as a reminder of why you should always check the residual plots. This example comes from my post about choosing between linear and nonlinear regression. In this case, the answer is to use nonlinear regression, because linear models are unable to fit the specific curve that these data follow.
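As an illustrative sketch with simulated data (not the electron-mobility measurements from the post), fitting a straight line to data generated from a curve reproduces exactly this signature: a high R-squared together with systematically patterned residuals:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 200)
y = 5.0 * np.log(x) + rng.normal(scale=0.15, size=x.size)  # curved truth

lin = sm.OLS(y, sm.add_constant(x)).fit()
print(f"R-squared: {lin.rsquared:.3f}")   # high, despite the wrong functional form

# The residuals are not random: the straight line over-predicts at both
# ends of the range and under-predicts in the middle of a concave curve.
resid = lin.resid
for lo, hi in [(1.0, 2.0), (4.0, 6.0), (9.0, 10.0)]:
    seg = resid[(x >= lo) & (x <= hi)]
    print(f"mean residual for x in [{lo}, {hi}]: {seg.mean():+.3f}")
```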