Introductory Econometrics (Wooldridge): solution manual

The signs of the estimated slopes imply that more spending increases the pass rate (holding lnchprg fixed) and that a higher poverty rate (proxied well by lnchprg) decreases the pass rate (holding spending fixed). These are what we expect. Presumably this is well outside any sensible range. This makes sense, especially in Michigan, where school funding was essentially determined by local property tax collections.

Intuitively, failing to account for the poverty rate leads to an overestimate of the effect of spending. Therefore, the variables giftlast and propresp help to explain significantly more variation in gifts in the sample (although still just over eight percent).

The simple regression estimate is 2. Remember, the simple regression estimate holds no other factors fixed. Such an increase can happen only if propresp goes from zero to one. Instead, consider a. Then, gift is estimated to be A negative relationship makes some sense, as people might follow a large donation with a smaller one.

Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one.

For such a large percentage increase in sales, this seems like a practically small effect. The t statistic is. Its t statistic is only 1. The t statistic on hrsemp has gone from about -1. From Table G. We are interested in the coefficient on log(employ), which has a t statistic of. Therefore, we conclude that the size of the firm, as measured by employees, does not matter, once we control for training and sales per employee (in a logarithmic functional form).

The t statistic is [—. In fact, the p-value is about. These variables are jointly significant, but including them only changes the coefficient on totwrk from —. If there is heteroskedasticity in the equation, the tests are no longer valid. It appears that, once firm sales and market value have been controlled for, profit margin has no effect on CEO salary.

The t statistic on log(mktval) is about 2. We can use the standard normal critical value, 1. So log(mktval) is statistically significant. This is not a huge effect, but it is not negligible, either. Other factors fixed, another year as CEO with the company increases salary by about 1.

On the other hand, another year with the company, but not as CEO, lowers salary by about. More non-CEO years with a company makes it less likely the person was hired as an outside superstar. These effects certainly cannot be ignored. The t statistic for the hypothesis in part ii is —. The estimate implies that one more run per year, other factors fixed, increases predicted salary by about 1. Most major league baseball players are pretty good fielders; in fact, the smallest fldperc is which means.

With relatively little variation in fldperc, it is perhaps not surprising that its effect is hard to estimate. The F statistic for their joint significance with 3 and df is about. Therefore, these variables are jointly very insignificant. If we increase phsrank by 10, log(wage) is predicted to increase by. This implies a. However, the sample standard deviation of phsrank is about Therefore, the base point remains unchanged: the return to a junior college is estimated to be somewhat smaller, but the difference is not significant at standard significance levels.

Therefore, id should not be correlated with any variable in the regression equation. It should be insignificant when added to 4. In fact, its t statistic is about. The two-sided p-value is zero to three decimal places. All of the control variables, log(income), prppov, and log(hseval), are highly correlated, so it is not surprising that some are individually insignificant.

It holds fixed three measures of income and affluence. A normally distributed random variable takes on no particular value with positive probability. Further, the distribution of cigs is skewed, whereas a normal random variable must be symmetric about its mean. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted. The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below: Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left.

Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression. The R-squared from this regression, Ru2, is about. With 1, observations, the chi-square statistic is 1, Plus, having the squared term has only a minor effect on the slope even for large values of roe. The approximate slope is. Its t statistic is about -1. The standard error gets multiplied by the same factor. Of course the interpretation of the two equations is identical once the different scales are accounted for.

Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate?

This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

The second equation is also easier to interpret than the third. The only issue is whether the point prediction is below the upper bound. Most of the time, the estimated SER is well below that. The effect is much smaller now, and statistically insignificant.

This is because we have explicitly controlled for several other factors that determine the quality of a home (such as its size and number of baths) and its location (distance to the interstate). This is consistent with the hypothesis that the incinerator was located near less desirable homes to begin with. The coefficient on log(dist) is now very statistically significant, with a t statistic of about three. Just adding [log(inst)]^2 has had a very big effect on the coefficient important for policy purposes.

This means that distance from the incinerator and distance from the interstate are correlated in some nonlinear way that also affects housing price. We can find the value of log(inst) where the effect on log(price) actually becomes negative: 2.

When we exponentiate this we obtain about 5, feet from the interstate. Therefore, it is best to have your home away from the interstate for distances less than just over a mile.
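The turning-point calculation described above can be sketched as follows. This is only an illustration: the two coefficients are hypothetical placeholders, not the estimates from the regression discussed here, and the function name `turning_point` is an assumption made for this sketch. For a quadratic b1*x + b2*x^2 in x = log(inst), the partial effect b1 + 2*b2*x is zero at x = -b1/(2*b2), and exponentiating returns the distance in its original units.

```python
import math

def turning_point(b1, b2):
    """Value of x at which the partial effect b1 + 2*b2*x equals zero."""
    if b2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b1 / (2.0 * b2)

# Hypothetical coefficients on log(inst) and [log(inst)]^2 (illustration only).
b_log, b_log_sq = 2.0, -0.125

log_inst_star = turning_point(b_log, b_log_sq)  # turning point in log units
inst_star = math.exp(log_inst_star)             # turning point in feet

print(log_inst_star, round(inst_star))
```

With a positive linear term and a negative quadratic term, the effect of distance is positive below the turning point and negative above it, which matches the pattern described in the text.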

After that, moving farther away from the interstate lowers predicted house price. Therefore, it is not necessary to add this complication. The t statistic on the interaction term is about 2. We want the coefficient on educ. The square of this, or roughly. Therefore, for predicting price, the log model is notably better.

In equation 6. So we can write equation 6. So, the increase from 15 to 16 years of experience would actually reduce salary. This is a very high level of experience, and we can essentially ignore this prediction: only two players in the sample of have more than 15 years of experience.

These top players command the highest salaries. It is not more college that hurts salary, but less college is indicative of super-star potential. Its t statistic is barely above one, so we are justified in dropping it. The coefficient on age in the same regression is -3. Together, these estimates imply a negative, increasing, return to age. The turning point is roughly at 74 years old. In any case, the linear function of age seems sufficient. With 2 and df, this gives a p-value of roughly.

Therefore, once scoring and years played are controlled for, there is no evidence for wage differentials depending on age or years played in college. In particular, an increase in ecoprc of. A ceteris paribus increase of 10 cents per lb. These effects, which are essentially the same magnitude but of opposite sign, are fairly large.

The p-values are zero to at least three decimal places. This is much less variation than ecolbs itself, which ranges from 0 to 42 (although 42 is a bit of an outlier). This is a very small explained variation in ecolbs. So the two price variables do not do a good job of explaining why ecolbs_i varies across families.

The p-value for the joint F test with 4 and df is about. Evidently, in addition to the two price variables, the factors that explain variation in ecolbs (which is, remember, a counterfactual quantity) are not captured by the demographic and economic variables collected in the survey.

Its one-sided p-value is about. This residual is the difference between the actual pass rate and our best prediction of the pass rate, given the values of spending, enrollment, and the free lunch variable. That is, for school , its pass rate is over 51 points higher than we would expect, based on its spending, size, and student poverty.

Therefore, the quadratics are jointly very insignificant, and we would drop them from the model. Therefore, in standard deviation units, lunch has by far the largest effect.

The spending variable has the smallest effect. Thus, the evidence for a gender differential is fairly strong. The coefficient implies that one more hour of work 60 minutes is associated with.

When age and age2 are both in the model, age has no effect only if the parameters on both terms are zero. Because hsize is measured in hundreds, the optimal size of graduating class is about The t statistic is about — The very large sample size certainly contributes to the statistical significance. The t statistic is over 13 in absolute value, so we easily reject the hypothesis that there is no ceteris paribus difference.

The difference is therefore — Because the estimate depends on two coefficients, we cannot construct a t statistic from the information given. We can then obtain the t statistic we want as the coefficient on the black female dummy variable. For the specific estimates in equation 7. The coefficient on noPC is —. We have only two groups based on PC ownership so, in addition to the overall intercept, we need only to include one dummy variable. If we try to include both along with an intercept we have perfect multicollinearity the dummy variable trap.
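The dummy variable trap mentioned above can be checked numerically. The sketch below uses a made-up PC indicator for six students (hypothetical data, not the sample used in the text) and shows that X'X is singular when an intercept, PC, and noPC are all included, because the intercept column equals PC + noPC in every row. The helper names `det3` and `cross_product` are assumptions of this sketch.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cross_product(X):
    """Return X'X for a design matrix stored as a list of rows."""
    k = len(X[0])
    return [[sum(row[r] * row[c] for row in X) for c in range(k)]
            for r in range(k)]

# Hypothetical PC ownership indicator for six students.
PC = [1, 0, 1, 1, 0, 0]
noPC = [1 - p for p in PC]

# Columns: intercept, PC, noPC. The first column is the sum of the other two.
X_trap = [[1, p, q] for p, q in zip(PC, noPC)]
det_trap = det3(cross_product(X_trap))

# Dropping noPC leaves a 2x2 X'X that can be inverted.
X_ok = [[1, p] for p in PC]
(a, b), (c, d) = cross_product(X_ok)
det_ok = a * d - b * c

print(det_trap, det_ok)
```

A zero determinant means the normal equations have no unique solution, which is why only one of the two dummies is included alongside the intercept.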

From this we see that if we regress outlf on all of the independent variables in 7. In the case of the slopes, changing the signs of the estimators does not change their variances, and therefore the standard errors are unchanged but the t statistics change sign.

But here we are changing the dependent variable. Nevertheless, the R-squareds from the regressions are still the same. To see this, part i suggests that the squared residuals will be identical in the two regressions. For each i the error in the equation for outlfi is just the negative of the error in the other equation for inlfi, and the same is true of the residuals.

Therefore, the SSRs are the same. Further, in this case, the total sums of squares are the same. The estimated effect of PC is hardly changed from equation 7. It is not surprising the estimates on the other coefficients do not change much when mothcoll and fathcoll are added to the regression. The coefficient on hsGPA is about -1. This is a borderline case. The coefficient of main interest, on PC, falls to about.
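Returning to the inlf and outlf comparison, a small numerical check may help. The sketch below assumes a single regressor and a made-up sample of eight people (the data and the helper `simple_ols` are hypothetical). It regresses inlf and then outlf = 1 - inlf on the same regressor and compares the slopes, intercepts, SSRs, and R-squareds.

```python
def simple_ols(x, y):
    """Return intercept, slope, SSR and R-squared from regressing y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return intercept, slope, ssr, 1.0 - ssr / sst

# Hypothetical data: years of education and a labor force indicator.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
inlf = [0, 0, 1, 0, 1, 1, 0, 1]
outlf = [1 - v for v in inlf]

b0_in, b1_in, ssr_in, r2_in = simple_ols(educ, inlf)
b0_out, b1_out, ssr_out, r2_out = simple_ols(educ, outlf)

print(b1_in, b1_out)   # slopes differ only in sign
print(r2_in, r2_out)   # R-squareds coincide
```

The slope changes sign, the intercept becomes one minus the original intercept, and the SSR, SST, and R-squared are unchanged, as the argument in the text suggests.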

Adding hsGPA2 is a simple robustness check of the main finding. Using the data in MLB1. The t statistic is about 1. The F statistic, with 5 and df, is about 1. The evidence against the joint null in part (ii) is weaker because we are testing, along with the marginally significant catcher, several other insignificant variables (especially thrdbase and shrtstop, which have absolute t statistics well below one).

The t statistic is about -2. So the differential at When we run this regression we obtain about -. Its standard error is about. In equation 7. Each of the four terms involving inc and age has a very significant t statistic. On the other hand, once income and age are controlled for, there seems to be no difference in eligibility by gender. The coefficient on male is very small: at given income and age, males are estimated to have a.

The smallest fitted value is about. This means one theoretical problem with the LPM — the possibility of generating silly probability estimates — does not materialize in this application. Of the 3, families actually eligible, only As we saw there, the model does a good job of predicting when a family is ineligible. However, the t statistic is only about 1. Remember, these are in thousands of dollars.

Therefore, we strongly reject the null hypothesis that there is no difference in the averages. This is just more than half of what is obtained by simply comparing averages. Its coefficient is. It shows that the effect of 401(k) eligibility on financial wealth increases with age.

Another way to think about it is that age has a stronger positive effect on nettfa for those with 401(k) eligibility. For the regression in part (iv), the coefficient on e401k from part (iv) is about 9.

If we evaluate the effect in part iv at a wide range of ages, we would see more dramatic differences. So the family size dummies are jointly significant. With 20 and 9, df, the p-value is essentially zero.

In this case, there is strong evidence that the slopes change across family size. Allowing for intercept changes alone is not sufficient. If you look at the individual regressions, you will see that the signs on the income variables actually change across family size.

If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about. Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.

Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of.
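The "usual F test" for joint exclusion restrictions can be computed from the R-squareds of the restricted and unrestricted regressions. The sketch below shows the arithmetic only: the R-squared values, the number of restrictions, and the degrees of freedom are hypothetical placeholders, and the function name `f_stat_rsq` is an assumption of this sketch. The R-squared form applies when both regressions use the same dependent variable and the same observations.

```python
def f_stat_rsq(r2_ur, r2_r, q, df_ur):
    """F statistic for q exclusion restrictions, R-squared form.

    r2_ur, r2_r: R-squareds from the unrestricted and restricted models.
    q: number of restrictions being tested.
    df_ur: n - k - 1 in the unrestricted model.
    """
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)

# Hypothetical inputs, chosen only to illustrate the calculation.
F = f_stat_rsq(r2_ur=0.11, r2_r=0.04, q=4, df_ur=650)
print(round(F, 2))
```

The statistic is then compared with a critical value from the F distribution with q and df_ur degrees of freedom.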

This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected.

Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children, other factors equal, the couple with two children has a.

We would not expect a large increase in R-squared from a simple change in the functional form. The coefficient on log(faminc) is about. If log(faminc) increases by. There are two fitted probabilities above 1, which is not a source of concern with observations. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples.

The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. When MLR. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a small bias.

For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar. This is similar to the effect of having four more years of education.
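To make the comparison between usual and heteroskedasticity-robust standard errors concrete, here is a minimal sketch for a simple regression. It assumes the basic White (HC0) form with no degrees-of-freedom adjustment, which may differ slightly from what a given software package reports, and it uses four made-up data points. The function name `slope_standard_errors` is hypothetical.

```python
import math

def slope_standard_errors(x, y):
    """Slope estimate with its usual and HC0-robust standard errors."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sst_x = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sst_x
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Usual formula: sigma^2 / SST_x, with sigma^2 = SSR / (n - 2).
    sigma2 = sum(u * u for u in resid) / (n - 2)
    se_usual = math.sqrt(sigma2 / sst_x)
    # Robust (HC0) formula: sum of (x_i - xbar)^2 * u_i^2, divided by SST_x^2.
    se_robust = math.sqrt(sum(d * d * u * u for d, u in zip(dx, resid))) / sst_x
    return slope, se_usual, se_robust

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 6.0]
slope, se_usual, se_robust = slope_standard_errors(x, y)
print(slope, se_usual, se_robust)
```

In a sample this small the two standard errors can differ a lot; the robust version has only a large-sample justification.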

Thus, the estimated probability of smoking for this person is close to zero. In fact, this person is not a smoker, so the equation predicts well for this particular observation. Estimation is possible, but we do not discuss that here. In any event, the usual weight is incorrect. Thus, attaching large weights to large firms may be quite inappropriate. Because the coefficient on male is negative, the estimated variance is higher for women.

The t statistic on male is only about —1. The p-value is about. The variable neutral has by far the largest effect — if the game is played on a neutral court, the probability that the spread is covered is estimated to be about.

There is essentially no evidence against H0. The explanatory power is very low, and the explanatory variables are jointly very insignificant. The coefficient on neutral may indicate something is going on with games played on a neutral court, but we would not want to bet money on it unless it could be confirmed with a separate, larger sample.

The robust CI still excludes the value zero by some margin. They will not be the same, of course, but they should not be wildly different. Therefore, the usual standard errors, t statistics, and F statistics reported with weighted least squares are not valid, even asymptotically.

All variables that were statistically significant with the nonrobust standard errors remain significant, but the confidence intervals are much wider in several cases.

Therefore, using OLS, we must conclude the interaction term is only marginally significant. But the coefficient is nontrivial: it implies a much more sensitive relationship between financial wealth and income for those eligible for a 401(k) plan. Thus, the p-value is slightly above. Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.

Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. From Table 3. So when we control for the poverty rate, the effect of spending falls. Therefore, a ten percentage point increase in lnchprg leads to about a 3.
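The direction of the omitted variable bias described above can be verified with the in-sample identity: the simple regression slope equals the multiple regression slope plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one. The sketch below uses made-up numbers (the variable names mirror the text, but the values and the helper `ols` are hypothetical), with spending and the poverty proxy negatively related.

```python
def ols(columns, y):
    """OLS coefficients (intercept first), solving the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        pivot = A[p][p]
        A[p] = [v / pivot for v in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [vr - factor * vp for vr, vp in zip(A[r], A[p])]
    return [A[r][k] for r in range(k)]

# Hypothetical data: spending measure, poverty proxy, and pass rate.
lexpend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lnchprg = [9.0, 7.0, 8.0, 4.0, 5.0, 2.0]
noise = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0]
math10 = [10.0 + 2.0 * s - 1.0 * p + e
          for s, p, e in zip(lexpend, lnchprg, noise)]

_, b1, b2 = ols([lexpend, lnchprg], math10)  # multiple regression
_, b1_simple = ols([lexpend], math10)        # poverty proxy omitted
_, d1 = ols([lexpend], lnchprg)              # poverty proxy on spending

print(b1_simple, b1 + b2 * d1)
```

Because b2 and d1 are both negative in this made-up sample, their product is positive and the simple regression overstates the spending effect.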

Clearly most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than are spending per student or other school characteristics.

Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation.

For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures. As m grows, the bias disappears completely.

Intuitively, this makes sense. The average of several mismeasured variables has less measurement error than a single mismeasured variable. As we average more and more such variables, the attenuation bias can become very small. With 2 and df, i the F statistic is about 1. But it is probably not strong enough to worry about. In the simple regression model, these are contained in u.
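The attenuation argument above can be illustrated with a short calculation. The sketch assumes classical errors-in-variables in a simple regression: each of the m measures equals the true variable plus an error with the same variance, and the errors are uncorrelated with the true variable and with one another. Under those assumptions the averaged measure has error variance var_error/m. The function name and the variances are hypothetical.

```python
def attenuation_factor(var_true, var_error, m):
    """Ratio plim(beta_hat) / beta when the regressor is the average
    of m equally noisy measures of the true variable."""
    return var_true / (var_true + var_error / m)

# Hypothetical variances: true variable and one measurement error.
var_true, var_error = 1.0, 1.0

for m in (1, 2, 4, 16, 64):
    print(m, attenuation_factor(var_true, var_error, m))
```

The factor rises toward one as m increases, so the estimated slope is pulled less and less toward zero.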

The coefficient on grant is actually positive, but not statistically different from zero. When the largest company is left in the sample, the quadratic term is statistically significant, even though the coefficient on the quadratic is less in absolute value than when we drop the largest firm.

What is happening is that by leaving in the large sales figure, we greatly increase the variation in both sales and sales2; as we know, this reduces the variances of the OLS estimators see Section 3. If we look at Figure 9. Without the largest firm, a linear relationship between rdintens and sales seems to suffice. Data are missing for some variables, so not all of the 1, observations are used in the regressions.

The coefficient on white is about. To three decimal places, these are the same estimates we got when using the entire sample see Computer Exercise C7. Perhaps this is not very surprising since we only lost out of 1, observations. The associated p-value, with 6 and 9, df, is essentially zero.

We are finding that 401(k) eligibility has a larger effect on mean wealth than on median wealth. Finding different mean and median effects for a variable such as nettfa, which has a highly skewed distribution, is not surprising.

Apparently, 401(k) eligibility has some large effects at the upper end of the wealth distribution, and these are reflected in the mean. The median is much less sensitive to effects at the upper end of the distribution. The positive coefficient means that there is no deterrent effect, and the coefficient is not statistically different from zero. We would not characterize Texas as an outlier. The coefficient changes sign and becomes nontrivial: each execution is estimated to reduce the murder rate by.

So, it is not an outlier here, either. Texas accounts for much of the sample variation in exec, and dropping it gives a very imprecise estimate of the deterrent effect. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated — such as stock returns — do not appear to be independently distributed, as you will see in Chapter 12 under dynamic forms of heteroskedasticity.

This follows immediately from Theorem In particular, we do not need the homoskedasticity and no serial correlation assumptions. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results because we may simply find a spurious association between yt and trending explanatory variables.

Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section With annual data, each time period represents a year and is not associated with any season. This inclusion of the linear time trend allows the dependent variable and log(pcinc_t) to trend over time (int_t probably does not contain a trend), and the quarterly dummies allow all variables to display seasonality. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pe_{t-2}, pe_{t-1}, and pe_t by the same amount.

Adding post79 to equation The coefficient on def falls once post79 is included in the regression. Because the dependent and independent variable are in logs, the estimated elasticity of prepop with respect to prgnp is.

The trend is very significant. There is also very strong seasonality in unemployment claims, with 6 of the 11 monthly dummy variables having absolute t statistics above 2. Because this estimate is so large in magnitude, we use equation 7.

We have controlled for a time trend and seasonality, but this may not be enough. This equation implies that if income growth increases by one percentage point, consumption growth increases by. The t statistic on gyt-1 is only about 1. In addition, the coefficient is not especially large. At best there is weak evidence of adjustment lags in consumption. The t statistic on r3t is very small. The estimated coefficient is also practically small: a one-point increase in r3t reduces consumption growth by about.

Higher interest rates imply that T-bill and bond investments are more attractive, and also signal a future slowdown in economic activity. While economic growth can be a good thing for the stock market, it can also signal inflation, which tends to depress stock prices. On the other hand, a one percentage point increase in interest rates decreases the stock market return by an estimated 1.

In other words, we do not know i3t before we know rspt. What the regression in part (i) says is that a change in i3 is associated with a contemporaneous change in rsp In other words, once seasonality is eliminated, totacc grew by about. There is pretty clear evidence of seasonality. Only February has a lower number of total accidents than the base month, January. The peak is in December: roughly, there are 9. With 11 and 95 df, this gives a p-value essentially equal to zero.

As economic activity increases (unem decreases) we expect more driving, and therefore more accidents. The estimate that a one percentage point increase in the unemployment rate reduces total accidents by about 2. A better economy does have costs in terms of traffic accidents. The coefficient on spdlaw implies that accidents dropped by about 5. There are at least a couple of possible explanations. One is that people became safer drivers after the speed limit was increased, recognizing that they must be more cautious.

It could also be that some other change, other than the increased speed limit or the relatively new seat belt law, caused a lower total number of accidents, and we have not properly accounted for this change. The coefficient on beltlaw also seems counterintuitive at first. But, perhaps people became less cautious once they were forced to wear seatbelts. The highest value of prcfat is 1. This is a statistically significant effect. The new seat belt law is estimated to decrease the percent of fatal accidents by about.

Interestingly, increased economic activity also increases the percent of fatal accidents. This may be because more commercial trucks are on the roads, and these probably increase the chance that an accident results in a fatality. In fact, the R-squared is practically zero, which means neither gmwage nor gcpi has any effect on employment growth in sector Therefore, there is little evidence that minimum wage growth affects employment growth in sector , either in the short run or the long run.

Neither of these depends on t. Of course, the persistent correlation across time is due to the presence of the time-constant variable, z. The smallest effect is at the twelfth lag, which hopefully indicates but does not guarantee that we have accounted for enough lags of gwage in the FLD model.

While this is greater than one, it is not much greater, and the difference from unity could be due to sampling error. Therefore, we regress y_t on z_t and on the difference of each lag of z from z_t (z_{t-1} - z_t, z_{t-2} - z_t, and so on through the last lag), and obtain the coefficient and standard error on z_t as the estimated LRP and its standard error. Let Rur2 be the R-squared from this regression. To obtain the restricted R-squared, Rr2, we need to reestimate the model reported in the problem but with the same observations used to estimate the unrestricted model.
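The reparameterization used to obtain the LRP directly can be checked numerically. The sketch below assumes a distributed lag model with only two lags and a short made-up series (the data and the helper `ols` are hypothetical). It shows that the coefficient on z_t in the regression on z_t, z_{t-1} - z_t, and z_{t-2} - z_t equals the sum of the lag coefficients from the original regression. It reports point estimates only; the standard error on z_t in the second regression would come from the usual OLS output.

```python
def ols(columns, y):
    """OLS coefficients (intercept first), solving the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        pivot = A[p][p]
        A[p] = [v / pivot for v in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [vr - factor * vp for vr, vp in zip(A[r], A[p])]
    return [A[r][k] for r in range(k)]

# Hypothetical time series, for illustration only.
z = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 12.0, 10.0, 11.0]
y = [2.0, 4.0, 5.0, 6.0, 9.0, 8.0, 12.0, 14.0, 13.0, 17.0, 20.0, 18.0]

periods = range(2, len(z))  # the first two periods are lost to the lags
z0 = [z[t] for t in periods]
z1 = [z[t - 1] for t in periods]
z2 = [z[t - 2] for t in periods]
yy = [y[t] for t in periods]

# Original model: y_t on z_t, z_{t-1}, z_{t-2}.
_, d0, d1, d2 = ols([z0, z1, z2], yy)

# Reparameterized model: y_t on z_t, (z_{t-1} - z_t), (z_{t-2} - z_t).
diff1 = [a - b for a, b in zip(z1, z0)]
diff2 = [a - b for a, b in zip(z2, z0)]
_, lrp, g1, g2 = ols([z0, diff1, diff2], yy)

print(lrp, d0 + d1 + d2)
```

The two printed numbers agree up to rounding, because the second regression is just a nonsingular linear transformation of the regressors in the first.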

We would find the critical value from the F6, distribution. Therefore, the errors are serially uncorrelated. Especially after detrending there is little evidence of a unit root in log(invpc). For log(price), the first order autocorrelation is about. After detrending, the first order autocorrelation drops to. We cannot confidently rule out a unit root in log(price). Because differencing eliminates linear time trends, it is not surprising that the estimate on the trend is very small and very statistically insignificant.
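The effect of detrending on the first order autocorrelation can be illustrated with a made-up series. The sketch below is an assumption-laden example: it builds a hypothetical series as a linear trend plus small alternating shocks, uses the sample autocorrelation (not the coefficient from an AR(1) regression), and detrends by regressing on a linear time index. None of the numbers relate to log(invpc) or log(price).

```python
def first_order_autocorr(series):
    """Sample first order autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den

def detrend(series):
    """Residuals from regressing the series on an intercept and t = 1..n."""
    n = len(series)
    t = list(range(1, n + 1))
    tbar, ybar = sum(t) / n, sum(series) / n
    slope = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, series))
             / sum((ti - tbar) ** 2 for ti in t))
    return [yi - ybar - slope * (ti - tbar) for ti, yi in zip(t, series)]

# Hypothetical series: linear trend plus small alternating shocks.
shocks = [0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.5, 0.1, -0.2,
          0.6, -0.4, 0.2, -0.3, 0.5, -0.6, 0.1, -0.1, 0.4, -0.5]
series = [0.5 * (i + 1) + s for i, s in enumerate(shocks)]

rho_raw = first_order_autocorr(series)
rho_detrended = first_order_autocorr(detrend(series))
print(rho_raw, rho_detrended)
```

In this example the high raw autocorrelation is produced entirely by the trend, which is why the autocorrelation of the detrended series is the more informative number when judging persistence.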

Only if both parameters are zero does E(return_t | return_{t-1}) not depend on return_{t-1}. The F statistic is about 2. The R-squared is about. The time trend coefficient is very insignificant, so it is not needed in the equation. So the estimated LRP is now negative and significant, which is very different from the equation in levels. This is a good example of how differencing variables before including them in a regression can lead to very different conclusions than a regression in levels.

The coefficient on gct-1 is also practically large, showing significant autocorrelation in consumption growth. This regression basically shows that the change in prcfat cannot be explained by the change in unem or any of the policy variables. It does have some seasonality, which is why the R-squared is. Of course, this is not to say the levels regression is valid. But, as it turns out, we can reject a unit root in prcfat, and so we can at least justify using it in level form; see Computer Exercise Generally, the issue of whether to take first differences is very difficult, even for professional time series econometricians.

This is very little evidence against H0; the null is not rejected at any reasonable significance level. With multiple explanatory variables the formulas are more complicated but have similar features. There is no reason to worry about serial correlation in this example. But any kind of adjustment, either to obtain valid standard errors for OLS as in Section The t statistic is about 2.

This means we should view the standard errors reported in equation The t statistic is well below one in absolute value, so there is no evidence of serial correlation in the accelerator model. If we view the test of serial correlation as a test of dynamic misspecification, it reveals no dynamic misspecification in the accelerator model.

The largest t statistic is on incum, which is estimated to have a large effect on the probability of winning. But we must be careful here. So, for an incumbent Democrat running, we must add the coefficients on partyWH and incum together, and this nets out to about zero.

The economic variables are less statistically significant than in equation The gnews interaction has a t statistic of about 1. Since the dependent variable is binary, this is a case where we must appeal to asymptotics. Unfortunately, we have only 20 observations. The inflation variable has the expected sign but is not statistically significant. So 15 out of 20 elections through are correctly predicted. But, remember, we used data from these years to obtain the estimated equation.

Because this is above. Therefore, there is little evidence of serial correlation in the errors. And, if anything, it is negative. In fact, all heteroskedasticity-robust standard errors are less than the usual OLS standard errors, making each variable more significant.

But we must remember that the standard errors in the LPM have only asymptotic justification. With only 20 observations it is not clear we should prefer the heteroskedasticity-robust standard errors to the usual ones. So there is no evidence that the average price of fish varies systematically within a week. Rough seas (as measured by high waves) would reduce the supply of fish (shift the supply curve back), and this would result in a price increase.

One might argue that bad weather reduces the demand for fish at a market, too, but that would reduce price. If there are demand effects captured by the wave variables, they are being swamped by the supply effects.

We can use the omitted variable bias table from Chapter 3, Table 3. Without wave2 and wave3, the coefficient on t seems to have a downward bias. Since we know the coefficients on wave2 and wave3 are positive, this means the wave variables are negatively correlated with t. In other words, the seas were rougher, on average, at the beginning of the sample period. You can confirm this by regressing wave2 on t and wave3 on t.

Further, the height of the waves is not influenced by past unexpected changes in log(avgprc). Therefore, there is strong evidence of positive serial correlation. The coefficient on wave3 drops by a relatively smaller amount, but its t statistic 1. The graph in part (iii) makes this clear, as does finding that the smallest variance estimate is 2. We should really compare adjusted R-squareds, because the ARCH(1) model contains only two total parameters. Therefore, after adjusting for the different df, the quadratic in return_{-1} fits better than the ARCH(1) model.

Therefore, an ARCH(2) model does not seem warranted. The adjusted R-squared is about. Therefore, there is very little evidence of first-order serial correlation. The variance of the error appears to be larger when the change in unemployment is larger. To account for the increase in average education levels, we obtain an additional effect: -. So the drop in average fertility if the average education level increased by 1.

For example, in Example In Example Each person in the panel data set is exactly two years older on January 31, than on January 31, As we know, when we have an intercept in the model we cannot include an explanatory variable that is constant across i; this violates Assumption MLR.
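A quick way to see the problem is to compute the change in age directly. The ages below are hypothetical; the only assumption carried over from the text is that the two observation dates are exactly two years apart.

```python
# Hypothetical ages for four people observed on two dates, two years apart.
age_first = [31, 45, 27, 52]
age_second = [a + 2 for a in age_first]

# Change in age for each person between the two dates.
d_age = [later - earlier for earlier, later in zip(age_first, age_second)]

# Sample variation in the change in age around its mean.
mean_d = sum(d_age) / len(d_age)
sst = sum((d - mean_d) ** 2 for d in d_age)

print(d_age)  # every entry is 2
print(sst)    # zero variation, so the regressor is collinear with the intercept
```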

Intuitively, since age changes by the same amount for everyone, we cannot distinguish the effect of age from the aggregate time effect. The increase from. But the very large. Prior to the policy change, the high earning group spent about By dropping highearn from the regression, we attribute to the policy change the difference between the two groups that would be observed without any intervention.
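The effect of dropping highearn can be illustrated with cell means. The numbers below are hypothetical, not the estimates from the exercise. The sketch relies on one algebraic fact: in a regression on an intercept, an after-period dummy, and the interaction of that dummy with highearn, the interaction coefficient equals the after-period gap between the two groups.

```python
# Hypothetical cell means of the outcome (not taken from any data set).
means = {
    ("low", "before"): 1.10,
    ("low", "after"): 1.12,
    ("high", "before"): 1.40,
    ("high", "after"): 1.60,
}

def diff_in_diff(m):
    """Change for the high earners minus change for the low earners."""
    change_high = m[("high", "after")] - m[("high", "before")]
    change_low = m[("low", "after")] - m[("low", "before")]
    return change_high - change_low

def estimate_without_group_dummy(m):
    """Interaction coefficient when the group dummy is omitted:
    the after-period gap between the two groups."""
    return m[("high", "after")] - m[("low", "after")]

# Gap between the groups that exists before any intervention.
pre_gap = means[("high", "before")] - means[("low", "before")]

did = diff_in_diff(means)
naive = estimate_without_group_dummy(means)

print(f"difference-in-differences: {did:.2f}")
print(f"without group dummy:       {naive:.2f}")
print(f"pre-existing gap:          {pre_gap:.2f}")
```

The estimate without the group dummy equals the difference-in-differences estimate plus the pre-existing gap, which is the sense in which the gap is wrongly attributed to the policy change.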

So we just use the usual F statistic for joint significance of the year dummies. The R-squared is about. This suggests that, at a minimum, we should compute heteroskedasticity-robust standard errors, t statistics, and F statistics. We could also use weighted least squares although the form of heteroskedasticity used here may not be sufficient; it does not depend on educ, age, and so on. Students studying business and finance tend to find the term structure of interest rates example more relevant, although the issue there is testing the implication of a simple theory, as opposed to inferring causality.

I have found that spending time talking about these examples, in place of a formal review of probability and statistics, is more successful in teaching the students how econometrics can be used. And, it is more enjoyable for the students and me. The return to education, perhaps focusing on the return to getting a college degree, is a good example of how counterfactual reasoning is easily incorporated into the discussion of causality.

That is, each student is assigned a different class size without regard to any student characteristics such as ability and family background. For reasons we will see in Chapter 2, we would like substantial variation in class sizes (subject, of course, to ethical considerations and resource constraints). We might find a negative correlation because a larger class size actually hurts performance.

However, with observational data, there are other reasons we might find a negative relationship. For example, children from more affluent families might be more likely to attend schools with smaller class sizes, and affluent children generally might score better on standardized tests. Another possibility is that, within a school, a principal might assign the better students to smaller classes.

Some way of controlling for the confounding factors is needed, and this is the subject of multiple regression analysis. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender, or race. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.

So, two firms with exactly the same kinds of employees would generally have different outputs if they use different amounts of capital or technology. The quality of managers would also have an effect. The many factors listed in parts (ii) and (iii) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity.

We can then use statistical methods to measure the association between studying and working, including regression analysis, which we cover starting in Chapter 2. They are both choice variables of the student.

The factors such as consumption, investment, net exports, and so on, would be required for a controlled experiment. There are two people reporting zero years of education and 19 people reporting 18 years of education. Therefore, the average hourly wage in dollars is roughly 4. Reporting just the average masks the fact that almost 85 percent of the women did not smoke.

Of course, this is much higher than the average over the entire sample because we are excluding 1, zeros. So, at least in , the reading test was harder to pass. Not surprisingly, schools that have high pass rates on one test have a strong tendency to have high pass rates on the other test. For men not receiving job training, the average of re78 is about 4. This, too, is a big difference. Our conclusions about economic significance would be stronger if we could also establish statistical significance which is done in Computer Exercise C9.


