Linear Regression Research Paper on Determinants of Import Across Countries

Determinants of the Level of Imports Across Countries

Presented to: Prof. Angela D. Nalica, School of Statistics Faculty, University of the Philippines, Diliman

In Partial Fulfillment of the Requirements of Statistics 136: Regression Analysis

Presented by: Mary Ann A. Boter, Michael Daniel C. Lucagbo, Krystalyn Candy C. Mago

April 9, 2009

Abstract

The level of a country’s imports measures its participation and competitiveness in the international market. As such, it is important to identify economic indicators that affect the level of imports.

Economic theory rarely presents imports as a response variable. It appears very frequently, however, as a predictor in formulas that model a multitude of variables, which suggests that it is closely related to other economic variables. The paper aims to construct a model with imports as the response variable and to choose those independent variables that affect it significantly. The results of the study confirm the significance of GDP and labor force size in predicting imports. These were the two main variables that theory describes as potentially significant.

Another economic measure, budget revenue, turned out to be a good predictor of imports. The final model expresses imports as a function of GDP, labor force size and budget revenue.

Introduction

All countries are linked to the rest of the world through trade. The presence of trade means that some of a country’s domestically produced goods are exported to other countries. It also means that countries purchase needed goods not available locally. The amount of goods a country purchases from other countries is, by definition, the level of its imports.

Imports, being indicative of a country’s participation and competitiveness in international trade, is an important economic variable. However, imports is not commonly expressed as an endogenous variable (that is, a dependent variable in an economic model). It usually appears in formulas as a value that determines the value of other variables. In this paper the researchers make imports the dependent variable, select the variables that affect it most strongly, and build a model based on these variables.

It is well accepted in the economic literature that the level of a country’s imports is affected by its income. Richer countries tend to import more because they have the capacity to do so. They buy greater volumes of goods that are not produced locally (foreign shoes, foreign foods, computers, etc.). The United States, still the richest country in terms of GDP, has exceedingly high import levels. On the other hand, countries with low incomes tend to import less. Their financial resources largely limit the purchasing of goods and services to within their own borders.

This study aims to determine the extent to which income is linearly related to imports, where, this time, imports is the variable to be predicted. One common and well-understood measure of a country’s income is the Gross Domestic Product (GDP), which measures the amount of goods and services produced in a country in a given year. The researchers intend to verify the relationship between GDP and imports by building a regression model relating the two. At the outset, a linear relationship between the two is hypothesized.

Another variable that is closely related to the level of imports is the size of the labor force. A larger labor force means greater national employment and, hence, greater capacity of the country to buy foreign goods and services and even foreign labor. China and the USA are obvious examples. Both have a large labor force and both are giant importers in the world market. Another objective of this study is to relate the level of employment to the level of imports. This relationship is not well discussed in open-economy macroeconomics, hence the importance of a model that links the two.

Other variables such as public debt, foreign exchange, investment income and the Gini coefficient are considered for inclusion in the construction of the model. Although there is a lack of economic models that concretely relate these to imports, many of these economic indicators are, theoretically, related to it. The relationships are explained in great detail in macroeconomic texts.

Review of Related Literature

The relationship between level of imports and income is well known in economic literature and is often discussed as a theoretical positive relationship in economics classes.

Dornbusch et al. (2008), in their popular book Macroeconomics, present imports, denoted by Q, as a function of Y (income) and R (the real exchange rate), thus affirming the relationship:

Q = f(Y, R)

A rise in home income raises import spending. The real exchange rate, which is the purchasing power of local currency, also affects imports. However, no quantitative measure of R is available. Since R varies from one good to another and one currency to another, it is hard to measure the relative values of R across countries. Thus the real exchange rate R will not be considered here.

Moreover, Dornbusch et al. present budget surplus (one of the independent variables to be included in the model) as a function of taxes, government expenditures, and transfers:

BS = f(TA, G, TR)

Since part of the government’s expenditures will be spent on foreign products, the study presumes a linear relationship between budget surplus and level of imports. However, computationally, budget surplus is obtained by taking the difference between budget revenues and budget expenditures. Thus, BS could be negative.

In order to allow transformations that require positive values, the researchers chose a variable highly correlated with BS, namely budget revenues, which is always positive.

According to Okun’s Law, the level of a country’s production and its employment rate are linked. A special graph, called the Phillips curve, fits a curve showing a functional relationship between the two. Lindbeck and Snower (2002), The Insider Outsider Theory, and Gilles Saint-Paul, in an issue of Economic Policy, have noted that a great deal of modern work has been devoted to modelling using Phillips curves.

This is one reason why the authors chose employment as a predictor for imports. Since a country’s import level depends on the level of its production, imports might also depend on employment. To avoid possible multicollinearity, however, the researchers decided to look at the size of the labor force rather than the rate of unemployment. Alwyn Young, in the Quarterly Journal of Economics, justifies the use of labor force rather than unemployment rate in presenting the growth of the Asian Tigers.

Robert Mundell and Marcus Fleming (1967), in International Economics, proposed what is now called the Mundell-Fleming model, which relates a variety of variables to imports. In particular, it examines an economy with flexible exchange rates and perfect capital mobility (the freedom with which goods and assets flow inside and outside a country). Many variables such as money supply, interest rates, foreign exchange and investment demand are associated with imports in the Mundell-Fleming model.

For example, part of investment demand falls under imports, since many people in the business sector purchase foreign products in setting up their businesses. However, due to inadequate data, the researchers have included only those variables that appear more directly related to imports and for which data are available.

A simpler model than the Mundell-Fleming is the Simple Keynesian Model, proposed by John Maynard Keynes, which is known for its popular “Keynesian Cross”, formed by the aggregate supply and aggregate demand curves. A simple introduction is given in econmodel.com.

One implication of the Keynesian model which the researchers incorporate in this study is the assertion that a decrease in aggregate demand (which clearly includes imports) affects the level of employment. Moreover, the IS-LM model (discussed by Mankiw in Macroeconomics), which is much mentioned in macroeconomic literature, links, however indirectly, imports with interest rates, investment rates, type of economy, and other variables. Unfortunately, the IS-LM model’s conclusions depend on the time before prices can adjust (in economic parlance, the short run and the long run period).

Since the short and long run periods cannot be measured objectively (and there is certainly no measure that applies to all countries), the researchers chose not to add a dummy variable based on the two periods. Another difficulty with using the IS-LM model was the need for supplementary information as to whether a country was governed by fiscal policy or monetary policy. The researchers had no way of identifying the type of policy adopted by each country’s government. This was necessary because the type of policy was assumed to be very influential on the other variables in the IS-LM model.

Nonetheless, what the researchers did was to take note of the relationships inherent in the IS-LM model and check for any contradiction with the final model (there was none).

Definition of Terms[1]

1. Imports (import) – total amount of goods a country purchases from other countries on a cost, insurance, and freight, or free on board basis, in US dollars.
2. Budget Expenditure (budgetexpe) – government’s expenditures on goods and services plus transfers.
3. Budget Revenues (budgetrev) – summation of all the government’s sources of revenue (including taxes, foreign aid, etc.).
4. Current account balance (accntbal) – defined as the country’s net trade in goods and services, plus net earnings from rents, interest, profits, and dividends, and net transfer payments (such as pension funds and worker remittances) to and from the rest of the world during 2008. The current account balance used in this study is calculated on an exchange rate basis.
5. Debt – external (debtext) – the total amount of public foreign financial obligations.
6. GDP per capita (gdpcapita) – GDP (Gross Domestic Product) per capita is computed on a purchasing power parity basis divided by the population as of the 1st day of July 2008.
7. Gini index (income) – this index measures the degree of inequality in the distribution of family income in a country. The index is calculated from the Lorenz curve, in which cumulative family income is plotted against the number of families arranged from the poorest to the richest.
8. Inflation rate (consumer prices) (inflation) – the annual percent change in consumer prices compared with the previous year’s consumer prices.
9. Investment (gross fixed) (investment) – the total business spending on fixed assets, such as factories, machinery, equipment, dwellings, and inventories of raw materials, which provide the basis for future production.
10. Labor force (labor) – consists of people who are working and people who are actively looking for work.
11. Public debt (debptpublic) – the cumulative total of all government borrowings less repayments that are denominated in a country’s home currency. Public debt should not be confused with external debt, which reflects the foreign currency liabilities of both the private and public sector and must be financed out of foreign exchange earnings.
12. Unemployment rate (unemploy) – percentage of the labor force that is without jobs.

Methodological Sketch

The data used in this research is a compilation of world economic indicators for the year 2008, taken from the 2008 edition of the World Factbook published by the Central Intelligence Agency (CIA) of the United States, which contains almanac-style information about the countries of the world. It was originally an annual book, but the 2008 edition was the last to be printed on paper by the Government. The Factbook is available in the form of a website, which is partially updated every two weeks. It provides a two- to three-page summary of the demographics, geography, communications, government, economy, and military of 266 U.S.-recognized countries, dependencies, and other areas in the world. Moreover, it is frequently used as a resource for academic research papers.

First, the data set was cleaned to check for possible measurement or encoding errors, missing values, etc. Apparent measurement errors were detected. Among these was a recorded value of 3460 for inflation rate (which, when expressed as a percentage, should be between 0 and 100). Countries with missing values for many of the variables were removed from the data set.

Of the 259 countries included in the compilation, only 93 were used in the study. To predict imports, eleven independent variables (budget expenditure, budget revenue, current account balance, external debt, Gini index, GDP per capita, inflation rate, investment, labor force, public debt, and unemployment rate) were chosen so that all are (directly or indirectly) related to the dependent variable, imports. The dependent variable was regressed on the 11 economic indicators. To check whether at least one of the regressors was significantly linearly related to imports, the overall F-test was used.

The eleven variables were then evaluated based on the p-values corresponding to the individual t-tests. This was the criterion for deciding whether the variables were significant or not. The variable with the highest p-value was the first to be deleted. A new regression was run on the remaining variables, and the p-values produced by the F and t tests were noted. Variables with high p-values were considered for removal from the model. The deleted variables were those that showed no clear signs of being good determinants of imports. A new regression was then estimated.

The researchers repeated the process, monitoring the F and t tests at each repetition, until all the remaining variables were found significant.

The obtained model was checked for normality and homoskedasticity of error terms, multicollinearity of the independent variables, serial correlation and influential outliers. Normality of the error terms was checked using four tests for normality (Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling). The residuals had to pass all four tests before normality could be attributed to the error terms.

Homoskedasticity of the error terms was checked using the F test for the equality of two variances and residual plots: the plot of the residuals against the predicted value of imports, and the individual residual plots for the independent variables. There is no indication of heteroskedasticity when the points do not fall in a funnel-shaped figure. Specifically, when the points fall within a horizontal band, they indicate constancy of error variances. The observations were transformed using several combinations of logarithmic transformations until normality was achieved.
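The F test for the equality of two variances mentioned above reduces to a ratio of two sample variances. The following is a minimal sketch, not the authors' own computation: the function name `variance_ratio_f` is hypothetical, the inputs are the two variances and group sizes reported in Appendix 16, and the p-value is left out because it would require an F distribution routine outside the standard library.

```python
def variance_ratio_f(var1, n1, var2, n2):
    """Return (F, df1, df2) for the two-sample F test of equal variances.

    var1, var2 are sample variances; n1, n2 are the group sizes.
    """
    if var1 <= 0 or var2 <= 0:
        raise ValueError("variances must be positive")
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # The statistic is the ratio of the sample variances, with
    # numerator and denominator degrees of freedom n1 - 1 and n2 - 1.
    return var1 / var2, n1 - 1, n2 - 1


# Values taken from the ln(gdpcap) check in Appendix 16.
f_stat, df1, df2 = variance_ratio_f(0.304196179, 46, 0.438880865, 47)
print(round(f_stat, 6), df1, df2)
```

The ratio agrees with the F value of about 0.6931 reported in Appendix 16; a ratio near 1 is what one would expect under equal error variances.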

Multicollinearity of the independent variables was checked using the Variance Inflation Factor (VIF), the Condition Index and the Proportion of Variation. Many diagnostic measures are available for multicollinearity. If the VIF exceeds 10, there is an indication of its presence. If the condition index exceeds 30, the variables are highly correlated with each other. The proportion of variation is checked for those condition indices that exceed 30. A value that exceeds 0.5 indicates multicollinearity. If these diagnostics indicate multicollinearity, Ridge Regression is one remedial option.
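To make the VIF rule of thumb concrete: the VIF of a predictor equals 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below uses hypothetical helper names (`vif_from_r2`, `r2_from_vif`) and, as an illustration only, inverts the VIF of 13.59105 reported for lnbudgetrev in Table 7 to show how much of that predictor's variation the other predictors explain.

```python
VIF_CUTOFF = 10.0  # rule-of-thumb threshold used in the paper


def vif_from_r2(r_squared):
    """VIF of a predictor given the R^2 of regressing it on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r2_from_vif(vif):
    """Inverse relation: the auxiliary R^2 implied by a given VIF."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be below 1")
    return 1.0 - 1.0 / vif


# VIF reported for lnbudgetrev in Table 7.
implied_r2 = r2_from_vif(13.59105)
print(round(implied_r2, 4), 13.59105 > VIF_CUTOFF)
```

A VIF above 10 therefore corresponds to an auxiliary R² above 0.9, which is why the reported value for lnbudgetrev signals a problem.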

Addressing the problem of multicollinearity sometimes leads to non-normality and heteroskedasticity. Serial correlation was checked using the Durbin-Watson test. A value of the Durbin-Watson D statistic that is close to 2 indicates no serial correlation problem. Detection of outliers along the y-axis uses the Studentized residual. If the Studentized residual exceeds the critical value from the t table, then the observation corresponding to that Studentized residual is an outlier. Outliers along the x-axis are detected using the diagonals of the Hat matrix, that is, the leverage values.
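The Durbin-Watson statistic referred to above is the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below is illustrative only: the function name and the two toy residual series are assumptions made for the example, not data from the study.

```python
def durbin_watson(residuals):
    """Durbin-Watson d statistic for residuals listed in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    return numerator / denominator


# Toy series (hypothetical): sign-alternating residuals push d above 2,
# while runs of same-signed residuals pull it below 2.
alternating = [1.0, -1.0, 1.0, -1.0]
clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
print(durbin_watson(alternating))  # 3.0
print(durbin_watson(clustered))
```

Because the statistic depends on the order of the observations, its value for cross-country data depends on how the countries are sorted.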

If a leverage value exceeds the usual cut-off (2p/n), then the observation corresponding to that leverage is an outlier. Outliers detected may be influential to the data set. Checking for the influence of an outlier is done through Cook’s D, DFFITS and DFBETAS. If the Cook’s D value exceeds the tabulated value in the F table with the appropriate degrees of freedom, the outlier is declared influential. If the outlier exceeds the cut-off for DFFITS ([pic]) and DFBETAS ([pic]), then the outlier is potentially influential.
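The size-dependent cut-offs can be computed directly from the number of observations n and the number of estimated parameters p. In the sketch below, the leverage cut-off 2p/n is the one stated in the text; the DFFITS and DFBETAS cut-offs, 2·sqrt(p/n) and 2/sqrt(n), are the conventional textbook choices and are an assumption here. The example takes n = 93 from the study and assumes p = 4 (an intercept plus the three predictors of the final model).

```python
import math


def influence_cutoffs(n, p):
    """Rule-of-thumb cut-offs for leverage, DFFITS and DFBETAS.

    n: number of observations; p: number of estimated parameters,
    including the intercept.
    """
    if n <= 0 or p <= 0 or p >= n:
        raise ValueError("need 0 < p < n")
    return {
        "leverage": 2.0 * p / n,            # cut-off stated in the text
        "dffits": 2.0 * math.sqrt(p / n),   # conventional choice (assumed)
        "dfbetas": 2.0 / math.sqrt(n),      # conventional choice (assumed)
    }


# n = 93 countries; p = 4 is an assumption (intercept + three predictors).
cutoffs = influence_cutoffs(n=93, p=4)
for name, value in cutoffs.items():
    print(f"{name}: {value:.4f}")
```

An observation whose absolute diagnostic value exceeds the corresponding cut-off would then be flagged for a closer look rather than deleted automatically.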

To address the problem of influential outliers, one option is to run a regression excluding the possibly influential outlier and check the changes in the parameter estimates and the coefficient of multiple determination. If the obtained model is free from the problems of non-normality, heteroskedasticity, multicollinearity, serial correlation and influential outliers, the model is plausible for predicting imports.

Results and Discussions

The starting model for the study regresses imports on all eleven variables. This regression yields an overall F statistic whose p-value does not exceed 0.1. This means that at least one of the eleven regressors can explain imports. The individual t-tests, which test the significance of each coefficient, reveal that not all variables are significant. See Table 1. Those variables with high p-values were deleted one at a time. The authors performed this deletion judiciously, making sure that the deleted variables are those for which the literature does not explicate a clear relationship with imports. (See Appendix 1 for the complete output.)

Table 1. Individual t-tests for the Parameters in the Unrestricted Model

T Statistics
|Variable |t Value |Pr > |t| |
|Intercept |-0.63 |0.53 |
|Budgetexpe |-0.22 |0.8247 |
|Budgetrev |1.18 |0.2431 |
|Accntbal |-1.75 |0.0835 |
|Debtext |0.79 |0.4334 |
|Income |1.44 |0.1544 |
|Gdpcapita |2.41 |0.0181 |
|Inflation |-0.52 |0.6031 |
|Investment |0.1 |0.9196 |
|Labor |5.9 | |

|Variable |t Value |Pr > |t| |
|Intercept |-0.85 |0.3974 |
|Budgetrev |9.91 | |

|Pr > A-Sq |0.2244 |

The data set passes all four tests for normality. Shapiro-Wilk’s W statistic is close to 1 and the p-value associated with it exceeds both 0.05 and 0.10. The p-value for Kolmogorov-Smirnov’s D statistic (the maximum deviation between the theoretical and observed CDFs) also exceeds the usual levels of α. Multicollinearity was the next stumbling block.
The multicollinearity diagnostic values outputted by SAS were beyond rule-of-thumb cutoffs. (See Appendix 8 for the complete output.) In particular, see Table 7.

Table 7. Collinearity Check for Model 1

|Variable |t Value |Pr > |t| |Variance Inflation |
|Intercept |1.99 |0.0498 |0 |
|lnbudgetrev |2.28 |0.025 |13.59105 |
|lngdpcap |6.65 | | |

0.2500
Anderson-Darling A-Sq 0.489191 Pr > A-Sq 0.2244

Quantiles (Definition 5)

|Quantile |Estimate |
|100% Max |1.6645761 |
|99% |1.36645761 |
|95% |0.66211784 |
|90% |0.42917071 |
|75% Q3 |0.28700116 |
|50% Median |0.00667705 |

Appendix 10: Normality checking for the regression model 1.

Extreme Observations

|Lowest Value |Obs |Highest Value |Obs |
|-1.161591 |6 |0.662118 |59 |
|-1.75466 |9 |0.771251 |45 |
|-0.850099 |11 |0.796414 |8 |
|-0.738774 |3 |1.267573 |75 |
|-0.676541 |16 |1.366458 |34 |

The UNIVARIATE Procedure, Variable: resid1 (Residual): stem-and-leaf plot and boxplot (Multiply Stem.Leaf by 10**-1).

Appendix 11: Normality checking for the regression model 1.

The UNIVARIATE Procedure, Variable: resid1 (Residual): normal probability plot.

Appendix 12: Ridge regression output with stabilized estimates at .8 to 1

|Obs |_MODEL_ |_TYPE_ |_DEPVAR_ |_RIDGE_ |_PCOMIT_ |_RMSE_ |Intercept |lngdpcap |lnlabor |lnbudgetrev |lnimport |
|1 |MODEL1 |PARMS |lnimport |. |. |0.45003 |1.4771 |0.90489 |0.60326 |0.20711 |-1 |
|2 |MODEL1 |RIDGE |lnimport |0.1 |. |0.46866 |3.3759 |0.60255 |0.40998 |0.37053 |-1 |
|3 |MODEL1 |RIDGE |lnimport |0.2 |. |0.49149 |4.5231 |0.53799 |0.36647 |0.37622 |-1 |
|4 |MODEL1 |RIDGE |lnimport |0.3 |. |0.51963 |5.5062 |0.49969 |0.33989 |0.36758 |-1 |
|5 |MODEL1 |RIDGE |lnimport |0.4 |. |0.5108 |6.3836 |0.47073 |0.31956 |0.35566 |-1 |
|6 |MODEL1 |RIDGE |lnimport |0.5 |. |0.58407 |7.1778 |0.44671 |0.30262 |0.34307 |-1 |
|7 |MODEL1 |RIDGE |lnimport |0.6 |. |0.61736 |7.9024 |0.42589 |0.28793 |0.33067 |-1 |
|8 |MODEL1 |RIDGE |lnimport |0.7 |. |0.65021 |8.5670 |0.40740 |0.27490 |0.31876 |-1 |
|9 |MODEL1 |RIDGE |lnimport |0.8 |. |0.68214 |9.1793 |0.39073 |0.26319 |0.30748 |-1 |
|10 |MODEL1 |RIDGE |lnimport |0.9 |. |0.71292 |9.7453 |0.37555 |0.25255 |0.29683 |-1 |
|11 |MODEL1 |RIDGE |lnimport |1.0 |. |0.74241 |10.2704 |0.36162 |0.24281 |0.28680 |-1 |

|Obs |_MODEL_ |_TYPE_ |_DEPVAR_ |_RIDGE_ |_PCOMIT_ |_RMSE_ |Intercept |lngdpcap |lnlabor |lnbudgetrev |lnimport |
|1 |MODEL1 |PARMS |lnimport |. |. |0.45003 |1.47708 |0.90489 |0.60326 |0.20711 |-1 |
|2 |MODEL1 |RIDGE |lnimport |0.80 |. |0.68214 |9.17925 |0.39073 |0.26319 |0.30748 |-1 |
|3 |MODEL1 |RIDGE |lnimport |0.81 |. |0.68528 |9.23784 |0.38915 |0.26208 |0.30638 |-1 |
|4 |MODEL1 |RIDGE |lnimport |0.82 |. |0.68840 |9.29598 |0.38758 |0.26098 |0.30530 |-1 |
|5 |MODEL1 |RIDGE |lnimport |0.83 |. |0.69150 |9.35367 |0.38603 |0.25989 |0.30421 |-1 |
|6 |MODEL1 |RIDGE |lnimport |0.84 |. |0.69460 |9.41092 |0.38449 |0.25881 |0.30314 |-1 |
|7 |MODEL1 |RIDGE |lnimport |0.85 |. |0.69769 |9.46772 |0.38297 |0.25774 |0.30207 |-1 |
|8 |MODEL1 |RIDGE |lnimport |0.86 |. |0.70076 |9.52409 |0.38146 |0.25669 |0.30101 |-1 |
|9 |MODEL1 |RIDGE |lnimport |0.87 |. |0.70382 |9.58004 |0.37996 |0.25564 |0.29996 |-1 |
|10 |MODEL1 |RIDGE |lnimport |0.88 |. |0.70686 |9.63556 |0.37848 |0.25460 |0.29891 |-1 |
|11 |MODEL1 |RIDGE |lnimport |0.89 |. |0.70990 |9.69066 |0.37701 |0.25357 |0.29786 |-1 |
|12 |MODEL1 |RIDGE |lnimport |0.90 |. |0.71292 |9.74534 |0.37555 |0.25255 |0.29683 |-1 |

Appendix 13: Ridge regression output with stabilized estimates at .8256 to .8258

|Obs |_MODEL_ |_TYPE_ |_DEPVAR_ |_RIDGE_ |_PCOMIT_ |_RMSE_ |Intercept |lngdpcap |lnlabor |lnbudgetrev |lnimport |
|1 |MODEL1 |PARMS |lnimport |. |. |0.45003 |1.47708 |0.90489 |0.60326 |0.20711 |-1 |
|2 |MODEL1 |RIDGE |lnimport |0.821 |. |0.68871 |9.30177 |0.38743 |0.26087 |0.30519 |-1 |
|3 |MODEL1 |RIDGE |lnimport |0.822 |. |0.68902 |9.30756 |0.38727 |0.26076 |0.30508 |-1 |
|4 |MODEL1 |RIDGE |lnimport |0.823 |. |0.68933 |9.1334 |0.38712 |0.26065 |0.30497 |-1 |
|5 |MODEL1 |RIDGE |lnimport |0.824 |. |0.68964 |9.31911 |0.38696 |0.26054 |0.30486 |-1 |
|6 |MODEL1 |RIDGE |lnimport |0.825 |. |0.68995 |9.32488 |0.38681 |0.26043 |0.30475 |-1 |
|7 |MODEL1 |RIDGE |lnimport |0.826 |. |0.69026 |9.33065 |0.38665 |0.26033 |0.30465 |-1 |
|8 |MODEL1 |RIDGE |lnimport |0.827 |. |0.69057 |9.33641 |0.38650 |0.26022 |0.30454 |-1 |
|9 |MODEL1 |RIDGE |lnimport |0.828 |. |0.69088 |9.34217 |0.38634 |0.26011 |0.30443 |-1 |
|10 |MODEL1 |RIDGE |lnimport |0.829 |. |0.69119 |9.34792 |0.38619 |0.26000 |0.30432 |-1 |

|Obs |_MODEL_ |_TYPE_ |_DEPVAR_ |_RIDGE_ |_PCOMIT_ |_RMSE_ |Intercept |lngdpcap |lnlabor |lnbudgetrev |lnimport |
|1 |MODEL1 |PARMS |lnimport |. |. |0.45003 |1.47708 |0.90489 |0.60326 |0.20711 |-1 |
|2 |MODEL1 |RIDGE |lnimport |0.8251 |. |0.68998 |9.32546 |0.38679 |0.26042 |0.30474 |-1 |
|3 |MODEL1 |RIDGE |lnimport |0.8252 |. |0.69001 |9.32604 |0.38678 |0.26041 |0.30473 |-1 |
|4 |MODEL1 |RIDGE |lnimport |0.8253 |. |0.69005 |9.32661 |0.38676 |0.26040 |0.30472 |-1 |
|5 |MODEL1 |RIDGE |lnimport |0.8254 |. |0.69008 |9.32719 |0.38674 |0.26039 |0.30471 |-1 |
|6 |MODEL1 |RIDGE |lnimport |0.8255 |. |0.69011 |9.32777 |0.38673 |0.26038 |0.30470 |-1 |
|7 |MODEL1 |RIDGE |lnimport |0.8256 |. |0.69014 |9.32834 |0.38671 |0.26037 |0.30469 |-1 |
|8 |MODEL1 |RIDGE |lnimport |0.8257 |. |0.69017 |9.32892 |0.38670 |0.26036 |0.30468 |-1 |
|9 |MODEL1 |RIDGE |lnimport |0.8258 |. |0.69020 |9.32950 |0.38668 |0.26035 |0.30467 |-1 |
|10 |MODEL1 |RIDGE |lnimport |0.8259 |. |0.69023 |9.33007 |0.38667 |0.26034 |0.30466 |-1 |

Appendix 14: Residual Plots for model 1

Appendix 15: Residual plots

Appendix 16: F Tests for the homoskedasticity of variances of the error terms output

Homoskedasticity Check of ln(gdpcap) using F test

F-Test Two-Sample for Variances
| |Variable 1 |Variable 2 |
|Mean |-0.305948719 |0.299291808 |
|Variance |0.304196179 |0.438880865 |
|Observations |46 |47 |
|Df |45 |46 |
|F |0.693117891 | |
|P(F | | |

[1] https://www.cia.gov/library/publications/the-world-factbook/definitions
