Basic Econometrics of Tax Revenues and Inequality

The following is the first of several short articles on the relationship between public finances, economic growth, and economic welfare. In this case two variables are considered: the Gini Coefficient of Income and the Total Tax Revenue as a percentage of GDP for OECD countries. In both cases the data is from c2018, the "circa" indicating that in some cases the data is a year or two older. In both cases the data comes directly from the OECD itself. Values have been rounded to three decimal places, as per OECD data. If a country is missing either Gini or Tax Revenue data it has been excluded, which means a total of 36 countries (N=36) are used in this short example.

For both the Gini Income Coefficient and the Total Tax Revenue Percentage the Sum, Min and Max (Range) values, the Median and Mean, the Sample Variance, and the Sample Standard Deviation have been derived. To compare the two datasets a covariance and a correlation are derived. The results are as follows:

Gini Coefficient
Sum = 11.374; Min = 0.22; Max = 0.46; Median = 0.310; Mean = 0.316; Sample Variance = 0.003; Sample Standard Deviation = 0.056

Total Tax Revenue
Sum = 1233.192; Min = 16.132; Max = 46.095; Median = 34.715; Mean = 34.255; Sample Variance = 51.014; Sample Standard Deviation = 7.142

Population Covariance = -0.271; Pearson Correlation = -0.701

Conclusion: The negative correlation shows that, across these countries, a higher ratio of tax to GDP is associated with lower income inequality.
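As a quick sanity check, the reported correlation can be roughly reconstructed from the other summary figures. The sketch below assumes the covariance is a population value (the replication commands use datamash's pcov) and that the standard deviations are sample values (sstdev), so the product of the two standard deviations is rescaled by (N-1)/N before dividing. The helper name r_from_summary is made up for this illustration. Because the inputs are already rounded to three decimals, the result lands close to, but not exactly on, the reported -0.701.

```shell
# r_from_summary PCOV SD_X SD_Y N   (hypothetical helper, not part of datamash)
# Approximates Pearson's r from a population covariance and two sample SDs.
r_from_summary() {
    awk -v pcov="$1" -v sx="$2" -v sy="$3" -v n="$4" 'BEGIN {
        # Sample SDs divide by N-1 and the population covariance by N, so
        # the product of the two sample SDs is rescaled by (N-1)/N.
        printf "%.3f\n", pcov / (sx * sy * (n - 1) / n)
    }'
}

# Rounded figures as reported above: covariance, SD of Gini, SD of tax, N.
r_from_summary -0.271 0.056 7.142 36
```

The small gap to the reported figure is what one would expect from rounding the Gini standard deviation to 0.056; it is not evidence of an error in either number.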

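The calculations were conducted using GNU Datamash (v1.7) and simple Linux utilities. They can be replicated as follows on the files "tax" (the tax revenue file) and "gini" (the Gini income file). Each file needs one value per line, with the countries in the same order in both, since paste joins the files line by line. Note that svar and sstdev give the sample variance and sample standard deviation, while pcov and ppearson give the population covariance and Pearson correlation: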

$ datamash sum 1 min 1 max 1 median 1 mean 1 svar 1 sstdev 1 < tax
$ datamash sum 1 min 1 max 1 median 1 mean 1 svar 1 sstdev 1 < gini
$ paste gini tax > combined
$ datamash pcov 1:2 < combined
$ datamash ppearson 1:2 < combined
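For readers without Datamash installed, the last two commands can be cross-checked with plain awk. This is a minimal sketch, not the method used for the figures above: the function name pop_cov_cor is made up, the toy input is hypothetical, and it assumes whitespace-separated input with the two variables in columns 1 and 2, as in the "combined" file.

```shell
# pop_cov_cor [FILE]   (hypothetical helper; reads stdin when no file is given)
# Prints the population covariance and Pearson correlation of columns 1 and 2,
# separated by a tab, using running sums collected in a single pass.
pop_cov_cor() {
    awk '
        NF >= 2 {
            n++
            sx += $1; sy += $2
            sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
        }
        END {
            if (n < 2) { print "need at least two rows" > "/dev/stderr"; exit 1 }
            mx = sx / n; my = sy / n
            cov = sxy / n - mx * my      # population covariance (divides by N)
            vx  = sxx / n - mx * mx      # population variance of column 1
            vy  = syy / n - my * my      # population variance of column 2
            printf "%.3f\t%.3f\n", cov, cov / sqrt(vx * vy)
        }
    ' "$@"
}

# Hypothetical toy data with a perfectly negative relationship.
pop_cov_cor <<'EOF'
1 6
2 4
3 2
EOF
# On the real data the call would be: pop_cov_cor < combined
```

The running-sum form is short but less numerically stable than a two-pass calculation. For 36 values rounded to three decimals that should not matter, but it is a trade-off worth knowing about on larger or badly scaled data.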

There are several caveats which must be considered in this initial survey.

1) The correlation is only tested for certain OECD countries. This should be obvious, but needs to be stated. In defense, the countries in the OECD data that were excluded (RUS, CRI, ZAF, BGR, ROU) show high Gini Coefficients and are not known for high tax rates, so including them would probably make the conclusion stronger.

2) The correlation is only tested for the given dataset of Tax Revenue. Again, this should be obvious, but also needs to be stated. The value of Total Tax Revenue as a Percentage of GDP does not consider GDP per capita, although as OECD countries one can imagine they are all at the upper end of the global scale. It is also evaluated against total tax revenue, and does not differentiate between types of tax by how productive they are (e.g., natural resource rents, such as those employed by Norway, versus taxes on labour, capital returns, or consumption). Nor does the observation review expenditures or the type of expenditure (e.g., corporate subsidies vs welfare, "guns" vs "butter", etc.).

3) The correlation is only tested for the given dataset of Gini Income Coefficient. Again... you know what I'm going to say. Income is a typical measure of inequality, but the Gini Wealth Coefficient typically shows a more prominent disparity. Something about land ownership, I believe. Also, the Gini Income Coefficient doesn't account for the net effects of welfare transfers, which flatten the income disparity, albeit in a state-managed fashion.

4) The correlation is not against a longitudinal dataset. The conclusions would be stronger if the effects were observed over a greater number of years, although there are issues of data equivalence throughout.

5) The correlation is not a regression analysis. This is simply a correlation between two variables. To strengthen the case that the variables have a causal relationship, other variables need to be taken into consideration.
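As a very small first step in that direction, the slope of a two-variable least-squares line (Gini on tax revenue) can be approximated from the summary figures alone, since the slope is the covariance divided by the variance of the explanatory variable. The sketch below assumes the covariance is a population value and the tax variance a sample value, so the variance is rescaled by (N-1)/N first. The helper name ols_slope is made up for this illustration, and the result inherits the rounding of its inputs. It remains a description of two variables only and does nothing to address the caveat about other variables.

```shell
# ols_slope PCOV SVAR_X N   (hypothetical helper, not part of datamash)
# Approximate slope of y on x from a population covariance and the
# sample variance of x.
ols_slope() {
    awk -v pcov="$1" -v svar="$2" -v n="$3" 'BEGIN {
        # Convert the sample variance (N-1 divisor) to a population variance
        # (N divisor) so that it matches the population covariance.
        printf "%.4f\n", pcov / (svar * (n - 1) / n)
    }'
}

# Rounded figures as reported above: covariance, sample variance of tax, N.
ols_slope -0.271 51.014 36
```

Read loosely, a slope of this size would mean that each additional percentage point of tax revenue to GDP goes with a Gini Coefficient lower by roughly half a hundredth of a point in this sample.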
