In July this year, at a resort near Bangalore (yes, we at Takshashila do sometimes play resort politics), I got the fifth batch of the GCPP to work on the problem of building an index that measures the development of various Indian states over the last ten years. I used this case as a reference for my module on Analytical Methods in Public Policy, taught at one of the weekend workshops that are part of the GCPP. As part of this exercise I taught them how to pick variables, measure them, procure data, look for interactions between variables, and then combine them to form an index.
It is interesting that a couple of months after that session, the report of the Raghuram Rajan Committee on a Composite Development Index of States was published. I will use this blog post to give my comments on that report as I go through it. Since I’m going to be effectively “live-blogging” my reading of the report, the rest of this post is in bullet points.
Also, in keeping with my title of “resident quant”, I will try as much as possible to restrict my comments to the data and methodology, and not comment on economic issues. However, it is likely that I will go off on economic rants here and there.
- The first paragraph of the executive summary states that the reason we adopted a command and control model after independence was so that we would not increase inequalities across regions and states. This is the first time I’m hearing this story.
- The index is based entirely on publicly available data. I think this is a good thing.
- Each state gets 0.3% of the total available pool, irrespective of its size. Of the remaining 91.6% (28 states × 0.3% = 8.4% in fixed payments), three-fourths will be distributed based on “need” and one-fourth on “performance”. (I sketch this arithmetic at the end of the post.) Nearly seventy years since independence, I’m of the opinion that this ratio should be less skewed towards “need”.
- Arbitrary cutoffs have been drawn at scores of 0.6 and 0.4 to classify states as “least developed” and “less developed”. While these are round numbers, I’m not yet sure they make sense.
- The report alludes to the “resource curse”, which is a good thing.
- Quote: “The Normal Central Assistance (NCA) grant, which is distributed to states as per Gadgil-Mukherjee formula based on categorization of “Special Category” and “General Category” states, constituted only about 3.8 per cent of total resources transferred to States and 8.2 per cent of plan transfers.” (emphasis mine)
- The underdevelopment index has ten components. I won’t comment on the wisdom of the number or quality of the components chosen.
- It is a good thing that Mean Per Capita Consumption Expenditure is used as a measure of richness rather than per capita Net State Domestic Product. As the report argues, the latter can include economic activity that doesn’t really reach the people, and is hence not as good a measure as consumption expenditure.
- Table 1 (on page 17 of the report) gives the correlations between the metrics chosen. I think it is a fantastic thing that they have chosen to present the correlations in the first place (something that could easily have been swept under the carpet). As expected, a number of the chosen variables are highly correlated.
- Correlation between Consumption Expenditure and Urbanisation is 75%!! Similarly, correlation between expenditure and female literacy is 58%.
- **Then comes the damp squib: the excitement induced by presenting the correlation table is doused by the statement that each of the ten parameters is going to be accorded equal weight. This is disappointing on several counts. Firstly, there is the sheer arbitrariness (remember that ‘equal allocation’ is as arbitrary as any other distribution). Next, the correlations are thrown out of the window, so dimensions that are captured by several correlated variables effectively get more weight. Then, there is the fact that this makes the index easily manipulable by adding or deleting factors of choice. I’m so disappointed by this one decision that I’m putting this entire point in boldface. Apologies.**
- The report acknowledges that broadly categorizing states into “developed” and “under developed” creates issues of moral hazard. However, rather than fully doing away with the division, the committee (again, disappointingly) takes a “middle path” by splitting the two categories into three. I suspect some mathematical brain is involved here, in that the next committee will increase the number of categories to four, and the one after to five, until a time when each state (finally!!) becomes its own category.
- To convert the per capita allocation into a state-level allocation, the formula uses a combination of population and area. I agree that it is tougher to provide infrastructure to thinly populated areas, so this combination is fair. It reminds me of my days in airline cargo pricing, when we would similarly adjust between the weight and volume of a piece of cargo.
- The performance index is computed based on changes in the development index over time. This is a good thing. It shows the committee is “eating its own dog food”.
- This is the first time “performance” is being used as a criterion for fund distribution. So the 25% weight is a good start. I retract my earlier abuse of this ratio.
- The committee recommends that this analysis be carried out every five years, since much of the data used in calculating the index is published at that frequency. Considering that this is also the frequency of Finance Commissions, it is a good thing.
- The report tries to bolster its credibility by showing that the index is highly correlated with the UN Human Development Index. I like it that a scatter plot and regression line have been presented.
- The allocation based on performance is again skewed in favour of less developed states. So you are likely to get more if (1) you are underdeveloped and (2) you have shown an improvement. I think this is fair.
- One good thing is that the formula is plug and play. It is “timeless” in a sense: at any given future point in time, you can simply look up the data points that are required and construct the index. There is no human intelligence required for that effort.
- There is heavy reliance on NSSO data, and I’m not sure that’s a good thing, since it is “survey data”. I think it might have been better to have used data from the Census.
- The committee actually examined the option of weighting factors based on squared factor loadings from a Principal Component Analysis (*applause*) and found that the index thus constructed was 99% correlated with the one using a simple arithmetic average, and thus decided to go with the simpler formula (a sketch of both constructions follows this list). I’ll still continue to keep the earlier point in bold, though.
- Each “sub-component” was normalized to between 0 and 1 using a simple linear formula (a higher number indicating greater under-development). I like it that they used this rather than a rank-ordering metric.
- The report includes a sensitivity analysis to show that the ranking and index values are robust. Again, applause
- A dissent note from committee member Shaibal Gupta indicates that there are problems in using a simple equally weighted average rather than weights derived from the PCA.
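Since I have cribbed so much about the weighting, here is a minimal sketch of the two competing constructions, in Python with pandas and scikit-learn. The states, column names and numbers are all made up by me, and I’m reading “squared factor loadings” as the squared loadings on the first principal component; treat this as an illustration of the idea rather than the committee’s exact procedure.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical sub-components (the report has ten; three shown here for brevity).
# In this toy example, all three raw variables are "higher = more developed".
raw = pd.DataFrame({
    "consumption_expenditure": [1200, 900, 1500, 800],
    "female_literacy":         [65, 55, 80, 50],
    "urbanisation":            [30, 20, 45, 15],
}, index=["State A", "State B", "State C", "State D"])

# Flip each column so that a higher value means greater under-development,
# then normalize linearly to [0, 1], as the report describes.
flipped = raw.max() - raw
normalized = (flipped - flipped.min()) / (flipped.max() - flipped.min())

# 1. The committee's choice: a simple equal-weight average of the sub-components.
equal_weight_index = normalized.mean(axis=1)

# 2. The alternative it examined: weights proportional to squared loadings
#    on the first principal component.
pca = PCA(n_components=1).fit(normalized)
weights = pca.components_[0] ** 2
weights /= weights.sum()
pca_weight_index = normalized @ weights

print(pd.DataFrame({"equal": equal_weight_index, "pca": pca_weight_index}))
print("correlation:", equal_weight_index.corr(pca_weight_index))
```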
Finally, despite all the talk of transparency and ease of calculation, the report itself does not contain either the index number or the component values for various states. I hope the data has been released (and if it has, please help me by giving me the link). If not, we should campaign for the data to be given out to the public in a CSV (or equivalent) format through the government data portal http://data.gov.in
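While we wait for that data, here is a quick sketch (again in Python; everything beyond the 0.3 per cent fixed share and the 3:1 need-to-performance split is a placeholder of mine) of the pool arithmetic behind the allocation scheme, if only to show how plug-and-play the formula is. The committee’s actual need and performance formulas, which bring in the index, population and area, are not reproduced here.

```python
# Pool arithmetic from the allocation scheme, with 28 states.
n_states = 28
fixed_share_each = 0.003                   # 0.3% of the pool per state
fixed_total = n_states * fixed_share_each  # 8.4% of the pool goes out as fixed payments
remainder = 1.0 - fixed_total              # 91.6% left to distribute
need_pool = 0.75 * remainder               # ~68.7%, distributed on "need"
performance_pool = 0.25 * remainder        # ~22.9%, distributed on "performance"

print(f"fixed: {fixed_total:.1%}, need: {need_pool:.1%}, performance: {performance_pool:.1%}")

def state_share(need_weight: float, performance_weight: float) -> float:
    """Toy per-state share, given that state's fraction of the need and performance pools.

    need_weight and performance_weight would come from the committee's formulas
    and must each sum to 1 across all 28 states; here they are just placeholders.
    """
    return fixed_share_each + need_pool * need_weight + performance_pool * performance_weight

# A hypothetical state that accounts for 5% of "need" and 3% of "performance":
print(f"example state: {state_share(0.05, 0.03):.2%} of the total pool")
```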