What is Takshashila blogging about?

As you might be aware, we run a large number of blogs on the Takshashila platform, and I wouldn’t blame you for being confused about which blog talks about what. To make the choice easier, we will look at “wordclouds” for each of our bloggers here. A wordcloud is a pictorial representation of word frequency: the more frequently a word appears, the bigger it is drawn in the wordcloud.

So without much ado, let us go ahead and look at the wordclouds of each of the Takshashila bloggers:

1. Nitin Pai

acorn

 

2. Pavan Srinath

catalyst

3. V Anantha Nageswaran

tgs

 

4. Rohit Pradhan

retributions

 

5. Rohan Joshi

filtercoffee

 

6. Krupakar Manukonda

pratyaya

 

7. Priya Ravichandran

sumpolites

8. Sarah Farooqui

terranullius

 

9. Bibhu Routray

conflictology

 

10. The Broad Mind (our community blog)

broadmind

 

11. Logos (the Takshashila Student Blog)

logos

 

12. Karthik Shashidhar

rq

 

 

 

Measuring Income

Earlier today I was reading an interview in the Business Standard with Shaibal Gupta, Secretary of the Asian Development Research Institute and member of the Raghuram Rajan committee on composite development index of states.  Gupta wrote a dissenting note to the report, with his main contention being the use of the Median Per Capita Expenditure (MPCE) as a measure of income to compare states rather than using the Per Capita Gross State Domestic Product (GSDP). I must state up front that I agree with the report here, and will use this post to defend my stance. Meanwhile, I must mention that one of the reasons he gave for using the GSDP (“Per capita income is taken as an indicator for this purpose by a number of institutions, including the Planning Commission and Finance Commissions.”, he said) almost made me fall off my chair.

Suppose you run a manufacturing company. Your production facility is located in Hosur, Tamil Nadu. However, for administrative convenience, and for the convenience of your top management, you have decided to headquarter your company in Bangalore, Karnataka (for the record, Hosur is just about 35 kms from Bangalore). Most of your workers live in Tamil Nadu, and draw their salaries there. Your top management gets compensated in Karnataka, and they live there. The question is how your company contributes to the economies of the two states.

From an accounting perspective, all your sales are attributed to Karnataka, for you are headquartered there. Of course, what your workers in Tamil Nadu spend out of their salaries will be accounted for in that State’s GDP, but the overall sales of the company itself will be attributed to Karnataka, even though the company does next to no economic activity there. With the simple act of locating your company headquarters in Karnataka, you push up Karnataka’s GSDP while reducing Tamil Nadu’s. Some states (e.g. Maharashtra and Delhi) are much more popular than others for the location of company headquarters, and this can lead to a fairly distorted picture of how much is produced in each state.

That is not all. The problem with Per Capita GSDP is that it is a mean figure, and is thus liable to be grossly affected by extreme values. Let us say we are comparing the income levels in two neighbourhoods. Neighbourhood A has 1000 people. 999 of them earn Rs. 1000 per month while the 1000th earns Rs. 1 crore per month. Neighbourhood B also has 1000 people but each of them earns Rs. 10000. Which neighbourhood is richer?

If you go by the mean income, the mean income of A is Rs. (999 * 1000 + 1 * 1,00,00,000)/1000 = Rs. 10999. The mean income of B is Rs. 10000. So you would say that A is richer than B. While on an average that might be true, you might notice that the number for A is skewed by the one rich guy. What this hides is the fact that 99.9% of A earn only a tenth of B’s mean income. Can we do better?

Instead of looking at what the resident of a neighbourhood makes on an average, what if we instead measure what the average person in the neighbourhood makes? In other words, what if we measure the median income in each neighbourhood? The advantage with the median is that it doesn’t get skewed by extreme values, as is likely in case of a variable such as income which usually follows a power law distribution. In our example above, the median income of A is Rs. 1000 while that of B is Rs. 10,000 which is probably a better reflection of the richness of the average resident of these two localities.
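The arithmetic is easy to check. Here is a minimal sketch in Python, using the figures from the mean calculation above (999 residents of A at Rs. 1000 and one at Rs. 1 crore, everyone in B at Rs. 10,000). The variable names are mine, chosen only for illustration.

```python
from statistics import mean, median

# Monthly incomes (Rs.) for the two neighbourhoods in the example.
neighbourhood_a = [1_000] * 999 + [10_000_000]  # 999 people at Rs. 1000, one at Rs. 1 crore
neighbourhood_b = [10_000] * 1000               # everyone at Rs. 10,000

for name, incomes in (("A", neighbourhood_a), ("B", neighbourhood_b)):
    print(f"Neighbourhood {name}: mean = Rs. {mean(incomes):,.0f}, "
          f"median = Rs. {median(incomes):,.0f}")
```

The one rich resident moves A's mean past B's, but leaves A's median untouched.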

Similarly, the per capita GSDP, being a mean measure, is not a great measure for determining the richness or poorness of the people of a state. Suppose, for example, that neighbourhoods A and B are two states. Notice that A will have a larger per capita GSDP than B, and that this tells us nothing about the richness of the average resident of these two states.

Putting both of the above reasons together, you realize that the per capita GSDP is not a great estimator of the richness of a particular state.

So what do we use? We discussed above that median income is a much better metric than the mean income. So can we use that for measuring richness instead? While it sounds good in theory, we have a practical and accounting problem – given that a large part of the country is essentially a cash economy, it is hard to keep track of people’s incomes. Moreover, there are enough reasons to both under-report and over-report one’s income if you were to ask someone as part of a survey. For this reason, the general consensus among development economists is that total consumption expenditure is a good estimate of income among the poor, whose net savings rate is negligible.

What about the non-poor, you may ask. Notice that we are only trying to capture the expenditure of the median resident of a state, and assuming that more than 50% of a state is within an income level at which income equals consumption expenditure is fair. So the median per capita expenditure will give a good picture.

So how do we estimate this? Unfortunately, we don’t have any accounting statistics that capture this, and we need to rely on surveys. The National Sample Survey Organization (NSSO) conducts surveys on people’s consumption expenditure every five years, and this is what the Rajan committee has used. Now, you may question the wisdom of relying on sample data (rather than “population data”) to determine the richness or poorness.

The answer to that is that the median is a rather robust statistic, and as long as samples have been chosen at random, it is unlikely that the median of a sample will be too far away from the median of the population (and this is independent of the distribution of the population). We will examine the issue of sampling median in a subsequent post.
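Until then, here is a rough simulation of the claim. It is only a sketch under assumptions of my own: the "population" is 100,000 made-up incomes drawn from a heavy-tailed Pareto distribution (shape 1.5, scale Rs. 1000), and the "survey" is a simple random sample of 1,000 of them. None of this is NSSO data or the NSSO sampling design.

```python
import random
from statistics import mean, median

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: heavy-tailed incomes, as a stand-in for a real state.
population = [1_000 * rng.paretovariate(1.5) for _ in range(100_000)]

# A simple random sample of 1,000 people, drawn without replacement.
sample = rng.sample(population, 1_000)

pop_median, sample_median = median(population), median(sample)
pop_mean, sample_mean = mean(population), mean(sample)

# Relative gap between the sample statistic and the population statistic.
median_error = abs(sample_median - pop_median) / pop_median
mean_error = abs(sample_mean - pop_mean) / pop_mean

print(f"median: population {pop_median:,.0f}, sample {sample_median:,.0f} ({median_error:.1%} off)")
print(f"mean:   population {pop_mean:,.0f}, sample {sample_mean:,.0f} ({mean_error:.1%} off)")
```

Changing the seed or the sample size is a quick way to see how far the sample median strays from the population median, and how the mean behaves in comparison.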

In conclusion, I endorse the decision of the Rajan committee to use the median per capita consumption expenditure as a metric for determining the richness or poorness of a state.

India: Disinvestment Receipts

Common wisdom is that disinvestment in India was on a high back in the days when Atal Behari Vajpayee was Prime Minister, when there was a dedicated Ministry of Disinvestment under Arun Shourie. The UPA, upon coming to power in 2004, disbanded this ministry and common wisdom is that disinvestment stopped as a result of that.

Here, we take a look at disinvestment in India over the years. Here is the total disinvestment amount by year:

Source: Data.gov.in

 

You can see that there was a spike in disinvestment in 2003-04, which was Vajpayee’s last year as Prime Minister. You can also see that disinvestment ground to a halt in the first term of the UPA government – possibly as a result of the presence of the Left Front as part of the government. However, you may not have realized that in its second avatar the UPA government has taken up disinvestment with a vengeance, with the receipts in the last four years far exceeding the receipts during Vajpayee’s tenure as Prime Minister.

However, the picture becomes clear if we look at the method of disinvestment. Most disinvestment receipts in the 1990s and in the last five years have come through a sale of minority stakes in PSUs. The disinvestment receipts in the Vajpayee years, however, came mostly through majority stake sales and strategic sales. In other words, there has been no big bang disinvestment in the last ten years – the money the government has made is through quiet sales of minority stakes in PSUs. So one can say that big bang disinvestment has ground to a halt after Vajpayee’s tenure.

Source: Data.gov.in

 

Largest crop by state

We will continue to stick with the state-wise data on agriculture. In this edition, we will look at the largest crop by state, by year. We define this as the crop with the biggest acreage in the state.
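For those who want to reproduce this, the definition can be sketched as follows. The records, their layout and the function name `largest_crop` are hypothetical, made up for illustration. They are not taken from the data set.

```python
from collections import defaultdict

# Hypothetical records: (state, year, season, crop, area in hectares).
records = [
    ("Karnataka", 2009, "Kharif", "Ragi", 700),
    ("Karnataka", 2009, "Kharif", "Maize", 1100),
    ("Karnataka", 2009, "Rabi", "Jowar", 900),
    ("Punjab", 2009, "Kharif", "Rice", 2800),
    ("Punjab", 2009, "Rabi", "Wheat", 3500),
    ("Punjab", 2009, "Rabi", "Barley", 20),
]

def largest_crop(records):
    """Return {(state, year, season): crop with the biggest acreage}."""
    area = defaultdict(float)
    for state, year, season, crop, hectares in records:
        area[(state, year, season, crop)] += hectares  # add up rows for the same crop

    best = {}
    for (state, year, season, crop), total in area.items():
        key = (state, year, season)
        if key not in best or total > best[key][1]:
            best[key] = (crop, total)
    return {key: crop for key, (crop, _) in best.items()}

result = largest_crop(records)
for key, crop in sorted(result.items()):
    print(key, "->", crop)
```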

No fancy visualizations here. Just data presented in a table. Two tables, actually, one for kharif and one for rabi. For each year these two tables show the biggest crop per state.

Offered without comment.

Major Crops in India, by State: Kharif Season

 

Major Crops in India: Rabi Season

(click on images for larger size)

 

Growing wheat and rice in India

One of the most massive data sets on data.gov.in is district-wise data on the total area under cultivation and production of various crops for each season for each year from 1998 to 2010. In this post we will look at which states utilize the most amount of land growing each crop.

First, a note on the data. The data is district-wise and season-wise. The irritating thing is that the seasons are not mutually exclusive. The seasons in the data set are “Summer”, “Kharif”, “Autumn”, “Winter”, “Rabi” and “Whole Year”. First of all, I don’t know what “Autumn” means in India – as far as I know India doesn’t have any such season. Even granting some liberties, it is irritating that the seasons overlap.

Here is how I’ve consolidated the data. For either crop, for each year, I took the total area under cultivation for each state for each season. Next, I looked at the maximum area under cultivation in a particular state at a particular point in time (any time in the 12 years of data I have). So the data I present in this post is the maximum area in a particular state that was under a particular crop at some point of time in the 1998-2010 time period.
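In code, the consolidation looks roughly like this. This is a sketch of my reading of the two steps: add up districts to get a state total for each year and season, then take the largest such total per state. The rows and the function name `max_state_area` are hypothetical and only there to make the sketch runnable.

```python
from collections import defaultdict

# Hypothetical district-level rows for one crop:
# (state, district, year, season, area in sq km).
rows = [
    ("Uttar Pradesh", "Agra", 2005, "Rabi", 1200.0),
    ("Uttar Pradesh", "Meerut", 2005, "Rabi", 900.0),
    ("Uttar Pradesh", "Agra", 2006, "Rabi", 1300.0),
    ("Uttar Pradesh", "Meerut", 2006, "Rabi", 700.0),
    ("Uttar Pradesh", "Agra", 2006, "Whole Year", 1500.0),
    ("Punjab", "Ludhiana", 2005, "Rabi", 800.0),
    ("Punjab", "Ludhiana", 2006, "Rabi", 850.0),
]

def max_state_area(rows):
    # Step 1: total area per (state, year, season), summing over districts.
    totals = defaultdict(float)
    for state, _district, year, season, area in rows:
        totals[(state, year, season)] += area

    # Step 2: for each state, keep the largest total seen in any year and season.
    peak = defaultdict(float)
    for (state, _year, _season), area in totals.items():
        peak[state] = max(peak[state], area)
    return dict(peak)

peaks = max_state_area(rows)
print(peaks)
```

Because the seasons overlap, the sketch never adds two seasons together. It only compares them.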

So, who grows wheat in India? The graph here shows the states with the maximum area under wheat:

Source: data.gov.in

 

Notice that Uttar Pradesh and Madhya Pradesh have had much more area under wheat than Punjab or Haryana. Also notice that only eight states in India have ever had more than 5000 square kilometers of land growing wheat. To put this in perspective let us look at rice:

Source: data.gov.in

 

Here I have put the cutoff (for entry to the graph) at 10000 square kilometers, and yet fourteen states make the cut. In terms of area under cultivation at least, we can say that we are a predominantly rice growing country. Again, in rice, notice that Uttar Pradesh and Madhya Pradesh have more area under cultivation of rice than more “traditional” rice growing areas like Orissa or West Bengal.

Which state has the biggest proportion of its land area under wheat? And rice? The next two graphs show the proportion of land under wheat and under rice in each state (note again that these are maximum values over a decade).

[Figures: proportion of land under wheat and under rice, by state]

There is much more information in this particular data set. We will revisit it in subsequent posts.

 

 

The Raghuram Rajan Committee report on Composite Development Index of States

In July this year, at a resort near Bangalore (yes, we at Takshashila do sometimes play resort politics), I got the fifth batch of the GCPP to work on the problem of building an index that measures the development of various Indian states over the last 10 years. This was at one of the weekend workshops that are part of the GCPP, and I used the case as a reference in my module on Analytical Methods in Public Policy. As part of this exercise I taught them how to pick variables, measure them, procure data, look for interactions between variables and then combine them to form an index.

It is interesting that a couple of months after that session, the report of the Raghuram Rajan Committee on Composite Development Index of States has been published. I will use this blog post to give my comments on that report as I go through it. Since I’m going to be effectively “live-blogging” my reading of the report, the rest of this post is in bullet points.

Also, in keeping with my title of “resident quant”, I will try as much as possible to restrict my comments to the data and methodology, and not comment on economic issues. However, it is likely that I might go on economic rants here or there.

  • The first paragraph of the executive summary states that the reason we adopted a command and control model after independence was so that we don’t increase the inequalities across regions and states. This is the first time I’m hearing this story
  • The index is based entirely on publicly available data. I think this is a good thing.
  • Each state gets 0.3% of the total available pool, irrespective of its size. Of the remaining 91.6% (28 states => 8.4% fixed payment), 3/4th will be distributed based on “need” and 1/4 on “performance”. Nearly seventy years since independence, I’m of the opinion that this ratio should be less skewed towards “need”
  • Arbitrary cutoffs have been drawn at scores of 0.6 and 0.4 to classify states as “least developed” and “less developed”. While these are round numbers, I’m not yet sure they make sense.
  • The report alludes to the “resource curse”, which is a good thing.
  • Quote: “The Normal Central Assistance (NCA) grant, which is distributed to states as per Gadgil-Mukherjee formula based on categorization of “Special Category” and “General Category” states, constituted only about 3.8 per cent of total resources transferred to States and 8.2 per cent of plan transfers.” (emphasis mine)
  • The underdevelopment index has ten components. I won’t comment on the wisdom of the number or quality of the components chosen.
  • It is a good thing that Mean Per Capita Consumption Expenditure is used as a measure of richness rather than per capita Net State Domestic Product. As the report argues, the latter can include economic activity that doesn’t really reach the people, and is hence not as good a measure as consumption expenditure.
  • Table 1 (on page 17 of the report) gives the correlations between the metrics chosen. I think it is a fantastic thing that they have chosen to present the correlations in the first place (something ripe to be pushed under the carpet). As expected, a number of chosen variables are highly correlated.
  • Correlation between Consumption Expenditure and Urbanisation is 75%!! Similarly, correlation between expenditure and female literacy is 58%.
  • Then comes the damp squib – the excitement induced by presenting the correlation table is doused by the statement that each of the ten parameters is going to be accorded equal weight. This is disappointing on several counts: firstly, the sheer arbitrariness (remember that ‘equal allocation’ is as arbitrary as any other distribution). Next, the correlations are thrown out of the window, so whatever the highly correlated variables measure in common is likely to get more weight. Then, the fact that this makes it easily manipulable by adding or deleting factors of choice. I’m so disappointed by this one decision that I’m putting this entire point in boldface. Apologies.
  • The report acknowledges that broadly categorizing states into “developed” and “under developed” creates issues of moral hazard. However, rather than fully doing away with the division, the committee (again, disappointingly) takes a “middle path” by splitting two categories into three. I suspect some mathematical brain is involved here, in that the next committee will increase number of categories to four, and the one after to five, until a time when each state (finally!!) becomes its own category
  • To convert per capita allocation to state-wide allocation, the formula uses a combination of population and area. I agree that it is tougher to provide infrastructure to thinly populated  areas, so this combination is fair. It reminds me of my days in airline cargo pricing when we would similarly adjust between the weight and volume of a piece of cargo.
  • Performance index is computed based on changes in the development index over time. This is a good thing. Shows the committee is “eating its own dog-food”
  • This is the first time “performance” is being used as a criterion for fund distribution. So the 25% weight is a good start. I retract my earlier abuse of this ratio.
  • The committee recommends that this analysis be carried out every five years, since a good amount of data used in calculating indices are published at that frequency. Also considering that’s the frequency of finance commissions, it is a good thing.
  • The report tries to bolster its credibility by showing that the index is highly correlated with the UN Human Development Index. I like it that a scatter plot and regression line have been presented
  • The allocation based on performance is again skewed in favour of less developed states. So you are likely to get more if 1. You are underdeveloped and 2. You have shown an improvement. I think this is fair.
  • One good thing is that the formula is plug and play. It is “timeless” in a sense. At any given future point in time, you can simply look up the data points that are required and just construct the index. There is no human intelligence required for that effort
  • There is heavy reliance on NSSO data, and I’m not sure that’s a good thing since it is “survey data”. I think it might have been better to have used data from census.
  • The committee actually examined the option of weighting factors based on squared factor loadings from a Principal Component Analysis (*applause*) and found that the index thus constituted was 99% correlated with the one using simple arithmetic averages, and thus decided to go with the simpler formula. I’ll still continue to keep the earlier point in bold, though
  • Each “sub-component” was normalized between 0 and 1 using a simple linear formula (higher number indicating greater under-development). I like it that they used this rather than a rank ordering metric.
  • The report includes a sensitivity analysis to show that the ranking and index values are robust. Again, applause
  • A dissent note from Committee member Shaibal Gupta indicates that there are problems in using a simple weighted average rather than data from the PCA
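To make the mechanics in the bullets above concrete, here is a minimal sketch of the pool split and of an equal-weight index built from linearly normalised components. The states, the two components and their values are hypothetical. The report's actual formula and its ten components are not reproduced here, and the function names are mine.

```python
# Pool split described in the report: 0.3% per state, rest split 3:1.
n_states = 28
fixed_share = n_states * 0.3             # percent of the pool paid out as fixed shares
remaining = 100 - fixed_share
need_share = remaining * 3 / 4           # distributed on "need"
performance_share = remaining / 4        # distributed on "performance"
print(f"fixed {fixed_share:.1f}%, need {need_share:.1f}%, performance {performance_share:.1f}%")

def normalise(values, higher_is_better=True):
    """Linearly rescale {state: value} to [0, 1]; 1 means most under-developed.

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values.values()), max(values.values())
    scaled = {s: (v - lo) / (hi - lo) for s, v in values.items()}
    if higher_is_better:
        scaled = {s: 1.0 - x for s, x in scaled.items()}
    return scaled

def equal_weight_index(components):
    """Average the normalised sub-components, giving each the same weight."""
    states = next(iter(components.values())).keys()
    return {s: sum(c[s] for c in components.values()) / len(components) for s in states}

# Hypothetical inputs for three made-up states.
consumption = {"X": 2000, "Y": 1000, "Z": 1500}   # higher is better
poverty_rate = {"X": 10, "Y": 40, "Z": 20}        # higher is worse

index = equal_weight_index({
    "consumption": normalise(consumption, higher_is_better=True),
    "poverty": normalise(poverty_rate, higher_is_better=False),
})
print(index)
```

Note that the two made-up components move together, so the equal-weight average counts roughly the same information twice. That is the complaint in the boldface point.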

Finally, despite all the talk of transparency and ease of calculation, the report itself does not contain either the index number or the component values for various states. I hope the data has been released (and if it has, please help me by giving me the link). If not, we should campaign for the data to be given out to the public in a CSV (or equivalent) format through the government data portal http://data.gov.in

 

Stock market volatility spikes

The Indian stock markets have become especially volatile. Figure 1 shows the volatility of the Nifty in the last three years. As usual, we use a trailing 30-day quadratic variation as a measure of volatility. Don’t bother about the units of the y-axis, just look at the relative movement.

Source: Yahoo

Notice that the volatility levels we have seen in the last month or so are unprecedented in the last three years. Let us take a closer look:

Source: Yahoo

This gives us a better picture. Volatility was well under control till mid-August, when it started rising (since we use a 30-day trailing QV, this means that markets started getting choppy in mid-July). The volatility is now at an all-time high.
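For the curious, a trailing quadratic variation can be computed along the following lines. This is a sketch, and I am assuming the usual definition: the sum of squared daily log returns over the trailing window, with the window counted in trading days. The function name and the toy price series are made up for illustration.

```python
import math

def trailing_quadratic_variation(closes, window=30):
    """Sum of squared daily log returns over each trailing `window` of returns."""
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    squared = [r * r for r in log_returns]
    return [sum(squared[i - window + 1 : i + 1]) for i in range(window - 1, len(squared))]

# Toy series: flat prices, then a single 10% jump, then flat again.
closes = [100.0] * 35 + [110.0] * 5
qv = trailing_quadratic_variation(closes)
print(qv)
```

The single jump stays in the measure for as long as it remains inside the 30-day window, which is why a rise in the trailing number points to choppiness that began up to a month earlier.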

However, the official volatility index (India VIX) disagrees. According to this, volatility has actually dropped from its all-time high. The VIX also looks significantly choppy.

Source: NSE

 

Perhaps this indicates some trading opportunity in options?

Exponential increase

“Increasing exponentially” is a common phrase used by analysts and commentators to describe a certain kind of growth. However, more often than not, this phrase is misused and any fast growth is termed as “exponential”.

If f(x) is a function of x, f(x) is said to be exponential if and only if it can be written in the form:

f(x) = K · α^x, where K and α are constants (α > 0, α ≠ 1)

So if your growth is quick but linear, then it is not exponential. While on the topic, I want to point you to two posts I’ve written on my public policy blog for Takshashila: one on exponential growth in bank transfers that I wrote earlier today and another on exponential growth in toilet ownership. The former shows you how to detect exponential growth and in the latter I use what I think is an innovative model to model toilet growth as exponential.
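One quick way to tell the two apart: for an exponential function the ratio f(x+1)/f(x) is the same constant everywhere, while for linear growth that ratio keeps shrinking. A small sketch, with made-up numbers and a helper name of my own:

```python
def successive_ratios(values):
    """Ratio of each value to the one before it."""
    return [b / a for a, b in zip(values, values[1:])]

linear = [100 + 50 * x for x in range(6)]         # quick but linear growth
exponential = [100 * 1.5 ** x for x in range(6)]  # K = 100, alpha = 1.5

linear_ratios = successive_ratios(linear)
exponential_ratios = successive_ratios(exponential)

print([round(r, 3) for r in linear_ratios])
print([round(r, 3) for r in exponential_ratios])
```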

Exponential increase in uptake of IMPS

We had dealt with exponential increases on this blog once before. We revisit the topic, and this time this is in the context of the inter bank mobile payment system that came into place sometime last year. I’ve never used it so I’m not sure how it works, but going by the data put out by the National Payments Corporation of India, the volume of transactions is increasing at an exponential rate.

How do we determine this is an exponential rate? First, let us look at the time series of total volumes of transactions:

Source: http://www.npci.org.in/impsVolumes.aspx

Notice that after remaining flat for a couple of months (maybe even decreasing) the number of transactions has really taken off (March is probably an aberration – but given that it’s the month of financial closure the higher volumes can be expected). Increased exponentially, you say? How can we test that?

We can test that by using a logarithmic scale for the y-axis. Here is the same plot again, except that this time the Y-axis is logarithmic.

Source: http://www.npci.org.in/impsVolumes.aspx

Notice that apart from the part with the aberration and the initial two months, the graph is now linear. In other words, we can describe this graph by a line of the form

log y = a + b x

or y = exp (a + bx)

Thus, exponential!
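If you would rather not eyeball the graph, you can fit the line log y = a + b x by ordinary least squares. Here is a minimal sketch. The monthly volumes are hypothetical (a clean 40% a month growth), not the NPCI figures, and the function name is mine.

```python
import math

def fit_exponential(xs, ys):
    """Least-squares fit of log(y) on x; returns (a, b) in log y = a + b x."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_l = sum(xs) / n, sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = mean_l - b * mean_x
    return a, b

# Hypothetical monthly transaction volumes.
months = list(range(8))
volumes = [1000 * 1.4 ** m for m in months]

a, b = fit_exponential(months, volumes)
print(f"a = {a:.3f}, b = {b:.3f}, implied monthly growth = {math.exp(b) - 1:.1%}")
```

With real data the points will not sit exactly on the line, so it is worth looking at how far they scatter around the fit before calling the growth exponential.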

Coming back from the geekery, it is really good to note that IMPS has taken off. However, this should not be taken as proof of the fact that mobile payments are easy, for IMPS is anything but easy. New RBI Governor Raghuram Rajan has said in his inaugural speech that he hopes to make it simpler to make payments via mobile. Hopefully this will take off soon. Till then all we can do is to contribute to the exponential growth in the uptake of the IMPS!

Wheat Prices in India

The ministry that has shown the greatest enthusiasm for disseminating data via data.gov.in, the data portal launched by the government of India, is the Ministry of Agriculture, which has so far released over 1700 different data sets. Once you download the data you will find that it is extremely detailed.

I happened to download data on wheat prices in the last four years and the level of detail is amazing. For each agricultural market in the country, for each kind of wheat, it gives the minimum, maximum and modal traded price of wheat for every day. Over the four years, the data set has over 6 lakh data points.

I wanted to look at how the wholesale price of wheat has varied in the last four years. Rather than get into the nitty-gritty of different varieties of wheat and different markets, I simply took the median traded price of wheat for each day and plotted it. While there might be different varieties whose prices vary from each other, the median is enough to give us a level.
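The daily median is simple to compute. A sketch, assuming the median is taken over the modal prices reported by each market and variety on a given day. The rows, the market and variety labels and the Rs. per quintal unit are hypothetical, not taken from the data set.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (date, market, variety, modal price in Rs. per quintal).
rows = [
    ("2012-01-02", "Market A", "Variety 1", 1150),
    ("2012-01-02", "Market B", "Variety 2", 1400),
    ("2012-01-02", "Market C", "Variety 1", 1200),
    ("2012-01-03", "Market A", "Variety 1", 1180),
    ("2012-01-03", "Market B", "Variety 2", 1420),
]

def daily_median_price(rows):
    """Median of the reported modal prices across all markets and varieties, per day."""
    by_day = defaultdict(list)
    for date, _market, _variety, modal_price in rows:
        by_day[date].append(modal_price)
    return {date: median(prices) for date, prices in sorted(by_day.items())}

daily = daily_median_price(rows)
for date, price in daily.items():
    print(date, price)
```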

wheatprice

Notice the seasonality in the price of wheat. Given that wheat is primarily a Rabi crop, you would expect the new harvest to hit the markets sometime around March-April (Baisakhi is the primary Rabi harvest festival). However, if you look at the price trends, you notice that the price peaks each year around December, and the price drops starting in January. It continues to drop until March-April after which it starts rising again.

The data shows that there was a steep increase in the price of wheat towards the end of 2009. 2011, however, didn’t behave similarly, with a sharp drop in the price of wheat towards the end of the year. The latter, however, has been more than compensated by the sharp increase in the price of wheat through the course of 2012.

There is a lot more you can do with this data. You can expect some more agricultural analysis on this blog in the coming weeks or months.