Rapes in the states of India

Recently the National Crime Records Bureau put out statistics on violent crime in India in 2012. In this post we will look at the incidence of rape in various states.

The following chart plots the number of rapes per 10000 people against the number of murders per 10000 people. The reason we have included the murder numbers here is to control for states which have an overall high rate of violent crime. This chart also has the regression line and the shaded region is the 95% confidence interval around this line.

Source: Crimes in India 2012 report by the National Crime Records Bureau

It is interesting to note that, relative to its murder rate, Mizoram has the highest rate of reported rape in India. In fact, Mizoram also has the highest reported rape rate in India in absolute terms. However, we need to take into account that these are the numbers of reported rape cases. According to reports in the media, a large number of rape cases in India go unreported. So these numbers need to be taken with a handful of salt.
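The idea of judging a state "relative to murder rate" can be sketched as fitting a straight line of rape rate on murder rate and looking at how far each state sits above that line. This is only a minimal sketch: the state names and rates below are made-up placeholders, not NCRB figures, and the 95% confidence band shown in the chart is not computed here.

```python
def per_10000(count, population):
    """Convert a raw count into a rate per 10,000 people."""
    return count * 10000 / population


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical placeholder data (NOT from the NCRB report).
states = ["A", "B", "C", "D"]
murder_rate = [0.2, 0.3, 0.4, 0.5]   # murders per 10,000 people
rape_rate = [0.15, 0.2, 0.6, 0.3]    # reported rapes per 10,000 people

slope, intercept = fit_line(murder_rate, rape_rate)

# Residual = actual rape rate minus the rate the line predicts
# from the state's murder rate.
residuals = {
    state: y - (slope * x + intercept)
    for state, x, y in zip(states, murder_rate, rape_rate)
}
highest = max(residuals, key=residuals.get)
print(highest, round(residuals[highest], 3))
```

A state with a large positive residual reports more rapes than its general level of violent crime would suggest, which is the sense in which Mizoram stands out in the chart.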

RG@ICSE,ISC

I did my higher education at two “institutes of national importance”. Both institutions followed what is called “relative grading”. It didn’t matter on an absolute scale how well or how badly you did. Your grade for the course would depend on how everyone else who took the course did. So for example, there was this one course at IIT Madras where I got 80/100, and got an S grade (the highest grade possible). The general performance of the class had not been great, so in that course 80 merited an S. In another course, however, 80 fetched me only a B (the third highest grade) – the general performance of the class had been much better.

While IITs and IIMs and some other autonomous institutions practice relative grading, it is not the “done thing” in most of the rest of India. Most of our board and university exams follow what is known as “absolute grading” – your grade for the course depends solely on your performance, without taking into account the performance of others. So it is theoretically possible to have a case where practically everyone in the class scores “90%”. Given that this is the prevailing system of grading in most of India, we assume that the board exams follow this principle, too.

Two or three days back, Debarghya Das, a student at Cornell, set the cat among the pigeons by scraping the marks of every single student who took the ICSE or ISC exams (the 10th and 12th board exams respectively, administered by the CISCE). What he noticed was that certain marks had gone missing. For example, nobody scored 81, 82, 84, 85, 87, 89, 91 or 93 in any of the courses. This is just a sample of the marks that have gone missing. There are several other numbers which are effectively “unattainable” in any of the courses. Das, in his account, has alleged some kind of “fraud”.

What is the first thing that comes to your mind when you see this rather jagged distribution? I wouldn’t blame you if you saw a hedgehog. But can you think of a graph that looks like that?

Three years back I bought myself a DSLR camera, after which I pretend to be an expert photographer. I even use Photoshop/Gimp to manipulate some of the images I click. And a decidedly much better photographer friend has told me that the first thing you do while editing a photo is to adjust “levels”. See this to know what you can do with levels. Basically, the concept is that some parts of the colour spectrum are unrepresented in an image, and by adjusting levels you make sure the full spectrum is used, thus improving the contrast of the image.

There is something known as the image histogram. I took a picture that I had shot and adjusted the “levels”. On the left you see what the histogram looks like after adjusting the levels. On the right, you see the histogram as it was before the adjustment.

Image histogram after (left) and before (right) adjusting the levels of an image. From a random photograph I had shot

Doesn’t the histogram on the left remind you of the distribution of ICSE/ISC marks? And how did we get that histogram? By taking the histogram on the right (which is smoothed but all bunched up in one part of the distribution) and stretching it so that it falls across the entire distribution. And what happened when we did that? We got gaps, as you can see in the histogram on the left or the distribution of ICSE/ISC marks.
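The way stretching produces gaps can be sketched in a few lines. This is a simplified linear stretch, not what Photoshop or Gimp actually do internally; the function name and the input range of 100 to 150 are assumptions chosen for illustration.

```python
def stretch_levels(values, out_min=0, out_max=255):
    """Linearly stretch integer values so they span out_min..out_max."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot stretch a single flat level")
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (v - lo) * scale) for v in values]


# A "bunched up" histogram: every level from 100 to 150 is in use,
# so there are 51 distinct levels and no gaps.
bunched = list(range(100, 151))

stretched = stretch_levels(bunched)

# After stretching, the same 51 levels are spread over 0..255,
# so most of the 256 possible output levels can never occur.
used = set(stretched)
gaps = [level for level in range(256) if level not in used]
print(len(used), "levels used,", len(gaps), "levels empty")
```

Since 51 distinct inputs can produce at most 51 distinct outputs, spreading them over 256 slots must leave holes, however the stretch is done.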

There is an article in The Hindu today that again explores this issue of missing marks in ICSE/ISC. In it the ICSE council, which administers these exams, is quoted as saying:

 “In keeping with the practice followed by examination conducting bodies, a process of standardisation is applied to the results, so as to take into account the variations in difficulty level of questions over the years (which may occur despite applying various norms and yardsticks), as well as the marginal variations in evaluation of answer scripts by hundreds of examiners (inter-examiner variability), for each subject.”

Another money quote from the same article:

“The word tampering is wrong. There is moderation that happens across education boards,” explained a teacher, who has worked with ICSE schools in Hyderabad and Chennai. “After the first round of corrections, raw data is given to officials and head examiners who analyse how students have performed. They try to ensure the bell curve of the results does not look awkward. If it does, the implication is that the checking has been either too liberal or very strict.”

So there you go. The ICSE Council effectively follows relative grading. There is a certain distribution of marks that they desire, and they adjust the “levels” of the overall distribution of marks so that the desired distribution is achieved. The desired distribution of marks is something like “X% of students get between 95 and 100, Y% get between 90 and 95”, and so on. Now, two students who got the same marks in the initial marking have to get the same marks after recalibration. So what the missing marks indicate is that there was clustering: a large number of students had ended up scoring in the same narrow range, and after normalization this range got expanded, which is why you have gaps. Now, when certain sections of the range in the middle are expanded, some at the ends have to get contracted (for example, if someone who originally got 70 is given 90, a person who originally got 90 deserves even more, but there is little room left above 90). Which is why you see that at one end, 94-100, all possible marks are represented.

This still doesn’t explain one thing though – why is it that the same marks have gone missing in all subjects? It is impossible that the initial distribution of marks was identical across subjects. I have only one explanation for this – there was one overall mapping algorithm that was used across subjects, that converted marks obtained to the relative marks. This is also seen in the fact that the shape of the distribution across subjects varies widely (again refer to Das’s post).
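To see why a single mapping would make the same marks go missing everywhere, here is a minimal sketch. The knot points are invented for illustration and are certainly not the council's actual procedure; the only point is that a fixed raw-to-final table decides which final marks are reachable, whatever the raw marks in a subject look like.

```python
# Hypothetical piecewise-linear mapping from raw marks to final marks.
# The middle of the range (40..70) is stretched; the ends are compressed.
KNOTS = [(0, 0), (40, 35), (70, 90), (100, 100)]


def scale_mark(raw):
    """Map a raw mark (0..100) to a final mark using the fixed knots."""
    for (x0, y0), (x1, y1) in zip(KNOTS, KNOTS[1:]):
        if x0 <= raw <= x1:
            return round(y0 + (raw - x0) * (y1 - y0) / (x1 - x0))
    raise ValueError("raw mark must be between 0 and 100")


# Final marks that no raw mark maps to: unattainable in every subject.
attainable = {scale_mark(raw) for raw in range(101)}
missing = sorted(set(range(101)) - attainable)

# Two subjects with very different (made-up) raw mark distributions.
subject_a = list(range(20, 101))      # nobody below 20
subject_b = list(range(0, 101, 3))    # only every third raw mark occurs

final_a = {scale_mark(raw) for raw in subject_a}
final_b = {scale_mark(raw) for raw in subject_b}
print("missing everywhere:", missing)
```

In this sketch the gaps fall only in the stretched middle section, while every mark from 90 to 100 remains reachable, which is qualitatively the pattern described above.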

So that explains the weird distribution of marks in the ICSE / ISC exams. But what explains the title of the post? In IITian English, “RG” is a term derived from “relative grading”. It is a rather derogatory term used to describe people who prefer to pull down others in their quest to get ahead (note that this is a consequence of relative grading). Taking some more liberties and using IITian English, you can say that the ICSE/ISC board has “RGed” students!

Rail track utilization, per Railway Minister

Now I guess you know how I work. I come across a data set and then torture it to extract as much information as I can before I let go of it. So continuing with the railway data put out by the EPW, in this post we will look at the track utilization. The metric is simple – how many passenger trains go over a piece of railway track each day?

We have numbers for the total route length and the total number of passenger train kilometers. Dividing the latter by the former gives us the number of trains that pass over the average piece of track in a year. Divide that by 365 and you get the number of trains that go over the track per day. In 1992, this number was 16: an average piece of track was run over by 16 trains each day. By 2009, this number had gone up to 25!
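The arithmetic is simple enough to write down as a function. The inputs in the example are round illustrative numbers picked to land on 25 trains a day; they are not the EPW figures, which are not reproduced in this post.

```python
def trains_per_day(passenger_train_km, route_km, days_in_year=365):
    """Average number of passenger trains passing over a piece of track per day.

    passenger_train_km: total distance run by all passenger trains in a year
    route_km: total route length of the network
    """
    trains_per_year = passenger_train_km / route_km
    return trains_per_year / days_in_year


# Illustrative inputs only (assumed, not taken from the EPW data set).
example = trains_per_day(passenger_train_km=584_000_000, route_km=64_000)
print(example)
```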

Data source: Economic and Political Weekly May 18, 2013 vol XLVIII no 20

Note that these are average numbers. They hide the fact that there might be tracks on which no trains run, and other tracks on which maybe 100 trains run each day (even higher if you think of something like the Mumbai local train tracks). Yet, they give us a good indication of how the railways have utilized the infrastructure that is most scarce (tracks are the hardest thing to add, given the complexities involved in laying additional track – taking into account land acquisition, etc.).

Notice that though this is a largely linear growth, there have been times when growth has been faster than in other times. Next, let us look at how much utilization has been added each year. And let us look at it in terms of who the railway minister was in that financial year!

Data source: Economic and Political Weekly May 18, 2013 vol XLVIII no 20

Notice that the outlier years are the first two years of Nitish Kumar’s occupation of the Ministry. During his unbroken 5 year tenure, Lalu Prasad Yadav also consistently added significantly in terms of track utilization. Unfortunately the data for passenger train kilometers ends with 2009, so here we are not able to see how Mamata Banerjee performed in her second stint in the ministry.


Strain on Indian Railways

In my last post I looked at some railways data that was put out by the Economic and Political Weekly to show that the total addition in route length over the last 20 years is not much to talk about. The same data set also gives data on “passenger kilometers” and “passenger train kilometers” for each year. The latter gives the total distance all passenger trains in India have run, while the former gives the total distance traveled by (ticketed) train passengers in India each year.

Now, the ratio of these two numbers gives us the number of passengers per train. It is interesting to note how this has moved in the last 20 years.

Data source: Economic and Political Weekly May 18, 2013 vol XLVIII no 20

In 1990 the average train carried about 800 passengers. By 2009 that number had risen by about 75%, to 1400 (data on passenger train kilometers is not available after that).
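The ratio, and the size of the change between the two figures quoted above, can be sketched as follows. The function names are my own; only the 800 and 1400 come from the chart.

```python
def passengers_per_train(passenger_km, passenger_train_km):
    """Average number of passengers on board per train.

    passenger_km: total distance traveled by all ticketed passengers
    passenger_train_km: total distance run by all passenger trains
    """
    return passenger_km / passenger_train_km


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Change in average passengers per train between 1990 and 2009,
# using the approximate values read off the chart.
change = percent_change(800, 1400)
print(f"{change:.0f}% increase")
```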

While some people might see this as a measure of higher efficiency by the railways, I see it more as an inability by the railway infrastructure to keep up with passenger demand. With little track length having been added, there is no surprise in that.

Rail length growth in India, or why you should not trust visualizations at face value

My colleague Nitin Pai extracted some data from the latest issue of EPW that shows the growth in total route length of Indian Railways in the last 20 years. To get a better understanding of how the rail length has grown, I drew a simple graph. This is what I found:

Data source: Economic and Political Weekly May 18, 2013 vol XLVIII no 20

From this graph, it looks like the growth in Indian Railways route length has been pretty impressive. You will also notice that the graph is not monotonically increasing – there are years where the route length is lower than that of the previous years. I would suspect that is due to conversion of metre gauge to broad gauge tracks.

But then if you take a closer look at the graph, you might notice that the y axis doesn’t start at zero. So you might want to see what the growth looks like if you were to start the y axis at zero. Here is what you get:

Data source: Economic and Political Weekly May 18, 2013 vol XLVIII no 20

Now that the axis has been plotted starting from zero, you notice that the growth in rail length by the Indian railways is not all that impressive.
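How much a truncated axis exaggerates growth can be put into a number: compare the plotted heights of the first and last points above the bottom of the axis. A minimal sketch, using round illustrative route lengths and an assumed axis floor rather than the actual EPW series:

```python
def apparent_growth(first, last, axis_min=0.0):
    """Ratio of the plotted heights of two values when the y axis starts at axis_min."""
    if first <= axis_min:
        raise ValueError("axis_min must be below the first value")
    return (last - axis_min) / (first - axis_min)


# Illustrative values in km (assumed, not the EPW figures).
first_year, last_year = 62_000, 64_000

true_ratio = apparent_growth(first_year, last_year, axis_min=0)
truncated_ratio = apparent_growth(first_year, last_year, axis_min=61_500)

print(f"axis from zero:   last point looks {true_ratio:.2f}x as tall")
print(f"axis from 61,500: last point looks {truncated_ratio:.2f}x as tall")
```

With these numbers a growth of about 3% is drawn as a five-fold increase in height once the axis starts just below the first value.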

Moral of the story: If you are a user of a visualization, make sure you check things like axes, scales, etc. before jumping to conclusions. You never know what tricks the person who made the visualization might have been up to. If you are making a visualization, however, keep in mind that a lot of your consumers are not going to look at the visualization too carefully, so make use of axes, scales, etc. in a way that embellishes your story.

Surveying Priorities

Earlier today the Lowy Institute put out the results of a survey it conducted on “India’s views of the world ahead”. While the report contains some excellent insights (including Indians’ perception of various countries), the problem is that it doesn’t establish what people’s priorities are.

For example, there is a question that asks people how important it is that “India has the largest navy in the Indian Ocean”. Some 94% of respondents think it is important, but neither the question nor the answer acknowledges the cost of being the largest navy in the Indian Ocean. Of course, having the largest navy in the Indian Ocean is a great thing to have, but what about the cost?

This is the problem with “uni-directional surveys”, where questions are independent of each other and no relation between factors is established. For example, everyone wants low taxes, a high level of government-sponsored welfare, full employment, good wages and a strong military. Political parties differ because it is impossible to have all of these at the same time, and different parties take different positions on the trade-offs.

Table 24 of the Lowy survey illustrates this. The question is about domestic policy goals, and respondents are asked about the importance of each. Is it of any surprise that over 90% of respondents think each and every one of these goals is important?

Extracted from the Lowy Institute report on Indian Views of the World Ahead (http://www.lowyinstitute.org/files/india_poll_2013_0.pdf)

In order to capture trade-offs, I propose a different kind of survey. One where the respondent is told “The government suddenly gets an extra Rs. 100 which it has to spend on either strengthening our military or providing food security. What do you choose?”. The survey I propose will have a series of such “binary” questions, where respondents have to allocate the government budget between various programs. That way, the true preferences of the respondents can be captured.
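One possible way to set up and score such a survey is sketched below: generate every pair of programs as a forced-choice question, then count how often each program is picked. The program names and responses are invented, and counting wins is only the simplest scoring rule, not an established survey methodology.

```python
from collections import Counter
from itertools import combinations


def build_questions(programs):
    """Every unordered pair of programs becomes one forced-choice question."""
    return list(combinations(programs, 2))


def tally(responses):
    """Count how often each program was chosen.

    responses: iterable of (option_a, option_b, chosen) tuples.
    Returns a list of (program, wins), most preferred first.
    """
    wins = Counter()
    for option_a, option_b, chosen in responses:
        if chosen not in (option_a, option_b):
            raise ValueError(f"{chosen!r} was not one of the options")
        wins[chosen] += 1
    return wins.most_common()


# Hypothetical programs and one respondent's (made-up) answers.
programs = ["military", "food security", "roads"]
questions = build_questions(programs)

responses = [
    ("military", "food security", "food security"),
    ("military", "roads", "roads"),
    ("food security", "roads", "food security"),
]
ranking = tally(responses)
print(ranking)
```

Unlike the importance ratings in the Lowy table, a respondent here cannot rank everything at the top: choosing one program always costs another.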

One last point on the presentation of the above table. The survey uses a “4 point Likert scale” (“not at all important”, “not very important”, “fairly important”, “very important”) to record responses. First off, marketing research theory recommends that such scales have an odd number of choices (3 and 5 are the recommended numbers). Secondly, the report has chosen to group the first two choices under “Total not important” and the latter two under “Total important”. As you can see from the table, these “total” columns are presented in boldface, thus drawing attention. Consequently, given the amount of information in each table, no one really looks at the columns not in boldface. In other words, the Likert scale could just as well have had only two points (important and not important)!

Water Subsidy in Bangalore

Pavan Srinath yesterday wrote about the water subsidy in Bangalore, arguing in favour of “crisis pricing” of water in order to tide over the current water shortage. To support that, he has produced the chart reproduced below, which shows the total subsidy a household gets as a function of consumption.

The interesting thing to note is that there is “indefinite subsidy”. Ideally you would expect to get subsidy only up to a certain level of consumption. However, the data here shows that irrespective of how much you consume, you still get a significant subsidy for the marginal liter of water that you consume.

Subsidised Water in Bangalore

Pavan’s own comments on this chart can be found on his post at The Transition State

Revising the Food Security Bill Numbers

Mohit Satyanand replied to my earlier post on Food Security Bill with a couple of comments. He mentioned that only about 40% of the beneficiaries are going to get rice while the other 60% are going to get wheat. He also pointed me to the site of the Food Corporation of India where they give the official “all in” costs of rice and wheat (Rs. 27 and Rs. 19 respectively). I still believe that the wholesale market price is a better measure of the all-in price, but it would be useful to see what the subsidy number works out to given the official government numbers on prices.

foodsecurity2

Notice that the total subsidy has now come to about 6% of the budget, which is still massive. There are of course other problems with the bill – such as distortion of markets, but those are outside the scope of this blog so I’ll stop here.
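Since the spreadsheet itself is only visible as an image, here is a rough sketch of how a figure of about 6% can be arrived at. Only the all-in costs (Rs. 27 and Rs. 19) and the 40/60 split come from this post. The issue prices, the monthly entitlement, the number of beneficiaries and the budget size are my assumptions about the inputs, and may differ from what the spreadsheet used.

```python
# From the post: official all-in cost (Rs/kg) and share of beneficiaries.
GRAINS = {
    "rice": {"cost": 27.0, "share": 0.4},
    "wheat": {"cost": 19.0, "share": 0.6},
}

# Assumptions (not stated in the post):
ISSUE_PRICE = {"rice": 3.0, "wheat": 2.0}   # Rs/kg charged to beneficiaries
KG_PER_PERSON_PER_MONTH = 5.0               # monthly grain entitlement
BENEFICIARIES = 810e6                       # roughly two-thirds of the population
UNION_BUDGET = 16.65e12                     # Rs, total budgeted expenditure


def blended_subsidy_per_kg(grains, issue_price):
    """Beneficiary-weighted average of (all-in cost - issue price)."""
    return sum(
        g["share"] * (g["cost"] - issue_price[name])
        for name, g in grains.items()
    )


subsidy_per_kg = blended_subsidy_per_kg(GRAINS, ISSUE_PRICE)
subsidy_per_person_per_year = subsidy_per_kg * KG_PER_PERSON_PER_MONTH * 12
total_subsidy = subsidy_per_person_per_year * BENEFICIARIES
budget_share = total_subsidy / UNION_BUDGET

print(f"subsidy per kg: Rs {subsidy_per_kg:.1f}")
print(f"share of budget: {budget_share:.1%}")
```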

Food Security: Making sense of the numbers

I’m not convinced by the official figures for the subsidy that is required for the food security bill. According to my calculations (shown below), it is likely to be a whopping 11% of the budget, and this is excluding administrative cost.

I bought rice this morning from the kirana store close to my house. I paid Rs. 48 per kg, and it was not the most premium quality. Assuming that the food security act will provide rice that is of slightly inferior quality, and taking out the retailer’s margin, I think assuming Rs. 40 per kg is a fair estimate for market price of rice.

I’m pasting a screenshot of my spreadsheet here:

foodsecurity

As you can see, the proposed bill intends a subsidy of 11% of India’s budget this year (to put that in context, the Fiscal Deficit is 5%). Also note that the calculation above doesn’t take into account the administrative cost of implementing this scheme.
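For readers who cannot see the screenshot, the chain of multiplications behind a number like this is roughly as follows. The Rs. 40 market price is from this post; every other input is an assumption of mine about what such a calculation needs, and the spreadsheet's own inputs may differ.

```python
MARKET_PRICE = 40.0             # Rs/kg of rice, the estimate from this post

# Assumptions (not stated in the post):
ISSUE_PRICE = 3.0               # Rs/kg charged to beneficiaries
KG_PER_PERSON_PER_MONTH = 5.0   # monthly grain entitlement
BENEFICIARIES = 810e6           # roughly two-thirds of the population
UNION_BUDGET = 16.65e12         # Rs, total budgeted expenditure


def subsidy_share_of_budget(market_price, issue_price, kg_per_month,
                            beneficiaries, budget):
    """Total annual subsidy as a fraction of the budget, excluding admin costs."""
    subsidy_per_kg = market_price - issue_price
    subsidy_per_person_per_year = subsidy_per_kg * kg_per_month * 12
    return subsidy_per_person_per_year * beneficiaries / budget


share = subsidy_share_of_budget(
    MARKET_PRICE, ISSUE_PRICE, KG_PER_PERSON_PER_MONTH,
    BENEFICIARIES, UNION_BUDGET,
)
print(f"{share:.1%} of the budget")
```

Under these assumptions the result comes out close to the 11% quoted above, and it scales directly with the per-kg price assumed, which is why the choice between market price and official cost matters so much.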

Errata: The sixth line in the spreadsheet should read “subsidy per person per year”. The numbers, though, remain the same.

Banking activity and economic activity

Out on Capitalmind, Deepak Shenoy has an excellent post on the penetration of banking services in India, where he points out that 30% of all bank deposits in India are in Mumbai and Delhi. I encourage you to read that post in full.

Having read that, I was interested to see the per capita figures and compare them across states. On a whim, I decided to compare that to per capita state GDP and this is what I got:

Data source: RBI website
Note: Maharashtra, Delhi and Goa have been left out because they are outliers. Some other states (Chandigarh, Gujarat and Mizoram) have been left out since their latest GSDP figures are not available

While the direction of causality cannot be established from this chart, it does show that banking penetration is highly correlated with economic activity.
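"Highly correlated" can be made concrete with the Pearson correlation coefficient. A minimal sketch in plain Python follows; the per capita values are made-up placeholders, not the RBI data behind the chart.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical per capita figures for five states (Rs thousand).
gsdp_per_capita = [30, 45, 60, 80, 95]
deposits_per_capita = [12, 20, 24, 41, 44]

r = pearson_r(gsdp_per_capita, deposits_per_capita)
print(f"r = {r:.2f}")
```

A value of r near 1 says the two series move together; it says nothing about which one drives the other.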