Pertinent Observations Grows Up

Over the weekend, I read Ben Blatt’s Nabokov’s Favourite Word Is Mauve, a simple analysis, based on natural language processing, of hundreds of popular authors and their books. In it, Blatt uses several measures of the goodness or badness of writing, and then measures different authors by them.

So he finds, for example, that Danielle Steel opens a lot of her books by talking about the weather, or that Charles Dickens uses a lot of “anaphora” (anyone who remembers the opening of A Tale of Two Cities shouldn’t be surprised by that). He also talks about using simple word counts to detect the authorship of unknown documents (a separate post on that will come soon).

As someone who has already written a book (albeit nonfiction), I found a lot of this rather interesting, and constantly found myself trying to evaluate myself on the metrics to which Blatt subjected the famous authors. One metric that I found especially interesting was the “Flesch-Kincaid grade level”, which is a measure of the complexity of the language in a work.

It is a fairly simple formula, based on a linear combination of the average number of words per sentence and the average number of syllables per word. The formula goes like this:

Flesch-Kincaid grade level = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59

The result of the formula gives the approximate school grade of a reader who will be able to understand your writing. As you can see, it is not a complex formula: the shorter your sentences and the shorter your words (measured in syllables), the simpler your prose is supposed to be.
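The formula is easy to express in code. The sketch below is in Python, not the R the post actually used, and the function name is made up for illustration. It takes the three raw counts and returns the grade. A sentence made only of short, one-syllable words comes out below zero, which is how books like Dr. Seuss’s can end up with negative scores.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw counts (illustrative helper)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# "The cat sat on the mat." has 6 words, 1 sentence and 6 syllables.
print(round(flesch_kincaid_grade(6, 1, 6), 2))  # -1.45
```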

According to Blatt’s book, the simplest works by this metric are those of Dr. Seuss, such as The Cat in the Hat or Green Eggs and Ham, because both books use only a small set of words (Dr. Seuss wrote the latter as a challenge, not unlike the challenges we would pose each other during “class participation” in business school). These books have a negative grade score. Technically that means even a nursery kid should be able to read them, but in practice it simply means they’re very easy to read.

Since the Flesch-Kincaid grade score is based on a simple set of parameters (word count, sentence count and syllable count), it was rather simple for me to apply it to the posts on this blog.

I downloaded an XML export of all posts (I took this dump some two or three weeks ago), and then used R, with the tidytext package, to analyse the posts. Word count was the most straightforward, using the str_count function in the stringr package (part of the tidyverse). Sentence count was a bit more complicated, since there were no ready-made algorithms. Instead, I just searched for “sentence enders” (., ?, ! and so on; I know the use of . in abbreviations creates problems, but I can live with that).
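The word and sentence counting described above can be approximated with two regular expressions. This is a Python sketch, not the R/stringr code from the post. The patterns are my own assumptions. A word is a run of word characters, optionally joined by a straight or curly apostrophe. Any run of ., ? or ! counts as one sentence ending, so the abbreviation problem mentioned above remains.

```python
import re

# A word: letters/digits, optionally joined by an apostrophe (so "don’t" is one word).
WORD_RE = re.compile(r"\w+(?:['’]\w+)*")
# A sentence ender: a run like ".", "..." or "?!" counts as a single ending.
ENDER_RE = re.compile(r"[.!?]+")


def count_words(text: str) -> int:
    return len(WORD_RE.findall(text))


def count_sentences(text: str) -> int:
    # Abbreviations such as "Dr." are counted as sentence ends, as in the post.
    enders = len(ENDER_RE.findall(text))
    # Text with words but no closing punctuation still counts as one sentence.
    return max(enders, 1) if count_words(text) else 0


sample = "Hello there. How are you? Fine!"
print(count_words(sample), count_sentences(sample))  # 6 3
```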

Syllable count was the hardest. Again, there are some packages, but they’re incredibly hard to use. Finally, after much searching, I came across some code that approximates it, and used that.
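The post doesn’t say what the approximating code did. A common approximation, shown here as a Python stand-in rather than that code, counts groups of consecutive vowels (treating y as a vowel). It then drops a silent final “e” unless the word ends in “-le”. This is a rough rule of thumb. It gets everyday words right but misses cases such as “rhythm”, which it counts as one syllable.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable count: one per run of consecutive vowels."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    count = len(re.findall(r"[aeiouy]+", w))
    # A final silent 'e' ("make") usually adds no syllable, but "-le" ("table") does.
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


for w in ["cat", "make", "table", "syllable", "beautiful"]:
    print(w, count_syllables(w))  # cat 1, make 1, table 2, syllable 3, beautiful 3
```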

Now that the technical stuff is done with, let’s get to the content. This word count, sentence count and syllable count all flow into calculating the Flesch-Kincaid (FK) score, which is the approximate class that one needs to be in to understand the text. Let’s just plot the FK score for all my blog posts (a total of 2341 of them) against time. I’ve added a regression line for good effect.
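The post doesn’t say how its regression line was fitted. Ordinary least squares is the usual default, and its slope can be computed without a plotting library. The sketch assumes each post has already been reduced to a (publication date, FK score) pair. The function name and the toy data are hypothetical, not the blog’s actual numbers.

```python
from datetime import date


def trend_per_year(posts: list[tuple[date, float]]) -> float:
    """Least-squares slope of FK score against time, in grade levels per year."""
    if len(posts) < 2:
        raise ValueError("need at least two posts")
    xs = [d.toordinal() / 365.25 for d, _ in posts]  # time in (approximate) years
    ys = [score for _, score in posts]
    n = len(posts)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all posts share the same date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Two made-up points, ten years apart, five grade levels apart.
toy_posts = [(date(2004, 1, 1), 5.0), (date(2014, 1, 1), 10.0)]
print(round(trend_per_year(toy_posts), 3))  # 0.5
```

A positive slope corresponds to the upward-sloping line on the chart. Its size is the average rise in grade level per year.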

The trend is pretty clear. Over time, this blog has become more complicated and harder to read. In fact, drawing this graph slightly differently gives another message. This time, instead of a regression line, I’ve drawn a curve showing the trend.
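The post doesn’t say which smoother produced the trend curve. A cruder stand-in that can still reveal steps and plateaus is to average the scores within each calendar year. The sketch below does that in Python under the same assumption as before, that posts are (date, FK score) pairs. The toy data is made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def yearly_mean_scores(posts: list[tuple[date, float]]) -> dict[int, float]:
    """Average FK score per calendar year, returned in year order."""
    by_year: dict[int, list[float]] = defaultdict(list)
    for d, score in posts:
        by_year[d.year].append(score)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


toy = [(date(2004, 9, 1), 4.0), (date(2004, 12, 1), 6.0), (date(2005, 3, 1), 8.0)]
print(yearly_mean_scores(toy))  # {2004: 5.0, 2005: 8.0}
```

Unlike a local-regression curve, yearly means jump at year boundaries. They are coarser, but they make a plateau easy to spot.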

When I started writing in 2004, I was at a 5th standard level. This increased steadily for the first two years (I gained a lot of my steady readership in this time) to about 8th standard, and plateaued there for a bit. Then, around 2009-10, there was another increase, as my blog got up to the 10th standard level. It has pretty much stayed there ever since, apart from a tiny bump up at the end of 2014.

I don’t know if this increase in “complexity” of my blog is a good or a bad thing. On the one hand, it shows growing up. On the other, it’s becoming tougher to read, which has probably coincided with a plateauing (or even a drop) in the readership as well.

Let me know what you think: whether you prefer this “grown-up style”, or whether you want me to go back to the simpler writing I started off with.

Newsletter!

So after much deliberation and procrastination, I’ve finally started a newsletter. I call it “the art of data science”, and the title should be self-explanatory. It’s pure unbridled opinion (the kind that usually goes on this blog), except that I only write about one topic there.

I intend to have three sections and then a “chart of the edition” (note how cleverly I’ve named this section to avoid giving much away on the frequency of the newsletter!). In this edition, though, I ended up putting in too much harikathe (long-winded rambling), so I restricted it to two sections before the chart.

I intend to talk a bit each edition about some philosophical part of dealing with data (this section got a miss this time), a bit on data analysis methods (I went a bit meta on this this time) and a little bit on programming languages (which I used for bitching a bit).

The fact that I plan to include a “chart of the edition” means I need to read newspapers a lot more, since you are much more likely to find gems (in either direction) there than elsewhere. For the first edition, I picked a good graph I’d seen on Twitter, and it’s about Hull City!

Anyway, enough of this meta-harikathe. You can read the first edition of the newsletter here. In case you want to get it in your inbox each week/fortnight/whenever I decide to write it, then subscribe here!

And send me feedback (by email, not comments here) on what you think of the newsletter!

Medium stats

So Medium sends me this email:

Congratulations! You are among the top 10% of readers and writers on Medium this year. As a small thank you, we’ve put together some highlights from your 2016.

Now, I hardly use Medium. I’ve maybe written one post there (a long time ago) and read only a little bit (blogs I really like I’ve put on RSS and read on Feedly). So when Medium tells me that I, who consider myself a light user, am “in the top 10%”, they’re really giving away the fact that the quality of usage on their site is pretty bad.

Sometimes it’s bloody easy to see through flattery! People need to be more careful about what the stats they’re putting out really convey!

Bloggers and anti-bloggers

I know this post “dates” me as someone who started blogging back in the peak era of blogging in the mid-2000s. But what the hell!

I think you can consider yourself to have “made it” as a blogger when a post that you write attracts abuse. Sometimes this abuse could be in public, in the comments section of the blog. At other times, the abuse is in private, when someone meets you or calls you, and abuses you for writing what you wrote.

As long as you’ve been reasonable in your blogging (which the early years of this blog’s predecessor cannot exactly claim), abuse on your comments section is more of an indicator of the thin-skinnedness of the abuser, rather than you crossing lines on what you should write about.

At this point, it is pertinent to introduce the class of people whom I call “anti-bloggers”. They might sometimes have a blog themselves, but that is not necessary; what is necessary is that they have a “holier than thou” attitude.

Anti-bloggers are people with especially thin skins who are always on the lookout for something to outrage about, and blogs, which allow people to express themselves freely on a public forum without editorial oversight, are a common whipping boy.

This outrage could come in several forms. The thicker-skinned version of this outrage comes from people who abuse you only if they think you’ve abused them on the blog (good bloggers take care to never mention names in a negative manner, so this is usually a case of “kumbLkai kaLLa heglmuTT nODkonDa” (the pumpkin thief looked at his shoulder; it’s a Kannada proverb meaning something like “every thief has a straw in his beard”)).

The thinner-skinned anti-bloggers find it even easier to find things to outrage about. Look at the Bangalore post I’d written ten years back. There was no hint that I’d written about anyone at all, but the post received heaps of abuse from people who manufactured some kind of entity that the post purportedly offended!

The most annoying anti-bloggers are those that abuse you when you simply pen down an observation that is there for all to see. I won’t take specific examples now, but sometimes the simple act of reporting a fact that is evident to everyone can offend people, for its existence on paper (a website, rather) gives it new-found legitimacy!

This last bit can also help explain the annoyance of some sections of the “mainstream media” with “social media” such as blogs/twitter. The worthies in the mainstream media had established certain unwritten rules by which certain facts/events wouldn’t be put down on paper.

The mention of these events in social media (which is unedited) suddenly gave them legitimacy, which steered the overall narrative away from where it had been during the mainstream media monopoly, annoying the mainstream media!

One penultimate point – anti-bloggers are the same people who talk about the glories of the days prior to social media (this piece in The Guardian is an especially strong specimen), when people could only read news that was filtered and possibly censored by newspaper editors.

And finally, ever since my credentials as a blogger were established about a decade back, some people have started explicitly mentioning to me when they are saying something “off the record”. And I’ve always respected these conditions!

Commenting on social media

While I’m more off than on social media nowadays, even when I am on it I find myself commenting less and less.

I’ve stopped commenting on blogs because I primarily consume them using an RSS reader (Feedly) on my iPad, and need to click through and use my iPad keyboard to leave comments, a hard exercise. And comments on this blog make me believe that it’s okay to not comment on blogs any more.

On Facebook, I leave the odd comment but find that most comments add zero value. “Oh, looking so nice” and “nice couple” and things like that which might flatter some people, but which make absolutely no sense once you start seeing through the flattery.

So the problem on Facebook is “congestion”: a large number of non-value-adding comments may crowd out the odd comment that actually adds value, so you, as a value-adding commenter, decide not to comment at all.

The problem on LinkedIn is that people use it mostly as a medium to show off (that might be true of all social media, but it is even more so on LinkedIn), and when you leave a comment there, you’re likely to attract a large number of show-offs whom you are least interested in talking to. There is also the Facebook problem of congestion here, and the added problem that if you leave a comment on LinkedIn, people might think you’re showing off.

Twitter, in that sense, is good in that you can comment and selectively engage with people who reply to your comment (on Facebook, when all replies are in one place, such selective engagement is hard, and you can offend people by ignoring them). You can occasionally attract trolls, but with a judicious combination of ignoring, muting and blocking, those can be handled.

However, in my effort to avoid outrage (I like to consume news but don’t care about random people’s comments on it), I’ve significantly pruned my following list. Very few “friends”. A few “twitter celebrities”. Topic-specific studs. The problem there is that you can leave comments, but when you see that nobody is replying to them, you lose interest!

So it’s Jai all over the place.

No comments.

Our documented lives

I think I’ve confessed here several times that I like reading my old blogposts. In fact, I like reading my old blogposts from 2006 onwards. There was an inflexion point towards the end of 2005, and I hate my posts written before that. It was almost as if I were a completely different person.

Anyway, of late, these nostalgia trips have taken a different direction. Firstly, in 2006-10, I used GTalk fairly extensively, and most conversations are still archived (except for some people who explicitly turned off the saving). So once in a while I pick a random person (most often it’s the person who’s now my wife, and most of my GTalking with her was before we had even met) and check out my conversations with him/her.

Sometimes it just sends me on a bout of nostalgia. Sometimes it reminds me of what I (and the people I used to talk to) was like back then, and makes me wonder how I’ve changed and so forth. At other times these chats remind me of what was “hot gossip” back then (yes, I was a major gossipmonger in my younger days), which, thanks to the fundamental fleetingness of gossip, I normally don’t remember. When I remember such gossip, it’s a fun exercise to reconcile the subjects of the gossip with their present selves.

Another thing I do from time to time is read other people’s old blogs. Most of these have since been made private, as the people in question have embarked on successful corporate careers. I still have my LiveJournal account, so that helps me access some of these blogs (and others have kindly shared passwords to their now-private blogs with me).

The kind of trips these take me on is similar to what the old chats inspire – some nostalgia, some recollection of what different people were like back then and how they’ve turned out (I also make sure I read the comments), catching up on gossip of that day and all such.

In a way, I’m quite glad that so many of us live such documented lives! In that sense I quite hate Twitter and Facebook, for it’s bloody hard to search for stuff there (except through Facebook’s “this day that year” feature), and with a lot of documentation having moved there from blogs and GTalk, it’s quite sad!

PS: Sometimes I indulge in these nostalgic activities jointly with my wife, and occasionally it’s not fun, since she ends up discovering a part of my history which she didn’t know existed. Documentation has its downsides as well!

PPS: It makes me wonder what “oral histories” (I’ve always regarded them as a fraud concept, but I’ll save my description of those for another day) will look like one or two generations down the line, when so much of our documented histories will be available, if we choose to make them available.

On writing a book

While I look for publishers for the manuscript that I’ve just finished (it’s in “alpha testing” now), I think it’s a good time to write about what it was like to write the book. Now, I should ideally be writing this after it has been published and declared a grand success.

But there are two problems with that. Firstly, the book may not be a success of any kind. Secondly, by then it will have been far too long since I finished it for me to remember what writing it was like. In fact, a week after the first draft, I’ve already almost forgotten. So I’m writing this now.

  1. Writing is a full-time job. I got this idea for the book in October 2014 when I was visiting Barcelona for the first time. I wrote the outline in November 2014. Despite several attempts to write, nothing came out of it.

    During a break from work in October 2015 I managed to get started, but I’ve rewritten all that I wrote then. Part-time effort just doesn’t cut it. It wasn’t until I came to Barcelona in February that I could focus completely on the book and write it.

  2. You need discipline. This probably doesn’t need to be explicitly stated, but writing a book, unlike writing a blog post, is a much tougher process, and you need a whole load of discipline and focus. After a week or two of preparing the outline, I set a fairly strict deadline for finishing the book. I had to reset the deadline a couple of times, but finally managed it.
  3. There is no feedback. I think I wrote about this a few days back. The big problem with writing a book is that you spend a significant amount of effort before even a small fraction of your customers have seen the product. So you soldier on without any feedback, and it can occasionally be damn frustrating.
  4. You feel useless. Writing a book can introduce tremendous amounts of self-doubt. One day you think you’ve completely cracked it, and your book will change the world. The next day you start wondering if there’s any substance at all to what you’re writing, and whether there’s any point in going ahead with it. On several occasions, I’ve had thoughts of abandoning it.
  5. Getting away helps. The only reason I didn’t abandon the book when I had my bouts of self-doubt was that I was away in Barcelona with nothing else to do. It wasn’t as if I could ditch the book and find some work to do the next day. Being away meant that the TINA factor pushed me on. There was no alternative but to write the book.
  6. Getting in a draft is important. You are likely to have bad days when you’re writing. On those days you feel like giving up, or putting things off for another day. Reams have been written about great writers stalling their books for several days because they couldn’t find the “right word”. I don’t buy that.

    I found that when I’m in a rut, it’s better to simply push through and finish the chapter. Editing it later on is far easier than writing it again from scratch.

  7. There is a limit to how much you can write. When I said it’s a full-time job, you might have thought I spent 8 hours a day on the book. I took around 70 days to write it (including a 10-day vacation), and the draft weighs in at 75,000 words (I intend to cut it before publication). So it’s less than 1,200 words per day on average (75,000 words over 70 days comes to about 1,070 a day).

    That doesn’t sound like a lot, but trust me, writing on a continuous basis is quite hard. A lot of time goes into fact-checking and finding links (I don’t think I yet have all the footnotes and endnotes I need for the book). Writing a book is far more complex than writing a blog post.

  8. Writing is tiring. This isn’t something I figured out while writing the 2,000-odd posts I’ve put on this blog. When you’re writing a book, and for an audience, you realise that you get tired pretty quickly. I don’t think I was able to work more than four hours a day on any of my “writing days”. And four-hour days would leave me a zombie.
  9. You need a schedule, and a workplace. I did the pseud-romantic thing: the entire book was written at this Wi-Fi-enabled cafe near my place in Barcelona. Pseud value apart, the point of having the workplace was that it brought a schedule and some discipline to my days. I would go there every morning on writing days (the exact time varied), get a coffee and sit down to write, and not rise until I had finished my target for the session.

    Two days back I went there to work on something else, and found I couldn’t: that cafe is now forever tied to my writing the book. The focus it demands is of a different kind.

I’ll stop for now. I hope to republish this blog post once the book has hit the stands!

How social media affects your life

My first attempt at writing of any kind was in 2004, when I edited the daily newsletter at Saarang, IIT Madras’s cultural festival. It was a fun experience (I remember digging out my newsletters some time back, but can’t seem to find them now), and I think RAP and I did a pretty good job.

Given that events would go on late into every night and we had to bring out an edition every morning, some “preprocessing” was key, and I decided to solve the problem through some “online writing” (at the same time I was doing my B.Tech. project in online algorithms, but I digress). Whenever I made a pertinent observation (I borrowed the name for that newsletter, too), I would think about how I would describe it in the next day’s newsletter, and immediately jot it down in a notepad I carried.

This way, by the time RAP and I met every evening to compile the newsletter, most of the material would be in place and all we would have to do was to compile, edit and typeset it, and the newsletter would be ready. One time, when we knew that a quiz would go on till dawn (as per tradition), we wrote up the article even before it had happened based on how previous editions had gone. The winner’s name was inserted in the morning just before printing.

The reason I’m telling this story (which I might have told before) is that it inculcated in me the habit of trying to instantly describe in words anything I saw. The habit stuck, though it didn’t have much of an outlet. Later in 2004 I started this blog, and whenever I remembered the descriptions I’d come up with for things I’d seen, I would put them down here.

Twitter changed all that. Now, as soon as I could describe something I saw in a meaningful (and short) fashion, there was an outlet for instant output. Facebook made it even better, allowing me to tell stories with photos and without a word limit (Facebook did photos long before Twitter did). Instagram did the same.

So seven or eight years on social media (I joined Facebook in late 2007 and Twitter in mid-2008) meant that my skill at making quick written pertinent observations about just about anything I saw got a lot of encouragement (though most of the time no one would react, and at times I would get trolled).

A month after going off social media, I realise that this habit has become completely ingrained in me, and irrespective of what I’m doing, I’m thinking more about how I’d describe it in a few words (and maybe a picture) than about enjoying the sight or sound or conversation or whatever! And having denied myself this mode of output (social media) temporarily, it feels a bit odd when I mentally make one such observation, knowing there’s no way to put it out!

The thing is, while I already used to do this before I got access to instant social media, the extent to which I react this way has grown significantly over the years! And I don’t know if that is a good thing.

Anyway, here’s an old-style pertinent observation, made after a long delay and put on this blog (rather than on any other media). I found this place called “ze fork on the water” on the Lake Geneva shoreline yesterday!


Twitter and negativity

One of the reasons that sparked my departure from social media platforms such as Facebook and Twitter two weeks back was an argument with my wife where she claimed that Twitter had made me too negative, and highly prone to trolling (even in “real life”). Accepting a challenge from her, I offered to go through my tweets over the last few months, and identify those that were negative. I also offered to perform a similar exercise with my blog.

I started off with the intention to go through tweets in the last one year and delete anything that was negative or “troll-y”. I allocated myself an hour to accomplish this, along with a similar exercise for my blog.

I must have spent fifty minutes going through my twitter feed, and didn’t manage to go back more than two months. I was surprised by my own sheer volume of tweeting. What was more surprising was the amazing lack of insight in most of those tweets – there were horrible PJs that I’d cracked just because I could, there were random replies to other people which didn’t add any kind of value, there was outrage about the lack of outrage and some plain banal life stuff (apart from some downright trolly stuff which I deleted).

It made for extremely painful reading, and I could hardly recognise myself from my own tweets. Apart from some personal markers, I would find it hard to recognise most of these tweets as my own if they were to be presented to me a few months later. It was a clear indication that it was time to exit twitter (though since I have a rather kickass username there I’m not deleting my account).

The ten minutes I spent that day going through this blog, however, were a sheer delight. I did end up deleting a couple of outragey posts (both of which were essentially collections of tweets which I’d collated for posterity), but most of my posts were a joy to read! There was some kind of insight in each of them, and I’d be lying if I said I’m not proud of what I’ve written.

It’s not that I’ve not written shit on this blog (or its predecessor), having written posts as late as 2008 which I’m definitely not proud of. What I’ve noticed, however, is that I’ve evolved over time, and my writing style has been refined, and I think I continue to add significant value to my readers.

Twitter’s constant engagement feature, however, meant that it was hard to evolve there and hard to escape from the cycle of banal and negative tweets. My tweets from this February are unlikely to be qualitatively very different from those 5 years back, and that’s not a positive thing to say.

The thing with Twitter is that its short format encourages a “shoot first ask questions later” kind of thinking. You end up posting shit without thinking through it, and without having to construct a reasonable argument. This encourages outrage, and posting banal stuff. Spending one minute typing out a banal tweet is far lower cost than spending 20 minutes typing out a banal blog post – the latter is unlikely to be written unless there’s some kind of insight in it.

Outrage is one thing, but what’s really got to me about Twitter is its sheer ordinariness and temporality (most tweets lose value a short time after they’re posted). It’s insane that it’s taken me so long (and three longish sabbaticals from Twitter) to figure this out!

Bring on the Blook

I’m normally not one to notice such stuff, but I was randomly browsing my site stats the other day and found that I had published 1997 posts till then (not including the three that I’d published and subsequently withdrew for various reasons). I’ve written two more posts after that which makes this one the 2000th post on this blog (including its predecessor). It’s taken a bit more than 11 years (I started blogging in August 2004) to reach this milestone.

A couple of years back, I’d considered writing a “blook”. A “blook”, for the uninitiated, is a book that is based on a blog. So you don’t really write a blook. You simply compile posts from your own blog, put them in a logical order, write a foreword, and there it is! Back when I had considered the blook, I thought I didn’t have enough good posts on this blog, so I set myself a target of “another 200 blog posts”. I forget when I set this target. It doesn’t matter.

If I’ve written 2000 blog posts so far, surely at least 100 (5%) of them are good enough to share with a wider world than my readers? So this time, I’m seriously considering publishing a blook.

I’m looking for an editor to assist me in this exercise. The editor’s job is to go through my 2000 blog posts, identify 100 or so “good posts” (which are in a sense “timeless”), and figure out a way to curate them and put them together, perhaps under themes, to compile a blook. I could possibly do it myself, but I might be biased, and attached in unhealthy ways to certain posts, so I’d prefer a trusted third party to take this up.

So if you think you can edit my blog into a blook, or know someone who can do that, please do let me know. I’m really serious about it this time. We can figure out a “structure” to compensate your efforts. And you will get editing credits for the blook.

A little celebratory speech before that: when I started writing in 2004, little did I know that I would hit 2000 blog posts one day. I thank all my readers, loyal and disloyal. I thank people who have cared to comment on this blog over the years (excluding the spambots), for it’s they who’ve kept me going. I thank people who’ve brought up subjects from this blog for discussion in social gatherings. And last but not least, I thank my wife, whom I met through this blog (its predecessor, to be precise), and who constantly berates me for not writing enough about her!

Oh, and don’t forget the blook!