Tuesday, 23 March 2010

Profiling using Twitter

Swami Nikhilaananda thought:
"People say so much on social networking sites, that if we design a system to collect and analyse that data, we can build a model of that person and be able to judge how that person works"

This is just a random thought of mine, and it is not complete. More will follow as I get a little clearer on this. Do post some ideas of yours... Oh, and let me know if any of you guys are interested in pursuing this :P

Social networking sites like Orkut and Facebook and the microblogging site Twitter have taken the world by storm. They are immensely popular, with millions of members and networks that keep growing.

Just imagine if we could systematically collect what a person posts, process and analyse it, and build a model of that person. You don't need to do this manually. It is fairly simple to write programs that do the data mining for you, and once the design is right, programs that use machine learning can do the processing and analysis. This is what a lot of people fear about companies such as Google, which hold vast amounts of information about people. I find this an exciting project. Of course, with that much information, companies such as Google could achieve an even more comprehensive result. We shall come to them at the end.

First of all, what are the various things that we can derive from, say, tweets on Twitter? It should be relatively easy to fetch the tweets of any one person or group, depending on the search arguments.

People generally write tweets about what they are doing at the moment. They also write tweets about where they are, or where they went. They write their opinions, and they write several other bits and pieces of information.

Now, what are the things that we can see? We can observe the times at which the tweets are posted. If the tweets are spread uniformly through the day, we can assume that the person has a job working on a computer with internet access throughout, and that social networking sites are not blocked there. If social networking sites are blocked, then the tweets would probably come via mobile, or would be clumped in the window between the end of the work day and bedtime... With more information, you can make intelligent guesses about the type of job.
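To make the timing idea concrete, here is a minimal sketch of how one might bucket tweet timestamps by hour and check how much of the activity falls inside working hours. Everything in it is an assumption of mine for illustration: the function names, the 9-to-18 work window and the 0.25 threshold are made up, and the timestamps are assumed to already be in the person's local time.

```python
from collections import Counter
from datetime import datetime

def hourly_histogram(timestamps):
    """Count tweets for each hour of the day (index 0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def share_in_window(histogram, start_hour, end_hour):
    """Fraction of tweets posted with start_hour <= hour < end_hour."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    return sum(histogram[start_hour:end_hour]) / total

def guess_posting_pattern(timestamps, work_start=9, work_end=18, threshold=0.25):
    """Very rough guess; the window and threshold are hypothetical."""
    histogram = hourly_histogram(timestamps)
    work_share = share_in_window(histogram, work_start, work_end)
    if work_share < threshold:
        return "mostly outside work hours"
    return "spread through the work day"

# Example: someone who only tweets in the evening
evening = [datetime(2010, 3, 23, h, 0) for h in (19, 20, 21, 22, 22, 23)]
print(guess_posting_pattern(evening))
```

A real system would need many days of data and a smarter test than a single threshold, but the histogram is the starting point either way.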

If the person is tweeting about current affairs, then we can pull keywords out of the tweets and run them through search engines such as Google and Bing. The first ten results will include highly relevant pages, along with other bits of information such as news, images, videos, etc. integrated with them. We can analyse these results and understand how this person reacts to actual situations... For example, when reservation bills are passed in India, there will be a rise in anti-reservation or pro-reservation tweets, depending on the person's inclination. Or you will see a phenomenal increase in tweets about cricketers and teams during IPL seasons (but not so many when there is no IPL). Similarly for sports, games, political situations... anything that is in current affairs.
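As a rough sketch of the keyword step, something like the following could pull keywords out of tweets and count how often a term shows up in each month, which is enough to spot an IPL-style spike. The stopword list, the function names and the sample tweets are all invented for illustration, and the search-engine lookup is left out entirely.

```python
import re
from collections import Counter
from datetime import date

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "in", "on", "of", "to",
             "and", "for", "at", "this", "that", "was", "what"}

def keywords(text):
    """Lowercase the text and keep words that are not stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]

def top_keywords(tweets, n=5):
    """Most frequent keywords across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(keywords(text))
    return counts.most_common(n)

def mentions_by_month(dated_tweets, term):
    """Count tweets mentioning term, grouped by (year, month)."""
    counts = Counter()
    for day, text in dated_tweets:
        if term.lower() in keywords(text):
            counts[(day.year, day.month)] += 1
    return dict(counts)

# Made-up sample data
sample = [
    (date(2010, 2, 10), "Long day at work"),
    (date(2010, 3, 14), "What a match! IPL is on fire"),
    (date(2010, 3, 20), "IPL again tonight, Sachin was brilliant"),
]
print(top_keywords([text for _, text in sample], n=1))
print(mentions_by_month(sample, "IPL"))
```

Telling pro from anti would need sentiment analysis on top of this; the sketch only counts mentions.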

The person's tweets about their current location or locations visited tell us which places, and what types of places, they go to. You can map this against holidays, weekends, etc. to see if leave was taken, or perform other analysis. What I am trying to say is that a myriad of things can be done with plain sentences.
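Mapping visits against the calendar could look something like this minimal sketch. It assumes the places have already been extracted from the tweets as (date, place) pairs, which is the hard part and is not shown; the function names and the sample visits are hypothetical, and the holiday set is just an example you would fill in yourself.

```python
from datetime import date

def classify_day(day, holidays=frozenset()):
    """Label a date as a holiday, a weekend or a working day."""
    if day in holidays:
        return "holiday"
    if day.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    return "working day"

def visits_by_day_type(visits, holidays=frozenset()):
    """Group visited places by the type of day they were visited on."""
    grouped = {"working day": [], "weekend": [], "holiday": []}
    for day, place in visits:
        grouped[classify_day(day, holidays)].append(place)
    return grouped

# Made-up sample data
holidays = {date(2010, 1, 26)}
visits = [
    (date(2010, 1, 26), "parade ground"),
    (date(2010, 3, 20), "shopping mall"),
    (date(2010, 3, 23), "hill station"),
]
print(visits_by_day_type(visits, holidays))
```

A trip to a hill station on a working day is the kind of entry that might hint at leave being taken, though it is only a hint.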

There are some people who tweet in proper English, following grammatical rules. And den der r sum who r uncnvntional... Some use ellipses (...) a lot, while some use hyphens, commas, etc. The choice of words, the language, the punctuation, etc. tell a lot about people. The lengths of tweets are also interesting to observe.
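These writing-style signals are easy to count. Here is a small sketch, with feature names I made up, that extracts a few of them from a single tweet and averages them over a set of tweets. Which features actually say something about a person is an open question; this only shows the counting.

```python
import re

def style_features(tweet):
    """Count a few simple style signals in one tweet."""
    words = [w.strip(".,!?") for w in tweet.split()]
    return {
        "length": len(tweet),
        "ellipses": len(re.findall(r"\.{3,}", tweet)),
        "commas": tweet.count(","),
        "hyphens": tweet.count("-"),
        # Single letters such as "r" or "u" hint at SMS-style spelling
        "one_letter_words": sum(1 for w in words if len(w) == 1 and w.isalpha()),
    }

def average_style(tweets):
    """Average each feature over a list of tweets."""
    if not tweets:
        return {}
    rows = [style_features(t) for t in tweets]
    return {name: sum(row[name] for row in rows) / len(rows) for name in rows[0]}

print(style_features("And den der r sum who r uncnvntional..."))
print(average_style(["Well, yes - maybe...", "ok"]))
```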

Tweets could also be totally random.

The people that you follow also tell a bit about you, your friends, your circle... If this is integrated with other sites, it can probably be used to guess where you are from, your college, etc. For example, say you are only on Twitter, and a majority of the people you follow are your classmates, who in turn are on Facebook or Orkut and are members of your college community there. It is then a reasonable guess that you belong to the same college. Things like that...
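The college guess boils down to set overlap. As a minimal sketch, assuming we already have the list of accounts a person follows and the member lists of some communities (both hypothetical inputs, and matching accounts across sites is glossed over completely), it could be done like this. The 0.5 cut-off stands in for "a majority" and is my own choice.

```python
def community_overlap(following, community_members):
    """Fraction of the followed accounts that belong to the community."""
    following = set(following)
    if not following:
        return 0.0
    return len(following & set(community_members)) / len(following)

def guess_community(following, communities, min_share=0.5):
    """Return the best-matching community, or None if no match is strong enough."""
    best_name, best_share = None, 0.0
    for name, members in communities.items():
        share = community_overlap(following, members)
        if share > best_share:
            best_name, best_share = name, share
    return best_name if best_share >= min_share else None

# Made-up sample data
following = {"asha", "bharat", "chetan", "divya"}
communities = {
    "College X": {"asha", "bharat", "chetan", "zoya"},
    "Cricket club": {"divya"},
}
print(guess_community(following, communities))
```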

Sites such as Google have the bulk of the information that you are feeding them, like your photos, personal information, things that you search for, your geographical location, the institutes that you are a member of... Just imagine what they can do :-)

More to follow. This is just some initial thinking... Do post some ideas of yours...

Comments from Facebook:
Hemanth Pai
its already done buddy , how do u think u get those ads on the right of u r screen on FB :P

Nikhil Baliga
No I know :P Google does exactly the same thing for most appropriate ads. I am saying - Just imagine if you extend the same thing to beyond ads to understand how people are...

Ashvin Srinivasan
yeah most of them, but there are some ppl who say only what u choose to hear maga :P

Ashvin Srinivasan
oh or rather what u wanna hear :P

Mithun R Shroff
Nice..Even I had thought about this a few months back..:)

Deepti Rao
gud thought!


Niranjan said...

good idea :) but modelling is difficult na.. it varies day to day..one may tweet regularly for 1 day..but the next day he may not..so 1 model for a person each day ah???

Nik said...

@Niranjan - Modelling is difficult, but not impossible. And models are not changed daily. They grow and evolve as more data comes in. I think if you read a little bit about Artificial Neural Networks, you will understand what I am saying a lot better.