Wednesday, September 4, 2013

Big data analysis in people search: really challenging

Nowadays, more and more people, from both industry and academia, are talking about big data. Its popularity is like the frenzy a new MJ album would cause if he were still alive. It is no exaggeration to say that we are now living in an era of "Big Data": science, engineering and technology are producing increasingly large data streams, with petabyte and exabyte scales becoming increasingly common. Big data presents opportunities and also perils. On the optimistic side, it gives us a good opportunity to scale existing theoretical principles and learning algorithms from modest-size data sets to massive ones, so big data gives us a fantastic platform and the resources to verify how well those principles and algorithms hold up. On the other side, big data also brings some big challenges. It amplifies the errors in existing technologies and their effects, and efficiency, space and energy are major issues we need to tackle in big data analysis.
Social network websites have been booming since the late 90s and early 00s, and millions of people join websites like Facebook and Twitter each year. According to Facebook's official statistics, it had a billion active users at the end of 2012. People look for and share all kinds of information on these websites, and an obvious trend led by social networks is that people are much more willing to post and share their personal information online. As a result, anyone who wants to find his friends and get their recent updates can do so by just typing his friends' names online, for example, on Google or on a social network website. Moreover, as this kind of need becomes more common, an advanced people search engine becomes very useful.
Recently, I have been investigating people search. I felt that neither Google nor social network websites like Facebook could exactly meet my requirements. Actually, there is a wonderful people search engine which can aggregate a person's profile from around the internet, not only from social network websites, but also from all kinds of personal pages and even online news pages. It is called "whova", and it provides advanced people search built upon proprietary big data analytics and mining technology. Precisely aggregating one's personal information online is very challenging, because:
1) Firstly, as mentioned above, a single person's information can be scattered across many websites. For example, I may have submitted my personal information, such as name, education, affiliation, etc., on Facebook and also on other websites, e.g. LinkedIn. Precisely determining that profile information parsed from different websites belongs to the same person is very difficult.
2) Secondly, it may be fairly easy to get profile information from social network websites, since the information shown on these websites is structured. However, for general pages, an intelligent parser must be carefully designed to infer a personal profile from the complex text paragraphs on a page, and as we know, semantic analysis is a very challenging open problem.
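
To make the first challenge a bit more concrete, here is a minimal sketch of how two already-parsed profiles might be compared to guess whether they belong to the same person. This is only an illustration and not how whova or any real people search engine works: the field names (`name`, `education`, `affiliation`), the weights, the threshold and the sample profiles are all assumptions made up for this example, and plain string similarity from Python's standard `difflib` module stands in for what would really need much smarter matching.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much each field counts toward the final score.
WEIGHTS = {"name": 0.5, "education": 0.25, "affiliation": 0.25}


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def field_similarity(a, b):
    """Return a string similarity in [0, 1]; a missing field contributes 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(p1, p2):
    """Weighted sum of per-field similarities between two profile dicts."""
    return sum(
        weight * field_similarity(p1.get(field, ""), p2.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


def same_person(p1, p2, threshold=0.8):
    """Guess that two profiles match when the score reaches the (assumed) threshold."""
    return match_score(p1, p2) >= threshold


# Made-up profiles, as if parsed from two different websites.
site_a = {"name": "Jane Q. Doe", "education": "Example University",
          "affiliation": "Example Corp"}
site_b = {"name": "Jane Doe", "education": "Example University",
          "affiliation": "Example Corp."}
site_c = {"name": "Jane Doe", "education": "Another College",
          "affiliation": "Different Inc"}

print(same_person(site_a, site_b))  # similar name, same school and employer
print(same_person(site_a, site_c))  # similar name only
```

Even this toy version shows why the problem is hard: two different people can share a common name, and the same person can write their affiliation in many ways, so a fixed threshold on string similarity will make mistakes in both directions.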