Abstract

With the rapid growth of Internet bandwidth, more and more social media sites (e.g., Flickr, YouTube, Facebook, and Google News) let people capture and share social media data online. As a result, a popular event happening around us or elsewhere in the world can spread very quickly, and a substantial number of multi-modal events (e.g., with images, videos, and text) are now available on the Internet. Most of these social events, reported by different news media, relate to specific topics, and identifying or clustering them manually is time-consuming. Cross-collection social event analysis can discover collective and subjective information from the vast number of cross-collection sources in social news media, and the mining results are useful for applications such as social event detection, social event tracking, and social event prediction.

Dataset

To facilitate further research on social event analysis, we introduce a cross-collection social event dataset created by the Institute of Automation, Chinese Academy of Sciences. The evaluation dataset is constructed from online social news media sources. These websites are all in English and cover a long period, with rich textual and image metadata about the hot social event “Arab Spring”. We have collected information on eleven countries: Algeria, Bahrain, Egypt, Iraq, Jordan, Lebanon, Libya, Saudi Arabia, Syria, Tunisia, and Yemen. Note that the data are provided as MySQL data tables.

Data Collection

To capture hot-topic information from news documents, we crawled news published on the websites of the New York Times, Sputnik, and Hurriyet Daily News, which are important news outlets in the U.S., Russia, and Turkey, respectively. In total, we collected 40,532 news documents published from March 2011 to December 2015. The rich textual and image metadata were then captured via the sites' APIs.

The basic statistics of our dataset are presented in the table below:

Country/Num  Algeria  Bahrain  Egypt  Iraq  Jordan  Lebanon  Libya  Saudi  Syria  Tunisia  Yemen
Nytimes          127      324   2080  1696     381      265   1515    592   3557      342    766
Sputnik           82      181   1506  2167     253      272   1380   1505   9330      224   1195
Daily News        78      143   1629  1925     157      277    744    485   5738      278    415
Total            287      648   5215  4831     791      694   3639   2582  18625      844   2376
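For readers who want to sanity-check the statistics before working with the data, the following is a minimal Python sketch that tallies the counts transcribed from the table. It is not part of the released dataset; the names COUNTRIES, COUNTS, PRINTED_TOTAL and the helper functions are illustrative assumptions, and the numbers are copied from the table as printed on this page rather than read from the MySQL tables.

```python
# Counts transcribed from the statistics table (rows: news source, columns: country).
# All names here are illustrative; nothing below reads the actual MySQL tables.
COUNTRIES = ["Algeria", "Bahrain", "Egypt", "Iraq", "Jordan", "Lebanon",
             "Libya", "Saudi", "Syria", "Tunisia", "Yemen"]

COUNTS = {
    "Nytimes":    [127, 324, 2080, 1696, 381, 265, 1515, 592, 3557, 342, 766],
    "Sputnik":    [82, 181, 1506, 2167, 253, 272, 1380, 1505, 9330, 224, 1195],
    "Daily News": [78, 143, 1629, 1925, 157, 277, 744, 485, 5738, 278, 415],
}

# The "Total" row exactly as printed in the table.
PRINTED_TOTAL = [287, 648, 5215, 4831, 791, 694, 3639, 2582, 18625, 844, 2376]


def per_country_sums(counts):
    """Sum the source rows column by column (one value per country)."""
    return [sum(column) for column in zip(*counts.values())]


def per_source_sums(counts):
    """Sum each source row across all countries."""
    return {source: sum(row) for source, row in counts.items()}


def differing_columns(counts, printed_total, countries):
    """Return {country: (computed_sum, printed_total)} where the two disagree."""
    computed = per_country_sums(counts)
    return {
        country: (row_sum, total)
        for country, row_sum, total in zip(countries, computed, printed_total)
        if row_sum != total
    }


if __name__ == "__main__":
    print("Documents per source:", per_source_sums(COUNTS))
    print("Sum of printed Total row:", sum(PRINTED_TOTAL))
    print("Columns that differ:", differing_columns(COUNTS, PRINTED_TOTAL, COUNTRIES))
```

On the figures as printed, the Total row sums to 40,532, matching the document count stated above. The per-country sums agree with the Total row for nine countries; the Iraq and Lebanon columns differ, so those entries may be worth checking against the database itself.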

Downloads

News media documents of the hot social event “Arab spring” in eleven countries

Citation

Multi-modal Multi-view Topic-opinion Mining for Social Event Analysis


Shengsheng Qian, Tianzhu Zhang, Changsheng Xu.
ACM Multimedia 2016.
[project] [pdf] [slides] [poster]