Abstract

With the rise of social media, people now commonly engage in several Online Social Networks (OSNs) simultaneously for different purposes. Together, these cross-network activities record people's complete online footprints and reflect their demographics and interests from different perspectives. Cross-network user modeling is gradually becoming the foundation for various personalized services, such as recommendation and retrieval. However, the topic is still new, and little research has been done on cross-network user modeling.

Dataset

To facilitate more research on this topic, we introduce a cross-network dataset with user account linkage between YouTube and Twitter, created by the Institute of Automation, Chinese Academy of Sciences. The dataset contains rich user metadata and historical behaviors on YouTube and Twitter: basic user profiles on both networks, social relations and tweeting data on Twitter, three kinds of video behaviors on YouTube, and rich video metadata for all collected videos. Because Twitter does not permit users' tweeting data to be shared publicly, we release only the users' topical distributions, extracted from their tweeting data with a standard topic model (LDA). All other data are provided as MySQL data tables.
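As a rough illustration of how the released topical distributions might be used, the sketch below parses per-user topic vectors and compares two users by cosine similarity. The line format (user id, a tab, then space-separated topic probabilities), the sample values, and the function names are assumptions made for this example; they are not the documented layout of the released files.

```python
import math


def parse_topic_line(line):
    """Parse one line of the ASSUMED format: <user_id><TAB><p1> <p2> ... <pK>."""
    user_id, raw = line.rstrip("\n").split("\t", 1)
    probs = [float(x) for x in raw.split()]
    total = sum(probs)
    if total <= 0:
        raise ValueError(f"empty topic distribution for user {user_id}")
    # Renormalize so that rounding in the file does not leave the vector
    # slightly off the probability simplex.
    return user_id, [p / total for p in probs]


def cosine_similarity(p, q):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Hypothetical sample data with three topics per user.
sample_lines = [
    "u1\t0.70 0.20 0.10",
    "u2\t0.65 0.25 0.10",
    "u3\t0.05 0.15 0.80",
]
topics = dict(parse_topic_line(line) for line in sample_lines)

sim_12 = cosine_similarity(topics["u1"], topics["u2"])
sim_13 = cosine_similarity(topics["u1"], topics["u3"])
print(f"u1~u2: {sim_12:.3f}  u1~u3: {sim_13:.3f}")
```

Cosine similarity is only one possible choice here; a divergence defined on probability distributions (for example Jensen-Shannon) would be an equally reasonable way to compare LDA outputs.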

Data Collection

To capture the user account linkage between different social networks, we started from the aggregation site Google+, which encourages users to list their accounts on other OSNs in their Google profile, and obtained the accounts on different networks (YouTube, Twitter, Facebook and Flickr) that belong to the same person. We then collected the Twitter and YouTube data of these users via the respective APIs (i.e., the Twitter REST APIs and the YouTube public APIs).
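The linkage step described above can be sketched as follows, assuming each crawled Google+ profile has already been reduced to a record holding the optional YouTube and Twitter account names it lists. The `GooglePlusProfile` class, its field names, and the sample accounts are hypothetical and only illustrate how the overlapped user set is derived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GooglePlusProfile:
    """Hypothetical record: one Google+ profile and the accounts it lists."""
    gplus_id: str
    youtube: Optional[str] = None
    twitter: Optional[str] = None


def link_accounts(profiles):
    """Return (YouTube users, Twitter users, YouTube -> Twitter linkage)."""
    youtube_users, twitter_users, overlapped = set(), set(), {}
    for profile in profiles:
        if profile.youtube:
            youtube_users.add(profile.youtube)
        if profile.twitter:
            twitter_users.add(profile.twitter)
        # A user is "overlapped" only when both accounts are listed.
        if profile.youtube and profile.twitter:
            overlapped[profile.youtube] = profile.twitter
    return youtube_users, twitter_users, overlapped


# Made-up profiles for illustration.
profiles = [
    GooglePlusProfile("g1", youtube="yt_alice", twitter="tw_alice"),
    GooglePlusProfile("g2", youtube="yt_bob"),
    GooglePlusProfile("g3", twitter="tw_carol"),
]
yt_users, tw_users, linkage = link_accounts(profiles)
print(len(yt_users), len(tw_users), len(linkage))
```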

The basic statistics of our dataset are presented in the table below:

Statistic                               Value
#YouTube users                          38,377
#Twitter users                          39,659
#Overlapped users                       11,687
#Videos                                 2,280,129
#Average videos for each YouTube user   93.60
#Average friends for each Twitter user  891.1

Application

Beyond the topic of our paper, the dataset can also be used in many other scenarios. Here we list some potential research topics that the released dataset supports:

  1. Comprehensive user modeling. Modeling users by aggregating their information from multiple social networks.
  2. Cross-network recommendation. Recommendation that transfers the user information available in one network to the other, in two directions: from Twitter to YouTube and from YouTube to Twitter.
  3. Cross-network correlation pattern extraction. By bridging networks through the overlapped users, it is possible to discover potential correlation patterns between entities from different networks. For example, gossip tweets on Twitter may be correlated with luxury products on Amazon.
  4. Cross-network user identification. Aligning user accounts across multiple social networks.
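To make the Twitter-to-YouTube direction of item 2 concrete, here is a minimal nearest-neighbour sketch. It is an illustrative baseline and not the method of the papers cited below. A user known only on Twitter is matched to the most similar overlapped users by Twitter topical distribution, and the videos those neighbours interacted with are ranked by summed similarity. All names, the data layout, and the sample values are assumptions.

```python
import math
from collections import Counter


def cosine(p, q):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def recommend_videos(target_topics, overlapped_users, k=2, top_n=3):
    """Rank YouTube videos for a Twitter-only user.

    overlapped_users maps a user id to a pair
    (twitter_topic_vector, set_of_youtube_video_ids); this layout is assumed.
    """
    # Keep the k overlapped users whose Twitter topics are closest to the target.
    neighbours = sorted(
        overlapped_users.values(),
        key=lambda user: cosine(target_topics, user[0]),
        reverse=True,
    )[:k]
    scores = Counter()
    for topics, videos in neighbours:
        weight = cosine(target_topics, topics)
        for video in videos:
            scores[video] += weight  # similarity-weighted vote
    return [video for video, _ in scores.most_common(top_n)]


# Hypothetical overlapped users with three topics each.
overlapped = {
    "a": ([0.8, 0.1, 0.1], {"v1", "v2"}),
    "b": ([0.7, 0.2, 0.1], {"v2", "v3"}),
    "c": ([0.1, 0.1, 0.8], {"v9"}),
}
recs = recommend_videos([0.75, 0.15, 0.10], overlapped)
print(recs)
```

The reverse direction (YouTube to Twitter) would follow the same pattern with the roles of the two networks swapped, for example recommending accounts to follow based on similarity of video behavior.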

Downloads

Google+ Data

Twitter Data

YouTube Data

Citation

1. Mining Cross-network Association for YouTube Video Promotion


Ming Yan, Jitao Sang and Changsheng Xu
ACM Multimedia 2014, Orlando, Florida, USA

2. YouTube Video Promotion by Cross-network Association: @Britney to Advertise Gangnam Style


Ming Yan, Jitao Sang and Changsheng Xu
IEEE Transactions on Multimedia (TMM), 2015

3. Unified YouTube Video Recommendation via Cross-network Collaboration


Ming Yan, Jitao Sang and Changsheng Xu
ACM ICMR 2015, Shanghai, China
Best student paper award