Characterizing Topics in Social Media

  • Analyzed a Reddit dataset of 5K subreddits and 887M comments to extract key response features that capture the dynamics of conversations in different subreddits
  • Used machine learning classifiers to evaluate the extracted response features, which achieved 90% accuracy in predicting the genre of Reddit submissions
  • Clustered the posts within each subreddit by applying PCA and K-means to their response features, identifying the dominant topics in each subreddit
  • Applied the derived response features to accurately detect outlier posts and efficiently predict viral posts
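
The clustering step described above can be sketched as follows. This is a minimal, standard-library-only sketch that assumes each post is already summarized as a numeric vector of response features. The example features (comment count, mean reply depth, share of comments in the first hour), the helper names (`standardize`, `pca_project`, `kmeans`, `cluster_posts`), the z-score scaling, the power-iteration PCA, and the farthest-point K-means initialization are all illustrative assumptions, not the project's actual pipeline.

```python
import math
import random


def standardize(rows):
    """Z-score each feature column so large-scale features (e.g. counts) do not dominate."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def pca_project(rows, n_components, iters=200, seed=0):
    """Project centred rows onto their top principal components.

    Uses power iteration with deflation on the covariance matrix, which is
    adequate for a handful of features.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    # Rows are already centred by standardize(), so cov = X^T X / n.
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue used to deflate this component.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x * c for x, c in zip(r, comp)) for comp in components] for r in rows]


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with farthest-point initialisation."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def cluster_posts(features, n_components=2, n_clusters=2):
    """Standardize -> PCA -> K-means; returns one cluster label per post."""
    projected = pca_project(standardize(features), n_components)
    labels, _ = kmeans(projected, n_clusters)
    return labels


if __name__ == "__main__":
    # Hypothetical response features per post:
    # [comment count, mean reply depth, share of comments in first hour]
    posts = [[10, 1.0, 0.10], [12, 1.2, 0.12], [11, 0.9, 0.09],
             [500, 6.0, 0.90], [520, 6.5, 0.88], [480, 5.8, 0.92]]
    print(cluster_posts(posts))
```

In practice, scikit-learn's `PCA` and `KMeans` would replace the hand-written pieces; they are spelled out here only to keep the sketch dependency-free.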

Download the paper for this project: Reddit

Bowen Gong
Quant Software Developer at Citibank

My research interests include Network Science, Machine Learning, Natural Language Processing, and Parallel Computing.
