Characterizing Topics in Social Media
- Extracted key response features that capture conversation dynamics across subreddits by analyzing a Reddit dataset of 5K subreddits and 887M comments
- Used machine learning to evaluate the effectiveness of the extracted response features, which yield 90% accuracy in predicting the genre of Reddit submissions
- Clustered posts within each subreddit using K-means and PCA on the response features to identify the dominant topics in that subreddit
- Applied the derived response features to accurately detect outlier posts and efficiently predict viral posts
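
The clustering and outlier-detection steps listed above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the two feature columns (comments per hour, mean reply depth), the toy values, the farthest-point initialisation, and the distance-based outlier rule are all assumptions chosen for illustration, and the PCA projection is left out to keep the example dependency-free.

```python
import math


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption; deterministic, no seed)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each post to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def flag_outliers(points, centroids, labels, factor=2.0):
    """Flag posts farther from their centroid than factor x the cluster's mean distance."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(len(centroids)):
        cluster = [d for d, lab in zip(dists, labels) if lab == c]
        if not cluster:
            continue
        cutoff = factor * sum(cluster) / len(cluster)
        flagged += [
            i for i, (d, lab) in enumerate(zip(dists, labels)) if lab == c and d > cutoff
        ]
    return sorted(flagged)


# Hypothetical response features per post: [comments per hour, mean reply depth].
posts = [
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0],  # one style of discussion
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.0, 5.1],  # a second style
    [5.0, 9.0],                                      # unusually deep thread
]

centroids, labels = kmeans(posts, k=2)
outliers = flag_outliers(posts, centroids, labels)
print(labels, outliers)
```

On real data the features would typically be standardised first, and a library implementation of K-means and PCA would replace the hand-written loop.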
Download the paper for this project: Reddit