Data Science
Completed

User Clustering for New Sales Push Notifications

Applied K-Means and DBSCAN clustering on user click history from the home page to segment customers for the New Sales Push Notifications project.

2025
1 month
6 people
Data Scientist

Context & Background

At Choose, the New Sales Push Notification project aimed to increase engagement by moving away from sending the same "most-clicked" upcoming sale to all users. Instead, I sought to segment users based on their category click preferences on the home page. The initial objective was to build data-driven segments so each group could receive a push notification promoting the most-clicked sale in their preferred category, rather than a single generic sale sent to everyone.

Challenge

I first tried machine learning clustering with K-Means and DBSCAN. However, the dataset contained 30 category features, and in this high-dimensional space the clusters were neither compact nor well separated (see the metrics in the Appendix). As a result, algorithmic clustering did not produce actionable segments. The challenge was to create meaningful, stable segments that could be applied directly for targeting without excessive complexity.

Solution

I adopted a manual segmentation strategy guided by business-driven thresholds on user engagement metrics. We defined 22 distinct user segments based on dominant category preferences. For example, a "Kids" segment included users who frequently clicked on Kids-related sales. Each segment then received a push notification promoting the most-clicked sale within its preferred category from the Upcoming Sales section. This ensured relevance while maintaining operational simplicity.
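The rule-based assignment described above can be sketched as follows. This is a minimal illustration, not the production logic: the threshold values, the "Generic" fallback segment, and the function and field names are all assumptions, since the write-up does not list the actual business thresholds.

```python
# Hypothetical thresholds; the real business-driven values are not given in the write-up.
MIN_TOTAL_CLICKS = 5       # minimum home-page clicks before a user is segmented
MIN_DOMINANT_SHARE = 0.4   # minimum share of clicks in the user's top category
FALLBACK_SEGMENT = "Generic"  # assumed bucket for users with no clear preference


def assign_segment(category_clicks: dict[str, int]) -> str:
    """Return the user's dominant category, or the fallback segment."""
    total = sum(category_clicks.values())
    if total < MIN_TOTAL_CLICKS:
        return FALLBACK_SEGMENT
    # Sort key (-clicks, name) picks the most-clicked category and breaks ties by name.
    top_category, top_clicks = min(
        category_clicks.items(), key=lambda kv: (-kv[1], kv[0])
    )
    if top_clicks / total < MIN_DOMINANT_SHARE:
        return FALLBACK_SEGMENT
    return top_category


def top_sale_per_category(sale_clicks: list[tuple[str, str, int]]) -> dict[str, str]:
    """Map each category to its most-clicked upcoming sale.

    Each input row is (category, sale_id, clicks).
    """
    best: dict[str, tuple[int, str]] = {}
    for category, sale_id, clicks in sale_clicks:
        if category not in best or clicks > best[category][0]:
            best[category] = (clicks, sale_id)
    return {category: sale_id for category, (_, sale_id) in best.items()}


# Toy data for illustration only.
users = {
    "u1": {"Kids": 9, "Home": 2, "Beauty": 1},
    "u2": {"Home": 3, "Beauty": 3, "Kids": 2, "Sport": 2},
    "u3": {"Beauty": 2},
}
upcoming = [
    ("Kids", "sale_101", 340),
    ("Kids", "sale_102", 510),
    ("Beauty", "sale_201", 120),
]

segments = {user_id: assign_segment(clicks) for user_id, clicks in users.items()}
sale_for_segment = top_sale_per_category(upcoming)
```

In this toy data, only `u1` has enough clicks and a clear enough preference to land in the "Kids" segment; the other two users fall back to the assumed generic bucket. Keeping the rules this explicit is what makes the segments stable and easy to audit compared with cluster labels that can shift between runs.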

Results & Impact

2% increase in open rates
Achieved over 70% diversity in sales sent across users

Appendix

K-Means Clustering Visualization

The figure below illustrates the results of applying the K-Means algorithm to user click history data on the home page.
• Number of clusters: 100
• Silhouette coefficient: Generally below 0.2, indicating low intra-cluster cohesion.
• Davies–Bouldin index: Above 1.5, suggesting a poor trade-off between intra-cluster compactness and inter-cluster separation.
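To make the two quality metrics concrete, here is a small pure-Python sketch of how each is defined. It is for illustration only: the write-up does not say how the metrics were computed, and in practice a library such as scikit-learn (`silhouette_score`, `davies_bouldin_score`) would normally be used. The toy points and labels are assumptions.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Collect points into a dict of cluster label -> list of points."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; ranges from -1 to 1, higher is better."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance to itself is 0, so dividing by len - 1 is correct).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance.

    Lower is better; 0 is the theoretical minimum.
    """
    clusters = group_by_label(points, labels)
    centroids, scatter = {}, {}
    for label, members in clusters.items():
        centroid = tuple(sum(axis) / len(members) for axis in zip(*members))
        centroids[label] = centroid
        scatter[label] = sum(math.dist(m, centroid) for m in members) / len(members)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Toy 2-D data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # labels follow the true groups
bad_labels = [0, 1, 0, 1]   # labels cut across the groups

good_silhouette = silhouette_coefficient(points, good_labels)
bad_silhouette = silhouette_coefficient(points, bad_labels)
good_db = davies_bouldin_index(points, good_labels)
bad_db = davies_bouldin_index(points, bad_labels)
```

With labels that match the real groups, the silhouette is close to 1 and the Davies–Bouldin index is close to 0. With labels that cut across the groups, the silhouette turns negative and the Davies–Bouldin index rises sharply, which is the same direction of failure as the values reported above.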

Coverage Comparison: Baseline vs. Cluster-Based Segmentation

The chart below compares the coverage rate between two approaches for sending push notifications:
• Baseline (v1.0): Sending the single most-clicked sale to all users.
• Cluster-based segmentation (v1.1): Sending the most-clicked sale within each user cluster, based on category click preferences.
Coverage definition: The ratio between the number of distinct sales sent in notifications and the total number of new sales available each day.
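The coverage definition above translates into a few lines of code. The sketch below uses made-up sale IDs and send volumes, and it assumes that only sales belonging to that day's new-sales list count toward the numerator; the function name is hypothetical.

```python
def coverage_rate(sales_sent, new_sales_available):
    """Distinct sales sent in notifications / new sales available that day."""
    available = set(new_sales_available)
    if not available:
        return 0.0
    # Assumption: a sent sale only counts if it is one of the day's new sales.
    distinct_sent = set(sales_sent) & available
    return len(distinct_sent) / len(available)


# Toy day with 10 new sales (illustrative IDs, not real data).
new_sales_today = [f"sale_{i}" for i in range(10)]

# Baseline (v1.0): every user receives the same most-clicked sale.
baseline_sent = ["sale_3"] * 1000

# Segmented (v1.1): each segment receives the top sale of its own category.
segmented_sent = (
    ["sale_3"] * 400 + ["sale_7"] * 300 + ["sale_1"] * 200 + ["sale_9"] * 100
)

baseline_coverage = coverage_rate(baseline_sent, new_sales_today)
segmented_coverage = coverage_rate(segmented_sent, new_sales_today)
```

By construction, the baseline can never cover more than one sale per day, so its coverage is capped at one divided by the number of new sales. Segment-based sending raises the ceiling to the number of segments with a distinct top sale.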

Technologies Used

Python, SQL, Polars, DBSCAN, K-Means, Matplotlib, NumPy, Seaborn, Jupyter, dbt

Project Details

Company

Choose

Duration

1 month

Team Size

6 people

My Role

Data Scientist
