Analyzing Stripe data with Python and Pandas
05 May 2019
As the authoritative source on how your business is performing, your Stripe account contains a wealth of data just waiting for analysis. In this post, I’ll show how to generate common subscription revenue metrics like MRR, churn, and subscription revenue by plan with Pandas, the Python Data Analysis Library.
Stripe offers its own tool for querying data with SQL, Stripe Sigma. It’s a great tool, but I’ve found that analytics queries can be awfully complex to write in SQL. SQL’s sweet spot is CRUD applications, not heavy analytics queries. These analytic queries are where Pandas excels in both cleanliness and speed.
Installing Jupyter Notebooks + Pandas
If you are new to Python, I suggest installing Jupyter Notebooks via Anaconda. This will install Pandas as well. Jupyter Notebooks gives you an interactive way to explore your data and share your analysis.
Export Stripe Invoices
First, we need to export our invoices from Stripe into a Pandas Dataframe. A Dataframe is like an Excel sheet in code. We can do this with just a couple of lines using the PetalData Python package (
pip install petaldata).
Create a new Jupter notebook. Copy and paste the following into the first cell:
See PetalData’s Stripe Docs for the full list of dataset operations. Note that downloading thousands of invoices from Stripe can take 30 minutes or more.
Calculating Stripe MRR
With our invoices dataframe loaded at
invoices.df, we can now calculate MRR in dollars over the entire invoice timerange. Copy and paste the following into a Jupyter notebook cell:
Which should display something like:
We can plot this as well on a line chart inside Jupyter notebooks:
Calculating Stripe Churn
We can view a list of customers that have churned in the current month:
To get the total value of the churn, just call
Calculating Revenue by Subscription Plan
You likely have several subscription plans configured in Stripe. To view revenue by plan over time:
I’ve just scratched the surface of your potential Stripe data science super powers. Once your data is in a Pandas Dataframe, there’s much more analysis you can perform:
- Use scikit-learn and K-Means Clustering to group similar customers together.
- See if you can predict customers likely to churn by their invoice history (for example, customers with downward-trending invoice amounts may be likely to churn) using a random forest classifier.
- Use prophet to forecast MRR over the next six months.
Python and Pandas opens up a world far beyond plain SQL and spreadsheet-based analysis.
Tutorials on using data science with your cloud app data, in your inbox. No marketing campaigns. No nonsense. Unsubscribe anytime.