Analyzing Hubspot data with Python and Pandas

Derek Haynes
07 May 2019

The data within your Hubspot contacts contains a wealth of information that indicates how your business is performing. It’s begging for analysis.

In this post, I’ll show how to generate common business KPIs from your Hubspot contacts with Pandas, the Python Data Analysis Library.

Why Python+Pandas?

You could manually export Hubspot contacts via their API to an SQL database, then perform SQL queries on the data. However, I’ve found that analytics queries can be awfully complex to write in SQL. SQL’s sweet spot is CRUD applications, not heavy analytics queries. These analytic queries are where Pandas excels in both cleanliness and speed.

Installing Jupyter Notebooks + Pandas

If you are new to Python, I suggest installing Jupyter Notebooks via Anaconda. This will install Pandas as well. Jupyter Notebooks gives you an interactive way to explore your data and share your analysis.

Export Hubspot Contacts

First, we need to export our Hubspot contacts into a Pandas DataFrame. A DataFrame is like an Excel sheet in code. We can do this with just a couple of lines using the PetalData Python package (pip install petaldata).

Create a new Jupyter notebook. Copy and paste the following into the first cell:

See PetalData’s Hubspot Docs for the full list of dataset operations. Note that downloading thousands of contacts from Hubspot can take 30 minutes or more.

New contacts by month

With our contacts dataframe (available as df if you used the export contacts code above), we can now calculate the number of signups by month. Copy and paste the following into a Jupyter notebook cell:
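Here’s a minimal sketch of what a monthly signup count can look like in Pandas. It builds a tiny synthetic dataframe so it runs on its own; in your notebook you’d skip that step and use your real df. The createdate column name is an assumption about how the export labels each contact’s creation timestamp, so adjust it to match your columns.

```python
import pandas as pd

# Synthetic stand-in for the contacts dataframe (illustration only).
# ASSUMPTION: the creation timestamp lives in a column named "createdate".
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "createdate": ["2019-01-05", "2019-01-20", "2019-02-11", "2019-04-02"],
})

# Parse the timestamps so Pandas can group them by calendar month.
df["createdate"] = pd.to_datetime(df["createdate"])

# Count contacts per month ("MS" = month start).
# Months with no signups appear with a count of 0.
signups_by_month = df.set_index("createdate").resample("MS").size()
print(signups_by_month)
```

Resampling on a datetime index, rather than grouping on a formatted string, keeps empty months in the result, which makes gaps in signups easy to spot.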

Which should display something like:

Total contacts over time

We can plot the cumulative number of contacts over time inside Jupyter notebooks:
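One way to sketch this: count signups per day, then take a running total with cumsum(). As before, the dataframe below is synthetic and the createdate column name is an assumption; swap in your own df. Plotting through Pandas needs matplotlib, which ships with Anaconda.

```python
import pandas as pd

# Synthetic stand-in for the contacts dataframe (illustration only).
# ASSUMPTION: the creation timestamp lives in a column named "createdate".
df = pd.DataFrame({
    "createdate": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-11", "2019-04-02"]
    ),
})

# Signups per day, then a running total of contacts over time.
daily_signups = df.set_index("createdate").resample("D").size()
total_contacts = daily_signups.cumsum()

# In a Jupyter notebook the chart renders inline below the cell.
ax = total_contacts.plot(title="Total contacts over time")
ax.set_xlabel("Date")
ax.set_ylabel("Contacts")
```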

Contacts by State

We can see the geographic distribution of contacts by state (collected from their IP address):
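A rough sketch of the idea uses value_counts() on the state column. The ip_state column name here is an assumption (it mirrors the name Hubspot uses for the IP-derived state property, but your export may label it differently), and the data is synthetic.

```python
import pandas as pd

# Synthetic stand-in for the contacts dataframe (illustration only).
# ASSUMPTION: the IP-derived state lives in a column named "ip_state".
df = pd.DataFrame({
    "ip_state": ["CO", "CA", "CO", None, "NY", "CA", "CO"],
})

# Count contacts per state, largest first.
# Contacts with no state value are dropped by default.
contacts_by_state = df["ip_state"].value_counts()
print(contacts_by_state.head(10))
```

Pass dropna=False to value_counts() if you also want to see how many contacts have no state recorded.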

Going deeper

I’ve just scratched the surface of your potential Hubspot data science superpowers. Once your data is in a Pandas DataFrame, there’s much more analysis you can perform:

  • Use scikit-learn and K-Means Clustering to create clusters of similar contacts.
  • See if you can predict potential high-value contacts using a random forest classifier.
  • Use prophet to forecast contacts over the next six months and identify seasonal patterns in signup trends (for example, which day of the week is the highest-volume signup day?).
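Before reaching for a forecasting library, the day-of-week question can be answered with a plain count in Pandas. This is not a forecast or a seasonality model, just a quick descriptive check. The data is synthetic and the createdate column name is an assumption.

```python
import pandas as pd

# Synthetic stand-in for the contacts dataframe (illustration only).
# ASSUMPTION: the creation timestamp lives in a column named "createdate".
df = pd.DataFrame({
    "createdate": pd.to_datetime([
        "2019-01-07",  # Monday
        "2019-01-08",  # Tuesday
        "2019-01-15",  # Tuesday
        "2019-01-18",  # Friday
        "2019-01-22",  # Tuesday
    ]),
})

# Count signups by the weekday name of each creation date.
signups_by_weekday = df["createdate"].dt.day_name().value_counts()

# The weekday with the most signups.
busiest_day = signups_by_weekday.idxmax()
print(signups_by_weekday)
print("Busiest signup day:", busiest_day)
```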

Python and Pandas open up a world far beyond plain SQL and spreadsheet-based analysis.
