Exporting Intercom data for some quick data science
01 Apr 2019
As the beating heart of many SaaS businesses, Intercom contains a treasure trove of user behavior data. From user geographic locations to login counts to monthly spend, a data scientist can perform trajectory-changing gymnastics with the data inside Intercom. For example, you can calculate:
- Signups over time
- Lead-to-user conversion rate
- Churn by signup cohort
- Monthly spend by segment
…and that’s just the start! In this post, I’ll provide a gentle introduction to exporting Intercom data into a Pandas Dataframe for analysis.
If you’re a data hobbyist, you know Google Sheets. It’s great for basic analysis of smaller datasets, but it falls down when analyzing thousands of rows and columns. Instead, we’re going to use a lean data science stack that will handle your Intercom data without breaking a sweat:
- Google Colab - a free, shared, remote notebook to run our data analysis. This means there are no local environments to set up. Jupyter Notebook is a good, free local option, but you’ll need to install and configure a Python environment yourself.
- Python 3 and a standard set of data science libraries. We’ll install these libraries in our notebook.
Intercom’s UI provides a way to manually export a list of users or companies as a CSV, but that export can’t be triggered without clicking through the UI. For programmatic access to your Intercom data set, there are two approaches:
- A custom Intercom API script - write your own script to fetch data via Intercom’s API.
- PetalData - export your Intercom data in a data science-ready format with a one-line command.
Generate an Intercom access token
Intercom does not have an official Python SDK. Instead, we’ll use `python-intercom` to fetch data.
Before we start coding, create a private app in Intercom and copy down its access token.
Intercom API “Hello World”
Let’s verify we can access the Intercom API. Create a Google Colab Notebook, add the code below to the first cell, and verify it runs without error:
This installs the `python-intercom` package, initializes an Intercom client, and prints the total number of users in your account. The end of your output should look something like this:

```
Number of users to export: 10000
```
Add users to a list
Next we’ll iterate over the users in your account and append them to a `users_list`. Create a new code cell with the following content:
Create a Pandas Dataframe
It’s hard to extract information from a list of dictionaries, so we’ll create a Pandas Dataframe from the list. Once in a Dataframe, we can do all sorts of data science magic. Creating the Dataframe is simple:
You should see output like:

```
(1000, 33)
```

This indicates the Dataframe has 1,000 rows and 33 columns.
Set the Dataframe index
By default, the Dataframe gets a new index that simply increments with the row number. Instead, let’s use the `user_id` as the index. Create a new cell with the following code:
Convert timestamps to datetime
There are a number of timestamp columns (ex: `created_at`) that are provided as seconds since the epoch. These should be converted to proper `datetime` objects. Create a new cell with the following code:
Left to the reader
Before inspecting the Dataframe, let’s cover what is not in our exporting logic. Our code above is enough to get started, but it is not production-ready. I’ve left the following to you:
- Scrolling - A `BadRequestError` is thrown if your app has over 10,000 users. Intercom’s scroll API needs to be used instead. `python-intercom` has support for this, but its usage is not documented.
- Error Handling - It’s very common to run into network timeouts and other errors when iterating over large data sets. The code above has no error handling.
- Limiting Memory Usage - This script builds a large list of users in memory. This could consume a lot of memory if your app has many thousands of users.
- Rate Limits - There’s no special handling to stay under Intercom’s rate limits.
- The following attributes are objects or lists and are not parsed:
- Custom Attributes
- Location Data
- Social Profiles
Yes, this is the time for the PetalData pitch. PetalData handles all of the important (yet tedious) work above when you call `petaldata.Intercom().load_companies()`. If you’d rather skip this brittle plumbing, sign up for our BETA.
Inspect the schema
We can take a peek at the Dataframe’s schema by calling `df_users.info()`. You should see output like:
```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 33 columns):
anonymous            1000 non-null bool
app_id               1000 non-null object
avatar               1000 non-null object
companies            1000 non-null object
created_at           1000 non-null datetime64[ns]
custom_attributes    1000 non-null object
do_not_track         0 non-null object
...
utm_term             0 non-null object
dtypes: bool(4), datetime64[ns](5), int64(1), object(23)
memory usage: 238.3+ KB
```
Inspect the data
You can view signups over time with the following code:
This generates a bar chart of signups per month.
How active are your users? A histogram of user session counts would help! Just type:
How about getting a list of your power users? Query for users in the 80th percentile of number of sessions:
```python
df_users[df_users.session_count >= df_users.session_count.quantile(0.80)].email
```
This is just scratching the surface of the types of insights you can obtain by loading your Intercom data into a Pandas Dataframe.
A 2016 Data Science Report showed that 60% of a data scientist’s time is spent on cleaning and organizing data. Just 9% is spent on mining data for patterns. This is sad - we want to extract insights from data sets like Intercom, not spend our time in data preprocessing.
Spend more of your time generating insights by trying PetalData.