Exporting Intercom data for some quick data science

Derek Haynes
01 Apr 2019

intercom data science

As the beating heart of many SaaS businesses, Intercom contains a treasure trove of user behavior data. From user geographic locations to the number of logins to monthly spend, a data scientist can perform trajectory-changing gymnastics using the data within Intercom. For example, you can calculate:

  • Signups over time
  • Lead-to-user conversion rate
  • Churn by signup cohort
  • Monthly spend by segment

…and that’s just the start! In this post, I’ll provide a gentle introduction for exporting Intercom data to a Pandas Dataframe for analysis.

Tools

If you’re a data hobbyist, you know Google Sheets. It’s great for basic analysis on smaller datasets, but it falls down when you’re analyzing thousands of rows and columns. Instead, we’re going to use a lean data science stack that will handle your Intercom data without breaking a sweat:

  • Google Colab - a free, shared, remote notebook to run our data analysis. This means there’s no local environment to set up. Jupyter Notebook is a good, free local option, but you’ll need to install and configure a Python environment yourself.
  • Python 3 and a standard set of data science libraries. We’ll install these libraries in our notebook.

Exporting data

There are two approaches for programmatic access to your Intercom data set (Intercom’s UI also provides a way to manually export a list of users or companies as a CSV, but that export cannot be triggered without navigating through the UI):

  1. A custom Intercom API script - write your own script to fetch data via Intercom’s API.
  2. PetalData - export your Intercom data in a data science-ready format with a one-line command.

In this post, I’ll get you started on a custom script to export Intercom data into a Pandas Dataframe. When we’re done, you may decide to offload this data wrangling to PetalData 😉.

Generate an Intercom access token

Intercom does not have an official Python SDK. Instead, we’ll use python-intercom to fetch data.

Before we start coding, create a private app on Intercom, copying down the access token:

access token

Intercom API “Hello World”

Let’s verify we can access the Intercom API. Create a Google Colab Notebook, add the code below to the first cell, and verify it runs without error:
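A sketch of that first cell, assuming python-intercom’s `Client` class (install it in its own Colab cell with `!pip install python-intercom`). The empty `ACCESS_TOKEN` is a placeholder — the cell does nothing until you paste in your token:

```python
ACCESS_TOKEN = ""  # paste the access token from your Intercom private app

def count_users(client):
    # client.users.all() pages through every user in the app.
    return sum(1 for _ in client.users.all())

if ACCESS_TOKEN:
    # Assumes python-intercom's client interface.
    from intercom.client import Client
    intercom = Client(personal_access_token=ACCESS_TOKEN)
    print("Number of users to export: %d" % count_users(intercom))
```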

This installs the python-intercom package, initializes an Intercom client, and prints the total number of users in your account. The end of your output should look something like this:

Number of users to export: 10000

Add users to a list

Next we’ll iterate over the users in your account and append them to the users_list. Create a new code cell with the following content:
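Something like the following sketch. I’m assuming each object returned by `users.all()` can be flattened into a plain dict with a `to_dict()` method — check python-intercom’s resource API for the exact call:

```python
def fetch_users(client):
    # users.all() handles pagination under the hood; collect each record
    # as a plain dict so the list drops straight into a Dataframe later.
    users_list = []
    for user in client.users.all():
        users_list.append(user.to_dict())
    return users_list

# In the notebook, reusing the client from the previous cell:
# users_list = fetch_users(intercom)
```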

Create a Pandas Dataframe

It’s hard to extract information from a list of dictionaries. We’ll create a Pandas Dataframe from this list. Once in a Dataframe, we can do all sorts of data science magic. Creating the dataframe is simple:
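A sketch of that cell — Pandas builds the columns from the dict keys automatically. The two inline rows are stand-ins so the example is self-contained; in the notebook, `users_list` is the list built in the previous cell:

```python
import pandas as pd

# Stand-in rows; in the notebook this is the users_list from the previous cell.
users_list = [
    {"user_id": "1", "email": "a@example.com", "created_at": 1554076800, "session_count": 12},
    {"user_id": "2", "email": "b@example.com", "created_at": 1554163200, "session_count": 3},
]

df_users = pd.DataFrame(users_list)
print(df_users.shape)  # (rows, columns)
```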

You should see output like:

(1000, 33)

The above output indicates the Dataframe has 1,000 rows and 33 columns.

Set the Dataframe index

By default, the Dataframe creates a new index that simply increments with the row number. Instead, let’s use the user_id as the index. Create a new cell with the following code:
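A sketch of that cell (the stand-in rows are placeholders for the Dataframe built earlier):

```python
import pandas as pd

# Stand-in rows; in the notebook this is the df_users built earlier.
df_users = pd.DataFrame([
    {"user_id": "1", "email": "a@example.com"},
    {"user_id": "2", "email": "b@example.com"},
])

# Replace the default 0..n-1 RangeIndex with Intercom's user_id.
df_users = df_users.set_index("user_id")
print(df_users.index.name)
```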

Convert timestamps to datetime

There are a number of timestamp columns (ex: created_at) that are provided as seconds since epoch. These should be converted to proper datetime objects. Create a new cell with the following code:
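A sketch using `pd.to_datetime` with `unit="s"`. The column list here is illustrative — your Dataframe has several more timestamp columns, and the epoch values are stand-ins:

```python
import pandas as pd

# Stand-in epoch values; in the notebook these columns live on df_users.
df_users = pd.DataFrame({
    "created_at": [1554076800, 1554163200],       # seconds since epoch
    "last_request_at": [1554080400, 1554166800],
})

# Convert each epoch-seconds column to a proper datetime64 column.
for col in ["created_at", "last_request_at"]:
    df_users[col] = pd.to_datetime(df_users[col], unit="s")

print(df_users["created_at"].iloc[0])  # → 2019-04-01 00:00:00
```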

Left to the reader

Before inspecting the Dataframe, let’s cover what is not in our exporting logic. Our code above is enough to get started, but it is not production-ready. I’ve left the following to you:

  • Scrolling - A BadRequestError is thrown if your app has over 10,000 users. Intercom’s scroll API needs to be used instead. python-intercom has support for this, but usage is not documented.
  • Error Handling - It’s very common to run into network timeouts and other errors when iterating over data sets. There is no error handling for this.
  • Limiting Memory Usage - This script builds a large list of users in memory. This could consume a lot of memory if your app has many thousands of users.
  • Rate Limits - There’s no special handling to stay under Intercom’s rate limits.
  • The following attributes are objects or lists and are not parsed:
    • Custom Attributes
    • Location Data
    • Avatar
    • Companies
    • Social Profiles
    • Segments
    • Tags

Yes, this is the time for the PetalData pitch. PetalData handles all of the important (yet tedious) work above when you call petaldata.Intercom().load_companies(). If you’d rather skip this brittle work, sign up for our beta.

Inspect the schema

We can take a peek at the Dataframe’s schema by calling df_users.info(). You should see output like:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 33 columns):
anonymous                   1000 non-null bool
app_id                      1000 non-null object
avatar                      1000 non-null object
companies                   1000 non-null object
created_at                  1000 non-null datetime64[ns]
custom_attributes           1000 non-null object
do_not_track                0 non-null object
...
utm_term                    0 non-null object
dtypes: bool(4), datetime64[ns](5), int64(1), object(23)
memory usage: 238.3+ KB

Inspect the data

You can view signups over time with the following code:
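A sketch of that cell with stand-in signup dates (in the notebook, `created_at` comes from `df_users`): set the timestamp as the index, resample by month, and plot the counts.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in signup dates; in the notebook this is df_users["created_at"].
df_users = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-11", "2019-03-02"]
    )
})

# Bucket signups by month and count the rows in each bucket.
signups = df_users.set_index("created_at").resample("M").size()
signups.plot(kind="bar", title="Signups per month")
plt.show()
```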

Which generates a chart like:

signups chart

How active are your users? A histogram of user session counts would help! Just type df_users.session_count.hist():

hist

How about getting a list of your power users? Query for users at or above the 80th percentile of session count:

df_users[df_users.session_count >= df_users.session_count.quantile(0.80)].email

This is just scratching the surface of the types of insights you can obtain by loading your Intercom data into a Pandas Dataframe.

Wrapping up

A 2016 Data Science Report showed that 60% of a data scientist’s time is spent on cleaning and organizing data. Just 9% is spent on mining data for patterns. This is sad - we want to extract insights from data sets like Intercom, not spend our time in data preprocessing.

Spend more of your time generating insights by trying PetalData.

Email Newsletter

Tutorials on using data science with your cloud app data, in your inbox. No marketing campaigns. No nonsense. Unsubscribe anytime.