Introduction to “Profile Age”

The time a user has been with a particular product or ‘age on books’ is a measure of the users affinity. If a user has used a product for a long time, we can expect the user to use the product again in future (we apply both the concept of using the product and age on books of the user). However, can the ‘age of a user’ (where ‘age’ is time the user has been with the product) be used as a measure of growth? One measure of growth at Firefox is the number of downloads (e.g. on a day). When a user downloads Firefox, if a profile doesn’t exist, a new one is created. The day this is created is called “profile creation date (pcd)”. Then for any given day $d( > pcd)$, the age of the profile is $d - pcd$ (measured in days). Once a profile has been created, there is no tangible signal that profile has stopped using Firefox. A user could stop using Firefox for months on end (probably signifying they have left Firefox) but we would not know that till several months go by. This is quite unlike say a user of Netflix who would call to end their subscription.

The “Average Age of Profiles” Intution

My intuition told me that if we did not have any new downloads going forward the average age of the profiles at any time in the future would keep increasing. Since a profile never leaves Firefox, for any given day $d$ we can compute the average age across all $N$ profiles $\sum_i (d - pcd_i) / N$ which clearly increases with time. If we have zero growth, our average age increases; if we have a huge growth, then the average age is a combination of large values and small ones (the latter corresponding to new profiles) and the average age will decrease. Thus a decrease in average age indicates a growth in new profiles and an increase indicates a drop in growth.

With this average age concept we can observe 3 things:

  1. the growth of a group across time
  2. from the measure we get an idea of how long a typical profile has been with Firefox
  3. when we compare two curves of average age (across time) for two different groups (e.g. Poland and US), a lower average age one group indicates that that population have a higher proportion of newer users.

I was enamored with this idea till I did the math. And let this post be a learning: before using or pushing for the use of a measure, reason about it. Analyze how it will behave in different contexts. The problem with the above is that though 1,2, and 3 hold, the increase or decrease (1) is very much connected to the rate of growth in downloads. And without some simulations, we would have invested in this measure and after much time come to the conclusion that this measure, as defined, is not useful.

The Analysis

I plotted a smooth curve to new downloads from May 5, 2016 till May 5, 2017. This curve is effectively the trend of the time series of downloads and we will work with this instead of actual daily downloads.

The following curve plots the average age of profiles (with pcd after than May 5, 2016) versus time. It is always increasing with some changes in slope. I didn’t expect it to be immune to the change in downloads…

If we assume, for the sake of modeling the data, that the number of downloads at time $x$ is function $g(x)$ ( a positive function). Then the total number of new profiles in time $t$ is $\int_{0}^{t} g(x)dx$ .The total age since created is given by $\int_{0}^t (t - x)g(x) dx$. Hence the average age of profiles at time $t$ is given by

(I assume we start our population size i.e. total users at time $t=0$ to be 0. It doesn’t really change the asymptotic results.)

The above can be rewritten as (using integration by parts)

For $g(x) = x^k, k>0$, we have

for a decreasing function $g(x) = 1/(1+x)^k$ ($k$>2) we have

for the rapidly increasing $g(x) = exp(x)$ we have

In the all the above cases when $g$

  • is polynomial and increasing, $A(t)$ is an increasing linear line
  • is decreasing (of the above form), $A(t)$ is an increasing linear-ish line
  • and $A(t)$ is non linear when the rate of downloads is exponential (not typical after some time and definitely not for us)

Even in case where $g$ which increases and then decreases

$A(t)$ is still an always increasing line!

Ultimately for a lot of typical curves for download trends, $A(t)$ is an increasing line making it very boring for analyst and product manager alike.

Moral of the Story

I’ve been playing with this idea for months now. I didnt get beyond the “this is a good idea and plays out well intuitively”. Before telling my manager we should use it across the org, Figs(0) and Figs(1) puzzled me which led to the analysis and the subsequent death of the measure.

So as a data scientist, before trying to convince everyone of a good idea for a new measure, sit down, take a breath and at the very least reason about this measure: how does it behave under different conditions? how does it help a product manager?