The time a user has been with a particular product, or ‘age on books’, is a
measure of the user’s affinity. If a user has used a product for a long
time, we can expect the user to use the product again in future (this draws
on both the user *using* the product and the user’s age on books).
However, can the ‘age of a user’ (where ‘age’ is the time the user has
been with the product) be used as a measure of growth? One measure of
growth at Firefox is the number of downloads (e.g. per day). When a user
downloads Firefox, if a profile doesn’t exist, a new one is created. The
day it is created is called the “profile creation date (pcd)”. Then for any
given day $d$ ($d > pcd$), the age of the profile is $d - pcd$ (measured in
days). Once a profile has been created, there is no tangible signal that
the profile has stopped using Firefox. A user could stop using Firefox for
months on end (probably signifying they have left Firefox) but we would not
know that until several months go by. This is quite unlike, say, a user of
Netflix, who would call to end their subscription.

My intuition told me that if we had no new downloads going forward, the average age of the profiles would keep increasing. Since a profile never leaves Firefox, for any given day $d$ we can compute the average age across all $N$ profiles, $\frac{1}{N}\sum_i (d - pcd_i)$, which clearly increases with $d$ when $N$ is fixed. If we have zero growth, the average age increases; if we have huge growth, the average is a mix of large values and small ones (the latter corresponding to new profiles) and the average age will decrease. Thus a decrease in average age indicates growth in new profiles and an increase indicates a drop in growth.
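The intuition above can be tried on toy numbers. The following is a minimal sketch, not the analysis used in this post: the cohort sizes, the burst on day 20, and the helper name `average_age` are all hypothetical, and a profile is assumed to have age 0 on its creation day.

```python
def average_age(pcds, d):
    """Average age in days on day d, over profiles created on or before d."""
    ages = [d - pcd for pcd in pcds if pcd <= d]
    return sum(ages) / len(ages)

# Hypothetical cohort: 100 profiles created on each of days 0..9, then nothing.
no_growth = [day for day in range(10) for _ in range(100)]

# Same cohort plus a hypothetical burst of 5000 new profiles on day 20.
with_burst = no_growth + [20] * 5000

# Print day, average age without the burst, average age with the burst.
for d in (10, 19, 20, 30):
    print(d, average_age(no_growth, d), average_age(with_burst, d))
```

Without new profiles the average age grows by exactly one day per day; the burst pulls the average down on the day it arrives, after which the average starts climbing again.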

With this average age concept we can observe 3 things:

- the growth of a group across time
- from the measure we get an idea of how long a typical profile has been with Firefox
- when we compare two curves of average age (across time) for two different groups (e.g. Poland and US), a lower average age for one group indicates that its population has a higher proportion of newer users.

I was enamored with this idea till I did the math. Let this post be a
lesson: before using or pushing for the use of a measure, *reason* about
it. Analyze how it will behave in different contexts. The problem with the
above is that though points 1, 2, and 3 hold, the increase or decrease
(point 1) is very much connected to the *rate* of growth in downloads.
Without some simulations, we would have invested in this measure and, after
much time, come to the conclusion that the measure, as defined, is not useful.

I fitted a smooth curve to new downloads from May 5, 2016 till May 5, 2017. This curve is effectively the trend of the time series of downloads, and we will work with it instead of the actual daily downloads.

The following curve plots the average age of profiles (with pcd after May 5, 2016) versus time. It is always increasing, with some changes in slope. I didn’t expect it to be immune to the change in downloads…

Assume, for the sake of modeling the data, that the number of downloads at time $x$ is a function $g(x)$ (a positive function). Then the total number of new profiles up to time $t$ is $\int_{0}^{t} g(x)\,dx$. The total age of these profiles since creation is $\int_{0}^{t} (t - x)\,g(x)\,dx$. Hence the average age of profiles at time $t$ is given by

$$A(t) = \frac{\int_{0}^{t} (t - x)\,g(x)\,dx}{\int_{0}^{t} g(x)\,dx}$$

(I assume the population size, i.e. total users, at time $t=0$ is 0. This doesn’t really change the asymptotic results.)

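The above can be rewritten (using integration by parts, with $G(x) = \int_{0}^{x} g(u)\,du$ the cumulative number of profiles) as

$$A(t) = \frac{\int_{0}^{t} (t - x)\,g(x)\,dx}{\int_{0}^{t} g(x)\,dx} = \frac{\int_{0}^{t} G(x)\,dx}{G(t)}$$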

For $g(x) = x^k$, $k>0$, we have

$$A(t) = \frac{t}{k+2}$$

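for a decreasing function $g(x) = 1/(1+x)^k$ ($k>2$) we have

$$A(t) = \frac{t - \frac{1 - (1+t)^{-(k-2)}}{k-2}}{1 - (1+t)^{-(k-1)}} \approx t - \frac{1}{k-2} \quad \text{for large } t$$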

for the rapidly increasing $g(x) = \exp(x)$ we have

$$A(t) = \frac{e^{t} - 1 - t}{e^{t} - 1} = 1 - \frac{t}{e^{t} - 1}$$

In all the above cases, when $g$

- is polynomial and increasing, $A(t)$ is an increasing straight line
- is decreasing (of the above form), $A(t)$ is an increasing, nearly straight line
- is exponential, $A(t)$ is non-linear (an exponential download rate is not typical after some time, and definitely not for us)
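The three cases above can be checked numerically. This is a rough sketch that approximates the two integrals (total age and total profiles) with a midpoint rule; the function name `average_age`, the step count `n`, and the choice $k = 3$ are assumptions made for illustration.

```python
import math

def average_age(g, t, n=20000):
    """Approximate A(t) = int_0^t (t - x) g(x) dx / int_0^t g(x) dx (midpoint rule)."""
    h = t / n
    total_age = 0.0       # numerator: summed age of all profiles at time t
    total_profiles = 0.0  # denominator: number of profiles created by time t
    for i in range(n):
        x = (i + 0.5) * h  # midpoint of the i-th sub-interval
        gx = g(x)
        total_age += (t - x) * gx * h
        total_profiles += gx * h
    return total_age / total_profiles

k = 3  # hypothetical exponent, chosen only for this illustration
cases = {
    "polynomial x^k": lambda x: x ** k,
    "decreasing 1/(1+x)^k": lambda x: 1.0 / (1.0 + x) ** k,
    "exponential exp(x)": math.exp,
}
for name, g in cases.items():
    print(name, [round(average_age(g, t), 3) for t in (1, 2, 4, 8)])
```

Printing $A(t)$ at doubling values of $t$ makes the shape easy to read off: for a straight line through the origin the value doubles as well, while the exponential case flattens out.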

Even in the case where $g$ increases and then decreases

$A(t)$ is still an always increasing line!
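A small discrete simulation illustrates this. The hump-shaped trend below, $g(x) = x\,e^{-x/100}$, is a hypothetical stand-in (it rises until day 100 and then decays) and is not the curve used in this post; the helper names are also made up for the sketch.

```python
import math

def average_age_series(g, days):
    """Daily average age when g(x) profiles are created on day x (x = 0..days-1)."""
    profiles = 0.0   # running total of profiles created so far
    total_age = 0.0  # running sum of ages over all existing profiles
    series = []
    for day in range(days):
        total_age += profiles  # every existing profile gets one day older
        profiles += g(day)     # today's new profiles enter with age 0
        if profiles > 0:
            series.append(total_age / profiles)
    return series

def hump(x):
    """Hypothetical download trend: rises until day 100, then decays."""
    return x * math.exp(-x / 100.0)

ages = average_age_series(hump, 365)
# True when the average age goes up on every single day of the simulated year.
increasing = all(b > a for a, b in zip(ages, ages[1:]))
print(increasing)
```

The turning point of the download trend leaves no visible reversal in the average age, only a change in slope.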

Ultimately, for a lot of typical download trend curves, $A(t)$ is an increasing line, making it very boring for analyst and product manager alike.

I’ve been playing with this idea for months now. I didn’t get beyond “this is a good idea and plays out well intuitively”. Before I could tell my manager we should use it across the org, Figs(0) and Figs(1) puzzled me, which led to the analysis and the subsequent death of the measure.

So as a data scientist, before trying to convince everyone of a good
idea for a new measure, sit down, take a breath and at the very least
*reason* about this measure: how does it behave under different conditions?
how does it help a product manager?