How can we generate synthetic phone call durations?

This is a brief tutorial on generating synthetic mobile phone call durations with the TLAC model. The Python code can be downloaded here.

The first step is to generate a set of users according to the MetaDist distribution. Each user i is represented by two parameters: rho_i and beta_i.

[Figure: real data vs. synthetic data (MetaDist)]


We have extracted the parameters of the MetaDist distribution for 4 months of the real dataset. The parameters are described in the following table:




To generate a set of 10,000 users using the parameters of the second month, execute the following code (the first argument is the number of users, the second is the month):

users = gen_TLAC_data(10000, 2)

The variable 'users' contains a set of 10,000 pairs (rho_i, beta_i). Note that executing this code also saves the content of 'users' to a file named 'synth-metadist.dat'.

To generate a set of 4,000 phone call durations for a random user from the set 'users', execute the following code:

calls_random_user = generate_TLAC_from_users(4000, users)

If user '1234' is chosen, a file 'synth-tlac-user1234.dat' will be generated containing all of that user's phone call durations. Verify that the call durations follow the TLAC distribution by plotting the odds ratio, which should be a straight line on a log-log scale:

plot_odds_ratio(calls_random_user)
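
As a rough illustration of why this check works, the sketch below uses a plain (untruncated) log-logistic distribution as a stand-in for TLAC. For a log-logistic distribution with scale alpha and shape beta, the odds ratio F(x) / (1 - F(x)) equals (x / alpha)^beta, so its logarithm is linear in log(x) with slope beta. The functions sample_log_logistic, empirical_odds_ratio, and loglog_fit, as well as the parameter names alpha and beta, are hypothetical and are not part of the downloadable code; how they map to the tutorial's (rho_i, beta_i) is not assumed here.

```python
import math
import random


def sample_log_logistic(n, alpha, beta, seed=0):
    """Draw n samples from a log-logistic distribution by inverse transform.

    Stand-in for TLAC durations (assumption): scale alpha, shape beta.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid a zero duration, whose logarithm is undefined
            u = rng.random()
        samples.append(alpha * (u / (1.0 - u)) ** (1.0 / beta))
    return samples


def empirical_odds_ratio(samples):
    """Return (x, OR(x)) pairs, with OR = F / (1 - F) from the empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    points = []
    for k, x in enumerate(xs, start=1):
        f = k / (n + 1)  # keeps F strictly between 0 and 1
        points.append((x, f / (1.0 - f)))
    return points


def loglog_fit(points):
    """Least-squares line through (log x, log OR); returns (slope, intercept)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(o) for _, o in points]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx


durations = sample_log_logistic(4000, alpha=60.0, beta=1.5)
slope, intercept = loglog_fit(empirical_odds_ratio(durations))
print("fitted log-log slope:", slope)
```

If the fitted slope is close to the shape parameter used for sampling, the points lie near a straight line in log-log space, which is the behavior the odds ratio plot is meant to reveal.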



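To generate call durations for specific users from the set 'users', pass the user index as a third argument. For example, the following code generates 1,000 call durations for each of the users with indices 8990 to 8999: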

for idx in range(8990, 9000):
    generate_TLAC_from_users(1000, users, idx)

Again, for each user 'idx', a file 'synth-tlac-user[idx].dat' will be generated containing all phone call durations of that user.

Reference: Vaz de Melo, P. O. S.; Akoglu, L.; Faloutsos, C.; Loureiro, A. A. F. Surprising Patterns for the Call Duration Distribution of Mobile Phone Users. In: Machine Learning and Knowledge Discovery in Databases, Barcelona, 2010. v. 6323/2.