9 분 소요

[Notice] [Increasing_Customer_Retention]

Increase customer retention with A/B testing


Data

Data is (https://www.kaggle.com/yufengsui/mobile-games-ab-testing)

  • userid - An identification number that identifies individual users.

  • version - You can see which user belongs to the experimental group or control group. (gate_30, gate_40)

  • sum_gamerounds - Number of rounds played by users in 14 days after first install.

  • retention_1 - Whether the user returned within 1 day of installation.

  • retention_7 - Whether the user returned within 7 days of installation.


problem definition

  • In the Cookie Cats game, when a specific stage is reached, the stage is locked.

  • In case of Area Locked, you can get 3 keys by playing a special edition game to get Keys, ask a Facebook friend, or purchase a paid item and open it immediately.

  • When locking in the stage at which stage, it is necessary to decide which is best for user retention.


Data exploration

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv('Data/cookie_cats.csv')
print(df.shape)
df.tail() 
(90189, 5)
userid version sum_gamerounds retention_1 retention_7
90184 9999441 gate_40 97 True False
90185 9999479 gate_40 30 False False
90186 9999710 gate_30 28 True False
90187 9999768 gate_40 51 True False
90188 9999861 gate_40 16 False False
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90189 entries, 0 to 90188
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   userid          90189 non-null  int64 
 1   version         90189 non-null  object
 2   sum_gamerounds  90189 non-null  int64 
 3   retention_1     90189 non-null  bool  
 4   retention_7     90189 non-null  bool  
dtypes: bool(2), int64(2), object(1)
memory usage: 2.2+ MB
df.groupby("version").count()
userid sum_gamerounds retention_1 retention_7
version
gate_30 44700 44700 44700 44700
gate_40 45489 45489 45489 45489
sns.boxenplot(data=df, y="sum_gamerounds")
<AxesSubplot:ylabel='sum_gamerounds'>

df.loc[df["sum_gamerounds"] > 45000]
userid version sum_gamerounds retention_1 retention_7
57702 6390605 gate_30 49854 False True
# Remove users who have played more than 45000. 
df = df[df["sum_gamerounds"] < 45000 ]
print(df.shape)
df.tail()
(90188, 5)
userid version sum_gamerounds retention_1 retention_7
90184 9999441 gate_40 97 True False
90185 9999479 gate_40 30 False False
90186 9999710 gate_30 28 True False
90187 9999768 gate_40 51 True False
90188 9999861 gate_40 16 False False
# Look at the percentile.
df["sum_gamerounds"].describe()
count    90188.000000
mean        51.320253
std        102.682719
min          0.000000
25%          5.000000
50%         16.000000
75%         51.000000
max       2961.000000
Name: sum_gamerounds, dtype: float64
sns.boxenplot(data=df, y="sum_gamerounds")
<AxesSubplot:ylabel='sum_gamerounds'>


Data analysis

# Count the number of users for each game run.
plot_df = df.groupby("sum_gamerounds")["userid"].count()
plot_df
sum_gamerounds
0       3994
1       5538
2       4606
3       3958
4       3629
        ... 
2251       1
2294       1
2438       1
2640       1
2961       1
Name: userid, Length: 941, dtype: int64
ax = plot_df[:100].plot(figsize=(10,6))
ax.set_title("The number of players that played 0-100 game rounds during the first week")
ax.set_ylabel("Number of Players")
ax.set_xlabel('# Game rounds')
Text(0.5, 0, '# Game rounds')

sns.distplot(df["sum_gamerounds"])
<AxesSubplot:xlabel='sum_gamerounds', ylabel='Density'>

  • You can see that the number of users who have installed the game and never run it is significant.

  • Some users can see how addicted (?) to the game after running it enough in the first week of installation.

  • In the video game industry, 1-day retention is a key metric for how fun and addicting a game is.

  • With a high 1-day retention, you can easily grow your subscriber base.

# Look at the average of 1-day retention. 
df["retention_1"].mean()
0.4452144409455803
  • You can see that less than half of the users played the game again the day after installation.
# Look at the average of 1-day retention by group.
df.groupby('version')['retention_1'].mean()
version
gate_30    0.448198
gate_40    0.442283
Name: retention_1, dtype: float64
  • Simply comparing the averages between groups, the number of plays is higher when the gate is 30 (44.8%) than when it is 40 (44.2%).

  • It’s a small difference, but it will affect retention and, ultimately, long-term returns.

  • By the way, can this alone convince you that putting the gate at 30 is better than putting it at 40?

# Look at the average of 7-day retention.
df["retention_7"].mean()
0.1860557945624695
# Look at the average of 7-day retention by group.
df.groupby("version")["retention_7"].mean()
version
gate_30    0.190183
gate_40    0.182000
Name: retention_7, dtype: float64
  • Simply comparing the means between groups, the survival rate is higher with the gate 30 (19.0%) than with the gate 40 (18.2%).

  • It’s a small difference, but it will affect retention and, ultimately, long-term returns.

  • The difference is larger when the 7th is compared to the 1st. But does this alone convince me that putting the gate at 30 is better than putting it at 40?


Bootstrapping

# Create a list of bootstrapped means values for each AB group.
boot_1d = []
for i in range(1000):
    boot_mean = df.sample(frac = 1,replace = True).groupby('version')['retention_1'].mean()
    boot_1d.append(boot_mean)
    
# Convert list to DataFrame.
boot_1d = pd.DataFrame(boot_1d)
    
# A Kernel Density Estimate plot of the bootstrap distributions
boot_1d.plot(kind='density')
<AxesSubplot:ylabel='Density'>

  • The above two distributions express the bootstrap uncertainty that 1 day retention can have for both groups AB.

  • Although small, there seems to be evidence of a difference.

  • Let’s plot the % difference to take a closer look.

boot_1d['diff'] = (boot_1d.gate_30 - boot_1d.gate_40)/boot_1d.gate_40*100

ax = boot_1d['diff'].plot(kind='density')
ax.set_title('% difference in 1-day retention between the two AB-groups')

print('High probability of 1-day retention when the gate is at level 30:',(boot_1d['diff'] > 0).mean())
High probability of 1-day retention when the gate is at level 30: 0.958

  • In the diagram above, the most likely % difference is around 1%-2%, with 95% of the distribution above 0%, favoring gates at level 30.

  • Bootstrap analysis shows that the daily retention rate is more likely to be higher when the gate is at level 30.

  • However, most players haven’t reached level 30 yet, since players only played for one day.

  • That is, most users would not have an effect on retention depending on whether the gate was at 30 or not.

  • After playing for a week, you should also check the 7-day retention as more players reach levels 30 and 40.

df.groupby('version')['retention_7'].sum() / df.groupby('version')['retention_7'].count()
version
gate_30    0.190183
gate_40    0.182000
Name: retention_7, dtype: float64
  • As with 1-day retention, 7-day retention is lower at gate level 40 (18.2%) than at gate level 30 (19.0%).

  • This difference is larger than the 1-day retention, probably because more players had time to open the first gate.

  • Full 7-day retention is lower than Full 1-day retention. This is because fewer people play the game a week after installation than a day after installation.

  • As before, let’s use bootstrap analysis to see if there are any differences between the AB groups.

boot_7d = []
for i in range(500):
    boot_mean = df.sample(frac=1,replace=True).groupby('version')['retention_7'].mean()
    boot_7d.append(boot_mean)
    
boot_7d = pd.DataFrame(boot_7d)

boot_7d['diff'] = (boot_7d.gate_30 - boot_7d.gate_40)/boot_7d.gate_40*100

ax = boot_7d['diff'].plot(kind='density')
ax.set_title('% difference in 7-day retention between the two AB-groups')

print('High probability of 7-day retention when the gate is at level 30:',(boot_7d['diff'] > 0).mean())
High probability of 7-day retention when the gate is at level 30: 1.0

  • Bootstrap results indicate that there is strong evidence for a higher 7-day retention when the gate is at level 30 than when it is at level 40.

  • Bottom line, gates should not be moved from level 30 to level 40 to increase retention


T-test

df_30 = df[df["version"] == "gate_30"] 
print(df_30.shape)
df_30.tail()
(44699, 5)
userid version sum_gamerounds retention_1 retention_7
90179 9998576 gate_30 14 True False
90180 9998623 gate_30 7 False False
90182 9999178 gate_30 21 True False
90183 9999349 gate_30 10 False False
90186 9999710 gate_30 28 True False
df_40 = df[df["version"] == "gate_40"] 
print(df_40.shape)
df_40.tail()
(45489, 5)
userid version sum_gamerounds retention_1 retention_7
90181 9998733 gate_40 10 True False
90184 9999441 gate_40 97 True False
90185 9999479 gate_40 30 False False
90187 9999768 gate_40 51 True False
90188 9999861 gate_40 16 False False
from scipy import stats
# Independent Sample T-Test (2 Sample T-Test)

tTestResult = stats.ttest_ind(df_30['retention_1'], df_40['retention_1'])

tTestResultDiffVar = stats.ttest_ind(df_30['retention_1'], df_40['retention_1'], equal_var=False)

tTestResult
Ttest_indResult(statistic=1.7871153372992439, pvalue=0.07392220630182522)
tTestResult = stats.ttest_ind(df_30['retention_7'], df_40['retention_1'])
tTestResultDiffVar = stats.ttest_ind(df_30['retention_7'], df_40['retention_1'], equal_var=False)

tTestResult
Ttest_indResult(statistic=-84.48321935747556, pvalue=0.0)

T Score
  • A large t-score means that the two groups are different.

  • A small t-score means that the two groups are similar.

P-values
  • The p-value is 0.05 at the 5% level.

  • Small p-values are recommended. This means that the data did not happen by chance.

  • For example, a p-value of 0.01 means that there is only a 1% chance that the result is by chance.

  • In most cases, a p-value of the 0.05 (5%) level is taken as a reference. In this case, it is said to be statistically significant.

T-test Reference

  • Looking at the above analysis results, it can be seen that there is no significant difference in retention_1 between the two groups and there is a significant difference in retention_7.

  • Again, it is not accidental that gate30 has a higher retention_7 than gate40.

  • In other words, the gate at 30 is a better choice for retention 7 dimensions than at 40.


chi-square

  • In fact, the t-test was analyzed with retention set to 0,1.

  • However, retention is actually a categorical variable.

A chi-square test is a better method than this method.

  • The chi-square test is also used to test whether a categorical random variable 𝑋 is independent or correlated with another categorical random variable 𝑌.

  • When the chi-square test is used to check for independence, it is called the chi-square test of independence.

  • If two random variables are independent, then the 𝑌 distribution for 𝑋=0 and the 𝑌 distribution for 𝑋=1 must be the same.

  • In other words, the distribution of Y is the same for both versions 30 and 40.

  • Therefore, if the chi-square test is adopted with the null hypothesis that the sample sets come from the same probability distribution, the two random variables are independent.

  • If rejected, then the two random variables are correlated.

  • In other words, if the chi-square test result is rejected, the value of retention will change depending on whether the gate is 30 or 40.

  • If each 𝑌 distribution according to the value of 𝑋 is given in the form of a two-dimensional table (contingency table), the difference between the distribution in the case of independence and the actual y sample size is calculated as a test statistic.

  • If this value is large enough, 𝑋 and 𝑌 are correlated.

df.groupby('version').sum()
userid sum_gamerounds retention_1 retention_7
version
gate_30 222937707836 2294941 20034 8501
gate_40 227857702576 2333530 20119 8279
df.groupby('version').count()
userid sum_gamerounds retention_1 retention_7
version
gate_30 44699 44699 44699 44699
gate_40 45489 45489 45489 45489

Create a contingency table for each version.

  retention_1=False retention_1=True

|——|—|—|

version=gate30 (44699-20034) 20034
version=gate40 (45489-20119) 20119
  retention_7=False retention_7=True

|——|—|—|

version=gate30 (44699-8501) 8501
version=gate40 (45489-8279) 8279
import scipy as sp
obs1 = np.array([[20119, (45489-20119)], [20034, (44699-20034)]])
sp.stats.chi2_contingency(obs1)
(3.1698355431707994,
 0.07500999897705699,
 1,
 array([[20252.35970417, 25236.64029583],
        [19900.64029583, 24798.35970417]]))
  • The significance probability of the chi-square independent test is 7.5%.

  • That is, 𝑋 and 𝑌 cannot be said to be correlated.

obs2 = np.array([[8501, (44699-8501)], [8279, (45489-8279)]])
sp.stats.chi2_contingency(obs2)
(9.915275528905669,
 0.0016391259678654423,
 1,
 array([[ 8316.50796115, 36382.49203885],
        [ 8463.49203885, 37025.50796115]]))
  • The significance probability of the chi-square independent test is 0.1%.

  • In other words, we can say that 𝑋 and 𝑌 are correlated.

  • Retention after 7 days is correlated with whether the gate is at 30 or 40.

  • Gate must be kept at 30 to maintain retention after 7 days.


conclusion

The gate should be kept at 30.

More to think about

  • Actually, there are various metrics to consider other than retention.

  • In-app purchases, number of games played, referrer due to friend invitation, etc.

  • In this data, only retention is given, so we focused on one thing and analyzed it.

  • It is important for service operators and planners to determine really important metrics and evaluate test results based on them.

댓글남기기