Notes: This part is handled by our team. If you are part of a studio, get in touch with your publishing manager to define the best tests together and to review the results.
- There are currently no standard parameter values for each game, but we are working on a standard naming convention and will share it soon. In the meantime, please reach out to your publishing manager to validate the list of parameters.
- Only Homa publishing managers can start A/B tests. You will, however, be able to review the results.
Once the game is ready for N-testing, Homa team members can create A/B tests to determine whether changing different aspects of the game can increase revenue or engagement.
Here are some examples of tests that most game teams run (the sketch after this list shows the kind of remote parameters they rely on):
- Changing the interval between advertisements
- Changing the price of an item - e.g. skin, bundle, or in-game item like a weapon
- Changing the UI/UX or artwork - e.g. making the shop icon a different colour, adding visual notifications
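All of these tests rely on the value in question being exposed as a remote parameter. The sketch below is purely illustrative (the parameter names and defaults are hypothetical, not a Homa standard - validate your actual list with your publishing manager):

```python
# Hypothetical remote parameters backing the example tests above.
# Names and default values are illustrative only, not a naming standard.
DEFAULT_REMOTE_PARAMETERS = {
    "interstitial_interval_seconds": 45,      # interval between advertisements
    "shop_skin_price_coins": 500,             # price of an item (e.g. a skin)
    "shop_icon_color": "#FFD700",             # UI/UX tweak
    "shop_badge_notification_enabled": True,  # visual notification toggle
}
```

Each variant of a test then overrides one or more of these values for the users assigned to it.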
Once the test reaches a conclusion, it’s possible to roll out the change to all players.
Requirements:
Make sure your remote parameters are set up before creating a test.
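The general pattern is that every tunable value is read from the remote configuration with a safe local default, so that a test variant can override it without shipping a new build. A minimal, generic sketch of this pattern (not the actual SDK API):

```python
# Generic remote-parameter read with a local fallback (illustrative only,
# not the actual SDK API): the default applies to users outside any test,
# while users in a variant receive the override defined in the experiment.
def get_remote_parameter(remote_config: dict, key: str, default):
    """Return the remote/test override if present, otherwise the default."""
    return remote_config.get(key, default)

# Example: users in a variant overriding this key get the variant's value.
interval = get_remote_parameter({}, "interstitial_interval_seconds", default=45)
```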
Creating an experiment
- Go to N-testing
- Click on your game
- Click on Create new test and follow the form to create your test.
- You can find more information about each setting in the table below:
| Setting | Description | Guideline |
|---|---|---|
| Experiment name | The name will be displayed in the list of tests. We strongly recommend using a descriptive name that indicates what’s being tested. | - Add the scope in the name of the test - e.g. [iOS WW] for iOS Worldwide - Use a clear title to quickly see what you are testing - e.g. Enemy speed |
| Scope [legacy only] | The platform and region of players included in the test. | |
| Countries / OS [segmentation only] | The countries / OS that are targeted by the test | |
| Target users [segmentation only] | Whether you want to include only New users (default) or test with / only on existing users | - For games with low retention, it is generally sufficient to focus on new users |
| Sample size | Proportion, in percentage, of the users who will be assigned to this test. Users will be randomly allocated to the test based on this target allocation. | - Make sure the number of users you get is sufficient to make a decision in the timeframe you expect - The form includes an additional utility to help you gauge this - You cannot allocate more than 100% of sample size across tests on the same country/OS or Scope |
| Target Metric | The metric that will be used to decide whether the test is a win or not (other metrics will still be displayed, but the statistical test will be more precise on this metric) | - Total LTV D3 generally offers a good trade-off between reactivity and being a good indicator of long-term performance |
| Maturation [segmentation only] | How long users are kept in the test (defines the metrics available) | - To get D14 metrics, you need maturation to be at least 14 |
| Notes | Description of the test, which can be used to mention the parameters that were changed or to add anything you’d like others to be aware of. | - This field is not mandatory but strongly recommended. - Notes added here appear in the list of tests, in the Note column. |
| Parameters to test | Choose the parameters that you want to modify (between variants) | |
| Segments [segmentation only] | Add segments that can have different overrides (on the parameters to test) between variants | |
| Variant | You can add multiple variants. Users will fall into one of the variants and receive the overrides that you specify; these overrides define what you want to test. | - Each additional variant reduces the number of users allocated to each one, so the test will take longer to reach a decision |
- Click on “Save experiment” to save your changes and configurations.
- This will create a Draft experiment, which can be activated by clicking the three-dot contextual menu and choosing Activate experiment. This activates all test variants of the experiment; new users will be assigned to one of the groups randomly, based on the sample size.
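As an aside, the sketch below illustrates how random assignment based on sample size is commonly done: each user is deterministically hashed into a bucket so they always see the same variant. This is an illustration of the general technique, not Homa’s actual implementation.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants: list[tuple[str, float]]):
    """Deterministically bucket a user into a variant, or None if not enrolled.

    `variants` is a list of (variant_name, traffic_share) pairs whose shares
    sum to at most 1.0; users falling outside the total sample size are not
    part of the experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    cumulative = 0.0
    for name, share in variants:
        cumulative += share
        if bucket < cumulative:
            return name
    return None  # user is not enrolled in this test

# Example: 20% sample size split evenly between control and one variant.
assign_variant("user-123", "enemy_speed_test", [("control", 0.10), ("fast_enemies", 0.10)])
```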
Analyzing an A/B test
In the A/B test tab, the experiments are displayed and can be filtered according to status.
When you expand an experiment, metrics are displayed per variant:
- Metrics with less than 500 users are not shown
- Metrics displayed in bold have reached statistical significance (careful - there is still a chance the result is driven by noise; conversely, do not treat grey numbers as a signal). See the illustrative sketch after this list.
- Metrics are all cohorted from the user’s entry into the test and are cumulative. Example: Ad LTV D3 = the average cumulative ad revenue generated by users during their first 3 days after entering the test
- The selection of metrics can be modified and saved using the view system
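For context on what “reaches statistical significance” means, the sketch below runs a generic two-sample test on per-user metric values. The dashboard performs this check for you; this is only an illustration of the underlying idea, not the exact test used by the platform.

```python
from scipy import stats

def is_significant(control_values, variant_values, alpha=0.05):
    """Two-sample Welch t-test on per-user metric values (e.g. Ad LTV D3).

    Returns True when the observed difference is unlikely to be explained by
    noise at the chosen alpha level; a True result can still be a false positive.
    """
    _, p_value = stats.ttest_ind(control_values, variant_values, equal_var=False)
    return p_value < alpha
```

Even a significant (bold) result has roughly an alpha-sized chance of being noise, which is why a single bold number should not be over-trusted.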

Multiple tabs are available to break down the results of each variant by a dimension (country, app version, UA network, segment, user type).
Each variant has a conclusion about whether it can be considered a win or not, based on the conclusion metric. More information about the extent of the win and the probabilities is displayed when hovering over the conclusion.
Applying a winning variant
When a variant is a clear win and needs to be applied to live traffic, the three-dot contextual menu of the variant allows for rollout. From the modal, select the appropriate scopes or segments to which the configuration should be applied.
This method is preferred over manually applying the new configuration, as it allows the rollout of successful experiments to be tracked.
[For games using scopes]
Configuration can optionally be applied to existing users by using the “apply all players” tab.
Notes
- If you have multiple tests in the Running status, a new player is assigned to only one of them
- Data can be reviewed the day after activation, but reaching a conclusive result will take a few days’ worth of users plus the time to reach the target maturation (e.g. if your conclusion metric is total LTV D7 and you need 3 days’ worth of new users, you need 3 + 7 days = 10 days before the metric can be computed for that cohort). A small sketch of this estimate follows these notes.
- You will need a minimum number of installs in order to get a valid result. If this is not the case, a warning message will be displayed on your test; to avoid it, modify the sample size to a value lower than the previous one.
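As a rough aid for the timing note above, here is a small sketch (with assumed inputs, not a platform feature) that estimates how long a test needs to run before the conclusion metric can be computed for enough users:

```python
import math

def days_until_readout(required_users: int, daily_installs: int,
                       sample_share: float, maturation_days: int) -> int:
    """Estimate how many days until the conclusion metric is available.

    required_users  -- users needed across all variants (e.g. a few thousand)
    daily_installs  -- new installs per day for the targeted countries/OS
    sample_share    -- fraction of traffic allocated to the test (0.0-1.0)
    maturation_days -- e.g. 7 for a total LTV D7 conclusion metric
    """
    days_to_fill = math.ceil(required_users / (daily_installs * sample_share))
    return days_to_fill + maturation_days

# Example matching the note above: 3 days to collect the cohort,
# plus 7 days of maturation for total LTV D7 -> 10 days in total.
days_until_readout(required_users=3000, daily_installs=5000,
                   sample_share=0.2, maturation_days=7)
```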