Data Metrics
meaningful metrics -> daily sessions per user
time spent per session (caveat: the user may just be idle on the page)
reactions, comments, and shares on Newsfeed content. Reactions include likes, hearts, sad face, angry face, etc.
average number of interactions a user has per visit to the Newsfeed.
click-through rate (CTR) for ads; this would help us understand whether ads are relevant
Grow business -> user retention or user acquisition
user retention -> increasing user engagement -> metrics with a time threshold (e.g., average likes per user per day)
Improve product -> feature demand: users are already doing something despite a complicated user flow. Simplifying the flow will most likely improve your target metrics
optimize a long-term metric like retention rate or lifetime value -> find a short-term metric that can predict the long-term one
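A minimal sketch of validating such a proxy on historical cohorts; the file and column names (week1_sessions as the short-term metric, retained_12m as the long-term outcome) are hypothetical:

```python
# Sketch: check how well a short-term metric predicts a long-term outcome on old cohorts.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("historical_cohorts.csv")   # hypothetical file
X = df[["week1_sessions"]]                   # short-term metric
y = df["retained_12m"]                       # long-term outcome (0/1)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC of short-term metric as a predictor of 12-month retention: {auc:.2f}")
```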
pick variables -> pick a combination of user characteristics (age, sex, country, etc.) and behavioral ones (device, whether they came from ads/SEO/a direct link, session time, etc.)
Engagement on FB -> proportion of users who take at least one action per day
response rate on Quora -> percentage of questions that get at least one response with at least 3 up-votes within a day
Airbnb -> availability: if a user wants to go to a given place, they can find a listing there
Uber new UI -> A/B test in two comparable markets (to identify the required sample size, choose power, significance level, minimum detectable difference between test and control, and the expected standard deviation)
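A sketch of that sample-size calculation; the standard deviation and minimum detectable difference are purely illustrative numbers:

```python
# Sketch of the required-sample-size calculation with hypothetical inputs.
from statsmodels.stats.power import NormalIndPower

baseline_std = 0.10          # expected standard deviation of the metric
min_detectable_diff = 0.02   # minimum difference between test and control we care about
effect_size = min_detectable_diff / baseline_std

n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Required sample size per group: {n_per_group:.0f}")
```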
novelty effect -> control for this by subsetting to drivers for whom it is their first experience. Compare test results between new users in the control group vs. new users in the test group
A/B test wins but there is a cost of change -> human labor costs (engineering time to make the change), opportunity cost (not working on something else with a possibly higher expected value), risk of bugs
missing value in a variable (Uber trips without a rider review) -> a missing value is itself important information. Either predict the missing value or encode it explicitly (e.g., -1)
e-commerce demand -> visiting the site and searching for "jeans"; ads click-through rate (CTR)
e-commerce supply -> # conversions / # searches, only considering people who used filters in their search, or people whose session time is above a given value
site funnel -> home page, search results, click on item, buy it
predict Y (Instagram usage); how to find out whether X (here the mobile operating system, OS) is a discriminating variable or not -> 1. build a decision tree using user-related variables + OS to predict Instagram usage. 2. build two models, one including OS and one without, and compare their performance. 3. generate simulated datasets where you adjust the distributions of all other variables, so that you have the same age, country, etc. distribution for both iOS and Android
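A sketch of approach 2, assuming a hypothetical table whose columns (age, country_code, sessions_per_week, is_ios, uses_instagram) are already numerically encoded:

```python
# Sketch: compare a model with the OS feature vs. without it.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("users.csv")                    # hypothetical file
base_features = ["age", "country_code", "sessions_per_week"]
y = df["uses_instagram"]

auc_without = cross_val_score(RandomForestClassifier(random_state=0),
                              df[base_features], y, scoring="roc_auc", cv=5).mean()
auc_with = cross_val_score(RandomForestClassifier(random_state=0),
                           df[base_features + ["is_ios"]], y, scoring="roc_auc", cv=5).mean()

# If adding OS barely moves the score, OS is unlikely to be a discriminating variable.
print(f"AUC without OS: {auc_without:.3f}, with OS: {auc_with:.3f}")
```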
subscription retention -> "percentage of users who don't unsubscribe within the first 12 months" takes too long to measure -> instead use the proportion of people who unsubscribed or never used the product within the first week / three weeks
user demographic vs. behavioral characteristics -> 1. a user's browsing history shows what they are interested in buying, regardless of whether it is a gift or for themselves. 2. timing: browsing data tells the moment at which a user is thinking about buying a product
acquiring new users -> new sign ups per day from users who send at least 1 message within the first 2 days
retain current users -> engagement -> average messages per day per user
new feature -> find something that people are already doing, but in a complicated way requiring multiple steps. An example could be identifying that the last message of a conversation is about calling Uber, ordering food, or using some other app. A possible next step is to integrate that functionality within WhatsApp, similar to how Google Maps can be called from inside WhatsApp.
user lifetime value -> pay for a click -> revenue coming from that user within the first year -> use short-term data to predict it -> features: user location, user device, operating system, browser type, acquisition source
recommendation -> shared connection, shared cluster (work friends, high school friends, university friends)
predict fraud -> device ID, IP address, ratings, price, pictures, description, browsing behavior that led to the seller creating the account
A/B test across markets, drawbacks -> the two markets will never be fully comparable, and there is no full independence. Check one metric that is not supposed to be affected by your test, and make sure it keeps behaving similarly for both markets during the test
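One simple way to do that check, sketched with hypothetical daily values of an unaffected (guardrail) metric in each market:

```python
# Sketch: verify a guardrail metric behaves similarly in both markets during the test.
import numpy as np
from scipy import stats

control_market = np.array([4.1, 4.3, 4.2, 4.4, 4.2, 4.3, 4.1])  # e.g. avg rating per day
test_market = np.array([4.2, 4.1, 4.3, 4.2, 4.4, 4.2, 4.3])

t_stat, p_value = stats.ttest_ind(control_market, test_market)
# A large p-value is reassuring: the unaffected metric does not differ between markets.
print(f"p-value on the guardrail metric: {p_value:.2f}")
```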
customer service performance measurement -> average user lifetime value (1 year) -> whether the user kept buying within 1 year after the ticket -> build a model to predict it, using response time and user feedback as features
whether to add a new feature -> 1. is it good for the site? (engagement) 2. is there demand? (users are already doing it) 3. does it simplify the current flow?
Two-step authentication -> pick the ROC threshold by weighing the cost of false negatives (actual fraud happening) against the value of true negatives (value of a legitimate user) -> A/B test: is the number of bad actors that two-step is blocking worth the number of good actors the site is losing because logging in is harder?
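A sketch of picking the threshold that minimizes expected cost; the labels, scores, and costs below are made up for illustration:

```python
# Sketch: choose the ROC threshold that minimizes expected cost under asymmetric costs.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 1, 0, 1, 0, 1, 0, 0])   # 1 = fraud (hypothetical labels)
y_score = np.array([0.1, 0.2, 0.15, 0.8, 0.3, 0.7, 0.05, 0.9, 0.4, 0.25])

cost_false_negative = 100.0   # fraud that gets through (assumed cost)
cost_false_positive = 5.0     # legitimate user hit with extra friction (assumed cost)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
n_pos, n_neg = y_true.sum(), (1 - y_true).sum()
expected_cost = cost_false_negative * (1 - tpr) * n_pos + cost_false_positive * fpr * n_neg
best = thresholds[np.argmin(expected_cost)]
print(f"Threshold minimizing expected cost: {best:.2f}")
```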
why is a metric down? -> compare year-over-year metrics -> split it into numerator and denominator -> if the numerator is down -> either new users are not liking as much as the usual ones, or the number of users is normal and the number of likes suddenly dropped -> if new users are less engaged, find features that separate this week's new users from the previous week's ones -> e.g., way more users from China this week. This might be due to a marketing campaign there that brought in a huge number of users, but these users are less engaged, as is often the case with users coming from sudden marketing campaigns. Or all these new users come from very few distinct IP addresses, which would mean they are probably fake accounts.
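A sketch of the segmentation step, assuming a hypothetical table with one row per user per week:

```python
# Sketch: compare the metric by segment between weeks to find which segment drives the drop.
import pandas as pd

df = pd.read_csv("weekly_events.csv")   # hypothetical: columns week, country, user_id, likes
by_segment = (df.groupby(["week", "country"])
                .agg(likes=("likes", "sum"), users=("user_id", "nunique")))
by_segment["likes_per_user"] = by_segment["likes"] / by_segment["users"]

# A segment whose user count jumped while likes_per_user collapsed points to the cause.
print(by_segment.unstack("week"))
```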
run 30 tests in parallel (e.g., one per country segment) and only 1 segment wins, with p-value 0.04 -> Bonferroni correction: simply divide 0.05 by the number of tests and that becomes the new threshold for significance -> make the change only if the test is better and its p-value is less than 0.05/30
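The correction itself is just arithmetic:

```python
# Sketch of the Bonferroni correction for 30 parallel tests.
n_tests = 30
alpha = 0.05
bonferroni_threshold = alpha / n_tests   # ~0.00167
p_value = 0.04
print("significant after correction:", p_value < bonferroni_threshold)   # False
```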
Test wins by 5%; will that metric actually go up by ~5%, more, or less? -> Control group numbers are likely inflated, so if the change is applied to all users, it will likely lead to a larger gain than 5% vs. the old UI
cost of a false positive way higher than a false negative -> recruiting process (hiring a bad candidate is very costly; rejecting a good one less so)
cost of a false positive way lower than a false negative -> cancer detection (a false alarm is far less costly than a missed cancer case)
how long should I run an A/B test? -> inputs: 1. significance level, usually 0.05. 2. power, usually 0.8. 3. expected standard deviation of the metric. 4. minimum effect size you are interested in detecting via your test. If the resulting duration is less than 14 days, still run the test for 14 days in order to reliably capture weekly patterns.
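A sketch that turns a required sample size (e.g., from a power calculation like the one above) into a run length with the 14-day floor; the traffic numbers are hypothetical:

```python
# Sketch: convert required sample size into test duration, with a two-week minimum.
import math

required_n_per_group = 31000     # e.g. output of the power calculation (hypothetical)
daily_users_per_group = 4000     # expected traffic assigned to each variant per day

days_needed = math.ceil(required_n_per_group / daily_users_per_group)
run_days = max(days_needed, 14)  # never below 14 days, to capture weekly patterns
print(f"Run the test for {run_days} days")
```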
We found a drop in picture uploads. How to find the reason? Segment users by various features, such as browser, device, country, etc. Suppose you discover that one segment dropped to zero: that is likely a bug, so finally explain where in that segment's upload flow the bug could be.
Isolate the impact of the algorithm vs. the UI change -> Version 1 is the old site. Version 2 is the site with the new "People You May Know" feature powered by the machine learning model. Version 3 is the site with the "People You May Know" feature, but with random or simple history-based suggestions.
detect fake information (e.g., school) -> 1. email validation negatively affects legitimate users 2. use user info from their profile + how they interacted with LinkedIn (how many connection requests they sent, how those were distributed over time, acceptance rate, whether they visited other people's profiles before sending the connection request), then build clusters
small dataset -> 1. cross-validate 2. bootstrap your original dataset (bagging)
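A sketch combining both ideas: bagging (bootstrap aggregating) a decision tree and scoring it with cross-validation, using a small built-in dataset as a stand-in:

```python
# Sketch: bagging + cross-validation on a small dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5)   # cross-validation on top of bagging
print(f"Mean CV accuracy: {scores.mean():.3f}")
```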
predict job change -> monthly snapshots -> user profile data, data about when the snapshot was taken, user behavior on the site, and some external data about job demand
response time of an inquiry at Airbnb -> the percentage of responses within 16 hrs is better than the average response time computed only over responses within 16 hrs, because the percentage considers the whole population, including hosts who never respond
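A tiny illustration with made-up response times, where NaN means the host never responded:

```python
# Sketch: the percentage metric counts non-responders; the conditional average ignores them.
import numpy as np
import pandas as pd

response_hours = pd.Series([2.0, 30.0, np.nan, 5.0, 12.0, np.nan, 40.0, 1.0])

pct_within_16h = (response_hours <= 16).mean()                # NaN counts as "not within 16h"
avg_within_16h = response_hours[response_hours <= 16].mean()  # drops non-responders entirely
print(f"% within 16h: {pct_within_16h:.0%}, avg among responders within 16h: {avg_within_16h:.1f}h")
```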
re-running the same A/B test gives different results -> the underlying distribution of users has changed (early adopters vs. newer users)
identify clickbait -> higher-than-usual CTR + the medium-term (say 2 weeks) change in CTR
revenue -> 1. increase CTR by better targeting. 2. increase the number of page views. 3. maximize the probability of conversion, or work with the advertisers to improve the user flow after people click on an ad
H0: mu = 20
Ha: mu > 20
A/B test power -> P(reject H0 | H0 false) = 1 - P(fail to reject H0 | H0 false) = 1 - Type II error rate
A/B test alpha = 0.05 -> significance level; if the p-value is lower than the significance level, reject the null hypothesis
A/B test p-value -> e.g., observed sample mean = 25; p-value = P(observing a sample mean at least this large | H0 true); if it is below 0.05, reject the null hypothesis
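A sketch of the corresponding one-sided z-test; the sample standard deviation and sample size are assumed for illustration:

```python
# Sketch: one-sided z-test p-value for H0: mu = 20 vs. Ha: mu > 20.
from scipy import stats

mu_h0 = 20          # null hypothesis mean
sample_mean = 25
sample_std = 10     # assumed sample standard deviation
n = 50              # assumed sample size

z = (sample_mean - mu_h0) / (sample_std / n ** 0.5)
p_value = 1 - stats.norm.cdf(z)   # P(observing a mean this large or larger | H0 true)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```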
Uber:
Monthly active platform consumers (MAPCs):
number of unique consumers who completed a ride or received an Eats delivery on the platform in a given month
Trips:
number of completed rides or Uber Eats deliveries in a period
Gross bookings:
total dollar value of transactions booked on the platform
Lyft:
Active Rider
Revenue per active rider
Netflix:
Paid membership subscriptions
average revenue per user (ARPU)
Pinterest:
MAU:
an authenticated user who visits at least once during a 30-day period
ARPU:
total revenue divided by the average number of MAUs in a period
Facebook:
DAU:
a logged-in user who visits at least one of the family of products on a given day
MAU:
a logged-in user who visits at least one of the family of products in the last 30 days
ARPU
Expedia:
Room night growth
gross bookings: total retail value of transactions booked
revenue per room night
Spotify:
Total monthly active users (MAUs)
premium subscribers
ad-supported MAUs
ARPU
Twitter:
Monetizable DAU (mDAU):
logged-in users who access Twitter on a given day through products that are able to show ads
Snapchat:
ARPU
DAU: a registered user who opens the application at least once during a 24-hour period