The following are a few of the things that I learned through the Growth Marketing degree at CXL.
Take calculating a sample size for a test between two brand-new pages. When both pages are new, there is a flow most students follow: you work through a chart, checking off each step as it gives instructions. So how do we start running this as an A/B test?
You do it by calculating a sample size. You put in your baseline conversion rate and the minimal uplift you want to detect. Normally the control is the baseline, which in this example would be the old website. But here both websites are brand new: you created two versions of a brand-new site, so there is no baseline yet. In that case you calculate it on the go. Set the test live, and after two, three or four weeks you'll see what the conversion rate is for each version. Then take the lower-converting website's average conversion rate, set that as the baseline, and use it as the base of your calculations.
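As a rough sketch of what such a sample size calculator does under the hood (this assumes the standard two-proportion z-test formula with 95% confidence and 80% power; any given tool's exact method may differ):

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard normal, Python 3.8+

def sample_size_per_variant(baseline, relative_uplift,
                            alpha=0.05, power=0.80):
    """Visitors needed per variant to detect the given relative
    uplift over the baseline conversion rate (two-sided z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
    z_beta = NormalDist().inv_cdf(power)            # ~0.84
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# e.g. a 5% baseline, hoping to detect a 10% relative lift
print(sample_size_per_variant(0.05, 0.10))  # roughly 31,000 per variant
```

With a 5% baseline and a 10% relative uplift target this lands around 31,000 visitors per variant, which is why small uplifts take so long to test.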
Which tool should be used for cross-browser testing and who should do the work?
Typically it’s developers who do this, using a tool like crossbrowsertesting.com; there are some 25 tools that do this stuff. In analytics, you’ll see which browser or device is underperforming. Then you tell the QA team something like: “Hey, why don’t you go through our site in whatever browser is underperforming, say Firefox 27, and see if you can find anything odd?” Depending on your site structure, budget and capabilities, you can also run automated QA testing to discover bugs. Again, typically developers handle this.
What are some popular tools? For mouse tracking, there’s Hotjar, which is the best at this, although it’s not perfect. As a testing tool, there’s VWO. There’s also SessionCam, which is used by enterprises, as are Decibel Insight and Clicktale. If you’re not enterprise, it is best to use Hotjar. For form analytics, Formisimo is a great tool; it’s the most detailed. More testing tools: Optimizely, VWO, and Sentient Ascend, which is an interesting one. Interesting tests have been run through Conductrics. With Conductrics you set up a test, and once it’s live it starts learning which particular segment responds well to which variation; it’s kind of like machine learning. For example, it may tell you: “Hey, 25-to-35-year-old males from Idaho are responding really well to this potato treatment over here.” Then it automatically increases how much of that segment’s traffic it serves to that variation.
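That “learns which segment responds to which variation” behavior can be sketched as a per-segment multi-armed bandit. Here is a minimal Thompson-sampling illustration; the segment name and setup are made up, and Conductrics’ actual algorithm is certainly more sophisticated than this:

```python
import random

class SegmentBandit:
    """Thompson sampling per segment: each (segment, variation) pair
    keeps a Beta(successes+1, failures+1) posterior over its conversion
    rate, so traffic drifts toward what works for each segment."""
    def __init__(self, variations):
        self.variations = variations
        self.stats = {}  # (segment, variation) -> [successes, failures]

    def choose(self, segment):
        best, best_draw = None, -1.0
        for v in self.variations:
            s, f = self.stats.get((segment, v), [0, 0])
            draw = random.betavariate(s + 1, f + 1)  # sample the posterior
            if draw > best_draw:
                best, best_draw = v, draw
        return best

    def record(self, segment, variation, converted):
        s_f = self.stats.setdefault((segment, variation), [0, 0])
        s_f[0 if converted else 1] += 1

bandit = SegmentBandit(["control", "variation_b"])
choice = bandit.choose(segment="male_25_35_idaho")  # hypothetical segment
bandit.record("male_25_35_idaho", choice, converted=True)
```

Over many visits, segments whose conversions favor one variation get served that variation more often, which is the adaptive allocation described above.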
With machine learning tools like Conductrics, the machine automatically learns this stuff and tells you what works. Back to the tool question: for someone who spends a lot of time scaling and optimizing native apps, tools that make it easier to run app tests include Optimizely, which has an SDK, and VWO. There’s also Apptimize, a dedicated mobile testing tool.
When sending out surveys, open-ended questions are key to getting useful responses. What’s the typical response rate when you send out a survey? If these are your previous customers, the response rate will be better than for your average subscriber; the exact rate depends on your relationship and how often you email these people. If they get an email every day, they’re less likely to respond. A response rate of 10 to 20% is typical, so if you want 200 responses, send the email to about 2,000 people as a ballpark. If not enough people respond to the survey, do another batch. It is best to send the survey to people who recently bought something for the very first time. Recent first-time buyers don’t have a seven-year relationship with you; they usually don’t buy out of loyalty or because of the customer service experience, but because the site was good.
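The ballpark math is simple enough to script (the response rates are the assumptions from above):

```python
from math import ceil

def batch_size(target_responses, expected_rate):
    """How many people to email to hit a desired number of survey
    responses, given an expected response rate."""
    return ceil(target_responses / expected_rate)

print(batch_size(200, 0.10))  # 2000 emails at a 10% response rate
print(batch_size(200, 0.20))  # 1000 emails at a 20% response rate
```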
What’s the benchmark for win rate? Basically, the question is: what’s a good win rate for testing? It really depends on how optimized your site is. Well-optimized sites like Booking.com win fewer than one test in ten, so less than a 10% win rate, but that’s an extremely optimized site. It could definitely serve as an internal benchmark. Do you typically segment mobile in your tests or run it separately? Separate is best: different tests for desktop and different tests for mobile, for two reasons:
- One: different stuff works. What works on mobile doesn’t work on desktop, and vice versa.
- Two: unless your traffic really is 50–50 mobile to desktop, you run into a sampling problem. Say it’s 70–30, you run the same test for both, and you plan a post-test analysis to see whether the result was significant for mobile as well. After three weeks you have enough sample size for the desktop segment, but the mobile segment still needs two more weeks of traffic. You won’t have a good end result.
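The 70–30 problem can be made concrete. This sketch uses illustrative traffic numbers and an assumed required sample size; the point is that in a shared test, the slower segment dictates the run time:

```python
from math import ceil

def weeks_to_sample(required_per_variant, daily_visitors, share,
                    variants=2):
    """Weeks until a segment reaches the required sample size, given
    its share of total daily traffic split evenly across variants."""
    daily_per_variant = daily_visitors * share / variants
    return ceil(required_per_variant / daily_per_variant / 7)

daily = 2000    # assumed total daily visitors
needed = 15000  # assumed required sample per variant
print(weeks_to_sample(needed, daily, 0.70))  # desktop segment: 4 weeks
print(weeks_to_sample(needed, daily, 0.30))  # mobile segment: 8 weeks
```

With a 70–30 split, the desktop segment finishes in about half the time the mobile segment needs, which is why splitting the tests is cleaner.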
It is always best to split the tests for mobile and desktop. If you’re contracting people to perform QA on your site, to go through it and find errors, how should you compensate them? For example, would it be a good idea to pay a set amount for every error or issue they find? The problem is that QA might spend five days and find zero issues. So you can’t pay per error found; it’s not pay-per-performance. QA should simply be paid for doing the investigation. Services usually bill hourly.
Testlio is a company that does QA testing. There’s also MyCrowd. Both are professional tester marketplaces. Or you can find someone with QA testing experience on freelance portals. Now, suppose you push to an internal network, like an integration environment with internal user testers who provide feedback to close the loop, versus organizations like Spotify that push directly to production. Is there any difference in the overall quality of the product, or the quality of the feedback in the end? The question is basically whether there’s a difference in quality between internal testers and external testers. Whenever you push anything live, you definitely want to test it internally first, on a test server, with some basic usability testing. If it’s a major change in the way users use the site, spend time testing the product: there’s a likelihood that what you came up with is total crap, and if you release it live, it will kill the results. So if you have resources for internal testing, definitely do that first, before using external people. Keep in mind that internal people don’t see everything, because they’re so used to the site.
What type of tools do you use to actually document the results and predictions? One tool is called GrowthHacker Projects. There are dedicated testing program management tools like Iridian, a German tool. Trello is also popular for its simplicity. And there’s a new one that was recently released called MiaProva.
How many users do you need in order to test effectively? Is it a hundred users a day? It’s more a question of how many people sign up for your product or make purchases, depending on your business model. The absolute minimum where you might even start thinking of a one-test-a-month pace would be around 500 conversions a month, say 500 purchases a month. Before that, just focus on changing stuff and seeing what happens. You still need to do the research to figure out what the problems are, but changes are just about implementing, and about making sure your analytics are configured to measure every little micro-change in behavior as well. Use a sample size calculator to figure out how long you should run a test, or whether it’s worth running at all. Optimizely’s new Stats Engine turned the whole model around: when Optimizely switched to it, the percentage of false positives went way down. They require very large sample sizes now, so they’re the best tool on the market for giving you adequate sample size info; it will show you that you need a hundred thousand more people in a test before it declares significance.
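The false-positive point can be demonstrated with a small A/A simulation: checking a classic fixed-horizon z-test repeatedly (“peeking”) declares far more false winners than the nominal 5%, which is what sequential methods like Optimizely’s Stats Engine guard against by demanding more data. All numbers below are just simulation parameters:

```python
import random
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-statistic with a pooled standard error."""
    p = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se

def aa_test(n_per_arm=2000, p=0.05, check_every=200):
    """Simulate one A/A test (identical true rates); return whether it
    looked significant at the end, and at any interim peek."""
    conv_a = conv_b = 0
    peeked = False
    for i in range(1, n_per_arm + 1):
        conv_a += random.random() < p
        conv_b += random.random() < p
        if i % check_every == 0 and abs(z_score(conv_a, i, conv_b, i)) > 1.96:
            peeked = True  # would have called a winner here
    final = abs(z_score(conv_a, n_per_arm, conv_b, n_per_arm)) > 1.96
    return final, peeked

random.seed(42)
runs = 500
finals = peeks = 0
for _ in range(runs):
    f, pk = aa_test()
    finals += f
    peeks += pk
print(f"fixed-horizon false positives: {finals / runs:.1%}")
print(f"with peeking every 200 visitors: {peeks / runs:.1%}")
```

The fixed-horizon rate comes out near the nominal 5%, while the peeking rate is several times higher, even though both arms are identical.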
I look forward to learning more next week!