Received 03.08.2024, Revised 19.10.2024, Accepted 26.12.2024

Improved A/B testing acceleration methods for parametric hypothesis testing: T-test comparison with CUPED, CUPED++ and Bayesian Estimator

Artur Markov

The study aimed to compare statistical analysis methods to improve the testing of alternatives. Four methods were evaluated: the classic T-test, the conventional and advanced (CUPED++) versions of Controlled Experiments Using Pre-Experimental Data (CUPED), and the Bayesian Estimator. The main results include a demonstration of the A/B testing process and a description of each statistical analysis method, with detailed characteristics and examples of use. Simulations and practical application showed that the T-test provides high accuracy with small samples, but its effectiveness decreases as the sample size grows because of high resource requirements. The calculator for this method was effective in simple tasks but had limitations with large data. The conventional CUPED method showed increased accuracy due to variation correction, but its effectiveness decreased on large and complex data sets. The program written for this method proved effective when the pre-experimental data were well represented, but its capabilities were limited when processing large data sets. The improved version provided a significant gain in both accuracy and processing speed, especially for large datasets, thanks to advanced modelling and optimisation. The code results confirmed that this method is highly efficient for complex experiments, particularly when processing large amounts of data. The Bayesian Estimator demonstrated high accuracy due to the integration of prior knowledge but required more computational resources and time. The platform used for this method was able to account for uncertainty, yet it required complex model settings. The results highlight the importance of selecting a statistical analysis method that matches the scale and complexity of the data to ensure optimal accuracy and efficiency of testing.
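As an illustration of the variation correction that CUPED applies before a T-test, the following is a minimal sketch, not the program described in the study. It assumes synthetic data, a single pre-experiment covariate, and a pooled coefficient theta = cov(x, y) / var(x); the function name `cuped_adjust` and all simulation parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def cuped_adjust(y_a, x_a, y_b, x_b):
    """Return CUPED-adjusted metrics for two groups and the pooled theta.

    y_* are in-experiment metrics, x_* are pre-experiment covariates.
    The adjustment is y - theta * (x - mean(x)), with theta and mean(x)
    estimated on the pooled sample (an assumption of this sketch).
    """
    y = np.concatenate([y_a, y_b])
    x = np.concatenate([x_a, x_b])
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    x_mean = x.mean()
    adj_a = y_a - theta * (x_a - x_mean)
    adj_b = y_b - theta * (x_b - x_mean)
    return adj_a, adj_b, theta


# Synthetic example: the metric is strongly correlated with the covariate.
rng = np.random.default_rng(0)
n = 2000
x_a = rng.normal(10.0, 2.0, n)              # control, pre-experiment
x_b = rng.normal(10.0, 2.0, n)              # treatment, pre-experiment
y_a = x_a + rng.normal(0.0, 1.0, n)         # control, in-experiment
y_b = x_b + 0.1 + rng.normal(0.0, 1.0, n)   # treatment, true effect 0.1

adj_a, adj_b, theta = cuped_adjust(y_a, x_a, y_b, x_b)

# Welch's T-test on the raw and on the CUPED-adjusted metric.
plain = stats.ttest_ind(y_b, y_a, equal_var=False)
cuped = stats.ttest_ind(adj_b, adj_a, equal_var=False)

print(f"theta = {theta:.3f}")
print(f"raw variance   = {np.var(y_a, ddof=1):.3f}, p = {plain.pvalue:.4f}")
print(f"CUPED variance = {np.var(adj_a, ddof=1):.3f}, p = {cuped.pvalue:.4f}")
```

The adjusted metric keeps the same expected difference between groups while its variance shrinks in proportion to the squared correlation between the covariate and the metric, which is why the same effect can be detected with fewer observations.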

statistical analysis; correction of variations; effectiveness of approaches; data processing; modelling of experiments
119-131
Markov, A. (2024). Improved A/B testing acceleration methods for parametric hypothesis testing: T-test comparison with CUPED, CUPED++ and Bayesian Estimator. Information Technologies and Computer Engineering, 21(3), 119-131. https://doi.org/10.63341/vitce/3.2024.119
