INFERENCE-TIME SCALING AS A UNIVERSAL PRINCIPLE IN MACHINE LEARNING: CLASSIFICATION AND CROSS-DOMAIN ANALYSIS
DOI: https://doi.org/10.35546/kntu2078-4481.2026.1.28

Keywords: inference-time scaling, test-time compute, machine learning, generative models, chain-of-thought, diffusion models, flow matching, ensemble methods, Monte Carlo tree search, adaptive computation

Abstract
This paper proposes a formal definition and classification of inference-time scaling methods across machine learning. Inference-time scaling – spending additional computation at prediction time to improve output quality – appears throughout the field but has not been unified within a single framework. Methods as varied as ensemble averaging, Monte Carlo tree search, test-time augmentation, chain-of-thought reasoning in large language models, iterative denoising in diffusion models, and ODE solving in flow matching share one structural property: quality improves with additional inference computation, subject to diminishing returns. These methods are, however, studied by largely separate communities, and no existing taxonomy spans more than a single domain. We argue that this fragmentation blocks the sharing and cross-pollination of methods across areas of machine learning. We propose a classification organized by computational topology – the structure of the additional computation performed at prediction time – comprising six types: sequential refinement, parallel sampling with aggregation, tree/graph search, test-time model adaptation, guided generation, and adaptive computation routing. Each method is decomposed into generation, selection, and allocation policies, providing a uniform analytical language. The analysis identifies four cross-domain invariants: a universally sublinear cost-quality curve, a verifier bottleneck limiting selection-based approaches, a sequential–parallel duality across all domains examined, and method migration between domains once their shared structure is recognized.
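The generation/selection/allocation decomposition described above can be made concrete with a minimal sketch. The example below is our own toy illustration, not code from the paper: a parallel-sampling-with-aggregation method for Monte Carlo estimation of pi, where the generation policy produces independent candidates, the selection policy aggregates them by averaging (ensemble-style), and the allocation policy fixes the parallel budget. Raising the budget spends more inference compute and typically reduces error, with diminishing returns, as the abstract's sublinear cost-quality curve suggests. All function names here are illustrative assumptions.

```python
import random

def generate(seed, n_points=1000):
    """Generation policy: one independent candidate estimate of pi,
    obtained by sampling n_points uniform points in the unit square."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                 for _ in range(n_points))
    return 4.0 * inside / n_points

def select(candidates):
    """Selection/aggregation policy: plain averaging of candidates
    (the ensemble-style aggregation mentioned in the abstract)."""
    return sum(candidates) / len(candidates)

def allocate(budget):
    """Allocation policy: a fixed parallel budget of `budget` candidates."""
    return range(budget)

def infer(budget):
    """One inference call: generate candidates under the allocated
    budget, then aggregate them into a single answer."""
    return select([generate(seed) for seed in allocate(budget)])

# A larger inference budget generally yields a more accurate estimate.
rough, refined = infer(4), infer(256)
```

The same three-policy skeleton covers the other topologies in the classification by swapping components: a verifier-based `select` (best-of-N) instead of averaging, or an adaptive `allocate` that stops once candidates agree.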
References
Kaplan J., McCandlish S., Henighan T. J., Brown T. B., Chess B., Child R., Gray S., Radford A., Wu J., Amodei D. Scaling Laws for Neural Language Models. ArXiv. 2020. arXiv:2001.08361. DOI: https://doi.org/10.48550/arXiv.2001.08361
Hoffmann J., Borgeaud S., Mensch A., Buchatskaya E., Cai T., Rutherford E.,... Sifre L. An empirical analysis of compute-optimal large language model training. Advances in neural information processing systems. 2022. Vol. 35, pp. 30016-30030. URL: https://dl.acm.org/doi/10.5555/3600270.3602446
Wei J., Wang X., Schuurmans D., Bosma M., Xia F., Chi E.,... Zhou D. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems. 2022. Vol. 35, pp. 24824-24837. DOI: https://doi.org/10.48550/arXiv.2201.11903
Ho J., Jain A., Abbeel P. Denoising diffusion probabilistic models. Advances in neural information processing systems. 2020. Vol. 33, pp. 6840-6851. DOI: https://doi.org/10.48550/arXiv.2006.11239
Lipman Y., Chen R. T., Ben-Hamu H., Nickel M., Le M. Flow matching for generative modeling. arXiv. 2022. arXiv:2210.02747. DOI: https://doi.org/10.48550/arXiv.2210.02747
Silver D., Huang A., Maddison C. J., Guez A., Sifre L., Van Den Driessche G.,... Hassabis D. Mastering the game of Go with deep neural networks and tree search. Nature. 2016. Vol. 529(7587), pp. 484-489. DOI: https://doi.org/10.1038/nature16961
Breiman L. Random forests. Machine learning. 2001. Vol. 45(1), pp. 5-32. DOI: https://doi.org/10.1023/A:1010933404324
Welleck S., Bertsch A., Finlayson M., Schoelkopf H., Xie A., Neubig G., Kulikov I., Harchaoui Z. From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models. arXiv. 2024. arXiv:2406.16838. DOI: https://doi.org/10.48550/arXiv.2406.16838
Balachandran V., Chen J., Chen L., Garg S., Joshi N., Lara Y.,... Yousefi S. Inference-time scaling for complex tasks: Where we stand and what lies ahead. arXiv. 2025. arXiv:2504.00294. DOI: https://doi.org/10.48550/arXiv.2504.00294
Krizhevsky A., Sutskever I., Hinton G. E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems. 2012. Vol. 25. DOI: https://doi.org/10.1145/3065386
Gal Y., Ghahramani Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International conference on machine learning. PMLR. 2016. pp. 1050-1059. URL: https://dl.acm.org/doi/10.5555/3045390.3045502
Brown T., Mann B., Ryder N., Subbiah M., Kaplan J. D., Dhariwal P.,... Amodei D. Language models are few-shot learners. Advances in neural information processing systems. 2020. Vol. 33, pp. 1877-1901. DOI: https://doi.org/10.48550/arXiv.2005.14165
Bondar V. V., Babenko V. G. Scaling computation during generation as a universal principle for generative models. Telecommunication and Information Technologies. 2025. No. 4. pp. 229–234. [in Ukrainian] DOI: https://doi.org/10.31673/2412-4338.2025.048926
Bondar V., Babenko V., Trembovetskyi R., Korobeinyk Y., Dzyuba V. Deep generative models as the probability transformation functions. arXiv. 2025. arXiv:2506.17171. DOI: https://doi.org/10.48550/arXiv.2506.17171
Snell C. V., Lee J., Xu K., Kumar A. Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. arXiv. 2024. arXiv:2408.03314. DOI: https://doi.org/10.48550/arXiv.2408.03314
Wang X., Wei J., Schuurmans D., Le Q., Chi E., Narang S.,... Zhou D. Self-consistency improves chain of thought reasoning in language models. arXiv. 2022. arXiv:2203.11171. DOI: https://doi.org/10.48550/arXiv.2203.11171
Yao S., Yu D., Zhao J., Shafran I., Griffiths T., Cao Y., Narasimhan K. Tree of thoughts: Deliberate problem solving with large language models. Advances in neural information processing systems. 2023. Vol. 36, pp. 11809–11822. URL: https://dl.acm.org/doi/10.5555/3666122.3666639
Lightman H., Kosaraju V., Burda Y., Edwards H., Baker B., Lee T.,... Cobbe K. Let’s verify step by step. In The Twelfth International Conference on Learning Representations. 2023. DOI: https://doi.org/10.48550/arXiv.2305.20050
Brown B., Juravsky J., Ehrlich R., Clark R., Le Q. V., Ré C., Mirhoseini A. Large language monkeys: Scaling inference compute with repeated sampling. arXiv. 2024. arXiv:2407.21787. DOI: https://doi.org/10.48550/arXiv.2407.21787
Jaech A., Kalai A., Lerer A., Richardson A., El-Kishky A., Low A.,... Metz L. Openai o1 system card. arXiv. 2024. arXiv:2412.16720. DOI: https://doi.org/10.48550/arXiv.2412.16720
Guo D., Yang D., Zhang H., Song J., Zhang R., Xu R.,... He Y. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. Nature. 2025. Vol. 645, pp. 633–638. DOI: https://doi.org/10.1038/s41586-025-09422-z
Muennighoff N., Yang Z., Shi W., Li X. L., Fei-Fei L., Hajishirzi H.,... Hashimoto T. B. s1: Simple test-time scaling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. pp. 20286–20332. DOI: https://doi.org/10.18653/v1/2025.emnlp-main.1025
Hao S., Gu Y., Ma H., Hong J., Wang Z., Wang D., Hu Z. Reasoning with language model is planning with world model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. pp. 8154–8173. DOI: https://doi.org/10.18653/v1/2023.emnlp-main.507
Song Y., Sohl-Dickstein J., Kingma D. P., Kumar A., Ermon S., Poole B. Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations. 2020. DOI: https://doi.org/10.48550/arXiv.2011.13456
Dhariwal P., Nichol A. Diffusion models beat gans on image synthesis. Advances in neural information processing systems. 2021. Vol. 34, pp. 8780-8794. DOI: https://doi.org/10.48550/arXiv.2105.05233
Ho J., Salimans T. Classifier-free diffusion guidance. arXiv. 2022. arXiv:2207.12598. DOI: https://doi.org/10.48550/arXiv.2207.12598
Ma N., Tong S., Jia H., Hu H., Su Y. C., Zhang M.,... Xie S. Inference-time scaling for diffusion models beyond scaling denoising steps. arXiv. 2025. arXiv:2501.09732. DOI: https://doi.org/10.48550/arXiv.2501.09732
Sun Y., Wang X., Liu Z., Miller J., Efros A., Hardt M. Test-time training with self-supervision for generalization under distribution shifts. In International conference on machine learning. 2020. PMLR. pp. 9229-9248. DOI: https://doi.org/10.48550/arXiv.1909.13231
Wang D., Shelhamer E., Liu S., Olshausen B. A., Darrell T. Tent: Fully Test-Time Adaptation by Entropy Minimization. International Conference on Learning Representations. 2021. DOI: https://doi.org/10.48550/arXiv.2006.10726
Finn C., Abbeel P., Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning. 2017. PMLR. pp. 1126-1135. DOI: https://doi.org/10.48550/arXiv.1703.03400
Sutskever I., Vinyals O., Le Q. V. Sequence to sequence learning with neural networks. Advances in neural information processing systems. 2014. Vol. 27. DOI: https://doi.org/10.48550/arXiv.1409.3215
Jumper J. M., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., … Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021. Vol. 596, pp. 583–589. DOI: https://doi.org/10.1038/s41586-021-03819-2
Li Y., Choi D., Chung J., Kushman N., Schrittwieser J., Leblond R.,... Vinyals O. Competition-level code generation with alphacode. Science. 2022. Vol. 378(6624), pp. 1092-1097. DOI: https://doi.org/10.1126/science.abq1158





