

Journal of Machine Learning Research
The Journal of Machine Learning Research (JMLR), established in 2000, provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online.
- 2023.01.20: Volume 23 completed; Volume 24 began.
- 2022.07.20: New special issue on climate change.
- 2022.02.18: New blog post: Retrospectives from 20 Years of JMLR.
- 2022.01.25: Volume 22 completed; Volume 23 began.
- 2021.12.02: Message from outgoing co-EiC Bernhard Schölkopf.
- 2021.02.10: Volume 21 completed; Volume 22 began.
- More news...
Latest papers
Bagging in overparameterized learning: Risk characterization and risk monotonization Pratik Patil, Jin-Hong Du, Arun Kumar Kuchibhotla , 2023. [ abs ][ pdf ][ bib ]
Operator learning with PCA-Net: upper and lower complexity bounds Samuel Lanthaler , 2023. [ abs ][ pdf ][ bib ]
Mixed Regression via Approximate Message Passing Nelvin Tan, Ramji Venkataramanan , 2023. [ abs ][ pdf ][ bib ] [ code ]
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima Peter L. Bartlett, Philip M. Long, Olivier Bousquet , 2023. [ abs ][ pdf ][ bib ]
MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Fast Expectation Propagation for Heteroscedastic, Lasso-Penalized, and Quantile Regression Jackson Zhou, John T. Ormerod, Clara Grazian , 2023. [ abs ][ pdf ][ bib ] [ code ]
Zeroth-Order Alternating Gradient Descent Ascent Algorithms for A Class of Nonconvex-Nonconcave Minimax Problems Zi Xu, Zi-Qi Wang, Jun-Lin Wang, Yu-Hong Dai , 2023. [ abs ][ pdf ][ bib ]
The Measure and Mismeasure of Fairness Sam Corbett-Davies, Johann D. Gaebler, Hamed Nilforoshan, Ravi Shroff, Sharad Goel , 2023. [ abs ][ pdf ][ bib ] [ code ]
Microcanonical Hamiltonian Monte Carlo Jakob Robnik, G. Bruno De Luca, Eva Silverstein, Uroš Seljak , 2023. [ abs ][ pdf ][ bib ] [ code ]
Prediction Equilibrium for Dynamic Network Flows Lukas Graf, Tobias Harks, Kostas Kollias, Michael Markl , 2023. [ abs ][ pdf ][ bib ] [ code ]
Dimension Reduction and MARS Yu Liu, Degui Li, Yingcun Xia , 2023. [ abs ][ pdf ][ bib ]
Nevis'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuan Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc’Aurelio Ranzato , 2023. [ abs ][ pdf ][ bib ] [ code ]
Fast Screening Rules for Optimal Design via Quadratic Lasso Reformulation Guillaume Sagnol, Luc Pronzato , 2023. [ abs ][ pdf ][ bib ] [ code ]
Multi-Consensus Decentralized Accelerated Gradient Descent Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang , 2023. [ abs ][ pdf ][ bib ]
Continuous-in-time Limit for Bayesian Bandits Yuhua Zhu, Zachary Izzo, Lexing Ying , 2023. [ abs ][ pdf ][ bib ]
Two Sample Testing in High Dimension via Maximum Mean Discrepancy Hanjia Gao, Xiaofeng Shao , 2023. [ abs ][ pdf ][ bib ]
Random Feature Amplification: Feature Learning and Generalization in Neural Networks Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett , 2023. [ abs ][ pdf ][ bib ]
Pivotal Estimation of Linear Discriminant Analysis in High Dimensions Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao , 2023. [ abs ][ pdf ][ bib ]
Learning Optimal Feedback Operators and their Sparse Polynomial Approximations Karl Kunisch, Donato Vásquez-Varas, Daniel Walter , 2023. [ abs ][ pdf ][ bib ]
Sensitivity-Free Gradient Descent Algorithms Ion Matei, Maksym Zhenirovskyy, Johan de Kleer, John Maxwell , 2023. [ abs ][ pdf ][ bib ]
A PDE approach for regret bounds under partial monitoring Erhan Bayraktar, Ibrahim Ekren, Xin Zhang , 2023. [ abs ][ pdf ][ bib ]
A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning Arrasy Rahman, Ignacio Carlucho, Niklas Höpner, Stefano V. Albrecht , 2023. [ abs ][ pdf ][ bib ] [ code ]
Causal Bandits for Linear Structural Equation Models Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, Ali Tajer , 2023. [ abs ][ pdf ][ bib ]
High-Dimensional Inference for Generalized Linear Models with Hidden Confounding Jing Ouyang, Kean Ming Tan, Gongjun Xu , 2023. [ abs ][ pdf ][ bib ]
Weibull Racing Survival Analysis with Competing Events, Left Truncation, and Time-Varying Covariates Quan Zhang, Yanxun Xu, Mei-Cheng Wang, Mingyuan Zhou , 2023. [ abs ][ pdf ][ bib ]
Erratum: Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm Louis-Philippe Vignault, Audrey Durand, Pascal Germain , 2023. [ abs ][ pdf ][ bib ]
Augmented Transfer Regression Learning with Semi-non-parametric Nuisance Models Molei Liu, Yi Zhang, Katherine P. Liao, Tianxi Cai , 2023. [ abs ][ pdf ][ bib ]
From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms Weijie Zheng, Benjamin Doerr , 2023. [ abs ][ pdf ][ bib ]
A Unified Analysis of Multi-task Functional Linear Regression Models with Manifold Constraint and Composite Quadratic Penalty Shiyuan He, Hanxuan Ye, Kejun He , 2023. [ abs ][ pdf ][ bib ]
Deletion and Insertion Tests in Regression Models Naofumi Hama, Masayoshi Mase, Art B. Owen , 2023. [ abs ][ pdf ][ bib ]
Deep Neural Networks with Dependent Weights: Gaussian Process Mixture Limit, Heavy Tails, Sparsity and Compressibility Hoil Lee, Fadhel Ayed, Paul Jung, Juho Lee, Hongseok Yang, Francois Caron , 2023. [ abs ][ pdf ][ bib ] [ code ]
A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits Yasin Abbasi-Yadkori, András György, Nevena Lazić , 2023. [ abs ][ pdf ][ bib ]
Universal Approximation Property of Invertible Neural Networks Isao Ishikawa, Takeshi Teshima, Koichi Tojo, Kenta Oono, Masahiro Ikeda, Masashi Sugiyama , 2023. [ abs ][ pdf ][ bib ]
Low Tree-Rank Bayesian Vector Autoregression Models Leo L Duan, Zeyu Yuwen, George Michailidis, Zhengwu Zhang , 2023. [ abs ][ pdf ][ bib ] [ code ]
Generic Unsupervised Optimization for a Latent Variable Model With Exponential Family Observables Hamid Mousavi, Jakob Drefs, Florian Hirschberger, Jörg Lücke , 2023. [ abs ][ pdf ][ bib ] [ code ]
A Complete Characterization of Linear Estimators for Offline Policy Evaluation Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, Sham Kakade , 2023. [ abs ][ pdf ][ bib ]
Near-Optimal Weighted Matrix Completion Oscar López , 2023. [ abs ][ pdf ][ bib ]
Community models for networks observed through edge nominations Tianxi Li, Elizaveta Levina, Ji Zhu , 2023. [ abs ][ pdf ][ bib ] [ code ]
The Bayesian Learning Rule Mohammad Emtiyaz Khan, Håvard Rue , 2023. [ abs ][ pdf ][ bib ]
Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD Kun Yuan, Sulaiman A. Alghunaim, Xinmeng Huang , 2023. [ abs ][ pdf ][ bib ]
Sparse Markov Models for High-dimensional Inference Guilherme Ost, Daniel Y. Takahashi , 2023. [ abs ][ pdf ][ bib ]
Distinguishing Cause and Effect in Bivariate Structural Causal Models: A Systematic Investigation Christoph Käding, Jakob Runge , 2023. [ abs ][ pdf ][ bib ]
Elastic Gradient Descent, an Iterative Optimization Method Approximating the Solution Paths of the Elastic Net Oskar Allerbo, Johan Jonasson, Rebecka Jörnsten , 2023. [ abs ][ pdf ][ bib ] [ code ]
On Biased Compression for Distributed Learning Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan , 2023. [ abs ][ pdf ][ bib ]
Adaptive Clustering Using Kernel Density Estimators Ingo Steinwart, Bharath K. Sriperumbudur, Philipp Thomann , 2023. [ abs ][ pdf ][ bib ]
A Continuous-time Stochastic Gradient Descent Method for Continuous Data Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb , 2023. [ abs ][ pdf ][ bib ]
Online Non-stochastic Control with Partial Feedback Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou , 2023. [ abs ][ pdf ][ bib ]
Distributed Sparse Regression via Penalization Yao Ji, Gesualdo Scutari, Ying Sun, Harsha Honnappa , 2023. [ abs ][ pdf ][ bib ]
Causal Discovery with Unobserved Confounding and Non-Gaussian Data Y. Samuel Wang, Mathias Drton , 2023. [ abs ][ pdf ][ bib ]
Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation Xiao-Tong Yuan, Ping Li , 2023. [ abs ][ pdf ][ bib ]
Dynamic Ranking with the BTL Model: A Nearest Neighbor based Rank Centrality Method Eglantine Karlé, Hemant Tyagi , 2023. [ abs ][ pdf ][ bib ] [ code ]
Revisiting minimum description length complexity in overparameterized models Raaz Dwivedi, Chandan Singh, Bin Yu, Martin Wainwright , 2023. [ abs ][ pdf ][ bib ] [ code ]
Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach Dimitris Bertsimas, Ryan Cory-Wright, Nicholas A. G. Johnson , 2023. [ abs ][ pdf ][ bib ] [ code ]
On the Estimation of Derivatives Using Plug-in Kernel Ridge Regression Estimators Zejian Liu, Meng Li , 2023. [ abs ][ pdf ][ bib ]
Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction Jue Hou, Zijian Guo, Tianxi Cai , 2023. [ abs ][ pdf ][ bib ]
ProtoryNet - Interpretable Text Classification Via Prototype Trajectories Dat Hong, Tong Wang, Stephen Baek , 2023. [ abs ][ pdf ][ bib ] [ code ]
Distributed Algorithms for U-statistics-based Empirical Risk Minimization Lanjue Chen, Alan T.K. Wan, Shuyi Zhang, Yong Zhou , 2023. [ abs ][ pdf ][ bib ]
Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training? Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su , 2023. [ abs ][ pdf ][ bib ]
Nearest Neighbor Dirichlet Mixtures Shounak Chattopadhyay, Antik Chakraborty, David B. Dunson , 2023. [ abs ][ pdf ][ bib ] [ code ]
Learning to Rank under Multinomial Logit Choice James A. Grant, David S. Leslie , 2023. [ abs ][ pdf ][ bib ]
Scalable high-dimensional Bayesian varying coefficient models with unknown within-subject covariance Ray Bai, Mary R. Boland, Yong Chen , 2023. [ abs ][ pdf ][ bib ] [ code ]
Multi-view Collaborative Gaussian Process Dynamical Systems Shiliang Sun, Jingjing Fei, Jing Zhao, Liang Mao , 2023. [ abs ][ pdf ][ bib ]
Fairlearn: Assessing and Improving Fairness of AI Systems Hilde Weerts, Miroslav Dudík, Richard Edgar, Adrin Jalali, Roman Lutz, Michael Madaio , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White , 2023. [ abs ][ pdf ][ bib ] [ code ]
Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures Mike Heddes, Igor Nunes, Pere Vergés, Denis Kleyko, Danny Abraham, Tony Givargis, Alexandru Nicolau, Alexander Veidenbaum , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
skrl: Modular and Flexible Library for Reinforcement Learning Antonio Serrano-Muñoz, Dimitrios Chrysostomou, Simon Bøgh, Nestor Arana-Arexolaleiba , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model Alexandra Sasha Luccioni, Sylvain Viguier, Anne-Laure Ligozat , 2023. [ abs ][ pdf ][ bib ] [ code ]
Adaptive False Discovery Rate Control with Privacy Guarantee Xintao Xia, Zhanrui Cai , 2023. [ abs ][ pdf ][ bib ]
Atlas: Few-shot Learning with Retrieval Augmented Language Models Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave , 2023. [ abs ][ pdf ][ bib ] [ code ]
Convex Reinforcement Learning in Finite Trials Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli , 2023. [ abs ][ pdf ][ bib ]
Unbiased Multilevel Monte Carlo Methods for Intractable Distributions: MLMC Meets MCMC Tianze Wang, Guanyang Wang , 2023. [ abs ][ pdf ][ bib ] [ code ]
Improving multiple-try Metropolis with local balancing Philippe Gagnon, Florian Maire, Giacomo Zanella , 2023. [ abs ][ pdf ][ bib ]
Importance Sparsification for Sinkhorn Algorithm Mengyu Li, Jun Yu, Tao Li, Cheng Meng , 2023. [ abs ][ pdf ][ bib ] [ code ]
Graph Attention Retrospective Kimon Fountoulakis, Amit Levi, Shenghao Yang, Aseem Baranwal, Aukosh Jagannath , 2023. [ abs ][ pdf ][ bib ] [ code ]
Confidence Intervals and Hypothesis Testing for High-dimensional Quantile Regression: Convolution Smoothing and Debiasing Yibo Yan, Xiaozhou Wang, Riquan Zhang , 2023. [ abs ][ pdf ][ bib ]
Selection by Prediction with Conformal p-values Ying Jin, Emmanuel J. Candes , 2023. [ abs ][ pdf ][ bib ] [ code ]
Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics Kamélia Daudel, Joe Benton, Yuyang Shi, Arnaud Doucet , 2023. [ abs ][ pdf ][ bib ]
Sparse Graph Learning from Spatiotemporal Time Series Andrea Cini, Daniele Zambon, Cesare Alippi , 2023. [ abs ][ pdf ][ bib ]
Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning Zhuang Yang , 2023. [ abs ][ pdf ][ bib ]
PaLM: Scaling Language Modeling with Pathways Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel , 2023. [ abs ][ pdf ][ bib ]
Leaky Hockey Stick Loss: The First Negatively Divergent Margin-based Loss Function for Classification Oh-Ran Kwon, Hui Zou , 2023. [ abs ][ pdf ][ bib ] [ code ]
Efficient Computation of Rankings from Pairwise Comparisons M. E. J. Newman , 2023. [ abs ][ pdf ][ bib ]
Scalable Computation of Causal Bounds Madhumitha Shridharan, Garud Iyengar , 2023. [ abs ][ pdf ][ bib ]
Neural Q-learning for solving PDEs Samuel N. Cohen, Deqing Jiang, Justin Sirignano , 2023. [ abs ][ pdf ][ bib ] [ code ]
Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models Ziyue Wang, Zhiqiang Tan , 2023. [ abs ][ pdf ][ bib ] [ code ]
MultiZoo and MultiBench: A Standardized Toolkit for Multimodal Deep Learning Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Strategic Knowledge Transfer Max Olan Smith, Thomas Anthony, Michael P. Wellman , 2023. [ abs ][ pdf ][ bib ]
Lifted Bregman Training of Neural Networks Xiaoyu Wang, Martin Benning , 2023. [ abs ][ pdf ][ bib ] [ code ]
Statistical Comparisons of Classifiers by Generalized Stochastic Dominance Christoph Jansen, Malte Nalenz, Georg Schollmeyer, Thomas Augustin , 2023. [ abs ][ pdf ][ bib ]
Sample Complexity for Distributionally Robust Learning under chi-square divergence Zhengyu Zhou, Weiwei Liu , 2023. [ abs ][ pdf ][ bib ]
Interpretable and Fair Boolean Rule Sets via Column Generation Connor Lawless, Sanjeeb Dash, Oktay Gunluk, Dennis Wei , 2023. [ abs ][ pdf ][ bib ]
On the Optimality of Nuclear-norm-based Matrix Completion for Problems with Smooth Non-linear Structure Yunhua Xiang, Tianyu Zhang, Xu Wang, Ali Shojaie, Noah Simon , 2023. [ abs ][ pdf ][ bib ]
Autoregressive Networks Binyan Jiang, Jialiang Li, Qiwei Yao , 2023. [ abs ][ pdf ][ bib ]
Merlion: End-to-End Machine Learning for Time Series Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Limits of Dense Simplicial Complexes T. Mitchell Roddenberry, Santiago Segarra , 2023. [ abs ][ pdf ][ bib ]
RankSEG: A Consistent Ranking-based Framework for Segmentation Ben Dai, Chunlin Li , 2023. [ abs ][ pdf ][ bib ] [ code ]
Conditional Distribution Function Estimation Using Neural Networks for Censored and Uncensored Data Bingqing Hu, Bin Nan , 2023. [ abs ][ pdf ][ bib ] [ code ]
Single Timescale Actor-Critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees Mo Zhou, Jianfeng Lu , 2023. [ abs ][ pdf ][ bib ] [ code ]
Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices Doudou Zhou, Tianxi Cai, Junwei Lu , 2023. [ abs ][ pdf ][ bib ] [ code ]
A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee , 2023. [ abs ][ pdf ][ bib ] [ code ]
Functional L-Optimality Subsampling for Functional Generalized Linear Models with Massive Data Hua Liu, Jinhong You, Jiguo Cao , 2023. [ abs ][ pdf ][ bib ] [ code ]
Adaptation Augmented Model-based Policy Optimization Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang , 2023. [ abs ][ pdf ][ bib ]
GANs as Gradient Flows that Converge Yu-Jui Huang, Yuchong Zhang , 2023. [ abs ][ pdf ][ bib ]
Random Forests for Change Point Detection Malte Londschien, Peter Bühlmann, Solt Kovács , 2023. [ abs ][ pdf ][ bib ] [ code ]
Least Squares Model Averaging for Distributed Data Haili Zhang, Zhaobo Liu, Guohua Zou , 2023. [ abs ][ pdf ][ bib ]
An Empirical Investigation of the Role of Pre-training in Lifelong Learning Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell , 2023. [ abs ][ pdf ][ bib ] [ code ]
Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs with Applications Marcel Wienöbst, Max Bannach, Maciej Liśkiewicz , 2023. [ abs ][ pdf ][ bib ] [ code ]
An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity Wei Liu, Xin Liu, Xiaojun Chen , 2023. [ abs ][ pdf ][ bib ]
Entropic Fictitious Play for Mean Field Optimization Problem Fan Chen, Zhenjie Ren, Songbo Wang , 2023. [ abs ][ pdf ][ bib ]
GFlowNet Foundations Yoshua Bengio, Salem Lahlou, Tristan Deleu, Edward J. Hu, Mo Tiwari, Emmanuel Bengio , 2023. [ abs ][ pdf ][ bib ]
LibMTL: A Python Library for Deep Multi-Task Learning Baijiong Lin, Yu Zhang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Minimax Risk Classifiers with 0-1 Loss Santiago Mazuelas, Mauricio Romero, Peter Grunwald , 2023. [ abs ][ pdf ][ bib ]
Augmented Sparsifiers for Generalized Hypergraph Cuts Nate Veldt, Austin R. Benson, Jon Kleinberg , 2023. [ abs ][ pdf ][ bib ] [ code ]
Non-stationary Online Learning with Memory and Non-stochastic Control Peng Zhao, Yu-Hu Yan, Yu-Xiang Wang, Zhi-Hua Zhou , 2023. [ abs ][ pdf ][ bib ]
L0Learn: A Scalable Package for Sparse Learning using L0 Regularization Hussein Hazimeh, Rahul Mazumder, Tim Nonet , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Buffered Asynchronous SGD for Byzantine Learning Yi-Rui Yang, Wu-Jun Li , 2023. [ abs ][ pdf ][ bib ]
A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points Lili Su, Jiaming Xu, Pengkun Yang , 2023. [ abs ][ pdf ][ bib ]
Multiplayer Performative Prediction: Learning in Decision-Dependent Games Adhyyan Narang, Evan Faulkner, Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J. Ratliff , 2023. [ abs ][ pdf ][ bib ] [ code ]
Variational Inverting Network for Statistical Inverse Problems of Partial Differential Equations Junxiong Jia, Yanni Wu, Peijun Li, Deyu Meng , 2023. [ abs ][ pdf ][ bib ]
Model-based Causal Discovery for Zero-Inflated Count Data Junsouk Choi, Yang Ni , 2023. [ abs ][ pdf ][ bib ] [ code ]
Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity Ali Kara, Naci Saldi, Serdar Yüksel , 2023. [ abs ][ pdf ][ bib ]
CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges Adrien Pavao, Isabelle Guyon, Anne-Catherine Letournel, Dinh-Tuan Tran, Xavier Baro, Hugo Jair Escalante, Sergio Escalera, Tyler Thomas, Zhen Xu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency Tetiana Gorbach, Xavier de Luna, Juha Karvanen, Ingeborg Waernbaum , 2023. [ abs ][ pdf ][ bib ] [ code ]
Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data Vaidotas Simkus, Benjamin Rhodes, Michael U. Gutmann , 2023. [ abs ][ pdf ][ bib ] [ code ]
Clustering and Structural Robustness in Causal Diagrams Santtu Tikka, Jouni Helske, Juha Karvanen , 2023. [ abs ][ pdf ][ bib ] [ code ]
MMD Aggregated Two-Sample Test Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton , 2023. [ abs ][ pdf ][ bib ] [ code ]
Divide-and-Conquer Fusion Ryan S.Y. Chan, Murray Pollock, Adam M. Johansen, Gareth O. Roberts , 2023. [ abs ][ pdf ][ bib ]
PAC-learning for Strategic Classification Ravi Sundaram, Anil Vullikanti, Haifeng Xu, Fan Yao , 2023. [ abs ][ pdf ][ bib ]
Insights into Ordinal Embedding Algorithms: A Systematic Evaluation Leena Chennuru Vankadara, Michael Lohaus, Siavash Haghiri, Faiz Ul Wahab, Ulrike von Luxburg , 2023. [ abs ][ pdf ][ bib ] [ code ]
Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip, Luca Rendsburg, Maximilian Teegen, Ulrike von Luxburg , 2023. [ abs ][ pdf ][ bib ] [ code ]
Random Feature Neural Networks Learn Black-Scholes Type PDEs Without Curse of Dimensionality Lukas Gonon , 2023. [ abs ][ pdf ][ bib ]
The Proximal ID Algorithm Ilya Shpitser, Zach Wood-Doughty, Eric J. Tchetgen Tchetgen , 2023. [ abs ][ pdf ][ bib ] [ code ]
Quantifying Network Similarity using Graph Cumulants Gecia Bravo-Hermsdorff, Lee M. Gunderson, Pierre-André Maugis, Carey E. Priebe , 2023. [ abs ][ pdf ][ bib ] [ code ]
Learning an Explicit Hyper-parameter Prediction Function Conditioned on Tasks Jun Shu, Deyu Meng, Zongben Xu , 2023. [ abs ][ pdf ][ bib ] [ code ]
On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity Rodrigue Siry, Ryan Webster, Loic Simon, Julien Rabin , 2023. [ abs ][ pdf ][ bib ]
Metrizing Weak Convergence with Maximum Mean Discrepancies Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, Lester Mackey , 2023. [ abs ][ pdf ][ bib ]
Quasi-Equivalence between Width and Depth of Neural Networks Fenglei Fan, Rongjie Lai, Ge Wang , 2023. [ abs ][ pdf ][ bib ]
Naive regression requires weaker assumptions than factor models to adjust for multiple cause confounding Justin Grimmer, Dean Knox, Brandon Stewart , 2023. [ abs ][ pdf ][ bib ]
Factor Graph Neural Networks Zhen Zhang, Mohammed Haroon Dupty, Fan Wu, Javen Qinfeng Shi, Wee Sun Lee , 2023. [ abs ][ pdf ][ bib ] [ code ]
Dropout Training is Distributionally Robust Optimal José Blanchet, Yang Kang, José Luis Montiel Olea, Viet Anh Nguyen, Xuhui Zhang , 2023. [ abs ][ pdf ][ bib ]
Variational Inference for Deblending Crowded Starfields Runjing Liu, Jon D. McAuliffe, Jeffrey Regier, The LSST Dark Energy Science Collaboration , 2023. [ abs ][ pdf ][ bib ] [ code ]
F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha , 2023. [ abs ][ pdf ][ bib ]
Comprehensive Algorithm Portfolio Evaluation using Item Response Theory Sevvandi Kandanaarachchi, Kate Smith-Miles , 2023. [ abs ][ pdf ][ bib ] [ code ]
Evaluating Instrument Validity using the Principle of Independent Mechanisms Patrick F. Burauel , 2023. [ abs ][ pdf ][ bib ]
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity Kaiqing Zhang, Sham M. Kakade, Tamer Basar, Lin F. Yang , 2023. [ abs ][ pdf ][ bib ]
Posterior Consistency for Bayesian Relevance Vector Machines Xiao Fang, Malay Ghosh , 2023. [ abs ][ pdf ][ bib ]
From Classification Accuracy to Proper Scoring Rules: Elicitability of Probabilistic Top List Predictions Johannes Resin , 2023. [ abs ][ pdf ][ bib ]
Beyond the Golden Ratio for Variational Inequality Algorithms Ahmet Alacaoglu, Axel Böhm, Yura Malitsky , 2023. [ abs ][ pdf ][ bib ] [ code ]
Incremental Learning in Diagonal Linear Networks Raphaël Berthier , 2023. [ abs ][ pdf ][ bib ]
Small Transformers Compute Universal Metric Embeddings Anastasis Kratsios, Valentin Debarnot, Ivan Dokmanić , 2023. [ abs ][ pdf ][ bib ] [ code ]
DART: Distance Assisted Recursive Testing Xuechan Li, Anthony D. Sung, Jichun Xie , 2023. [ abs ][ pdf ][ bib ]
Inference on the Change Point under a High Dimensional Covariance Shift Abhishek Kaul, Hongjin Zhang, Konstantinos Tsampourakis, George Michailidis , 2023. [ abs ][ pdf ][ bib ]
Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo , 2023. [ abs ][ pdf ][ bib ] [ code ]
A Parameter-Free Conditional Gradient Method for Composite Minimization under Hölder Condition Masaru Ito, Zhaosong Lu, Chuan He , 2023. [ abs ][ pdf ][ bib ]
Robust Methods for High-Dimensional Linear Learning Ibrahim Merad, Stéphane Gaïffas , 2023. [ abs ][ pdf ][ bib ]
A Framework and Benchmark for Deep Batch Active Learning for Regression David Holzmüller, Viktor Zaverkin, Johannes Kästner, Ingo Steinwart , 2023. [ abs ][ pdf ][ bib ] [ code ]
Preconditioned Gradient Descent for Overparameterized Nonconvex Burer--Monteiro Factorization with Global Optimality Certification Gavin Zhang, Salar Fattahi, Richard Y. Zhang , 2023. [ abs ][ pdf ][ bib ]
Flexible Model Aggregation for Quantile Regression Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani , 2023. [ abs ][ pdf ][ bib ] [ code ]
q-Learning in Continuous Time Yanwei Jia, Xun Yu Zhou , 2023. [ abs ][ pdf ][ bib ] [ code ]
Multivariate Soft Rank via Entropy-Regularized Optimal Transport: Sample Efficiency and Generative Modeling Shoaib Bin Masud, Matthew Werenski, James M. Murphy, Shuchin Aeron , 2023. [ abs ][ pdf ][ bib ] [ code ]
Infinite-dimensional optimization and Bayesian nonparametric learning of stochastic differential equations Arnab Ganguly, Riten Mitra, Jinpu Zhou , 2023. [ abs ][ pdf ][ bib ]
Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees Hamid Reza Feyzmahdavian, Mikael Johansson , 2023. [ abs ][ pdf ][ bib ]
Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(epsilon^(-7/4)) Complexity Huan Li, Zhouchen Lin , 2023. [ abs ][ pdf ][ bib ] [ code ]
Integrating Random Effects in Deep Neural Networks Giora Simchoni, Saharon Rosset , 2023. [ abs ][ pdf ][ bib ] [ code ]
Adaptive Data Depth via Multi-Armed Bandits Tavor Baharav, Tze Leung Lai , 2023. [ abs ][ pdf ][ bib ]
Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees Jonathan Brophy, Zayd Hammoudeh, Daniel Lowd , 2023. [ abs ][ pdf ][ bib ] [ code ]
Consistent Model-based Clustering using the Quasi-Bernoulli Stick-breaking Process Cheng Zeng, Jeffrey W Miller, Leo L Duan , 2023. [ abs ][ pdf ][ bib ] [ code ]
Selective inference for k-means clustering Yiqun T. Chen, Daniela M. Witten , 2023. [ abs ][ pdf ][ bib ] [ code ]
Generalization error bounds for multiclass sparse linear classifiers Tomer Levy, Felix Abramovich , 2023. [ abs ][ pdf ][ bib ]
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, Weinan Zhang , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Controlling Wasserstein Distances by Kernel Norms with Application to Compressive Statistical Learning Titouan Vayer, Rémi Gribonval , 2023. [ abs ][ pdf ][ bib ]
Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition Zhishuai Guo, Yan Yan, Zhuoning Yuan, Tianbao Yang , 2023. [ abs ][ pdf ][ bib ]
Stochastic Optimization under Distributional Drift Joshua Cutler, Dmitriy Drusvyatskiy, Zaid Harchaoui , 2023. [ abs ][ pdf ][ bib ]
Off-Policy Actor-Critic with Emphatic Weightings Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White , 2023. [ abs ][ pdf ][ bib ] [ code ]
Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning Bokun Wang, Zhuoning Yuan, Yiming Ying, Tianbao Yang , 2023. [ abs ][ pdf ][ bib ] [ code ]
Escaping The Curse of Dimensionality in Bayesian Model-Based Clustering Noirrit Kiran Chandra, Antonio Canale, David B. Dunson , 2023. [ abs ][ pdf ][ bib ]
Large sample spectral analysis of graph-based multi-manifold clustering Nicolas Garcia Trillos, Pengfei He, Chenghui Li , 2023. [ abs ][ pdf ][ bib ] [ code ]
On Tilted Losses in Machine Learning: Theory and Applications Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith , 2023. [ abs ][ pdf ][ bib ] [ code ]
Optimal Convergence Rates for Distributed Nystroem Approximation Jian Li, Yong Liu, Weiping Wang , 2023. [ abs ][ pdf ][ bib ] [ code ]
Jump Interval-Learning for Individualized Decision Making with Continuous Treatments Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu , 2023. [ abs ][ pdf ][ bib ] [ code ]
Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games Ben Hambly, Renyuan Xu, Huining Yang , 2023. [ abs ][ pdf ][ bib ]
Asymptotics of Network Embeddings Learned via Subsampling Andrew Davison, Morgane Austern , 2023. [ abs ][ pdf ][ bib ] [ code ]
Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks Hui Jin, Guido Montufar , 2023. [ abs ][ pdf ][ bib ] [ code ]
Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection Wenhao Li, Ningyuan Chen, L. Jeff Hong , 2023. [ abs ][ pdf ][ bib ]
Sparse GCA and Thresholded Gradient Descent Sheng Gao, Zongming Ma , 2023. [ abs ][ pdf ][ bib ]
MARS: A Second-Order Reduction Algorithm for High-Dimensional Sparse Precision Matrices Estimation Qian Li, Binyan Jiang, Defeng Sun , 2023. [ abs ][ pdf ][ bib ]
Exploiting Discovered Regression Discontinuities to Debias Conditioned-on-observable Estimators Benjamin Jakubowski, Sriram Somanchi, Edward McFowland III, Daniel B. Neill , 2023. [ abs ][ pdf ][ bib ] [ code ]
Generalized Linear Models in Non-interactive Local Differential Privacy with Public Data Di Wang, Lijie Hu, Huanyu Zhang, Marco Gaboardi, Jinhui Xu , 2023. [ abs ][ pdf ][ bib ]
A Rigorous Information-Theoretic Definition of Redundancy and Relevancy in Feature Selection Based on (Partial) Information Decomposition Patricia Wollstadt, Sebastian Schmitt, Michael Wibral , 2023. [ abs ][ pdf ][ bib ]
Combinatorial Optimization and Reasoning with Graph Neural Networks Quentin Cappart, Didier Chételat, Elias B. Khalil, Andrea Lodi, Christopher Morris, Petar Veličković , 2023. [ abs ][ pdf ][ bib ]
A First Look into the Carbon Footprint of Federated Learning Xinchi Qiu, Titouan Parcollet, Javier Fernandez-Marques, Pedro P. B. Gusmao, Yan Gao, Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas D. Lane , 2023. [ abs ][ pdf ][ bib ]
An Eigenmodel for Dynamic Multilayer Networks Joshua Daniel Loyal, Yuguo Chen , 2023. [ abs ][ pdf ][ bib ] [ code ]
Graph Clustering with Graph Neural Networks Anton Tsitsulin, John Palowitch, Bryan Perozzi, Emmanuel Müller , 2023. [ abs ][ pdf ][ bib ] [ code ]
Euler-Lagrange Analysis of Generative Adversarial Networks Siddarth Asokan, Chandra Sekhar Seelamantula , 2023. [ abs ][ pdf ][ bib ] [ code ]
Statistical Robustness of Empirical Risks in Machine Learning Shaoyan Guo, Huifu Xu, Liwei Zhang , 2023. [ abs ][ pdf ][ bib ]
HiGrad: Uncertainty Quantification for Online Learning and Stochastic Approximation Weijie J. Su, Yuancheng Zhu , 2023. [ abs ][ pdf ][ bib ]
Benign overfitting in ridge regression Alexander Tsigler, Peter L. Bartlett , 2023. [ abs ][ pdf ][ bib ]
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities Brian R. Bartoldson, Bhavya Kailkhura, Davis Blalock , 2023. [ abs ][ pdf ][ bib ]
Minimal Width for Universal Property of Deep RNN Chang hoon Song, Geonho Hwang, Jun ho Lee, Myungjoo Kang , 2023. [ abs ][ pdf ][ bib ]
Maximum likelihood estimation in Gaussian process regression is ill-posed Toni Karvonen, Chris J. Oates , 2023. [ abs ][ pdf ][ bib ]
An Annotated Graph Model with Differential Degree Heterogeneity for Directed Networks Stefan Stein, Chenlei Leng , 2023. [ abs ][ pdf ][ bib ]
A Unified Framework for Optimization-Based Graph Coarsening Manoj Kumar, Anurag Sharma, Sandeep Kumar , 2023. [ abs ][ pdf ][ bib ] [ code ]
Deep linear networks can benignly overfit when shallow ones do Niladri S. Chatterji, Philip M. Long , 2023. [ abs ][ pdf ][ bib ] [ code ]
SQLFlow: An Extensible Toolkit Integrating DB and AI Jun Zhou, Ke Zhang, Lin Wang, Hua Wu, Yi Wang, ChaoChao Chen , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Learning Good State and Action Representations for Markov Decision Process via Tensor Decomposition Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Mengdi Wang, Anru R. Zhang , 2023. [ abs ][ pdf ][ bib ]
Generalization Bounds for Adversarial Contrastive Learning Xin Zou, Weiwei Liu , 2023. [ abs ][ pdf ][ bib ]
The Implicit Bias of Benign Overfitting Ohad Shamir , 2023. [ abs ][ pdf ][ bib ]
The Hyperspherical Geometry of Community Detection: Modularity as a Distance Martijn Gösgens, Remco van der Hofstad, Nelly Litvak , 2023. [ abs ][ pdf ][ bib ] [ code ]
FLIP: A Utility Preserving Privacy Mechanism for Time Series Tucker McElroy, Anindya Roy, Gaurab Hore , 2023. [ abs ][ pdf ][ bib ]
A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates Yann Fraboni, Richard Vidal, Laetitia Kameni, Marco Lorenzi , 2023. [ abs ][ pdf ][ bib ] [ code ]
Dimensionless machine learning: Imposing exact units equivariance Soledad Villar, Weichi Yao, David W. Hogg, Ben Blum-Smith, Bianca Dumitrascu , 2023. [ abs ][ pdf ][ bib ]
Bayesian Calibration of Imperfect Computer Models using Physics-Informed Priors Michail Spitieris, Ingelin Steinsland , 2023. [ abs ][ pdf ][ bib ] [ code ]
Risk Bounds for Positive-Unlabeled Learning Under the Selected At Random Assumption Olivier Coudray, Christine Keribin, Pascal Massart, Patrick Pamphile , 2023. [ abs ][ pdf ][ bib ]
Concentration analysis of multivariate elliptic diffusions Lukas Trottner, Cathrine Aeckerle-Willems, Claudia Strauch , 2023. [ abs ][ pdf ][ bib ]
Knowledge Hypergraph Embedding Meets Relational Algebra Bahare Fatemi, Perouz Taslakian, David Vazquez, David Poole , 2023. [ abs ][ pdf ][ bib ] [ code ]
Intrinsic Gaussian Process on Unknown Manifolds with Probabilistic Metrics Mu Niu, Zhenwen Dai, Pokman Cheung, Yizhu Wang , 2023. [ abs ][ pdf ][ bib ]
Sparse Training with Lipschitz Continuous Loss Functions and a Weighted Group L0-norm Constraint Michael R. Metel , 2023. [ abs ][ pdf ][ bib ]
Learning Optimal Group-structured Individualized Treatment Rules with Many Treatments Haixu Ma, Donglin Zeng, Yufeng Liu , 2023. [ abs ][ pdf ][ bib ]
Inference for Gaussian Processes with Matern Covariogram on Compact Riemannian Manifolds Didong Li, Wenpin Tang, Sudipto Banerjee , 2023. [ abs ][ pdf ][ bib ]
FedLab: A Flexible Federated Learning Framework Dun Zeng, Siqi Liang, Xiangjing Hu, Hui Wang, Zenglin Xu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity Artem Vysogorets, Julia Kempe , 2023. [ abs ][ pdf ][ bib ]
An Analysis of Robustness of Non-Lipschitz Networks Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, Hongyang Zhang , 2023. [ abs ][ pdf ][ bib ] [ code ]
Fitting Autoregressive Graph Generative Models through Maximum Likelihood Estimation Xu Han, Xiaohui Chen, Francisco J. R. Ruiz, Li-Ping Liu , 2023. [ abs ][ pdf ][ bib ] [ code ]
Global Convergence of Sub-gradient Method for Robust Matrix Recovery: Small Initialization, Noisy Measurements, and Over-parameterization Jianhao Ma, Salar Fattahi , 2023. [ abs ][ pdf ][ bib ]
Statistical Inference for Noisy Incomplete Binary Matrix Yunxiao Chen, Chengcheng Li, Jing Ouyang, Gongjun Xu , 2023. [ abs ][ pdf ][ bib ]
Faith-Shap: The Faithful Shapley Interaction Index Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar , 2023. [ abs ][ pdf ][ bib ]
Decentralized Learning: Theoretical Optimality and Practical Improvements Yucheng Lu, Christopher De Sa , 2023. [ abs ][ pdf ][ bib ]
Non-Asymptotic Guarantees for Robust Statistical Learning under Infinite Variance Assumption Lihu Xu, Fang Yao, Qiuran Yao, Huiming Zhang , 2023. [ abs ][ pdf ][ bib ]
Recursive Quantile Estimation: Non-Asymptotic Confidence Bounds Likai Chen, Georg Keilbar, Wei Biao Wu , 2023. [ abs ][ pdf ][ bib ]
Outlier-Robust Subsampling Techniques for Persistent Homology Bernadette J. Stolz , 2023. [ abs ][ pdf ][ bib ] [ code ]
Neural Operator: Learning Maps Between Function Spaces With Applications to PDEs Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar , 2023. [ abs ][ pdf ][ bib ] [ code ]
Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data Yuqi Gu, Elena E. Erosheva, Gongjun Xu, David B. Dunson , 2023. [ abs ][ pdf ][ bib ]
Gaussian Processes with Errors in Variables: Theory and Computation Shuang Zhou, Debdeep Pati, Tianying Wang, Yun Yang, Raymond J. Carroll , 2023. [ abs ][ pdf ][ bib ]
Learning Partial Differential Equations in Reproducing Kernel Hilbert Spaces George Stepaniants , 2023. [ abs ][ pdf ][ bib ] [ code ]
Doubly Robust Stein-Kernelized Monte Carlo Estimator: Simultaneous Bias-Variance Reduction and Supercanonical Convergence Henry Lam, Haofeng Zhang , 2023. [ abs ][ pdf ][ bib ]
Online Optimization over Riemannian Manifolds Xi Wang, Zhipeng Tu, Yiguang Hong, Yingyi Wu, Guodong Shi , 2023. [ abs ][ pdf ][ bib ] [ code ]
Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees William J. Wilkinson, Simo Särkkä, Arno Solin , 2023. [ abs ][ pdf ][ bib ] [ code ]
Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality Ning Ning, Edward L. Ionides , 2023. [ abs ][ pdf ][ bib ]
Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics Gaetano Romano, Idris A. Eckley, Paul Fearnhead, Guillem Rigaill , 2023. [ abs ][ pdf ][ bib ] [ code ]
Temporal Abstraction in Reinforcement Learning with the Successor Representation Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling , 2023. [ abs ][ pdf ][ bib ]
Approximate Post-Selective Inference for Regression with the Group LASSO Snigdha Panigrahi, Peter W MacDonald, Daniel Kessler , 2023. [ abs ][ pdf ][ bib ]
Towards Learning to Imitate from a Single Video Demonstration Glen Berseth, Florian Golemo, Christopher Pal , 2023. [ abs ][ pdf ][ bib ]
A Likelihood Approach to Nonparametric Estimation of a Singular Distribution Using Deep Generative Models Minwoo Chae, Dongha Kim, Yongdai Kim, Lizhen Lin , 2023. [ abs ][ pdf ][ bib ]
A Randomized Subspace-based Approach for Dimensionality Reduction and Important Variable Selection Di Bo, Hoon Hwangbo, Vinit Sharma, Corey Arndt, Stephanie TerMaath , 2023. [ abs ][ pdf ][ bib ]
Intrinsic Persistent Homology via Density-based Metric Learning Ximena Fernández, Eugenio Borghini, Gabriel Mindlin, Pablo Groisman , 2023. [ abs ][ pdf ][ bib ] [ code ]
Privacy-Aware Rejection Sampling Jordan Awan, Vinayak Rao , 2023. [ abs ][ pdf ][ bib ]
Inference for a Large Directed Acyclic Graph with Unspecified Interventions Chunlin Li, Xiaotong Shen, Wei Pan , 2023. [ abs ][ pdf ][ bib ] [ code ]
How Do You Want Your Greedy: Simultaneous or Repeated? Moran Feldman, Christopher Harshaw, Amin Karbasi , 2023. [ abs ][ pdf ][ bib ] [ code ]
Kernel-Matrix Determinant Estimates from stopped Cholesky Decomposition Simon Bartels, Wouter Boomsma, Jes Frellsen, Damien Garreau , 2023. [ abs ][ pdf ][ bib ] [ code ]
Optimizing ROC Curves with a Sort-Based Surrogate Loss for Binary Classification and Changepoint Detection Jonathan Hillman, Toby Dylan Hocking , 2023. [ abs ][ pdf ][ bib ] [ code ]
When Locally Linear Embedding Hits Boundary Hau-Tieng Wu, Nan Wu , 2023. [ abs ][ pdf ][ bib ]
Distributed Nonparametric Regression Imputation for Missing Response Problems with Large-scale Data Ruoyu Wang, Miaomiao Su, Qihua Wang , 2023. [ abs ][ pdf ][ bib ] [ code ]
Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching Eliezer de Souza da Silva, Tomasz Kuśmierczyk, Marcelo Hartmann, Arto Klami , 2023. [ abs ][ pdf ][ bib ] [ code ]
Posterior Contraction for Deep Gaussian Process Priors Gianluca Finocchio, Johannes Schmidt-Hieber , 2023. [ abs ][ pdf ][ bib ]
Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu , 2023. [ abs ][ pdf ][ bib ] [ code ]
Fundamental limits and algorithms for sparse linear regression with sublinear sparsity Lan V. Truong , 2023. [ abs ][ pdf ][ bib ] [ code ]
On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results Marcelo Arenas, Pablo Barcelo, Leopoldo Bertossi, Mikael Monet , 2023. [ abs ][ pdf ][ bib ]
Monotonic Alpha-divergence Minimisation for Variational Inference Kamélia Daudel, Randal Douc, François Roueff , 2023. [ abs ][ pdf ][ bib ]
Density estimation on low-dimensional manifolds: an inflation-deflation approach Christian Horvat, Jean-Pascal Pfister , 2023. [ abs ][ pdf ][ bib ] [ code ]
Provably Sample-Efficient Model-Free Algorithm for MDPs with Peak Constraints Qinbo Bai, Vaneet Aggarwal, Ather Gattami , 2023. [ abs ][ pdf ][ bib ]
Topological Convolutional Layers for Deep Learning Ephy R. Love, Benjamin Filippenko, Vasileios Maroulas, Gunnar Carlsson , 2023. [ abs ][ pdf ][ bib ]
Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval Yan Shuo Tan, Roman Vershynin , 2023. [ abs ][ pdf ][ bib ]
Tree-AMP: Compositional Inference with Tree Approximate Message Passing Antoine Baker, Florent Krzakala, Benjamin Aubin, Lenka Zdeborová , 2023. [ abs ][ pdf ][ bib ] [ code ]
On the geometry of Stein variational gradient descent Andrew Duncan, Nikolas Nüsken, Lukasz Szpruch , 2023. [ abs ][ pdf ][ bib ]
Kernel-based estimation for partially functional linear model: Minimax rates and randomized sketches Shaogao Lv, Xin He, Junhui Wang , 2023. [ abs ][ pdf ][ bib ]
Contextual Stochastic Block Model: Sharp Thresholds and Contiguity Chen Lu, Subhabrata Sen , 2023. [ abs ][ pdf ][ bib ]
VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback Kirthevasan Kandasamy, Joseph E Gonzalez, Michael I Jordan, Ion Stoica , 2023. [ abs ][ pdf ][ bib ]
Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems Kunal Pattanayak, Vikram Krishnamurthy , 2023. [ abs ][ pdf ][ bib ]
Online Change-Point Detection in High-Dimensional Covariance Structure with Application to Dynamic Networks Lingjun Li, Jun Li , 2023. [ abs ][ pdf ][ bib ]
Convergence Rates of a Class of Multivariate Density Estimation Methods Based on Adaptive Partitioning Linxi Liu, Dangna Li, Wing Hung Wong , 2023. [ abs ][ pdf ][ bib ]
Reinforcement Learning for Joint Optimization of Multiple Rewards Mridul Agarwal, Vaneet Aggarwal , 2023. [ abs ][ pdf ][ bib ]
On the Convergence of Stochastic Gradient Descent with Bandwidth-based Step Size Xiaoyu Wang, Ya-xiang Yuan , 2023. [ abs ][ pdf ][ bib ]
A Group-Theoretic Approach to Computational Abstraction: Symmetry-Driven Hierarchical Clustering Haizi Yu, Igor Mineyev, Lav R. Varshney , 2023. [ abs ][ pdf ][ bib ]
The d-Separation Criterion in Categorical Probability Tobias Fritz, Andreas Klingler , 2023. [ abs ][ pdf ][ bib ]
The multimarginal optimal transport formulation of adversarial multiclass classification Nicolás García Trillos, Matt Jacobs, Jakwang Kim , 2023. [ abs ][ pdf ][ bib ]
Robust Load Balancing with Machine Learned Advice Sara Ahmadian, Hossein Esfandiari, Vahab Mirrokni, Binghui Peng , 2023. [ abs ][ pdf ][ bib ]
Benchmarking Graph Neural Networks Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson , 2023. [ abs ][ pdf ][ bib ] [ code ]
A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness Jeremiah Zhe Liu, Shreyas Padhy, Jie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan , 2023. [ abs ][ pdf ][ bib ] [ code ]
Neural Implicit Flow: a mesh-agnostic dimensionality reduction paradigm of spatio-temporal data Shaowu Pan, Steven L. Brunton, J. Nathan Kutz , 2023. [ abs ][ pdf ][ bib ] [ code ]
On Batch Teaching Without Collusion Shaun Fallat, David Kirkpatrick, Hans U. Simon, Abolghasem Soltani, Sandra Zilles , 2023. [ abs ][ pdf ][ bib ]
Sensing Theorems for Unsupervised Learning in Linear Inverse Problems Julián Tachella, Dongdong Chen, Mike Davies , 2023. [ abs ][ pdf ][ bib ]
First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems Michael I. Jordan, Tianyi Lin, Manolis Zampetakis , 2023. [ abs ][ pdf ][ bib ]
Ridges, Neural Networks, and the Radon Transform Michael Unser , 2023. [ abs ][ pdf ][ bib ]
Label Distribution Changing Learning with Sample Space Expanding Chao Xu, Hong Tao, Jing Zhang, Dewen Hu, Chenping Hou , 2023. [ abs ][ pdf ][ bib ]
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers? Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan , 2023. [ abs ][ pdf ][ bib ]
Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond Anna Hedström, Leander Weber, Daniel Krakowczyk, Dilyara Bareeva, Franz Motzkus, Wojciech Samek, Sebastian Lapuschkin, Marina M.-C. Höhne , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Gap Minimization for Knowledge Sharing and Transfer Boyu Wang, Jorge A. Mendez, Changjian Shui, Fan Zhou, Di Wu, Gezheng Xu, Christian Gagné, Eric Eaton , 2023. [ abs ][ pdf ][ bib ] [ code ]
Sparse PCA: a Geometric Approach Dimitris Bertsimas, Driss Lahlou Kitane , 2023. [ abs ][ pdf ][ bib ]
Labels, Information, and Computation: Efficient Learning Using Sufficient Labels Shiyu Duan, Spencer Chang, Jose C. Principe , 2023. [ abs ][ pdf ][ bib ]
Attacks against Federated Learning Defense Systems and their Mitigation Cody Lewis, Vijay Varadharajan, Nasimul Noman , 2023. [ abs ][ pdf ][ bib ] [ code ]
HiClass: a Python Library for Local Hierarchical Classification Compatible with Scikit-learn Fábio M. Miranda, Niklas Köhnecke, Bernhard Y. Renard , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping Xuran Meng, Jeff Yao , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time Raj Agrawal, Tamara Broderick , 2023. [ abs ][ pdf ][ bib ]
Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels Hao Wang, Rui Gao, Flavio P. Calmon , 2023. [ abs ][ pdf ][ bib ]
Discrete Variational Calculus for Accelerated Optimization Cédric M. Campos, Alejandro Mahillo, David Martín de Diego , 2023. [ abs ][ pdf ][ bib ] [ code ]
Calibrated Multiple-Output Quantile Regression with Representation Learning Shai Feldman, Stephen Bates, Yaniv Romano , 2023. [ abs ][ pdf ][ bib ] [ code ]
Bayesian Data Selection Eli N. Weinstein, Jeffrey W. Miller , 2023. [ abs ][ pdf ][ bib ] [ code ]
Lower Bounds and Accelerated Algorithms for Bilevel Optimization Kaiyi Ji, Yingbin Liang , 2023. [ abs ][ pdf ][ bib ]
Graph-Aided Online Multi-Kernel Learning Pouya M. Ghari, Yanning Shen , 2023. [ abs ][ pdf ][ bib ] [ code ]
Interpolating Classifiers Make Few Mistakes Tengyuan Liang, Benjamin Recht , 2023. [ abs ][ pdf ][ bib ]
Regularized Joint Mixture Models Konstantinos Perrakis, Thomas Lartigue, Frank Dondelinger, Sach Mukherjee , 2023. [ abs ][ pdf ][ bib ] [ code ]
An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis , 2023. [ abs ][ pdf ][ bib ] [ code ]
Learning Mean-Field Games with Discounted and Average Costs Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi , 2023. [ abs ][ pdf ][ bib ]
Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application to Credit-Risk Evaluation Cynthia Rudin, Yaron Shaposhnik , 2023. [ abs ][ pdf ][ bib ] [ code ]
Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions Jon Vadillo, Roberto Santana, Jose A. Lozano , 2023. [ abs ][ pdf ][ bib ] [ code ]
Python package for causal discovery based on LiNGAM Takashi Ikeuchi, Mayumi Ide, Yan Zeng, Takashi Nicholas Maeda, Shohei Shimizu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Adaptation to the Range in K-Armed Bandits Hédi Hadiji, Gilles Stoltz , 2023. [ abs ][ pdf ][ bib ]
Learning-augmented count-min sketches via Bayesian nonparametrics Emanuele Dolera, Stefano Favaro, Stefano Peluchetti , 2023. [ abs ][ pdf ][ bib ]
Optimal Strategies for Reject Option Classifiers Vojtech Franc, Daniel Prusa, Vaclav Voracek , 2023. [ abs ][ pdf ][ bib ]
A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees Michael J. O'Neill, Stephen J. Wright , 2023. [ abs ][ pdf ][ bib ]
Sampling random graph homomorphisms and applications to network data analysis Hanbaek Lyu, Facundo Memoli, David Sivakoff , 2023. [ abs ][ pdf ][ bib ] [ code ]
A Relaxed Inertial Forward-Backward-Forward Algorithm for Solving Monotone Inclusions with Application to GANs Radu I. Bot, Michael Sedlmayer, Phan Tu Vuong , 2023. [ abs ][ pdf ][ bib ]
On Distance and Kernel Measures of Conditional Dependence Tianhong Sheng, Bharath K. Sriperumbudur , 2023. [ abs ][ pdf ][ bib ]
AutoKeras: An AutoML Library for Deep Learning Haifeng Jin, François Chollet, Qingquan Song, Xia Hu , 2023. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ] [ code ]
Cluster-Specific Predictions with Multi-Task Gaussian Processes Arthur Leroy, Pierre Latouche, Benjamin Guedj, Servane Gey , 2023. [ abs ][ pdf ][ bib ] [ code ]
Efficient Structure-preserving Support Tensor Train Machine Kirandeep Kour, Sergey Dolgov, Martin Stoll, Peter Benner , 2023. [ abs ][ pdf ][ bib ] [ code ]
Bayesian Spiked Laplacian Graphs Leo L Duan, George Michailidis, Mingzhou Ding , 2023. [ abs ][ pdf ][ bib ] [ code ]
The Brier Score under Administrative Censoring: Problems and a Solution Håvard Kvamme, Ørnulf Borgan , 2023. [ abs ][ pdf ][ bib ]
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search Benjamin Moseley, Joshua R. Wang , 2023. [ abs ][ pdf ][ bib ]
Trending research

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
We present Stable Video Diffusion — a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis.
Exponentially Faster Language Modelling
Language models only really need to use an exponential fraction of their neurons for individual inferences.
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios.
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
The proposed method does not require any training or language dependency to extract quality segmentation for any images.

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
boese0601/magicdance • 18 Nov 2023
In this work, we propose MagicDance, a diffusion-based model for 2D human motion and facial expression transfer on challenging human dance videos.
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity.

- Review Article
- Published: 22 March 2021
Machine Learning: Algorithms, Real-World Applications and Research Directions
- Iqbal H. Sarker (ORCID: orcid.org/0000-0003-1740-5517)
SN Computer Science, volume 2, Article number: 160 (2021)
In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, knowledge of artificial intelligence (AI), particularly machine learning (ML), is the key. Various types of machine learning algorithms, such as supervised, unsupervised, semi-supervised, and reinforcement learning, exist in the area. Besides, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view of these machine learning algorithms, which can be applied to enhance the intelligence and capabilities of an application. Thus, this study's key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for academia, industry professionals, and decision-makers in various real-world situations and application areas, particularly from the technical point of view.
Introduction
We live in the age of data, where everything around us is connected to a data source and everything in our lives is digitally recorded [ 21 , 103 ]. For instance, the current electronic world has a wealth of various kinds of data, such as the Internet of Things (IoT) data, cybersecurity data, smart city data, business data, smartphone data, social media data, health data, COVID-19 data, and many more. The data can be structured, semi-structured, or unstructured, discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”, and it is increasing day by day. Extracting insights from these data can be used to build various intelligent applications in the relevant domains. For instance, to build a data-driven automated and intelligent cybersecurity system, the relevant cybersecurity data can be used [ 105 ]; to build personalized context-aware smart mobile applications, the relevant mobile data can be used [ 103 ], and so on. Thus, data management tools and techniques that can extract insights or useful knowledge from data in a timely and intelligent way, on which real-world applications are based, are urgently needed.

Fig. 1: The worldwide popularity score of various types of ML algorithms (supervised, unsupervised, semi-supervised, and reinforcement) in a range of 0 (min) to 100 (max) over time, where the x-axis represents the timestamp information and the y-axis represents the corresponding score
Artificial intelligence (AI), particularly machine learning (ML), has grown rapidly in recent years in the context of data analysis and computing, typically allowing applications to function in an intelligent manner [ 95 ]. ML usually provides systems with the ability to learn and improve from experience automatically without being explicitly programmed and is generally considered one of the most popular technologies of the fourth industrial revolution (4IR or Industry 4.0) [ 103 , 105 ]. “Industry 4.0” [ 114 ] is typically the ongoing automation of conventional manufacturing and industrial practices, including exploratory data processing, using new smart technologies such as machine learning automation. Thus, to intelligently analyze these data and to develop the corresponding real-world applications, machine learning algorithms are the key. The learning algorithms can be categorized into four major types: supervised, unsupervised, semi-supervised, and reinforcement learning [ 75 ], discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”. The popularity of these learning approaches is increasing day by day, as shown in Fig. 1 , based on data collected from Google Trends [ 4 ] over the last five years. The x-axis of the figure indicates the specific dates, and the y-axis shows the corresponding popularity score within the range of \(0 \; (minimum)\) to \(100 \; (maximum)\). According to Fig. 1 , the popularity scores for these learning types were low in 2015 and have been increasing since. These statistics motivate us to study machine learning in this paper, which can play an important role in the real world through Industry 4.0 automation.
In general, the effectiveness and the efficiency of a machine learning solution depend on the nature and characteristics of the data and the performance of the learning algorithms . In the area of machine learning algorithms, classification analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule learning, and reinforcement learning techniques exist to effectively build data-driven systems [ 41 , 125 ]. Besides, deep learning, which originated from the artificial neural network and is part of a wider family of machine learning approaches, can be used to intelligently analyze data [ 96 ]. Thus, selecting a learning algorithm that is suitable for the target application in a particular domain is challenging. The reason is that different learning algorithms serve different purposes, and even the outcomes of different learning algorithms in a similar category may vary depending on the data characteristics [ 106 ]. Thus, it is important to understand the principles of various machine learning algorithms and their applicability in various real-world application areas, such as IoT systems, cybersecurity services, business and recommendation systems, smart cities, healthcare and COVID-19, context-aware systems, sustainable agriculture, and many more, which are explained briefly in Sect. “ Applications of Machine Learning ”.
Based on the importance and potential of “Machine Learning” to analyze the data mentioned above, in this paper we provide a comprehensive view of various types of machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, the key contribution of this study is explaining the principles and potential of different machine learning techniques, and their applicability in the various real-world application areas mentioned earlier. The purpose of this paper is, therefore, to provide a basic guide for academics and industry professionals who want to study, research, and develop data-driven automated and intelligent systems in the relevant areas based on machine learning techniques.
The key contributions of this paper are listed as follows:
- To define the scope of our study by taking into account the nature and characteristics of various types of real-world data and the capabilities of various learning techniques.
- To provide a comprehensive view on machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
- To discuss the applicability of machine learning-based solutions in various real-world application domains.
- To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services.
The rest of the paper is organized as follows. The next section presents the types of data and machine learning algorithms in a broader sense and defines the scope of our study. We briefly discuss and explain different machine learning algorithms in the subsequent section followed by which various real-world application areas based on machine learning algorithms are discussed and summarized. In the penultimate section, we highlight several research issues and potential future directions, and the final section concludes this paper.
Types of Real-World Data and Machine Learning Techniques
Machine learning algorithms typically consume and process data to learn the related patterns about individuals, business processes, transactions, events, and so on. In the following, we discuss various types of real-world data as well as categories of machine learning algorithms.
Types of Real-World Data
Usually, the availability of data is considered the key to constructing a machine learning model or data-driven real-world system [ 103 , 105 ]. Data can be of various forms, such as structured, semi-structured, or unstructured [ 41 , 72 ]. Besides, “metadata” is another type that typically represents data about the data. In the following, we briefly discuss these types of data.
Structured: Structured data have a well-defined structure and conform to a data model following a standard order, which makes them highly organized and easily accessed and used by an entity or a computer program. Structured data are typically stored in well-defined schemes such as relational databases, i.e., in a tabular format. For instance, names, dates, addresses, credit card numbers, stock information, geolocation, etc. are examples of structured data.
Unstructured: On the other hand, unstructured data have no pre-defined format or organization, which makes them much more difficult to capture, process, and analyze; they mostly consist of text and multimedia material. For example, sensor data, emails, blog entries, wikis, word processing documents, PDF files, audio files, videos, images, presentations, web pages, and many other types of business documents can be considered unstructured data.
Semi-structured: Semi-structured data are not stored in a relational database like the structured data mentioned above, but it does have certain organizational properties that make it easier to analyze. HTML, XML, JSON documents, NoSQL databases, etc., are some examples of semi-structured data.
Metadata: This is not a normal form of data, but “data about data”. The primary difference between “data” and “metadata” is that data are simply the material that can classify, measure, or document something relative to an organization’s data properties, whereas metadata describes the relevant information about those data, giving them more significance for data users. A basic example of a document’s metadata might be the author, file size, the date the document was generated, keywords that describe the document, etc.
In the area of machine learning and data science, researchers use various widely used datasets for different purposes. These are, for example, cybersecurity datasets such as NSL-KDD [ 119 ], UNSW-NB15 [ 76 ], ISCX’12 [ 1 ], CIC-DDoS2019 [ 2 ], Bot-IoT [ 59 ], etc., smartphone datasets such as phone call logs [ 84 , 101 ], SMS logs [ 29 ], mobile application usage logs [ 137 , 117 ], mobile phone notification logs [ 73 ], etc., IoT data [ 16 , 57 , 62 ], agriculture and e-commerce data [ 120 , 138 ], health data such as heart disease [ 92 ], diabetes mellitus [ 83 , 134 ], COVID-19 [ 43 , 74 ], etc., and many more in various application domains. The data can be of the different types discussed above, which may vary from application to application in the real world. To analyze such data in a particular problem domain, and to extract the insights or useful knowledge from the data for building real-world intelligent applications, different types of machine learning techniques can be used according to their learning capabilities, which are discussed in the following.
Types of Machine Learning Techniques
Machine Learning algorithms are mainly divided into four categories: Supervised learning, Unsupervised learning, Semi-supervised learning, and Reinforcement learning [ 75 ], as shown in Fig. 2 . In the following, we briefly discuss each type of learning technique with the scope of their applicability to solve real-world problems.

Fig. 2: Various types of machine learning techniques
Supervised: Supervised learning is typically the task of machine learning to learn a function that maps an input to an output based on sample input-output pairs [ 41 ]. It uses labeled training data and a collection of training examples to infer a function. Supervised learning is carried out when certain goals are identified to be accomplished from a certain set of inputs [ 105 ], i.e., a task-driven approach . The most common supervised tasks are “classification” that separates the data, and “regression” that fits the data. For instance, predicting the class label or sentiment of a piece of text, like a tweet or a product review, i.e., text classification, is an example of supervised learning.
Unsupervised: Unsupervised learning analyzes unlabeled datasets without the need for human interference, i.e., a data-driven process [ 41 ]. This is widely used for extracting generative features, identifying meaningful trends and structures, groupings in results, and exploratory purposes. The most common unsupervised learning tasks are clustering, density estimation, feature learning, dimensionality reduction, finding association rules, anomaly detection, etc.
Semi-supervised: Semi-supervised learning can be defined as a hybridization of the above-mentioned supervised and unsupervised methods, as it operates on both labeled and unlabeled data [ 41 , 105 ]. Thus, it falls between learning “without supervision” and learning “with supervision”. In the real world, labeled data can be rare in several contexts while unlabeled data are numerous, and semi-supervised learning is useful in such cases [ 75 ]. The ultimate goal of a semi-supervised learning model is to provide a better prediction outcome than the model would produce using the labeled data alone. Some application areas where semi-supervised learning is used include machine translation, fraud detection, data labeling, and text classification.
Reinforcement: Reinforcement learning is a type of machine learning algorithm that enables software agents and machines to automatically evaluate the optimal behavior in a particular context or environment to improve their efficiency [ 52 ], i.e., an environment-driven approach . This type of learning is based on reward or penalty, and its ultimate goal is to use the insights obtained from interactions with the environment to take actions that increase the reward or minimize the risk [ 75 ]. It is a powerful tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems such as robotics, autonomous driving, manufacturing, and supply chain logistics; however, it is not preferable for solving basic or straightforward problems.
Thus, to build effective models in various application areas, different types of machine learning techniques can play a significant role according to their learning capabilities, depending on the nature of the data discussed earlier and the target outcome. In Table 1 , we summarize various types of machine learning techniques with examples. In the following, we provide a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
Machine Learning Tasks and Algorithms
In this section, we discuss various machine learning algorithms that include classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, as well as deep learning methods. A general structure of a machine learning-based predictive model has been shown in Fig. 3 , where the model is trained from historical data in phase 1 and the outcome is generated in phase 2 for the new test data.

Fig. 3: A general structure of a machine learning based predictive model considering both the training and testing phase
Classification Analysis
Classification is regarded as a supervised learning method in machine learning, referring to a problem of predictive modeling where a class label is predicted for a given example [ 41 ]. Mathematically, it learns a function ( f ) that maps input variables ( X ) to output variables ( Y ), i.e., targets, labels, or categories. It can be carried out on structured or unstructured data to predict the class of given data points. For example, spam detection, distinguishing “spam” from “not spam” in email service providers, is a classification problem. In the following, we summarize the common classification problems.
Binary classification: It refers to the classification tasks having two class labels such as “true and false” or “yes and no” [ 41 ]. In such binary classification tasks, one class could be the normal state, while the abnormal state could be another class. For instance, “cancer not detected” is the normal state of a task that involves a medical test, and “cancer detected” could be considered as the abnormal state. Similarly, “spam” and “not spam” in the above example of email service providers are considered as binary classification.
Multiclass classification: Traditionally, this refers to those classification tasks having more than two class labels [ 41 ]. The multiclass classification does not have the principle of normal and abnormal outcomes, unlike binary classification tasks. Instead, within a range of specified classes, examples are classified as belonging to one. For example, it can be a multiclass classification task to classify various types of network attacks in the NSL-KDD [ 119 ] dataset, where the attack categories are classified into four class labels, such as DoS (Denial of Service Attack), U2R (User to Root Attack), R2L (Root to Local Attack), and Probing Attack.
Multi-label classification: In machine learning, multi-label classification is an important consideration where an example is associated with several classes or labels. Thus, it is a generalization of multiclass classification, where the classes involved in the problem are hierarchically structured, and each example may simultaneously belong to more than one class in each hierarchical level, e.g., multi-level text classification. For instance, Google news can be presented under the categories of a “city name”, “technology”, or “latest news”, etc. Multi-label classification includes advanced machine learning algorithms that support predicting various mutually non-exclusive classes or labels, unlike traditional classification tasks where class labels are mutually exclusive [ 82 ].
Many classification algorithms have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the most common and popular methods that are used widely in various application areas.
Naive Bayes (NB): The naive Bayes algorithm is based on Bayes’ theorem with the assumption of independence between each pair of features [ 51 ]. It works well and can be used for both binary and multi-class categories in many real-world situations, such as document or text classification, spam filtering, etc. The NB classifier can be used to effectively classify noisy instances in the data and to construct a robust prediction model [ 94 ]. The key benefit is that, compared to more sophisticated approaches, it needs only a small amount of training data to estimate the necessary parameters quickly [ 82 ]. However, its performance may be affected by its strong assumption of feature independence. Gaussian, Multinomial, Complement, Bernoulli, and Categorical are the common variants of the NB classifier [ 82 ].
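For illustration only, a minimal naive Bayes sketch using scikit-learn (an assumed dependency, not part of this paper; the iris data stand in for any labeled dataset) might look as follows:

```python
# Minimal naive Bayes sketch (assumes scikit-learn is installed; toy data only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB  # MultinomialNB suits word-count features

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()                # estimates per-class means and variances
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out examples
```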
Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is a linear decision boundary classifier created by fitting class-conditional densities to data and applying Bayes’ rule [ 51 , 82 ]. This method is also known as a generalization of Fisher’s linear discriminant, which projects a given dataset into a lower-dimensional space, i.e., a reduction of dimensionality that minimizes the complexity of the model or reduces the resulting model’s computational cost. The standard LDA model fits each class with a Gaussian density, assuming that all classes share the same covariance matrix [ 82 ]. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which seek to express one dependent variable as a linear combination of other features or measurements.
Logistic regression (LR): Another common probabilistic statistical model used to solve classification problems in machine learning is Logistic Regression (LR) [ 64 ]. Logistic regression typically estimates probabilities using the logistic (sigmoid) function, \(g(z) = \frac{1}{1 + e^{-z}}\) (Eq. 1). It can overfit high-dimensional datasets and works well when the dataset can be separated linearly. Regularization (L1 and L2) techniques [ 82 ] can be used to avoid over-fitting in such scenarios. The assumption of linearity between the dependent and independent variables is considered a major drawback of Logistic Regression. It can be used for both classification and regression problems, but it is more commonly used for classification.
K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [ 9 ] is an “instance-based” or non-generalizing learning method, also known as a “lazy learning” algorithm. It does not focus on constructing a general internal model; instead, it stores all instances corresponding to the training data in n -dimensional space. KNN classifies new data points based on similarity measures (e.g., the Euclidean distance function) [ 82 ]. Classification is computed from a simple majority vote of the k nearest neighbors of each point. It is quite robust to noisy training data, and its accuracy depends on data quality. The biggest issue with KNN is choosing the optimal number of neighbors to consider. KNN can be used for both classification and regression.
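A hedged scikit-learn sketch of KNN (the library and toy data are assumptions, not from the paper); the main choice is k, the number of neighbors:

```python
# KNN sketch: distance-based, no explicit training of an internal model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5)  # k = 5; Euclidean distance by default
clf.fit(X_train, y_train)                  # effectively just stores the training set
print(clf.score(X_test, y_test))
```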
Support vector machine (SVM): In machine learning, another common technique that can be used for classification, regression, or other tasks is a support vector machine (SVM) [ 56 ]. In high- or infinite-dimensional space, a support vector machine constructs a hyper-plane or set of hyper-planes. Intuitively, the hyper-plane, which has the greatest distance from the nearest training data points in any class, achieves a strong separation since, in general, the greater the margin, the lower the classifier’s generalization error. It is effective in high-dimensional spaces and can behave differently based on different mathematical functions known as the kernel. Linear, polynomial, radial basis function (RBF), sigmoid, etc., are the popular kernel functions used in SVM classifier [ 82 ]. However, when the data set contains more noise, such as overlapping target classes, SVM does not perform well.
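A minimal SVM sketch, again assuming scikit-learn and toy data, illustrating the kernel choice mentioned above:

```python
# SVM sketch: the kernel argument switches between linear, polynomial, RBF, etc.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)   # C trades margin width against training errors
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```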
Decision tree (DT): The decision tree (DT) [ 88 ] is a well-known non-parametric supervised learning method. DT learning methods are used for both classification and regression tasks [ 82 ]. ID3 [ 87 ], C4.5 [ 88 ], and CART [ 20 ] are well-known DT algorithms. Moreover, the recently proposed BehavDT [ 100 ] and IntrudTree [ 97 ] by Sarker et al. are effective in the relevant application domains, such as user behavior analytics and cybersecurity analytics, respectively. DT classifies instances by sorting them down the tree from the root to some leaf node, as shown in Fig. 4 . Instances are classified by checking the attribute defined by a node, starting at the root node of the tree, and then moving down the tree branch corresponding to the attribute value. For splitting, the most popular criteria are “gini” for the Gini impurity and “entropy” for the information gain, which for a node with class proportions \(p_i\) can be expressed as \(\mathrm{Gini} = 1 - \sum_{i} p_i^2\) and \(\mathrm{Entropy} = - \sum_{i} p_i \log_2 p_i\) [ 82 ].
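These splitting criteria map directly onto the criterion argument of a typical implementation; a hedged scikit-learn sketch (toy data, not the paper's experiments):

```python
# Decision tree sketch: criterion="gini" or "entropy" selects the split measure.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```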

Fig. 4: An example of a decision tree structure

Fig. 5: An example of a random forest structure considering multiple decision trees
Random forest (RF): A random forest classifier [ 19 ] is well known as an ensemble classification technique that is used in the field of machine learning and data science in various application areas. This method uses “parallel ensembling” which fits several decision tree classifiers in parallel, as shown in Fig. 5 , on different data set sub-samples and uses majority voting or averages for the outcome or final result. It thus minimizes the over-fitting problem and increases the prediction accuracy and control [ 82 ]. Therefore, the RF learning model with multiple decision trees is typically more accurate than a single decision tree based model [ 106 ]. To build a series of decision trees with controlled variation, it combines bootstrap aggregation (bagging) [ 18 ] and random feature selection [ 11 ]. It is adaptable to both classification and regression problems and fits well for both categorical and continuous values.
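To make the parallel-ensembling idea concrete, a minimal random forest sketch with scikit-learn (an assumed dependency; toy data only):

```python
# Random forest sketch: many trees trained on bootstrap samples, majority vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100,      # number of trees in the forest
                             max_features="sqrt",   # random feature subset per split
                             random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```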
Adaptive Boosting (AdaBoost): Adaptive Boosting (AdaBoost) is an ensemble learning process that employs an iterative approach to improve weak classifiers by learning from their errors. It was developed by Yoav Freund et al. [ 35 ] and is also known as “meta-learning”. Unlike the random forest, which uses parallel ensembling, AdaBoost uses “sequential ensembling”. It creates a powerful classifier of high accuracy by combining many poorly performing classifiers. In that sense, AdaBoost is called an adaptive classifier because it significantly improves the efficiency of the classifier, but in some instances it can trigger overfitting. AdaBoost is best used to boost the performance of decision trees as the base estimator [ 82 ] on binary classification problems; however, it is sensitive to noisy data and outliers.
Extreme gradient boosting (XGBoost): Gradient boosting, like random forests [ 19 ] above, is an ensemble learning algorithm that generates a final model based on a series of individual models, typically decision trees. The gradient is used to minimize the loss function, similar to how neural networks [ 41 ] use gradient descent to optimize weights. Extreme Gradient Boosting (XGBoost) is a form of gradient boosting that takes more detailed approximations into account when determining the best model [ 82 ]. It computes second-order gradients of the loss function to minimize loss and applies advanced regularization (L1 and L2) [ 82 ], which reduces over-fitting and improves model generalization and performance. XGBoost is fast to interpret and can handle large-sized datasets well.
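A hedged sketch using the third-party xgboost package (an assumed dependency, not referenced by the paper); the hyperparameters shown are illustrative:

```python
# XGBoost sketch: boosted trees with learning-rate shrinkage and L2 regularization.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = XGBClassifier(n_estimators=200, learning_rate=0.1,
                    max_depth=4, reg_lambda=1.0)   # reg_lambda is the L2 penalty
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```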
Stochastic gradient descent (SGD): Stochastic gradient descent (SGD) [ 41 ] is an iterative method for optimizing an objective function with suitable smoothness properties, where the word ‘stochastic’ refers to random probability. This reduces the computational burden, particularly in high-dimensional optimization problems, allowing for faster iterations in exchange for a lower convergence rate. A gradient is the slope of a function that calculates a variable’s degree of change in response to another variable’s changes. Mathematically, gradient descent minimizes a convex cost function by following its partial derivatives with respect to the input parameters. Let \(\alpha\) be the learning rate and \(J_i\) the cost of the \(i\mathrm{th}\) training example; then the stochastic gradient descent weight update at the \(j\mathrm{th}\) iteration can be written as \(w^{(j+1)} = w^{(j)} - \alpha \, \nabla_{w} J_i(w^{(j)})\) (Eq. 4). In large-scale and sparse machine learning, SGD has been successfully applied to problems often encountered in text classification and natural language processing [ 82 ]. However, SGD is sensitive to feature scaling and needs a range of hyperparameters, such as the regularization parameter and the number of iterations.
Rule-based classification: The term rule-based classification refers to any classification scheme that makes use of IF-THEN rules for class prediction. Several classification algorithms, such as Zero-R [ 125 ], One-R [ 47 ], decision trees [ 87 , 88 ], DTNB [ 110 ], Ripple Down Rule learner (RIDOR) [ 125 ], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [ 126 ], can generate rules. The decision tree is one of the most common rule-based classification algorithms among these techniques because it has several advantages, such as being easier to interpret, the ability to handle high-dimensional data, simplicity and speed, good accuracy, and the capability to produce rules that are clear and understandable to humans [ 127 , 128 ]. Decision tree-based rules also provide significant accuracy in a prediction model for unseen test cases [ 106 ]. Since the rules are easily interpretable, these rule-based classifiers are often used to produce descriptive models that can describe a system, including its entities and their relationships.

Fig. 6: Classification vs. regression. In classification the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables
Regression Analysis
Regression analysis includes several machine learning methods that allow one to predict a continuous ( y ) outcome variable based on the value of one or more ( x ) predictor variables [ 41 ]. The most significant distinction between classification and regression is that classification predicts distinct class labels, while regression facilitates the prediction of a continuous quantity. Figure 6 shows an example of how classification differs from regression models. Some overlaps are often found between the two types of machine learning algorithms. Regression models are now widely used in a variety of fields, including financial forecasting or prediction, cost estimation, trend analysis, marketing, time series estimation, drug response modeling, and many more. Some of the familiar types of regression algorithms are linear, polynomial, lasso, and ridge regression, etc., which are explained briefly in the following.
Simple and multiple linear regression: This is one of the most popular ML modeling techniques as well as a well-known regression technique. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the form of the regression line is linear. Linear regression creates a relationship between the dependent variable ( Y ) and one or more independent variables ( X ), also known as the regression line, using the best-fit straight line [ 41 ]. Simple linear regression, with one independent variable, is defined as \(y = a + bx + e\) (Eq. 5),
where a is the intercept, b is the slope of the line, and e is the error term. This equation can be used to predict the value of the target variable based on the given predictor variable(s). Multiple linear regression is an extension of simple linear regression that allows two or more predictor variables to model a response variable y as a linear function [ 41 ], defined as \(y = a + b_1x_1 + b_2x_2 + \cdots + b_nx_n + e\) (Eq. 6), whereas simple linear regression, defined in Eq. 5 , has only one independent variable.
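As a concrete illustration of Eqs. 5 and 6, a small sketch with scikit-learn (an assumed dependency; the numbers are made up) that recovers the intercept a and slope b:

```python
# Ordinary least squares sketch: fit y = a + b*x + e on a tiny synthetic sample.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one predictor column
y = np.array([2.1, 4.2, 5.9, 8.1])           # roughly y = 2x with noise

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)          # estimates of a and b
# Passing several columns in X gives multiple linear regression (Eq. 6).
```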
Polynomial regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is not linear, but is modeled as a polynomial of degree \(n\) in x [ 82 ]. The equation for polynomial regression is derived from the linear regression equation (polynomial regression of degree 1) and is defined as \(y = b_0 + b_1x + b_2x^2 + \cdots + b_nx^n + e\).
Here, y is the predicted/target output, \(b_0, b_1, \ldots, b_n\) are the regression coefficients, and x is the independent/input variable. In simple words, if the data are not distributed linearly but instead follow an \(n\mathrm{th}\)-degree polynomial, then we use polynomial regression to obtain the desired output.
LASSO and ridge regression: LASSO and ridge regression are well known as powerful techniques typically used for building learning models in the presence of a large number of features, due to their capability to prevent over-fitting and reduce the complexity of the model. The LASSO (least absolute shrinkage and selection operator) regression model uses the L1 regularization technique [ 82 ], a form of shrinkage that penalizes the “absolute value of the magnitude of coefficients” ( L1 penalty). As a result, LASSO can shrink some coefficients exactly to zero. Thus, LASSO regression aims to find the subset of predictors that minimizes the prediction error for a quantitative response variable. On the other hand, ridge regression uses L2 regularization [ 82 ], which penalizes the “squared magnitude of coefficients” ( L2 penalty). Thus, ridge regression forces the weights to be small but never sets the coefficient values to zero, yielding a non-sparse solution. Overall, LASSO regression is useful for obtaining a subset of predictors by eliminating less important features, and ridge regression is useful when a dataset has “multicollinearity”, which refers to predictors that are correlated with other predictors.
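A hedged comparison of the two penalties with scikit-learn on synthetic data (library, data, and alpha values are illustrative assumptions); note how the L1 model zeroes out some coefficients while the L2 model merely shrinks them:

```python
# Lasso (L1) vs. ridge (L2) sketch on synthetic regression data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # alpha controls the strength of the penalty
ridge = Ridge(alpha=1.0).fit(X, y)

print("zero coefficients (lasso):", int(np.sum(lasso.coef_ == 0)))
print("zero coefficients (ridge):", int(np.sum(ridge.coef_ == 0)))  # typically 0
```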
Cluster Analysis
Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for a specific outcome. It groups a collection of objects in such a way that objects in the same category, called a cluster, are in some sense more similar to each other than objects in other groups [ 41 ]. It is often used as a data analysis technique to discover interesting trends or patterns in data, e.g., groups of consumers based on their behavior. Clustering can be used in a broad range of application areas, such as cybersecurity, e-commerce, mobile data processing, health analytics, user modeling, and behavioral analytics. In the following, we briefly discuss and summarize various types of clustering methods.
Partitioning methods: Based on the features and similarities in the data, this clustering approach categorizes the data into multiple groups or clusters. Data scientists or analysts typically determine the number of clusters to produce, either dynamically or statically, depending on the nature of the target application. The most common clustering algorithms based on partitioning methods are K-means [ 69 ], K-Medoids [ 80 ], CLARA [ 55 ], etc.
Density-based methods: To identify distinct groups or clusters, these methods use the concept that a cluster in the data space is a contiguous region of high point density separated from other such clusters by contiguous regions of low point density. Points that are not part of a cluster are considered noise. The typical density-based clustering algorithms are DBSCAN [ 32 ], OPTICS [ 12 ], etc. Density-based methods typically struggle with clusters of similar density and with high-dimensional data.
Hierarchical-based methods: Hierarchical clustering typically seeks to construct a hierarchy of clusters, i.e., a tree structure. Strategies for hierarchical clustering generally fall into two types: (i) Agglomerative, a “bottom-up” approach in which each observation begins in its own cluster and pairs of clusters are merged as one moves up the hierarchy, and (ii) Divisive, a “top-down” approach in which all observations begin in one cluster and splits are performed recursively as one moves down the hierarchy, as shown in Fig. 7 . Our earlier proposed BOTS technique, Sarker et al. [ 102 ], is an example of a hierarchical, particularly bottom-up, clustering algorithm.
Grid-based methods: To deal with massive datasets, grid-based clustering is especially suitable. To obtain clusters, the principle is first to summarize the dataset with a grid representation and then to combine grid cells. STING [ 122 ], CLIQUE [ 6 ], etc. are the standard algorithms of grid-based clustering.
Model-based methods: There are mainly two types of model-based clustering algorithms: one that uses statistical learning, and the other based on a method of neural network learning [ 130 ]. For instance, GMM [ 89 ] is an example of a statistical learning method, and SOM [ 22 ] [ 96 ] is an example of a neural network learning method.
Constraint-based methods: Constrained-based clustering is a semi-supervised approach to data clustering that uses constraints to incorporate domain knowledge. Application or user-oriented constraints are incorporated to perform the clustering. The typical algorithms of this kind of clustering are COP K-means [ 121 ], CMWK-Means [ 27 ], etc.

Fig. 7: A graphical interpretation of the widely-used hierarchical clustering (bottom-up and top-down) technique
Many clustering algorithms have been proposed with the ability to group data in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.
K-means clustering: K-means clustering [ 69 ] is a fast, robust, and simple algorithm that provides reliable results when the data sets are well separated from each other. In this algorithm, the data points are allocated to clusters in such a way that the sum of the squared distances between the data points and the centroids is as small as possible. In other words, the K-means algorithm identifies k centroids and then assigns each data point to the nearest cluster while keeping each cluster as compact as possible. Since it begins with a random selection of cluster centers, the results can be inconsistent. Since extreme values can easily affect a mean, the K-means clustering algorithm is sensitive to outliers. K-medoids clustering [ 91 ] is a variant of K-means that is more robust to noise and outliers.
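A minimal K-means sketch with scikit-learn (an assumed dependency; the synthetic blobs are for illustration only):

```python
# K-means sketch: assign points to the nearest of k centroids, then update centroids.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the three learned centroids
print(km.labels_[:10])       # cluster assignment of the first ten points
```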
Mean-shift clustering: Mean-shift clustering [ 37 ] is a nonparametric clustering technique that does not require prior knowledge of the number of clusters or constraints on cluster shape. Mean-shift clustering aims to discover “blobs” in a smooth distribution or density of samples [ 82 ]. It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points in a given region. To form the final set of centroids, these candidates are filtered in a post-processing stage to remove near-duplicates. Cluster analysis in computer vision and image processing are examples of application domains. Mean Shift has the disadvantage of being computationally expensive. Moreover, in cases of high dimension, where the number of clusters shifts abruptly, the mean-shift algorithm does not work well.
DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) [ 32 ] is a base algorithm for density-based clustering that is widely used in data mining and machine learning. It is a non-parametric density-based clustering technique for separating high-density clusters from low-density clusters. DBSCAN’s main idea is that a point belongs to a cluster if it is close to many points from that cluster. It can find clusters of various shapes and sizes in large volumes of data that are noisy and contain outliers. Unlike k-means, DBSCAN does not require a priori specification of the number of clusters in the data and can find arbitrarily shaped clusters. Although k-means is much faster than DBSCAN, DBSCAN is efficient at finding high-density regions and is robust to outliers.
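A hedged DBSCAN sketch (scikit-learn assumed; the two-moons data are synthetic) showing that the number of clusters is not specified in advance:

```python
# DBSCAN sketch: eps is the neighborhood radius, min_samples the density threshold.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # cluster ids discovered from the data; -1 marks noise points
```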
GMM clustering: Gaussian mixture models (GMMs) are often used for data clustering, which is a distribution-based clustering algorithm. A Gaussian mixture model is a probabilistic model in which all the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters [ 82 ]. To find the Gaussian parameters for each cluster, an optimization algorithm called expectation-maximization (EM) [ 82 ] can be used. EM is an iterative method that uses a statistical model to estimate the parameters. In contrast to k-means, Gaussian mixture models account for uncertainty and return the likelihood that a data point belongs to one of the k clusters. GMM clustering is more robust than k-means and works well even with non-linear data distributions.
Agglomerative hierarchical clustering: The most common method of hierarchical clustering used to group objects in clusters based on their similarity is agglomerative clustering. This technique uses a bottom-up approach, where each object is first treated as a singleton cluster by the algorithm. Following that, pairs of clusters are merged one by one until all clusters have been merged into a single large cluster containing all objects. The result is a dendrogram, which is a tree-based representation of the elements. Single linkage [ 115 ], Complete linkage [ 116 ], BOTS [ 102 ] etc. are some examples of such techniques. The main advantage of agglomerative hierarchical clustering over k-means is that the tree-structure hierarchy generated by agglomerative clustering is more informative than the unstructured collection of flat clusters returned by k-means, which can help to make better decisions in the relevant application areas.
Dimensionality Reduction and Feature Learning
In machine learning and data science, high-dimensional data processing is a challenging task for both researchers and application developers. Thus, dimensionality reduction, which is an unsupervised learning technique, is important because it leads to better human interpretation, lower computational cost, and avoids overfitting and redundancy by simplifying models. Both feature selection and feature extraction can be used for dimensionality reduction. The primary distinction between the selection and extraction of features is that “feature selection” keeps a subset of the original features [ 97 ], while “feature extraction” creates brand new ones [ 98 ]. In the following, we briefly discuss these techniques.
Feature selection: The selection of features, also known as the selection of variables or attributes in the data, is the process of choosing a subset of unique features (variables, predictors) to use in building a machine learning or data science model. It decreases a model’s complexity by eliminating irrelevant or less important features and allows for faster training of machine learning algorithms. A right and optimal subset of the selected features in a problem domain can minimize the overfitting problem by simplifying and generalizing the model, and it also increases the model’s accuracy [ 97 ]. Thus, “feature selection” [ 66 , 99 ] is considered one of the primary concepts in machine learning, and it greatly affects the effectiveness and efficiency of the target machine learning model. The chi-squared test, analysis of variance (ANOVA) test, Pearson’s correlation coefficient, and recursive feature elimination are some popular techniques that can be used for feature selection.
Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and a means to reduce computational cost or training time. The aim of “feature extraction” [ 66 , 99 ] is to reduce the number of features in a dataset by generating new ones from the existing ones and then discarding the original features. The majority of the information found in the original set of features can then be summarized using this new, reduced set of features. For instance, principal component analysis (PCA) is often used as a dimensionality-reduction technique to extract a lower-dimensional space by creating brand-new components from the existing features in a dataset [ 98 ].
Many algorithms have been proposed to reduce data dimensions in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.
Variance threshold: A simple baseline approach to feature selection is the variance threshold [ 82 ]. This excludes all features of low variance, i.e., all features whose variance does not exceed the threshold. It eliminates all zero-variance features by default, i.e., features that have the same value in all samples. This feature selection algorithm looks only at the ( X ) features, not the desired ( y ) outputs, and can, therefore, be used for unsupervised learning.
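A tiny sketch of the variance-threshold idea with scikit-learn (an assumed dependency; the array is made up), dropping a constant column:

```python
# Variance threshold sketch: remove features whose variance does not exceed 0.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0.0, 2.0, 0.1],
              [0.0, 1.0, 0.2],
              [0.0, 3.0, 0.1]])          # first column is constant (zero variance)

selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)                   # (3, 2): the constant feature was dropped
```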
Pearson correlation: Pearson’s correlation is another method for understanding a feature’s relation to the response variable and can be used for feature selection [ 99 ]. This method is also used for finding the association between features in a dataset. The resulting value lies in \([-1, 1]\) , where \(-1\) means perfect negative correlation, \(+1\) means perfect positive correlation, and 0 means that the two variables do not have a linear correlation. If X and Y are two random variables, the correlation coefficient between X and Y is defined as [ 41 ] \(r(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma _X \, \sigma _Y}\) , i.e., the covariance of X and Y divided by the product of their standard deviations.
ANOVA: Analysis of variance (ANOVA) is a statistical tool used to verify whether the mean values of two or more groups differ significantly from each other. ANOVA assumes a linear relationship between the variables and the target, as well as the variables’ normal distribution. To statistically test the equality of means, the ANOVA method utilizes F-tests. For feature selection, the resulting ‘ANOVA F value’ [ 82 ] of this test can be used to omit features that are independent of the goal variable.
Chi square: The chi-square \({\chi }^2\) [ 82 ] statistic estimates the difference between the observed and expected frequencies of a series of events or variables. The value of \({\chi }^2\) depends on the magnitude of the difference between the observed and expected values, the degrees of freedom, and the sample size. The chi-square \({\chi }^2\) is commonly used for testing relationships between categorical variables. If \(O_i\) represents an observed value and \(E_i\) represents the corresponding expected value, then \({\chi }^2 = \sum _i \frac{(O_i - E_i)^2}{E_i}\) .
Recursive feature elimination (RFE): Recursive Feature Elimination (RFE) is a brute-force approach to feature selection. RFE [ 82 ] fits the model and removes the weakest feature repeatedly until the specified number of features is reached. Features are ranked by the model’s coefficients or feature importances. By recursively removing a small number of features per iteration, RFE aims to remove dependencies and collinearity in the model.
Model-based selection: To reduce the dimensionality of the data, linear models penalized with the L1 regularization can be used. Least absolute shrinkage and selection operator (Lasso) regression is a type of linear regression that has the property of shrinking some of the coefficients to zero [ 82 ]; such features can then be removed from the model. Thus, the penalized lasso regression method is often used in machine learning to select a subset of variables. The Extra Trees Classifier [ 82 ] is an example of a tree-based estimator that can be used to compute impurity-based feature importances, which can then be used to discard irrelevant features.
Principal component analysis (PCA): Principal component analysis (PCA) is a well-known unsupervised learning approach in the field of machine learning and data science. PCA is a mathematical technique that transforms a set of correlated variables into a set of uncorrelated variables known as principal components [ 48 , 81 ]. Figure 8 shows an example of the effect of PCA on various dimensional spaces, where Fig. 8 a shows the original features in 3D space, and Fig. 8 b shows the created principal components PC1 and PC2 projected onto a 2D plane and a 1D line with the principal component PC1, respectively. Thus, PCA can be used as a feature extraction technique that reduces the dimensionality of a dataset in order to build an effective machine learning model [ 98 ]. Technically, PCA identifies the eigenvectors of a covariance matrix with the highest eigenvalues and then uses those to project the data into a new subspace of equal or fewer dimensions [ 82 ].
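A minimal PCA sketch with scikit-learn (an assumed dependency; the iris data are a stand-in) projecting four features onto two principal components:

```python
# PCA sketch: keep the two components with the largest eigenvalues.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)              # 150 x 4 data projected to 150 x 2
print(pca.explained_variance_ratio_)     # share of variance captured by PC1 and PC2
```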

Fig. 8: An example of a principal component analysis (PCA) and the created principal components PC1 and PC2 in different dimension spaces
Association Rule Learning
Association rule learning is a rule-based machine learning approach to discover interesting relationships, “IF-THEN” statements, between variables in large datasets [ 7 ]. One example is that “if a customer buys a computer or laptop (an item), s/he is likely to also buy anti-virus software (another item) at the same time”. Association rules are employed today in many application areas, including IoT services, medical diagnosis, usage behavior analytics, web usage mining, smartphone applications, cybersecurity applications, and bioinformatics. In comparison to sequence mining, association rule learning does not usually take into account the order of items within or across transactions. A common way of measuring the usefulness of association rules is to use their parameters, ‘support’ and ‘confidence’, which were introduced in [ 7 ].
In the data mining literature, many association rule learning methods have been proposed, such as logic dependent [ 34 ], frequent pattern based [ 8 , 49 , 68 ], and tree-based [ 42 ]. The most popular association rule learning algorithms are summarized below.
AIS and SETM: AIS is the first algorithm proposed by Agrawal et al. [ 7 ] for association rule mining. The AIS algorithm’s main downside is that too many candidate itemsets are generated, requiring more space and wasting a lot of effort. This algorithm calls for too many passes over the entire dataset to produce the rules. Another approach SETM [ 49 ] exhibits good performance and stable behavior with execution time; however, it suffers from the same flaw as the AIS algorithm.
Apriori: For generating association rules from a given dataset, Agrawal et al. [ 8 ] proposed the Apriori, Apriori-TID, and Apriori-Hybrid algorithms. These algorithms outperform the AIS and SETM approaches mentioned above due to the Apriori property of frequent itemsets [ 8 ]. The term ‘Apriori’ usually refers to having prior knowledge of frequent itemset properties. Apriori uses a “bottom-up” approach to generate the candidate itemsets. To reduce the search space, Apriori uses the property that “all subsets of a frequent itemset must be frequent, and if an itemset is infrequent, then all its supersets must also be infrequent”. Another approach, predictive Apriori [ 108 ], can also generate rules; however, it may produce unexpected results as it combines both support and confidence. Apriori [ 8 ] is the most widely applied technique for mining association rules.
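As a rough illustration of support- and confidence-based rule mining, a sketch using the third-party mlxtend library (an assumption, not cited by the paper) on made-up shopping transactions:

```python
# Apriori-style rule mining sketch on a tiny one-hot transaction table.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

transactions = pd.DataFrame(
    [[True, True, False],
     [True, True, True],
     [False, True, True],
     [True, False, True]],
    columns=["laptop", "antivirus", "mouse"],
)

frequent = apriori(transactions, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```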
ECLAT: This technique was proposed by Zaki et al. [ 131 ] and stands for Equivalence Class Clustering and bottom-up Lattice Traversal. ECLAT uses a depth-first search to find frequent itemsets. In contrast to the Apriori [ 8 ] algorithm, which represents data in a horizontal pattern, it represents data vertically. Hence, the ECLAT algorithm is more efficient and scalable in the area of association rule learning. This algorithm is better suited for small and medium datasets whereas the Apriori algorithm is used for large datasets.
FP-Growth: Another common association rule learning technique based on the frequent-pattern tree (FP-tree) proposed by Han et al. [ 42 ] is Frequent Pattern Growth, known as FP-Growth. The key difference with Apriori is that while generating rules, the Apriori algorithm [ 8 ] generates frequent candidate itemsets; on the other hand, the FP-growth algorithm [ 42 ] prevents candidate generation and thus produces a tree by the successful strategy of ‘divide and conquer’ approach. Due to its sophistication, however, FP-Tree is challenging to use in an interactive mining environment [ 133 ]. Thus, the FP-Tree would not fit into memory for massive data sets, making it challenging to process big data as well. Another solution is RARM (Rapid Association Rule Mining) proposed by Das et al. [ 26 ] but faces a related FP-tree issue [ 133 ].
ABC-RuleMiner: ABC-RuleMiner is a rule-based machine learning method, recently proposed in our earlier paper by Sarker et al. [ 104 ], that discovers interesting non-redundant rules to provide real-world intelligent services. This algorithm effectively identifies redundancy in associations by taking into account the impact or precedence of the related contextual features and discovers a set of non-redundant association rules. It first constructs an association generation tree (AGT) in a top-down fashion and then extracts the association rules by traversing the tree. Thus, ABC-RuleMiner is more potent than traditional rule-based methods in terms of both non-redundant rule generation and intelligent decision-making, particularly in a context-aware smart computing environment where human or user preferences are involved.
Among the association rule learning techniques discussed above, Apriori [ 8 ] is the most widely used algorithm for discovering association rules from a given dataset [ 133 ]. The main strength of the association learning technique is its comprehensiveness, as it generates all associations that satisfy the user-specified constraints, such as minimum support and confidence value. The ABC-RuleMiner approach [ 104 ] discussed earlier could give significant results in terms of non-redundant rule generation and intelligent decision-making for the relevant application areas in the real world.
Reinforcement Learning
Reinforcement learning (RL) is a machine learning technique that allows an agent to learn by trial and error in an interactive environment using input from its actions and experiences. Unlike supervised learning, which is based on given sample data or examples, the RL method is based on interacting with the environment. The problem to be solved in reinforcement learning (RL) is defined as a Markov Decision Process (MDP) [ 86 ], i.e., all about sequentially making decisions. An RL problem typically includes four elements such as Agent, Environment, Rewards, and Policy.
RL can be split roughly into Model-based and Model-free techniques. Model-based RL is the process of inferring optimal behavior from a model of the environment by performing actions and observing the results, which include the next state and the immediate reward [ 85 ]. AlphaZero, AlphaGo [ 113 ] are examples of the model-based approaches. On the other hand, a model-free approach does not use the distribution of the transition probability and the reward function associated with MDP. Q-learning, Deep Q Network, Monte Carlo Control, SARSA (State–Action–Reward–State–Action), etc. are some examples of model-free algorithms [ 52 ]. The policy network, which is required for model-based RL but not for model-free, is the key difference between model-free and model-based learning. In the following, we discuss the popular RL algorithms.
Monte Carlo methods: Monte Carlo techniques, or Monte Carlo experiments, are a wide category of computational algorithms that rely on repeated random sampling to obtain numerical results [ 52 ]. The underlying concept is to use randomness to solve problems that are deterministic in principle. Optimization, numerical integration, and drawing samples from probability distributions are the three problem classes where Monte Carlo techniques are most commonly used.
Q-learning: Q-learning is a model-free reinforcement learning algorithm for learning the quality of behaviors that tell an agent what action to take under what conditions [ 52 ]. It does not need a model of the environment (hence the term “model-free”), and it can deal with stochastic transitions and rewards without the need for adaptations. The ‘Q’ in Q-learning usually stands for quality, as the algorithm calculates the maximum expected rewards for a given behavior in a given state.
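To make the reward-driven update concrete, a toy tabular Q-learning sketch on a five-state chain (an invented environment, purely for illustration):

```python
# Tabular Q-learning sketch: actions 0 = left, 1 = right; reaching state 4 pays 1.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1            # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(2000):                         # training episodes
    s = 0
    while s != 4:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))                   # greedy policy; should prefer "right"
```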
Deep Q-learning: The basic working step in Deep Q-Learning [ 52 ] is that the initial state is fed into a neural network, which returns the Q-values of all possible actions as output. Plain Q-learning works well when we have a reasonably simple setting to overcome; however, when the number of states and actions becomes more complicated, deep learning can be used as a function approximator.
Reinforcement learning, along with supervised and unsupervised learning, is one of the basic machine learning paradigms. RL can be used to solve numerous real-world problems in various fields, such as game theory, control theory, operations analysis, information theory, simulation-based optimization, manufacturing, supply chain logistics, multi-agent systems, swarm intelligence, aircraft control, robot motion control, and many more.
Artificial Neural Network and Deep Learning
Deep learning is part of a wider family of artificial neural network (ANN)-based machine learning approaches with representation learning. Deep learning provides a computational architecture by combining several processing layers, such as input, hidden, and output layers, to learn from data [ 41 ]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large datasets [ 105 , 129 ]. Figure 9 shows the general performance of deep learning compared to machine learning as the amount of data increases. However, performance may vary depending on the data characteristics and experimental setup.

Fig. 9: Machine learning and deep learning performance in general with the amount of data
The most common deep learning algorithms are: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN, or ConvNet), Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) [ 96 ]. In the following, we discuss various types of deep learning methods that can be used to build effective data-driven models for various purposes.

Fig. 10: A structure of an artificial neural network modeling with multiple processing layers
MLP: The base architecture of deep learning, which is also known as the feed-forward artificial neural network, is called a multilayer perceptron (MLP) [ 82 ]. A typical MLP is a fully connected network consisting of an input layer, one or more hidden layers, and an output layer, as shown in Fig. 10 . Each node in one layer connects to each node in the following layer with a certain weight. MLP utilizes the “backpropagation” technique [ 41 ], the most fundamental building block of a neural network, to adjust the weight values internally while building the model. MLP is sensitive to feature scaling and allows a variety of hyperparameters to be tuned, such as the number of hidden layers, neurons, and iterations, which can result in a computationally costly model.
CNN or ConvNet: The convolutional neural network (CNN) [ 65 ] enhances the design of the standard ANN, consisting of convolutional layers, pooling layers, as well as fully connected layers, as shown in Fig. 11 . As it takes advantage of the two-dimensional (2D) structure of the input data, it is typically used broadly in several areas such as image and video recognition, image processing and classification, medical image analysis, natural language processing, etc. While CNN has a greater computational burden, it has the advantage of automatically detecting the important features without any manual intervention, and hence CNN is considered to be more powerful than a conventional ANN. A number of advanced deep learning models based on CNN can be used in the field, such as AlexNet [ 60 ], Xception [ 24 ], Inception [ 118 ], Visual Geometry Group (VGG) [ 44 ], ResNet [ 45 ], etc.
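A minimal CNN sketch with TensorFlow/Keras (an assumed dependency; shapes, layer sizes, and the random data are illustrative only):

```python
# CNN sketch: convolution -> pooling -> fully connected, trained on dummy images.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(32, 28, 28, 1).astype("float32")   # dummy image batch
y = np.random.randint(0, 10, size=32)                  # dummy class labels
model.fit(X, y, epochs=1, verbose=0)
```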
LSTM-RNN: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the area of deep learning [ 38 ]. Unlike normal feed-forward neural networks, LSTM has feedback links. LSTM networks are well suited for analyzing and learning from sequential data, such as classifying, processing, and making predictions based on time-series data, which differentiates them from other conventional networks. Thus, LSTM can be used when the data are in a sequential format, such as time series or sentences, and it is commonly applied in time-series analysis, natural language processing, speech recognition, etc.
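A minimal LSTM sketch in Keras/TensorFlow (again an assumed framework) for sequence classification is shown below; the sequence length, feature count, and binary output are illustrative assumptions.

```python
# Minimal LSTM sketch for sequence classification (shapes are illustrative).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(50, 1)),            # 50 time steps, 1 feature per step
    layers.LSTM(32),                        # recurrent layer with feedback links
    layers.Dense(1, activation="sigmoid"),  # e.g., one binary label per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_sequences, y_labels, epochs=10)  # shapes: (n, 50, 1) and (n,)
```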

An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers
In addition to the most common deep learning methods discussed above, several other deep learning approaches [ 96 ] exist for various purposes. For instance, the self-organizing map (SOM) [ 58 ] uses unsupervised learning to represent high-dimensional data on a 2D grid map, thus achieving dimensionality reduction. The autoencoder (AE) [ 15 ] is another learning technique that is widely used for dimensionality reduction as well as feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBM) [ 46 ] can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. A deep belief network (DBN) is typically composed of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, together with a backpropagation neural network (BPNN) [ 123 ]. A generative adversarial network (GAN) [ 39 ] is a form of deep learning network that can generate data with characteristics close to those of the actual input data. Transfer learning, which typically re-uses a model pre-trained on one problem for a new problem, is currently very common because it allows deep neural networks to be trained with comparatively little data [ 124 ]. A brief discussion of these artificial neural network (ANN) and deep learning (DL) models is provided in our earlier paper, Sarker et al. [ 96 ].
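To illustrate one of these approaches, below is a toy autoencoder sketch (Keras/TensorFlow assumed): the encoder compresses inputs to a low-dimensional code that can be reused for dimensionality reduction or feature extraction. The input and code dimensions are invented for illustration.

```python
# Toy autoencoder sketch for unsupervised dimensionality reduction (sizes illustrative).
from tensorflow.keras import layers, models

input_dim, code_dim = 64, 8
inputs = layers.Input(shape=(input_dim,))
code = layers.Dense(code_dim, activation="relu")(inputs)        # encoder
outputs = layers.Dense(input_dim, activation="sigmoid")(code)   # decoder

autoencoder = models.Model(inputs, outputs)
encoder = models.Model(inputs, code)     # reusable for feature extraction
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=20)       # unsupervised: reconstruct the input
# low_dim = encoder.predict(X)           # compressed representation of X
```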
Overall, based on the learning techniques discussed above, we can conclude that various types of machine learning techniques, such as classification analysis, regression, data clustering, feature selection and extraction, dimensionality reduction, association rule learning, reinforcement learning, and deep learning, can play a significant role for various purposes according to their capabilities. In the following section, we discuss several application areas based on machine learning algorithms.
Applications of Machine Learning
In the current age of the Fourth Industrial Revolution (4IR), machine learning has become popular in various application areas because of its ability to learn from past data and make intelligent decisions. In the following, we summarize and discuss ten popular application areas of machine learning technology.
Predictive analytics and intelligent decision-making: A major application field of machine learning is intelligent decision-making through data-driven predictive analytics [ 21 , 70 ]. The basis of predictive analytics is capturing and exploiting relationships between explanatory variables and predicted variables from previous events to predict the unknown outcome [ 41 ], for instance, identifying suspects or criminals after a crime has been committed, or detecting credit card fraud as it happens. In another application, machine learning algorithms can assist retailers in better understanding consumer preferences and behavior, managing inventory, avoiding out-of-stock situations, and optimizing logistics and warehousing in e-commerce. Various machine learning algorithms such as decision trees, support vector machines, artificial neural networks, etc. [ 106 , 125 ] are commonly used in the area. Since accurate predictions provide insight into the unknown, they can improve the decisions of industries, businesses, and almost any organization, including government agencies, e-commerce, telecommunications, banking and financial services, healthcare, sales and marketing, transportation, social networking, and many others.
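As a hedged sketch of such predictive analytics, the following trains a decision tree on synthetic transaction features to flag fraud-like cases; the features, labeling rule, and thresholds are all invented for illustration and do not reflect any real fraud-detection system.

```python
# Predictive-analytics sketch: decision tree on synthetic "transaction" data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 1000
# Explanatory variables (synthetic): amount, hour of day, distance from usual location.
X = np.column_stack([
    rng.exponential(50, n),
    rng.integers(0, 24, n),
    rng.exponential(5, n),
])
# Predicted variable: an invented rule labels unusual transactions as "fraud".
y = ((X[:, 0] > 100) & (X[:, 2] > 5)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```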
Cybersecurity and threat intelligence: Cybersecurity is one of the most essential areas of Industry 4.0 [ 114 ] and is typically the practice of protecting networks, systems, hardware, and data from digital attacks [ 114 ]. Machine learning has become a crucial cybersecurity technology that constantly learns by analyzing data to identify patterns, better detect malware in encrypted traffic, find insider threats, predict where “bad neighborhoods” are online, keep people safe while browsing, or secure data in the cloud by uncovering suspicious activity. For instance, clustering techniques can be used to identify cyber-anomalies, policy violations, etc. To detect various types of cyber-attacks or intrusions, machine learning classification models that take into account the impact of security features are useful [ 97 ]. Various deep learning-based security models can also be used on large-scale security datasets [ 96 , 129 ]. Moreover, security policy rules generated by association rule learning techniques can play a significant role in building a rule-based security system [ 105 ]. Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can enable cybersecurity professionals to be more proactive in efficiently preventing threats and cyber-attacks.
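One way to make the anomaly-detection idea concrete is the sketch below, which uses an Isolation Forest on synthetic "network flow" features; this is one possible technique chosen for illustration, not the specific models of the cited works, and the feature values are invented.

```python
# Anomaly-detection sketch for security telemetry (synthetic data, illustrative model).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Synthetic "flow" features: bytes sent, packet count, duration (normal vs. attack-like).
normal = rng.normal(loc=[500, 40, 2.0], scale=[100, 10, 0.5], size=(500, 3))
attacks = rng.normal(loc=[5000, 400, 0.1], scale=[500, 50, 0.05], size=(10, 3))
X = np.vstack([normal, attacks])

detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = detector.predict(X)          # +1 = normal, -1 = anomaly
print("flagged anomalies:", int((labels == -1).sum()))
```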
Internet of things (IoT) and smart cities: The Internet of Things (IoT) is another essential area of Industry 4.0 [ 114 ]; it turns everyday objects into smart objects by allowing them to transmit data and automate tasks without the need for human interaction. IoT is, therefore, considered to be the big frontier that can enhance almost all activities in our lives, such as smart governance, smart homes, education, communication, transportation, retail, agriculture, health care, business, and many more [ 70 ]. The smart city is one of IoT’s core fields of application, using technologies to enhance city services and residents’ living experiences [ 132 , 135 ]. As machine learning utilizes experience to recognize trends and create models that help predict future behavior and events, it has become a crucial technology for IoT applications [ 103 ]. For example, predicting traffic in smart cities, predicting parking availability, estimating the total energy usage of citizens for a particular period, and making context-aware and timely decisions for people are some tasks that can be solved using machine learning techniques according to current needs.
Traffic prediction and transportation: Transportation systems have become a crucial component of every country’s economic development. Nonetheless, several cities around the world are experiencing an excessive rise in traffic volume, resulting in serious issues such as delays, traffic congestion, higher fuel prices, increased CO2 pollution, accidents, emergencies, and a decline in modern society’s quality of life [ 40 ]. Thus, an intelligent transportation system that predicts future traffic is important and is an indispensable part of a smart city. Accurate traffic prediction based on machine and deep learning modeling can help to minimize these issues [ 17 , 30 , 31 ]. For example, based on the travel history and the trend of traveling through various routes, machine learning can assist transportation companies in predicting possible issues that may occur on specific routes and in recommending that their customers take a different path. Ultimately, these learning-based data-driven models help improve traffic flow, increase the usage and efficiency of sustainable modes of transportation, and limit real-world disruption by modeling and visualizing future changes.
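The following hedged sketch shows one simple way to frame traffic forecasting as regression: a random forest trained on time-of-day and day-of-week features to predict traffic volume. The data-generating rule (weekday rush-hour peaks plus noise) is entirely synthetic and only stands in for real sensor data.

```python
# Traffic-forecasting sketch: regression on synthetic time features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
hours = rng.integers(0, 24, 2000)
weekday = rng.integers(0, 7, 2000)
# Invented ground truth: rush-hour peaks on weekdays plus noise.
volume = (200 + 150 * np.isin(hours, [8, 9, 17, 18]) * (weekday < 5)
          + rng.normal(0, 20, 2000))

X = np.column_stack([hours, weekday])
X_train, X_test, y_train, y_test = train_test_split(X, volume, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```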
Healthcare and COVID-19 pandemic: Machine learning can help to solve diagnostic and prognostic problems in a variety of medical domains, such as disease prediction, medical knowledge extraction, detecting regularities in data, patient management, etc. [ 33 , 77 , 112 ]. Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus, according to the World Health Organization (WHO) [ 3 ]. Recently, learning techniques have become popular in the battle against COVID-19 [ 61 , 63 ]. During the COVID-19 pandemic, learning techniques have been used to identify patients at high risk, predict mortality rates, and detect other anomalies [ 61 ]. They can also be used to better understand the virus’s origin, to predict COVID-19 outbreaks, and for disease diagnosis and treatment [ 14 , 50 ]. With the help of machine learning, researchers can forecast where and when COVID-19 is likely to spread and notify those regions to make the required arrangements. Deep learning also provides exciting solutions to the problems of medical image processing and is seen as a crucial technique for potential applications, particularly for the COVID-19 pandemic [ 10 , 78 , 111 ]. Overall, machine and deep learning techniques can help to fight the COVID-19 virus and the pandemic, as well as support intelligent clinical decision-making in the healthcare domain.
E-commerce and product recommendations: Product recommendation is one of the most well-known and widely used applications of machine learning, and it is one of the most prominent features of almost any e-commerce website today. Machine learning technology can assist businesses in analyzing their consumers’ purchasing histories and making customized product suggestions for their next purchase based on their behavior and preferences. E-commerce companies, for example, can easily position product suggestions and offers by analyzing browsing trends and click-through rates of specific items. Using predictive modeling based on machine learning techniques, many online retailers, such as Amazon [ 71 ], can better manage inventory, prevent out-of-stock situations, and optimize logistics and warehousing. The future of sales and marketing is the ability to capture, evaluate, and use consumer data to provide a customized shopping experience. Furthermore, machine learning techniques enable companies to create packages and content that are tailored to the needs of their customers, allowing them to retain existing customers while attracting new ones.
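A tiny item-recommendation sketch is shown below: nearest neighbours over a toy user-item purchase matrix, recommending products bought by similar users. The matrix and similarity choice are illustrative assumptions and only a stand-in for real purchase histories.

```python
# Neighbour-based recommendation sketch over a toy purchase matrix.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows = users, columns = products; 1 = purchased (invented data).
purchases = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
])

# Find users with purchasing behaviour similar to user 0.
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(purchases)
_, idx = knn.kneighbors(purchases[0:1])
similar_users = idx[0][1:]                       # skip the user themselves

# Recommend items those similar users bought that user 0 has not.
candidate = purchases[similar_users].max(axis=0) - purchases[0]
print("recommend product indices:", np.where(candidate > 0)[0])
```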
NLP and sentiment analysis: Natural language processing (NLP) involves the reading and understanding of spoken or written language through the medium of a computer [ 79 , 103 ]. Thus, NLP helps computers, for instance, to read a text, hear speech, interpret it, analyze sentiment, and decide which aspects are significant, and machine learning techniques can be used for these tasks. Virtual personal assistants, chatbots, speech recognition, document description, and language or machine translation are some examples of NLP-related tasks. Sentiment analysis [ 90 ] (also referred to as opinion mining or emotion AI) is an NLP sub-field that seeks to identify and extract public mood and views within a given text, for example from blogs, reviews, social media, forums, and news. For instance, businesses and brands use sentiment analysis to understand the social sentiment of their brand, product, or service through social media platforms or the web as a whole. Overall, sentiment analysis is considered a machine learning task that analyzes texts for polarity, such as “positive”, “negative”, or “neutral”, along with more fine-grained emotions such as very happy, happy, sad, very sad, angry, interested, or not interested.
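A small sentiment-analysis sketch follows: TF-IDF features with logistic regression on a handful of invented reviews. Real systems would use far larger labelled corpora; the texts and labels here are illustrative only.

```python
# Sentiment-analysis sketch: TF-IDF + logistic regression on toy reviews.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "great product, works perfectly",
    "terrible quality, very disappointed",
    "absolutely love it",
    "waste of money, do not buy",
    "happy with this purchase",
    "awful experience, never again",
]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["not happy, poor quality", "excellent and fast delivery"]))
```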
Image, speech and pattern recognition: Image recognition [ 36 ] is a well-known and widespread example of machine learning in the real world, which can identify an object in a digital image. For instance, labeling an x-ray as cancerous or not, character recognition, face detection in an image, and tagging suggestions on social media, e.g., Facebook, are common examples of image recognition. Speech recognition [ 23 ], which typically uses sound and linguistic models, is also very popular, e.g., in Google Assistant, Cortana, Siri, Alexa, etc. [ 67 ], where machine learning methods are used. Pattern recognition [ 13 ] is defined as the automated recognition of patterns and regularities in data, e.g., image analysis. Several machine learning techniques, such as classification, feature selection, clustering, and sequence labeling methods, are used in the area.
Sustainable agriculture: Agriculture is essential to the survival of all human activities [ 109 ]. Sustainable agriculture practices help to improve agricultural productivity while also reducing negative impacts on the environment [ 5 , 25 , 109 ]. Sustainable agriculture supply chains are knowledge-intensive and based on information, skills, technologies, etc., where knowledge transfer encourages farmers to enhance their decisions to adopt sustainable agriculture practices, utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of Things (IoT), mobile technologies and devices, etc. [ 5 , 53 , 54 ]. Machine learning can be applied in various phases of sustainable agriculture: in the pre-production phase, for the prediction of crop yield, soil properties, irrigation requirements, etc.; in the production phase, for weather prediction, disease detection, weed detection, soil nutrient management, livestock management, etc.; in the processing phase, for demand estimation, production planning, etc.; and in the distribution phase, for inventory management, consumer analysis, etc.
User behavior analytics and context-aware smartphone applications: Context-awareness is a system’s ability to capture knowledge about its surroundings at any moment and modify its behavior accordingly [ 28 , 93 ]. Context-aware computing uses software and hardware to automatically collect and interpret data for direct responses. The mobile app development environment has changed greatly with the power of AI, particularly machine learning techniques, through their ability to learn from contextual data [ 103 , 136 ]. Thus, developers of mobile apps can rely on machine learning to create smart apps that can understand human behavior, support, and entertain users [ 107 , 137 , 140 ]. Machine learning techniques are applicable to building various personalized data-driven context-aware systems, such as smart interruption management, smart mobile recommendation, context-aware smart searching, and decision-making, that intelligently assist end mobile phone users in a pervasive computing environment. For example, context-aware association rules can be used to build an intelligent phone call application [ 104 ]. Clustering approaches are useful in capturing users’ diverse behavioral activities by taking into account time-series data [ 102 ]. To predict future events in various contexts, classification methods can be used [ 106 , 139 ]. Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can help to build context-aware, adaptive, and smart applications according to the preferences of mobile phone users.
In addition to these application areas, machine learning-based models can also be applied to several other domains, such as bioinformatics, cheminformatics, computer networks, DNA sequence classification, economics and banking, robotics, advanced engineering, and many more.
Challenges and Research Directions
Our study on machine learning algorithms for intelligent data analysis and applications opens several research issues in the area. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions.
In general, the effectiveness and the efficiency of a machine learning-based solution depend on the nature and characteristics of the data and the performance of the learning algorithms. Collecting data in the relevant domains, such as cybersecurity, IoT, healthcare, and agriculture discussed in Sect. “ Applications of Machine Learning ”, is not straightforward, although the current cyberspace enables the production of a huge amount of data with very high frequency. Thus, collecting useful data for the target machine learning-based applications, e.g., smart city applications, and managing it properly are important for further analysis. Therefore, a more in-depth investigation of data collection methods is needed when working on real-world data. Moreover, the historical data may contain many ambiguous values, missing values, outliers, and meaningless entries. The machine learning algorithms discussed in Sect. “ Machine Learning Tasks and Algorithms ” depend heavily on the quality and availability of the data for training, which consequently affects the resultant model. Thus, accurately cleaning and pre-processing the diverse data collected from diverse sources is a challenging task. Therefore, effectively modifying or enhancing existing pre-processing methods, or proposing new data preparation techniques, is required in order to use the learning algorithms effectively in the associated application domain.
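As a minimal data-preparation sketch, the following pipeline imputes missing values and scales features before the learning step; the toy records with NaNs stand in for noisy real-world data, and the imputation strategy and model are illustrative choices.

```python
# Data-preparation sketch: imputation + scaling + model in one pipeline.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy records (e.g., age, income) with missing values, plus toy labels.
X = np.array([[25.0, 50000.0], [np.nan, 64000.0], [47.0, np.nan], [35.0, 58000.0]])
y = np.array([0, 1, 1, 0])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # normalise feature ranges
    ("model", LogisticRegression()),
])
pipe.fit(X, y)
print(pipe.predict([[30.0, 52000.0]]))
```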
To analyze the data and extract insights, many machine learning algorithms exist, summarized in Sect. “ Machine Learning Tasks and Algorithms ”. Thus, selecting a proper learning algorithm that is suitable for the target application is challenging. The reason is that the outcome of different learning algorithms may vary depending on the data characteristics [ 106 ]. Selecting the wrong learning algorithm can produce unexpected outcomes, wasting effort and reducing the model’s effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can directly be used to solve many real-world issues in diverse domains, such as cybersecurity, smart cities, and healthcare, summarized in Sect. “ Applications of Machine Learning ”. However, hybrid learning models, e.g., ensembles of methods, modifications or enhancements of existing learning techniques, or the design of new learning methods, could be potential future work in the area.
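One simple way to realize such a hybrid/ensemble model is sketched below: soft voting over three different base learners. The dataset and estimator choices are illustrative assumptions, not a prescription from the text above.

```python
# Ensemble sketch: soft-voting over three different base learners.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",   # average the predicted class probabilities
)
print("cv accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```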
Thus, the ultimate success of a machine learning-based solution and its corresponding applications mainly depends on both the data and the learning algorithms. If the data are unsuitable for learning, e.g., non-representative, of poor quality, containing irrelevant features, or of insufficient quantity for training, then the machine learning models may become useless or produce lower accuracy. Therefore, effectively processing the data and handling the diverse learning algorithms are important for a machine learning-based solution and, eventually, for building intelligent applications.
Conclusion
In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data analysis and applications. According to our goal, we have briefly discussed how various types of machine learning methods can be used to build solutions to various real-world problems. A successful machine learning model depends on both the data and the performance of the learning algorithms. The sophisticated learning algorithms then need to be trained with the collected real-world data and knowledge related to the target application before the system can assist with intelligent decision-making. We also discussed several popular application areas based on machine learning techniques to highlight their applicability to various real-world issues. Finally, we have summarized and discussed the challenges faced and the potential research opportunities and future directions in the area. The challenges that have been identified create promising research opportunities in the field, which must be addressed with effective solutions in various application areas. Overall, we believe that our study on machine learning-based solutions opens up a promising direction and can be used as a reference guide for potential research and applications for both academia and industry professionals as well as for decision-makers, from a technical point of view.
References
Canadian Institute of Cybersecurity, University of New Brunswick, ISCX dataset, http://www.unb.ca/cic/datasets/index.html/ (Accessed on 20 October 2019).
CIC-DDoS2019 [online]. Available: https://www.unb.ca/cic/datasets/ddos-2019.html/ (Accessed on 28 March 2020).
World Health Organization (WHO). http://www.who.int/.
Google Trends. https://trends.google.com/trends/, 2019.
Adnan N, Nordin Shahrina Md, Rahman I, Noor A. The effects of knowledge transfer on farmers decision making toward sustainable agriculture practices. World J Sci Technol Sustain Dev. 2018.
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 1998; 94–105
Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: ACM SIGMOD Record. ACM. 1993;22: 207–216
Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), Santiago, Chile. 1994;1215:487–499.
Aha DW, Kibler D, Albert M. Instance-based learning algorithms. Mach Learn. 1991;6(1):37–66.
Alakus TB, Turkoglu I. Comparison of deep learning approaches to predict covid-19 infection. Chaos Solit Fract. 2020;140:
Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Comput. 1997;9(7):1545–88.
Ankerst M, Breunig MM, Kriegel H-P, Sander J. Optics: ordering points to identify the clustering structure. ACM Sigmod Record. 1999;28(2):49–60.
Anzai Y. Pattern recognition and machine learning. Elsevier; 2012.
Ardabili SF, Mosavi A, Ghamisi P, Ferdinand F, Varkonyi-Koczy AR, Reuter U, Rabczuk T, Atkinson PM. Covid-19 outbreak prediction with machine learning. Algorithms. 2020;13(10):249.
Baldi P. Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning, 2012; 37–49 .
Balducci F, Impedovo D, Pirlo G. Machine learning applications on agricultural datasets for smart farm enhancement. Machines. 2018;6(3):38.
Boukerche A, Wang J. Machine learning-based traffic prediction models for intelligent transportation systems. Comput Netw. 2020;181
Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC Press; 1984.
Cao L. Data science: a comprehensive overview. ACM Comput Surv (CSUR). 2017;50(3):43.
Carpenter GA, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Graph Image Process. 1987;37(1):54–115.
Chiu C-C, Sainath TN, Wu Y, Prabhavalkar R, Nguyen P, Chen Z, Kannan A, Weiss RJ, Rao K, Gonina E, et al. State-of-the-art speech recognition with sequence-to-sequence models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018 pages 4774–4778. IEEE .
Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017.
Cobuloglu H, Büyüktahtakın IE. A stochastic multi-criteria decision analysis for sustainable biomass crop selection. Expert Syst Appl. 2015;42(15–16):6065–74.
Das A, Ng W-K, Woon Y-K. Rapid association rule mining. In: Proceedings of the tenth international conference on Information and knowledge management, pages 474–481. ACM, 2001.
de Amorim RC. Constrained clustering with minkowski weighted k-means. In: 2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI), pages 13–17. IEEE, 2012.
Dey AK. Understanding and using context. Person Ubiquit Comput. 2001;5(1):4–7.
Eagle N, Pentland AS. Reality mining: sensing complex social systems. Person Ubiquit Comput. 2006;10(4):255–68.
Essien A, Petrounias I, Sampaio P, Sampaio S. Improving urban traffic speed prediction using data source fusion and deep learning. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE. 2019: 1–8. .
Essien A, Petrounias I, Sampaio P, Sampaio S. A deep-learning model for urban traffic flow prediction with traffic events mined from twitter. In: World Wide Web, 2020: 1–24 .
Ester M, Kriegel H-P, Sander J, Xiaowei X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd. 1996;96:226–31.
Fatima M, Pasha M, et al. Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl. 2017;9(01):1.
Flach PA, Lachiche N. Confirmation-guided discovery of first-order rules with tertius. Mach Learn. 2001;42(1–2):61–95.
Freund Y, Schapire RE, et al. Experiments with a new boosting algorithm. In: Icml, Citeseer. 1996; 96: 148–156
Fujiyoshi H, Hirakawa T, Yamashita T. Deep learning-based image recognition for autonomous driving. IATSS Res. 2019;43(4):244–52.
Fukunaga K, Hostetler L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inform Theory. 1975;21(1):32–40.
Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep learning. Cambridge: MIT Press; 2016.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014: 2672–2680.
Guerrero-Ibáñez J, Zeadally S, Contreras-Castillo J. Sensor technologies for intelligent transportation systems. Sensors. 2018;18(4):1212.
Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.
Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: ACM Sigmod Record, ACM. 2000;29: 1–12.
Harmon SA, Sanford TH, Sheng X, Turkbey EB, Roth H, Ziyue X, Yang D, Myronenko A, Anderson V, Amalou A, et al. Artificial intelligence for the detection of covid-19 pneumonia on chest ct using multinational datasets. Nat Commun. 2020;11(1):1–7.
He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770–778.
Hinton GE. A practical guide to training restricted boltzmann machines. In: Neural networks: Tricks of the trade. Springer. 2012; 599-619
Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn. 1993;11(1):63–90.
Hotelling H. Analysis of a complex of statistical variables into principal components. J Edu Psychol. 1933;24(6):417.
Houtsma M, Swami A. Set-oriented mining for association rules in relational databases. In: Data Engineering, 1995. Proceedings of the Eleventh International Conference on, IEEE.1995:25–33.
Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, Jamshidi M, La Spada L, Mirmozafari M, Dehghani M, et al. Artificial intelligence and covid-19: deep learning approaches for diagnosis and treatment. IEEE Access. 2020;8:109581–95.
John GH, Langley P. Estimating continuous distributions in bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc. 1995; 338–345
Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.
Kamble SS, Gunasekaran A, Gawankar SA. Sustainable industry 4.0 framework: a systematic literature review identifying the current trends and future perspectives. Process Saf Environ Protect. 2018;117:408–25.
Kamble SS, Gunasekaran A, Gawankar SA. Achieving sustainable performance in a data-driven agriculture supply chain: a review for research and applications. Int J Prod Econ. 2020;219:179–94.
Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons; 2009.
Keerthi SS, Shevade SK, Bhattacharyya C, Radha Krishna MK. Improvements to platt’s smo algorithm for svm classifier design. Neural Comput. 2001;13(3):637–49.
Khadse V, Mahalle PN, Biraris SV. An empirical comparison of supervised machine learning algorithms for internet of things data. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE. 2018; 1–6
Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
Koroniotis N, Moustafa N, Sitnikova E, Turnbull B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: bot-iot dataset. Fut Gen Comput Syst. 2019;100:779–96.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, 2012: 1097–1105
Kushwaha S, Bahl S, Bagha AK, Parmar KS, Javaid M, Haleem A, Singh RP. Significant applications of machine learning for covid-19 pandemic. J Ind Integr Manag. 2020;5(4).
Lade P, Ghosh R, Srinivasan S. Manufacturing analytics and industrial internet of things. IEEE Intell Syst. 2017;32(3):74–9.
Lalmuanawma S, Hussain J, Chhakchhuak L. Applications of machine learning and artificial intelligence for covid-19 (sars-cov-2) pandemic: a review. Chaos Sol Fract. 2020:110059 .
LeCessie S, Van Houwelingen JC. Ridge estimators in logistic regression. J R Stat Soc Ser C (Appl Stat). 1992;41(1):191–201.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
Liu H, Motoda H. Feature extraction, construction and selection: A data mining perspective, vol. 453. Springer Science & Business Media; 1998.
López G, Quesada L, Guerrero LA. Alexa vs. siri vs. cortana vs. google assistant: a comparison of speech-based natural user interfaces. In: International Conference on Applied Human Factors and Ergonomics, Springer. 2017; 241–250.
Liu B, Hsu W, Ma Y. Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, 1998.
MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967;volume 1, pages 281–297. Oakland, CA, USA.
Mahdavinejad MS, Rezvan M, Barekatain M, Adibi P, Barnaghi P, Sheth AP. Machine learning for internet of things data analysis: a survey. Digit Commun Netw. 2018;4(3):161–75.
Marchand A, Marx P. Automated product recommendations with preference-based explanations. J Retail. 2020;96(3):328–43.
McCallum A. Information extraction: distilling structured data from unstructured text. Queue. 2005;3(9):48–57.
Mehrotra A, Hendley R, Musolesi M. Prefminer: mining user’s preferences for intelligent mobile notification management. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September, 2016; pp. 1223–1234. ACM, New York, USA. .
Mohamadou Y, Halidou A, Kapen PT. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of covid-19. Appl Intell. 2020;50(11):3913–25.
Mohammed M, Khan MB, Bashier Mohammed BE. Machine learning: algorithms and applications. CRC Press; 2016.
Moustafa N, Slay J. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 military communications and information systems conference (MilCIS), 2015;pages 1–6. IEEE .
Nilashi M, Ibrahim OB, Ahmadi H, Shahmoradi L. An analytical method for diseases prediction using machine learning techniques. Comput Chem Eng. 2017;106:212–23.
Oh Y, Park S, Ye JC. Deep learning covid-19 features on cxr using limited training data sets. IEEE Trans Med Imaging. 2020;39(8):2688–700.
Otter DW, Medina JR , Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst. 2020.
Park H-S, Jun C-H. A simple and fast algorithm for k-medoids clustering. Expert Syst Appl. 2009;36(2):3336–41.
Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci. 1901;2(11):559–72.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
Perveen S, Shahbaz M, Keshavjee K, Guergachi A. Metabolic syndrome and development of diabetes mellitus: predictive modeling based on machine learning techniques. IEEE Access. 2018;7:1365–75.
Phithakkitnukoon S, Dantu R, Claxton R, Eagle N. Behavior-based adaptive call predictor. ACM Trans Auton Adapt Syst. 2011;6(3):21:1–21:28.
Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: applications on robotics. J Intell Robot Syst. 2017;86(2):153–73.
Puterman ML. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons; 2014.
Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.
Quinlan JR. C4.5: programs for machine learning. Mach Learn. 1993.
Rasmussen C. The infinite gaussian mixture model. Adv Neural Inform Process Syst. 1999;12:554–60.
Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Syst. 2015;89:14–46.
Rokach L. A survey of clustering algorithms. In: Data mining and knowledge discovery handbook, pages 269–298. Springer, 2010.
Safdar S, Zafar S, Zafar N, Khan NF. Machine learning based decision support systems (dss) for heart disease diagnosis: a review. Artif Intell Rev. 2018;50(4):597–623.
Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.
Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet Things. 2019;5:180–93.
Sarker IH. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021.
Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021.
Sarker IH, Abushark YB, Alsolami F, Khan A. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.
Sarker IH, Abushark YB, Khan A. Contextpca: predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.
Sarker IH, Alqahtani H, Alsolami F, Khan A, Abushark YB, Siddiqui MK. Context pre-modeling: an empirical analysis for classification based user-centric context-aware predictive modeling. J Big Data. 2020;7(1):1–23.
Sarker IH, Alan C, Jun H, Khan AI, Abushark YB, Khaled S. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2019; 1–11.
Sarker IH, Colman A, Kabir MA, Han J. Phone call log as a context source to modeling individual user behavior. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Ubicomp): Adjunct, Germany, pages 630–634. ACM, 2016.
Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J Oxf Univ UK. 2018;61(3):349–68.
Sarker IH, Hoque MM, Uddin MK, Alsolami T. Mobile data science and intelligent apps: concepts, AI-based modeling and research directions. Mob Netw Appl. 2020;1–19.
Sarker IH, Kayes ASM. Abc-ruleminer: user behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020; page 102762
Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.
Sarker IH, Watters P, Kayes ASM. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.
Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet Things. 2019;8:
Scheffer T. Finding association rules that trade support optimally against confidence. Intell Data Anal. 2005;9(4):381–95.
Sharma R, Kamble SS, Gunasekaran A, Kumar V, Kumar A. A systematic literature review on machine learning applications for sustainable agriculture supply chain performance. Comput Oper Res. 2020;119:
Shengli S, Ling CX. Hybrid cost-sensitive decision tree, knowledge discovery in databases. In: PKDD 2005, Proceedings of 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. Lecture Notes in Computer Science, volume 3721, 2005.
Shorten C, Khoshgoftaar TM, Furht B. Deep learning applications for covid-19. J Big Data. 2021;8(1):1–54.
Gökhan S, Nevin Y. Data analysis in health and big data: a machine learning medical diagnosis model based on patients’ complaints. Commun Stat Theory Methods. 2019;1–10
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al. Mastering the game of go with deep neural networks and tree search. nature. 2016;529(7587):484–9.
Ślusarczyk B. Industry 4.0: Are we ready? Polish J Manag Stud. 17, 2018.
Sneath PHA. The application of computers to taxonomy. J Gen Microbiol. 1957;17(1).
Sorensen T. Method of establishing groups of equal amplitude in plant sociology based on similarity of species. Biol Skr. 1948; 5.
Srinivasan V, Moghaddam S, Mukherji A. Mobileminer: mining your frequent patterns on your phone. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13-17 September, pp. 389–400. ACM, New York, USA. 2014.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015; pages 1–9.
Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the KDD Cup 99 data set. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications. IEEE; 2009. p. 1–6.
Tsagkias M, King TH, Kallumadi S, Murdock V, de Rijke M. Challenges and research opportunities in ecommerce search and recommendations. In: ACM SIGIR Forum, volume 54. New York, NY, USA: ACM; 2021. p. 1–23.
Wagstaff K, Cardie C, Rogers S, Schrödl S, et al. Constrained k-means clustering with background knowledge. Icml. 2001;1:577–84.
Wang W, Yang J, Muntz R, et al. Sting: a statistical information grid approach to spatial data mining. VLDB. 1997;97:186–95.
Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.
Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.
Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2005.
Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: practical machine learning tools and techniques with java implementations. 1999.
Wu CC, Chen YL, Liu YH, Yang XY. Decision tree induction with a constrained number of leaf nodes. Appl Intell. 2016;45(3):673–85.
Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY, et al. Top 10 algorithms in data mining. Knowl Inform Syst. 2008;14(1):1–37.
Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.
Xu D, Yingjie T. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015;2(2):165–93.
Zaki MJ. Scalable algorithms for association mining. IEEE Trans Knowl Data Eng. 2000;12(3):372–90.
Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. Internet of things for smart cities. IEEE Internet Things J. 2014;1(1):22–32.
Zhao Q, Bhowmick SS. Association rule mining: a survey. Singapore: Nanyang Technological University; 2003.
Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–7.
Zheng Y, Rajasegarar S, Leckie C. Parking availability prediction for sensor-enabled car parks in smart cities. In: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2015 IEEE Tenth International Conference on. IEEE, 2015; pages 1–6.
Zhu H, Cao H, Chen E, Xiong H, Tian J. Exploiting enriched contextual information for mobile app classification. In: Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012; pages 1617–1621
Zhu H, Chen E, Xiong H, Kuifei Y, Cao H, Tian J. Mining mobile user preferences for personalized context-aware recommendation. ACM Trans Intell Syst Technol (TIST). 2014;5(4):58.
Zikang H, Yong Y, Guofeng Y, Xinyu Z. Sentiment analysis of agricultural product ecommerce review data based on deep learning. In: 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), IEEE, 2020; pages 1–7
Zulkernain S, Madiraju P, Ahamed SI. A context aware interruption management system for mobile devices. In: Mobile Wireless Middleware, Operating Systems, and Applications. Springer. 2010; pages 221–234
Zulkernain S, Madiraju P, Ahamed S, Stamm K. A mobile intelligent interruption management system. J UCS. 2010;16(15):2060–80.
Author information
Authors and Affiliations
Swinburne University of Technology, Melbourne, VIC, 3122, Australia
Iqbal H. Sarker
Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, 4349, Chattogram, Bangladesh
Corresponding author
Correspondence to Iqbal H. Sarker .
Ethics declarations
Conflict of interest.
The author declares no conflict of interest.
Additional information
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.
Cite this article.
Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN COMPUT. SCI. 2 , 160 (2021). https://doi.org/10.1007/s42979-021-00592-x
Received : 27 January 2021
Accepted : 12 March 2021
Published : 22 March 2021
DOI : https://doi.org/10.1007/s42979-021-00592-x
Keywords: Machine learning, Deep learning, Artificial intelligence, Data science, Data-driven decision-making, Predictive analytics, Intelligent applications
Machine Learning: Recently Published Documents
An explainable machine learning model for identifying geographical origins of sea cucumber Apostichopus japonicus based on multi-element profile
A comparison of machine learning- and regression-based models for predicting ductility ratio of RC beam-column joints
Alexa, is this a historical record?
Digital transformation in government has brought an increase in the scale, variety, and complexity of records and greater levels of disorganised data. Current practices for selecting records for transfer to The National Archives (TNA) were developed to deal with paper records and are struggling to deal with this shift. This article examines the background to the problem and outlines a project that TNA undertook to research the feasibility of using commercially available artificial intelligence tools to aid selection. The project AI for Selection evaluated a range of commercial solutions varying from off-the-shelf products to cloud-hosted machine learning platforms, as well as a benchmarking tool developed in-house. Suitability of tools depended on several factors, including requirements and skills of transferring bodies as well as the tools’ usability and configurability. This article also explores questions around trust and explainability of decisions made when using AI for sensitive tasks such as selection.
Automated Text Classification of Maintenance Data of Higher Education Buildings Using Text Mining and Machine Learning Techniques
Data-driven analysis and machine learning for energy prediction in distributed photovoltaic generation plants: a case study in Queensland, Australia
Modeling nutrient removal by membrane bioreactor at a sewage treatment plant using machine learning models
Big Five personality prediction based in Indonesian tweets using machine learning methods
<span lang="EN-US">The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, but Indonesian has received scant attention. Therefore, this research aims to predict user’s personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including <a name="_Hlk87278444"></a>naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profile. We predict the personality based on the big five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.</span>
Compressive strength of concrete with recycled aggregate; a machine learning-based evaluation
Temperature prediction of flat steel box girders of long-span bridges utilizing in situ environmental parameters and machine learning
Computer-assisted cohort identification in practice
The standard approach to expert-in-the-loop machine learning is active learning, where, repeatedly, an expert is asked to annotate one or more records and the machine finds a classifier that respects all annotations made until that point. We propose an alternative approach, IQRef , in which the expert iteratively designs a classifier and the machine helps him or her to determine how well it is performing and, importantly, when to stop, by reporting statistics on a fixed, hold-out sample of annotated records. We justify our approach based on prior work giving a theoretical model of how to re-use hold-out data. We compare the two approaches in the context of identifying a cohort of EHRs and examine their strengths and weaknesses through a case study arising from an optometric research problem. We conclude that both approaches are complementary, and we recommend that they both be employed in conjunction to address the problem of cohort identification in health research.
Machine Learning: Algorithms, Real-World Applications and Research Directions
Iqbal H. Sarker
1 Swinburne University of Technology, Melbourne, VIC 3122 Australia
2 Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, 4349 Chattogram, Bangladesh
Abstract
In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, knowledge of artificial intelligence (AI), particularly machine learning (ML), is the key. Various types of machine learning algorithms, such as supervised, unsupervised, semi-supervised, and reinforcement learning, exist in the area. Besides, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view of these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study’s key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.
Introduction
We live in the age of data, where everything around us is connected to a data source, and everything in our lives is digitally recorded [ 21 , 103 ]. For instance, the current electronic world has a wealth of various kinds of data, such as Internet of Things (IoT) data, cybersecurity data, smart city data, business data, smartphone data, social media data, health data, COVID-19 data, and many more. The data can be structured, semi-structured, or unstructured, as discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”, and is increasing day by day. Extracting insights from these data can be used to build various intelligent applications in the relevant domains. For instance, to build a data-driven automated and intelligent cybersecurity system, the relevant cybersecurity data can be used [ 105 ]; to build personalized context-aware smart mobile applications, the relevant mobile data can be used [ 103 ], and so on. Thus, data management tools and techniques that can extract insights or useful knowledge from data in a timely and intelligent way are urgently needed, as real-world applications are built upon them.
Artificial intelligence (AI), particularly machine learning (ML), has grown rapidly in recent years in the context of data analysis and computing, which typically allows applications to function in an intelligent manner [ 95 ]. ML usually provides systems with the ability to learn and improve from experience automatically without being specifically programmed and is generally regarded as one of the most popular recent technologies of the fourth industrial revolution (4IR or Industry 4.0) [ 103 , 105 ]. “Industry 4.0” [ 114 ] is typically the ongoing automation of conventional manufacturing and industrial practices, including exploratory data processing, using new smart technologies such as machine learning automation. Thus, to intelligently analyze these data and to develop the corresponding real-world applications, machine learning algorithms are the key. The learning algorithms can be categorized into four major types: supervised, unsupervised, semi-supervised, and reinforcement learning [ 75 ], discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”. The popularity of these approaches to learning is increasing day by day, as shown in Fig. 1, based on data collected from Google Trends [ 4 ] over the last five years. The x-axis of the figure indicates the specific dates, and the corresponding popularity score, within the range of 0 (minimum) to 100 (maximum), is shown on the y-axis. According to Fig. 1, the popularity indication values for these learning types were low in 2015 and have been increasing day by day. These statistics motivate us to study machine learning in this paper, which can play an important role in the real world through Industry 4.0 automation.

The worldwide popularity score of various types of ML algorithms (supervised, unsupervised, semi-supervised, and reinforcement) in a range of 0 (min) to 100 (max) over time where x-axis represents the timestamp information and y-axis represents the corresponding score
In general, the effectiveness and the efficiency of a machine learning solution depend on the nature and characteristics of the data and the performance of the learning algorithms. In the area of machine learning algorithms, classification analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule learning, and reinforcement learning techniques exist to effectively build data-driven systems [ 41 , 125 ]. Besides, deep learning, which originated from the artificial neural network and is part of a wider family of machine learning approaches, can be used to intelligently analyze data [ 96 ]. Thus, selecting a proper learning algorithm that is suitable for the target application in a particular domain is challenging. The reason is that the purpose of different learning algorithms is different, and even the outcome of different learning algorithms in a similar category may vary depending on the data characteristics [ 106 ]. Thus, it is important to understand the principles of various machine learning algorithms and their applicability in various real-world application areas, such as IoT systems, cybersecurity services, business and recommendation systems, smart cities, healthcare and COVID-19, context-aware systems, sustainable agriculture, and many more, which are explained briefly in Sect. “ Applications of Machine Learning ”.
Based on the importance and potential of “machine learning” to analyze the data mentioned above, in this paper we provide a comprehensive view of various types of machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, the key contribution of this study is explaining the principles and potential of different machine learning techniques, and their applicability in the various real-world application areas mentioned earlier. The purpose of this paper is, therefore, to provide a basic guide for those in academia and industry who want to study, research, and develop data-driven automated and intelligent systems in the relevant areas based on machine learning techniques.
The key contributions of this paper are listed as follows:
- To define the scope of our study by taking into account the nature and characteristics of various types of real-world data and the capabilities of various learning techniques.
- To provide a comprehensive view on machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
- To discuss the applicability of machine learning-based solutions in various real-world application domains.
- To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services.
The rest of the paper is organized as follows. The next section presents the types of data and machine learning algorithms in a broader sense and defines the scope of our study. We briefly discuss and explain different machine learning algorithms in the subsequent section followed by which various real-world application areas based on machine learning algorithms are discussed and summarized. In the penultimate section, we highlight several research issues and potential future directions, and the final section concludes this paper.
Types of Real-World Data and Machine Learning Techniques
Machine learning algorithms typically consume and process data to learn the related patterns about individuals, business processes, transactions, events, and so on. In the following, we discuss various types of real-world data as well as categories of machine learning algorithms.
Types of Real-World Data
Usually, the availability of data is considered the key to constructing a machine learning model or a data-driven real-world system [ 103 , 105 ]. Data can come in various forms, such as structured, semi-structured, or unstructured [ 41 , 72 ]. Besides, “metadata” is another type, which typically represents data about the data. In the following, we briefly discuss these types of data.
- Structured: Structured data have a well-defined structure and conform to a data model following a standard order, which makes them highly organized and easy to access and use by an entity or a computer program. Structured data are typically stored in well-defined schemes such as relational databases, i.e., in a tabular format. For instance, names, dates, addresses, credit card numbers, stock information, and geolocation are examples of structured data.
- Unstructured: Unstructured data, on the other hand, have no pre-defined format or organization, which makes them much more difficult to capture, process, and analyze; they mostly consist of text and multimedia material. For example, sensor data, emails, blog entries, wikis, word processing documents, PDF files, audio files, videos, images, presentations, web pages, and many other types of business documents can be considered unstructured data.
- Semi-structured: Semi-structured data are not stored in a relational database like the structured data mentioned above, but they do have certain organizational properties that make them easier to analyze. HTML, XML, JSON documents, NoSQL databases, etc., are some examples of semi-structured data.
- Metadata: Metadata are not a normal form of data, but “data about data”. The primary difference between “data” and “metadata” is that data are simply the material that can classify, measure, or even document something relative to an organization’s data properties, whereas metadata describes the relevant information about that data, giving it more significance for data users. Basic examples of a document’s metadata are the author, file size, creation date, and keywords describing the document.
In the area of machine learning and data science, researchers use various widely used datasets for different purposes. These include, for example, cybersecurity datasets such as NSL-KDD [ 119 ], UNSW-NB15 [ 76 ], ISCX’12 [ 1 ], CIC-DDoS2019 [ 2 ], and Bot-IoT [ 59 ]; smartphone datasets such as phone call logs [ 84 , 101 ], SMS logs [ 29 ], mobile application usage logs [ 117 , 137 ], and mobile phone notification logs [ 73 ]; IoT data [ 16 , 57 , 62 ]; agriculture and e-commerce data [ 120 , 138 ]; health data such as heart disease [ 92 ], diabetes mellitus [ 83 , 134 ], and COVID-19 [ 43 , 74 ]; and many more in various application domains. These data can be of the different types discussed above, which may vary from application to application in the real world. To analyze such data in a particular problem domain, and to extract insights or useful knowledge from the data for building real-world intelligent applications, different types of machine learning techniques can be used according to their learning capabilities, as discussed in the following.
Types of Machine Learning Techniques
Machine Learning algorithms are mainly divided into four categories: Supervised learning, Unsupervised learning, Semi-supervised learning, and Reinforcement learning [ 75 ], as shown in Fig. 2. In the following, we briefly discuss each type of learning technique with the scope of their applicability to solve real-world problems.

Various types of machine learning techniques
- Supervised: Supervised learning is typically the machine learning task of learning a function that maps an input to an output based on sample input-output pairs [ 41 ]. It uses labeled training data and a collection of training examples to infer a function. Supervised learning is carried out when certain goals are identified to be accomplished from a certain set of inputs [ 105 ], i.e., a task-driven approach . The most common supervised tasks are “classification”, which separates the data, and “regression”, which fits the data. For instance, predicting the class label or sentiment of a piece of text, like a tweet or a product review, i.e., text classification, is an example of supervised learning (a short code sketch contrasting supervised and unsupervised learning is given after this list).
- Unsupervised: Unsupervised learning analyzes unlabeled datasets without the need for human interference, i.e., a data-driven process [ 41 ]. It is widely used for extracting generative features, identifying meaningful trends and structures, grouping results, and for exploratory purposes. The most common unsupervised learning tasks are clustering, density estimation, feature learning, dimensionality reduction, finding association rules, anomaly detection, etc.
- Semi-supervised: Semi-supervised learning can be defined as a hybridization of the supervised and unsupervised methods mentioned above, as it operates on both labeled and unlabeled data [ 41 , 105 ]. Thus, it falls between learning “without supervision” and learning “with supervision”. In the real world, labeled data can be scarce in several contexts while unlabeled data are plentiful, and semi-supervised learning is useful in such cases [ 75 ]. The ultimate goal of a semi-supervised learning model is to produce better predictions than could be obtained using the labeled data alone. Some application areas where semi-supervised learning is used include machine translation, fraud detection, data labeling, and text classification.
- Reinforcement: Reinforcement learning is a type of machine learning that enables software agents and machines to automatically evaluate the optimal behavior in a particular context or environment to improve their efficiency [ 52 ], i.e., an environment-driven approach . This type of learning is based on rewards or penalties, and its ultimate goal is to use the insights obtained from interacting with the environment to take actions that increase the reward or minimize the risk [ 75 ]. It is a powerful tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems such as robotics, autonomous driving, manufacturing, and supply chain logistics; however, it is not preferable for solving basic or straightforward problems.
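To make the task-driven versus data-driven distinction above concrete, the following minimal Python sketch (assuming scikit-learn is available; the dataset, model choices, and parameter values are illustrative only, not a prescribed setup) fits a supervised classifier on labeled data and an unsupervised clustering algorithm on the same feature matrix.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised (task-driven): learn a mapping from inputs X to known labels y.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Supervised test accuracy:", clf.score(X_test, y_test))

# Unsupervised (data-driven): group the same observations without using the labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
```

The same feature matrix is used twice; only the presence or absence of the labels y changes which family of techniques applies.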
Thus, to build effective models in various application areas, different types of machine learning techniques can play a significant role according to their learning capabilities, depending on the nature of the data discussed earlier and the target outcome. In Table 1, we summarize various types of machine learning techniques with examples. In the following, we provide a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
Various types of machine learning techniques with examples
Machine Learning Tasks and Algorithms
In this section, we discuss various machine learning algorithms that include classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, as well as deep learning methods. A general structure of a machine learning-based predictive model is shown in Fig. 3, where the model is trained from historical data in phase 1 and the outcome is generated in phase 2 for the new test data.

A general structure of a machine learning based predictive model considering both the training and testing phase
Classification Analysis
Classification is regarded as a supervised learning method in machine learning and also refers to a problem of predictive modeling, where a class label is predicted for a given example [ 41 ]. Mathematically, it learns a mapping function ( f ) from input variables ( X ) to output variables ( Y ), which serve as targets, labels, or categories. Classification can be carried out on structured or unstructured data to predict the class of given data points. For example, spam detection, i.e., labeling emails as “spam” or “not spam”, is a classification problem for email service providers. In the following, we summarize the common classification problems.
- Binary classification: It refers to the classification tasks having two class labels such as “true and false” or “yes and no” [ 41 ]. In such binary classification tasks, one class could be the normal state, while the abnormal state could be another class. For instance, “cancer not detected” is the normal state of a task that involves a medical test, and “cancer detected” could be considered as the abnormal state. Similarly, “spam” and “not spam” in the above example of email service providers are considered as binary classification.
- Multiclass classification: Traditionally, this refers to classification tasks having more than two class labels [ 41 ]. Unlike binary classification tasks, multiclass classification does not have the notion of normal and abnormal outcomes; instead, each example is classified as belonging to one class within a range of specified classes. For example, classifying the various types of network attacks in the NSL-KDD [ 119 ] dataset is a multiclass classification task, where the attack categories are classified into four class labels, such as DoS (Denial of Service Attack), U2R (User to Root Attack), R2L (Root to Local Attack), and Probing Attack.
- Multi-label classification: In machine learning, multi-label classification is an important consideration where an example is associated with several classes or labels. It is thus a generalization of multiclass classification in which the classes involved are hierarchically structured and each example may simultaneously belong to more than one class at each hierarchical level, e.g., multi-level text classification. For instance, a Google News article can be presented under the categories of a “city name”, “technology”, or “latest news”, etc. Multi-label classification requires advanced machine learning algorithms that support predicting multiple mutually non-exclusive classes or labels, unlike traditional classification tasks where class labels are mutually exclusive [ 82 ] (a short sketch of these label encodings follows this list).
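To make the difference in label structure concrete, the short sketch below (the labels are made up for illustration; scikit-learn is assumed for the multi-label encoder) shows how binary, multiclass, and multi-label targets are typically represented, with multi-label targets encoded as a binary indicator matrix.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Binary: exactly two mutually exclusive labels per example.
y_binary = np.array(["spam", "not spam", "spam"])

# Multiclass: one label per example, drawn from more than two classes.
y_multiclass = np.array(["DoS", "U2R", "R2L", "Probing"])

# Multi-label: each example may carry several non-exclusive labels at once.
y_multilabel = [["city name", "technology"], ["latest news"], ["technology", "latest news"]]
Y = MultiLabelBinarizer().fit_transform(y_multilabel)
print(Y)  # one indicator column per label; several 1s are allowed per row
```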
Many classification algorithms have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the most common and popular methods that are used widely in various application areas.
- Naive Bayes (NB): The naive Bayes algorithm is based on Bayes’ theorem with the assumption of independence between each pair of features [ 51 ]. It works well in many real-world situations, such as document or text classification and spam filtering, and can be used for both binary and multi-class categories. The NB classifier can be used to effectively classify noisy instances in the data and to construct a robust prediction model [ 94 ]. Its key benefit is that, compared to more sophisticated approaches, it needs only a small amount of training data to estimate the necessary parameters quickly [ 82 ]. However, its performance may be affected by its strong assumption of feature independence. Gaussian, Multinomial, Complement, Bernoulli, and Categorical are the common variants of the NB classifier [ 82 ]. A combined scikit-learn sketch covering several of the classifiers in this list is given after the list.
- Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is a linear decision boundary classifier created by fitting class conditional densities to data and applying Bayes’ rule [ 51 , 82 ]. This method is also known as a generalization of Fisher’s linear discriminant, which projects a given dataset into a lower-dimensional space, i.e., a reduction of dimensionality that minimizes the complexity of the model or reduces the resulting model’s computational costs. The standard LDA model fits each class with a Gaussian density, assuming that all classes share the same covariance matrix [ 82 ]. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which seek to express one dependent variable as a linear combination of other features or measurements.
- Logistic regression (LR): Another common probabilistic statistical model used to solve classification problems in machine learning is logistic regression (LR) [ 64 ]. Logistic regression typically estimates probabilities using a logistic function, i.e., the sigmoid function defined mathematically in Eq. (1). It works well when the dataset can be separated linearly, but it may overfit high-dimensional datasets. The regularization (L1 and L2) techniques [ 82 ] can be used to avoid over-fitting in such scenarios. The assumption of linearity between the dependent and independent variables is considered a major drawback of logistic regression. It can be used for both classification and regression problems, but it is more commonly used for classification.
$g(z) = \frac{1}{1 + \exp(-z)} \quad (1)$
- K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [ 9 ] is an “instance-based learning” or non-generalizing learning method, also known as a “lazy learning” algorithm. It does not focus on constructing a general internal model; instead, it stores all instances corresponding to training data in n -dimensional space. KNN classifies new data points based on similarity measures (e.g., the Euclidean distance function) [ 82 ]. Classification is computed from a simple majority vote of the k nearest neighbors of each point. It is quite robust to noisy training data, and its accuracy depends on the data quality. The main challenge with KNN is choosing the optimal number of neighbors to consider. KNN can be used for both classification and regression.
- Support vector machine (SVM): In machine learning, another common technique that can be used for classification, regression, or other tasks is the support vector machine (SVM) [ 56 ]. In high- or infinite-dimensional space, a support vector machine constructs a hyper-plane or set of hyper-planes. Intuitively, the hyper-plane that has the greatest distance from the nearest training data points of any class achieves a strong separation since, in general, the larger the margin, the lower the classifier’s generalization error. SVM is effective in high-dimensional spaces and can behave differently depending on the mathematical function used as the kernel. Linear, polynomial, radial basis function (RBF), sigmoid, etc., are the popular kernel functions used in the SVM classifier [ 82 ]. However, when the dataset contains more noise, such as overlapping target classes, SVM does not perform well.

An example of a decision tree structure

An example of a random forest structure considering multiple decision trees
- Adaptive Boosting (AdaBoost): Adaptive Boosting (AdaBoost) is an ensemble learning process that employs an iterative approach to improve poor classifiers by learning from their errors. It was developed by Yoav Freund et al. [ 35 ] and is also known as “meta-learning”. Unlike random forests, which use parallel ensembling, AdaBoost uses “sequential ensembling”. It creates a powerful classifier of high accuracy by combining many poorly performing classifiers. In that sense, AdaBoost is called an adaptive classifier because it significantly improves the efficiency of the classifier, although in some instances it can cause overfitting. AdaBoost is best used to boost the performance of decision trees as base estimators [ 82 ] on binary classification problems; however, it is sensitive to noisy data and outliers.
- Extreme gradient boosting (XGBoost): Gradient boosting, like random forests [ 19 ] above, is an ensemble learning algorithm that generates a final model from a series of individual models, typically decision trees. The gradient is used to minimize the loss function, similar to how neural networks [ 41 ] use gradient descent to optimize weights. Extreme Gradient Boosting (XGBoost) is a form of gradient boosting that takes more detailed approximations into account when determining the best model [ 82 ]. It computes second-order gradients of the loss function to minimize loss and uses advanced regularization (L1 and L2) [ 82 ], which reduces over-fitting and improves model generalization and performance. XGBoost is fast and can handle large datasets well.
- Stochastic gradient descent (SGD): Stochastic gradient descent (SGD) [ 41 ] is an iterative method for optimizing an objective function with suitable smoothness properties, where the word ‘stochastic’ refers to random probability. It reduces the computational burden, particularly in high-dimensional optimization problems, allowing faster iterations in exchange for a lower convergence rate. A gradient is the slope of a function, i.e., the degree of change of one variable in response to changes in another. Mathematically, gradient descent iteratively updates the parameters in the direction of the negative partial derivatives of the cost function with respect to those parameters. Let $\alpha$ be the learning rate and $J_i$ the cost of the $i$-th training example; then Eq. (4) gives the stochastic gradient descent weight update at the $j$-th iteration. In large-scale and sparse machine learning, SGD has been successfully applied to problems often encountered in text classification and natural language processing [ 82 ]. However, SGD is sensitive to feature scaling and requires a range of hyperparameters, such as the regularization parameter and the number of iterations.
$w_j := w_j - \alpha \frac{\partial J_i}{\partial w_j} \quad (4)$
- Rule-based classification : The term rule-based classification can be used to refer to any classification scheme that makes use of IF-THEN rules for class prediction. Several classification algorithms with the ability to generate rules exist, such as Zero-R [ 125 ], One-R [ 47 ], decision trees [ 87 , 88 ], DTNB [ 110 ], Ripple Down Rule learner (RIDOR) [ 125 ], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [ 126 ]. The decision tree is one of the most common rule-based classification algorithms among these techniques because it has several advantages, such as being easy to interpret, the ability to handle high-dimensional data, simplicity and speed, good accuracy, and the capability to produce rules that are clear and understandable to humans [ 127 , 128 ]. The decision tree-based rules also provide significant accuracy in a prediction model for unseen test cases [ 106 ]. Since the rules are easily interpretable, these rule-based classifiers are often used to produce descriptive models that can describe a system, including the entities and their relationships.
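As a rough illustration of how several of the classifiers above can be applied in practice, the following sketch (assuming scikit-learn; the dataset choice and the default hyperparameters are placeholders rather than tuned settings) trains Naive Bayes, logistic regression, KNN, an RBF-kernel SVM, an SGD-trained linear classifier, and a decision tree on the same train/test split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling is bundled in a pipeline where the classifier is sensitive to feature scales.
models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "SGD (linear)": make_pipeline(StandardScaler(), SGDClassifier(random_state=42)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Which classifier performs best depends on the data characteristics, which is exactly why comparing several candidates on a held-out set, as above, is common practice.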
Regression Analysis
Regression analysis includes several machine learning methods that allow one to predict a continuous ( y ) outcome variable based on the value of one or more ( x ) predictor variables [ 41 ]. The most significant distinction between classification and regression is that classification predicts distinct class labels, while regression facilitates the prediction of a continuous quantity. Figure 6 shows an example of how classification differs from regression models. Some overlap is often found between the two types of machine learning algorithms. Regression models are now widely used in a variety of fields, including financial forecasting or prediction, cost estimation, trend analysis, marketing, time series estimation, drug response modeling, and many more. Some of the familiar types of regression algorithms are linear, polynomial, lasso, and ridge regression, which are explained briefly in the following.
- Simple and multiple linear regression: This is one of the most popular ML modeling techniques as well as a well-known regression technique. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the form of the regression line is linear. Linear regression creates a relationship between the dependent variable ( Y ) and one or more independent variables ( X ), known as the regression line, using the best-fit straight line [ 41 ]. It is defined by the following equations:
$y = a + bx + e \quad (5)$
$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + e \quad (6)$
where $a$ is the intercept, $b$ is the slope of the line, and $e$ is the error term. These equations can be used to predict the value of the target variable based on the given predictor variable(s). Multiple linear regression is an extension of simple linear regression that allows two or more predictor variables to model a response variable, $y$, as a linear function [ 41 ], defined in Eq. (6), whereas simple linear regression has only one independent variable, defined in Eq. (5).
- Polynomial regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable $x$ and the dependent variable $y$ is not linear but is modeled as an $n$-th degree polynomial in $x$ [ 82 ]. The equation for polynomial regression is derived from the linear regression (polynomial regression of degree 1) equation and is defined as:
$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_n x^n + e \quad (7)$
Here, $y$ is the predicted/target output, $b_0, b_1, \ldots, b_n$ are the regression coefficients, and $x$ is the independent/input variable. In simple words, if the data are not distributed linearly but instead follow an $n$-th degree polynomial, we use polynomial regression to obtain the desired output.
- LASSO and ridge regression: LASSO and ridge regression are well known as powerful techniques typically used for building learning models in the presence of a large number of features, due to their capability to prevent over-fitting and reduce the complexity of the model. The LASSO (least absolute shrinkage and selection operator) regression model uses the $L_1$ regularization technique [ 82 ], which applies shrinkage by penalizing the “absolute value of the magnitude of the coefficients” ($L_1$ penalty). As a result, LASSO can shrink coefficients exactly to zero. Thus, LASSO regression aims to find the subset of predictors that minimizes the prediction error for a quantitative response variable. Ridge regression, on the other hand, uses $L_2$ regularization [ 82 ], which penalizes the “squared magnitude of the coefficients” ($L_2$ penalty). Thus, ridge regression forces the weights to be small but never sets a coefficient value to zero, yielding a non-sparse solution. Overall, LASSO regression is useful for obtaining a subset of predictors by eliminating less important features, and ridge regression is useful when a dataset has “multicollinearity”, i.e., predictors that are correlated with other predictors. A combined sketch of linear, polynomial, LASSO, and ridge regression is given after this list.
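A minimal sketch of these regression variants is given below (scikit-learn assumed; the synthetic data, polynomial degree, and regularization strengths are illustrative choices only).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=200)  # cubic trend plus noise

# Simple linear regression: y = a + b*x + e (Eq. 5).
linear = LinearRegression().fit(X, y)

# Polynomial regression: linear regression on polynomial features of x (Eq. 7).
poly = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)

# LASSO (L1) can shrink some coefficients exactly to zero; ridge (L2) only shrinks them.
lasso = make_pipeline(PolynomialFeatures(degree=3), Lasso(alpha=0.1)).fit(X, y)
ridge = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1.0)).fit(X, y)

for name, model in [("linear", linear), ("polynomial", poly), ("lasso", lasso), ("ridge", ridge)]:
    print(f"{name}: R^2 on training data = {model.score(X, y):.3f}")
```

On this deliberately non-linear data, the simple linear fit should score noticeably worse than the polynomial variants, illustrating why the choice of model family matters.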

Classification vs. regression. In classification the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables
Cluster Analysis
Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for a specific outcome. It groups a collection of objects in such a way that objects in the same category, called a cluster, are in some sense more similar to each other than objects in other groups [ 41 ]. It is often used as a data analysis technique to discover interesting trends or patterns in data, e.g., groups of consumers based on their behavior. Clustering can be used in a broad range of application areas, such as cybersecurity, e-commerce, mobile data processing, health analytics, user modeling, and behavioral analytics. In the following, we briefly discuss and summarize various types of clustering methods.
- Partitioning methods: Based on the features and similarities in the data, this clustering approach categorizes the data into multiple groups or clusters. The number of clusters to produce is typically determined by the data scientists or analysts, either dynamically or statically, depending on the nature of the target application. The most common clustering algorithms based on partitioning methods are K-means [ 69 ], K-Medoids [ 80 ], CLARA [ 55 ], etc.
- Density-based methods: To identify distinct groups or clusters, it uses the concept that a cluster in the data space is a contiguous region of high point density isolated from other such clusters by contiguous regions of low point density. Points that are not part of a cluster are considered as noise. The typical clustering algorithms based on density are DBSCAN [ 32 ], OPTICS [ 12 ] etc. The density-based methods typically struggle with clusters of similar density and high dimensionality data.

A graphical interpretation of the widely-used hierarchical clustering (Bottom-up and top-down) technique
- Grid-based methods: To deal with massive datasets, grid-based clustering is especially suitable. To obtain clusters, the principle is first to summarize the dataset with a grid representation and then to combine grid cells. STING [ 122 ], CLIQUE [ 6 ], etc. are the standard algorithms of grid-based clustering.
- Model-based methods: There are mainly two types of model-based clustering algorithms: one that uses statistical learning, and the other based on a method of neural network learning [ 130 ]. For instance, GMM [ 89 ] is an example of a statistical learning method, and SOM [ 22 ] [ 96 ] is an example of a neural network learning method.
- Constraint-based methods: Constrained-based clustering is a semi-supervised approach to data clustering that uses constraints to incorporate domain knowledge. Application or user-oriented constraints are incorporated to perform the clustering. The typical algorithms of this kind of clustering are COP K-means [ 121 ], CMWK-Means [ 27 ], etc.
Many clustering algorithms with the ability to group data have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.
- K-means clustering: K-means clustering [ 69 ] is a fast, robust, and simple algorithm that provides reliable results when the clusters in a dataset are well separated from each other. In this algorithm, the data points are allocated to clusters in such a way that the sum of the squared distances between the data points and the centroids is as small as possible. In other words, the K-means algorithm identifies k centroids and then assigns each data point to the nearest cluster while keeping the within-cluster distances as small as possible. Since it begins with a random selection of cluster centers, the results can be inconsistent. Since extreme values can easily affect a mean, the K-means algorithm is also sensitive to outliers. K-medoids clustering [ 91 ] is a variant of K-means that is more robust to noise and outliers. A sketch comparing several of the clustering algorithms in this list is provided after the list.
- Mean-shift clustering: Mean-shift clustering [ 37 ] is a nonparametric clustering technique that does not require prior knowledge of the number of clusters or constraints on cluster shape. Mean-shift clustering aims to discover “blobs” in a smooth distribution or density of samples [ 82 ]. It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points in a given region. To form the final set of centroids, these candidates are filtered in a post-processing stage to remove near-duplicates. Cluster analysis in computer vision and image processing are examples of application domains. Mean Shift has the disadvantage of being computationally expensive. Moreover, in cases of high dimension, where the number of clusters shifts abruptly, the mean-shift algorithm does not work well.
- DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) [ 32 ] is a base algorithm for density-based clustering that is widely used in data mining and machine learning. It is a non-parametric density-based clustering technique for separating high-density clusters from low-density regions, which is used in model building. DBSCAN’s main idea is that a point belongs to a cluster if it is close to many points from that cluster. It can find clusters of various shapes and sizes in a vast volume of data that is noisy and contains outliers. Unlike k-means, DBSCAN does not require a priori specification of the number of clusters in the data and can find arbitrarily shaped clusters. Although k-means is much faster than DBSCAN, DBSCAN is efficient at finding high-density regions and outliers, i.e., it is robust to outliers.
- GMM clustering: Gaussian mixture models (GMMs) are often used for data clustering, which is a distribution-based clustering algorithm. A Gaussian mixture model is a probabilistic model in which all the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters [ 82 ]. To find the Gaussian parameters for each cluster, an optimization algorithm called expectation-maximization (EM) [ 82 ] can be used. EM is an iterative method that uses a statistical model to estimate the parameters. In contrast to k-means, Gaussian mixture models account for uncertainty and return the likelihood that a data point belongs to one of the k clusters. GMM clustering is more robust than k-means and works well even with non-linear data distributions.
- Agglomerative hierarchical clustering: The most common method of hierarchical clustering used to group objects in clusters based on their similarity is agglomerative clustering. This technique uses a bottom-up approach, where each object is first treated as a singleton cluster by the algorithm. Following that, pairs of clusters are merged one by one until all clusters have been merged into a single large cluster containing all objects. The result is a dendrogram, which is a tree-based representation of the elements. Single linkage [ 115 ], Complete linkage [ 116 ], BOTS [ 102 ] etc. are some examples of such techniques. The main advantage of agglomerative hierarchical clustering over k-means is that the tree-structure hierarchy generated by agglomerative clustering is more informative than the unstructured collection of flat clusters returned by k-means, which can help to make better decisions in the relevant application areas.
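To illustrate a few of the clustering algorithms above on the same data, the following sketch (scikit-learn assumed; the synthetic blobs and parameter values such as k, eps, and min_samples are illustrative) compares k-means, DBSCAN, agglomerative clustering, and a Gaussian mixture model.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

# Synthetic data with three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=7)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(X)        # label -1 marks noise points
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)  # bottom-up merging of clusters
gmm_labels = GaussianMixture(n_components=3, random_state=7).fit_predict(X)

for name, labels in [("k-means", kmeans_labels), ("DBSCAN", dbscan_labels),
                     ("agglomerative", agglo_labels), ("GMM", gmm_labels)]:
    print(name, "found", len(set(labels) - {-1}), "clusters")
```

Note that k-means, agglomerative clustering, and the GMM require the number of clusters in advance, while DBSCAN infers it from the density parameters, reflecting the trade-offs discussed above.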
Dimensionality Reduction and Feature Learning
In machine learning and data science, high-dimensional data processing is a challenging task for both researchers and application developers. Thus, dimensionality reduction, which is an unsupervised learning technique, is important because it leads to better human interpretation, lowers computational cost, and avoids overfitting and redundancy by simplifying models. Both feature selection and feature extraction can be used for dimensionality reduction. The primary distinction between them is that “feature selection” keeps a subset of the original features [ 97 ], while “feature extraction” creates brand-new ones [ 98 ]. In the following, we briefly discuss these techniques.
- Feature selection: The selection of features, also known as the selection of variables or attributes in the data, is the process of choosing a subset of unique features (variables, predictors) to use in building a machine learning and data science model. It decreases a model’s complexity by eliminating irrelevant or less important features and allows machine learning algorithms to be trained faster. A right and optimal subset of selected features in a problem domain can minimize the overfitting problem by simplifying and generalizing the model, as well as increase the model’s accuracy [ 97 ]. Thus, “feature selection” [ 66 , 99 ] is considered one of the primary concepts in machine learning, greatly affecting the effectiveness and efficiency of the target machine learning model. The chi-squared test, analysis of variance (ANOVA) test, Pearson’s correlation coefficient, and recursive feature elimination are some popular techniques that can be used for feature selection.
- Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and a means to reduce computational cost or training time. The aim of “feature extraction” [ 66 , 99 ] is to reduce the number of features in a dataset by generating new ones from the existing features and then discarding the originals. The majority of the information found in the original set of features can then be summarized using this new, reduced set of features. For instance, principal component analysis (PCA) is often used as a dimensionality-reduction technique to extract a lower-dimensional space by creating brand-new components from the existing features in a dataset [ 98 ]; a short PCA sketch is given after this list.
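As a small illustration of feature extraction, the sketch below (scikit-learn assumed; the dataset and the choice of two components are illustrative) projects a dataset onto its first two principal components and reports how much variance they retain.

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scales

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)        # brand-new features: PC1 and PC2

print("Original shape:", X.shape, "-> reduced shape:", X_reduced.shape)
print("Variance explained by PC1 and PC2:", pca.explained_variance_ratio_)
```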
Many algorithms have been proposed to reduce data dimensions in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.
- Variance threshold: A simple baseline approach to feature selection is the variance threshold [ 82 ]. It excludes all features of low variance, i.e., all features whose variance does not exceed the threshold. By default, it eliminates all zero-variance features, i.e., features that have the same value in all samples. This feature selection algorithm looks only at the ( X ) features, not the desired ( y ) outputs, and can, therefore, be used for unsupervised learning.
- Pearson correlation: Pearson’s correlation is another method for understanding a feature’s relation to the response variable and can be used for feature selection [ 99 ]. This method is also used for finding the association between features in a dataset. The resulting value lies in $[-1, 1]$, where $-1$ means perfect negative correlation, $+1$ means perfect positive correlation, and 0 means that the two variables have no linear correlation. If two random variables are represented by $X$ and $Y$, then the correlation coefficient between $X$ and $Y$ is defined as [ 41 ]
$r(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}} \quad (8)$
- ANOVA: Analysis of variance (ANOVA) is a statistical tool used to verify whether the mean values of two or more groups differ significantly from each other. ANOVA assumes a linear relationship between the variables and the target, as well as normally distributed variables. To statistically test the equality of means, the ANOVA method uses F tests. For feature selection, the resulting ‘ANOVA F value’ [ 82 ] of this test can be used, and features that are independent of the target variable can be omitted.
- Chi square: The chi-square ($\chi^2$) statistic [ 82 ] estimates the difference between the observed and expected frequencies of a series of events or variables. The value of $\chi^2$ depends on the magnitude of the difference between the actual and observed values, the degrees of freedom, and the sample size. The chi-square test is commonly used for testing relationships between categorical variables. If $O_i$ represents the observed value and $E_i$ the expected value, then
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \quad (9)$
- Recursive feature elimination (RFE): Recursive feature elimination (RFE) is a brute-force approach to feature selection. RFE [ 82 ] repeatedly fits the model and removes the weakest feature until the specified number of features is reached. Features are ranked by the model’s coefficients or feature importances. By recursively removing a small number of features per iteration, RFE aims to eliminate dependencies and collinearity in the model.
- Model-based selection: To reduce the dimensionality of the data, linear models penalized with $L_1$ regularization can be used. Least absolute shrinkage and selection operator (LASSO) regression is a type of linear regression that has the property of shrinking some of the coefficients to zero [ 82 ], so the corresponding features can be removed from the model. Thus, the penalized lasso regression method is often used in machine learning to select a subset of variables. The Extra Trees classifier [ 82 ] is an example of a tree-based estimator that can be used to compute impurity-based feature importances, which can then be used to discard irrelevant features. A combined sketch of several of these selection techniques follows this list.
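The following sketch (scikit-learn assumed; the threshold, k, the number of features to keep, and the L1 penalty strength are illustrative settings) combines several of the selection techniques above: a variance threshold, a chi-square filter, recursive feature elimination, and L1-based model selection.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import VarianceThreshold, SelectKBest, chi2, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# 1. Remove near-constant features.
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# 2. Keep the k features most related to the target by the chi-square statistic (Eq. 9).
X_chi2 = SelectKBest(chi2, k=10).fit_transform(X, y)   # chi2 requires non-negative features

# 3. Recursively eliminate the weakest features according to a linear model.
X_rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit_transform(X, y)

# 4. Model-based selection with an L1-penalized linear model (lasso-style sparsity).
l1_model = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=5000).fit(X, y)
X_l1 = SelectFromModel(l1_model, prefit=True).transform(X)

for name, Xs in [("variance", X_var), ("chi2", X_chi2), ("RFE", X_rfe), ("L1 model", X_l1)]:
    print(f"{name}: {Xs.shape[1]} features kept (from {X.shape[1]})")
```

Filter methods (variance, chi-square) look at the features in isolation and are cheap, while wrapper and embedded methods (RFE, L1 models) consult a learning model and are usually more expensive but better aligned with the downstream task.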

An example of a principal component analysis (PCA) and created principal components PC1 and PC2 in different dimension space
Association Rule Learning
Association rule learning is a rule-based machine learning approach for discovering interesting relationships, expressed as “IF-THEN” statements, between variables in large datasets [ 7 ]. One example is that “if a customer buys a computer or laptop (an item), s/he is likely to also buy anti-virus software (another item) at the same time”. Association rules are employed today in many application areas, including IoT services, medical diagnosis, usage behavior analytics, web usage mining, smartphone applications, cybersecurity applications, and bioinformatics. In comparison to sequence mining, association rule learning does not usually take into account the order of items within or across transactions. A common way of measuring the usefulness of association rules is to use their parameters ‘support’ and ‘confidence’, introduced in [ 7 ] and illustrated by the short sketch below.
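To make ‘support’ and ‘confidence’ concrete, the following plain-Python sketch (the transactions and the rule are made up for illustration) computes both measures for the rule IF {computer} THEN {anti-virus}.

```python
# Toy transactions: each set lists the items bought together in one basket.
transactions = [
    {"computer", "anti-virus", "mouse"},
    {"computer", "anti-virus"},
    {"computer", "keyboard"},
    {"laptop", "anti-virus"},
    {"computer", "anti-virus", "printer"},
]

antecedent, consequent = {"computer"}, {"anti-virus"}

n = len(transactions)
count_both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
count_antecedent = sum(1 for t in transactions if antecedent <= t)

support = count_both / n                    # fraction of all baskets containing both itemsets
confidence = count_both / count_antecedent  # of baskets with the antecedent, the fraction that also contain the consequent

print(f"support = {support:.2f}, confidence = {confidence:.2f}")  # 0.60 and 0.75 here
```

Algorithms such as Apriori and FP-Growth differ mainly in how efficiently they enumerate the itemsets for which these measures exceed user-specified thresholds.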
In the data mining literature, many association rule learning methods have been proposed, such as logic dependent [ 34 ], frequent pattern based [ 8 , 49 , 68 ], and tree-based [ 42 ]. The most popular association rule learning algorithms are summarized below.
- AIS and SETM: AIS is the first algorithm proposed by Agrawal et al. [ 7 ] for association rule mining. The AIS algorithm’s main downside is that too many candidate itemsets are generated, requiring more space and wasting a lot of effort. This algorithm calls for too many passes over the entire dataset to produce the rules. Another approach SETM [ 49 ] exhibits good performance and stable behavior with execution time; however, it suffers from the same flaw as the AIS algorithm.
- Apriori: For generating association rules from a given dataset, Agrawal et al. [ 8 ] proposed the Apriori, Apriori-TID, and Apriori-Hybrid algorithms. These algorithms outperform the AIS and SETM approaches mentioned above due to the Apriori property of frequent itemsets [ 8 ]. The term ‘Apriori’ usually refers to having prior knowledge of frequent itemset properties. Apriori uses a “bottom-up” approach to generate the candidate itemsets. To reduce the search space, Apriori uses the property that “all subsets of a frequent itemset must be frequent; and if an itemset is infrequent, then all its supersets must also be infrequent”. Another approach, predictive Apriori [ 108 ], can also generate rules; however, it may produce unexpected results as it combines both support and confidence. Apriori [ 8 ] is the most widely applicable technique for mining association rules.
- ECLAT: This technique was proposed by Zaki et al. [ 131 ] and stands for Equivalence Class Clustering and bottom-up Lattice Traversal. ECLAT uses a depth-first search to find frequent itemsets. In contrast to the Apriori [ 8 ] algorithm, which represents data in a horizontal pattern, it represents data vertically. Hence, the ECLAT algorithm is more efficient and scalable in the area of association rule learning. This algorithm is better suited for small and medium datasets whereas the Apriori algorithm is used for large datasets.
- FP-Growth: Another common association rule learning technique, based on the frequent-pattern tree (FP-tree) proposed by Han et al. [ 42 ], is Frequent Pattern Growth, known as FP-Growth. The key difference from Apriori is that while generating rules, the Apriori algorithm [ 8 ] generates frequent candidate itemsets, whereas the FP-Growth algorithm [ 42 ] avoids candidate generation and instead builds a tree using a ‘divide and conquer’ strategy. Due to its sophistication, however, the FP-tree is challenging to use in an interactive mining environment [ 133 ]. Moreover, the FP-tree may not fit into memory for massive datasets, making it challenging to process big data as well. Another solution, RARM (Rapid Association Rule Mining), was proposed by Das et al. [ 26 ], but it faces a related FP-tree issue [ 133 ].
- ABC-RuleMiner: ABC-RuleMiner is a rule-based machine learning method, recently proposed in our earlier paper by Sarker et al. [ 104 ], that discovers interesting non-redundant rules to provide real-world intelligent services. This algorithm effectively identifies redundancy in associations by taking into account the impact or precedence of the related contextual features and discovers a set of non-redundant association rules. It first constructs an association generation tree (AGT) in a top-down manner and then extracts the association rules by traversing the tree. Thus, ABC-RuleMiner is more potent than traditional rule-based methods in terms of both non-redundant rule generation and intelligent decision-making, particularly in a context-aware smart computing environment where human or user preferences are involved.
Among the association rule learning techniques discussed above, Apriori [ 8 ] is the most widely used algorithm for discovering association rules from a given dataset [ 133 ]. The main strength of the association learning technique is its comprehensiveness, as it generates all associations that satisfy the user-specified constraints, such as minimum support and confidence value. The ABC-RuleMiner approach [ 104 ] discussed earlier could give significant results in terms of non-redundant rule generation and intelligent decision-making for the relevant application areas in the real world.
Reinforcement Learning
Reinforcement learning (RL) is a machine learning technique that allows an agent to learn by trial and error in an interactive environment using feedback from its actions and experiences. Unlike supervised learning, which is based on given sample data or examples, the RL method is based on interacting with the environment. The problem to be solved in reinforcement learning (RL) is defined as a Markov Decision Process (MDP) [ 86 ], i.e., it is all about making decisions sequentially. An RL problem typically includes four elements: agent, environment, rewards, and policy.
RL can be split roughly into Model-based and Model-free techniques. Model-based RL is the process of inferring optimal behavior from a model of the environment by performing actions and observing the results, which include the next state and the immediate reward [ 85 ]. AlphaZero, AlphaGo [ 113 ] are examples of the model-based approaches. On the other hand, a model-free approach does not use the distribution of the transition probability and the reward function associated with MDP. Q-learning, Deep Q Network, Monte Carlo Control, SARSA (State–Action–Reward–State–Action), etc. are some examples of model-free algorithms [ 52 ]. The policy network, which is required for model-based RL but not for model-free, is the key difference between model-free and model-based learning. In the following, we discuss the popular RL algorithms.
- Monte Carlo methods: Monte Carlo techniques, or Monte Carlo experiments, are a wide category of computational algorithms that rely on repeated random sampling to obtain numerical results [ 52 ]. The underlying concept is to use randomness to solve problems that are deterministic in principle. Optimization, numerical integration, and generating draws from a probability distribution are the three problem classes where Monte Carlo techniques are most commonly used.
- Q-learning: Q-learning is a model-free reinforcement learning algorithm for learning the quality of actions, which tells an agent what action to take under what conditions [ 52 ]. It does not need a model of the environment (hence the term “model-free”), and it can deal with stochastic transitions and rewards without the need for adaptations. The ‘Q’ in Q-learning usually stands for quality, as the algorithm calculates the maximum expected reward for a given action in a given state. A minimal tabular Q-learning sketch is given after this list.
- Deep Q-learning: The basic working step in deep Q-learning [ 52 ] is that the current state is fed into a neural network, which returns the Q-values of all possible actions as output. Q-learning works well when the setting to overcome is reasonably simple; however, when the number of states and actions becomes large, deep learning can be used as a function approximator.
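A minimal tabular Q-learning sketch is shown below (the toy chain environment, learning rate, discount factor, exploration rate, and number of episodes are all illustrative). It applies the standard update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ while the agent explores with an epsilon-greedy policy.

```python
import random

# Toy chain environment with 5 states; moving right from the next-to-last state yields reward 1.
N_STATES, ACTIONS = 5, [0, 1]          # action 0 = left, action 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.3  # learning rate, discount factor, exploration rate

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if (state == N_STATES - 2 and action == 1) else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(300):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward the reward plus the best discounted future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print("Greedy policy (0=left, 1=right):", [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

After training, the greedy policy should point “right” toward the rewarding end of the chain, which is exactly the behavior the reward signal encourages.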
Reinforcement learning, along with supervised and unsupervised learning, is one of the basic machine learning paradigms. RL can be used to solve numerous real-world problems in various fields, such as game theory, control theory, operations analysis, information theory, simulation-based optimization, manufacturing, supply chain logistics, multi-agent systems, swarm intelligence, aircraft control, robot motion control, and many more.
Artificial Neural Network and Deep Learning
Deep learning is part of a wider family of artificial neural network (ANN)-based machine learning approaches with representation learning. Deep learning provides a computational architecture that combines several processing layers, such as input, hidden, and output layers, to learn from data [ 41 ]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large datasets [ 105 , 129 ]. Figure 9 shows the general performance of deep learning compared to traditional machine learning as the amount of data increases. However, the performance may vary depending on the data characteristics and experimental setup.

Machine learning and deep learning performance in general with the amount of data
The most common deep learning algorithms are: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN, or ConvNet), Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) [ 96 ]. In the following, we discuss various types of deep learning methods that can be used to build effective data-driven models for various purposes.

A structure of an artificial neural network modeling with multiple processing layers

An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers
- LSTM-RNN: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the area of deep learning [ 38 ]. Unlike normal feed-forward neural networks, LSTM has feedback connections. LSTM networks are well suited for analyzing and learning sequential data, such as classifying, processing, and predicting data based on time series, which differentiates them from other conventional networks. Thus, LSTM can be used when the data are in a sequential format, such as time series or sentences, and it is commonly applied in time-series analysis, natural language processing, speech recognition, etc. A minimal Keras sketch of such a model is shown below.
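The following sketch (assuming TensorFlow/Keras is available; the sequence length, feature count, layer sizes, and random data are placeholders rather than recommended values) defines a small LSTM model for binary classification of fixed-length sequences.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

timesteps, n_features = 20, 8      # each example is a sequence of 20 steps with 8 features

model = Sequential([
    Input(shape=(timesteps, n_features)),
    LSTM(32),                       # recurrent layer reads the sequence step by step
    Dense(1, activation="sigmoid"), # binary class probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random placeholder data; in practice this would be real sequential data (time series, text, etc.).
X = np.random.rand(100, timesteps, n_features).astype("float32")
y = np.random.randint(0, 2, size=(100,))
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
model.summary()
```

The key design choice is that the recurrent layer consumes the time dimension, so the input must be shaped (samples, timesteps, features) rather than the flat (samples, features) layout used by the non-sequential models earlier in this section.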
In addition to the most common deep learning methods discussed above, several other deep learning approaches [ 96 ] exist for various purposes. For instance, the self-organizing map (SOM) [ 58 ] uses unsupervised learning to represent high-dimensional data by a 2D grid map, thus achieving dimensionality reduction. The autoencoder (AE) [ 15 ] is another learning technique widely used for dimensionality reduction as well as feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBM) [ 46 ] can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. A deep belief network (DBN) is typically composed of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, and a backpropagation neural network (BPNN) [ 123 ]. A generative adversarial network (GAN) [ 39 ] is a form of deep learning network that can generate data with characteristics close to the actual input data. Transfer learning, which typically re-uses a model pre-trained on one problem for a new problem, is currently very common because it can train deep neural networks with comparatively little data [ 124 ]. A brief discussion of these artificial neural network (ANN) and deep learning (DL) models is summarized in our earlier paper, Sarker et al. [ 96 ].
Overall, based on the learning techniques discussed above, we can conclude that various types of machine learning techniques, such as classification analysis, regression, data clustering, feature selection and extraction, and dimensionality reduction, association rule learning, reinforcement learning, or deep learning techniques, can play a significant role for various purposes according to their capabilities. In the following section, we discuss several application areas based on machine learning algorithms.
Applications of Machine Learning
In the current age of the Fourth Industrial Revolution (4IR), machine learning has become popular in various application areas because of its ability to learn from past data and make intelligent decisions. In the following, we summarize and discuss ten popular application areas of machine learning technology.
- Predictive analytics and intelligent decision-making: A major application field of machine learning is intelligent decision-making through data-driven predictive analytics [ 21 , 70 ]. The basis of predictive analytics is capturing and exploiting relationships between explanatory variables and predicted variables from previous events to predict an unknown outcome [ 41 ]. Examples include identifying suspects or criminals after a crime has been committed, or detecting credit card fraud as it happens. In another application, machine learning algorithms can assist retailers in e-commerce in better understanding consumer preferences and behavior, managing inventory, avoiding out-of-stock situations, and optimizing logistics and warehousing. Various machine learning algorithms such as decision trees, support vector machines, and artificial neural networks [ 106 , 125 ] are commonly used in this area. Since accurate predictions provide insight into the unknown, they can improve the decisions of industries, businesses, and almost any organization, including government agencies, e-commerce, telecommunications, banking and financial services, healthcare, sales and marketing, transportation, social networking, and many others.
- Cybersecurity and threat intelligence: Cybersecurity is one of the most essential areas of Industry 4.0 [ 114 ]; it is typically the practice of protecting networks, systems, hardware, and data from digital attacks [ 114 ]. Machine learning has become a crucial cybersecurity technology that constantly learns by analyzing data to identify patterns, better detect malware in encrypted traffic, find insider threats, predict where bad neighborhoods are online, keep people safe while browsing, or secure data in the cloud by uncovering suspicious activity. For instance, clustering techniques can be used to identify cyber-anomalies, policy violations, etc. To detect various types of cyber-attacks or intrusions, machine learning classification models that take into account the impact of security features are useful [ 97 ]. Various deep learning-based security models can also be used on large-scale security datasets [ 96 , 129 ]. Moreover, security policy rules generated by association rule learning techniques can play a significant role in building a rule-based security system [ 105 ]. Thus, we can say that the various learning techniques discussed in Sect. Machine Learning Tasks and Algorithms can enable cybersecurity professionals to be more proactive in efficiently preventing threats and cyber-attacks.
- Internet of things (IoT) and smart cities: The Internet of Things (IoT) is another essential area of Industry 4.0 [ 114 ]; it turns everyday objects into smart objects by allowing them to transmit data and automate tasks without the need for human interaction. IoT is, therefore, considered to be the big frontier that can enhance almost all activities in our lives, such as smart governance, smart homes, education, communication, transportation, retail, agriculture, health care, business, and many more [ 70 ]. The smart city is one of IoT’s core fields of application, using technologies to enhance city services and residents’ living experiences [ 132 , 135 ]. As machine learning utilizes experience to recognize trends and create models that help predict future behavior and events, it has become a crucial technology for IoT applications [ 103 ]. For example, predicting traffic in smart cities, predicting parking availability, estimating the total energy usage of citizens for a particular period, and making context-aware and timely decisions for people are some tasks that can be solved using machine learning techniques according to the current needs of the people.
- Traffic prediction and transportation: Transportation systems have become a crucial component of every country’s economic development. Nonetheless, several cities around the world are experiencing an excessive rise in traffic volume, resulting in serious issues such as delays, traffic congestion, higher fuel prices, increased CO 2 pollution, accidents, emergencies, and a decline in modern society’s quality of life [ 40 ]. Thus, an intelligent transportation system through predicting future traffic is important, which is an indispensable part of a smart city. Accurate traffic prediction based on machine and deep learning modeling can help to minimize the issues [ 17 , 30 , 31 ]. For example, based on the travel history and trend of traveling through various routes, machine learning can assist transportation companies in predicting possible issues that may occur on specific routes and recommending their customers to take a different path. Ultimately, these learning-based data-driven models help improve traffic flow, increase the usage and efficiency of sustainable modes of transportation, and limit real-world disruption by modeling and visualizing future changes.
- Healthcare and COVID-19 pandemic: Machine learning can help to solve diagnostic and prognostic problems in a variety of medical domains, such as disease prediction, medical knowledge extraction, detecting regularities in data, patient management, etc. [ 33 , 77 , 112 ]. Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus, according to the World Health Organization (WHO) [ 3 ]. Recently, learning techniques have become popular in the battle against COVID-19 [ 61 , 63 ]. For the COVID-19 pandemic, learning techniques are used to classify patients at high risk, their mortality rate, and other anomalies [ 61 ]. They can also be used to better understand the virus’s origin, predict COVID-19 outbreaks, and assist in disease diagnosis and treatment [ 14 , 50 ]. With the help of machine learning, researchers can forecast where and when COVID-19 is likely to spread and notify those regions to make the required arrangements. Deep learning also provides exciting solutions to the problems of medical image processing and is seen as a crucial technique for potential applications, particularly for the COVID-19 pandemic [ 10 , 78 , 111 ]. Overall, machine and deep learning techniques can help to fight the COVID-19 virus and the pandemic, as well as support intelligent clinical decision-making in the healthcare domain.
- E-commerce and product recommendations: Product recommendation is one of the most well known and widely used applications of machine learning, and it is one of the most prominent features of almost any e-commerce website today. Machine learning technology can assist businesses in analyzing their consumers’ purchasing histories and making customized product suggestions for their next purchase based on their behavior and preferences. E-commerce companies, for example, can easily position product suggestions and offers by analyzing browsing trends and click-through rates of specific items. Using predictive modeling based on machine learning techniques, many online retailers, such as Amazon [ 71 ], can better manage inventory, prevent out-of-stock situations, and optimize logistics and warehousing. The future of sales and marketing is the ability to capture, evaluate, and use consumer data to provide a customized shopping experience. Furthermore, machine learning techniques enable companies to create packages and content that are tailored to the needs of their customers, allowing them to maintain existing customers while attracting new ones.
- NLP and sentiment analysis: Natural language processing (NLP) involves the reading and understanding of spoken or written language by a computer [ 79 , 103 ]. NLP thus helps computers, for instance, to read text, hear speech, interpret it, analyze sentiment, and decide which aspects are significant, and machine learning techniques can be used for each of these steps. Virtual personal assistants, chatbots, speech recognition, document description, and language or machine translation are some examples of NLP-related tasks. Sentiment analysis [ 90 ] (also referred to as opinion mining or emotion AI) is an NLP sub-field that seeks to identify and extract public mood and opinions from text in blogs, reviews, social media, forums, news, etc. For instance, businesses and brands use sentiment analysis to understand the social sentiment of their brand, product, or service through social media platforms or the web as a whole. Overall, sentiment analysis is treated as a machine learning task that analyzes texts for polarity, such as “positive”, “negative”, or “neutral”, and sometimes for finer-grained emotions such as very happy, happy, sad, very sad, angry, interested, or not interested.
- Image, speech and pattern recognition: Image recognition [ 36 ] is a well-known and widespread real-world example of machine learning, in which an object is identified from a digital image. Labeling an x-ray as cancerous or not, character recognition, face detection in an image, and tagging suggestions on social media platforms such as Facebook are common examples of image recognition. Speech recognition [ 23 ], which typically uses sound and linguistic models, is also very popular and underlies assistants such as Google Assistant, Cortana, Siri, and Alexa [ 67 ], where machine learning methods are used. Pattern recognition [ 13 ] is defined as the automated recognition of patterns and regularities in data, e.g., in image analysis. Several machine learning techniques, such as classification, feature selection, clustering, and sequence labeling methods, are used in this area.
- Sustainable agriculture: Agriculture is essential to the survival of all human activities [ 109 ]. Sustainable agriculture practices help to improve agricultural productivity while reducing negative impacts on the environment [ 5 , 25 , 109 ]. Sustainable agriculture supply chains are knowledge-intensive and based on information, skills, technologies, etc., where knowledge transfer encourages farmers to improve their decisions to adopt sustainable practices by utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of Things (IoT), mobile technologies and devices, etc. [ 5 , 53 , 54 ]. Machine learning can be applied in various phases of sustainable agriculture: in the pre-production phase, for the prediction of crop yield, soil properties, irrigation requirements, etc.; in the production phase, for weather prediction, disease detection, weed detection, soil nutrient management, livestock management, etc.; in the processing phase, for demand estimation, production planning, etc.; and in the distribution phase, for inventory management, consumer analysis, etc.
- User behavior analytics and context-aware smartphone applications: Context-awareness is a system’s ability to capture knowledge about its surroundings at any moment and modify its behavior accordingly [ 28 , 93 ]. Context-aware computing uses software and hardware to automatically collect and interpret data for direct responses. The mobile app development environment has changed greatly with the power of AI, and particularly of machine learning techniques, through their ability to learn from contextual data [ 103 , 136 ]. The developers of mobile apps can thus rely on machine learning to create smart apps that understand human behavior, and that support and entertain users [ 107 , 137 , 140 ]. Machine learning techniques are applicable to building various personalized, data-driven, context-aware systems, such as smart interruption management, smart mobile recommendation, context-aware smart searching, and decision-making systems that intelligently assist mobile phone users in a pervasive computing environment. For example, context-aware association rules can be used to build an intelligent phone call application [ 104 ]; clustering approaches are useful in capturing users’ diverse behavioral activities from time-series data [ 102 ]; and classification methods can be used to predict future events in various contexts [ 106 , 139 ] (a brief sketch follows below). Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can help to build context-aware, adaptive, and smart applications according to the preferences of mobile phone users.
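To make the classification case above concrete, here is a minimal, hypothetical sketch in Python with scikit-learn: a small decision tree predicts how a phone should respond to an incoming call from invented contextual features. The feature names, data, and chosen model are illustrative assumptions, not the implementation of the cited works.

```python
# Hypothetical context-aware phone-call example: predict whether to answer,
# silence, or decline an incoming call from contextual features.
# All feature names and data below are invented for illustration.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "hour": [9, 13, 20, 23, 10, 15],
    "in_meeting": [1, 0, 0, 0, 1, 0],          # 1 = user's calendar shows a meeting
    "caller_is_contact": [1, 1, 0, 1, 0, 1],   # 1 = caller is in the address book
    "action": ["silence", "answer", "decline", "answer", "silence", "answer"],
})

X, y = data[["hour", "in_meeting", "caller_is_contact"]], data["action"]
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Predict the likely action for a new context: 9 pm, not in a meeting, known contact.
print(clf.predict(pd.DataFrame([[21, 0, 1]], columns=X.columns)))
```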
In addition to these application areas, machine learning-based models can also be applied in several other domains, such as bioinformatics, cheminformatics, computer networks, DNA sequence classification, economics and banking, robotics, advanced engineering, and many more.
Challenges and Research Directions
Our study on machine learning algorithms for intelligent data analysis and applications opens several research issues in the area. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions.
In general, the effectiveness and efficiency of a machine learning-based solution depend on the nature and characteristics of the data and on the performance of the learning algorithms. Collecting data in the relevant domain, such as cybersecurity, IoT, healthcare, or agriculture discussed in Sect. “ Applications of Machine Learning ”, is not straightforward, even though today’s cyberspace produces huge amounts of data at very high frequency. Thus, collecting useful data for the target machine learning-based applications, e.g., smart city applications, and managing that data are important for further analysis. A more in-depth investigation of data collection methods is therefore needed when working with real-world data. Moreover, historical data may contain many ambiguous values, missing values, outliers, and meaningless entries. The machine learning algorithms discussed in Sect. “ Machine Learning Tasks and Algorithms ” are highly sensitive to the quality and availability of the training data, and consequently so is the resultant model. Thus, accurately cleaning and pre-processing the diverse data collected from diverse sources is a challenging task. Effectively modifying or enhancing existing pre-processing methods, or proposing new data preparation techniques, is therefore required to use the learning algorithms effectively in the associated application domain.
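As a minimal sketch of the kind of cleaning and pre-processing described above, assuming a small hypothetical tabular dataset (the column names, values, and thresholds are invented for illustration and are not a prescribed pipeline):

```python
# Illustrative cleaning of a raw tabular dataset: clip extreme outliers,
# impute missing values, and standardize features before model training.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

raw = pd.DataFrame({
    "temperature": [21.0, np.nan, 23.5, 150.0, 22.1],  # 150.0 is an implausible outlier
    "humidity": [0.40, 0.42, np.nan, 0.39, 0.41],
})

# Clip values outside the 1st-99th percentile range to limit the impact of outliers.
clipped = raw.clip(raw.quantile(0.01), raw.quantile(0.99), axis=1)

# Impute the remaining missing entries with the column median, then standardize.
imputed = SimpleImputer(strategy="median").fit_transform(clipped)
clean = StandardScaler().fit_transform(imputed)
print(clean.shape)  # (5, 2)
```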
To analyze the data and extract insights, there exist many machine learning algorithms, summarized in Sect. “ Machine Learning Tasks and Algorithms ”. Selecting a learning algorithm that is suitable for the target application is therefore challenging, because the outcome of different learning algorithms may vary depending on the data characteristics [ 106 ]. Selecting the wrong learning algorithm can produce unexpected outcomes, leading to wasted effort as well as reduced model effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can be used directly to solve many real-world issues in diverse domains, such as cybersecurity, smart cities, and healthcare, summarized in Sect. “ Applications of Machine Learning ”. However, hybrid learning models, e.g., ensembles of methods, modifications or enhancements of existing learning techniques, or the design of new learning methods, could be potential future work in the area.
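One common, if simple, way to approach the algorithm-selection problem is to compare candidate algorithms on the same data with k-fold cross-validation before committing to one. The sketch below uses a bundled scikit-learn toy dataset purely for illustration; it is not tied to any particular application domain discussed above.

```python
# Compare several candidate learning algorithms (including an ensemble) with
# 5-fold cross-validation and report mean accuracy on a toy dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```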
Thus, the ultimate success of a machine learning-based solution and the corresponding applications depends mainly on both the data and the learning algorithms. If the data are poorly suited to learning, e.g., non-representative, of poor quality, containing irrelevant features, or insufficient in quantity for training, then the machine learning models may become useless or produce lower accuracy. Therefore, effectively processing the data and handling the diverse learning algorithms are both important for a machine learning-based solution and, eventually, for building intelligent applications.
Conclusion
In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data analysis and applications. In line with our goal, we have briefly discussed how various types of machine learning methods can be used to solve various real-world issues. A successful machine learning model depends on both the data and the performance of the learning algorithms. Sophisticated learning algorithms must be trained on collected real-world data and knowledge related to the target application before the system can assist with intelligent decision-making. We have also discussed several popular application areas based on machine learning techniques to highlight their applicability to various real-world issues. Finally, we have summarized and discussed the challenges faced and the potential research opportunities and future directions in the area. The identified challenges create promising research opportunities in the field, which must be addressed with effective solutions in various application areas. Overall, we believe that our study of machine learning-based solutions opens up a promising direction and can serve as a reference guide for potential research and applications for academia, industry professionals, and decision-makers, from a technical point of view.
Declaration
The author declares no conflict of interest.
This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Publication database
Publishing our work allows us to share ideas and work collaboratively to advance the field of computer science.
Algorithms and Theory
Google’s mission presents many exciting algorithmic and optimization challenges across different product areas including Search, Ads, Social, and Google Infrastructure. These include optimizing internal systems such as scheduling the machines that power the numerous computations done each day, as well as optimizations that affect core products and users, from online allocation of ads to page-views to automatic management of ad campaigns, and from clustering large-scale graphs to finding best paths in transportation networks. Other than employing new algorithmic ideas to impact millions of users, Google researchers contribute to the state-of-the-art research in these areas by publishing in top conferences and journals.
Data Management
Google is deeply engaged in Data Management research across a variety of topics with deep connections to Google products. We are building intelligent systems to discover, annotate, and explore structured data from the Web, and to surface it creatively through Google products, such as Search (e.g., structured snippets, Docs, and many others). The overarching goal is to create a plethora of structured data on the Web that maximally helps Google users consume, interact with, and explore information. Through these projects, we study various cutting-edge data management research issues, including information extraction and integration, large-scale data analysis, effective data exploration, etc., using a variety of techniques such as information retrieval, data mining, and machine learning.
A major research effort involves the management of structured data within the enterprise. The goal is to discover, index, monitor, and organize this type of data in order to make it easier to access high-quality datasets. This type of data carries different, and often richer, semantics than structured data on the Web, which in turn raises new opportunities and technical challenges in their management.
Furthermore, Data Management research across Google allows us to build technologies that power Google's largest businesses through scalable, reliable, fast, and general-purpose infrastructure for large-scale data processing as a service. Some examples of such technologies include F1 , the database serving our ads infrastructure; Mesa , a petabyte-scale analytic data warehousing system; and Dremel , for petabyte-scale data processing with interactive response times. Dremel is available for external customers to use as part of Google Cloud’s BigQuery .
Data Mining and Modeling
The proliferation of machine learning means that learned classifiers lie at the core of many products across Google. However, questions in practice are rarely so clean that an out-of-the-box algorithm can simply be applied. A big challenge is in developing metrics, designing experimental methodologies, and modeling the space to create parsimonious representations that capture the fundamentals of the problem. These problems cut across Google’s products and services, from designing experiments for testing new auction algorithms to developing automated metrics to measure the quality of a road map.
Data mining lies at the heart of many of these questions, and the research done at Google is at the forefront of the field. Whether it is finding more efficient algorithms for working with massive data sets, developing privacy-preserving methods for classification, or designing new machine learning approaches, our group continues to push the boundary of what is possible.
Distributed Systems and Parallel Computing
No matter how powerful individual computers become, there are still reasons to harness the power of multiple computational units, often spread across large geographic areas. Sometimes this is motivated by the need to collect data from widely dispersed locations (e.g., web pages from servers, or sensors for weather or traffic). Other times it is motivated by the need to perform enormous computations that simply cannot be done by a single CPU.
From our company’s beginning, Google has had to deal with both issues in our pursuit of organizing the world’s information and making it universally accessible and useful. We continue to face many exciting distributed systems and parallel computing challenges in areas such as concurrency control, fault tolerance, algorithmic efficiency, and communication. Some of our research involves answering fundamental theoretical questions, while other researchers and engineers are engaged in the construction of systems to operate at the largest possible scale, thanks to our hybrid research model .
Economics and Electronic Commerce
Google is a global leader in electronic commerce. Not surprisingly, it devotes considerable attention to research in this area. Topics include 1) auction design, 2) advertising effectiveness, 3) statistical methods, 4) forecasting and prediction, 5) survey research, 6) policy analysis and a host of other topics. This research involves interdisciplinary collaboration among computer scientists, economists, statisticians, and analytic marketing researchers both at Google and academic institutions around the world.
A major challenge is in solving these problems at very large scales. For example, the advertising market has billions of transactions daily, spread across millions of advertisers. It presents a unique opportunity to test and refine economic principles as applied to a very large number of interacting, self-interested parties with a myriad of objectives.
It is remarkable how some of the fundamental problems Google grapples with are also some of the hardest research problems in the academic community. At Google, this research translates directly into practice, influencing how production systems are designed and used.
Education Innovation
Our Education Innovation research area includes publications on: online learning at scale, educational technology (which is any technology that supports teaching and learning), curriculum and programming tools for computer science education, diversity and broadening participation in computer science, and the hiring and onboarding process at Google.
General Science
We aim to transform scientific research itself. Many scientific endeavors can benefit from large scale experimentation, data gathering, and machine learning (including deep learning). We aim to accelerate scientific research by applying Google’s computational power and techniques in areas such as drug discovery, biological pathway modeling, microscopy, medical diagnostics, material science, and agriculture. We collaborate closely with world-class research partners to help solve important problems with large scientific or humanitarian benefit.
Hardware and Architecture
The machinery that powers many of our interactions today — Web search, social networking, email, online video, shopping, game playing — is made of the smallest and the most massive computers. The smallest part is your smartphone, a machine that is over ten times faster than the iconic Cray-1 supercomputer. The capabilities of these remarkable mobile devices are amplified by orders of magnitude through their connection to Web services running on building-sized computing systems that we call Warehouse-scale computers (WSCs).
Google’s engineers and researchers have been pioneering both WSC and mobile hardware technology with the goal of providing Google programmers and our Cloud developers with a unique computing infrastructure in terms of scale, cost-efficiency, energy-efficiency, resiliency and speed. The tight collaboration among software, hardware, mechanical, electrical, environmental, thermal and civil engineers results in some of the most impressive and efficient computers in the world.
Human-Computer Interaction and Visualization
HCI researchers at Google have enormous potential to impact the experience of Google users as well as conduct innovative research. Grounded in user behavior understanding and real use, Google’s HCI researchers invent, design, build and trial large-scale interactive systems in the real world. We declare success only when we positively impact our users and user communities, often through new and improved Google products. HCI research has fundamentally contributed to the design of Search, Gmail, Docs, Maps, Chrome, Android, YouTube, serving over a billion daily users. We are engaged in a variety of HCI disciplines such as predictive and intelligent user interface technologies and software, mobile and ubiquitous computing, social and collaborative computing, interactive visualization and visual analytics. Many projects heavily incorporate machine learning with HCI, and current projects include predictive user interfaces; recommenders for content, apps, and activities; smart input and prediction of text on mobile devices; user engagement analytics; user interface development tools; and interactive visualization of complex data.
Information Retrieval and the Web
The science surrounding search engines is commonly referred to as information retrieval, in which algorithmic principles are developed to match user interests to the best information about those interests.
Google started as a result of our founders' attempt to find the best matching between user queries and Web documents, and to do it really fast. During the process, they uncovered a few basic principles: 1) the best pages tend to be those that are linked to the most; 2) the best description of a page is often derived from the anchor text associated with the links to it. Theories were developed to exploit these principles to optimize the task of retrieving the best documents for a user query.
Search and Information Retrieval on the Web has advanced significantly from those early days: 1) the notion of "information" has greatly expanded from documents to much richer representations such as images and videos; 2) users are increasingly searching on their mobile devices, which have very different interaction characteristics from desktop search; 3) users are increasingly looking for direct information, such as answers to a question, or seeking to complete tasks, such as booking an appointment. Through our research, we continue to enhance and refine the world's foremost search engine by aiming to scientifically understand the implications of those changes and to address the new challenges they bring.
Machine Intelligence
Google is at the forefront of innovation in Machine Intelligence, with active research exploring virtually all aspects of machine learning, including deep learning and more classical algorithms. Exploring theory as well as application, much of our work on language, speech, translation, visual processing, ranking and prediction relies on Machine Intelligence. In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, applying learning algorithms to understand and generalize.
Machine Intelligence at Google raises deep scientific and engineering challenges, allowing us to contribute to the broader academic research community through technical talks and publications in major conferences and journals. Contrary to much of current theory and practice, the statistics of the data we observe shift rapidly, the features of interest change as well, and the volume of data often requires enormous computation capacity. When learning systems are placed at the core of interactive services in a fast-changing and sometimes adversarial environment, techniques including deep learning and statistical models need to be combined with ideas from control and game theory.
Machine Perception
Research in machine perception tackles the hard problems of understanding images, sounds, music and video. In recent years, our computers have become much better at such tasks, enabling a variety of new applications such as: content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially-labeled datasets using parallel computing clusters. A good example is our recent work on object recognition using a novel deep convolutional neural network architecture known as Inception that achieves state-of-the-art results on academic benchmarks and allows users to easily search through their large collection of Google Photos. The ability to mine meaningful information from multimedia is broadly applied throughout Google.
Machine Translation
Machine Translation is an excellent example of how cutting-edge research and world-class infrastructure come together at Google. We focus our research efforts on developing statistical translation techniques that improve with more data and generalize well to new languages. Our large scale computing infrastructure allows us to rapidly experiment with new models trained on web-scale data to significantly improve translation quality. This research backs the translations served at translate.google.com, allowing our users to translate text, web pages and even speech. Deployed within a wide range of Google services like GMail , Books , Android and web search , Google Translate is a high-impact, research-driven product that bridges language barriers and makes it possible to explore the multilingual web in 90 languages. Exciting research challenges abound as we pursue human quality translation and develop machine translation systems for new languages.
Mobile Systems
Mobile devices are the prevalent computing device in many parts of the world, and over the coming years it is expected that mobile Internet usage will outpace desktop usage worldwide. Google is committed to realizing the potential of the mobile web to transform how people interact with computing technology. Google engineers and researchers work on a wide range of problems in mobile computing and networking, including new operating systems and programming platforms (such as Android and ChromeOS); new interaction paradigms between people and devices; advanced wireless communications; and optimizing the web for mobile settings. In addition, many of Google’s core product teams, such as Search, Gmail, and Maps, have groups focused on optimizing the mobile experience, making it faster and more seamless. We take a cross-layer approach to research in mobile systems and networking, cutting across applications, networks, operating systems, and hardware. The tremendous scale of Google’s products and the Android and Chrome platforms make this a very exciting place to work on these problems.
Some representative projects include mobile web performance optimization; new features in Android to greatly reduce network data usage and energy consumption; new platforms for developing high performance web applications on mobile devices; wireless communication protocols that will yield vastly greater performance over today’s standards; and multi-device interaction based on Android, which is now available on a wide variety of consumer electronics.
Natural Language Processing
Natural Language Processing (NLP) research at Google focuses on algorithms that apply at scale, across languages, and across domains. Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more.
Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems. We are particularly interested in algorithms that scale well and can be run efficiently in a highly distributed environment.
Our syntactic systems predict part-of-speech tags for each word in a given sentence, as well as morphological features such as gender and number. They also label relationships between words, such as subject, object, modification, and others. We focus on efficient algorithms that leverage large amounts of unlabeled data, and recently have incorporated neural net technology.
On the semantic side, we identify entities in free text, label them with types (such as person, location, or organization), cluster mentions of those entities within and across documents (coreference resolution), and resolve the entities to the Knowledge Graph.
Recent work has focused on incorporating multiple sources of knowledge and information to aid with analysis of text, as well as applying frame semantics at the noun phrase, sentence, and document level.
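The systems described above are Google-internal, but the same kinds of syntactic and semantic annotations can be reproduced in miniature with the open-source spaCy library. A minimal sketch follows, assuming the small English model has been downloaded; it only mirrors the types of annotations mentioned and makes no claim about Google's own pipelines:

```python
# Part-of-speech tags, dependency labels, and named entities with spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sundar Pichai announced new products at Google I/O in California.")

for token in doc:
    # Per-token part-of-speech tag, dependency relation, and syntactic head.
    print(f"{token.text:<10} {token.pos_:<6} {token.dep_:<10} head={token.head.text}")

for ent in doc.ents:
    # Entity mentions with predicted types such as PERSON, ORG, GPE.
    print(ent.text, ent.label_)
```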
Networking
Networking is central to modern computing, from connecting cell phones to massive Cloud-based data stores, to the interconnect for data centers that deliver seamless storage and fine-grained distributed computing at the scale of entire buildings. With an understanding that our distributed computing infrastructure is a key differentiator for the company, Google has long focused on building network infrastructure to support our scale, availability, and performance needs.
Our research combines building and deploying novel networking systems at massive scale, with recent work focusing on fundamental questions around data center architecture, wide area network interconnects, Software Defined Networking control and management infrastructure, as well as congestion control and bandwidth allocation. By publishing our findings at premier research venues, we continue to engage both academic and industrial partners to further the state of the art in networked systems.
Quantum Computing
Quantum Computing merges two great scientific revolutions of the 20th century: computer science and quantum physics. Quantum physics is the theoretical basis of the transistor, the laser, and other technologies which enabled the computing revolution. But on the algorithmic level, today's computing machinery still operates on "classical" Boolean logic. Quantum Computing is the design of hardware and software that replaces Boolean logic by quantum law at the algorithmic level. For certain computations such as optimization, sampling, search or quantum simulation this promises dramatic speedups. We are particularly interested in applying quantum computing to artificial intelligence and machine learning. This is because many tasks in these areas rely on solving hard optimization problems or performing efficient sampling.
Robotics
Having a machine learning agent interact with its environment requires true unsupervised learning, skill acquisition, active learning, exploration and reinforcement, all ingredients of human learning that are still not well understood or exploited through the supervised approaches that dominate deep learning today.
Our goal is to improve robotics via machine learning, and improve machine learning via robotics. We foster close collaborations between machine learning researchers and roboticists to enable learning at scale on real and simulated robotic systems.
Security, Privacy and Abuse Prevention
The Internet and the World Wide Web have brought many changes that provide huge benefits, in particular by giving people easy access to information that was previously unavailable, or simply hard to find. Unfortunately, these changes have raised many new challenges in the security of computer systems and the protection of information against unauthorized access and abusive usage. At Google, our primary focus is the user, and his/her safety. We have people working on nearly every aspect of security, privacy, and anti-abuse including access control and information security, networking, operating systems, language design, cryptography, fraud detection and prevention, spam and abuse detection, denial of service, anonymity, privacy-preserving systems, disclosure controls, as well as user interfaces and other human-centered aspects of security and privacy. Our security and privacy efforts cover a broad range of systems including mobile, cloud, distributed, sensors and embedded systems, and large-scale machine learning.
Software Engineering
At Google, we pride ourselves on our ability to develop and launch new products and features at a very fast pace. This is made possible in part by our world-class engineers, but our approach to software development enables us to balance speed and quality, and is integral to our success. Our obsession with speed and scale is evident in our developer infrastructure and tools. Developers across the world continually write, build, test and release code in multiple programming languages like C++, Java, Python, JavaScript and others, and the Engineering Tools team, for example, is challenged to keep this development ecosystem running smoothly. Our engineers leverage these tools and infrastructure to produce clean code and keep software development running at an ever-increasing scale. In our publications, we share associated technical challenges and lessons learned along the way.
Software Systems
Delivering Google's products to our users requires computer systems that have a scale previously unknown to the industry. Building on our hardware foundation, we develop technology across the entire systems stack, from operating system device drivers all the way up to multi-site software systems that run on hundreds of thousands of computers. We design, build and operate warehouse-scale computer systems that are deployed across the globe. We build storage systems that scale to exabytes, approach the performance of RAM, and never lose a byte. We design algorithms that transform our understanding of what is possible. Thanks to the distributed systems we provide our developers, they are some of the most productive in the industry. And we write and publish research papers to share what we have learned, and because peer feedback and interaction helps us build better systems that benefit everybody.
Speech Processing
Our goal in Speech Technology Research is to make speaking to devices--those around you, those that you wear, and those that you carry with you--ubiquitous and seamless.
Our research focuses on what makes Google unique: computing scale and data. Using large scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and experiment with the kind of methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: how do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which class of algorithms merely compensate for lack of data and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising, and steer the whole field into directions that would never have been considered, were it not for the availability of significantly higher orders of magnitude of data.
We are also in a unique position to deliver very user-centric research. Researchers are able to conduct live experiments to test and benchmark new algorithms directly in a realistic controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we focus on solving real problems and with real impact for users.
We have a huge commitment to the diversity of our users, and have made it a priority to deliver the best performance to every language on the planet. We currently have systems operating in more than 55 languages, and we continue to expand our reach to more users. The challenge of internationalizing at scale is immense and rewarding. Many speakers of the languages we reach have never had the experience of speaking to a computer before, and breaking this new ground brings up new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, we are now at the forefront of research in speech-to-speech translation and one step closer to a universal translator.
Indexing and transcribing the web’s audio content is another challenge we have set for ourselves, and is nothing short of gargantuan, both in scope and difficulty. The videos uploaded every day on YouTube range from lectures, to newscasts, music videos and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation, language detection to new levels of difficulty. The potential payoff is immense: imagine making every lecture on the web accessible to every language. This is the kind of impact for which we are striving.
Health & Bioscience
Research in health and biomedical sciences has a unique potential to improve peoples’ lives, and includes work ranging from basic science that aims to understand biology, to diagnosing individuals’ diseases, to epidemiological studies of whole populations.
We recognize that our strengths in machine learning, large-scale computing, and human-computer interaction can help accelerate the progress of research in this space. By collaborating with world-class institutions and researchers and engaging in both early-stage research and late-stage work, we hope to help people live healthier, longer, and more productive lives.
Analytics Insight
Top 10 Machine Learning Research Papers of 2021
Machine learning research papers showcasing the transformation of the technology
- Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
- Solving High-Dimensional Parabolic PDEs Using the Tensor Train Format
- Oops I Took a Gradient: Scalable Sampling for Discrete Distributions
- Optimal Complexity in Decentralized Training
- Understanding Self-Supervised Learning Dynamics without Contrastive Pairs
- How Transferable Are Features in Deep Neural Networks?
- Do We Need Hundreds of Classifiers to Solve Real-World Classification Problems?
- Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion
- Scalable Nearest Neighbor Algorithms for High Dimensional Data
- Trends in Extreme Learning Machines

2020’s Top AI & Machine Learning Research Papers
November 24, 2020 by Mariya Yao

Despite the challenges of 2020, the AI research community produced a number of meaningful technical breakthroughs. GPT-3 by OpenAI may be the most famous, but there are definitely many other research papers worth your attention.
For example, teams from Google introduced a revolutionary chatbot, Meena, and EfficientDet object detectors in image recognition. Researchers from Yale introduced a novel AdaBelief optimizer that combines many benefits of existing optimization methods. OpenAI researchers demonstrated how deep reinforcement learning techniques can achieve superhuman performance in Dota 2.
To help you catch up on essential reading, we’ve summarized 10 important machine learning research papers from 2020. These papers will give you a broad overview of AI research advancements this year. Of course, there are many more breakthrough papers worth reading as well.
We have also published the top 10 lists of key research papers in natural language processing and computer vision . In addition, you can read our premium research summaries , where we feature the top 25 conversational AI research papers introduced recently.
If you’d like to skip around, here are the papers we featured:
- A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning
- Efficiently Sampling Functions from Gaussian Process Posteriors
- Dota 2 with Large Scale Deep Reinforcement Learning
- Towards a Human-like Open-Domain Chatbot
- Language Models are Few-Shot Learners
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList
- EfficientDet: Scalable and Efficient Object Detection
- Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
- An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
- AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Best AI & ML Research Papers 2020
1. A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning, by Kévin Fauvel, Daniel Balouek-Thomert, Diego Melgar, Pedro Silva, Anthony Simonet, Gabriel Antoniu, Alexandru Costan, Véronique Masson, Manish Parashar, Ivan Rodero, and Alexandre Termier
Original Abstract
Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective to identify medium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems.
In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given in input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.
Our Summary
The authors claim that traditional Earthquake Early Warning (EEW) systems that are based on seismometers, as well as recently introduced GPS systems, have their disadvantages with regards to predicting large and medium earthquakes respectively. Thus, the researchers suggest approaching an early earthquake prediction problem with machine learning by using the data from seismometers and GPS stations as input data. In particular, they introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, which is specifically tailored for efficient computation on large-scale distributed cyberinfrastructures. The evaluation demonstrates that the DMSEEW system is more accurate than other baseline approaches with regard to real-time earthquake detection.

What’s the core idea of this paper?
- Seismometers have difficulty detecting large earthquakes because of their sensitivity to ground motion velocity.
- GPS stations are ineffective in detecting medium earthquakes, as they are prone to producing lots of noisy data.
- The proposed DMSEEW system first takes sensor-level class predictions from seismometers and GPS stations (i.e., normal activity, medium earthquake, large earthquake), then aggregates these predictions using a bag-of-words representation and defines a final prediction for the earthquake category (a rough sketch of this stacking step follows this list).
- Furthermore, they introduce a distributed cyberinfrastructure that can support the processing of high volumes of data in real time and allows the redirection of data to other processing data centers in case of disaster situations.
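The following is a rough, self-contained sketch of the stacking idea described in the list above, with invented sensor votes and an arbitrary meta-classifier; it is not the authors' implementation of DMSEEW.

```python
# Rough sketch of stacking sensor-level class predictions via a bag-of-words
# count vector and a meta-classifier (illustrative only, not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CLASSES = ["normal", "medium", "large"]

def bag_of_words(sensor_predictions):
    """Count how many sensors voted for each earthquake category."""
    return np.array([sensor_predictions.count(c) for c in CLASSES])

# Hypothetical training data: aggregated sensor votes and the true event class.
X_train = np.array([
    bag_of_words(["normal", "normal", "normal", "medium"]),
    bag_of_words(["medium", "medium", "normal", "medium"]),
    bag_of_words(["large", "large", "medium", "large"]),
])
y_train = ["normal", "medium", "large"]

meta = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(meta.predict([bag_of_words(["large", "medium", "large", "large"])]))  # e.g. ['large']
```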
What’s the key achievement?
- precision – 100% vs. 63.2%;
- recall – 100% vs. 85.7%;
- F1 score – 100% vs. 72.7%.
- precision – 76.7% vs. 70.7%;
- recall – 38.8% vs. 34.1%;
- F1 score – 51.6% vs. 45.0%.
What does the AI community think?
- The paper received an Outstanding Paper award at AAAI 2020 (special track on AI for Social Impact).
What are future research areas?
- Evaluating DMSEEW response time and robustness via simulation of different scenarios in an existing EEW execution platform.
- Evaluating the DMSEEW system on another seismic network.
2. Efficiently Sampling Functions from Gaussian Process Posteriors , by James T. Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth
Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model’s success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building off of this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes’ statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
In this paper, the authors explore techniques for efficiently sampling from Gaussian process (GP) posteriors. After investigating the behaviors of naive approaches to sampling and fast approximation strategies using Fourier features, they find that many of these strategies are complementary. They, therefore, introduce an approach that incorporates the best of different sampling approaches. First, they suggest decomposing the posterior as the sum of a prior and an update. Then they combine this idea with techniques from literature on approximate GPs and obtain an easy-to-use general-purpose approach for fast posterior sampling. The experiments demonstrate that decoupled sample paths accurately represent GP posteriors at a much lower cost.
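For context, the prior-plus-update decomposition can be written in its standard pathwise form (often attributed to Matheron); the version below assumes Gaussian observation noise with variance σ² and is a general statement rather than an equation quoted from the paper:

```latex
% Pathwise form of a Gaussian process posterior sample, assuming
% y = f(X) + \varepsilon with \varepsilon \sim \mathcal{N}(0, \sigma^2 I) and f \sim \mathcal{GP}(0, K):
(f \mid y)(\cdot)
  \;=\; \underbrace{f(\cdot)}_{\text{prior sample}}
  \;+\; \underbrace{K(\cdot, X)\,\bigl(K(X, X) + \sigma^2 I\bigr)^{-1}\bigl(y - f(X) - \varepsilon\bigr)}_{\text{data-driven update}}.
```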
- The introduced approach to sampling functions from GP posteriors centers on the observation that it is possible to implicitly condition Gaussian random variables by combining them with an explicit corrective term.
- The authors translate this intuition to Gaussian processes and suggest decomposing the posterior as the sum of a prior and an update.
- Building on this factorization, the researchers suggest an efficient approach for fast posterior sampling that seamlessly pairs with sparse approximations to achieve scalability both during training and at test time.
- Introducing an easy-to-use and general-purpose approach to sampling from GP posteriors.
- Demonstrating that the decoupled sample paths avoid many shortcomings of the alternative sampling strategies and accurately represent GP posteriors at a much lower cost; for example, simulation of a well-known model of a biological neuron required only 20 seconds using decoupled sampling, while the iterative approach required 10 hours.
- The paper received an Honorable Mention at ICML 2020.
Where can you get implementation code?
- The authors released the implementation of this paper on GitHub .
3. Dota 2 with Large Scale Deep Reinforcement Learning , by Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław “Psyho” Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
The OpenAI research team demonstrates that modern reinforcement learning techniques can achieve superhuman performance in such a challenging esports game as Dota 2. The challenges of this particular task for the AI system lie in the long time horizons, partial observability, and high dimensionality of observation and action spaces. To tackle this game, the researchers scaled existing RL systems to unprecedented levels, with thousands of GPUs utilized for 10 months. The resulting OpenAI Five model was able to defeat the Dota 2 world champions and won 99.4% of over 7000 games played during the multi-day showcase.

- The goal of the introduced OpenAI Five model is to find the policy that maximizes the probability of winning the game against professional human players, which in practice implies maximizing the reward function with some additional signals like characters dying, resources collected, etc.
- While the Dota 2 engine runs at 30 frames per second, the OpenAI Five only acts on every 4th frame.
- At each timestep, the model receives an observation with all the information available to human players (approximated in a set of data arrays) and returns a discrete action , which encodes the desired movement, attack, etc.
- A policy is defined as a function from the history of observations to a probability distribution over actions, and is parameterized as an LSTM with ~159M parameters.
- The policy is trained using Proximal Policy Optimization (PPO), a variant of advantage actor-critic (the standard form of its clipped objective is shown after this list).
- The OpenAI Five model was trained for 180 days spread over 10 months of real time.
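For reference, the clipped surrogate objective maximized by Proximal Policy Optimization in its standard form (this is the generic PPO objective, not a formula specific to OpenAI Five's exact implementation):

```latex
% PPO probability ratio and clipped surrogate objective, with advantage estimate \hat{A}_t:
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \qquad
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\Bigl[\min\bigl(r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}\bigl(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\,\hat{A}_t\bigr)\Bigr].
```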

- OpenAI Five defeated the Dota 2 world champions in a best-of-three match (2–0) and won 99.4% of over 7000 games during a multi-day online showcase.
- Applying introduced methods to other zero-sum two-team continuous environments.
What are possible business applications?
- Tackling challenging esports games like Dota 2 can be a promising step towards solving advanced real-world problems using reinforcement learning techniques.
4. Towards a Human-like Open-Domain Chatbot , by Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.
In contrast to most modern conversational agents, which are highly specialized, the Google research team introduces a chatbot Meena that can chat about virtually anything. It’s built on a large neural network with 2.6B parameters trained on 341 GB of text. The researchers also propose a new human evaluation metric for open-domain chatbots, called Sensibleness and Specificity Average (SSA), which can capture important attributes for human conversation. They demonstrate that this metric correlates highly with perplexity, an automatic metric that is readily available. Thus, the Meena chatbot, which is trained to minimize perplexity, can conduct conversations that are more sensible and specific compared to other chatbots. Particularly, the experiments demonstrate that Meena outperforms existing state-of-the-art chatbots by a large margin in terms of the SSA score (79% vs. 56%) and is closing the gap with human performance (86%).
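For context, the perplexity that Meena is trained to minimize is the standard exponentiated average negative log-likelihood of the next token (a general definition, not a formula quoted from the paper):

```latex
% Perplexity of an autoregressive language model over tokens x_1, \dots, x_N:
\mathrm{PPL} = \exp\!\Bigl(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta\bigl(x_i \mid x_{<i}\bigr)\Bigr).
```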

- Despite recent progress, open-domain chatbots still have significant weaknesses: their responses often do not make sense or are too vague or generic.
- Meena is built on a seq2seq model with Evolved Transformer (ET) that includes 1 ET encoder block and 13 ET decoder blocks.
- The model is trained on multi-turn conversations with the input sequence including all turns of the context (up to 7) and the output sequence being the response.
- The proposed Sensibleness and Specificity Average (SSA) metric captures two fundamental aspects of a human-like response: making sense and being specific.
- The research team discovered that the SSA metric shows a high negative correlation (R² = 0.93) with perplexity, a readily available automatic metric that Meena is trained to minimize.
- Proposing a simple human-evaluation metric for open-domain chatbots.
- The best end-to-end trained Meena model outperforms existing state-of-the-art open-domain chatbots by a large margin, achieving an SSA score of 72% (vs. 56%).
- Furthermore, the full version of Meena, with a filtering mechanism and tuned decoding, further advances the SSA score to 79%, which is not far from the 86% SSA achieved by the average human.
- “Google’s “Meena” chatbot was trained on a full TPUv3 pod (2048 TPU cores) for 30 full days – that’s more than $1,400,000 of compute time to train this chatbot model.” – Elliot Turner, CEO and founder of Hyperia .
- “So I was browsing the results for the new Google chatbot Meena, and they look pretty OK (if boring sometimes). However, every once in a while it enters ‘scary sociopath mode,’ which is, shall we say, sub-optimal” – Graham Neubig, Associate professor at Carnegie Mellon University .

- Lowering the perplexity through improvements in algorithms, architectures, data, and compute.
- Considering other aspects of conversations beyond sensibleness and specificity, such as, for example, personality and factuality.
- Tackling safety and bias in the models.
- Possible business applications of an open-domain chatbot like Meena include further humanizing computer interactions, improving foreign language practice, and making interactive movie and videogame characters more relatable.
- Considering the challenges related to safety and bias in the models, the authors haven’t released the Meena model yet. However, they are still evaluating the risks and benefits and may decide otherwise in the coming months.
5. Language Models are Few-Shot Learners , by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
The OpenAI research team draws attention to the fact that the need for a labeled dataset for every new language task limits the applicability of language models. Considering that there is a wide range of possible tasks and it’s often difficult to collect a large labeled training dataset, the researchers suggest an alternative solution, which is scaling up language models to improve task-agnostic few-shot performance. They test their solution by training a 175B-parameter autoregressive language model, called GPT-3 , and evaluating its performance on over two dozen NLP tasks. The evaluation under few-shot learning, one-shot learning, and zero-shot learning demonstrates that GPT-3 achieves promising results and even occasionally outperforms the state of the art achieved by fine-tuned models.

- GPT-3 uses the same model architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization.
- However, in contrast to GPT-2, it uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, as in the Sparse Transformer.
- Few-shot learning, when the model is given a few demonstrations of the task (typically, 10 to 100) at inference time but with no weight updates allowed.
- One-shot learning, when only one demonstration is allowed, together with a natural language description of the task.
- Zero-shot learning, when no demonstrations are allowed and the model has access only to a natural language description of the task. (A minimal sketch of these prompt formats appears at the end of this section.)
- On the CoQA benchmark, 81.5 F1 in the zero-shot setting, 84.0 F1 in the one-shot setting, and 85.0 F1 in the few-shot setting, compared to the 90.7 F1 score achieved by fine-tuned SOTA.
- On the TriviaQA benchmark, 64.3% accuracy in the zero-shot setting, 68.0% in the one-shot setting, and 71.2% in the few-shot setting, surpassing the state of the art (68%) by 3.2%.
- On the LAMBADA dataset, 76.2% accuracy in the zero-shot setting, 72.5% in the one-shot setting, and 86.4% in the few-shot setting, surpassing the state of the art (68%) by 18%.
- The news articles generated by the 175B-parameter GPT-3 model are hard to distinguish from real ones, according to human evaluations (with accuracy barely above the chance level at ~52%).
- “The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.” – Sam Altman, CEO and co-founder of OpenAI.
- “I’m shocked how hard it is to generate text about Muslims from GPT-3 that has nothing to do with violence… or being killed…” – Abubakar Abid, CEO and founder of Gradio.
- “No. GPT-3 fundamentally does not understand the world that it talks about. Increasing corpus further will allow it to generate a more credible pastiche but not fix its fundamental lack of comprehension of the world. Demos of GPT-4 will still require human cherry picking.” – Gary Marcus, CEO and founder of Robust.ai.
- “Extrapolating the spectacular performance of GPT3 into the future suggests that the answer to life, the universe and everything is just 4.398 trillion parameters.” – Geoffrey Hinton, Turing Award winner.
- Improving pre-training sample efficiency.
- Exploring how few-shot learning works.
- Distillation of large models down to a manageable size for real-world applications.
- The model with 175B parameters is hard to apply to real business problems due to its impractical resource requirements, but if the researchers manage to distill this model down to a workable size, it could be applied to a wide range of language tasks, including question answering, dialog agents, and ad copy generation.
- The code itself is not available, but some dataset statistics together with unconditional, unfiltered 2048-token samples from GPT-3 are released on GitHub.
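To make the three evaluation settings concrete, here is a minimal sketch of how zero-, one-, and few-shot prompts can be assembled as plain text. The task description, demonstrations, and query below are illustrative placeholders rather than the exact prompts from the paper; the key point is that no gradient updates occur, only text conditioning.

```python
# Minimal sketch of zero-/one-/few-shot prompt construction (illustrative placeholders,
# not the exact prompts from the GPT-3 paper). The model only ever sees this text;
# no weights are updated.

def build_prompt(task_description, demonstrations, query):
    """Assemble a prompt from a task description, K demonstrations, and the query."""
    lines = [task_description, ""]
    for source, target in demonstrations:   # K = 0 (zero-shot), 1 (one-shot), 10-100 (few-shot)
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")              # the model is asked to continue from here
    return "\n".join(lines)

task = "Translate English to French."
demos = [("cheese", "fromage"), ("house", "maison")]

zero_shot = build_prompt(task, [], "sea otter")
one_shot = build_prompt(task, demos[:1], "sea otter")
few_shot = build_prompt(task, demos, "sea otter")

print(few_shot)
# Translate English to French.
#
# cheese => fromage
# house => maison
# sea otter =>
```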
6. Beyond Accuracy: Behavioral Testing of NLP models with CheckList , by Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
The authors point out the shortcomings of existing approaches to evaluating the performance of NLP models. A single aggregate statistic, like accuracy, makes it difficult to estimate where the model is failing and how to fix it. Alternative evaluation approaches usually focus on individual tasks or specific capabilities. To address the lack of comprehensive evaluation approaches, the researchers introduce CheckList, a new evaluation methodology for testing NLP models. The approach is inspired by principles of behavioral testing in software engineering. At its core, CheckList is a matrix of linguistic capabilities and test types that facilitates test ideation. Multiple user studies demonstrate that CheckList is very effective at discovering actionable bugs, even in extensively tested NLP models.

- The primary approach to the evaluation of models’ generalization capabilities, which is accuracy on held-out data, may lead to performance overestimation, as the held-out data often contains the same biases as the training data. Moreover, this single aggregate statistic doesn’t help much in figuring out where the NLP model is failing and how to fix these bugs.
- The alternative approaches are usually designed to evaluate specific behaviors on individual tasks and thus lack comprehensiveness.
- CheckList provides users with a list of linguistic capabilities to be tested, like vocabulary, named entity recognition, and negation.
- Then, to break down potential capability failures into specific behaviors, CheckList suggests different test types, such as prediction invariance or directional expectation tests under certain perturbations.
- Potential tests are structured as a matrix, with capabilities as rows and test types as columns.
- The suggested implementation of CheckList also introduces a variety of abstractions that help users generate large numbers of test cases easily (a minimal example of this testing pattern appears at the end of this section).
- Evaluation of state-of-the-art models with CheckList demonstrated that even though some NLP tasks are considered “solved” based on accuracy results, the behavioral testing highlights many areas for improvement.
- helps to identify and test for capabilities not previously considered;
- results in more thorough and comprehensive testing for previously considered capabilities;
- helps to discover many more actionable bugs.
- The paper received the Best Paper Award at ACL 2020, the leading conference in natural language processing.
- CheckList can be used to create more exhaustive testing for a variety of NLP tasks.
- Such comprehensive testing that helps in identifying many actionable bugs is likely to lead to more robust NLP systems.
- The code for testing NLP models with CheckList is available on GitHub.
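As an illustration of the behavioral-testing pattern, the sketch below runs a simple invariance (INV) test: predictions on a sentiment example should not change when a named entity is swapped. The `predict_sentiment` function is a hypothetical stand-in for whatever model is under test; the authors' `checklist` package provides much richer abstractions (templates, perturbation helpers, test suites) than this toy version.

```python
# A minimal, self-contained sketch of a CheckList-style invariance (INV) test.
# `predict_sentiment` is a hypothetical placeholder for the model under test;
# the real CheckList library offers templates and perturbation utilities beyond this.

def predict_sentiment(text):
    """Toy stand-in model: counts positive/negative cue words."""
    positive, negative = {"great", "good", "love"}, {"bad", "terrible", "hate"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score >= 0 else "negative"

def invariance_test(template, entities):
    """Predictions should be identical for every entity substituted into the template."""
    predictions = {predict_sentiment(template.format(entity=e)) for e in entities}
    return len(predictions) == 1   # passes only if the label never changes

template = "The flight to {entity} was great, I love this airline."
entities = ["Chicago", "Berlin", "Nairobi", "Lima"]

print("INV test passed:", invariance_test(template, entities))
```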
7. EfficientDet: Scalable and Efficient Object Detection , by Mingxing Tan, Ruoming Pang, Quoc V. Le
Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4×–9× smaller and using 13×–42× fewer FLOPs than previous detectors. Code is available on https://github.com/google/automl/tree/master/efficientdet .
The large size of object detection models deters their deployment in real-world applications such as self-driving cars and robotics. To address this problem, the Google Research team introduces two optimizations, namely (1) a weighted bi-directional feature pyramid network (BiFPN) for efficient multi-scale feature fusion and (2) a novel compound scaling method. By combining these optimizations with the EfficientNet backbones, the authors develop a family of object detectors, called EfficientDet . The experiments demonstrate that these object detectors consistently achieve higher accuracy with far fewer parameters and multiply-adds (FLOPs).

- A weighted bi-directional feature pyramid network (BiFPN) for easy and fast multi-scale feature fusion. It learns the importance of different input features and repeatedly applies top-down and bottom-up multi-scale feature fusion.
- A new compound scaling method for simultaneous scaling of the resolution, depth, and width for all backbone, feature network, and box/class prediction networks (a sketch of this scaling rule appears at the end of this section).
- These optimizations, together with the EfficientNet backbones, allow the development of a new family of object detectors, called EfficientDet .
- the EfficientDet model with 52M parameters achieves state-of-the-art 52.2 AP on the COCO test-dev dataset, outperforming the previous best detector by 1.5 AP while being 4× smaller and using 13× fewer FLOPs;
- with simple modifications, the EfficientDet model achieves 81.74% mIOU accuracy, outperforming DeepLabV3+ by 1.7% on Pascal VOC 2012 semantic segmentation with 9.8× fewer FLOPs;
- the EfficientDet models are up to 3× faster on GPUs and up to 8× faster on CPUs than previous detectors.
- The paper was accepted to CVPR 2020, the leading conference in computer vision.
- The high level of interest in the code implementations of this paper makes this research one of the highest-trending papers introduced recently.
- The high accuracy and efficiency of the EfficientDet detectors may enable their application for real-world tasks, including self-driving cars and robotics.
- The authors released the official TensorFlow implementation of EfficientDet.
- Third-party PyTorch implementations of this paper are also available on GitHub.
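To make the compound scaling idea concrete, here is a small sketch of how a single coefficient φ jointly scales the input resolution, BiFPN width and depth, and prediction-head depth, following the formulas reported in the paper. The released configs round channel counts (e.g., to hardware-friendly values), so treat the numbers below as approximate rather than exact.

```python
# Sketch of EfficientDet's compound scaling: one coefficient phi scales resolution,
# BiFPN width/depth, and box/class head depth together (formulas as reported in the
# paper; released configs round channel counts, so exact values may differ slightly).
import math

def efficientdet_config(phi):
    return {
        "input_resolution": 512 + phi * 128,          # R_input = 512 + 128 * phi
        "bifpn_channels":   int(64 * (1.35 ** phi)),  # W_bifpn = 64 * 1.35^phi (before rounding)
        "bifpn_layers":     3 + phi,                  # D_bifpn = 3 + phi
        "head_layers":      3 + math.floor(phi / 3),  # D_box = D_class = 3 + floor(phi / 3)
    }

for phi in range(4):                                  # EfficientDet-D0 ... D3
    print(f"D{phi}:", efficientdet_config(phi))
```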
8. Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild , by Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.
The research group from the University of Oxford studies the problem of learning 3D deformable object categories from single-view RGB images without additional supervision. To decompose the image into depth, albedo, illumination, and viewpoint without direct supervision for these factors, they suggest starting by assuming objects to be symmetric. Then, considering that real-world objects are never fully symmetrical, at least due to variations in pose and illumination, the researchers augment the model by explicitly modeling illumination and predicting a dense map with probabilities that any given pixel has a symmetric counterpart. The experiments demonstrate that the introduced approach achieves better reconstruction results than other unsupervised methods. Moreover, it outperforms the recent state-of-the-art method that leverages keypoint supervision.

- no access to 2D or 3D ground truth information such as keypoints, segmentation, depth maps, or prior knowledge of a 3D model;
- using an unconstrained collection of single-view images without having multiple views of the same instance.
- leveraging symmetry as a geometric cue to constrain the decomposition;
- explicitly modeling illumination and using it as an additional cue for recovering the shape;
- augmenting the model to account for a potential lack of symmetry, in particular by predicting a dense map containing the probability that a given pixel has a symmetric counterpart in the image (a toy sketch of the symmetry and illumination cues appears at the end of this section).
- Qualitative evaluation of the suggested approach demonstrates that it reconstructs 3D faces of humans and cats with high fidelity, containing fine details of the nose, eyes, and mouth.
- The method reconstructs higher-quality shapes compared to other state-of-the-art unsupervised methods, and even outperforms the DepthNet model, which uses 2D keypoint annotations for depth prediction.

- The paper received the Best Paper Award at CVPR 2020, the leading conference in computer vision.
- Reconstructing more complex objects by extending the model to use either multiple canonical views or a different 3D representation, such as a mesh or a voxel map.
- Improving model performance under extreme lighting conditions and for extreme poses.
- The implementation code and demo are available on GitHub.
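The sketch below illustrates, under simplifying assumptions, the two cues the paper leans on: Lambertian shading computed from a predicted depth map and light direction, and a horizontally flipped (mirror-symmetric) reconstruction whose error is weighted by a per-pixel confidence map. This is a toy NumPy rendering of the idea, not the authors' differentiable renderer, and the confidence map here is fixed rather than predicted.

```python
# Toy NumPy illustration of the paper's two cues (not the authors' renderer):
# (1) shading from depth + light direction, (2) a mirrored reconstruction whose
# error is weighted by a per-pixel symmetry-confidence map.
import numpy as np

def normals_from_depth(depth):
    """Approximate surface normals from a depth map via finite differences."""
    dzdy, dzdx = np.gradient(depth)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def shade(albedo, depth, light_dir, ambient=0.3):
    """Simple Lambertian shading: ambient + diffuse * max(0, n . l)."""
    n = normals_from_depth(depth)
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    diffuse = np.clip(n @ l, 0.0, None)
    return albedo * (ambient + (1 - ambient) * diffuse)

def confidence_weighted_error(image, albedo, depth, light_dir, confidence):
    """Reconstruction error for the original and the horizontally flipped (symmetric)
    model, with the flipped term weighted by the symmetry-confidence map."""
    recon = shade(albedo, depth, light_dir)
    recon_flip = shade(albedo[:, ::-1], depth[:, ::-1], light_dir)
    err = np.abs(image - recon).mean()
    err_sym = (confidence * np.abs(image - recon_flip)).mean()
    return err + err_sym

# Tiny synthetic example: a bump-shaped depth map and flat albedo.
h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]
depth = np.exp(-(((xs - w / 2) ** 2 + (ys - h / 2) ** 2) / 60.0))
albedo = np.full((h, w), 0.8)
image = shade(albedo, depth, light_dir=[0.3, 0.2, 1.0])
confidence = np.ones((h, w))   # in the paper this map is predicted, not fixed

print("loss:", confidence_weighted_error(image, albedo, depth, [0.3, 0.2, 1.0], confidence))
```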

9. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale , by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches. When pre-trained on large amounts of data and transferred to multiple recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
The authors of this paper show that a pure Transformer can perform very well on image classification tasks. They introduce Vision Transformer (ViT) , which is applied directly to sequences of image patches by analogy with tokens (words) in NLP. When trained on large datasets of 14M–300M images, Vision Transformer approaches or beats state-of-the-art CNN-based models on image recognition tasks. In particular, it achieves an accuracy of 88.36% on ImageNet, 90.77% on ImageNet-ReaL, 94.55% on CIFAR-100, and 77.16% on the VTAB suite of 19 tasks.

- When applying the Transformer architecture to images, the authors follow the design of the original NLP Transformer as closely as possible (a minimal sketch of the resulting patch-embedding step appears at the end of this section).
- splitting images into fixed-size patches;
- linearly embedding each of them;
- adding position embeddings to the resulting sequence of vectors;
- feeding the patches to a standard Transformer encoder;
- adding an extra learnable ‘classification token’ to the sequence.
- Similarly to Transformers in NLP, Vision Transformer is typically pre-trained on large datasets and fine-tuned to downstream tasks.
- 88.36% on ImageNet;
- 90.77% on ImageNet-ReaL;
- 94.55% on CIFAR-100;
- 97.56% on Oxford-IIIT Pets;
- 99.74% on Oxford Flowers-102;
- 77.16% on the VTAB suite of 19 tasks.

- The paper is trending in the AI research community, as evident from the repository stats on GitHub.
- It is also under review for ICLR 2021, one of the key conferences in deep learning.
- Applying Vision Transformer to other computer vision tasks, such as detection and segmentation.
- Exploring self-supervised pre-training methods.
- Analyzing the few-shot properties of Vision Transformer.
- Exploring contrastive pre-training.
- Further scaling ViT.
- Thanks to their efficient pre-training and high performance, Transformers may substitute convolutional networks in many computer vision applications, including navigation, automatic inspection, and visual surveillance.
- The PyTorch implementation of Vision Transformer is available on GitHub.
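The patch-embedding step listed above can be sketched in a few lines of NumPy: the image is cut into P×P patches, each patch is flattened and linearly projected to the model dimension, a classification token is prepended, and position embeddings are added. The matrices here are random stand-ins; in the actual ViT model the projection, class token, and position embeddings are learned parameters feeding a Transformer encoder.

```python
# Minimal NumPy sketch of ViT's input pipeline: image -> patches -> linear projection
# -> [class] token + position embeddings. Shapes are illustrative; in the real model
# the projection, class token, and position embeddings are learned parameters.
import numpy as np

def image_to_patch_embeddings(image, patch_size, embed_dim, rng):
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    n_patches = (h // patch_size) * (w // patch_size)

    # Split into non-overlapping P x P patches and flatten each to a vector.
    patches = (image
               .reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(n_patches, patch_size * patch_size * c))

    # Linear projection to the embedding dimension (random here, learned in ViT).
    projection = rng.normal(size=(patch_size * patch_size * c, embed_dim))
    tokens = patches @ projection                                    # (n_patches, embed_dim)

    # Prepend the learnable [class] token and add position embeddings.
    cls_token = rng.normal(size=(1, embed_dim))
    pos_embed = rng.normal(size=(n_patches + 1, embed_dim))
    return np.concatenate([cls_token, tokens], axis=0) + pos_embed   # (n_patches + 1, embed_dim)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
sequence = image_to_patch_embeddings(image, patch_size=16, embed_dim=768, rng=rng)
print(sequence.shape)   # (197, 768): 14*14 patches + 1 class token
```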
10. AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients , by Juntang Zhuang, Tommy Tang, Sekhar Tatikonda, Nicha Dvornek, Yifan Ding, Xenophon Papademetris, James S. Duncan
Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) or accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the step size according to the “belief” in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer .
The researchers introduce AdaBelief , a new optimizer, which combines the high convergence speed of adaptive optimization methods and good generalization capabilities of accelerated stochastic gradient descent (SGD) schemes. The core idea behind the AdaBelief optimizer is to adapt step size based on the difference between predicted gradient and observed gradient: the step is small if the observed gradient deviates significantly from the prediction, making us distrust this observation, and the step is large when the current observation is close to the prediction, making us believe in this observation. The experiments confirm that AdaBelief combines fast convergence of adaptive methods, good generalizability of the SGD family, and high stability in the training of GANs.
- The idea of the AdaBelief optimizer is to combine the advantages of adaptive optimization methods (e.g., Adam) and accelerated SGD optimizers. Adaptive methods typically converge faster, while SGD optimizers demonstrate better generalization performance.
- If the observed gradient deviates greatly from the prediction, we have a weak belief in this observation and take a small step.
- If the observed gradient is close to the prediction, we have a strong belief in this observation and take a large step (a minimal sketch of the update step appears at the end of this section).
- fast convergence, like adaptive optimization methods;
- good generalization, like the SGD family;
- training stability in complex settings such as GANs.
- In image classification tasks on CIFAR and ImageNet, AdaBelief converges as fast as Adam and generalizes as well as SGD.
- It outperforms other methods in language modeling.
- In the training of a WGAN, AdaBelief significantly improves the quality of generated images compared to Adam.
- The paper was accepted to NeurIPS 2020, the top conference in artificial intelligence.
- It is also trending in the AI research community, as evident from the repository stats on GitHub.
- AdaBelief can boost the development and application of deep learning models, as it can be applied to the training of any model that numerically estimates parameter gradients.
- Both PyTorch and TensorFlow implementations are released on GitHub.
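To make the "belief" idea concrete, here is a minimal NumPy sketch of a single AdaBelief step: like Adam, it tracks an exponential moving average (EMA) of the gradient, but its second moment tracks the squared deviation of the gradient from that EMA, so steps shrink when the observed gradient disagrees with the prediction. The hyperparameters and the exact placement of epsilon follow common descriptions of the method and may differ in detail from the released implementations.

```python
# Minimal sketch of one AdaBelief update step (NumPy). The only change relative to Adam
# is the second moment: Adam tracks EMA(g^2), AdaBelief tracks EMA((g - m)^2), i.e. the
# deviation of the observed gradient from the predicted (EMA) gradient. Epsilon placement
# and defaults follow common descriptions and may differ from the released code.
import numpy as np

def adabelief_step(theta, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad                    # EMA of gradients ("prediction")
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps   # EMA of squared deviation ("belief")
    m_hat = m / (1 - beta1 ** t)                          # bias correction
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)   # small step when deviation is large
    return theta, m, s

# Toy usage: minimize f(theta) = ||theta||^2.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
s = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta                                      # gradient of ||theta||^2
    theta, m, s = adabelief_step(theta, grad, m, s, t, lr=0.01)
print(theta)   # should be much closer to the minimum at [0, 0] than the starting point
```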
If you like these research summaries, you might be also interested in the following articles:
- GPT-3 & Beyond: 10 NLP Research Papers You Should Read
- Novel Computer Vision Research Papers From 2020
- AAAI 2021: Top Research Papers With Business Applications
- ICLR 2021: Key Research Papers
About Mariya Yao
Mariya is the co-author of Applied AI: A Handbook For Business Leaders and former CTO at Metamaven. She "translates" arcane technical concepts into actionable business advice for executives and designs lovable products people actually want to use. Follow her on Twitter at @thinkmariya to raise your AI IQ.
How to write a good research paper in the machine learning area.

A machine learning research paper is formal technical documentation that presents a fundamental theory, a topic survey, or a proof of concept through a mathematical model or a practical implementation. It takes hours of study and effort to lay out all the relevant information in a clear, presentable way.
Reviewers judge a paper's worth using rules of thumb such as replicability of results and availability of code. Additionally, the acceptance guidelines of prestigious journals and conferences like ICLR, ICML, and NeurIPS are quite strict. After this screening, only a few lucky papers are accepted and the rest are rejected.
These few high-value papers are published, recognized by leading researchers in the community, and often make their way into practical applications.
Thus, it is important to know the ins and outs of how to write a research paper in machine learning. In this article, we share expert advice on how you can ace your machine learning research paper.
Table of Contents
- 1. What makes an excellent research paper on machine learning?
- 2. What are the important parts of a research paper?
- 3. Types of machine learning papers you can write
- 4. How to write a successful research paper in machine learning?
- 5. How to submit your machine learning research papers?
- 6. How are machine learning papers assessed?
- 7. Do’s and don't of writing research paper
- 8. Conclusion
What makes an excellent research paper on machine learning?
An excellent machine learning paper is based on good research that is transparent and reproducible. It should be replicable in nature so that the study's findings can be tested by other researchers.
Such papers typically present genuinely new architectures, algorithms, or fundamental results. State the goals of your research and position your paper within a familiar class, such as a computational model of human learning, a formal analysis, an application of established methods, or a description of a new learning algorithm.
Further, ensure that you bring together various evidence, views, and facts about the topic that you are targeting in machine learning. You can derive your information from different interviews, articles, and books to verify the facts included in your research paper.
The four major characteristics that the writer of a machine learning research paper should consider are its length, format, style, and sources.
Additionally, including an abstract condenses your machine learning paper, from introduction to conclusion, into a nutshell.
What are the important parts of a research paper?
An excellent research paper is drafted in a formal structure that includes several sections maintaining the flow of the content. It is important to ensure that the readers can quickly find the information they are looking for in your research paper.
Here are the core sections that a research paper should include.
- Introduction
- Methodology
- Discussion and conclusion
These are standard sections that appear in almost every research paper. However, there can be additional sections depending on the topic you choose, such as a dedicated section relating prior machine learning research to the author's work.
Types of machine learning papers you can write
The initial step toward writing an excellent machine learning research paper is to select your target category. The categories described below will help you decide.

1. Survey paper without implementation
This category involves an extensive survey of a machine learning domain. For example, if someone wants to write a research paper on healthcare and machine learning, there is already a large body of research to draw on. Summarizing that work in a single paper and highlighting interesting findings is a good way to start with survey writing.
The following are excellent websites to check for the latest research papers.
- Google Scholar
- DBLP - computer science bibliography
- WorldWideScience
- Science.Gov
- Virtual Learning Resources Center
You can download research papers on machine learning from the sites mentioned above, pick a particular application or algorithm, and track how it has advanced. Finally, prepare a summary table of the research in your selected area, with proper citations and each work's merits and demerits.
2. Survey Paper With Implementation
If you wish to write a survey paper with implementation, select a topic and obtain a dataset for that domain. The following websites offer free datasets.
- Google Dataset Search
- Open Data Portal
- AWS Open Data
- Academic Torrents
For example, you could choose employee attrition prediction as your topic. Next, find a publicly available dataset, apply several supervised or unsupervised machine learning algorithms, and measure their accuracy. Finally, present a comparative table of the five or six algorithms you evaluated and conclude which works best for your chosen problem.
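As a rough sketch of this kind of comparative study (using scikit-learn, with a synthetic dataset standing in for a real attrition dataset, since no specific dataset is assumed here), you could train several classifiers on the same train/test split and tabulate their accuracy:

```python
# Minimal sketch of a comparative study: several classifiers, one dataset, one table.
# A synthetic dataset stands in for a real one (e.g., employee attrition); swap in your own data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(random_state=42),
    "k-nearest neighbors": KNeighborsClassifier(),
    "Support vector machine": SVC(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name:<24} accuracy = {acc:.3f}")   # rows of the comparative table
```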
3. Paper with just proof of concept
This category of paper requires in-depth knowledge of the selected area. Here, you must understand an existing machine learning or deep learning algorithm and optimize it, either by modifying it or by analyzing it mathematically. Such a paper presents a brief, logical, and technical proof of the proposed new architecture or algorithm.
4. Developing new machine learning algorithms
Machine learning is still an emerging field, yet its algorithms already have many application areas: agriculture, health, social media, computer vision, image processing, NLP, sentiment analysis, recommender systems, prediction, business analytics, and more. Almost every field can use machine learning directly or indirectly.
An algorithm developed for one application may not work with the same efficiency on another; most algorithms are application-specific, so there is always scope to design a new algorithm for a given application. For example, if you wish to apply machine learning to mangrove classification from satellite images, you may need to modify an algorithm designed for camera-captured images so that it works on satellite imagery. This leaves ample scope for creating new algorithms or adapting existing ones.
5. Developing new architecture
The Internet of Things (IoT) is an emerging field closely tied to artificial intelligence. As described in the previous point, machine learning can be applied in almost all areas, so whenever you combine ML with IoT, new IoT+ML architectures arise. Papers in this category describe newly developed architectures for a given technology. Green IoT, privacy-preserving ML, ML for IoT, and ML in healthcare are all areas with huge research scope for new or modified architectures.
6. Comparison of various machine learning algorithms
This category of paper reads much like a survey paper. A typical title is "House price prediction: A survey of various machine learning algorithms." Such a paper covers one problem domain and documents, with proper citations, all the implementations that have already been attempted.
The main novelty of this type of paper lies in the summarized table, which includes algorithms, methods, merits, and demerits of using that algorithm for a given problem domain.
7. Analysis of any manually collected data
This kind of paper is common in MBA programs. Researchers send Google Forms or physical questionnaires to end users and collect data about their experience. The collected data is then fed into a machine learning model for classification or prediction, and is sometimes used for regression analysis. The same approach works for any data collected for business analytics, for example, analyzing buyers' purchasing patterns or predicting churn.
8. Applying ML algorithms for prediction or classification
This is a purely implementation-based category. The first step is to define the problem statement, then select a suitable dataset and divide it into training and testing sets. For supervised learning, assign the target variable, fit an appropriate machine learning model, and evaluate the results.
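A minimal end-to-end sketch of that workflow (again with scikit-learn, and a bundled toy dataset standing in for your own problem's data) might look like this:

```python
# Minimal end-to-end workflow: load data, split, fit a pipeline, evaluate.
# The bundled breast-cancer dataset stands in for your own problem's data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)           # features and target variable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)  # training and testing sets

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)                           # fit the chosen model

print(classification_report(y_test, model.predict(X_test)))   # evaluate the result
```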
To sum up, research paper writing is not a skill acquired in a few minutes; it comes with practice. To write a good research paper, be clear about your objectives, then carry out the implementation and present the results convincingly.
How to write a successful research paper in machine learning?

1. Write as if your reader knows nothing
An average reader is not aware of the importance of your topic. You need to formulate clear thoughts and back up your information with credible sources. Spend enough time on your research and make the reader aware of your topic in the introduction section of your work.
Additionally, you need to bear at least four kinds of readers in mind while writing your research paper on machine learning.
Professionals in your research field: People in the same research field as yours will know all the relevant terms and the work related to your research. They will be few in number and are less likely to be your immediate peers.
Professionals in closely related research areas: Such people would not be aware of your research or the specific problems you are addressing in your research. But they do have a general understanding of the wider research area you are targeting. So it is important to include an aspect from their perspective to keep them connected till the conclusion of your research paper.
Supervisor: Your supervisor already knows what you are doing in your research paper and why. We recommend that you don't write your research paper with your supervisor in mind as the reader.
Professionals from remote areas: The biggest portion of your readers will be people from remotely related research areas. This group includes some of the reviewers and people who aren't aware of the importance of your research or methods. Rather than explaining everything to them from scratch, write your paper assuming the reader has a basic understanding of the topic.
2. Write when your results are ready
It is important to have your results on the table before you start writing your machine learning research paper. However, you can write the introduction as early as possible, even before your results are analyzed. Doing so helps you form a clear picture of your paper and identify the relevant related work.
Many authors worry about the clock ticking towards the deadline, but it is important to know the complete story, from introduction to conclusion, before writing it down. We recommend getting your research results first, analyzing them, and only then writing everything up in your paper.
3. Review your paper like a critic
There are some things that, as a research paper writer, you should be accustomed to. We have listed them below for you.
- Be aware of the limitations of your research. Make a list of all of them.
- Search for any weaknesses in the paper. If they can be fixed, resolve them or else describe the limits of what you did instead of giving an excuse.
- Proofread your research paper to its bits and pieces.
Additionally, there are some questions that your machine learning papers reviewer might ask you, so prepare their answers in advance.
- Did you get lucky with your choice of datasets?
- Why were the given parameters chosen for your experimental setup?
- Will your research findings also work on other datasets?
4. Avoid too much mathiness
Your research paper can have some formulas to describe your findings or concepts. But they should be put precisely so that the reader or the reviewer doesn’t take much time to understand them.
In many cases, overusing formulas or providing spurious explanations to justify a finding reduces the impact of a research paper and costs you readers, even if the paper gets published.
5. Abstract to be written at last
The abstract is a vital part of a research paper and the part that the majority of your readers will actually read. We advise you to write it last so that you can include the key essence and takeaways of your paper.
How to submit your machine learning research papers?
Once you complete your research paper, it must be submitted under the policies set by the organizers of the target journal or conference. These policies are designed to build an ecosystem that encourages machine learning practitioners to make their claimed results reproducible.
Reproducibility programs introduced at major venues typically have three components that you should keep in mind.
- Code submission policy
- ML reproducibility checklist for claimed results
- Community-wide reproducibility challenge
These requirements apply to machine learning papers in order to promote best practices and consistent code repository assessment. They also spare future researchers from having to build everything from scratch.
How are machine learning papers assessed?
Every year, conferences and journals receive thousands of research papers. An ML code completeness checklist verifies that the code repository accompanying your research paper provides the necessary artifacts and scripts.
In addition to the above, the further analysis of the paper by the reviewers sets the final decision on whether your paper will be published or not.
Do’s and don't of writing research paper
Every researcher wishes to have their paper published in top journals, but it isn't that easy. There is a whole list of things to keep in mind while writing your research paper; we have elaborated on them below.
- Present your work precisely. Avoid writing stories. Justify your research with methodologies and innovative ideas that fellow researchers can follow.
- Maintain a certain flow of content in your research paper.
- Provide solid supportive arguments and pieces of evidence that justify your findings.
- Include scientific terminologies in your research paper.
- Refer to sources from diverse backgrounds for up-to-date and trustworthy information.
- Ensure that you proofread the paper several times to eliminate any possible errors.
- Avoid any kind of plagiarism in your research paper.
- Don’t just replicate Wikipedia. Instead, find trustworthy sources for your citation and create your own original piece of content.
- Don’t include incomplete information. Be honest with your readers and include all the aspects related to your work that would answer the queries in the reader's mind.
- Support each of your findings and don’t reveal any absurd reasons for doing the research.
- Avoid going beyond the recommended word limit; staying within it shows that you take the guidelines seriously.
- Don't pad your research paper with fillers; stick to the points that matter.
With all of the above, you now know how to write a research paper in machine learning, and it should no longer feel like a challenge. We recommend sticking to established standards, since unusual formats increase the risk of rejection. Follow the tips above and you are good to go.
We hope you get your research paper published!

Frequently Asked Questions
Can AI write a research paper for me? Yes, AI can draft a research paper in less time than you would take to write it manually.
We have listed down some of the top journals where you can publish machine learning papers below.
- Elsevier Pattern Recognition
- Journal of Machine Learning Research
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Wiley International Journal of Intelligent Systems
- IEEE Transactions on Neural Networks and Learning Systems
Here is a list of some of the best research papers for machine learning.
Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies By Paul Vicol, Luke Metz, and Jascha Sohl-Dickstein
Scalable nearest neighbor algorithms for high dimensional data By Lowe, D.G., & Muja, M.
Trends in extreme learning machines By Huang, G., Huang, G., Song, S., & You, K.
Solving high-dimensional parabolic PDEs using the tensor train format By Lorenz Richter, Leon Sallandt, and Nikolas Nüsken
Optimal complexity in decentralized training By researchers at Cornell University, Yucheng Lu and Christopher De Sa
Follow the procedure given below to write a dataset in your research paper.
Step 1: Navigate to your study folder and open the “Manage” tab.
Step 2: Select “Manage datasets.”
Step 3: Select “Create new dataset.”
Check out some free platforms which will publish your machine learning papers for free.
- ScienceOpen
- Social Science Research Network
- Directory of Open Access Journals
- Education Resources Information Center
- arXiv e-Print Archive
An abstract is something that summarises your paper in a small paragraph. So, when you write it for your research paper, ensure that:
- Its word count is 300 or less.
- It includes the purpose of your paper.
- It summarizes your discoveries or findings as the outcome of your research.

Exploring 250+ Machine Learning Research Topics

In recent years, machine learning has become hugely popular and has grown very quickly, thanks to better technology and much more available data. Because of this, we've seen lots of new and impressive advances in different areas, and machine learning research is what makes them possible. In this blog, we'll talk about machine learning research topics: why they're important, how to pick one, which areas are popular to study, what's new and exciting, the tough problems, and where you can find help if you want to be a researcher.
Why Does Machine Learning Research Matter?
Machine learning research is at the heart of the AI revolution. It underpins the development of intelligent systems capable of making predictions, automating tasks, and improving decision-making across industries. The importance of this research can be summarized as follows:
Advancements in Technology
The growth of machine learning research has led to the development of powerful algorithms, tools, and frameworks. Numerous domains, including healthcare, banking, autonomous vehicles, and natural language processing, have found uses for these technologies.
As researchers continue to push the boundaries of what’s possible, we can expect even more transformative technologies to emerge.
Real-world Applications
Machine learning research has brought about tangible changes in our daily lives. Voice assistants like Siri and Alexa, recommendation systems on streaming platforms, and personalized healthcare diagnostics are just a few examples of how this research impacts our world.
By working on new research topics, scientists can further refine these applications and create new ones.
Economic and Industrial Impacts
The economic implications of machine learning research are substantial. Companies that harness the power of machine learning gain a competitive edge in the market.
This creates a demand for skilled machine learning researchers, driving job opportunities and contributing to economic growth.
How to Choose the Right Machine Learning Research Topic?
Selecting the right machine learning research topics is crucial for your success as a machine learning researcher. Here’s a guide to help you make an informed decision:
- Understanding Your Interests
Start by considering your personal interests. Machine learning is a broad field with applications in virtually every sector. By choosing a topic that aligns with your passions, you’ll stay motivated and engaged throughout your research journey.
- Reviewing Current Trends
Stay updated on the latest trends in machine learning. Attend conferences, read research papers, and engage with the community to identify emerging research topics. Current trends often lead to exciting breakthroughs.
- Identifying Gaps in Existing Research
Sometimes, the most promising research topics involve addressing gaps in existing knowledge. These gaps may become evident through your own experiences, discussions with peers, or in the course of your studies.
- Collaborating with Experts
Collaboration is key in research. Working with experts in the field can help you refine your research topic and gain valuable insights. Seek mentors and collaborators who can guide you.
250+ Machine Learning Research Topics: Category-wise
Supervised Learning
- Explainable AI for Decision Support
- Few-shot Learning Methods
- Time Series Forecasting with Deep Learning
- Handling Imbalanced Datasets in Classification
- Regression Techniques for Non-linear Data
- Transfer Learning in Supervised Settings
- Multi-label Classification Strategies
- Semi-Supervised Learning Approaches
- Novel Feature Selection Methods
- Anomaly Detection in Supervised Scenarios
- Federated Learning for Distributed Supervised Models
- Ensemble Learning for Improved Accuracy
- Automated Hyperparameter Tuning
- Ethical Implications in Supervised Models
- Interpretability of Deep Neural Networks.
Unsupervised Learning
- Unsupervised Clustering of High-dimensional Data
- Semi-Supervised Clustering Approaches
- Density Estimation in Unsupervised Learning
- Anomaly Detection in Unsupervised Settings
- Transfer Learning for Unsupervised Tasks
- Representation Learning in Unsupervised Learning
- Outlier Detection Techniques
- Generative Models for Data Synthesis
- Manifold Learning in High-dimensional Spaces
- Unsupervised Feature Selection
- Privacy-Preserving Unsupervised Learning
- Community Detection in Complex Networks
- Clustering Interpretability and Visualization
- Unsupervised Learning for Image Segmentation
- Autoencoders for Dimensionality Reduction.
Reinforcement Learning
- Deep Reinforcement Learning in Real-world Applications
- Safe Reinforcement Learning for Autonomous Systems
- Transfer Learning in Reinforcement Learning
- Imitation Learning and Apprenticeship Learning
- Multi-agent Reinforcement Learning
- Explainable Reinforcement Learning Policies
- Hierarchical Reinforcement Learning
- Model-based Reinforcement Learning
- Curriculum Learning in Reinforcement Learning
- Reinforcement Learning in Robotics
- Exploration vs. Exploitation Strategies
- Reward Function Design and Ethical Considerations
- Reinforcement Learning in Healthcare
- Continuous Action Spaces in RL
- Reinforcement Learning for Resource Management.
Natural Language Processing (NLP)
- Multilingual and Cross-lingual NLP
- Contextualized Word Embeddings
- Bias Detection and Mitigation in NLP
- Named Entity Recognition for Low-resource Languages
- Sentiment Analysis in Social Media Text
- Dialogue Systems for Improved Customer Service
- Text Summarization for News Articles
- Low-resource Machine Translation
- Explainable NLP Models
- Coreference Resolution in NLP
- Question Answering in Specific Domains
- Detecting Fake News and Misinformation
- NLP for Healthcare: Clinical Document Understanding
- Emotion Analysis in Text
- Text Generation with Controlled Attributes.
Computer Vision
- Video Action Recognition and Event Detection
- Object Detection in Challenging Conditions (e.g., low light)
- Explainable Computer Vision Models
- Image Captioning for Accessibility
- Large-scale Image Retrieval
- Domain Adaptation in Computer Vision
- Fine-grained Image Classification
- Facial Expression Recognition
- Visual Question Answering
- Self-supervised Learning for Visual Representations
- Weakly Supervised Object Localization
- Human Pose Estimation in 3D
- Scene Understanding in Autonomous Vehicles
- Image Super-resolution
- Gaze Estimation for Human-Computer Interaction.
Deep Learning
- Neural Architecture Search for Efficient Models
- Self-attention Mechanisms and Transformers
- Interpretability in Deep Learning Models
- Robustness of Deep Neural Networks
- Generative Adversarial Networks (GANs) for Data Augmentation
- Neural Style Transfer in Art and Design
- Adversarial Attacks and Defenses
- Neural Networks for Audio and Speech Processing
- Explainable AI for Healthcare Diagnosis
- Automated Machine Learning (AutoML)
- Reinforcement Learning with Deep Neural Networks
- Model Compression and Quantization
- Lifelong Learning with Deep Learning Models
- Multimodal Learning with Vision and Language
- Federated Learning for Privacy-preserving Deep Learning.
Explainable AI
- Visualizing Model Decision Boundaries
- Saliency Maps and Feature Attribution
- Rule-based Explanations for Black-box Models
- Contrastive Explanations for Model Interpretability
- Counterfactual Explanations and What-if Analysis
- Human-centered AI for Explainable Healthcare
- Ethics and Fairness in Explainable AI
- Explanation Generation for Natural Language Processing
- Explainable AI in Financial Risk Assessment
- User-friendly Interfaces for Model Interpretability
- Scalability and Efficiency in Explainable Models
- Hybrid Models for Combined Accuracy and Explainability
- Post-hoc vs. Intrinsic Explanations
- Evaluation Metrics for Explanation Quality
- Explainable AI for Autonomous Vehicles.
Transfer Learning
- Zero-shot Learning and Few-shot Learning
- Cross-domain Transfer Learning
- Domain Adaptation for Improved Generalization
- Multilingual Transfer Learning in NLP
- Pretraining and Fine-tuning Techniques
- Lifelong Learning and Continual Learning
- Domain-specific Transfer Learning Applications
- Model Distillation for Knowledge Transfer
- Contrastive Learning for Transfer Learning
- Self-training and Pseudo-labeling
- Dynamic Adaption of Pretrained Models
- Privacy-Preserving Transfer Learning
- Unsupervised Domain Adaptation
- Negative Transfer Avoidance in Transfer Learning.
Federated Learning
- Secure Aggregation in Federated Learning
- Communication-efficient Federated Learning
- Privacy-preserving Techniques in Federated Learning
- Federated Transfer Learning
- Heterogeneous Federated Learning
- Real-world Applications of Federated Learning
- Federated Learning for Edge Devices
- Federated Learning for Healthcare Data
- Differential Privacy in Federated Learning
- Byzantine-robust Federated Learning
- Federated Learning with Non-IID Data
- Model Selection in Federated Learning
- Scalable Federated Learning for Large Datasets
- Client Selection and Sampling Strategies
- Global Model Update Synchronization in Federated Learning.
Quantum Machine Learning
- Quantum Neural Networks and Quantum Circuit Learning
- Quantum-enhanced Optimization for Machine Learning
- Quantum Data Compression and Quantum Principal Component Analysis
- Quantum Kernels and Quantum Feature Maps
- Quantum Variational Autoencoders
- Quantum Transfer Learning
- Quantum-inspired Classical Algorithms for ML
- Hybrid Quantum-Classical Models
- Quantum Machine Learning on Near-term Quantum Devices
- Quantum-inspired Reinforcement Learning
- Quantum Computing for Quantum Chemistry and Drug Discovery
- Quantum Machine Learning for Finance
- Quantum Data Structures and Quantum Databases
- Quantum-enhanced Cryptography in Machine Learning
- Quantum Generative Models and Quantum GANs.
Ethical AI and Bias Mitigation
- Fairness-aware Machine Learning Algorithms
- Bias Detection and Mitigation in Real-world Data
- Explainable AI for Ethical Decision Support
- Algorithmic Accountability and Transparency
- Privacy-preserving AI and Data Governance
- Ethical Considerations in AI for Healthcare
- Fairness in Recommender Systems
- Bias and Fairness in NLP Models
- Auditing AI Systems for Bias
- Societal Implications of AI in Criminal Justice
- Ethical AI Education and Training
- Bias Mitigation in Autonomous Vehicles
- Fair AI in Financial and Hiring Decisions
- Case Studies in Ethical AI Failures
- Legal and Policy Frameworks for Ethical AI.
Meta-Learning and AutoML
- Neural Architecture Search (NAS) for Efficient Models
- Transfer Learning in NAS
- Reinforcement Learning for NAS
- Multi-objective NAS
- Automated Data Augmentation
- Neural Architecture Optimization for Edge Devices
- Bayesian Optimization for AutoML
- Model Compression and Quantization in AutoML
- AutoML for Federated Learning
- AutoML in Healthcare Diagnostics
- Explainable AutoML
- Cost-sensitive Learning in AutoML
- AutoML for Small Data
- Human-in-the-Loop AutoML.
AI for Healthcare and Medicine
- Disease Prediction and Early Diagnosis
- Medical Image Analysis with Deep Learning
- Drug Discovery and Molecular Modeling
- Electronic Health Record Analysis
- Predictive Analytics in Healthcare
- Personalized Treatment Planning
- Healthcare Fraud Detection
- Telemedicine and Remote Patient Monitoring
- AI in Radiology and Pathology
- AI in Drug Repurposing
- AI for Medical Robotics and Surgery
- Genomic Data Analysis
- AI-powered Mental Health Assessment
- Explainable AI in Healthcare Decision Support
- AI in Epidemiology and Outbreak Prediction.
AI in Finance and Investment
- Algorithmic Trading and High-frequency Trading
- Credit Scoring and Risk Assessment
- Fraud Detection and Anti-money Laundering
- Portfolio Optimization with AI
- Financial Market Prediction
- Sentiment Analysis in Financial News
- Explainable AI in Financial Decision-making
- Algorithmic Pricing and Dynamic Pricing Strategies
- AI in Cryptocurrency and Blockchain
- Customer Behavior Analysis in Banking
- Explainable AI in Credit Decisioning
- AI in Regulatory Compliance
- Ethical AI in Financial Services
- AI for Real Estate Investment
- Automated Financial Reporting.
AI in Climate Change and Sustainability
- Climate Modeling and Prediction
- Renewable Energy Forecasting
- Smart Grid Optimization
- Energy Consumption Forecasting
- Carbon Emission Reduction with AI
- Ecosystem Monitoring and Preservation
- Precision Agriculture with AI
- AI for Wildlife Conservation
- Natural Disaster Prediction and Management
- Water Resource Management with AI
- Sustainable Transportation and Urban Planning
- Climate Change Mitigation Strategies with AI
- Environmental Impact Assessment with Machine Learning
- Eco-friendly Supply Chain Optimization
- Ethical AI in Climate-related Decision Support.
Data Privacy and Security
- Differential Privacy Mechanisms
- Federated Learning for Privacy-preserving AI
- Secure Multi-Party Computation
- Privacy-enhancing Technologies in Machine Learning
- Homomorphic Encryption for Machine Learning
- Ethical Considerations in Data Privacy
- Privacy-preserving AI in Healthcare
- AI for Secure Authentication and Access Control
- Blockchain and AI for Data Security
- Explainable Privacy in Machine Learning
- Privacy-preserving AI in Government and Public Services
- Privacy-compliant AI for IoT and Edge Devices
- Secure AI Models Sharing and Deployment
- Privacy-preserving AI in Financial Transactions
- AI in the Legal Frameworks of Data Privacy.
Global Collaboration in Research
- International Research Partnerships and Collaboration Models
- Multilingual and Cross-cultural AI Research
- Addressing Global Healthcare Challenges with AI
- Ethical Considerations in International AI Collaborations
- Interdisciplinary AI Research in Global Challenges
- AI Ethics and Human Rights in Global Research
- Data Sharing and Data Access in Global AI Research
- Cross-border Research Regulations and Compliance
- AI Innovation Hubs and International Research Centers
- AI Education and Training for Global Communities
- Humanitarian AI and AI for Sustainable Development Goals
- AI for Cultural Preservation and Heritage Protection
- Collaboration in AI-related Global Crises
- AI in Cross-cultural Communication and Understanding
- Global AI for Environmental Sustainability and Conservation.
Emerging Trends and Hot Topics in Machine Learning Research
The landscape of machine learning research topics is constantly evolving. Here are some of the emerging trends and hot topics that are shaping the field:
Ethical AI and Bias Mitigation
As AI systems become more prevalent, addressing ethical concerns and mitigating bias in algorithms are critical research areas.
Interpretable and Explainable Models
Understanding why machine learning models make specific decisions is crucial for their adoption in sensitive areas, such as healthcare and finance.
Meta-learning and AutoML
Meta-learning algorithms are designed to enable machines to learn how to learn, while AutoML aims to automate the machine learning process itself.
Machine Learning in Healthcare
Machine learning is revolutionizing the healthcare sector, from diagnostic tools to drug discovery and patient care.
AI in Finance
Algorithmic trading, risk assessment, and fraud detection are just a few applications of AI in finance, creating a wealth of research opportunities.
AI for Climate Change and Sustainability
Machine learning research is crucial in analyzing and mitigating the impacts of climate change and promoting sustainable practices.
Challenges and Future Directions
While machine learning research has made tremendous strides, it also faces several challenges:
- Data Privacy and Security: As machine learning models require vast amounts of data, protecting individual privacy and data security are paramount concerns.
- Scalability and Efficiency: Developing efficient algorithms that can handle increasingly large datasets and complex computations remains a challenge.
- Ensuring Fairness and Transparency: Addressing bias in machine learning models and making their decisions transparent is essential for equitable AI systems.
- Quantum Computing and Machine Learning: The integration of quantum computing and machine learning has the potential to revolutionize the field, but it also presents unique challenges.
- Global Collaboration in Research: Machine learning research benefits from collaboration on a global scale. Ensuring that researchers from diverse backgrounds work together is vital for progress.
Resources for Machine Learning Researchers
If you’re looking to embark on machine learning research, there are various resources at your disposal:
- Journals and Conferences
Journals such as the “Journal of Machine Learning Research” and conferences like NeurIPS and ICML provide a platform for publishing and discussing research findings.
- Online Communities and Forums
Platforms like Stack Overflow, GitHub, and dedicated forums for machine learning provide spaces for collaboration and problem-solving.
- Datasets and Tools
Open-source datasets and tools like TensorFlow and PyTorch simplify the research process by providing access to data and pre-built models.
- Research Grants and Funding Opportunities
Many organizations and government agencies offer research grants and funding for machine learning projects. Seek out these opportunities to support your research.
Machine learning research is one of the most exciting corners of technology. To be part of it, choose the right machine learning research topics and keep up with the latest trends.
Machine learning research makes our lives better. It powers everything from smart assistants to life-saving medical tools, and it is a driving force behind the future of technology and society.
There are challenges too: researchers need to collaborate and work ethically so that everyone benefits from the technology. The future of machine learning research is incredibly bright. If you want to be part of it, get ready for an exciting adventure; you can help create new solutions and make a real impact on the world.
- 06 November 2023
‘ChatGPT detector’ catches AI-generated papers with unprecedented accuracy
- McKenzie Prillaman
A machine-learning tool can easily spot when chemistry papers are written using the chatbot ChatGPT, according to a study published on 6 November in Cell Reports Physical Science [1]. The specialized classifier, which outperformed two existing artificial intelligence (AI) detectors, could help academic publishers to identify papers created by AI text generators.
doi: https://doi.org/10.1038/d41586-023-03479-4
References
1. Desaire, H., Chua, A. E., Kim, M.-G. & Hua, D. Cell Rep. Phys. Sci. https://doi.org/10.1016/j.xcrp.2023.101672 (2023).
2. Desaire, H. et al. Cell Rep. Phys. Sci. https://doi.org/10.1016/j.xcrp.2023.101426 (2023).
What Is a Transformer Model?
If you want to ride the next big wave in AI, grab a transformer.
They’re not the shape-shifting toy robots on TV or the trash-can-sized tubs on telephone poles.
So, What’s a Transformer Model?
A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence.
Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other.
First described in a 2017 paper from Google, transformers are among the newest and most powerful classes of models invented to date. They’re driving a wave of advances in machine learning that some have dubbed transformer AI.
Stanford researchers called transformers “foundation models” in an August 2021 paper because they see them driving a paradigm shift in AI. The “sheer scale and scope of foundation models over the last few years have stretched our imagination of what is possible,” they wrote.
What Can Transformer Models Do?
Transformers are translating text and speech in near real-time, opening meetings and classrooms to diverse and hearing-impaired attendees.
They’re helping researchers understand the chains of genes in DNA and amino acids in proteins in ways that can speed drug design.

Transformers can detect trends and anomalies to prevent fraud, streamline manufacturing, make online recommendations or improve healthcare.
People use transformers every time they search on Google or Microsoft Bing.
The Virtuous Cycle of Transformer AI
Any application using sequential text, image or video data is a candidate for transformer models.
That enables these models to ride a virtuous cycle in transformer AI. Created with large datasets, transformers make accurate predictions that drive their wider use, generating more data that can be used to create even better models.

“Transformers made self-supervised learning possible, and AI jumped to warp speed,” said NVIDIA founder and CEO Jensen Huang in his keynote address this week at GTC.
Transformers Replace CNNs, RNNs
Transformers are in many cases replacing convolutional and recurrent neural networks (CNNs and RNNs), the most popular types of deep learning models just five years ago.
Indeed, 70 percent of arXiv papers on AI posted in the last two years mention transformers. That’s a radical shift from a 2017 IEEE study that reported RNNs and CNNs were the most popular models for pattern recognition.
No Labels, More Performance
Before transformers arrived, users had to train neural networks with large, labeled datasets that were costly and time-consuming to produce. By finding patterns between elements mathematically, transformers eliminate that need, making available the trillions of images and petabytes of text data on the web and in corporate databases.
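To make the "no labels" point concrete, here is a toy sketch of how a masked-token objective manufactures training targets from raw text: hide a fraction of the words and ask the model to predict them from the surrounding context. This is only an illustration of the idea, not BERT's or any production system's actual masking procedure.

```python
import random

def make_masked_example(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Hide a random subset of tokens; the hidden words become the training targets."""
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = mask_token
            targets[i] = tok
    return masked, targets

sentence = "transformers learn context by tracking relationships in sequential data".split()
masked, targets = make_masked_example(sentence, mask_prob=0.3)
print(masked)    # the input the model sees, e.g. ['transformers', '[MASK]', 'context', ...]
print(targets)   # positions to reconstruct; the labels come from the text itself
```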
In addition, the math that transformers use lends itself to parallel processing, so these models can run fast.
Transformers now dominate popular performance leaderboards like SuperGLUE , a benchmark developed in 2019 for language-processing systems.
How Transformers Pay Attention
Like most neural networks, transformer models are basically large encoder/decoder blocks that process data.
Small but strategic additions to these blocks make transformers uniquely powerful.
Transformers use positional encoders to tag data elements coming in and out of the network. Attention units follow these tags, calculating a kind of algebraic map of how each element relates to the others.
Attention queries are typically executed in parallel by calculating a matrix of equations in what’s called multi-headed attention.
With these tools, computers can see the same patterns humans see.
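The positional-encoding and attention machinery just described can be sketched in a few lines of NumPy. This is a minimal single-head illustration (multi-headed attention simply runs several such maps in parallel over learned projections); the toy embeddings are random, and this is not the implementation from the 2017 paper.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings that tag each position in the sequence."""
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model)[None, :]                  # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                  # how much each element relates to the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v, weights

# Toy example: 5 tokens with 16-dimensional embeddings.
seq_len, d_model = 5, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)

# In a real transformer, Q, K and V come from learned linear projections of x;
# here we reuse x directly to keep the sketch short.
out, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))   # each row is the "algebraic map" of one token's attention to the others
```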
Self-Attention Finds Meaning
For example, in the sentence:
She poured water from the pitcher to the cup until it was full.
We know “it” refers to the cup, while in the sentence:
She poured water from the pitcher to the cup until it was empty.
We know “it” refers to the pitcher.
“Meaning is a result of relationships between things, and self-attention is a general way of learning relationships,” said Ashish Vaswani, a former senior staff research scientist at Google Brain who led work on the seminal 2017 paper.
“Machine translation was a good vehicle to validate self-attention because you needed short- and long-distance relationships among words,” said Vaswani.
“Now we see self-attention is a powerful, flexible tool for learning,” he added.
How Transformers Got Their Name
Attention is so key to transformers the Google researchers almost used the term as the name for their 2017 model. Almost.
“Attention Net didn’t sound very exciting,” said Vaswani, who started working with neural nets in 2011.
Jakob Uszkoreit, a senior software engineer on the team, came up with the name Transformer.
“I argued we were transforming representations, but that was just playing semantics,” Vaswani said.
The Birth of Transformers
In the paper for the 2017 NeurIPS conference, the Google team described their transformer and the accuracy records it set for machine translation.
Thanks to a basket of techniques, they trained their model in just 3.5 days on eight NVIDIA GPUs, a small fraction of the time and cost of training prior models. They trained it on datasets with up to a billion pairs of words.
“It was an intense three-month sprint to the paper submission date,” recalled Aidan Gomez, a Google intern in 2017 who contributed to the work.
“The night we were submitting, Ashish and I pulled an all-nighter at Google,” he said. “I caught a couple hours sleep in one of the small conference rooms, and I woke up just in time for the submission when someone coming in early to work opened the door and hit my head.”
It was a wakeup call in more ways than one.
“Ashish told me that night he was convinced this was going to be a huge deal, something game changing. I wasn’t convinced, I thought it would be a modest gain on a benchmark, but it turned out he was very right,” said Gomez, now CEO of startup Cohere that’s providing a language processing service based on transformers.
A Moment for Machine Learning
Vaswani recalls the excitement of seeing the results surpass similar work published by a Facebook team using CNNs.
“I could see this would likely be an important moment in machine learning,” he said.
A year later, another Google team tried processing text sequences both forward and backward with a transformer. That helped capture more relationships among words, improving the model’s ability to understand the meaning of a sentence.
Their Bidirectional Encoder Representations from Transformers ( BERT ) model set 11 new records and became part of the algorithm behind Google search.
Within weeks, researchers around the world were adapting BERT for use cases across many languages and industries “because text is one of the most common data types companies have,” said Anders Arpteg, a 20-year veteran of machine learning research.
Putting Transformers to Work
Soon transformer models were being adapted for science and healthcare.
DeepMind, in London, advanced the understanding of proteins, the building blocks of life, using a transformer called AlphaFold2, described in a recent Nature article . It processed amino acid chains like text strings to set a new watermark for describing how proteins fold, work that could speed drug discovery.
AstraZeneca and NVIDIA developed MegaMolBART , a transformer tailored for drug discovery. It’s a version of the pharmaceutical company’s MolBART transformer, trained on a large, unlabeled database of chemical compounds using the NVIDIA Megatron framework for building large-scale transformer models.
Reading Molecules, Medical Records
“Just as AI language models can learn the relationships between words in a sentence, our aim is that neural networks trained on molecular structure data will be able to learn the relationships between atoms in real-world molecules,” said Ola Engkvist, head of molecular AI, discovery sciences and R&D at AstraZeneca, when the work was announced last year .
Separately, the University of Florida ’s academic health center collaborated with NVIDIA researchers to create GatorTron . The transformer model aims to extract insights from massive volumes of clinical data to accelerate medical research.
Transformers Grow Up
Along the way, researchers found larger transformers performed better.
For example, researchers from the Rostlab at the Technical University of Munich, which helped pioneer work at the intersection of AI and biology, used natural-language processing to understand proteins . In 18 months, they graduated from using RNNs with 90 million parameters to transformer models with 567 million parameters.

The OpenAI lab showed bigger is better with its Generative Pretrained Transformer (GPT). The latest version, GPT-3 , has 175 billion parameters, up from 1.5 billion for GPT-2.
With the extra heft, GPT-3 can respond to a user’s query even on tasks it was not specifically trained to handle. It’s already being used by companies including Cisco, IBM and Salesforce.
Tale of a Mega Transformer
NVIDIA and Microsoft hit a high watermark in November, announcing the Megatron-Turing Natural Language Generation model ( MT-NLG ) with 530 billion parameters. It debuted along with a new framework, NVIDIA NeMo Megatron , that aims to let any business create its own billion- or trillion-parameter transformers to power custom chatbots, personal assistants and other AI applications that understand language.
MT-NLG had its public debut as the brain for TJ, the Toy Jensen avatar that gave part of the keynote at NVIDIA’s November 2021 GTC.
“When we saw TJ answer questions — the power of our work demonstrated by our CEO — that was exciting,” said Mostofa Patwary, who led the NVIDIA team that trained the model.

Creating such models is not for the faint of heart. MT-NLG was trained using hundreds of billions of data elements, a process that required thousands of GPUs running for weeks.
“Training large transformer models is expensive and time-consuming, so if you’re not successful the first or second time, projects might be canceled,” said Patwary.
Trillion-Parameter Transformers
Today, many AI engineers are working on trillion-parameter transformers and applications for them.
“We’re constantly exploring how these big models can deliver better applications. We also investigate in what aspects they fail, so we can build even better and bigger ones,” Patwary said.
To provide the computing muscle those models need, our latest accelerator — the NVIDIA H100 Tensor Core GPU — packs a Transformer Engine and supports a new FP8 format. That speeds training while preserving accuracy.
With those and other advances, “transformer model training can be reduced from weeks to days,” said Huang at GTC.
MoE Means More for Transformers
Last year, Google researchers described the Switch Transformer, one of the first trillion-parameter models. It uses AI sparsity, a complex mixture-of-experts (MoE) architecture and other advances to drive performance gains in language processing and up to 7x increases in pre-training speed.
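A mixture-of-experts layer can be illustrated with a tiny top-1 routing sketch: a gating network scores each token and only the highest-scoring expert processes it, so most of the parameters sit idle for any given token. The routing below is a simplified, hypothetical NumPy example, not the Switch Transformer's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_experts, n_tokens = 8, 4, 6

# Each "expert" is just a random linear layer in this sketch.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts))     # gating-network weights

tokens = rng.normal(size=(n_tokens, d_model))

# Top-1 routing: each token is sent only to the expert with the highest gate score.
gate_logits = tokens @ gate_w                      # (n_tokens, n_experts)
chosen = gate_logits.argmax(axis=-1)

out = np.zeros_like(tokens)
for e in range(n_experts):
    mask = chosen == e
    if mask.any():
        out[mask] = tokens[mask] @ experts[e]      # only the chosen expert runs for these tokens

print("tokens routed to each expert:", np.bincount(chosen, minlength=n_experts))
```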

For its part, Microsoft Azure worked with NVIDIA to implement an MoE transformer for its Translator service.
Tackling Transformers’ Challenges
Now some researchers aim to develop simpler transformers with fewer parameters that deliver performance similar to the largest models.
“I see promise in retrieval-based models that I’m super excited about because they could bend the curve,” said Gomez, of Cohere, noting the Retro model from DeepMind as an example.
Retrieval-based models learn by submitting queries to a database. “It’s cool because you can be choosy about what you put in that knowledge base,” he said.

The ultimate goal is to “make these models learn like humans do from context in the real world with very little data,” said Vaswani, now co-founder of a stealth AI startup.
He imagines future models that do more computation upfront, so they need less data and offer better ways for users to give them feedback.
“Our goal is to build models that will help people in their everyday lives,” he said of his new venture.
Safe, Responsible Models
Other researchers are studying ways to eliminate bias or toxicity when models amplify wrong or harmful language. For example, Stanford created the Center for Research on Foundation Models to explore these issues.
“These are important problems that need to be solved for safe deployment of models,” said Shrimai Prabhumoye, a research scientist at NVIDIA who’s among many across the industry working in the area.
“Today, most models look for certain words or phrases, but in real life these issues may come out subtly, so we have to consider the whole context,” added Prabhumoye.
“That’s a primary concern for Cohere, too,” said Gomez. “No one is going to use these models if they hurt people, so it’s table stakes to make the safest and most responsible models.”
Beyond the Horizon
Vaswani imagines a future where self-learning, attention-powered transformers approach the holy grail of AI.
“We have a chance of achieving some of the goals people talked about when they coined the term ‘general artificial intelligence’ and I find that north star very inspiring,” he said.
“We are in a time where simple methods like neural networks are giving us an explosion of new capabilities.”

Learn more about transformers on the NVIDIA Technical Blog .
Published on 16.11.2023 in Vol 25 (2023)
Risk Factors and Predictive Models for Peripherally Inserted Central Catheter Unplanned Extubation in Patients With Cancer: Prospective, Machine Learning Study
Original Paper
Authors of this article:
- Jinghui Zhang 1, 2, 3 , PhD ;
- Guiyuan Ma 1, 2 , MD ;
- Sha Peng 1, 2 , MD ;
- Jianmei Hou 1 , MD ;
- Ran Xu 1, 2 , MD ;
- Lingxia Luo 1, 2 , MD ;
- Jiaji Hu 1, 2 , MD ;
- Nian Yao 1, 2 , MD ;
- Jiaan Wang 4 , BS ;
- Xin Huang 5 , BS
1 Teaching and Research Section of Clinical Nursing, Xiangya Hospital of Central South University, Changsha, Hunan, China
2 Xiangya School of Nursing, Central South University, Changsha, Hunan, China
3 National Clinical Research Center for Geriatric Diseases, Xiangya Hospital, Central South University, Changsha, Hunan, China
4 Vascular Access Department, Hainan Provincial People's Hospital, Hainan, China
5 Department of Nursing, Affiliated Hospital of Qinghai University, Qinghai, China
Corresponding Author:
Guiyuan Ma, MD
Teaching and Research Section of Clinical Nursing
Xiangya Hospital of Central South University
Number 87 Xiangya Road, Kaifu District
Changsha, Hunan, 410008
Phone: 86 13026179120
Email: [email protected]
Background: Cancer represents a significant public health challenge, and unplanned extubation of peripherally inserted central catheters (PICC-UE) is a critical patient safety concern. Identifying independent risk factors and implementing high-quality assessment tools for early detection in high-risk populations can play a crucial role in reducing the incidence of PICC-UE among patients with cancer. Precise prevention and treatment strategies are essential to improve patient outcomes and safety in clinical settings.
Objective: This study aims to identify the independent risk factors associated with PICC-UE in patients with cancer and to construct a predictive model tailored to this group, offering a theoretical framework for anticipating and preventing PICC-UE in these patients.
Methods: Prospective data were gathered from January to December 2022, encompassing patients with cancer with PICC at Xiangya Hospital, Central South University. Each patient underwent continuous monitoring until the catheter's removal. The patients were categorized into 2 groups: the UE group (n=284) and the non-UE group (n=3107). Independent risk factors were identified through univariate analysis, the least absolute shrinkage and selection operator (LASSO) algorithm, and multivariate analysis. Subsequently, the 3391 patients were classified into a train set and a test set in a 7:3 ratio. Utilizing the identified predictors, 3 predictive models were constructed using the logistic regression, support vector machine, and random forest algorithms. The ultimate model was selected based on the receiver operating characteristic (ROC) curve and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) synthesis analysis. To further validate the model, we gathered prospective data from 600 patients with cancer at the Affiliated Hospital of Qinghai University and Hainan Provincial People's Hospital from June to December 2022. We assessed the model's performance using the area under the curve of the ROC to evaluate differentiation, the calibration curve for calibration capability, and decision curve analysis (DCA) to gauge the model's clinical applicability.
Results: Independent risk factors for PICC-UE in patients with cancer were identified, including impaired physical mobility (odds ratio [OR] 2.775, 95% CI 1.951-3.946), diabetes (OR 1.754, 95% CI 1.134-2.712), surgical history (OR 1.734, 95% CI 1.313-2.290), elevated D-dimer concentration (OR 2.376, 95% CI 1.778-3.176), targeted therapy (OR 1.441, 95% CI 1.104-1.881), surgical treatment (OR 1.543, 95% CI 1.152-2.066), and more than 1 catheter puncture (OR 1.715, 95% CI 1.121-2.624). Protective factors were normal BMI (OR 0.449, 95% CI 0.342-0.590), polyurethane catheter material (OR 0.305, 95% CI 0.228-0.408), and valved catheter (OR 0.639, 95% CI 0.480-0.851). The TOPSIS synthesis analysis results showed that in the train set, the composite index (Ci) values were 0.00 for the logistic model, 0.82 for the support vector machine model, and 0.85 for the random forest model. In the test set, the Ci values were 0.00 for the logistic model, 1.00 for the support vector machine model, and 0.81 for the random forest model. The optimal model, constructed based on the support vector machine, was obtained and validated externally. The ROC curve, calibration curve, and DCA curve demonstrated that the model exhibited excellent accuracy, stability, generalizability, and clinical applicability.
Conclusions: In summary, this study identified 10 independent risk factors for PICC-UE in patients with cancer. The predictive model developed using the support vector machine algorithm demonstrated excellent clinical applicability and was validated externally, providing valuable support for the early prediction of PICC-UE in patients with cancer.
Introduction
Peripherally inserted central catheters (PICCs) are commonly used in patients with cancer who need long-term chemotherapy and supportive care therapy [ 1 ]. PICCs can effectively minimize vascular irritation caused by chemotherapy drugs, thereby preventing extravasation and the necessity for repeated punctures [ 2 , 3 ]. However, PICCs also have their share of disadvantages. One significant issue is the occurrence of unplanned extubation (UE) during PICC placement, which can be both frequent and severe [ 4 ]. PICC-UE occurs when the catheter needs to be withdrawn prematurely due to severe complications or accidental dislodgment resulting from patient or operator factors [ 4 , 5 ]. The incidence rates for PICC-UE range from 2.5% to 40.7% [ 6 ]. The occurrence of PICC-UE poses a significant risk to patients with cancer. It not only delays chemotherapy, prolongs hospitalization, and increases the financial burden on their families but also impacts the patients’ quality of life and, in some cases, even threatens their lives [ 7 ].
Previous studies primarily focused on risk factors for PICC-related complications. These complications can be associated with a variety of factors, including (1) patient-related factors, such as critically ill bedridden patients, age, and immunity [ 8 , 9 ]; (2) operator-related factors, such as puncture times, professional skills, and the use of visualization technology [ 10 - 12 ]; (3) catheter-related factors, such as catheter material, catheter lumen, and catheter diameter [ 13 - 15 ]; and (4) treatment process–related factors, such as chemotherapy, radiotherapy, different drug types, and other aspects [ 16 - 18 ]. However, there is limited research on the risk factors for PICC-UE. Existing studies have primarily centered on accidental dislodgment of ventilator tubes [ 4 , 19 ], with insufficient attention paid to PICC-UE. Therefore, it is imperative to identify PICC-UE risk factors and develop predictive models in patients with cancer to enhance the safety of PICC usage.
To mitigate the adverse effects of PICC-UE, a promising strategy is to identify high-risk patients and offer appropriate advice for extended catheter usage. While risk prediction models for UE have been developed for intensive care unit (ICU) patients with ventilator tracheal intubation [ 20 , 21 ], there are no studies or models that can identify high-risk patients for PICC-UE. Lee et al [ 20 ] developed a risk assessment tool for evaluating UE of the endotracheal tube, while Hur et al [ 21 ] used 8 years of data to build a predictive model for UE using various machine learning (ML) algorithms. While both models exhibited high sensitivity and specificity, they were designed for predicting UE in ventilator tube cases.
ML algorithms are adept at extracting key features from complex data sets and are increasingly used in diagnosing and prognosticating various diseases [ 22 ]. In the context of PICC-related complications, previous studies have used ML techniques to assess risk [ 23 , 24 ]. Badheka et al [ 23 ] identified high-risk predictors of catheter-related thrombosis in infants under 1 year using conventional and neural network methods. Conversely, Liu et al [ 24 ] developed a predictive model for PICC-related vein thrombosis in patients with cancer using the least absolute shrinkage and selection operator (LASSO) and random forest (RF) algorithms, which exhibited impressive performance. However, as far as we know, no specific research on ML for PICC-UE in patients with cancer has been conducted yet.
This study aimed to identify PICC-UE risk factors in patients with cancer, develop and validate ML-based predictive models for PICC-UE, and promote early intervention to reduce its incidence and enhance patients’ quality of life. This study represents the first attempt to identify high-risk PICC-UE patients and serves as a valuable reference for future research and medical decision-making. We followed the Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research [ 25 ] to report our study.
Study Design and Participants
This study used data from Xiangya Hospital of Central South University to build a predictive model for PICC-UE. Prospective data were collected from various hospital systems from January 1, 2022, to December 31, 2022, including the infusion system, the in-hospital Hitech electronic case system, and the PICC catheter integrated case management system. We utilized all available data to identify independent risk factors. The entire data set was divided into a train set and a test set using a 7:3 ratio through the random number table method. The train set was used for model construction, while the test set was used for internal validation. We collected data from the Affiliated Hospital of Qinghai University and Hainan Provincial People’s Hospital to perform additional validation of the model between June 1, 2022, and December 31, 2022. The external validation data were sourced from different hospitals and were independent of the data used for model construction.
Inclusion criteria were as follows: (1) pathological diagnosis of oncology; (2) availability of PICC catheterization information; and (3) voluntary participation with informed consent. Exclusion criteria were as follows: (1) patients or caregivers unable to cooperate with the investigation; (2) patients who missed visits before catheter removal; (3) incomplete data collection; and (4) abnormal values that affect judgment.
Sample Size and Sampling
We used the sample size formula designed for cohort studies to calculate the minimum number of PICC-UE cases needed. Then, we determined the sample size required to prospectively enroll patients with cancer with PICC insertions for this study based on the PICC-UE incidence. We set α=.05 and β=.10 and obtained μα/2=1.96 and μβ=1.28.
Previous studies [ 4 , 5 ] have identified multiple risk factors for PICC-UE, and among these risk factors, thrombosis had the largest minimum sample size requirement for the case group. In the group without PICC-UE, the incidence of thrombosis was 8.9% (22/247; P0=.09), whereas in the group with PICC-UE, it was 27% (12/44; P1=.27). Hence, this study's case group (UE cases) requires a minimum sample size of 164. The incidence of PICC-UE is reported as 9% (11/121) [ 6 ]. Based on this value, the initial sample size needed for a prospective study was 2448. After accounting for the possibility of missed visits and increasing the sample size by 20%, the required sample size is at least 2937.
Instruments
The follow-up data collection schedule and clinical data collection form for this study were established through a literature review [ 4 - 19 , 23 , 24 ], semistructured interviews, and research group discussions.
The study investigators enrolled eligible participants who provided informed consent into a cancer whole-course management system. One-to-one follow-up through WeChat (Tencent Holdings Ltd.) was established, with follow-ups scheduled in advance. Patients were reminded to contact the investigators immediately in case of any catheter-related abnormalities. Collected data included observations of catheter patency; signs of redness, swelling, and pain in the extremity at the insertion site; blood and fluid leakage at the puncture site; catheter prolapse and its length; and any other abnormalities. Additionally, PICC-UE occurrences were monitored, and their time and reasons were recorded. Follow-up visits were conducted on the day of placement, as well as on days 1, 7, 14, 21, and every 21 days thereafter.
A total of 33 relevant factors were collected for data analysis, categorized as follows: (1) general information (gender, age, tumor type, education, BMI [calculated using height and weight], alcohol history, mental status, cooperation, and physical mobility); (2) medical history (history of deep vein thrombosis, history of central venous placement, diabetes, hypertension, cardiovascular disease, hyperlipidemia, and surgical history); (3) laboratory indicators (D-dimer concentration and fibrinogen concentration); (4) therapy schedule (radiotherapy treatment, targeted therapy, surgical treatment, anticoagulation, chemotherapy treatment, and hyperosmolar drugs); and (5) placement information (limb on the side of placement, puncture method, puncture times, catheter gauge, catheter lumen, catheter material, presence of a valve, high-pressure–resistant catheter, and catheter indwelling time). All variables were collected through observation using patient IDs and case numbers as the indexes. Data were obtained from the hospital’s Safe Infusion System (SIS) database and the Hitech electronic case system. Detailed explanations of the corresponding variables can be found in Multimedia Appendix 1 .
Criteria for PICC-UE, based on previous studies [ 4 , 5 ], were as follows: (1) a patient who still requires a PICC catheter, but experiences early extubation due to severe complications; and (2) a patient who still requires a PICC catheter, but experiences accidental catheter dislodgment due to patient or operator factors. PICC-UE serves as the primary outcome of this study.
Risk Factors Identification and Model Development
We reviewed the prospective data collected and categorized continuous variables, such as age, into 6 groups: “0-11,” “12-18,” “19-35,” “36-59,” “60-75,” and “≥76.” The variables height and weight were used to calculate BMI. D-dimer concentration and fibrinogen concentration values were converted into high or low categories. Missing values in the vector data were removed.
We conducted a univariate analysis of the overall data to identify variables with 2-sided statistical significance ( P <.05). Following a literature review and expert consultations, we used the LASSO regression algorithm to include clinically significant variables. The selected variables underwent multifactorial analysis to identify independent risk factors for PICC-UE in patients with cancer.
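The study ran this screening pipeline in R (univariate tests, then glmnet's 10-fold cross-validated LASSO, then multivariate logistic regression). As a loose Python analogue, the sketch below derives the engineered variables and then selects predictors with an L1-penalized logistic regression; the file name, column names, and the D-dimer cut-off are hypothetical placeholders, not the authors' actual coding scheme.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("picc_cohort.csv")            # hypothetical table, one row per patient

# Derived variables, mirroring the preprocessing described above.
age_bins = [0, 11, 18, 35, 59, 75, 150]
age_labels = ["0-11", "12-18", "19-35", "36-59", "60-75", ">=76"]
df["age_group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels, include_lowest=True)
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
df["d_dimer_high"] = (df["d_dimer_ug_L"] > 500).astype(int)   # cut-off assumed for illustration
df = df.dropna()                               # analogous to na.omit in R

# Candidate predictors (after univariate screening) and the PICC-UE outcome.
raw_cols = ["age", "height_cm", "weight_kg", "d_dimer_ug_L", "picc_ue"]
X = pd.get_dummies(df.drop(columns=raw_cols), drop_first=True)
y = df["picc_ue"].to_numpy()
X_std = StandardScaler().fit_transform(X)

# L1-penalized logistic regression with 10-fold CV, analogous to cv.glmnet(..., alpha=1).
lasso = LogisticRegressionCV(Cs=20, cv=10, penalty="l1", solver="saga",
                             scoring="roc_auc", max_iter=5000).fit(X_std, y)

keep = np.abs(lasso.coef_.ravel()) > 1e-6
print("LASSO-selected predictors:", list(X.columns[keep]))
X_sel = X_std[:, keep]                         # carried into the model-building sketch below
```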
The model was constructed using prospective data from Xiangya Hospital of Central South University. Data order was randomized using a shuffling algorithm for even distribution. The data were then split into a train set and a test set at a ratio of 7:3 using the random number table method. The overall data were used for independent risk factor screening, the train set for model construction, and the test set for internal model validation. The risk prediction models were constructed using the train set, incorporating prescreened independent risk factors. In this study, 3 ML algorithms, namely, logistic regression (LR), support vector machine (SVM), and RF, were selected to build risk prediction models for PICC-UE in patients with cancer.
We compared these models using the area under the receiver operating characteristic (ROC) curve (AUC) and the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) method [ 26 ]. AUC assesses the predictive power of the PICC-UE model, while the model’s superiority was evaluated based on the Composite Index (Ci) value in the TOPSIS method. The model with the highest AUC and Ci values was considered optimal for predicting PICC-UE and selected as the best model.
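The authors fitted and compared the three models in R; a rough scikit-learn analogue is sketched below, reusing the LASSO-selected matrix X_sel from the previous sketch, scoring each model on a held-out 30% split, and ranking them with a simple TOPSIS composite index over AUC, sensitivity, and specificity. The 0.5 probability threshold and the equal criterion weights are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score

# X_sel, y: LASSO-selected predictors and the PICC-UE outcome from the previous sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="poly", probability=True),
    "random_forest": RandomForestClassifier(n_estimators=196, random_state=42),  # 196 trees, as in the paper
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    pred = (prob >= 0.5).astype(int)
    rows.append([
        roc_auc_score(y_test, prob),                  # discrimination
        recall_score(y_test, pred),                   # sensitivity
        recall_score(y_test, pred, pos_label=0),      # specificity
    ])

def topsis_ci(matrix):
    """TOPSIS composite index Ci per row; every column is treated as a benefit criterion."""
    m = np.asarray(matrix, dtype=float)
    m = m / np.sqrt((m ** 2).sum(axis=0))             # vector-normalize each criterion
    ideal, anti = m.max(axis=0), m.min(axis=0)
    d_plus = np.sqrt(((m - ideal) ** 2).sum(axis=1))
    d_minus = np.sqrt(((m - anti) ** 2).sum(axis=1))
    return d_minus / (d_plus + d_minus + 1e-12)

ci = topsis_ci(rows)
for name, c in zip(models, ci):
    print(f"{name}: Ci = {c:.2f}")                    # higher Ci = closer to the ideal model
best_model = models[max(zip(models, ci), key=lambda t: t[1])[0]]
```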
Validation and Model Performance Evaluation
Data from June 2022 to December 2022 from Qinghai University Hospital and Hainan Provincial People’s Hospital were used for external validation. The collected data were randomized using a shuffling algorithm for even distribution. The optimal model was assessed for discrimination, calibration, and clinical applicability.
Discrimination assesses the model’s ability to distinguish between high and low PICC-UE risk in the cancer population, which we evaluated using the AUC. Calibration indicates the degree of agreement between the predicted and actual results. The calibration of the model was assessed using the Hosmer-Lemeshow test with a calibration curve [ 27 ]. Clinical applicability, which gauges the diagnostic accuracy of the model in clinical use, was evaluated using decision curve analysis (DCA) [ 21 ]. Additionally, model performance was measured using sensitivity, specificity, positive predictive value, negative predictive value [ 24 ], and AUC.
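In Python, the discrimination, calibration, and confusion-matrix metrics listed above could be computed roughly as follows with scikit-learn. The variable names are placeholders tied to the earlier sketches; the Hosmer-Lemeshow statistic and DCA curves used in the study have no one-line scikit-learn equivalent, so only a calibration-curve analogue is shown.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.calibration import calibration_curve

# X_ext, y_ext: external-validation predictors and outcomes (hypothetical names);
# best_model: the model chosen in the comparison sketch above (the SVM in this study).
prob = best_model.predict_proba(X_ext)[:, 1]
pred = (prob >= 0.5).astype(int)

auc = roc_auc_score(y_ext, prob)                       # discrimination
tn, fp, fn, tp = confusion_matrix(y_ext, pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                                   # positive predictive value
npv = tn / (tn + fn)                                   # negative predictive value

# Calibration: observed vs predicted event rates across ten probability bins.
obs_rate, mean_pred = calibration_curve(y_ext, prob, n_bins=10)

print(f"AUC={auc:.3f} sens={sensitivity:.3f} spec={specificity:.3f} "
      f"PPV={ppv:.3f} NPV={npv:.3f}")
```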
Ethical Considerations
The study was approved by the Hospital Ethics Review Committee (approval number 202204210). We adhered to the principles of informed consent, data confidentiality, anonymity, and nonharmfulness. Written informed consent was collected, and any papers or publications based on the study data will not reveal personal information about the patients. For younger or unconscious patients who were unable to participate, data collection was facilitated by their caregivers.
Statistical Analysis
We excluded data with missing or unusual variables from the prospective data set. Continuous variables were compared using independent-sample unpaired (2-sided) t tests or one-way analysis of variance (ANOVA). Categorical variables were presented as numbers and proportions and compared using the chi-square test or Fisher exact test. We collected variables with bilateral P <.05 statistical significance and then included variables with potential clinical significance for the LASSO algorithm based on literature analysis and expert consultation. We identified independent risk factors for PICC-UE in patients with cancer through multifactorial analysis. After consulting with experts in ML algorithms and discussions within the research group, we chose 3 ML methods to construct the study’s model: RF, SVM, and LR.
All hypothesis tests with 2-sided P<.05 indicated statistical significance. The “na.omit” function was used to remove missing values from the vector data. LASSO was implemented primarily with the “glmnet” package, using 10-fold cross-validation to define the penalty function. LR, RF, and SVM were implemented mainly with the “caret,” “randomForest,” “varImpPlot,” and “e1071” packages and functions. The ROC curves were plotted with the “pROC” package, the Hosmer-Lemeshow test with “hoslem.test,” and the TOPSIS integrated analysis with the “data.table” and “plyr” packages. The DCA decision curves were constructed using the “rms” and “rmda” packages. All the analyses were performed using R Statistical Software, version 4.1.3 (R Foundation).
Participants Characteristics
A total of 3391 patients were included, with a sample loss rate of 7.34% (269/3660). This included 2374 in the train set and 1017 in the test set, with 284 PICC-UE cases. The study flow diagram is presented in Figure 1 . Baseline participant characteristics are presented in Table 1 . Importantly, there was no multicollinearity among the variables, as all variance inflation factor values were less than 5.0.

a The mean catheter indwelling time for all participants is 91.22 (SD 78.88) days, for the non–PICC-UE group is 91.26 (SD 80.15) days, and for the PICC-UE group is 90.79 (SD 78.95) days (unpaired 2-tailed t test =.009; P =.92).
b PICC-UE: unplanned extubation of the peripherally inserted central catheter.
c Chi-square test.
d 2-tailed P <.05.
e Fisher exact test.
f MST: modified Seldinger technique.
Independent Risk Factor Determination
A total of 19 potential risk factors, including gender, age, and education level, were initially screened using univariate analysis. Following consultations with specialists in vascular surgery, pathology, and venous therapy, catheter lumen and central venous placement history were added. Thus, there were a total of 21 independent variables for the LASSO analysis.
In Figure 2 , each colored line represents a variable trend that decreases as the penalty factor λ changes, resulting in the model incorporating fewer variables. In Figure 3 , the dashed line on the left indicates the λ value associated with the maximum AUC and the number of features included in the model. On the right, the dashed line represents a reduction in the number of features in the model as the standard error increases by 1 to achieve the maximum AUC. The minimum error is reached at 1SE=0.013, resulting in the screening of 11 predictor variables.

The 11 predictors identified by LASSO were analyzed using conditional LR with a fixed α_in of 0.05 and α_out of 0.10, using the backward LR method. The results revealed the following independent risk factors for PICC-UE in patients with cancer (ranked by importance from high to low): impaired physical mobility (odds ratio [OR] 2.775, 95% CI 1.951-3.946), elevated D-dimer concentration (OR 2.376, 95% CI 1.778-3.176), diabetes (OR 1.754, 95% CI 1.134-2.712), surgical history (OR 1.734, 95% CI 1.313-2.290), more than 1 catheter puncture (OR 1.715, 95% CI 1.121-2.624), surgical treatment (OR 1.543, 95% CI 1.152-2.066), and targeted therapy (OR 1.441, 95% CI 1.104-1.881). Protective factors, ranked by importance from high to low, were valved catheter (OR 0.639, 95% CI 0.480-0.851), normal BMI (OR 0.449, 95% CI 0.342-0.590), and polyurethane catheter material (OR 0.305, 95% CI 0.228-0.408). Details are presented in Table 2.
Prediction Model Construction
The train set and the test set were well balanced, with no statistically significant differences in composition ( P >.05 in all cases). Further details can be found in Multimedia Appendix 2 .
The logistic predictive model was constructed using the 10 independent risk factors identified in the previous phase. The final model included 9 variables with a χ²₈ value of 320.374 and P<.001. SVM modeling was performed with 10-fold cross-validation and grid search methods, autonomously determining the optimal number of vector machines and related parameters using the tune.svm function. The polynomial kernel function demonstrated the highest prediction accuracy among the 4 kernel functions. The RF predictive model for patients with cancer was constructed with a final minimum of 196 trees.
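The SVM tuning described here (10-fold cross-validation with a grid search, settling on a polynomial kernel) was done with R's tune.svm; a comparable sketch with scikit-learn's GridSearchCV is shown below. The parameter grid is an assumed example, not the grid the authors actually searched.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X_train, y_train: train-set predictors and PICC-UE labels from the earlier sketches.
param_grid = {
    "C": [0.1, 1, 10],
    "degree": [2, 3, 4],          # polynomial-kernel degree
    "gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(
    SVC(kernel="poly", probability=True),
    param_grid,
    cv=10,                        # 10-fold cross-validation, as in the study
    scoring="roc_auc",
    n_jobs=-1,
).fit(X_train, y_train)

print("best parameters:", search.best_params_)
tuned_svm = search.best_estimator_   # candidate model for the comparison and validation steps
```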
Model Comparison and Validation
The SVM predictive model exhibited the best predictive efficacy for PICC-UE when considering AUC and Ci values together. A comparison of the ROC curves of the 3 models is presented in Figures 4 (train set) and 5 (test set). The 3 models were assessed using the TOPSIS integrated analysis in the train set and test set, as depicted in Tables 3 and 4 . The RF predictive model performed the best in the train set, and the overall performance of the models is as follows: RF model>SVM model>logistic model. However, it is worth noting that the Ci value for the SVM model was 0.82, while that for the RF model was 0.85, with only a 0.03 difference. In the test set, the TOPSIS integrated analysis revealed that the SVM predictive model had the best fit, and the models ranked in terms of overall performance as SVM model>RF model>logistic model. For a visual comparison (AUC, sensitivity, specificity, accuracy, positive predictive value, and negative predictive value) of the 3 models, please refer to Figures 6 (train set) and 7 (test set). These figures demonstrate that both the SVM model and the RF model outperform the logistic model in terms of predictive effects.

We assessed the performance of the best model through discrimination, calibration capability, and clinical applicability analysis. The AUC values evaluated the discrimination, and the SVM model demonstrated strong differentiation with an AUC of 0.718 for external validation (Figure 8). The Hosmer-Lemeshow test for goodness of fit resulted in χ²₈=8.205, P=.06, which is greater than 0.05, indicating a well-fitting model for external validation. The calibration curve for the optimal model is presented in Figure 9. The clinical applicability of this predictive model is demonstrated by the DCA curve in Figure 10.

Principal Findings
Our prospective study is a pioneering contribution to the field, being the first to develop and validate a predictive model for PICC-UE in patients with cancer that can guide decision-making without requiring extensive laboratory testing. We adhered to the Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research for model development. Our model demonstrates outstanding performance in predicting PICC-UE in patients with cancer, achieving an AUC of 0.904 in the train set and 0.875 in the test set. Importantly, we identified 10 highly correlated independent risk factors using univariate, LASSO, and multivariate analyses to build the model, with the 3 most significant risk factors being physical mobility ( P <.001), D-dimer concentration ( P <.001), and diabetes ( P =.01).
PICC-UE incidence varies, ranging from 7.5% to 22.0% in China [ 28 , 29 ] and from 2.5% to 40.7% in other countries [ 4 ]. Duwadi et al [ 30 ] noted a higher PICC-UE incidence in the ICU compared with other units, attributing it to the ICU environment and patient severity. Additionally, PICC-UE rates differed in studies from different regions [ 28 , 29 ]. In our study, the incidence of PICC-UE was 8.38% (284/3391), which is lower than in most previous studies [ 4 , 29 ]. This could be attributed to our hospital's intravenous infusion therapy committee, improved standardized nurse training, rigorous quality control management, and numerous educational sessions on patient health management. These differences in incidence may also be related to variations in inclusion criteria, follow-up methods, duration, and the sample size in our study. Future prospective studies with larger, multicenter samples and extended follow-up may be necessary for further validation.
In terms of general information, medical history, and laboratory indicators, we discovered that BMI, physical mobility, diabetes, surgical history, and D-dimer concentration were linked to the occurrence of PICC-UE. In particular, patients who are overweight (BMI>24.0 kg/m²) [ 31 ], those with reduced physical activity [ 32 ], and individuals with diabetes prone to hypercoagulation [ 32 ] were at a higher risk of catheter thrombosis. A recent surgical trauma can also stimulate the release of a significant amount of coagulation factors to aid wound healing [ 33 ], while a prolonged period of postoperative bed rest can slow blood flow, both of which increase the risk of coagulation [ 34 ]. An elevated D-dimer level is indicative of a hypercoagulation state, with a concentration exceeding 500 μg/L signifying a high risk of thrombosis [ 35 ]. Bertoglio et al [ 36 ] demonstrated that PICC catheter thrombosis is a significant risk factor for UE. Patients with a low BMI (BMI<18.5 kg/m²) have compromised immunity and are prone to malnutrition, increasing their risk of catheter-related complications and the need for catheter removal [ 37 ]. Excessive physical activity increases catheter-vessel wall friction, raising the risk of bloodstream infection and early catheter dislodgment [ 38 ].
In terms of therapy schedule and placement information, we observed that targeted therapy, surgical treatment, puncture times, catheter material, and the presence of a valve were linked to the occurrence of PICC-UE. The use of targeted drugs [ 39 ] and multiple punctures [ 40 ] can lead to vascular endothelial damage, exposing subendothelial prothrombotic components, inducing platelet aggregation, contributing to catheter thrombosis, and elevating the risk of extubation [ 41 ]. Surgical treatment leading to PICC-UE aligns with the explanation of the recent surgical history mentioned earlier. Additionally, patients recovering from postoperative anesthesia are often unconscious and may inadvertently remove the catheter due to the foreign body sensation at the catheter placement site [ 42 ]. We identified a higher risk of PICC-UE associated with silicone catheters. This is attributed to the use of new high-pressure–resistant polyurethane catheters in our hospital, which incorporate a surface-active macromolecule with fluorine atom doping. This component inhibits platelet adhesion and protein procoagulation, ultimately lowering the incidence of PICC-related thrombosis [ 13 ]. Catheter valves effectively prevent blood regurgitation, reducing both catheter-related blockages and thrombosis [ 13 ].
In this study, we developed a comprehensive predictive model to assess the risk of high-risk PICC-UE in patients with cancer. The model’s performance was evaluated using the AUC as a measure of classification efficacy, and all models in our study achieved AUC values exceeding 0.7, demonstrating their strong ability to distinguish high-risk patients. After comparing the AUC and Ci values, the SVM model emerged as the optimal choice. Calibration and DCA curves confirmed the SVM model’s accuracy, stability, generalizability, and clinical applicability.
The PICC-UE predictive model for patients with cancer developed in this study using the ML algorithm offers insights for related research. In the predictor screening process, previous studies often relied on a single statistical method [ 43 ], while our approach combined univariate analysis, 10-fold cross-validation LASSO, and multivariate screening, enhancing precision and rigor. This approach resulted in the creation of a more concise and accurate predictive model through multiple rounds of variable filtering. The LASSO method effectively aggregates features, achieves dimensionality reduction, and serves as a feature screening tool, preventing issues related to covariance and overfitting [ 44 ].
This study used multiple ML algorithms to construct the predictive model, a more scientifically rigorous approach compared with using a single method alone [ 26 ]. ML algorithms are well-suited for managing high-dimensional variables and their intricate interactions, making full use of the available data [ 22 ]. The test set demonstrated superior predictive performance in forecasting PICC-UE based on the results from the train set, significantly enhancing prediction accuracy. This study compared 3 ML models and selected the best-performing one, significantly improving the model’s accuracy. We used AUC and TOPSIS methods for a comprehensive and rigorous screening of the optimal predictive model. The SVM algorithm in the optimal model robustly encompasses the data and reduces the model’s complexity through linear regression with insensitive loss functions in a high-dimensional feature space [ 45 ]. Importantly, external validation of the model using independent data demonstrated significant predictive superiority.
Our study has successfully developed a highly predictive model for the risk of PICC-UE in patients with cancer using the SVM algorithm. This model enables the development of personalized precautions for patients with cancer at a high risk of PICC-UE, such as the regular assessment of physical mobility and the provision of targeted physical activity guidance for patients with impaired physical mobility [ 32 ]. For patients with abnormal BMI, dynamic monitoring of BMI and weight adjustment through exercise and diet should be implemented [ 32 ]. Patients with diabetes require special attention [ 32 ], with routine blood tests on admission and regular monitoring of D-dimer concentrations [ 35 ] to take preventive measures against early catheter removal. For patients with a history of surgery and those undergoing surgical treatment or targeted therapy [ 33 , 34 , 39 ], close monitoring of the catheter exit site is essential. Patients should receive instructions for regular catheter maintenance and be advised to seek medical attention if they experience any discomfort. Our study concluded that patients with multiple punctures are at a higher risk of PICC-UE. It is recommended that the medical department standardizes the qualifications of PICC placement nurses and conducts regular training and assessments [ 40 ]. Furthermore, medical departments should exercise strict control over the choice of catheter materials and the presence of valves in catheters to minimize catheter-related complications and lower the incidence of PICC-UE [ 13 ].
Limitations and Challenges
This study has some limitations. First, it did not include individual genetic data, which can be a significant factor in PICC-UE. Future studies may benefit from incorporating genetic data to improve predictive accuracy. Second, external validation was limited by a small data set, which included data from only 2 hospitals. More extensive external validation is required to thoroughly validate the predictive model. Lastly, we did not consider how the risk factors and predictive model for PICC-UE may differ among various subpopulations of patients with cancer, including different age groups, genders, and cancer stages.
Despite the limitations, our study has identified 10 independent predictors, including BMI, mobility, diabetes, surgical history, and other factors, that are significantly associated with an increased risk of PICC-UE in patients with cancer. Furthermore, our SVM predictive model has been externally validated and demonstrates excellent generalization. The optimal SVM model achieved a high accuracy of 97.68% in the train set and 98.82% in the test set, indicating excellent model fitting. The LASSO algorithm used for risk factor screening effectively prevented overfitting. Our findings can raise awareness among clinicians and patients for the early prevention and reduction of PICC-UE in high-risk cancer populations. Further prospective multicenter studies are needed to validate risk factors and establish effective UE prophylaxis interventions. Our group is in discussions with a computer company to develop a plug-in for our hospital’s electronic system. This plug-in aims to automatically capture independent risk factors for PICC-UE from patient hospitalization information. Using the optimal prediction model from this study, patients’ risk of PICC-UE is categorized into 3 levels: red (high risk), yellow (medium risk), and green (low risk). Using the color-coded cues, health care providers can implement tailored interventions for high-risk patients while offering self-monitoring guidance and health education to medium- and low-risk patients.
Conclusions
In summary, the developed predictive model for assessing the risk of PICC-UE in patients with cancer has shown excellent discrimination, high predictive accuracy, and broad applicability across a range of risk factors. This model serves as a valuable tool for the early identification of high-risk patients and holds promise for clinical implementation.
Acknowledgments
We appreciate the assistance and support of all those in charge of the selected hospitals in the data collection process, as well as the nurses who participated in the data collection for their time. We also thank all patients who participated in this study. This work was supported by the Clinical Research Fund of the National Clinical Research Center for Geriatric Disorders (grant number 2021LNJJ09), the National Natural Science Foundation of China (grant number 72174210), the Hunan Natural Science Foundation (grant number 2022JJ70168), and the Changsha Natural Science Foundation (grant number kq2208367).
Data Availability
The data sets generated or analyzed during this study are not publicly available due to the terms of consent and permission to which the participants agreed but are available from the corresponding author upon reasonable request.
Authors' Contributions
JHZ designed the study, extracted and analyzed the data, and wrote the paper as the first author. SP contributed to the analysis of the results in a statistical aspect. JMH verified the analytical methods. RX and LXL investigated and supervised the findings of this work and helped in the language edit. JJH and NY assisted in the support of clinical knowledge and reviewed the paper. JAW and XH contributed to the data collection of the external validation and reviewed the paper. GYM was in charge of the overall direction of the study as the corresponding author. All authors gave final approval of the paper for submission. We did not use generative artificial intelligence in any portion of the manuscript writing.
Conflicts of Interest
None declared.
Multimedia Appendix 1: Definitions for the factors examined in the models.
Multimedia Appendix 2: Detailed comparison of general information between the train set and the test set.
Edited by T de Azevedo Cardoso; submitted 15.05.23; peer-reviewed by L Guo, K Gupta; comments to author 10.08.23; revised version received 24.09.23; accepted 30.10.23; published 16.11.23
©Jinghui Zhang, Guiyuan Ma, Sha Peng, Jianmei Hou, Ran Xu, Lingxia Luo, Jiaji Hu, Nian Yao, Jiaan Wang, Xin Huang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.11.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Empirical Methods in Natural Language Processing (EMNLP) 2023
Apple is sponsoring the Empirical Methods in Natural Language Processing (EMNLP) conference, which will take place in person from December 6 to 10 in Singapore. EMNLP is a leading conference focused on natural language processing. Below is the schedule of Apple-sponsored workshops and events at EMNLP 2023.
Wednesday, December 6
- Widening Natural Language Processing
- 8:30 AM - 5:00 PM LT
- Computational Models of Reference, Anaphora and Coreference
- 2:00-2:10 PM
- Accepted Paper: MARRS: Multi-Modal Reference Resolution Service
Thursday, December 7
- BlackboxNLP
- Computational Approaches to Linguistic Code-Switching
- Accepted Paper: Towards Real-World Streaming Speech Translation for Code-Switched Speech
Accepted Papers
DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues
David Q. Sun, Artem Abzaliev, Hadas Kotek, Zidi Xiu, Christopher Klein
EELBERT: Tiny Models through Dynamic Embeddings
Gabrielle Cohn, Rishika Agarwal, Deepanshu Gupta, Siddharth Patwardhan
STEER: Semantic Turn Extension-Expansion Recognition for Voice Assistants
Leon Liyang Zhang, Jiarui Lu, Joel Ruben Antony Moniz, Aditya Kulkarni, Dhivya Piraviperumal, Tien Dung Tran, Nicholas Tzou, Hong Yu
Workshop Accepted Papers
MARRS: Multi-Modal Reference Resolution Service
Anand Dhoot, Andy Tseng, Ankit Samal, Dhivya Piraviperumal, Halim Cagri Ates, Hong Yu, Jiarui Lu, Joel Moniz, Melis Ozyildirim, Roman Nguyen, Shirley (Rong) Zou, Shruti Bhargava, Siddardha Maddula, Site Li, Thy Tran, Yuan Zhang
Towards Real-World Streaming Speech Translation for Code-Switched Speech
Belen Alastruey, Matthias Sperber, Christian Gollan, Dominic Telaar, Tim Ng, Aashish Agargwal
Acknowledgements
Reza Shirani, Sid Patwardhan, and Yizhe Zhang are reviewers for EMNLP 2023.
Computer Science > Machine Learning
Title: Curriculum Learning and Imitation Learning for Model-Free Control on Financial Time-Series
Abstract: Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on leveraging these ideas on control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction in improving control-task performance over complex time-series. Our ample random-seed out-sample empirics and ablation studies are highly encouraging for curriculum learning for time-series control. These findings are especially encouraging as we tune all overlapping hyperparameters on the baseline -- giving an advantage to the baseline. On the other hand, we find that imitation learning should be used with caution.
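The abstract describes the two mechanisms only at a high level: a curriculum realized through data augmentation and imitation learning realized through policy distillation from an oracle. The sketch below illustrates both ideas in generic form; the fixed linear oracle, network sizes, noise schedule, and synthetic states are placeholder assumptions rather than the paper's actual setup.

```python
# Minimal sketch of a data-augmentation curriculum combined with policy
# distillation from an oracle. The oracle, network sizes, and synthetic
# states are illustrative placeholders, not the paper's actual setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_features, n_actions, n_steps = 16, 3, 200

# Stand-in oracle: a fixed linear policy producing a target action distribution.
W_oracle = torch.randn(n_features, n_actions)
def oracle_policy(states: torch.Tensor) -> torch.Tensor:
    return torch.softmax(states @ W_oracle, dim=-1)

student = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                        nn.Linear(64, n_actions))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(n_steps):
    states = torch.randn(128, n_features)          # synthetic market states
    # Curriculum via data augmentation: input noise grows as training
    # progresses, so early batches are easier than later ones.
    noise_scale = 0.5 * step / n_steps
    augmented = states + noise_scale * torch.randn_like(states)
    with torch.no_grad():
        teacher_probs = oracle_policy(states)      # oracle sees clean states
    student_log_probs = F.log_softmax(student(augmented), dim=-1)
    # Distillation loss: KL(teacher || student) averaged over the batch.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The augmentation noise grows over training so that the student sees progressively harder inputs, while the KL term pulls its action distribution toward the oracle's; the paper's concrete curriculum and oracle construction may differ.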