World Conference on Information Systems and Technologies

WorldCIST 2021: Trends and Applications in Information Systems and Technologies, pp. 13–22

Five Hundred Most-Cited Papers in the Computer Sciences: Trends, Relationships and Common Factors

  • Phoey Lee Teh, ORCID: orcid.org/0000-0002-7787-1299
  • Peter Heard, ORCID: orcid.org/0000-0002-5135-7822
  • Conference paper
  • First Online: 29 March 2021

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1366)

This study reveals common factors among highly cited papers in the computer sciences. The 500 most-cited papers in the computer sciences published between January 2013 and December 2017 were downloaded from the Web of Science (WoS). Data on the number of citations, number of authors, article length and subject sub-discipline were extracted and analyzed in order to identify trends, relationships and common features. Correlations between common factors were analyzed. The 500 papers were cited a total of 10,926 times; the average number of citations per paper was 21.82. A correlation was found between author credibility (defined in terms of the QS University Ranking of the first named author's affiliation) and the number of citations: authors from universities ranked 350 or higher were cited more than those from lower ranked universities. Relationships were also found between journal ranking and both the number of authors and the article length: articles in higher ranked journals tended to have more authors but were shorter. Article length was also found to correlate with the number of authors and with the QS Subject Ranking of the first author's affiliation. The proportion of articles in higher ranked journals (journal quartile), the length of articles and the number of citations per page were all found to correlate with the sub-discipline area (Information Systems; Software Engineering; Artificial Intelligence; Interdisciplinary Applications; and Theory and Methods).
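
The core of the study is a set of pairwise correlations over bibliometric records. The sketch below illustrates that style of computation in Python with pandas and SciPy; the input file and every column name (citations, num_authors, pages, qs_rank) are hypothetical stand-ins for a Web of Science export, not the authors' actual dataset or pipeline, and Spearman's rank correlation is one plausible choice since the abstract does not name a statistic.

    # Illustrative sketch of the correlation study described in the abstract.
    # The CSV file and column names are assumptions, not the authors' data.
    import pandas as pd
    from scipy.stats import spearmanr

    df = pd.read_csv("wos_top500.csv")  # hypothetical WoS export, one row per paper

    # Headline statistic: average citations per paper across the 500 records.
    print("mean citations per paper:", df["citations"].mean())

    # Rank correlations between citation counts and candidate factors.
    for factor in ["num_authors", "pages", "qs_rank"]:
        rho, p = spearmanr(df["citations"], df[factor])
        print(f"citations vs {factor}: rho={rho:.2f} (p={p:.3g})")

    # Split on the QS-rank threshold mentioned in the abstract (350);
    # a smaller rank number means a higher-ranked university.
    top = df.loc[df["qs_rank"] <= 350, "citations"].mean()
    rest = df.loc[df["qs_rank"] > 350, "citations"].mean()
    print(f"mean citations, ranked 350 or higher vs lower: {top:.1f} vs {rest:.1f}")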

  • Data search
  • Knowledge discovery
  • Common factors

Author information

Authors and Affiliations

Department of Computing and Information Systems, School of Science and Technology, Sunway University, 47500, Sunway City, Malaysia

Phoey Lee Teh

Provost Office, Sunway University, 47500, Sunway City, Malaysia

Peter Heard

Corresponding author

Correspondence to Phoey Lee Teh.

Editor information

Editors and Affiliations

ISEG, University of Lisbon, Lisbon, Portugal

Álvaro Rocha

College of Engineering, The Ohio State University, Columbus, OH, USA

Hojjat Adeli

Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania

Gintautas Dzemyda

DCT, Universidade Portucalense, Porto, Portugal

Fernando Moreira

Department of Information Sciences, University of Sheffield, Sheffield, UK

Ana Maria Ramalho Correia

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Teh, P.L., Heard, P. (2021). Five Hundred Most-Cited Papers in the Computer Sciences: Trends, Relationships and Common Factors. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1366. Springer, Cham. https://doi.org/10.1007/978-3-030-72651-5_2

DOI: https://doi.org/10.1007/978-3-030-72651-5_2

Published: 29 March 2021

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-72650-8

Online ISBN: 978-3-030-72651-5

eBook Packages: Intelligent Technologies and Robotics (R0)

IEEE Computer Society 2022 Report

Tech Trends: What Will Be the Biggest Innovations by 2022?

Predicting the future is hard and risky. Predicting the future in the computing industry is even harder and riskier due to dramatic changes in technology and limitless challenges to innovation. Only a small fraction of innovations truly disrupt the state of the art. Some are not practical or cost-effective, some are ahead of their time, and some simply do not have a market. There are numerous examples of superior technologies that were never adopted because others arrived on time or fared better in the market. Therefore, this document is only an attempt to better understand where technologies are going. The book The Innovator’s Dilemma and its sequels best describe the process of innovation and disruption.

In 2014, a team of technical leaders from the IEEE Computer Society joined forces to write a technical report, entitled IEEE CS 2022, surveying 23 technologies that could potentially change the landscape of computer science and industry by the year 2022. In particular, this report focused on 3D printing, big data and analytics, the open intellectual property movement, massively online open courses, security cross-cutting issues, universal memory, 3D integrated circuits, photonics, cloud computing, computational biology and bioinformatics, device and nanotechnology, sustainability, high-performance computing, the Internet of Things, life sciences, machine learning and intelligent systems, natural user interfaces, networking and interconnectivity, quantum computing, software-defined networks, multicore, computer vision and pattern recognition, and robotics for medical care.

What Is the Future of Tech: 23 Technologies by 2022

  • Security Cross-Cutting Issues The growth of large data repositories and emergence of data analytics have combined with intrusions by bad actors, governments, and corporations to open a Pandora’s box of issues. How can we balance security and privacy in this environment?
  • Open Intellectual Property Movement From open source software and standards to open-access publishing, the open IP movement is upon us. What are the implications?
  • Sustainability Can electronic cars, LED lighting, new types of batteries and chips, and increasing use of renewables combat rising energy use and an explosion in the uptake of computing?
  • Massively Online Open Courses MOOCs have the potential to transform the higher-education landscape, siphoning students from traditional universities and altering faculty and student roles. How significant will their impact be?
  • Quantum Computing Constrained only by the laws of physics, quantum computing will potentially extend Moore’s Law into the next decade. As commercial quantum computing comes within reach, new breakthroughs are occurring at an accelerating pace.
  • Device and Nanotechnology It is clear that MEMS devices, nanoparticles, and their use in applications are here to stay. Nanotechnology has already been useful in manufacturing sunscreen, tires, and medical devices that can be swallowed.
  • 3D Integrated Circuits The transition from printed circuit boards to 3D-ICs is already underway in the mobile arena, and will eventually spread across the entire spectrum of IT products.
  • Universal Memory Universal memory replacements for DRAM will cause a tectonic shift in architectures and software.
  • Multicore By 2022, multicore will be everywhere, from wearable systems and smartphones to cameras, games, automobiles, cloud servers, and exascale supercomputers.
  • Photonics Silicon photonics will be a fundamental technology to address the bandwidth, latency, and energy challenges in the fabric of high-end systems.
  • Networking and Interconnectivity Developments at all levels of the network stack will continue to drive research and the Internet economy.
  • Software-Defined Networks OpenFlow and SDN will make networks more secure, transparent, flexible, and functional.
  • High-Performance Computing While some governments are focused on reaching exascale, some researchers are intent on moving HPC to the cloud.
  • Cloud Computing By 2022, cloud will be more entrenched, and more computing workloads will run on the cloud.
  • The Internet of Things From clothes that monitor our movements to smart homes and cities, the Internet of Things knows no bounds, except for our concerns about ensuring privacy amid such convenience.
  • Natural User Interfaces The long-held dreams of computers that can interface with us through touch, gesture, and speech are finally coming true, with more radical interfaces on the horizon.
  • 3D Printing 3D printing promises a revolution in fabrication, with many opportunities to produce designs that would have been prohibitively expensive.
  • Big Data and Analytics The growing availability of data and demand for its insights holds great potential to improve many data-driven decisions.
  • Machine Learning and Intelligent Systems Machine learning plays an increasingly important role in our lives, whether it’s ranking search results, recommending products, or building better models of the environment.
  • Computer Vision and Pattern Recognition Unlocking information in pictures and videos has had a major impact on consumers and more significant advances are in the pipeline.
  • Life Sciences Technology has been pivotal in improving human and animal health and addressing threats to the environment.
  • Computational Biology and Bioinformatics Vast amounts of data are enabling the improvement of human health and unraveling of the mysteries of life.
  • Medical Robotics From autonomous delivery of hospital supplies to telemedicine and advanced prostheses, medical robotics has led to many life-saving innovations.

What Are the Drivers and Disruptors Behind Tech Innovations

The 2022 Report team surveyed several thousand IEEE members about the forces behind the technology changes. Desire for sustainable energy, the availability of wireless/broadband connectivity, and use of technology for medical procedures ranked highest as drivers, while 3D printing, the use of robots for labor, and cloud computing were ranked most highly as major disruptors.

  • Increases in average life expectancy
  • Increasing ratio of retirees to workers
  • Public concern over control over access/amount of personal information
  • Desire for sustainable energy sources
  • Reduction in availability of grants and philanthropic resources
  • Widening economic inequality worldwide
  • Reduced job security in a global market economy
  • Climate change
  • Global terrorism
  • Use of big data and analytics
  • Reduction in cost of data collection and retention (for use in analytics)
  • Quickening pace of knowledge transfer
  • Long-term availability of certain energy sources
  • Alternative distribution chains (such as manufacturers selling directly to consumers)
  • Use of technology for medical procedures
  • Wireless/broadband connectivity
  • Crowdsourcing/open-sourcing of hardware development
  • Changes in educational structure/design (e.g. MOOCs)
  • Virtual/alternative currencies (such as Bitcoin)
  • Smartphone use as a device for payment
  • Cloud computing
  • Use of robots as a source of labor
  • Nonvolatile memory influencing big data accessibility and portability
  • Quantum/nondeterministic computing
  • Use of 3D printing
  • Green computing
  • New user interfaces (e.g. Siri, Kinect, instead of traditional keyboards)

How Seamless Intelligence Will Drive 23 Tech Innovations by 2022

Computing devices, from wearables and chips embedded under the skin, to the computers inside our mobile devices, laptops, desktops, home servers, TV sets, and refrigerators, to the computing cloud that we reach via the Internet, will together form an intelligent mesh: a computing and communication ecosystem that augments reality with information and intelligence gathered from our fingertips, eyes, ears, and other senses, and even interfaced directly to our brain waves.

At the heart of this revolution is seamless networking, with transparent and uninterrupted transitions between devices made possible by Near-Field Communication, Bluetooth, and Wi-Fi, as well as by intelligent coordination software, standardized identity technologies, and cloud-based APIs.

The combination of powerful voice and facial recognition, massive identity databases, and powerful tracking will likely result in a new norm that potentially translates into a significant loss of privacy compared to today.

Who Are the Authors of IEEE Computer Society’s 2022 Tech Forecast

This document was a team effort, spearheaded by a core team of authors who formulated the overall text and process. This team, organized by Dejan Milojicic, met twice in face-to-face meetings and had a few phone conferences. In addition, other people contributed to various parts of the document; the rest of this section lists all contributors.

The Core Team of Authors

The core team of authors included Hasan Alkhatib, Paolo Faraboschi, Eitan Frachtenberg, Hironori Kasahara, Danny Lange, Phil Laplante, Arif Merchant, Dejan Milojicic, and Karsten Schwan.

Major Contributors of Individual Sections

In addition to the core team, a few individuals contributed substantial parts of the document.

These valuable contributors include Mohammed AlQaraishi, Angela Burgess, Hiroyasu Iwata, Rick McGeer, and John Walz.

10 Research Papers Accepted to CVPR 2023

Research from the department has been accepted to the 2023 Computer Vision and Pattern Recognition (CVPR) Conference. The annual event explores machine learning, artificial intelligence, and computer vision research and its applications.

CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation, by Samir Yitzhak Gadre (Columbia University), Mitchell Wortsman (University of Washington), Gabriel Ilharco (University of Washington), Ludwig Schmidt (University of Washington), and Shuran Song (Columbia University)

For robots to be generally useful, they must be able to find arbitrary objects described by people (i.e., be language-driven) even without expensive navigation training on in-domain data (i.e., perform zero-shot inference). We explore these capabilities in a unified setting: language-driven zero-shot object navigation (L-ZSON). Inspired by the recent success of open-vocabulary models for image classification, we investigate a straightforward framework, CLIP on Wheels (CoW), to adapt open-vocabulary models to this task without fine-tuning. To better evaluate L-ZSON, we introduce the Pasture benchmark, which considers finding uncommon objects, objects described by spatial and appearance attributes, and hidden objects described relative to visible objects. We conduct an in-depth empirical study by directly deploying 21 CoW baselines across Habitat, RoboTHOR, and Pasture. In total, we evaluate over 90k navigation episodes and find that (1) CoW baselines often struggle to leverage language descriptions, but are proficient at finding uncommon objects. (2) A simple CoW, with CLIP-based object localization and classical exploration — and no additional training — matches the navigation efficiency of a state-of-the-art ZSON method trained for 500M steps on Habitat MP3D data. This same CoW provides a 15.6 percentage point improvement in success over a state-of-the-art RoboTHOR ZSON model.
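The scoring primitive behind a CoW baseline is straightforward to prototype: CLIP compares each egocentric frame against the language goal, and an exploration policy decides where to move next. Below is a minimal sketch of that zero-shot image-text scoring step, assuming the open-source `clip` package; the paper's gradient- and patch-based localizers and its exploration policies are considerably more involved, and `goal_score` is an illustrative name, not the authors' API.

```python
# Minimal sketch: score an egocentric frame against a language goal with CLIP.
# Assumes the open-source `clip` package (github.com/openai/CLIP); the CoW
# baselines in the paper add localization and exploration on top of this.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def goal_score(frame: Image.Image, goal: str) -> float:
    """Cosine similarity between a frame and a text goal, e.g. 'a red mug'."""
    image = preprocess(frame).unsqueeze(0).to(device)
    text = clip.tokenize([goal]).to(device)
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()

# A navigation loop would explore frontiers and declare success once the
# score clears a confidence threshold (a hypothetical tuning parameter).
```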

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval, by Xudong Lin (Columbia University), Simran Tiwari (Columbia University), Shiyuan Huang (Columbia University), Manling Li (UIUC), Mike Zheng Shou (National University of Singapore), Heng Ji (UIUC), and Shih-Fu Chang (Columbia University)

Multi-channel video-language retrieval requires models to understand information from different channels (e.g., video+question, video+speech) to correctly link a video with a textual response or query. Fortunately, contrastive multimodal models have been shown to be highly effective at aligning entities in images/videos and text (e.g., CLIP), and text contrastive models have been extensively studied recently for their strong ability to produce discriminative sentence embeddings (e.g., SimCSE). However, there is no clear way to quickly adapt these two lines of work to multi-channel video-language retrieval with limited data and resources. In this paper, we identify a principled model design space with two axes: how to represent videos and how to fuse video and text information. Based on a categorization of recent methods, we investigate the options of representing videos using continuous feature vectors or discrete text tokens; for the fusion method, we explore the use of a multimodal transformer or a pretrained contrastive text model. We extensively evaluate the four combinations on five video-language datasets. We surprisingly find that discrete text tokens coupled with a pretrained contrastive text model yield the best performance, which can even outperform the state of the art on the iVQA and How2QA datasets without additional training on millions of video-text pairs. Further analysis shows that this is because representing videos as text tokens captures the key visual information, and text tokens are naturally aligned with text models that are strong retrievers after the contrastive pretraining process. All the empirical analysis establishes a solid foundation for future research on affordable and upgradable multimodal intelligence.
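The winning combination reported above, videos rendered as discrete text tokens plus a pretrained contrastive text model, is easy to sketch. The snippet below is a rough illustration only: it assumes a public SimCSE checkpoint on Hugging Face and that captions/ASR for the video have already been extracted; it is not the authors' implementation.

```python
# Rough sketch of the paper's best recipe: represent a video by discrete text
# tokens (e.g., frame captions plus ASR) and embed both sides with a
# contrastive text model such as SimCSE, then rank by cosine similarity.
import torch
from transformers import AutoModel, AutoTokenizer

name = "princeton-nlp/sup-simcse-roberta-base"  # public SimCSE checkpoint
tok = AutoTokenizer.from_pretrained(name)
enc = AutoModel.from_pretrained(name)

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch)
    cls = out.last_hidden_state[:, 0]  # [CLS] pooling, as in SimCSE
    return torch.nn.functional.normalize(cls, dim=-1)

video_as_text = ["a man chops onions. a pan sizzles on a stove."]  # assumed captions
queries = ["how to cook onions", "a soccer match highlight"]
print(embed(queries) @ embed(video_as_text).T)  # higher score = better match
```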

DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection, by Jiawei Ma (Columbia University), Yulei Niu (Columbia University), Jincheng Xu (Columbia University), Shiyuan Huang (Columbia University), Guangxing Han (Columbia University), and Shih-Fu Chang (Columbia University)

Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data. Existing approaches enhance few-shot generalization at the expense of base-class performance, or maintain high precision in base-class detection with limited improvement in novel-class adaptation. In this paper, we point out that the reason is insufficient Discriminative feature learning for all of the classes. As such, we propose a new training framework, DiGeo, to learn Geometry-aware features of inter-class separation and intra-class compactness. To guide the separation of feature clusters, we derive an offline simplex equiangular tight frame (ETF) classifier whose weights serve as class centers and are maximally and equally separated. To tighten the cluster for each class, we include adaptive class-specific margins in the classification loss and encourage the features to be close to the class centers. Experimental studies on two few-shot benchmark datasets (VOC, COCO) and one long-tail dataset (LVIS) demonstrate that, with a single model, our method can effectively improve generalization on novel classes without hurting the detection of base classes.
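The offline ETF classifier mentioned in the abstract has a standard closed-form construction in the neural-collapse literature: K unit-norm class centers whose pairwise cosine similarity is exactly -1/(K-1), the most mutually separated arrangement possible. A small NumPy sketch follows; whether it matches DiGeo's exact parameterization is an assumption.

```python
# Sketch of a simplex equiangular tight frame (ETF): class centers that are
# maximally and equally separated, as used for DiGeo's offline classifier.
# Standard construction from the neural-collapse literature (assumed here).
import numpy as np

def simplex_etf(num_classes: int, feat_dim: int) -> np.ndarray:
    """Return (feat_dim, num_classes) unit-norm class centers with pairwise
    cosine similarity -1/(num_classes - 1)."""
    assert feat_dim >= num_classes
    u, _ = np.linalg.qr(np.random.randn(feat_dim, num_classes))  # orthonormal
    k = num_classes
    # Columns come out unit-norm by construction.
    return np.sqrt(k / (k - 1)) * u @ (np.eye(k) - np.ones((k, k)) / k)

W = simplex_etf(num_classes=5, feat_dim=128)
print(np.round(W.T @ W, 3))  # 1.0 on the diagonal, -0.25 elsewhere for K=5
```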

Supervised Masked Knowledge Distillation for Few-Shot Transformers, by Han Lin (Columbia University), Guangxing Han (Columbia University), Jiawei Ma (Columbia University), Shiyuan Huang (Columbia University), Xudong Lin (Columbia University), and Shih-Fu Chang (Columbia University)

Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features. However, under few-shot learning (FSL) settings on small datasets with only a few labeled samples, ViTs tend to overfit and suffer severe performance degradation due to the absence of CNN-like inductive biases. Previous works in FSL avoid this problem either through the help of self-supervised auxiliary losses, or through dexterous use of label information under supervised settings. But the gap between self-supervised and supervised few-shot Transformers is still unfilled. Inspired by recent advances in self-supervised knowledge distillation and masked image modeling (MIM), we propose a novel Supervised Masked Knowledge Distillation model (SMKD) for few-shot Transformers which incorporates label information into self-distillation frameworks. Compared with previous self-supervised methods, we allow intra-class knowledge distillation on both class and patch tokens, and introduce the challenging task of masked patch-token reconstruction across intra-class images. Experimental results on four few-shot classification benchmark datasets show that our method with simple design outperforms previous methods by a large margin and achieves a new state-of-the-art. Detailed ablation studies confirm the effectiveness of each component of our model. Code for this paper is available here: this https URL.
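A toy sketch of the masked-token distillation idea at the core of SMKD: hide a random subset of patch tokens and train a student to match a teacher's embeddings at the hidden positions. Plain linear layers stand in for the real transformers, and the class-token and cross-image (intra-class) pathways from the paper are omitted.

```python
# Toy masked-token distillation: the student sees masked patch tokens and is
# trained to match the teacher's embeddings at the masked positions only.
# Linear layers are stand-ins for the paper's ViT student/teacher.
import torch
import torch.nn as nn

B, N, D = 8, 196, 384                        # batch, patch tokens, embed dim
student, teacher = nn.Linear(D, D), nn.Linear(D, D)
tokens = torch.randn(B, N, D)                # patch embeddings from a ViT

mask = torch.rand(B, N) < 0.4                # hide ~40% of the patch tokens
masked = tokens.masked_fill(mask.unsqueeze(-1), 0.0)

with torch.no_grad():
    target = teacher(tokens)                 # teacher sees the full input
pred = student(masked)                       # student sees the masked input
loss = ((pred - target)[mask] ** 2).mean()   # distill at masked positions
loss.backward()
```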

FLEX: Full-Body Grasping Without Full-Body Grasps, by Purva Tendulkar (Columbia University), Dídac Surís (Columbia University), and Carl Vondrick (Columbia University)

Synthesizing 3D human avatars interacting realistically with a scene is an important problem with applications in AR/VR, video games and robotics. Towards this goal, we address the task of generating a virtual human — hands and full body — grasping everyday objects. Existing methods approach this problem by collecting a 3D dataset of humans interacting with objects and training on this data. However, 1) these methods do not generalize to different object positions and orientations, or to the presence of furniture in the scene, and 2) the diversity of their generated full-body poses is very limited. In this work, we address all the above challenges to generate realistic, diverse full-body grasps in everyday scenes without requiring any 3D full-body grasping data. Our key insight is to leverage the existence of both full-body pose and hand grasping priors, composing them using 3D geometrical constraints to obtain full-body grasps. We empirically validate that these constraints can generate a variety of feasible human grasps that are superior to baselines both quantitatively and qualitatively. See our webpage for more details: this https URL.

Humans As Light Bulbs: 3D Human Reconstruction From Thermal Reflection, by Ruoshi Liu (Columbia University) and Carl Vondrick (Columbia University)

The relatively hot temperature of the human body causes people to turn into long-wave infrared light sources. Since this emitted light has a larger wavelength than visible light, many surfaces in typical scenes act as infrared mirrors with strong specular reflections. We exploit the thermal reflections of a person onto objects in order to locate their position and reconstruct their pose, even if they are not visible to a normal camera. We propose an analysis-by-synthesis framework that jointly models the objects, people, and their thermal reflections, which combines generative models with differentiable rendering of reflections. Quantitative and qualitative experiments show our approach works in highly challenging cases, such as with curved mirrors or when the person is completely unseen by a normal camera.

Tracking Through Containers and Occluders in the Wild, by Basile Van Hoorick (Columbia University), Pavel Tokmakov (Toyota Research Institute), Simon Stent (Woven Planet), Jie Li (Toyota Research Institute), and Carl Vondrick (Columbia University)

Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is, given a video sequence, to segment both the projected extent of the target object and the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.

Doubly Right Object Recognition: A Why Prompt for Visual Rationales, by Chengzhi Mao (Columbia University), Revant Teotia (Columbia University), Amrutha Sundar (Columbia University), Sachit Menon (Columbia University), Junfeng Yang (Columbia University), Xin Wang (Microsoft Research), and Carl Vondrick (Columbia University)

Many visual recognition models are evaluated only on their classification accuracy, a metric for which they obtain strong performance. In this paper, we investigate whether computer vision models can also provide correct rationales for their predictions. We propose a “doubly right” object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales. We find that state-of-the-art visual models, such as CLIP, often provide incorrect rationales for their categorical predictions. However, by transferring the rationales from language models into visual representations through a tailored dataset, we show that we can learn a “why prompt,” which adapts large visual representations to produce correct rationales. Visualizations and empirical experiments show that our prompts significantly improve performance on doubly right object recognition, in addition to zero-shot transfer to unseen tasks and datasets.

What You Can Reconstruct From a Shadow, by Ruoshi Liu (Columbia University), Sachit Menon (Columbia University), Chengzhi Mao (Columbia University), Dennis Park (Toyota Research Institute), Simon Stent (Woven Planet), and Carl Vondrick (Columbia University)

3D reconstruction is a fundamental problem in computer vision, and the task is especially challenging when the object to reconstruct is partially or fully occluded. We introduce a method that uses the shadows cast by an unobserved object in order to infer the possible 3D volumes under occlusion. We create a differentiable image formation model that allows us to jointly infer the 3D shape of an object, its pose, and the position of a light source. Since the approach is end-to-end differentiable, we are able to integrate learned priors of object geometry in order to generate realistic 3D shapes of different object categories. Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow. Our approach works even when the position of the light source and object pose are both unknown. Our approach is also robust to real-world images where the ground-truth shadow mask is unknown.

CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural Language, by Aditya Sanghi (Autodesk Research), Rao Fu (Brown University), Vivian Liu (Columbia University), Karl D.D. Willis (Autodesk Research), Hooman Shayani (Autodesk Research), Amir H. Khasahmadi (Autodesk Research), Srinath Sridhar (Brown University), and Daniel Ritchie (Brown University)

Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this in a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP’s image-text embedding space. We also present a novel variant of classifier-free guidance, which improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines.


How Is Technology Changing the World, and How Should the World Change Technology?

Josephine Wolff; How Is Technology Changing the World, and How Should the World Change Technology? Global Perspectives 1 February 2021; 2 (1): 27353. doi: https://doi.org/10.1525/gp.2021.27353


Technologies are becoming increasingly complicated and increasingly interconnected. Cars, airplanes, medical devices, financial transactions, and electricity systems all rely on more computer software than they ever have before, making them seem both harder to understand and, in some cases, harder to control. Government and corporate surveillance of individuals and information processing relies largely on digital technologies and artificial intelligence, and therefore involves less human-to-human contact than ever before and more opportunities for biases to be embedded and codified in our technological systems in ways we may not even be able to identify or recognize. Bioengineering advances are opening up new terrain for challenging philosophical, political, and economic questions regarding human-natural relations. Additionally, the management of these large and small devices and systems is increasingly done through the cloud, so that control over them is both very remote and removed from direct human or social control. The study of how to make technologies like artificial intelligence or the Internet of Things “explainable” has become its own area of research because it is so difficult to understand how they work or what is at fault when something goes wrong (Gunning and Aha 2019) .

This growing complexity makes it more difficult than ever—and more imperative than ever—for scholars to probe how technological advancements are altering life around the world in both positive and negative ways and what social, political, and legal tools are needed to help shape the development and design of technology in beneficial directions. This can seem like an impossible task in light of the rapid pace of technological change and the sense that its continued advancement is inevitable, but many countries around the world are only just beginning to take significant steps toward regulating computer technologies and are still in the process of radically rethinking the rules governing global data flows and exchange of technology across borders.

These are exciting times not just for technological development but also for technology policy—our technologies may be more advanced and complicated than ever but so, too, are our understandings of how they can best be leveraged, protected, and even constrained. The structures of technological systems are determined largely by government and institutional policies, and those structures have tremendous implications for social organization and agency, ranging from open-source, open systems that are highly distributed and decentralized to those that are tightly controlled and closed, structured according to stricter and more hierarchical models. And just as our understanding of the governance of technology is developing in new and interesting ways, so, too, is our understanding of the social, cultural, environmental, and political dimensions of emerging technologies. We are realizing both the challenges and the importance of mapping out the full range of ways that technology is changing our society, what we want those changes to look like, and what tools we have to try to influence and guide those shifts.

Technology can be a source of tremendous optimism. It can help overcome some of the greatest challenges our society faces, including climate change, famine, and disease. For those who believe in the power of innovation and the promise of creative destruction to advance economic development and lead to better quality of life, technology is a vital economic driver (Schumpeter 1942) . But it can also be a tool of tremendous fear and oppression, embedding biases in automated decision-making processes and information-processing algorithms, exacerbating economic and social inequalities within and between countries to a staggering degree, or creating new weapons and avenues for attack unlike any we have had to face in the past. Scholars have even contended that the emergence of the term technology in the nineteenth and twentieth centuries marked a shift from viewing individual pieces of machinery as a means to achieving political and social progress to the more dangerous, or hazardous, view that larger-scale, more complex technological systems were a semiautonomous form of progress in and of themselves (Marx 2010) . More recently, technologists have sharply criticized what they view as a wave of new Luddites, people intent on slowing the development of technology and turning back the clock on innovation as a means of mitigating the societal impacts of technological change (Marlowe 1970) .

At the heart of fights over new technologies and their resulting global changes are often two conflicting visions of technology: a fundamentally optimistic one that believes humans use it as a tool to achieve greater goals, and a fundamentally pessimistic one that holds that technological systems have reached a point beyond our control. Technology philosophers have argued that neither of these views is wholly accurate and that a purely optimistic or pessimistic view of technology is insufficient to capture the nuances and complexity of our relationship to technology (Oberdiek and Tiles 1995) . Understanding technology and how we can make better decisions about designing, deploying, and refining it requires capturing that nuance and complexity through in-depth analysis of the impacts of different technological advancements and the ways they have played out in all their complicated and controversial messiness across the world.

These impacts are often unpredictable as technologies are adopted in new contexts and come to be used in ways that sometimes diverge significantly from the use cases envisioned by their designers. The internet, designed to help transmit information between computer networks, became a crucial vehicle for commerce, introducing unexpected avenues for crime and financial fraud. Social media platforms like Facebook and Twitter, designed to connect friends and families through sharing photographs and life updates, became focal points of election controversies and political influence. Cryptocurrencies, originally intended as a means of decentralized digital cash, have become a significant environmental hazard as more and more computing resources are devoted to mining these forms of virtual money. One of the crucial challenges in this area is therefore recognizing, documenting, and even anticipating some of these unexpected consequences and providing mechanisms to technologists for how to think through the impacts of their work, as well as possible other paths to different outcomes (Verbeek 2006) . And just as technological innovations can cause unexpected harm, they can also bring about extraordinary benefits—new vaccines and medicines to address global pandemics and save thousands of lives, new sources of energy that can drastically reduce emissions and help combat climate change, new modes of education that can reach people who would otherwise have no access to schooling. Regulating technology therefore requires a careful balance of mitigating risks without overly restricting potentially beneficial innovations.

Nations around the world have taken very different approaches to governing emerging technologies and have adopted a range of different technologies themselves in pursuit of more modern governance structures and processes (Braman 2009). In Europe, the precautionary principle has guided much more anticipatory regulation aimed at addressing the risks presented by technologies even before they are fully realized. For instance, the European Union’s General Data Protection Regulation focuses on the responsibilities of data controllers and processors to provide individuals with access to their data and information about how that data is being used not just as a means of addressing existing security and privacy threats, such as data breaches, but also to protect against future developments and uses of that data for artificial intelligence and automated decision-making purposes. In Germany, Technische Überwachungsvereine, or TÜVs, perform regular tests and inspections of technological systems to assess and minimize risks over time, as the tech landscape evolves. In the United States, by contrast, there is much greater reliance on litigation and liability regimes to address safety and security failings after the fact. These different approaches reflect not just the different legal and regulatory mechanisms and philosophies of different nations but also the different ways those nations prioritize rapid development of the technology industry versus safety, security, and individual control. Typically, governance innovations move much more slowly than technological innovations, and regulations can lag years, or even decades, behind the technologies they aim to govern.

In addition to this varied set of national regulatory approaches, a variety of international and nongovernmental organizations also contribute to the process of developing standards, rules, and norms for new technologies, including the International Organization for Standardization and the International Telecommunication Union. These multilateral and NGO actors play an especially important role in trying to define appropriate boundaries for the use of new technologies by governments as instruments of control for the state.

At the same time that policymakers are under scrutiny both for their decisions about how to regulate technology as well as their decisions about how and when to adopt technologies like facial recognition themselves, technology firms and designers have also come under increasing criticism. Growing recognition that the design of technologies can have far-reaching social and political implications means that there is more pressure on technologists to take into consideration the consequences of their decisions early on in the design process (Vincenti 1993; Winner 1980) . The question of how technologists should incorporate these social dimensions into their design and development processes is an old one, and debate on these issues dates back to the 1970s, but it remains an urgent and often overlooked part of the puzzle because so many of the supposedly systematic mechanisms for assessing the impacts of new technologies in both the private and public sectors are primarily bureaucratic, symbolic processes rather than carrying any real weight or influence.

Technologists are often ill-equipped or unwilling to respond to the sorts of social problems that their creations have—often unwittingly—exacerbated, and instead point to governments and lawmakers to address those problems (Zuckerberg 2019) . But governments often have few incentives to engage in this area. This is because setting clear standards and rules for an ever-evolving technological landscape can be extremely challenging, because enforcement of those rules can be a significant undertaking requiring considerable expertise, and because the tech sector is a major source of jobs and revenue for many countries that may fear losing those benefits if they constrain companies too much. This indicates not just a need for clearer incentives and better policies for both private- and public-sector entities but also a need for new mechanisms whereby the technology development and design process can be influenced and assessed by people with a wider range of experiences and expertise. If we want technologies to be designed with an eye to their impacts, who is responsible for predicting, measuring, and mitigating those impacts throughout the design process? Involving policymakers in that process in a more meaningful way will also require training them to have the analytic and technical capacity to more fully engage with technologists and understand more fully the implications of their decisions.

At the same time that tech companies seem unwilling or unable to rein in their creations, many also fear they wield too much power, in some cases all but replacing governments and international organizations in their ability to make decisions that affect millions of people worldwide and control access to information, platforms, and audiences (Kilovaty 2020) . Regulators around the world have begun considering whether some of these companies have become so powerful that they violate the tenets of antitrust laws, but it can be difficult for governments to identify exactly what those violations are, especially in the context of an industry where the largest players often provide their customers with free services. And the platforms and services developed by tech companies are often wielded most powerfully and dangerously not directly by their private-sector creators and operators but instead by states themselves for widespread misinformation campaigns that serve political purposes (Nye 2018) .

Since the largest private entities in the tech sector operate in many countries, they are often better poised to implement global changes to the technological ecosystem than individual states or regulatory bodies, creating new challenges to existing governance structures and hierarchies. Just as it can be challenging to provide oversight for government use of technologies, so, too, oversight of the biggest tech companies, which have more resources, reach, and power than many nations, can prove to be a daunting task. The rise of network forms of organization and the growing gig economy have added to these challenges, making it even harder for regulators to fully address the breadth of these companies’ operations (Powell 1990) . The private-public partnerships that have emerged around energy, transportation, medical, and cyber technologies further complicate this picture, blurring the line between the public and private sectors and raising critical questions about the role of each in providing critical infrastructure, health care, and security. How can and should private tech companies operating in these different sectors be governed, and what types of influence do they exert over regulators? How feasible are different policy proposals aimed at technological innovation, and what potential unintended consequences might they have?

Conflict between countries has also spilled over significantly into the private sector in recent years, most notably in the case of tensions between the United States and China over which technologies developed in each country will be permitted by the other and which will be purchased by other customers, outside those two countries. Countries competing to develop the best technology is not a new phenomenon, but the current conflicts have major international ramifications and will influence the infrastructure that is installed and used around the world for years to come. Untangling the different factors that feed into these tussles as well as whom they benefit and whom they leave at a disadvantage is crucial for understanding how governments can most effectively foster technological innovation and invention domestically as well as the global consequences of those efforts. As much of the world is forced to choose between buying technology from the United States or from China, how should we understand the long-term impacts of those choices and the options available to people in countries without robust domestic tech industries? Does the global spread of technologies help fuel further innovation in countries with smaller tech markets, or does it reinforce the dominance of the states that are already most prominent in this sector? How can research universities maintain global collaborations and research communities in light of these national competitions, and what role does government research and development spending play in fostering innovation within its own borders and worldwide? How should intellectual property protections evolve to meet the demands of the technology industry, and how can those protections be enforced globally?

These conflicts between countries sometimes appear to challenge the feasibility of truly global technologies and networks that operate across all countries through standardized protocols and design features. Organizations like the International Organization for Standardization, the World Intellectual Property Organization, the United Nations Industrial Development Organization, and many others have tried to harmonize these policies and protocols across different countries for years, but have met with limited success when it comes to resolving the issues of greatest tension and disagreement among nations. For technology to operate in a global environment, there is a need for a much greater degree of coordination among countries and the development of common standards and norms, but governments continue to struggle to agree not just on those norms themselves but even the appropriate venue and processes for developing them. Without greater global cooperation, is it possible to maintain a global network like the internet or to promote the spread of new technologies around the world to address challenges of sustainability? What might help incentivize that cooperation moving forward, and what could new structures and process for governance of global technologies look like? Why has the tech industry’s self-regulation culture persisted? Do the same traditional drivers for public policy, such as politics of harmonization and path dependency in policy-making, still sufficiently explain policy outcomes in this space? As new technologies and their applications spread across the globe in uneven ways, how and when do they create forces of change from unexpected places?

These are some of the questions that we hope to address in the Technology and Global Change section through articles that tackle new dimensions of the global landscape of designing, developing, deploying, and assessing new technologies to address major challenges the world faces. Understanding these processes requires synthesizing knowledge from a range of different fields, including sociology, political science, economics, and history, as well as technical fields such as engineering, climate science, and computer science. A crucial part of understanding how technology has created global change and, in turn, how global changes have influenced the development of new technologies is understanding the technologies themselves in all their richness and complexity—how they work, the limits of what they can do, what they were designed to do, how they are actually used. Just as technologies themselves are becoming more complicated, so are their embeddings and relationships to the larger social, political, and legal contexts in which they exist. Scholars across all disciplines are encouraged to join us in untangling those complexities.

Josephine Wolff is an associate professor of cybersecurity policy at the Fletcher School of Law and Diplomacy at Tufts University. Her book You’ll See This Message When It Is Too Late: The Legal and Economic Aftermath of Cybersecurity Breaches was published by MIT Press in 2018.


10 Cutting Edge Research Papers In Computer Vision & Image Generation

January 24, 2019 by Mariya Yao

UPDATE: We’ve also summarized the top 2019 and top 2020 Computer Vision research papers. 

Ever since convolutional neural networks began outperforming humans in specific image recognition tasks, research in the field of computer vision has proceeded at breakneck pace.

The basic architecture of CNNs (or ConvNets) was developed in the 1980s. Yann LeCun improved upon the original design in 1989 by using backpropagation to train models to recognize handwritten digits.

We’ve come a long way since then.

In 2018, we saw novel architecture designs that improve upon performance benchmarks and also expand the range of media that machine learning models can analyze.  We also saw a number of breakthroughs with media generation which enable photorealistic style transfer, high-resolution image generation, and video-to-video synthesis.

Due to the importance and prevalence of computer vision and image generation for applied and enterprise AI, we did feature some of the papers below in our previous article summarizing the top overall machine learning papers of 2018. Since you might not have read that previous piece, we chose to highlight the vision-related ones again here.

We’ve done our best to summarize these papers correctly, but if we’ve made any mistakes, please contact us to request a fix. Special thanks also go to computer vision specialist Rebecca BurWei for generously offering her expertise in editing and revising drafts of this article.

If these summaries of scientific AI research papers are useful for you, you can subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new summaries.  We’re planning to release summaries of important papers in computer vision, reinforcement learning, and conversational AI in the next few weeks.

If you’d like to skip around, here are the papers we featured:

  • Spherical CNNs
  • Adversarial Examples that Fool both Computer Vision and Time-Limited Humans
  • A Closed-form Solution to Photorealistic Image Stylization
  • Group Normalization
  • Taskonomy: Disentangling Task Transfer Learning
  • Self-Attention Generative Adversarial Networks
  • GANimation: Anatomically-aware Facial Animation from a Single Image
  • Video-to-Video Synthesis
  • Everybody Dance Now
  • Large Scale GAN Training for High Fidelity Natural Image Synthesis

Important Computer Vision Research Papers of 2018

1. Spherical CNNs, by Taco S. Cohen, Mario Geiger, Jonas Koehler, and Max Welling

Original Abstract

Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective.

In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

Our Summary

Omnidirectional cameras, already used by cars, drones, and other robots, capture a spherical image of their entire surroundings. We could analyze such spherical signals by projecting them to the plane and using CNNs. However, any planar projection of a spherical signal results in distortions. To overcome this problem, a group of researchers from the University of Amsterdam introduces the theory of spherical CNNs, networks that can analyze spherical images without being fooled by distortions. The approach demonstrates its effectiveness for classifying 3D shapes and Spherical MNIST images, as well as for molecular energy regression, an important problem in computational chemistry.

What’s the core idea of this paper?

  • Planar projections of spherical signals result in significant distortions as some areas look larger or smaller than they really are.
  • Traditional CNNs are ineffective for spherical images because as objects move around the sphere, they also appear to shrink and stretch (think maps where Greenland looks much bigger than it actually is).
  • The solution is to use a spherical CNN which is robust to spherical rotations in the input data. By preserving the original shape of the input data, spherical CNNs treat all objects on the sphere equally without distortion (the defining correlation is written out below).
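For reference, the rotation-equivariant operation at the heart of the paper can be written compactly. In Cohen et al.'s notation, the spherical correlation of a signal f and a filter ψ on S² is evaluated at rotations R in SO(3), and this is the quantity the generalized FFT computes efficiently:

```latex
% Spherical correlation (Cohen et al.); note the output lives on SO(3).
[\psi \star f](R) \;=\; \langle L_R \psi,\, f \rangle
\;=\; \int_{S^2} \psi\!\left(R^{-1} x\right) f(x) \, dx
```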

What’s the key achievement?

  • Introducing a mathematical framework for building spherical CNNs.
  • Providing easy-to-use, fast, and memory-efficient PyTorch code implementing these CNNs.
  • Demonstrating the effectiveness of spherical CNNs on three tasks: classification of Spherical MNIST images, classification of 3D shapes, and molecular energy regression.

What does the AI community think?

  • The paper won the Best Paper Award at ICLR 2018, one of the leading machine learning conferences.

What are future research areas?

  • Development of a Steerable CNN for the sphere to analyze sections of vector bundles over the sphere (e.g., wind directions).
  • Expanding the mathematical theory from 2D spheres to 3D point clouds for classification tasks that are invariant under reflections as well as rotations.

What are possible business applications?

  • omnidirectional vision for drones, robots, and autonomous cars;
  • molecular regression problems in computational chemistry;
  • global weather and climate modeling.

Where can you get implementation code?

  • The authors provide the original implementation for this research paper on GitHub .

2. Adversarial Examples that Fool both Computer Vision and Time-Limited Humans, by Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein

Original Abstract

Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.

Google Brain researchers seek an answer to the question: can adversarial examples that are not model-specific, i.e., that fool different computer vision models without access to their parameters and architectures, also fool time-limited humans? They leverage key ideas from machine learning, neuroscience, and psychophysics to create adversarial examples that do in fact impact human perception in a time-limited setting. Thus, the paper introduces a new class of illusions that are shared between machines and humans.


  • As the first step, the researchers use black-box adversarial example construction techniques that create adversarial examples without access to the model’s architecture or parameters (a sketch of one standard construction appears after this list).
  • Next, they match the initial processing of the human visual system by prepending each model with a retinal layer that pre-processes the input to incorporate some of the transformations performed by the human eye.
  • They also perform an eccentricity-dependent blurring of the image to approximate the input received by the visual cortex of human subjects through their retinal lattice.
  • Classification decisions of humans are evaluated in a time-limited setting to detect even subtle effects in human perception.
  • Showing that adversarial examples that transfer across computer vision models do also successfully influence the perception of humans.
  • Demonstrating the similarity between convolutional neural networks and the human visual system.
  • The paper is widely discussed by the AI community. While most researchers are stunned by the results, some argue that we need a stricter definition of an adversarial image: if humans classify the perturbed picture of a cat as a dog, then it is probably already a dog, not a cat.
  • Researching which techniques are crucial for the transfer of adversarial examples to humans (i.e., retinal preprocessing, model ensembling).
  • Practitioners should consider the risk that imagery could be manipulated to cause human observers to have unusual reactions, because adversarial images can affect us below the horizon of awareness.
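For readers unfamiliar with adversarial examples, here is a minimal PyTorch sketch of one standard construction, the Fast Gradient Sign Method (FGSM). FGSM is a white-box method shown purely for illustration; the paper itself uses black-box examples transferred from an ensemble of models, which is a different procedure.

    import torch

    def fgsm(model, x, label, eps=0.03):
        """One white-box FGSM step: perturb x in the direction that raises the loss."""
        x = x.clone().detach().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x), label)
        loss.backward()
        x_adv = x + eps * x.grad.sign()     # small, signed perturbation
        return x_adv.clamp(0.0, 1.0).detach()

    # Tiny stand-in classifier and batch, only to make the sketch runnable.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    x_adv = fgsm(model, x, y)
    print((x_adv - x).abs().max())          # perturbation bounded by eps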

3. A Closed-form Solution to Photorealistic Image Stylization , by Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz

Photorealistic image stylization concerns transferring style of a reference photo to a content photo with the constraint that the stylized photo should remain photorealistic. While several photorealistic image stylization methods exist, they tend to generate spatially inconsistent stylizations with noticeable artifacts. In this paper, we propose a method to address these issues. The proposed method consists of a stylization step and a smoothing step. While the stylization step transfers the style of the reference photo to the content photo, the smoothing step ensures spatially consistent stylizations. Each of the steps has a closed-form solution and can be computed efficiently. We conduct extensive experimental validations. The results show that the proposed method generates photorealistic stylization outputs that are more preferred by human subjects as compared to those by the competing methods while running much faster. Source code and additional results are available at https://github.com/NVIDIA/FastPhotoStyle .

The team of scientists at NVIDIA and the University of California, Merced proposes a new solution to photorealistic image stylization, FastPhotoStyle. The method consists of two steps: stylization and smoothing. Extensive experiments show that the suggested approach generates more realistic and compelling images than the previous state of the art. What is more, thanks to the closed-form solution, FastPhotoStyle can produce the stylized image 49 times faster than traditional methods.


  • The goal of photorealistic image stylization is to transfer style of a reference photo to a content photo while keeping the stylized image photorealistic.
  • The stylization step is based on the whitening and coloring transform (WCT), which processes images via feature projections (a numpy sketch of the WCT appears after this list). However, WCT was developed for artistic image stylization, and thus often generates structural artifacts when applied to photorealistic image stylization. To overcome this problem, the paper introduces the PhotoWCT method, which replaces the upsampling layers in the WCT with unpooling layers and so preserves more spatial information.
  • The smoothing step is required to solve spatially inconsistent stylizations that could arise after the first step. Smoothing is based on a manifold ranking algorithm.
  • Both steps have a closed-form solution, which means that the solution can be obtained in a fixed number of operations (i.e., convolutions, max-pooling, whitening, etc.). Thus, computations are much more efficient compared to the traditional methods.
  • The experiments show that FastPhotoStyle outperforms artistic stylization algorithms by rendering far fewer structural artifacts and inconsistent stylizations, and outperforms photorealistic stylization algorithms by synthesizing not only the colors but also the patterns of the style photos.
  • The experiments demonstrate that users prefer FastPhotoStyle results over the previous state-of-the-art in terms of both stylization effects (63.1%) and photorealism (73.5%).
  • FastPhotoStyle can synthesize an image of 1024 × 512 resolution in only 13 seconds, while the previous state-of-the-art method needs 650 seconds for the same task.
  • The paper was presented at ECCV 2018, the leading European Conference on Computer Vision.
  • Finding the way to transfer small patterns from the style photo as they are smoothed away by the suggested method.
  • Exploring the possibilities to further reduce the number of structural artifacts in the stylized photos.
  • Content creators in business settings can benefit greatly from photorealistic image stylization, as the tool essentially allows them to automatically change the style of any photo to fit the narrative.
  • Photographers also discuss the tremendous impact that this technology can have on real estate photography.
  • The NVIDIA team provides the original implementation for this research paper on GitHub.
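For the curious, here is a minimal numpy sketch of the whitening and coloring transform underlying the stylization step, applied to flattened feature maps. The full PhotoWCT additionally swaps upsampling for unpooling and is followed by the smoothing step; this sketch shows only the core projections:

    import numpy as np

    def wct(content_feat, style_feat, eps=1e-5):
        """WCT on (channels, pixels) feature maps: whiten content, color with style."""
        fc = content_feat - content_feat.mean(axis=1, keepdims=True)
        fs = style_feat - style_feat.mean(axis=1, keepdims=True)

        # Whitening: remove the content feature correlations.
        dc, ec = np.linalg.eigh(fc @ fc.T / fc.shape[1] + eps * np.eye(fc.shape[0]))
        whitened = ec @ np.diag(dc ** -0.5) @ ec.T @ fc

        # Coloring: impose the style feature correlations.
        ds, es = np.linalg.eigh(fs @ fs.T / fs.shape[1] + eps * np.eye(fs.shape[0]))
        colored = es @ np.diag(ds ** 0.5) @ es.T @ whitened
        return colored + style_feat.mean(axis=1, keepdims=True)

    content = np.random.randn(64, 1024)   # e.g. VGG features: 64 channels, 32x32 px
    style = np.random.randn(64, 2048)
    print(wct(content, style).shape)      # (64, 1024)

Because every operation here is a fixed sequence of linear-algebra steps, the solution is closed-form, which is where the method's speed advantage comes from.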

4. Group Normalization , by Yuxin Wu and Kaiming He

Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems – BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.

The Facebook AI Research team suggests Group Normalization (GN) as an alternative to Batch Normalization (BN). They argue that BN’s error increases dramatically for small batch sizes. This limits the usage of BN when working with large models to solve computer vision tasks that require small batches due to memory constraints. In contrast, Group Normalization is independent of batch size, as it divides the channels into groups and computes the mean and variance for normalization within each group. The experiments confirm that GN outperforms BN in a variety of tasks, including object detection, segmentation, and video classification.


  • Group Normalization is a simple alternative to Batch Normalization, especially in scenarios where the batch size tends to be small, for example, in computer vision tasks requiring high-resolution input.
  • GN operates only on the layer dimensions, so its computation is independent of batch size. Specifically, GN divides channels, or feature maps, into groups and normalizes the features within each group.
  • Group Normalization can be easily implemented in a few lines of code in PyTorch and TensorFlow (see the sketch after this list).
  • Introducing Group Normalization, a new, effective normalization method.
  • GN’s accuracy is stable in a wide range of batch sizes as its computation is independent of batch size. For example, GN demonstrated a 10.6% lower error rate than its BN-based counterpart for ResNet-50 in ImageNet with a batch size of 2.
  • GN can also be transferred from pre-training to fine-tuning. The experiments show that GN can outperform BN counterparts for object detection and segmentation on the COCO dataset and for video classification on the Kinetics dataset.
  • The paper received an honorable mention at ECCV 2018, the leading European Conference on Computer Vision.
  • It is also the second most popular paper of 2018 based on users’ libraries at Arxiv Sanity Preserver.
  • Applying group normalization to sequential or generative models.
  • Investigating GN’s performance on learning representations for reinforcement learning.
  • Exploring if GN combined with a suitable regularizer will improve results.
  • Business applications that rely on BN-based models for object detection, segmentation, video classification and other computer vision tasks that require high-resolution input may benefit from moving to GN-based models as they are more accurate in these settings.
  • The Facebook AI Research team provides Mask R-CNN baseline results and models trained with Group Normalization.
  • PyTorch implementation of group normalization is also available on GitHub.
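As the paper notes, GN really does fit in a few lines. Below is a minimal PyTorch sketch following the description above: reshape the channels into groups, normalize within each group, then apply an optional per-channel scale and shift. In practice, the built-in torch.nn.GroupNorm should be preferred; this version is only illustrative.

    import torch

    def group_norm(x, num_groups=32, eps=1e-5, gamma=None, beta=None):
        """Normalize an (N, C, H, W) tensor within channel groups."""
        n, c, h, w = x.shape
        x = x.reshape(n, num_groups, c // num_groups, h, w)
        mean = x.mean(dim=(2, 3, 4), keepdim=True)
        var = x.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
        x = (x - mean) / torch.sqrt(var + eps)
        x = x.reshape(n, c, h, w)
        if gamma is not None:               # learned per-channel scale and shift
            x = x * gamma.view(1, c, 1, 1) + beta.view(1, c, 1, 1)
        return x

    x = torch.randn(2, 64, 8, 8)            # works even with a batch size of 2
    print(group_norm(x).shape)              # torch.Size([2, 64, 8, 8])

Note that none of the statistics above involve the batch dimension, which is exactly why GN's accuracy is stable across batch sizes.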

5. Taskonomy: Disentangling Task Transfer Learning , by Amir R. Zamir, Alexander Sax, William Shen, Leonidas J. Guibas, Jitendra Malik, and Silvio Savarese

Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity.

We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g., nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure, including a solver that users can employ to devise efficient supervision policies for their use cases.

Assertions of the existence of a structure among visual tasks have been made by many researchers since the early years of modern computer science. And now Amir Zamir and his team make an attempt to actually find this structure. They model it using a fully computational approach and discover lots of useful relationships between different visual tasks, including the nontrivial ones. They also show that by taking advantage of these interdependencies, it is possible to achieve the same model performance with the labeled data requirements reduced by roughly ⅔.


  • A model aware of the relationships among different visual tasks demands less supervision, uses less computation, and behaves in more predictable ways.
  • A fully computational approach to discovering the relationships between visual tasks is preferable because it avoids imposing prior, and possibly incorrect, assumptions: the priors are derived from either human intuition or analytical knowledge, while neural networks might operate on different principles.
  • Identifying relationships between 26 common visual tasks.
  • Showing how this structure helps in discovering types of transfer learning that will be most effective for each visual task.
  • Creating a new dataset of 4 million images of indoor scenes including 600 buildings annotated with 26 tasks.
  • The paper won the Best Paper Award at CVPR 2018, the key conference on computer vision and pattern recognition.
  • The results are very important, as for most real-world tasks large-scale labeled datasets are not available.
  • Moving from a model where common visual tasks are entirely defined by humans toward an approach where human-defined visual tasks are viewed as observed samples composed of computationally found latent subtasks.
  • Exploring the possibility to transfer the findings to not entirely visual tasks, e.g. robotic manipulation.
  • Relationships discovered in this paper can be used to build more effective visual systems that will require less labeled data and lower computational costs.

6. Self-Attention Generative Adversarial Networks , by Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena

In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN achieves the state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

Traditional convolutional GANs have demonstrated some very promising results with respect to image synthesis. However, they have at least one important weakness: convolutional layers alone fail to capture geometric and structural patterns in images. Since convolution is a local operation, it is hardly possible for an output at the top-left position to have any relation to the output at the bottom-right. The paper introduces a simple solution to this problem: incorporating the self-attention mechanism into the GAN framework. This solution, combined with several stabilization techniques, helps Self-Attention Generative Adversarial Networks (SAGANs) achieve state-of-the-art results in image synthesis.


  • Convolutional layers alone are computationally inefficient for modeling long-range dependencies in images. On the contrary, a self-attention mechanism incorporated into the GAN framework will enable both the generator and the discriminator to efficiently model relationships between widely separated spatial regions.
  • The self-attention module calculates the response at a position as a weighted sum of the features at all positions (see the sketch after this list).
  • Applying spectral normalization for both generator and discriminator – the researchers argue that not only the discriminator but also the generator can benefit from spectral normalization, as it can prevent the escalation of parameter magnitudes and avoid unusual gradients.
  • Using separate learning rates for the generator and the discriminator to compensate for the problem of slow learning in a regularized discriminator and make it possible to use fewer generator steps per discriminator step.
  • Showing that the self-attention module incorporated into the GAN framework is, in fact, effective in modeling long-range dependencies.
  • Demonstrating that spectral normalization applied to the generator stabilizes GAN training.
  • Showing that utilizing imbalanced learning rates speeds up the training of regularized discriminators.
  • Achieving state-of-the-art results in image synthesis by boosting the Inception Score from 36.8 to 52.52 and reducing Fréchet Inception Distance from 27.62 to 18.65.
  • “The idea is simple and intuitive yet very effective, plus easy to implement.” – Sebastian Raschka, assistant professor of Statistics at the University of Wisconsin-Madison.
  • Exploring the possibilities to reduce the number of weird samples generated by GANs.
  • Image synthesis with GANs can replace expensive manual media creation for advertising and e-commerce purposes.
  • PyTorch and TensorFlow implementations of Self-Attention GANs are available on GitHub.
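A minimal PyTorch sketch of such a self-attention block is shown below. It follows the description above (1×1-convolution embeddings, attention weights over all spatial positions, and a learned scalar gate initialized to zero), though the layer sizes are illustrative rather than copied from the paper:

    import torch
    import torch.nn as nn

    class SelfAttention(nn.Module):
        """Response at each position = weighted sum of features at all positions."""
        def __init__(self, channels):
            super().__init__()
            self.f = nn.Conv2d(channels, channels // 8, 1)   # query embedding
            self.g = nn.Conv2d(channels, channels // 8, 1)   # key embedding
            self.h = nn.Conv2d(channels, channels, 1)        # value embedding
            self.gamma = nn.Parameter(torch.zeros(1))        # residual gate

        def forward(self, x):
            n, c, hgt, wdt = x.shape
            q = self.f(x).flatten(2)                         # (n, c/8, hw)
            k = self.g(x).flatten(2)
            v = self.h(x).flatten(2)                         # (n, c, hw)
            attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (n, hw, hw)
            out = (v @ attn.transpose(1, 2)).reshape(n, c, hgt, wdt)
            return self.gamma * out + x                      # residual connection

    feat = torch.randn(1, 64, 16, 16)
    print(SelfAttention(64)(feat).shape)                     # (1, 64, 16, 16)

Because gamma starts at zero, the block initially behaves like an identity map and gradually learns how much non-local evidence to mix in.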

7. GANimation: Anatomically-aware Facial Animation from a Single Image , by Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, Francesc Moreno-Noguer

Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN, which conditions GANs’ generation process with images of a specific domain, namely a set of images of persons sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset. To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on Action Unit (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combining several of them. Additionally, we propose a fully unsupervised strategy to train the model, which only requires images annotated with their activated AUs, and exploit attention mechanisms that make our network robust to changing backgrounds and lighting conditions. Extensive evaluation shows that our approach goes beyond competing conditional generators both in the capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, and in the capacity of dealing with images in the wild.

The paper introduces a novel GAN model that is able to generate anatomically-aware facial animations from a single image under changing backgrounds and illumination conditions. It advances current works, which had only addressed the problem for discrete emotions category editing and portrait images. The approach renders a wide range of emotions by encoding facial deformations as Action Units. The resulting animations demonstrate a remarkably smooth and consistent transformation across frames even with challenging light conditions and backgrounds.


  • Facial expressions can be described in terms of Action Units (AUs), which anatomically describe the contractions of specific facial muscles. For example, the facial expression for ‘fear’ is generally produced with the following activations: Inner Brow Raiser (AU1), Outer Brow Raiser (AU2), Brow Lowerer (AU4), Upper Lid Raiser (AU5), Lid Tightener (AU7), Lip Stretcher (AU20) and Jaw Drop (AU26). The magnitude of each AU defines the extent of emotion.
  • A model for synthetic facial animation is based on the GAN architecture, conditioned on a one-dimensional vector indicating the presence/absence and the magnitude of each Action Unit (a sketch of this conditioning idea appears after this list).
  • To circumvent the need for pairs of training images of the same person under different expressions, a bidirectional generator is used to both transform an image into a desired expression and transform the synthesized image back into the original pose.
  • To handle images under changing backgrounds and illumination conditions, the model includes an attention layer that focuses the action of the network only in those regions of the image that are relevant to convey the novel expression.
  • Introducing a novel GAN model for face animation in the wild that can be trained in a fully unsupervised manner and generate visually compelling images with remarkably smooth and consistent transformation across frames even with challenging light conditions and non-real world data.
  • Demonstrating how a wider range of emotions can be generated by interpolating between emotions the GAN has already seen.
  • Applying the introduced approach to video sequences.
  • The technology that automatically animates the facial expression from a single image can be applied in several areas including the fashion and e-commerce business, the movie industry, photography technologies.
  • The authors provide the original implementation of this research paper on GitHub.
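A hedged sketch of the conditioning idea is shown below: a continuous AU activation vector is broadcast to spatial maps and concatenated with the image channels before entering the generator. The stand-in generator and the sizes here are illustrative, not the paper's architecture:

    import torch
    import torch.nn as nn

    def condition_on_aus(image, au_vector):
        """image: (n, 3, h, w); au_vector: (n, num_aus), activations in [0, 1]."""
        n, _, h, w = image.shape
        # Broadcast each AU magnitude to a full spatial map, then concatenate.
        au_maps = au_vector.view(n, -1, 1, 1).expand(n, au_vector.shape[1], h, w)
        return torch.cat([image, au_maps], dim=1)

    num_aus = 17                               # illustrative number of Action Units
    generator = nn.Conv2d(3 + num_aus, 3, kernel_size=3, padding=1)  # stand-in

    img = torch.rand(2, 3, 128, 128)
    aus = torch.rand(2, num_aus)               # e.g. AU1, AU2, AU4, ... magnitudes
    print(generator(condition_on_aus(img, aus)).shape)   # (2, 3, 128, 128)

Because the conditioning vector is continuous, interpolating its entries interpolates between expressions, which is what enables the smooth transitions described above.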

8. Video-to-Video Synthesis , by Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.

Researchers from NVIDIA have introduced a novel video-to-video synthesis approach. The framework is based on conditional GANs. Specifically, the method couples a carefully designed generator and discriminator with a spatio-temporal adversarial objective. The experiments demonstrate that the suggested vid2vid approach can synthesize high-resolution, photorealistic, temporally coherent videos on a diverse set of input formats including segmentation masks, sketches, and poses. It can also predict the next frames with far better results than the baseline models.


  • The sequential generator produces each output frame conditioned on the current source frame, the past two source frames, and the past two generated frames.
  • Conditional image discriminator ensures that each output frame resembles a real image given the same source image.
  • Conditional video discriminator ensures that consecutive output frames resemble the temporal dynamics of a real video given the same optical flow.
  • Foreground-background prior in the generator design further improves the synthesis performance of the proposed model.
  • Using a soft occlusion mask instead of a binary one makes it possible to better handle the “zoom in” scenario: details can be added by gradually blending the warped pixels and the newly synthesized pixels.
  • Generating high-resolution (2048 × 1024), photorealistic, temporally coherent videos up to 30 seconds long.
  • Outputting several videos with different visual appearances depending on the sampled feature vectors.
  • Outperforming the baseline models in future video prediction.
  • Converting semantic labels into realistic real-world videos.
  • Generating multiple outputs of talking people from edge maps.
  • Generating an entire human body given a pose.
  • “NVIDIA’s new vid2vid is the first open-source code that lets you fake anybody’s face convincingly from one source video. […] interesting times ahead…” – Gene Kogan, an artist and programmer.
  • The paper has also received some criticism over the concern that it can be used to create deepfakes or tampered videos which can deceive people.
  • Using object tracking information to make sure that each object has a consistent appearance across the whole video.
  • Researching if training the model with coarser semantic labels will help reduce the visible artifacts that appear after semantic manipulations (e.g., turning trees into buildings).
  • Adding additional 3D cues, such as depth maps, to enable synthesis of turning cars.
  • Marketing and advertising can benefit from the opportunities created by the vid2vid method (e.g., replacing the face or even the entire body in the video). However, this should be used with caution, keeping in mind the ethical considerations.
  • The NVIDIA team provides the original implementation of this research paper on GitHub.

9. Everybody Dance Now , by Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros

This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We pose this problem as a per-frame image-to-image translation with spatio-temporal smoothing. Using pose detections as an intermediate representation between source and target, we learn a mapping from pose images to a target subject’s appearance. We adapt this setup for temporally coherent video generation including realistic face synthesis. Our video demo can be found at https://youtu.be/PCBTZh41Ris .

UC Berkeley researchers present a simple method for generating videos in which amateur dancers perform like professional dancers. If you want to take part in the experiment, all you need to do is record a few minutes of yourself performing some standard moves and then pick the video with the dance you want to repeat. The neural network does the main job: it solves the problem as per-frame image-to-image translation with spatio-temporal smoothing. By conditioning the prediction at each frame on that of the previous time step for temporal smoothness, and by applying a specialized GAN for realistic face synthesis, the method achieves truly impressive results.


  • A pre-trained state-of-the-art pose detector creates pose stick figures from the source video.
  • Global pose normalization is applied to account for differences between the source and target subjects in body shapes and locations within the frame.
  • Normalized pose stick figures are mapped to the target subject.
  • To make the videos smooth, the researchers suggest conditioning the generator on the previously generated frame and then giving both images to the discriminator. Gaussian smoothing of the pose keypoints further reduces jitter.
  • To generate more realistic faces, the method includes an additional face-specific GAN that brushes up the face after the main generation is finished.
  • Suggesting a novel approach to motion transfer that outperforms a strong baseline (pix2pixHD), according to both qualitative and quantitative assessments.
  • Demonstrating that face-specific GAN adds considerable detail to the output video.
  • “Overall I thought this was really fun and well executed. Looking forward to the code release so that I can start training my dance moves.” – Tom Brown, a member of the technical staff at Google Brain.
  • “‘Everybody Dance Now’ from Caroline Chan, Alyosha Efros and team transfers dance moves from one subject to another. The only way I’ll ever dance well. Amazing work!!!” – Soumith Chintala, AI Research Engineer at Facebook.
  • Replacing pose stick figures with temporally coherent inputs and representation specifically optimized for motion transfer.
  • “Do as I do” motion transfer might be applied to replace subjects when creating marketing and promotional videos.
  • A PyTorch implementation of this research paper is available on GitHub.

10. Large Scale GAN Training for High Fidelity Natural Image Synthesis , by Andrew Brock, Jeff Donahue, and Karen Simonyan

Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple “truncation trick”, allowing fine control over the trade-off between sample fidelity and variety by truncating the latent space. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128×128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.3 and Frechet Inception Distance (FID) of 9.6, improving over the previous best IS of 52.52 and FID of 18.65.

The DeepMind team finds that current techniques are sufficient for synthesizing high-resolution, diverse images from available datasets such as ImageNet and JFT-300M. In particular, they show that Generative Adversarial Networks (GANs) can generate images that look very realistic if they are trained at a very large scale, i.e., using two to four times as many parameters and eight times the batch size compared to prior art. These large-scale GANs, or BigGANs, are the new state of the art in class-conditional image synthesis.


  • GANs perform much better with the increased batch size and number of parameters.
  • Applying orthogonal regularization to the generator makes the model amenable to a specific technique, the “truncation trick”, which provides control over the trade-off between sample fidelity and variety (see the sketch after this list).
  • Demonstrating that GANs can benefit significantly from scaling.
  • Building models that allow explicit, fine-grained control of the trade-off between sample variety and fidelity.
  • Discovering instabilities of large-scale GANs and characterizing them empirically.
  • Setting the new state of the art on ImageNet at 128×128 resolution: an Inception Score (IS) of 166.3, versus the previous best IS of 52.52, and a Fréchet Inception Distance (FID) of 9.6, versus the previous best FID of 18.65.
  • The paper is under review for ICLR 2019.
  • After BigGAN generators became available on TF Hub, AI researchers from all over the world started playing with BigGANs to generate dogs, watches, bikini images, Mona Lisa, seashores, and much more.
  • Moving to larger datasets to mitigate GAN stability issues.
  • Replacing expensive manual media creation for advertising and e-commerce purposes.
  • A BigGAN demo implemented in TensorFlow is available to use on Google’s Colab tool.
  • Aaron Leong has a GitHub repository for BigGAN implemented in PyTorch.
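In isolation, the truncation trick is easy to sketch: sample latent vectors from a standard normal, but resample any component whose magnitude exceeds a chosen threshold before feeding them to the generator. A minimal numpy version (the threshold value here is arbitrary; lower thresholds trade sample variety for fidelity):

    import numpy as np

    def truncated_latents(rng, size, threshold=0.5):
        """Standard-normal latents with out-of-range components resampled."""
        z = rng.standard_normal(size)
        while True:
            mask = np.abs(z) > threshold
            if not mask.any():
                return z
            z[mask] = rng.standard_normal(mask.sum())   # redraw the outliers

    rng = np.random.default_rng(0)
    z = truncated_latents(rng, (4, 128), threshold=0.5)
    print(np.abs(z).max() <= 0.5)   # True; z is then fed to the generator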

Want Deeper Dives Into Specific AI Research Topics?

Due to popular demand, we’ve released several of these easy-to-read summaries and syntheses of major research papers for different subtopics within AI and machine learning.

  • Top 10 machine learning & AI research papers of 2018
  • Top 10 AI fairness, accountability, transparency, and ethics (FATE) papers of 2018
  • Top 14 natural language processing (NLP) research papers of 2018
  • Top 10 computer vision and image generation research papers of 2018
  • Top 10 conversational AI and dialog systems research papers of 2018
  • Top 10 deep reinforcement learning research papers of 2018

Update: 2019 Research Summaries Are Released

  • Top 10 AI & machine learning research papers from 2019
  • Top 11 NLP achievements & papers from 2019
  • Top 10 research papers in conversational AI from 2019
  • Top 10 computer vision research papers from 2019
  • Top 12 AI ethics research papers introduced in 2019
  • Top 10 reinforcement learning research papers from 2019

Enjoy this article? Sign up for more AI research updates.

We’ll let you know when we release more summary articles like this one.



About Mariya Yao

Mariya is the co-author of Applied AI: A Handbook For Business Leaders and former CTO at Metamaven. She "translates" arcane technical concepts into actionable business advice for executives and designs lovable products people actually want to use. Follow her on Twitter at @thinkmariya to raise your AI IQ.




Google Scholar reveals its most influential papers for 2020

Artificial intelligence papers amass citations more than any other research topic.


Chinese Go player Ke Jie (L) attends a press conference after his second match against Google's artificial intelligence programme AlphaGo on day two of the Future of Go Summit in Wuzhen on May 25, 2017 in Jiaxing, Zhejiang Province of China. Credit: VCG / Contributor / Getty

13 July 2020


Google Scholar has released its annual ranking of most highly cited publications. Artificial intelligence (AI) research dominates once again, accumulating huge numbers of citations over the past year.

Computer vision research in particular attracts a high number of citations over a short period of time. Many of the most highly cited papers in this ranking are centred on object detection and image recognition – research that is crucial for technologies such as self-driving cars and surveillance.

The high citation numbers for AI-related papers mirror the increasing importance governments around the world are placing on the technologies they underpin.

In February, the United States government announced its commitment to double research and development spending in non-defense AI and quantum information science by 2022.

In April, the European Commission announced that it is increasing its annual investments in AI by 70% under the research and innovation programme, Horizon 2020.

Google Scholar is the largest database in the world of its kind, tracking citation information for almost 400 million academic papers and other scholarly literature.

The 2020 Google Scholar Metrics ranking, which is freely accessible online, tracks papers published between 2015 and 2019, and includes citations from all articles that were indexed in Google Scholar as of June 2020.

The most highly-cited paper of all, "Deep Residual Learning for Image Recognition", published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, was written by a team from Microsoft in 2016. It has made a huge leap from 25,256 citations in 2019 to 49,301 citations in 2020.

“Deep learning”, a seminal review of the potential of AI technologies that was published in Nature in 2015, has had an increase in citations from 16,750 in 2019 to 27,375 in 2020.

It is the most highly cited paper in the listing for Nature , which is ranked by Google Scholar as the most influential journal based on a measure called the h5-index, which is the h-index for articles published in the last five years.
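As a quick aside, the h-index itself is simple to compute: it is the largest h such that h papers have at least h citations each. A small Python helper (our own illustration, not Google Scholar's code):

    def h_index(citations):
        """Largest h such that h papers have at least h citations each."""
        counts = sorted(citations, reverse=True)
        h = 0
        for i, c in enumerate(counts, start=1):
            if c >= i:
                h = i
            else:
                break
        return h

    print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations

The h5-index simply restricts the citation list to a venue's output from the last five years.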

Three of the top five papers listed by Google Scholar for Nature are related to AI. Two are genetics papers. Citation counts for the AI papers are significantly higher.

For example, the AI paper “Deep learning”, which has the highest number of citations for Nature , has 27,375. The paper “Analysis of protein-coding genetic variation in 60,706 humans” is the highest-ranked non-AI-related paper published in Nature , with 6,387 citations.

Of the 100 top-ranked journals in 2020, six are AI conference publications. Their papers tend to amass citations much faster than papers in influential journals such as The New England Journal of Medicine, Nature, and Science.

Such rapid accumulation of citations may be in part explained by the fact that at these annual conferences that can attract thousands of attendees from around the world, new software, which is often open source, is shared and later built upon by the community.

Below is our 2020 selection of Google Scholar’s most highly-cited articles published by the world's most influential journals.

See our 2019 coverage for a selection that includes the high-performers mentioned above.

1. “ Adam: A Method for Stochastic Optimization ” (2015) International Conference on Learning Representations 47,774 citations

Adam is a popular optimization algorithm for deep learning – a subset of machine learning that uses artificial neural networks inspired by the human brain to imitate how the brain develops certain types of knowledge.

Adam was introduced in this paper at the 2015 International Conference on Learning Representations (ICLR) by Diederik P. Kingma, today a machine learning researcher at Google, and Jimmy Ba from the Machine Learning Group at the University of Toronto, Canada. Adam has since been widely used in deep learning applications in computer vision and natural language processing.
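For reference, the update rule from the paper is short enough to sketch in numpy. The hyperparameter defaults below are the paper's; the toy minimization at the bottom is our own illustration, with the learning rate raised for speed:

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        """One Adam update: bias-corrected moving averages of grad and grad^2."""
        m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
        m_hat = m / (1 - b1 ** t)              # bias correction
        v_hat = v / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    # Toy usage: minimize f(x) = x^2, whose gradient is 2x, starting from x = 5.
    theta, m, v = 5.0, 0.0, 0.0
    for t in range(1, 1001):
        theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
    print(round(theta, 4))                     # approximately 0.0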

The ICLR, one of the most prestigious conferences on machine learning, is an important platform for researchers whose papers are accepted. In May 2020, the conference drew 5,600 participants from nearly 90 countries to its virtual sessions – more than double the 2019 turnout of 2,700 physical attendees.

2. “ Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks ” (2015) Neural Information Processing Systems 19,507 citations

Presented at the 2015 Neural Information Processing Systems annual meeting in Canada, this paper describes what has now become the most widely used version of an object detection algorithm called R-CNN.

Object detection is a major part of computer vision research, used to identify objects such as humans, cars, and buildings in images and videos.
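Today, a pre-trained Faster R-CNN is a few lines away in torchvision. A minimal usage sketch follows; the weights argument shown is for recent torchvision versions (older releases used pretrained=True), and the random tensor stands in for a real RGB image:

    import torch
    import torchvision

    # Load a Faster R-CNN detector pre-trained on COCO (downloads weights).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 480, 640)          # stand-in for a real RGB image
    with torch.no_grad():
        (prediction,) = model([image])       # one dict per input image

    # Each prediction holds 'boxes' (x1, y1, x2, y2), 'labels', and 'scores'.
    keep = prediction["scores"] > 0.8
    print(prediction["boxes"][keep], prediction["labels"][keep])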

The lead author, Shaoqing Ren, is also a co-author of Google Scholar’s most-cited paper for 2020, "Deep Residual Learning for Image Recognition", which has amassed almost 50,000 citations. Read more about it here.

That paper was co-authored by Ross Girshick, one of the inventors of R-CNN and now a research scientist at Facebook AI.

In the same week that “Faster R-CNN” was presented by Ren and his colleagues, Girshick presented a paper on “Fast R-CNN”, another version of R-CNN, at a different conference. That paper, presented at the 2015 IEEE International Conference on Computer Vision in Chile, has amassed more than 10,000 citations.

3. “ Human-level control through deep reinforcement learning ” (2015) Nature 10,394 citations

After “Deep learning” (mentioned above), which is Nature ’s most highly cited paper in the Google Scholar Metrics ranking, this paper is the journal’s second-most cited paper for 2020.

It centres on reinforcement learning – how machine learning models are trained to make a series of decisions by interacting with their environments.
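The core mechanic behind this DQN-style approach fits in a short sketch: a Q-network's value for the taken action is regressed toward a bootstrapped one-step target, r + gamma * max Q(s', a'), computed from a frozen target network. The tiny linear networks and the fake batch below are illustrative stand-ins, not the paper's architecture:

    import torch

    def dqn_targets(q_target_net, rewards, next_states, dones, gamma=0.99):
        """Bootstrapped one-step targets: r + gamma * max_a' Q_target(s', a')."""
        with torch.no_grad():
            next_q = q_target_net(next_states).max(dim=1).values
            return rewards + gamma * (1.0 - dones) * next_q

    q_net = torch.nn.Linear(4, 2)             # 4-dim state, 2 actions (stand-in)
    q_target_net = torch.nn.Linear(4, 2)      # periodically synced copy

    states = torch.randn(8, 4)                # a fake batch of transitions
    actions = torch.randint(0, 2, (8,))
    rewards = torch.randn(8)
    next_states = torch.randn(8, 4)
    dones = torch.zeros(8)

    targets = dqn_targets(q_target_net, rewards, next_states, dones)
    q_taken = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = torch.nn.functional.mse_loss(q_taken, targets)
    loss.backward()                           # one gradient step's worth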

The paper was authored by a team from Google DeepMind, a London-based organization acquired by Google in 2014 that has developed AI technologies for the diagnosis of eye diseases, for energy conservation, and for predicting the complex 3D structures of proteins.

4. “ Attention Is All You Need ” (2017) Neural Information Processing Systems 9,885 citations

Authored by researchers at Google Brain and Google Research, this paper proposed a new deep learning model called the Transformer.

Designed to process sequential data such as natural language, Transformer is used by translation, text summarization, and voice recognition technologies, and other applications that use sequence analysis such as DNA, RNA, and peptide sequencing. It’s been used, for example, to generate entire Wikipedia articles .
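At the heart of the Transformer is scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. A minimal numpy sketch of this single operation (the full model stacks it into multi-head layers):

    import numpy as np

    def attention(q, k, v):
        """Scaled dot-product attention over (seq_len, d_model) arrays."""
        d_k = q.shape[-1]
        scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        return weights @ v

    seq_len, d_model = 10, 64
    q = k = v = np.random.randn(seq_len, d_model)        # self-attention case
    print(attention(q, k, v).shape)                      # (10, 64)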

Earlier this year, researchers at Google predicted that Transformer could be used for applications beyond text, including to generate music and images.

The paper was part of the proceedings from the 2017 Neural Information Processing Systems conference held in Long Beach, California.

5. “ The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) ” (2016) JAMA 8,576 citations

The first formal revision of the definitions of sepsis and septic shock in 15 years, this paper describes a condition that’s estimated to affect more than 30 million people worldwide every year.

Led by the European Society of Intensive Care Medicine and the Society of Critical Care Medicine, the study convened a task force of 19 critical care, infectious disease, surgical, and pulmonary specialists in 2014 to provide a more consistent and reproducible picture of sepsis incidence and patient outcomes.

The paper, led by Mervyn Singer, professor of intensive care medicine at University College London, is by far the most highly cited paper in JAMA . The second-most highly cited paper, on opioids, has 3,679 citations, according to Google Scholar.

6. “ limma powers differential expression analyses for RNA-sequencing and microarray studies ” (2015) Nucleic Acids Research 8,328 citations

limma is a widely used, open-source analysis tool for gene expression experiments and has been available for more than a decade. A large part of its appeal is the ease with which new functionality and refinements can be added as new applications arise.

This paper, led by Matthew Ritchie from the Molecular Medicine Division of the Walter and Eliza Hall Institute of Medical Research in Melbourne, Australia, is presented as a review of the “philosophy and design” of the limma package, looking at its recent and historical features and enhancements.

The journal, Nucleic Acids Research , while ranked outside the top 10 of Google Scholar's most influential journals, has more papers with 3,000+ citations each than The Lancet (ranked 4th).

7. “ Mastering the game of Go with deep neural networks and tree search ” (2016) Nature 8,209 citations

Viewed as one of the most challenging classic games to master, Go is a 2,500-year-old game that will put any player – living or otherwise – through their paces.

In 2016, a computer program called AlphaGo defeated the world Go champion, Lee Sedol, in what would be hailed as a major milestone for AI technology. AlphaGo was the brainchild of computer scientist David Silver, who began working on computer Go as a PhD student at the University of Alberta in Canada.

This paper, co-led by David Silver and Aja Huang, today both research scientists at Google DeepMind, describes the technology that underpins AlphaGo. It is the third-most highly cited paper in Nature , according to Google Scholar.

In 2017, the team introduced AlphaGo Zero, which improves on previous iterations by using a single neural network, rather than two, to evaluate which sequence of moves is most likely to win. That paper is the eighth-most cited in Nature .

Computer Graphics: Recently Published Documents


Beyond categories: dynamic qualitative analysis of visuospatial representation in arithmetic

Visuospatial representations of numbers and their relationships are widely used in mathematics education. These include drawn images, models constructed with concrete manipulatives, enactive/embodied forms, computer graphics, and more. This paper addresses the analytical limitations and ethical implications of methodologies that use broad categorizations of representations and argues the benefits of dynamic qualitative analysis of arithmetical-representational strategy across multiple semi-independent aspects of display, calculation, and interaction. It proposes an alternative methodological approach combining the structured organization of classification with the detailed nuance of description and describes a systematic but flexible framework for analysing nonstandard visuospatial representations of early arithmetic. This approach is intended for use by researchers or practitioners, for interpretation of multimodal and nonstandard visuospatial representations, and for identification of small differences in learners’ developing arithmetical-representational strategies, including changes over time. Application is illustrated using selected data from a microanalytic study of struggling students’ multiplication and division in scenario tasks.

IEEE Transactions on Visualization and Computer Graphics

Elements of the methodology of teaching vector graphics based on the free graphic editor LibreOffice Draw at the level of basic general education

The article presents the methodology for teaching the theme "Creation and editing of vector graphic information" in basic school, which can be implemented both in full-time education and using distance learning technologies. The methodology is based on the use of the free vector graphic editor LibreOffice Draw and has been tested over several years in teaching vector computer graphics in the seventh-grade informatics course in full-time education, as well as in a distance learning format in 2020. The authors substantiate the need to develop universal methods of teaching information technologies that are insensitive to the form of education (full-time or using distance educational technologies) based on the use of free software. Some principles of constructing a methodology for teaching vector graphics based on the new Federal State Educational Standard of Basic General Education are formulated. As the basic operating system used by the teacher, the domestic free operating system "Alt Education 9" is proposed. The article substantiates the choice of the graphic editor LibreOffice Draw as the optimal software tool to support teaching vector graphics in basic school and formulates the criteria for choosing LibreOffice Draw as a basic tool for studying computer graphics in grades 6–9 for the implementation of distance learning. A universal scheme for the implementation of a distance lesson in teaching information technology, in particular vector graphics, based on the use of free cross-platform software is proposed.

The Mathematics of Smoothed Particle Hydrodynamics (SPH) Consistency

Since its inception, Smoothed Particle Hydrodynamics (SPH) has been widely employed as a numerical tool in different areas of science and engineering, and more recently in the animation of fluids for computer graphics applications. Although SPH is still in the process of experiencing continual theoretical and technical developments, the method has been improved over the years to overcome some shortcomings and deficiencies. Its widespread success is due to its simplicity, ease of implementation, and robustness in modeling complex systems. However, despite recent progress in consolidating its theoretical foundations, a long-standing key aspect of SPH is related to the loss of particle consistency, which affects its accuracy and convergence properties. In this paper, an overview of the mathematical aspects of SPH consistency is presented with a focus on the most recent developments.

Evaluation of the Results of Pedagogical Experiments and Tests of the Development of Design Competencies of Future Engineers with Computer Graphics

Graphic Design: Understanding the Application of Computer Graphics and Image Processing Technology in Graphic Design to Improve the Employment Rate of College Graduates

Illumination Space: A Feature Space for Radiance Maps

From red sunsets to blue skies, the natural world contains breathtaking scenery with complex lighting which many computer graphics applications strive to emulate. Achieving such realism is a computationally challenging task and requires proficiency with rendering software. To aid in this process, radiance maps (RM) are a convenient storage structure for representing the real world. In this form, it can be used to realistically illuminate synthetic objects or for backdrop replacement in chroma key compositing. An artist can also freely change a RM to another that better matches their desired lighting or background conditions. This motivates the need for a large collection of RMs such that an artist has a range of environmental conditions to choose from. Due to the practicality of RMs, databases of RMs have continually grown since its inception. However, a comprehensive collection of RMs is not useful without a method for searching through the collection. This thesis defines a semantic feature space that allows an artist to interactively browse through databases of RMs, with applications for both lighting and backdrop replacement in mind. The set of features are automatically extracted from the RMs in an offline pre-processing step, and are queried in real-time for browsing. Illumination features are defined to concisely describe lighting properties of a RM, allowing an artist to find a RM to illuminate their target scene. Texture features are used to describe visual elements of a RM, allowing an artist to search the database for reflective or backdrop properties for their target scene. A combination of the two sets of features allows an artist to search for RMs with desirable illumination effects which match the background environment.

THE DIFFUSENESS OF ILLUMINATION SUITABLE FOR REPRODUCING OBJECT SURFACE APPEARANCE USING COMPUTER GRAPHICS

The appearance of an object depends on its material, shape, and lighting. In particular, the diffuseness of the illumination has a significant effect on the appearance of material and surface texture. We investigated a diffuseness condition suitable for reproducing surface appearance using computer graphics. First, observers memorized the appearance and impression of objects by viewing pre-observation images rendered using various environment maps. Then they evaluated the appearance of the objects in test images rendered under different levels of diffuseness. As a result, moderate diffuseness conditions received a higher evaluation than low diffuseness conditions. This means that levels of diffuseness that are unfamiliar in daily life, whether low or very high, are unsuitable for reproducing a faithful and ideal surface appearance. However, the appearance of certain materials is difficult to memorize and evaluate. The results suggest that it is possible to define a diffuseness that adequately reproduces the appearance of an object using computer graphics.

The Pose-to-Pose Method for Creating the Islamic 3D Animation "Keutamaan Berbuka Puasa" ("The Virtues of Breaking the Fast")

Advances in computer graphics technology have made it easier to produce graphical works, one of which is 3D animation. In 3D animation production there is a central problem that commonly challenges animators: motion that looks rough or unnatural. Smooth, realistic movement can be achieved through many methods, one of which is the pose-to-pose method. The Islamic 3D animation entitled "Keutamaan Berbuka Puasa" largely consists of movements demonstrating the proper etiquette of breaking the fast in order to obtain its virtues. The animation was produced in the Blender software by applying the pose-to-pose method. As the outcome of this paper, the 3D animated film "Keutamaan Berbuka Puasa" is expected to exhibit high-quality motion through the pose-to-pose method and to provide both entertainment and good education.


OpenAI’s Sora video-generating model can render video games, too


OpenAI’s new — and first! — video-generating model, Sora, can pull off some genuinely impressive cinematographic feats. But the model’s even more capable than OpenAI initially made it out to be, at least judging by a technical paper published this evening.

The paper, titled “Video generation models as world simulators,” co-authored by a host of OpenAI researchers, peels back the curtains on key aspects of Sora’s architecture — for instance revealing that Sora can generate videos of an arbitrary resolution and aspect ratio (up to 1080p). Per the paper, Sora’s able to perform a range of image and video editing tasks, from creating looping videos to extending videos forwards or backwards in time to changing the background in an existing video.

But most intriguing to this writer is Sora’s ability to “simulate digital worlds,” as the OpenAI co-authors put it. In an experiment, OpenAI fed Sora prompts containing the word “Minecraft” and had it render a convincingly Minecraft-like HUD and game — and the game’s dynamics, including physics — while simultaneously controlling the player character.

OpenAI Sora can simulate Minecraft I guess. Maybe next generation game console will be "Sora box" and games are distributed as 2-3 paragraphs of text. pic.twitter.com/9BZUIoruOV — Andrew White (@andrewwhite01) February 16, 2024

So how’s Sora able to do this? Well, as observed by senior Nvidia researcher Jim Fan (via Quartz), Sora’s more of a “data-driven physics engine” than a creative tool. It’s not just generating a single photo or video, but determining the physics of each object in an environment — and rendering a photo or video (or interactive 3D world, as the case may be) based on these calculations.

“These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them,” the OpenAI co-authors write.

Now, Sora’s usual limitations apply in the video game domain. The model can’t accurately approximate the physics of basic interactions like glass shattering. And even with interactions it can model, Sora’s often inconsistent — for example, rendering a person eating a burger but failing to render bite marks.

Still, if I’m reading the paper correctly, it seems Sora could pave the way for more realistic — perhaps even photorealistic — procedurally generated games from text descriptions alone. That’s in equal parts exciting and terrifying (consider the deepfake implications, for one) — which is probably why OpenAI’s choosing to gate Sora behind a very limited access program for now.

Here’s hoping we learn more sooner rather than later.

OpenAI’s newest model Sora can generate videos — and they look decent



Brain-Computer Interface: Advancement and Challenges

M. F. Mridha

1 Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh; firoz@bubt.edu.bd (M.F.M.); [email protected] (S.C.D.); mdmkabi@gmail.com (M.M.K.); [email protected] (A.A.L.)

Sujoy Chandra Das

Muhammad Mohsin Kabir

Aklima Akter Lima

Md. Rashedul Islam

2 Department of Computer Science and Engineering, University of Asia Pacific, Dhaka 1216, Bangladesh

Yutaka Watanobe

3 Department of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu 965-8580, Japan; yutaka@u-aizu.ac.jp

Associated Data

There is no statement regarding the data.

Brain-Computer Interface (BCI) is an advanced, multidisciplinary, active research domain based on neuroscience, signal processing, biomedical sensors, hardware, etc. Over the last decades, a great deal of groundbreaking research has been conducted in this domain, yet no review has covered the BCI domain comprehensively. Hence, a comprehensive overview of the BCI domain is presented in this study. The study covers several applications of BCI and upholds the significance of the domain. Each element of BCI systems, including techniques, datasets, feature extraction methods, evaluation metrics, existing BCI algorithms, and classifiers, is then explained concisely. In addition, a brief overview of the technologies and hardware, mostly sensors, used in BCI is appended. Finally, the paper investigates several unsolved challenges of BCI and explains them with possible solutions.

1. Introduction

The quest for direct communication between a person and a computer has always been an attractive topic for scientists and researchers. The Brain-Computer Interface (BCI) system directly connects the human brain and the outside environment. The BCI is a real-time brain-machine interface that interacts with external devices. A BCI system employs the user’s brain activity signals as a medium for communication between the person and the computer, translated into the required output. It enables users to operate external devices that are not controlled by peripheral nerves or muscles via brain activity.

BCI has always been a fascinating domain for researchers. Recently, it has become a compelling area of scientific inquiry and a possible means of providing a direct connection between the brain and technology. Many research and development projects have implemented this concept, and it has become one of the fastest expanding fields of scientific inquiry. Many scientists have tried various communication methods between humans and computers in different BCI forms. The field has progressed from a simple concept in the early days of digital technology to the extremely complex signal recognition, recording, and analysis techniques of today. In 1929, Hans Berger [ 1 ] became the first person to record an Electroencephalogram (EEG) [ 2 ], which shows the electrical activity of the brain measured through the scalp. The author tried it on a boy with a brain tumor; since then, EEG signals have been used clinically to identify brain disorders. Vidal [ 3 ] made the first effort to communicate between a human and a computer using EEG in 1973, coining the phrase “Brain-Computer Interface”. The author listed all of the components required to construct a functional BCI. He set up an experiment room separated from the control and computer rooms. The experiment room required three screens; the subject’s EEG was sent to an amplifier the size of an entire desk in the control area, which included two more screens and a printer.

The concept of combining brains and technology has constantly stimulated people’s interest, and it has become a reality because of recent advancements in neurology and engineering, which have opened the pathway to repairing and possibly enhancing human physical and mental capacities. The sector in which BCI is flourishing most is medical applications. Cochlear implants [ 4 ] for the deaf and deep brain stimulation for Parkinson’s disease are examples of medical uses becoming more prevalent. In addition to these medical applications, security, lie detection, alertness monitoring, telepresence, gaming, education, art, and human enhancement are just a few uses for brain–computer interfaces (BCIs), also known as brain–machine interfaces or BMIs [ 5 ]. Every BCI-based application follows different approaches and methods, and each method has its own set of benefits and drawbacks. The degree to which performance can be enhanced while minute-to-minute and day-to-day variability is reduced is crucial for the future of BCI technology. Such advancements rely on the capacity to systematically evaluate and contrast different BCI techniques, allowing the most promising approaches to be discovered. The versatility of BCI technologies across sectors and applications can seem complex, yet most BCI applications follow a standard structure consisting of signal acquisition, pre-processing, feature extraction, classification, and control of the devices. Signal acquisition paves the way to connecting a brain and a computer and to gathering knowledge from signals. Pre-processing, feature extraction, and classification are responsible for making the acquired signal more usable. Lastly, control of the devices addresses the primary motivation: to use the signals in an application, prosthetic, etc.

The outstanding compatibility of various methods and procedures in BCI systems demands extensive research. A few research studies on specific features of BCI have already been conducted. Given all of this BCI research, a comprehensive survey is now necessary. Therefore, an extensive survey analysis was attempted, focused on the nine review papers featured in this study. Most surveys, however, do not address contemporary trends and applications, nor the purpose and limits of BCI methods. An overview and comparison of the known literature reviews on BCI is shown in Table 1 .

A summary of recent surveys/reviews on various BCI technologies, signals, algorithms, classifiers, etc.

Abiri, R. et al. [ 6 ] reviewed various EEG-based experimental paradigms used by BCI systems. For each experimental paradigm, the researchers examined different EEG decoding algorithms and classification methods. They overviewed paradigms such as motor imagery, body kinematics, visual P300, evoked potentials, error-related potentials, and hybrid paradigms, analyzed together with the classification methods and their applications. Researchers have already faced some severe issues while exploring BCI paradigms, including training time and fatigue; signal processing and novel decoders; shared control to supervisory control in closed-loop; etc. Tiwari, N. et al. [ 7 ] provided a complete assessment of the evolution of BCI and a fundamental introduction to brain functioning, offering a comprehensive review of the anatomy of the human brain; BCI and its phases; the methods for extracting signals; and the algorithms for putting the extracted information to use. The authors explained the steps of BCI, which consist of signal acquisition, feature extraction, and signal classification. As the human brain is complex, human-generated thoughts are non-stationary and the generated signals are nonlinear; the challenge is thus to develop a system that finds deeper insights in the human brain, with which BCI applications would perform better. Vasiljevic, G.A.M. et al. [ 8 ] presented a Systematic Literature Review (SLR) of BCI games employing consumer-grade devices. The authors analyzed the collected data to provide a comprehensive picture of the existing reality and obstacles for HCI in BCI-based games utilizing consumer-grade equipment. According to their observations, numerous games with more straightforward commands were designed for research objectives, and there is a growing number of more user-friendly BCI games, particularly for recreation; however, that study is limited to its search and classification process. Martini, M.L. et al. [ 9 ] investigated existing BCI sensory modalities to convey perspectives as the technology improves. The sensor element of a BCI circuit determines the quality of brain pattern recognition, and numerous sensor modalities, generally either electrode-based or functional neuroimaging-based, are presently used for system applications. Sensors differ significantly in their inherent spatial and temporal capabilities, along with practical considerations such as invasiveness, mobility, and maintenance. Bablani, A. et al. [ 10 ] examined brain responses utilizing invasive and noninvasive acquisition techniques, including electrocorticography (ECoG), electroencephalography (EEG), magnetoencephalography (MEG), and magnetic resonance imaging (MRI). For operating any application, such responses must be interpreted utilizing machine learning and pattern recognition technologies. A short analysis of the existing feature extraction techniques and classification algorithms applicable to brain data was presented in their study.

Fleury, M. et al. [ 11 ] described various haptic interface paradigms, including SMR, P300, and SSSEP, and approaches for designing relevant haptic systems. The researchers found significant trends in utilizing haptics in BCIs and neurofeedback (NF) and evaluated various solutions. Haptic interfaces could improve productivity and the relevance of the feedback delivered, especially in motor restoration using the SMR paradigm. Torres, E.P. et al. [ 12 ] conducted an overview of the relevant research literature from 2015 to 2020, providing trends and a comparison of the methods used in new implementations from a BCI perspective. An explanation of datasets, emotion elicitation methods, feature extraction and selection, classification algorithms, and performance evaluation is presented. Zhang, X. et al. [ 13 ] discussed the classification of noninvasive brain signals and the fundamentals of deep learning algorithms. That study gives an overview of brain signals and deep learning approaches to help users understand BCI research, presenting the prominent deep learning techniques and cutting-edge models for brain signals, together with specific ideas for selecting the best deep learning models. Gu, X. et al. [ 14 ] investigated the most recent research on EEG signal detection technologies and computational intelligence methodologies in BCI systems, filling the gaps left by the five-year systematic review (2015–2019). The authors demonstrated sophisticated signal detection and augmentation technologies for collecting and cleaning EEG signals. They also exhibited computational intelligence techniques, such as interpretable fuzzy models, transfer learning, deep learning, and combinations thereof, for monitoring, maintaining, or tracking human cognitive states and the results of operations in typical applications.

Since we analyze BCI in detail in this literature review, the study necessitated a compendium of scholarly work covering 1970 to 2021. We concentrated on the empirical literature on BCI from 2000 to 2021 and, for historical purposes such as the invention of BCI systems and their techniques, selected some publications from before 2000. Kitchenham [ 15 , 16 ] established the Systematic Literature Review (SLR) method, which is applied in this research and comprises three phases: organizing, executing, and documenting the review. The SLR methodology attempts to address all possible questions that could arise as the current research progresses, and the purpose of this study is to examine the findings of numerous key research areas. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were used to put together the essential materials for this study, consisting of four parts: identification, scanning, eligibility testing, and inclusion. We gathered 577 papers from a variety of sources and weeded out duplicates and similar articles. Finally, we carefully chose 361 articles and sources for monitoring and review. The PRISMA process is presented in Figure 1 .

Figure 1. The PRISMA process followed in this article.

This research also looks at the present challenges and difficulties in the BCI field and generates ideas and suggestions for future research subjects. The contributions of this research are as follows:

  • The paper explicitly illustrates the Brain-Computer Interface’s (BCI) past, present, and future trends and technologies.
  • The paper presents a taxonomy of BCI and elaborates on a few traditional BCI systems with workflow and architectural concepts.
  • The paper investigates some BCI tools and datasets. The datasets are also classified by BCI research domain.
  • In addition, the paper demonstrates the applications of BCI, explores a few unsolved challenges, and analyzes the opportunities.

After reading this section, one should understand BCI and how to get started with it. Our motivation to work with BCI started from a desire to learn more about this domain. Furthermore, BCI has a bright future ahead of it, as it has a lot to offer in the medical field and in everyday life. BCI can compensate for disabilities and can make life and work easier, as detailed in the following section. The applications, problems, future, and social consequences of BCI have also fueled our enthusiasm for this research.

The remainder of the paper is constructed as follows. The motivation of this work and diverse applications of BCI systems are illustrated in Section 2 . Section 3 describes the structure of BCI and briefly reviews the most popular techniques of BCI. Section 4 categorizes the brain control signals used in BCI. In Section 5 , different categories of publicly available datasets are displayed. Section 6 reviews signal preprocessing and signal enhancement methods. In Section 7 , the most widely used feature extraction methods of BCI are discussed. The most commonly known classifiers are reviewed in Section 8 . A broad discussion on the evaluation metrics for BCI is given in Section 9 . The challenges faced most commonly during the BCI process are reviewed in Section 10 . Lastly, this paper provides a conclusion in Section 11 .

2. Applications of BCI

BCIs may be used for various purposes, and the application determines the design of a BCI. According to Nijholt [ 17 ], applications based on BCI have two modes of usability: one issues commands, while the other observes or monitors. The majority of command applications concentrate on manipulating brain impulses using electrodes to control an external device. On the other hand, applications that involve observation focus on recognizing a subject’s mental and emotional state to behave appropriately depending on their surroundings. Some applications of BCI [ 18 ] based on usability are described below:

2.1. Biomedical Applications

The majority of BCI integrations and research have been focused on medical applications, with many BCIs aiming to replace or restore Central Nervous System (CNS) functioning lost to illness or injury. Other BCIs are more narrowly targeted at diagnostic applications, or at treatment and motor rehabilitation following CNS disease or trauma; BCIs for biomedical purposes are also employed in affective application domains. Biomedical technologies and applications can minimize extended periods of sickness, can provide supervision and protection by empowering persons with mobility difficulties, and can support their rehabilitation. The necessity to build accurate technology that can cope with potentially abnormal brain responses that might occur due to diseases such as stroke is a significant challenge in developing such platforms [ 19 ]. The following subsections go through each of these applications in further detail.

2.1.1. Substitute to CNS

Substitution means repairing or replacing CNS functioning lost due to conditions such as paralysis and spinal cord injury caused by stroke or trauma. Individuals with such illnesses might suffer from altered brain function, so developing such technology can be difficult. Myoelectric control, based on motor action potentials that capture electrical impulses in muscles, is now used in several robotic prosthetics. Bousseta, R. et al. [ 20 ] presented an experimental technology for controlling the movement of a robotic prosthetic arm with mental imagery and cognitive tasks, which can move in four directions: left, right, up, and down.

2.1.2. Assessment and Diagnosis

The usage of BCIs in a clinical context can also help with assessment and diagnosis. Perales [ 21 ] suggested a BCI for assessing the attention of youngsters with cerebral palsy while playing games. Another study [ 22 ] looked into using BCI to capture EEG characteristics as a tool for diagnosing schizophrenia. There are also various diagnostic applications, such as the detection of brain tumors [ 23 ], the identification of breast cancer [ 24 ], and Parkinson’s disease [ 25 ]. Diagnoses of several diseases in children, including epilepsy, neurodegenerative disorders, motor disabilities, inattentiveness, and different types of ADHD [ 26 ], are possible. Assessment and diagnosis technologies are essential to patient well-being. Their functioning must be fine-tuned to guarantee that they are safe, acceptable, and accurate to industry standards.

2.1.3. Therapy or Rehabilitation

BCI is nowadays also used in therapeutic applications beyond neurological applications and prosthetics. Among the many applications, post-stroke motor rehabilitation shows promising results using BCI. Stroke causes long-term disability and hampers all kinds of motor activity due to an impediment of blood flow. Stroke rehabilitation applications promise to aid these activities or user imaginations through a robot or other machinery [ 27 , 28 , 29 ]. Other applications treat neurological disorders such as Parkinson’s disease (PD), cluster headaches, tinnitus, etc. Deep Brain Stimulation (DBS) is an established treatment for PD, as it delivers electrical impulses to a targeted area of the brain responsible for the symptoms [ 30 ]. Some stimulation BCI devices are used to induce calmness during migraine attacks and cluster headaches. Treatment of tinnitus, a CNS disorder, is also in development, by identifying brain patterns that are changed due to the disease [ 31 ]. Lastly, treatment for auditory verbal hallucinations (AVHs), best known from schizophrenia, is a possibility besides diagnosis [ 32 , 33 ].

2.1.4. Affective Computing

Users’ emotions and state of mind are observed in affective computing BCIs, with the possibility of altering their surrounding environment to improve or change that emotion. Ehrlich, S. et al. [ 34 ] created a closed-loop system in which music is generated and then replayed to listeners based on their emotional state. Human emotional states and sensory connections can be studied with BCI-related devices. Patients suffering from neurological diseases can also benefit from affective computing to help them convey their feelings to others [ 35 ].

2.2. Non-Biomedical Applications

BCI technologies have shown economic promise in recent years, notably in the field of non-biomedical applications. Most of these applications consist of entertainment applications, games, and emotional computation. Whereas researchers focus on robustness and high efficiency in medical and military applications, innovations targeted at leisure or lifestyle demand a greater emphasis on enjoyment and social elements. The most challenging aspect of an entertainment application is that it must be a user favorite to be commercially successful. Some of the most popular forms of amusement are as follows:

2.2.1. Gaming

BCIs focused on the gaming sector have grown in importance as a research topic. However, gaming BCIs are currently a poor substitute for standard game control methods [ 36 ]. BCI in gaming is an area where further research is needed to make games more user-friendly. In some cases, EEG data make BCI games more usable and increase engagement: the system tracks each player’s enthusiasm level and activates dynamic difficulty adjustment (DDA) when the player’s excitement drops [ 37 ]. When developing such systems, fine-tuning the algorithms that regulate the game’s behavior is a big challenge. Other BCI-based games are not visually intense, and their graphics are not on par with the current generation. Despite these setbacks, there is an engaging future for adaptations of the P300-based Brain-Computer Interface for gaming [ 38 ], which is gaining popularity as such games are very flexible to play.

2.2.2. Industry

EEG-based BCIs can also be used in industrial robotics, increasing worker safety by keeping people away from potentially demanding jobs. These technologies could substitute the time-consuming button and joystick systems used to teach robots in industrial applications; they can detect when a person is too tired or ill to operate the machinery and can take the necessary precautions to avoid injury, such as stopping the machinery [ 38 ].

2.2.3. Artistic Application

The four types of artistic applications recognized by BCIs are passive, selective, direct, and collaborative. Passive artistic BCIs do not require active user input; they use the user’s brain activity to determine which pre-programmed responses to produce. In selective systems, users have some limited control over the process but are never in charge of the creative product. Direct artistic BCIs provide users with far more flexibility, generally allowing them to choose items from extensive menus, such as brush type, and to manage brush stroke movements [ 39 ]. Lastly, collaborative systems are controlled by multiple users [ 40 ].

2.2.4. Transport

BCI is used in transportation monitoring, which tracks awareness to assess driver weariness and to enhance airline pilot performance. When such technologies are utilized in critical applications, mistakes can be costly in terms of lives and the monetary obligations of the entities involved [ 41 , 42 ].

3. Structure of BCI

The BCI system operates as a closed-loop system: every action taken by the user is met with some feedback. For example, an imagined hand movement might result in a command that causes a robotic arm to move. This simple movement of the arm requires many processes behind it. It starts from the brain, one of our body’s largest and most complicated organs, made up of billions of neurons communicating across trillions of synapses. The processes from taking signals from the human brain to transforming them into a workable command are shown in Figure 2 and described below (a minimal code sketch follows the list):

  • Signal acquisition: In the case of BCI, this is the process of taking samples of signals that measure brain activity so that they can be turned into commands controlling a virtual or real-world application. The various BCI techniques for signal acquisition are described later.
  • Pre-processing: After signal acquisition, pre-processing of the signals is needed. In most cases, the signals collected from the brain are noisy and impaired with artifacts. This step cleans the noise and artifacts with different methods and filtering, which is why it is also named signal enhancement.
  • Feature extraction: The next stage is feature extraction, which involves analyzing the signal and extracting data. As the brain activity signal is complicated, it is hard to extract useful information just by inspecting it. It is thus necessary to employ processing algorithms that enable the extraction of features, such as a person’s intent.
  • Classification: The next step is to apply classification techniques to the artifact-free signal. Classification helps determine the type of mental task the person is performing or the person’s command.
  • Control of devices: The classification step sends a command to the feedback device or application. This may be a computer, where the signal is used to move a cursor, or a robotic arm, where the signal is used to move the arm.
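To make this closed loop concrete, the following minimal Python sketch wires the five stages together. All function names and the synthetic data are illustrative assumptions for this article, not part of any particular BCI toolkit.

```python
import numpy as np

# Illustrative sketch of the five-stage BCI loop (hypothetical names).

def acquire_signal(n_channels=8, n_samples=512, fs=256):
    """Stand-in for signal acquisition: returns a synthetic multichannel EEG epoch."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((n_channels, n_samples)), fs

def preprocess(eeg):
    """Signal enhancement: remove per-channel DC offset (a minimal stand-in for filtering)."""
    return eeg - eeg.mean(axis=1, keepdims=True)

def extract_features(eeg):
    """Feature extraction: per-channel signal power."""
    return (eeg ** 2).mean(axis=1)

def classify(features, threshold=1.0):
    """Classification: a trivial rule standing in for a trained classifier."""
    return "move_arm" if features.mean() > threshold else "rest"

def control_device(command):
    """Device control: forward the decoded command to the feedback device."""
    print(f"Device command: {command}")

eeg, fs = acquire_signal()
control_device(classify(extract_features(preprocess(eeg))))
```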

Figure 2. Basic architecture of a BCI system.

The basic architecture of the BCI system was explained in the preceding section. It prompts us to investigate the classification of BCI systems, which are classified based upon various techniques. The BCI techniques are discussed in the following parts.

From Figure 3 below, we can classify BCI along different aspects, such as dependability, invasiveness, and autonomy.

  • Dependability: BCI can be classified as dependent or independent. Dependent BCIs necessitate certain types of motor control from the operator or healthy subjects, such as gaze control. On the other hand, independent BCIs do not require the individual to exert any form of motor control; this type of BCI is appropriate for stroke patients or seriously disabled patients.
  • Invasiveness: BCI is also classified into three types according to invasiveness: invasive, partially invasive, and non-invasive. Invasive BCIs are by far the most accurate, as they are implanted directly into the cortex, allowing researchers to monitor the activity of every neuron. Invasive varieties of BCI are inserted directly into the brain through neurosurgery. There are two types of invasive BCIs: single-unit BCIs, which detect signals from a single area of brain cells, and multi-unit BCIs, which detect signals from several areas. Semi-invasive BCIs use Electrocorticography (ECoG), a signal platform in which electrodes are placed on the exposed surface of the brain to detect electrical impulses originating from the cerebral cortex. Although this procedure is less intrusive, it still necessitates a surgical opening in the skull. Noninvasive BCIs use external sensing rather than brain implants. Electroencephalography (EEG), Magnetoencephalography (MEG), Positron emission tomography (PET), Functional magnetic resonance imaging (fMRI), and Functional near-infrared spectroscopy (fNIRS) are all noninvasive techniques used to analyze the brain. However, because of the low cost and portability of the gear, EEG is the most commonly used.
  • Autonomy: BCI can operate either in a synchronous or an asynchronous manner; time-dependent or time-independent interactions between the user and the system are possible. The system is known as synchronous BCI if the interaction is carried out within a particular amount of time in response to a cue supplied by the system. In asynchronous BCI, the subject can create a mental task at any time to engage with the system. Synchronous BCIs are less user-friendly than asynchronous BCIs; however, designing one is substantially easier than developing an asynchronous BCI.

Figure 3. The classification/taxonomy of the BCI system.

As the motive of this research work is to focus on advancements in BCI, the most advanced and most used techniques, categorized by invasiveness, are described in the following part. Based on invasiveness, BCI is classified into the three categories that are most familiar. In the subsequent sections, we address these three categories and describe them elaborately.

3.1. Invasive

Invasive types of BCI are inserted directly into the brain with neurosurgery. Invasive BCIs are the most accurate because they are implanted directly into the cortex, which allows every neuron’s activity to be tracked. Invasive BCI comes in two kinds of units: single-unit BCIs, which detect signals from a single location of brain cells, and multi-unit BCIs, which detect signals from numerous areas [ 43 ]. However, the neurosurgical procedure has various flaws, such as the possibility of scar tissue formation: the body responds to the foreign object by forming a scar around the electrodes, leading the signal to deteriorate. Since neurosurgery is a dangerous and costly procedure, invasive BCI is mainly used on blind and paralyzed patients.

3.2. Partially Invasive

Although this approach is not as intrusive, it still involves brain surgery. Electrocorticography (ECoG) is a type of partially invasive BCI monitoring system that places electrodes on the cortical surface of the brain to record signals of electrical activity. For example, blinking causes the brain to discharge electrical activity. When investigating signals, though, these involuntary actions are generally not of interest, since they get in the way of what we search for: they are a form of noise. ECoG signals are less affected by noise than non-invasive BCI signals, making interpretation easier [ 44 ].

Electrocorticography (ECoG)

Electrocorticography (ECoG) [ 45 ] is a partially invasive method that measures the brain’s electrical activity. The participant’s skull must be opened, and the electrodes are placed directly on the brain’s surface, beneath the skull. The spatial resolution of the recorded signals is considerably better than EEG, and the signal-to-noise ratio is superior owing to the closer proximity to cerebral activity. Furthermore, motion artifacts such as blinks and eye movements have a significantly lower impact on ECoG signals. However, ECoG is only helpful in the accessible brain area and is close to impossible to utilize outside of a surgical setting [ 46 ].

3.3. Noninvasive

Noninvasive neuroimaging technologies have also been used as interfaces in human research. Noninvasive EEG-based BCIs account for the vast bulk of published BCI research, and EEG-based noninvasive technologies and interfaces have been employed in a considerably more comprehensive range of applications. Noninvasive applications and technologies have become increasingly popular in recent years since they do not require any brain surgery. In the noninvasive mode, a headpiece or helmet-like set of electrodes is utilized outside the skull to measure the signal caused by electrical activity in the brain. There are some well-known and widely used ways of measuring these electrical activities or potentials, such as Electroencephalography (EEG), Magnetoencephalography (MEG), Functional Magnetic Resonance Imaging (fMRI), Functional Near-Infrared Spectroscopy (fNIRS), and Positron Emission Tomography (PET). An elaborate description of these BCI techniques is given below:

3.3.1. Electroencephalography (EEG)

EEG monitors electrical activity on the scalp generated by the activation of the brain’s neurons. Several electrodes placed directly on the scalp, mainly over the cortex, are often used to record these electrical activities quickly. For its excellent temporal resolution, ease of use, safety, and affordability, EEG is the most used technology for capturing brain activity. Active electrodes and passive electrodes are the two types of electrodes that can be utilized. Active electrodes usually feature an integrated amplifier, whereas passive electrodes require an external amplifier to magnify the detected signals. The prime objective of implementing either embedded or external amplifiers is to lessen the impact of background noise and other signal weaknesses caused by cable movement. One of the issues with EEG is that it necessitates the use of gel or saline solutions to lower the resistance of skin-electrode contact. Furthermore, the signal quality is poor, and it is altered by background noise. The International 10–20 system [ 47 ] is often used to place electrodes over the scalp surface for recording purposes. EEG is generally described in terms of electrical activity across various frequency bands.
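As a concrete illustration of describing EEG by frequency band, the sketch below estimates band power from a synthetic single-channel signal using Welch’s method; the band limits and all parameters are conventional assumptions rather than fixed standards.

```python
import numpy as np
from scipy.signal import welch

fs = 256  # sampling rate in Hz (assumed)
t = np.arange(0, 4, 1 / fs)
# Synthetic single-channel "EEG": a 10 Hz alpha rhythm plus noise
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)

freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)

# Conventional EEG bands (approximate limits vary across the literature)
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
for name, (lo, hi) in bands.items():
    mask = (freqs >= lo) & (freqs < hi)
    power = np.trapz(psd[mask], freqs[mask])  # integrate PSD over the band
    print(f"{name}: {power:.3f}")
```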

3.3.2. Magnetoencephalography (MEG)

The magnetic fields created by current flow in the brain are measured using MEG (Magnetoencephalography). Electric fields are interrupted significantly more than magnetic fields when travelling through the skull, so MEG has superior spatial resolution to EEG. This functional neuroimaging technique is applied to measure and evaluate the brain’s magnetic field. MEG operates on the outside of the head and is now regularly part of clinical treatment. David Cohen [ 48 , 49 ] was the first to use it, in 1968, utilizing a copper induction coil detector inside a shielded chamber to reduce background noise. Improved MEG signals have since been produced using more sensitive sensors, such as superconducting quantum interference devices (SQUIDs) [ 50 ]. MEG has become significant, especially for patients with epilepsy and brain tumors, as it may aid in detecting regions of the brain with normal function in individuals with epilepsy, tumors, or other mass lesions. MEG operates with magnetic rather than electrical fields, so it can contribute information additional to EEG, and it is capable of capturing signals with high temporal and spatial resolution. To detect the tiny magnetic fields created by cerebral activity, the scanners must be close to the brain’s surface; as a result, specific sensors such as SQUID sensors are required for MEG [ 51 ].

3.3.3. Functional Magnetic Resonance Imaging (fMRI)

Noninvasive functional magnetic resonance imaging (fMRI) is used to evaluate the fluctuation in blood oxygen levels during brain activity. fMRI has excellent spatial resolution, which makes it ideal for identifying active areas of the brain [ 52 ]. The temporal resolution of fMRI is comparatively low, ranging from 1 to 2 s [ 53 ]. It is also sensitive to head movements, which can result in artifacts. Functional magnetic resonance imaging was developed in the 1990s. It is a noninvasive and safe technology that does not involve radiation, is simple to use, and has great spatial resolution. Hemoglobin in capillary red blood cells in the brain transports oxygen to the neurons, and blood flow increases as a result of the increased demand for oxygen. The magnetic properties of hemoglobin vary with its oxygenation. The MRI equipment, a cylindrical tube with a strong electromagnet, can determine which regions of the brain are activated because of this difference; that is how fMRI works. There is also a specific modality known as diffusion MRI, which generates images using the diffusion of water molecules. Diffusion-weighted and diffusion tensor imaging (DWI/DTI) facilitate this exploration of the microarchitecture of the brain. Diffusion-weighted magnetic resonance imaging (DWI or DW-MRI) renders picture contrast depending on variances in the degree of diffusion of water particles inside the brain. Diffusion depicts the stochastic thermal mobility of particles and is determined by several factors, including the particles under study, the temperature, and the microenvironmental structure in which the diffusion occurs [ 54 ]. Diffusion tensor imaging (DTI) investigates the three-dimensional form of the diffusion, also recognized as the diffusion tensor. It is a powerful MRI modality that produces directional knowledge about water motion in a voxel, exhibiting noninvasively microscopic tissue features that surpass the ability of other imaging methods [ 55 ].

3.3.4. Functional Near-Infrared Spectroscopy (fNIRS)

Infrared radiation is projected into the brain using fNIRS equipment [ 53 , 56 ] to monitor changes in specific wavelengths as the light is reflected. fNIRS detects changes in regional blood volume and oxygenation. When a particular area of the brain is working, it requires additional oxygen, which is delivered to the neurons via capillary red blood cells, increasing blood flow in the brain areas that are most active at a given time. fNIRS thus monitors variations in oxygen levels caused by various activities. As a result, images with a high spatial resolution (1 cm) but lower temporal resolution (>2–5 s) can be obtained, comparable with standard functional magnetic resonance imaging.

3.3.5. Positron Emission Tomography (PET)

PET (positron emission tomography) is a sophisticated imaging tool for examining brain activities in real time. It enables noninvasive measurement of cerebral blood flow, metabolism, and receptor binding in the brain. Due to the relatively high prices and complexity of the accompanying infrastructure, including cyclotrons, PET scanners, and radiochemistry laboratories, PET was previously only used in research. PET has been widely employed in clinical neurology in recent years due to technological improvements and the proliferation of PET scanners, bettering our understanding of disease etiology, helping in diagnosis, and monitoring disease progression and response to therapy [ 57 ]. PET tracers such as radiolabeled choline, fluciclovine (18F-FACBC), and compounds targeting prostate-specific membrane antigen are now being researched and explored to improve noninvasive prostate cancer localization diagnostic performance [ 58 ].

4. Brain Control Signals

The brain-computer interface (BCI) is based on signals acquired directly from the brain. Several of these signals are simple to extract, while others are more difficult and require additional preprocessing [ 53 ]. These control signals can be classified into one of three groups: (1) evoked signals, (2) spontaneous signals, and (3) hybrid signals. A detailed overview of the three categories is given below. The classification of control signals is shown in Figure 4 .

Figure 4. The basic architecture of BCI control signals.

4.1. Visual Evoked Potentials

Electrical potentials evoked by short visual stimuli are known as VEPs. The potentials over the visual cortex are monitored, and the waveforms are derived from the EEG. VEPs are generally used to assess the visual pathways from the eye to the brain’s visual cortex. Middendorf et al. published a procedure for measuring the position of the user’s gaze using VEPs in 2000 [ 59 ]. The user is confronted with a screen that displays several virtual buttons flashing at varied rates. The frequency of the photic driving response over the user’s visual cortex is determined after the user focuses their gaze on a button. Whenever the frequency of a shown button matches this frequency, the system concludes that the user wants to pick that button. Steady-State Evoked Potentials (SSEP) and P300 are two of the most widely used evoked signals. Evoked signals require external stimulation, which can be unpleasant, awkward, and exhausting for the individual.

4.1.1. Steady-State Evoked Potential (SSEP)

SSEP signals are produced when a subject perceives periodic stimuli such as a flickering picture, modulated sound, or even vibrations [ 60 , 61 ]. The power of the EEG signal grows at the stimulus frequency. Signals in many brain locations are observed, depending on the sensory process. SSEP signals of different forms, such as steady-state visual evoked potentials (SSVEPs), somatosensory SSEP, and auditory SSEP, are found. SSVEP is widely used in a variety of applications. These are normal brain reactions to repeating stimuli, which vary depending on the frequency with which they are presented. Although there are instances of BCI paradigms utilizing somatosensory (SSSEP) or auditory (SSAEP) stimuli, they are generally induced using visual stimuli (steady-state visually evoked potentials, SSVEP) [ 62 ].
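A minimal sketch of SSVEP target detection, under the assumption that the stimuli flicker at known candidate frequencies: the decoder simply picks the candidate whose FFT bin carries the most power. The frequencies and synthetic signal are illustrative.

```python
import numpy as np

fs = 256
t = np.arange(0, 2, 1 / fs)
candidates = [8.0, 10.0, 12.0, 15.0]  # assumed flicker frequencies (Hz)
# Synthetic EEG while the user attends the 12 Hz target
eeg = np.sin(2 * np.pi * 12.0 * t) + np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(eeg))
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# Score each candidate by the spectral magnitude at its frequency bin
scores = [spectrum[np.argmin(np.abs(freqs - f))] for f in candidates]
print("Selected target:", candidates[int(np.argmax(scores))], "Hz")
```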

4.1.2. P300 Evoked Potentials (P300)

The peaks in an EEG generated by infrequent visual, auditory, or somatosensory stimuli are known as P300 evoked potentials. P300-based BCI systems can be used without training. A matrix of symbols, in which selection depends on the participant’s gaze, is a prominent use of P300-based BCI systems. Such a signal is typically produced using an “odd-ball” paradigm: the user is asked to respond to a random succession of stimuli, one of which is less frequent than the others [ 63 ]. The P300 EEG waves are triggered when this unusual stimulus is significant to the person. P300 does not require any subject training; however, it does need repetitive stimulation, which may tire the subject and may cause inconsistencies.
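Because a single-trial P300 response is buried in background EEG, odd-ball paradigms usually average many time-locked epochs. The hypothetical sketch below demonstrates this on synthetic data, with the P300 modeled as a Gaussian bump near 300 ms; all parameters are illustrative assumptions.

```python
import numpy as np

fs, n_trials, epoch_len = 256, 40, 256  # illustrative parameters
rng = np.random.default_rng(1)
t = np.arange(epoch_len) / fs

# Synthetic epochs: rare "target" trials contain a positive deflection near 300 ms
p300 = np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))  # Gaussian bump at 300 ms
targets = np.array([p300 + rng.standard_normal(epoch_len) for _ in range(n_trials)])
nontargets = rng.standard_normal((n_trials, epoch_len))

# Averaging across time-locked trials suppresses noise and reveals the P300
avg_target, avg_nontarget = targets.mean(axis=0), nontargets.mean(axis=0)
peak_ms = 1000 * t[np.argmax(avg_target)]
print(f"Average target ERP peaks at ~{peak_ms:.0f} ms")
```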

4.2. Spontaneous Signals

The person produces spontaneous signals voluntarily, with no external cues. These signals are produced without any external stimuli (somatosensory, aural, or visual). Motor and sensorimotor rhythms, Slow Cortical Potentials (SCPs), and non-motor cognitive signals are some of the most prominent spontaneous signals [ 53 ].

4.2.1. Motor and Sensorimotor Rhythms

Motor activities are linked to motor and sensorimotor rhythms. Sensorimotor rhythms are rhythmic oscillations in electrophysiological brain activity in the mu (Rolandic band, 7–13 Hz) and beta (13–30 Hz) frequencies. Motor imagery is the process of converting a participant’s motor intentions into control signals using motor imagery conditions [ 64 ]. Imagined left-hand motion, for instance, may result in a decrease in the EEG signal in the mu (8–12 Hz) and beta (18–26 Hz) rhythms over certain motor cortex areas. Depending on the motor imagery rhythms, various applications can be controlled, such as moving a mouse cursor or playing a game.

4.2.2. Slow Cortical Potentials (SCP)

SCP is an EEG signal with a frequency of less than 1 Hz [ 65 ]. It is a low-frequency potential observed in the frontal and central portions of the cortex, reflecting depolarization-level variations in the cortical dendrites. SCP is a highly gradual change in brain activity, either positive or negative, that can last from milliseconds to several seconds. Through operant conditioning, the subject can control the movement of such signals. As a result, extensive training may be required, in addition to that needed for motor rhythms. Many studies no longer choose SCP, and motor and sensorimotor rhythms have taken its place.

4.2.3. Non-Motor Cognitive Tasks

Cognitive tasks are utilized to drive the BCI in non-motor cognitive paradigms. Several tasks, such as musical imagination, visual counting, mental rotation, and mathematical computation, might be performed [ 66 ]. Penny, W.D. et al. [ 67 ] used a pattern classifier with uncertain parameters; the individual performed simple subtraction as one of the non-motor cognitive activities.

4.3. Hybrid Signals

The term “hybrid signals” refers to the utilization of a mixture of brain-generated signals for control. Instead of measuring and using only one signal in the BCI system, a mix of signals is used. The fundamental goal of using two or more types of brain signals as input to a BCI system is to increase dependability while avoiding the drawbacks of each signal type [ 68 ].

Some research classifies brain signals into two categories [ 10 ]: event-related potentials and evoked brain potentials. Evoked brain potentials are organized into three varieties: Visual Evoked Potential (VEP), Tactile Evoked Potential (TEP), and Auditory Evoked Potential (AEP) [ 69 ].

5. Datasets

While analyzing the literature on BCI systems, we discovered various often-used datasets that researchers employed while implementing these techniques. In the research, EEG is now the most frequent method for collecting brain data in BCI; as it is a noninvasive method with convenient handling, an EEG signal is used for most datasets. However, for a variety of reasons, EEG does not provide an effortless method of data collection. It requires a number of fixed arrangements to acquire the data. First, signals must be acquired and stored from subjects, participants, or patients; even a single subject requires the same arrangement as multiple subjects. After the subjects are prepared, the electrodes (gear mounted on the scalp) are attached to the individuals to capture and measure data. This data collection lasts for several sessions, with the particular recording period determined by the work’s purpose. The data saved in these sessions and recordings are primarily brain signals measured in response to a particular stimulus, such as a video or a picture. EEG signals differ from one participant to the next and from one session to the next.

In this section, the datasets as well as the subjects, electrodes, channels, and sessions are described. The explanation is tabulated in Table 2 , Table 3 , Table 4 , Table 5 , Table 6 , Table 7 and Table 8 . In Table 2 , some popular motor imagery datasets are illustrated. The most beneficial option for creating BCIs is motor imagery (MI) impulses captured via EEG, which offers a great degree of mobility: it enables people with motor disabilities to communicate with the device by envisioning motor movements generated from the motor cortex, without any external stimuli. A few datasets based on error-related potentials (ErrPs) are exhibited in Table 3 . These EEG datasets utilize a P300-based BCI speller to boost the performance of BCIs; detecting and fixing errors in the neuronal signature of a user’s knowledge linked to a brain pattern is known as error-related potentials (ErrPs). Affective computing improves human–machine communication by identifying human emotions, and some widely used emotion recognition datasets are shown in Table 4 . Various EEG-based BCI devices can detect the user’s emotional states to make contact effortless, more usable, and practical. The emotions extracted in emotion-recognition datasets are valence, arousal, calm, positive, exciting, happy, sad, neutral, and fear. In addition, some datasets do not fit a single category; such miscellaneous datasets include memory signals, brain images, brain signals, etc., and are represented in Table 5 . In EEG-based BCI, the signals can also capture eye movement; datasets of eye blinks or movements, including voluntary and involuntary eye states, blinks, and activities, are illustrated in Table 6 . Subsequently, the electrical response in the brain to a specific motor or cognitive event, such as a stimulus, is known as an event-related potential (ERP); an unwanted sound, a sparking light, or a blinking eye can be examples of such stimuli. BCIs utilizing ERPs attempt to track attention, weariness, and the brain’s reaction to event-related stimuli. Popular ERP datasets are summarized in Table 7 .

Moreover, the visual information-processing mechanism in the brain is reflected in Visually Evoked Potentials (VEPs). Flashing objects in the form of shifting colors or a reversing grid are frequent visual stimulators. A CRT/LCD monitor or flash tube/LED is utilized for stimulus display in VEP-based BCIs. Frequently used VEP-based datasets with these utilized objects are represented in Table 8 .

A table of different types of motor imagery datasets of BCI.

A table of different types of Error-Related Potential (ErrP) datasets of BCI.

A table of different types of emotion recognition datasets of BCI.

A table of different types of miscellaneous datasets.

A table of different types of eye-blink or movement datasets in BCI.

A table of different types of Event-Related Potential (ERP) datasets in BCI. These datasets are collected from [ 229 ].

A table of different types of Visually Evoked Potential (VEP) datasets in BCI. These datasets are collected from [ 229 ].

The datasets above cover information recorded since the beginning of BCI. To extract information from them, signal enhancement and feature extraction methods are necessary, which are reviewed in the following sections.

6. Signal Preprocessing and Signal Enhancement

In most situations, the signal or data measured or extracted from datasets are filled with noise. Natural human activity such as eye blinks and heartbeats can make the collected data noisy. These noises are eliminated during the pre-processing step to produce clean data that can subsequently be passed to feature extraction and classification. This pre-processing unit is also known as signal enhancement, since it cleans the signal in BCI. Several methods are used for signal enhancement in BCI systems, and these are explained elaborately in the following subsections.

6.1. Independent Component Analysis (ICA)

The noise and EEG signals are isolated in ICA by treating them as distinct entities, and the data are retained during the removal of noise. This method divides the EEG data into spatially fixed and temporally independent components. ICA is efficient both computationally and at isolating demonstrable noise [ 256 ].
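A minimal sketch of ICA-based artifact removal, using scikit-learn’s FastICA as one possible implementation; the two synthetic sources, the mixing matrix, and the choice of which component to zero are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n_samples = 2000
t = np.linspace(0, 8, n_samples)

# Two synthetic sources: an "EEG-like" rhythm and an eye-blink-like artifact
source_eeg = np.sin(2 * np.pi * 10 * t)
source_blink = (np.abs(np.sin(2 * np.pi * 0.5 * t)) > 0.99).astype(float)
S = np.c_[source_eeg, source_blink]

# Mix the sources into two "channels", as electrodes would record them
A = np.array([[1.0, 0.8], [0.6, 1.0]])
X = S @ A.T

# Unmix with FastICA; the artifact component can then be zeroed before reconstruction
ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(X)   # estimated independent components
components[:, 1] = 0                # suppose component 1 is the blink artifact
X_clean = ica.inverse_transform(components)
print(X_clean.shape)
```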

6.2. Common Average Reference (CAR)

CAR is one of the most commonly employed basic spatial filtering techniques. This approach decreases noise across all recorded channels, but it does not address channel-specific noise and may inject noise into an otherwise clean channel. It is a spatial filter that can be thought of as the subtraction of shared EEG activity, retaining only the activity specific to each EEG electrode [ 256 ].
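CAR reduces to a one-line operation on a channels-by-samples array, as in this small sketch (synthetic data assumed):

```python
import numpy as np

def common_average_reference(eeg):
    """Subtract the instantaneous mean across channels from every channel.

    eeg: array of shape (n_channels, n_samples).
    """
    return eeg - eeg.mean(axis=0, keepdims=True)

eeg = np.random.randn(32, 1024)            # synthetic 32-channel recording
car = common_average_reference(eeg)
print(np.allclose(car.mean(axis=0), 0))    # channel average is now ~zero everywhere
```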

6.3. Adaptive Filters

The adaptive filter is a computational device for mathematical signal processing that relates the filter’s input and output signals iteratively. Its filter coefficients are self-adjusted by an adaptive algorithm: the filter works by altering signal properties depending on the characteristics of the signals under investigation [ 257 ].
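One common realization of such a filter is the least-mean-squares (LMS) algorithm. The sketch below adapts FIR coefficients so a reference input tracks the noise in the primary channel; treating a clean sinusoid as the noise reference is an assumption for illustration (in practice, an EOG or other reference electrode might play this role).

```python
import numpy as np

def lms_filter(noisy, reference, n_taps=8, mu=0.01):
    """LMS noise canceller: adapts FIR coefficients so the filtered
    reference tracks the noise in `noisy`; returns the cleaned signal."""
    w = np.zeros(n_taps)
    cleaned = np.zeros_like(noisy)
    for n in range(n_taps, len(noisy)):
        x = reference[n - n_taps:n][::-1]   # most recent reference samples
        y = w @ x                           # estimated noise at time n
        e = noisy[n] - y                    # error = cleaned output sample
        w += 2 * mu * e * x                 # coefficient update
        cleaned[n] = e
    return cleaned

fs = 256
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 10 * t)                 # desired EEG rhythm
artifact = 0.8 * np.sin(2 * np.pi * 1 * t + 0.4)    # e.g., an ocular artifact
noisy = signal + artifact
reference = np.sin(2 * np.pi * 1 * t)               # correlated noise reference
cleaned = lms_filter(noisy, reference)
# After convergence, `cleaned` should be closer to `signal` than `noisy` is
```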

6.4. Principal Component Analysis (PCA)

PCA is a technique for detecting patterns in data, represented by a rotation of the coordinate axes. These axes are not aligned with single time points; rather, they depict signal patterns as linear combinations of sets of time points. PCA keeps the axes orthogonal while rotating them to maximize the variance along the first axis. It reduces feature dimensions and aids data classification by ranking components. In comparison with ICA, PCA compresses data better, whether or not noise has been eliminated [ 258 ].
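A minimal sketch of PCA as a feature-dimension reducer, using scikit-learn; the epoch shapes and component count are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative: 100 EEG epochs, each flattened to a 32-channel x 64-sample vector
X = np.random.randn(100, 32 * 64)

pca = PCA(n_components=10)        # keep the 10 highest-variance orthogonal axes
X_reduced = pca.fit_transform(X)  # ranked components for downstream classification
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```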

6.5. Surface Laplacian (SL)

SL refers to a method of displaying EEG data with high spatial resolution. SL can be computed with any EEG recording reference scheme, as its estimates are reference-free. Based on the volume conductor’s exterior shape, it is a general estimate of the current density entering or exiting the scalp through the skull, and it does not require volume conduction details. The advantage of SL is that it improves the spatial resolution of the EEG signal, and it does not demand additional operative neuroanatomy premises; however, it is sensitive to spline patterns and artifacts [ 259 ].

6.6. Signal De-Noising

Artifacts frequently corrupt EEG signals taken from the brain. These artifacts must be removed from EEG data to obtain valuable information. The technique of eliminating noise or artifacts from EEG signals is known as de-noising [ 260 ]. Some de-noising methods are given below:

  • Wavelet de-noising and thresholding: Multi-resolution analysis is used to transfer the EEG signal to the discrete wavelet domain, and a fixed or adaptive threshold is used to shrink the particular coefficients associated with the noise [ 261 ]. In a well-matched wavelet representation, smaller coefficients tend to describe noise characteristics across time and scale, so threshold selection is one of the most critical aspects of successful wavelet de-noising. Thresholding isolates the signal from the noise, and thresholding approaches come in several shapes and sizes: in hard thresholding, all coefficients beneath a predetermined threshold value are set to zero, whereas soft thresholding additionally shrinks the remaining coefficients toward zero [ 262 ]. A code sketch of this procedure follows the list.
  • Empirical mode decomposition (EMD): EMD is a signal analysis algorithm for multivariate signals. It breaks the signal down into a series of frequency- and amplitude-regulated zero-mean signals, widely known as intrinsic mode functions (IMFs), and is comparable with wavelet decomposition in that it decomposes a signal into multiple components. It extracts these IMFs using a sifting method. An IMF is a function with a single extremum between zero crossings and a mean value of zero. After the IMFs are extracted, a residue remains; the IMFs are sufficient to characterize a signal [ 263 ].
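Below is a minimal sketch of wavelet de-noising with soft thresholding, using the PyWavelets package; the 'db4' wavelet, decomposition level, MAD noise estimate, and universal threshold are conventional choices assumed for illustration.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Soft-threshold detail coefficients using the universal threshold."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise scale estimated from the finest detail coefficients (MAD estimator)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))  # universal threshold
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

fs = 256
t = np.arange(0, 2, 1 / fs)
noisy = np.sin(2 * np.pi * 5 * t) + 0.4 * np.random.randn(t.size)
clean = wavelet_denoise(noisy)
```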

Most of the datasets mentioned in the previous section are part of various BCI paradigms and follow these signal enhancement techniques as well. The motor imagery datasets represent paradigms such as sensorimotor activity or rhythms. In addition, error-related potentials datasets and datasets such as event-related potentials or visually evoked potentials signify their own BCI paradigms. Some other paradigms, such as overt attention, eye movement, miscellaneous, and emotion recognition, identify their datasets. Indeed, these paradigms grow in number as the measurement of different brain activities and emotions is attempted. More than 100 BCI designs use signal enhancement techniques before extracting features from the signal. In particular, Reference [ 264 ] shows that 32% of BCI designs use surface Laplacian (SL) to extract features, principal component analysis (PCA) or independent component analysis (ICA) was used in 22%, and common spatial patterns (CSP) and common average referencing (CAR) techniques are used in 14% and 11%, respectively.

7. Feature Extraction

Now, it is necessary to understand what the features represent, their qualities, and how to use them in a BCI system in order to select the most appropriate classifier. A classification system’s accuracy and efficiency are primarily determined by the feature(s) of the samples to be categorized [ 265 ]; therefore, feature extraction is a crucial stage in BCI. Some noninvasive BCI devices use neuroimaging techniques such as MEG and MRI; however, EEG is the most widely utilized method, owing to its high temporal resolution and inexpensive cost [ 266 ]. The EEG signal feature extraction method is one of the essential components of a BCI system because of its involvement in successfully executing the classification stage at discriminating mental states. The feature extraction methods based on both EEG and ECoG are discussed elaborately in the subsequent sections.

7.1. EEG-Based Feature Extraction

Typically, BCI focuses on identifying acquired events using various neuroimaging techniques, the most common of which is electroencephalography (EEG). Because of its involvement in successfully executing the classification stage at discriminating mental states, the EEG signal feature extraction method is one of the essential components of a BCI system. Following [ 267 ], three types of EEG feature extraction are discussed in detail: features in the time domain, the frequency domain, and the time–frequency domain. The following subsections address these feature domains elaborately.

7.1.1. Time Domain

The time–frequency domain integrates analyses in the time and frequency domains; it depicts the signal energy distribution in the time–frequency plane (t-f plane) [ 268 ], and a time–frequency analysis comes in handy when deciphering rhythmic information in EEG data. EEG’s time-domain properties are straightforward to extract, but they have the disadvantage of containing non-stationary signals that alter over time. In time-domain approaches, features are usually derived from signal amplitude values, which can be distorted by interference such as noise during EEG recording.

  • Event-related potentials: Event-related potentials (ERPs) are very low voltages generated in brain regions in reaction to specific events or stimuli. They are time-locked EEG alterations that provide a safe and noninvasive way to research psychophysiological aspects of mental activities. A wide range of sensory, cognitive, or motor stimuli can trigger event-related potentials [ 269 , 270 ]. ERPs are useful for measuring the time required to process a stimulus and to produce a response. The temporal resolution of event-related potentials is remarkable, but their spatial resolution is low. ERPs were used by Changoluisa, V. et al. [ 271 ] to build an adaptive strategy for identifying and detecting variable ERPs, continuously monitoring the curve of ERP components while taking account of their temporal and spatial information. A limitation of ERPs is their poor spatial resolution despite good temporal resolution [ 272 ]; a further significant drawback is the difficulty of determining where the electrical activity originates in the brain.
  • Statistical features: simple statistics of the signal amplitude are widely used as time-domain features (a computational sketch follows this list), including:
    − Mean absolute value: $MAV = \frac{1}{N}\sum_{n=1}^{N}|x_n|$ (1)
    − Power: $P = \frac{1}{N}\sum_{n=1}^{N} x_n^2$ (2)
    − Standard deviation: $SD = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(x(n)-\mu\right)^2}$ (3)
    − Root mean square (RMS): $RMS = \left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^{1/2}$ (4)
    − Square root of amplitude (SRA): $SRA = \left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x_i|}\right)^{2}$ (5)
    − Skewness value (SV): $SV = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i-\bar{x}}{\sigma}\right)^{3}$ (6)
    − Kurtosis value (KV): $KV = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i-\bar{x}}{\sigma}\right)^{4}$ (7)
  • Hjorth features: Bo Hjorth introduced the Hjorth parameters in 1970 [ 276 ]; the three statistical parameters employed in time-domain signal processing are activity, mobility, and complexity. Dagdevir, E. et al. [ 277 ] proposed a motor imagery-based BCI system in which features were extracted from the dataset using the Hjorth algorithm. Hjorth features are advantageous for real-time analyses because of their low computation cost; however, they introduce a statistical bias into the calculation of signal parameters (see the sketch after this list).
  • Phase lag index (PLI): Functional connectivity is determined by calculating the PLI for pairs of channels. Since it depicts the actual interaction between sources, this index can help estimate phase synchronization in EEG time series. PLI measures the asymmetry of the distribution of phase differences between two signals. Its advantage is that it is less affected by phase delays: it quantifies the nonzero phase lag between the time series of two sources, making it less vulnerable to volume-conducted signals. The effectiveness of functional connectivity features evaluated by the phase lag index (PLI), weighted phase lag index (wPLI), and phase-locking value (PLV) for MI classification was studied by Feng, L.Z. et al. [ 278 ].
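To make these time-domain descriptors concrete, the following is a minimal sketch, not taken from any of the surveyed papers, of how the statistical features of Equations (1)–(7), the Hjorth parameters, and the phase lag index can be computed for single-channel (or channel-pair) EEG segments; it assumes NumPy and SciPy are available, and all function names are illustrative.

    import numpy as np
    from scipy.signal import hilbert

    def time_domain_features(x):
        # Statistical features of Equations (1)-(7) for a 1-D signal x.
        x = np.asarray(x, dtype=float)
        mu, sigma = x.mean(), x.std()
        return {
            "MAV": np.abs(x).mean(),                    # Eq. (1)
            "P":   np.mean(x ** 2),                     # Eq. (2)
            "SD":  np.sqrt(np.mean((x - mu) ** 2)),     # Eq. (3)
            "RMS": np.sqrt(np.mean(x ** 2)),            # Eq. (4)
            "SRA": np.mean(np.sqrt(np.abs(x))) ** 2,    # Eq. (5)
            "SV":  np.mean(((x - mu) / sigma) ** 3),    # Eq. (6), skewness
            "KV":  np.mean(((x - mu) / sigma) ** 4),    # Eq. (7), kurtosis
        }

    def hjorth_parameters(x):
        # Activity, mobility, and complexity from the signal and its
        # first and second discrete derivatives.
        dx = np.diff(x)
        ddx = np.diff(dx)
        activity = np.var(x)
        mobility = np.sqrt(np.var(dx) / np.var(x))
        complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
        return activity, mobility, complexity

    def phase_lag_index(x, y):
        # PLI of two channels: asymmetry of the instantaneous phase
        # difference distribution, obtained via the analytic (Hilbert) signal.
        phase_diff = np.angle(hilbert(x)) - np.angle(hilbert(y))
        return np.abs(np.mean(np.sign(np.sin(phase_diff))))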

7.1.2. Frequency Domain

When a signal is analyzed in terms of frequency instead of time, its frequency domain properties are considered. A signal’s frequency domain representation displays how much of the signal falls within a specific frequency range; these properties are commonly acquired through the power spectral density (PSD) and are discussed below.

  • Fast Fourier transform (FFT): The Fourier transform is a mathematical transformation that converts any time-domain signal into its frequency domain representation. The Discrete Fourier Transform (DFT) [ 279 ], the Short-Time Fourier Transform (STFT) [ 280 , 281 ], and the Fast Fourier Transform (FFT) [ 282 ] are the Fourier transforms most commonly utilized for EEG-based emotion identification. Djamal, E.C. et al. [ 283 ] developed a wireless device that records a player’s brain activity and extracts each action using the FFT. The FFT is faster than comparable methods, allowing it to be employed in real-time applications, and it is a valuable instrument for stationary signal processing. Its limitations are that it can transform only a limited range of waveform data and that a window weighting function must be applied to the waveform to compensate for spectral leakage.
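As an illustration of frequency-domain feature extraction, the following is a minimal sketch, assuming SciPy, that estimates the power spectral density with Welch’s FFT-based method and reduces it to band powers for the classical EEG rhythms; the band boundaries and sampling rate are illustrative assumptions, not values from the surveyed papers.

    import numpy as np
    from scipy.signal import welch

    BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

    def band_powers(x, fs=250.0):
        # Welch PSD estimate with 2-second windows.
        freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
        powers = {}
        for name, (lo, hi) in BANDS.items():
            mask = (freqs >= lo) & (freqs < hi)
            # Integrate the PSD over the band to obtain band power.
            powers[name] = np.trapz(psd[mask], freqs[mask])
        return powers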

Figure 5. The basic structure of CSP [ 286 ].

As Figure 5 shows, CSP provides spatial filters that minimize the variance of one class while concurrently maximizing the variance of the other classes. These filters are applied after the frequency of interest has been chosen from the multichannel EEG signal: following frequency filtering, spatial filtering is performed to extract spatial information from the signal. Spatial information is necessary to differentiate intent patterns in multichannel EEG recordings for BCI. Because the performance of this spatial filtering depends on the operational frequency band of the EEG, CSP is categorized here as a frequency domain feature. CSP also acts as a signal enhancement step, as it requires no prior knowledge of subject-specific bands.
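The following is a minimal sketch of the CSP idea described above, not the exact pipeline of [ 286 ]: the spatial filters are the generalized eigenvectors of the two class-averaged channel covariance matrices, and trials are assumed to be band-pass filtered arrays of shape (channels, samples).

    import numpy as np
    from scipy.linalg import eigh

    def csp_filters(trials_a, trials_b, n_filters=6):
        def avg_cov(trials):
            # Average channel covariance over trials (unnormalized sketch).
            return np.mean([np.cov(t) for t in trials], axis=0)
        Ca, Cb = avg_cov(trials_a), avg_cov(trials_b)
        # Generalized eigenproblem Ca w = lambda (Ca + Cb) w.
        vals, vecs = eigh(Ca, Ca + Cb)
        order = np.argsort(vals)
        # Keep filters from both ends of the spectrum: they maximize the
        # variance of one class while minimizing that of the other.
        pick = np.concatenate([order[: n_filters // 2], order[-(n_filters // 2):]])
        return vecs[:, pick].T  # shape (n_filters, channels)

    def csp_features(trial, W):
        # Standard log-variance CSP features of a single trial.
        z = W @ trial
        var = np.var(z, axis=1)
        return np.log(var / var.sum())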

  • Higher-order spectra (HOS): The auto-correlation function and the power spectrum are second-order signal measures, which operate satisfactorily if the signal resembles a Gaussian probability distribution function. However, most real-world signals are non-Gaussian, and most physiological signals are, moreover, nonlinear and non-stationary. Higher-order spectra (HOS) [ 285 ] extend the second-order measures, work well for non-Gaussian signals, and are considered favorable for detecting deviations from linearity or stationarity. They are calculated from the Fourier transform at various frequencies: $HOS(k,l) = X(k)\,X(l)\,X^{*}(k+l)$ (8), where $X(k)$ is the Fourier transform of the raw EEG signal $x(n)$ and $l$ is a shifting parameter.

7.1.3. Time–Frequency Domain

In the time–frequency domain, the signal is evaluated in the time and frequency domains simultaneously. The wavelet transform is one of several advanced approaches for obtaining a time–frequency representation; this and other widely used models are addressed below.

  • Autoregressive (AR) model: AR modeling describes each EEG sample as a weighted combination of the previous samples plus a noise term, $x(n) = \sum_{i=1}^{p} a_p(i)\,x(n-i) + v(n)$ (9), where the AR parameters are $a_p(i)$, the observations are $x(n)$, and the excitation white noise is $v(n)$. The most challenging part of AR EEG modeling is choosing the correct model order so as to represent and follow the changing spectrum correctly (see the sketch after this list).

  • Wavelet Transform (WT): The WT encodes the original EEG data using wavelets, which are simple building blocks. It examines unusual data patterns using variable windows: expansive windows for low frequencies and narrow windows for high frequencies. WT is considered an advanced approach because it offers simultaneous localization in time and frequency, which is a significant advantage. Wavelets can be discrete or continuous and describe the signal’s characteristics in localized time–frequency terms. The Discrete Wavelet Transform (DWT) and the Continuous Wavelet Transform (CWT) are used frequently in EEG analysis [ 289 ]; the DWT is now more widely used than the CWT, as the CWT is highly redundant. The DWT decomposes a signal into approximation and detail coefficients corresponding to distinct frequency ranges while maintaining the temporal information in the signal. Selecting a mother wavelet is challenging, so most researchers try all available wavelets before choosing the one that produces the best results; in wavelet-based feature extraction, the Daubechies wavelet of order 4 (db4) is the most commonly employed [ 290 ].
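As a concrete illustration of these two models, the following minimal sketch estimates the AR parameters of Equation (9) via the Yule–Walker equations and computes db4 DWT sub-band energies; it assumes NumPy, SciPy, and the PyWavelets package (imported as pywt), and the model order and decomposition level are illustrative choices.

    import numpy as np
    import pywt
    from scipy.linalg import solve_toeplitz

    def ar_coefficients(x, order=6):
        # Yule-Walker estimate of the AR parameters a_p(i) of Eq. (9).
        x = np.asarray(x, float) - np.mean(x)
        # Biased autocorrelation estimates r[0..order].
        r = np.array([np.dot(x[: len(x) - k], x[k:]) / len(x)
                      for k in range(order + 1)])
        # Solve the symmetric Toeplitz system R a = r[1:].
        return solve_toeplitz((r[:-1], r[:-1]), r[1:])

    def dwt_features(x, wavelet="db4", level=4):
        # Multi-level DWT; coeffs = [cA_level, cD_level, ..., cD1].
        coeffs = pywt.wavedec(x, wavelet, level=level)
        # Summarize each sub-band by its energy, a common wavelet feature.
        return np.array([np.sum(c ** 2) for c in coeffs])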

7.2. ECoG-Based Features

Electrocorticography (ECoG) generates a reliable signal through electrodes placed on the surface of the human brain, from which movement, vision, and speech can be decoded. Decoding the ECoG signal gives immediate patient feedback and can control a computer cursor or even an exoskeleton. The ECoG feature extraction approach is a crucial element of the BCI system, since it underpins the classification phase during decoding. Some widely used feature extraction methods are discussed below.

7.2.1. Linear Filtering

Linear filtering is typically employed to remove noise in the form of signals outside the frequency range of the brain’s messages. Low-pass filters and high-pass filters are the two types of linear filters. Such filtering is used to remove ECG, EOG, and EMG artifacts from EEG signals: low-pass filtering removes EMG artifacts, and high-pass filtering removes EOG artifacts [ 291 ]. These artifacts are noises produced either by physiological processes, such as muscle, eye, or other biological movement, or by exogenous (external) sources, such as machinery faults. There are three approaches for dealing with artifacts in EEG signal acquisition: avoiding artifacts by keeping an eye on the subject’s movements and the machine’s operation, rejecting contaminated trials, and removing artifacts with preprocessing techniques. The advantage of linear filtering is that it applies a controlled scaling to the signal’s frequency domain components; high-pass filtering, for example, raises the relative importance of the high-frequency components by attenuating the low-frequency end of the spectrum.
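A minimal sketch of such linear filtering, assuming SciPy: a zero-phase Butterworth band-pass whose high-pass edge attenuates slow EOG drift and whose low-pass edge attenuates high-frequency EMG activity. The cut-off frequencies and sampling rate are illustrative assumptions.

    from scipy.signal import butter, filtfilt

    def bandpass(x, fs=250.0, low=1.0, high=40.0, order=4):
        # Normalized cut-offs relative to the Nyquist frequency fs/2.
        b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
        # filtfilt applies the filter forward and backward, so the
        # result has zero phase distortion.
        return filtfilt(b, a, x)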

7.2.2. Spatial Filtering

Spatial filtering is a technique for improving decoding by leveraging information about the electrode positions. A spatial filter aims to lessen the influence of spatial distortion in the raw signal; the various ECoG channels are treated as coordinates for multivariate data sampling, and the filtering transforms that coordinate system to facilitate decoding. Spatial filtering can be used to reduce data dimensionality or to increase the dissimilarity of different observations. The referencing systems used during ECoG recordings are frequently utilized for preliminary spatial filtering. The spatial filter is determined by Equation ( 10 ) [ 292 ]: $x' = \sum_{i} w_i x_i$ (10)

where $x'$ is the spatially filtered signal, $x_i$ is the EEG signal from channel $i$, and $w_i$ is the weight of that channel. With the aid of relevant information acquired from multiple EEG channels, spatial filtering contributes to recovering the brain’s original signal. Simultaneously, it reduces dimensionality by condensing the many EEG channels into a smaller number of spatially filtered signals.
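The following is a minimal sketch of Equation (10) in code, together with common average referencing (CAR) as a simple special case in which every channel is re-referenced to the mean of all channels; X is assumed to be an array of shape (channels, samples).

    import numpy as np

    def spatial_filter(X, w):
        # Equation (10): each output sample is a weighted sum of the
        # channel samples. X: (channels, samples), w: (channels,).
        return w @ X

    def car(X):
        # Common average reference: subtract the across-channel mean,
        # a fixed spatial filter applied to every channel.
        return X - X.mean(axis=0, keepdims=True)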

In summary, feature extraction derives new features from the raw signal to minimize feature measurement costs, to improve classifier efficiency, and to improve classification accuracy. The following section briefly describes the classifiers applied to these extracted features.

8. BCI Classifiers

A BCI always needs a subject, and the subject must develop brain activity patterns that the system can recognize and convert into commands. To achieve this conversion, regression or classification algorithms can be used; designing the classification step comprises selecting one or more classification algorithms from a variety of options. This section describes some commonly known classifiers [ 293 ], organized as in Figure 6 , as well as some newer classifiers [ 294 ].

Figure 6. Classification of commonly used classifiers in BCI.

8.1. Linear Classifiers

Linear classifiers are discriminant algorithms that separate classes using linear functions. They are probably the most widely used algorithms in BCI systems. Two types of linear classifiers are used in BCI design: linear discriminant analysis (LDA) and the support vector machine (SVM).

8.1.1. Linear Discriminant Analysis (LDA)

The objective of Linear Discriminant Analysis is to separate data from diverse classes using a hyperplane; in a two-class problem, the class of a feature vector is determined by which side of the hyperplane it falls on. LDA requires that the data have a normal distribution and that both classes share the same covariance matrix. The separating hyperplane is found by looking for a projection that maximizes the distance between the means of the two classes while minimizing the intraclass variance [ 295 ]. This classifier is straightforward to apply, generally produces good results, and has been soundly implemented in various BCI systems, including MI-based BCI, the P300 speller, and multiclass and asynchronous BCI. The disadvantage of LDA is its linearity, which can lead to unsatisfactory results on nonlinear EEG data.

8.1.2. Support Vector Machine (SVM)

A Support Vector Machine (SVM) uses a discriminant hyperplane to identify classes. The hyperplane determined by the SVM is the one that maximizes the margins, i.e., the distance to the nearest training samples, and maximizing the margins is known to improve generalization [ 296 ]. A linear SVM [ 297 ] uses linear decision boundaries and has been used to solve a substantial number of synchronous BCI tasks with great success. The SVM classifier works by projecting the input vector X onto a scalar value f(X), as shown in Equation ( 11 ): $f(X) = w \cdot X + b$ (11)

Nonlinear decision boundaries can be obtained by implicitly mapping the data with a kernel function, typically the Gaussian radial basis function (RBF); Gaussian SVM or RBF SVM is the term applied to the corresponding SVM. RBF SVMs have also produced remarkable outcomes in BCI applications. Similar to LDA, the SVM is used to solve multiclass BCI problems via the one-versus-the-rest (OVR) approach.
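To illustrate how these linear and kernel classifiers are typically applied to extracted BCI features, here is a minimal sketch assuming scikit-learn; the feature matrix and labels are synthetic placeholders rather than data from any study discussed here.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))      # placeholder features (e.g., CSP outputs)
    y = rng.integers(0, 2, size=100)   # placeholder binary labels

    lda = LinearDiscriminantAnalysis()
    svm_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

    for name, clf in [("LDA", lda), ("RBF-SVM", svm_rbf)]:
        scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
        print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")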

8.2. Neural Networks (NN)

Neural networks (NN) and linear classifiers are the two types of classifiers most frequently employed in BCI systems. A NN is a collection of artificial neurons that makes it possible to create nonlinear decision boundaries [ 298 ]. The multilayer perceptron (MLP) is the most extensively used NN for BCI, as described in this section, which then briefly discusses other neural network architectures utilized in BCI systems.

8.2.1. Deep Learning (DL) Models

Deep learning is now widely used in BCI applications compared with classical machine learning, because most BCI applications require a high level of accuracy, and deep learning models perform better at recognizing brain signals that change swiftly. Some popular DL models, such as the CNN, GAN, RNN, and LSTM, are described below:

  • Convolutional Neural Network (CNN): A convolutional neural network (CNN) is an ANN intended primarily to analyze visual input and is used in image recognition and processing. The convolutional layer, the pooling layer, and the fully connected layer are the three layer types that comprise a CNN. Using a CNN, the input data may be reduced to compact representations with minimal loss, and the characteristic spatial relationships of EEG patterns can be captured. Fatigue detection, sleep stage classification, stress detection, motor imagery data processing, and emotion recognition are among the EEG-based BCI applications using CNNs. In BCI, CNN models are applied to the input brain signals to exploit latent semantic dependencies (see the sketch after this list).
  • Generative Adversarial Network (GAN): Generative adversarial networks are a recent ML technique in which two ANN models compete and thereby train each other simultaneously. GANs allow machines to envision and develop new images on their own; EEG-based BCI techniques record the signals first and then use GAN techniques to regenerate images [ 299 ]. The most significant application of GAN-based BCI systems is data augmentation, which increases the amount of training data available, allows for more complicated DL models, reduces overfitting, and can increase classifier accuracy and robustness. In the context of BCI, generative algorithms, including GANs, are frequently used to rebuild or generate a set of brain signal recordings to improve the training set.
  • Recurrent Neural Network (RNN): In its basic form, an RNN is a layer whose output is linked back to its input. Since this architecture gives the model access to data from past time steps, an RNN layer allows the model to store memory [ 300 , 301 ]. Because RNNs and CNNs have strong temporal and spatial feature extraction abilities, respectively, it is logical to combine them for joint temporal and spatial feature learning. An RNN can be considered a more powerful version of the hidden Markov model (HMM), which has been used to classify EEG correctly [ 302 ]. The LSTM is a kind of RNN with a unique architecture that allows it to acquire long-term dependencies despite the difficulties that plain RNNs confront: it contains a discrete memory cell, a type of node, and employs a series of “gates” to manage the flow of data. For modeling time series in tasks such as writing and voice recognition, RNNs and LSTMs have proven effective [ 303 ].
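The following is a minimal sketch of a small CNN for EEG trials in PyTorch; the architecture is merely inspired by compact EEG networks, and all layer sizes, channel counts, and trial lengths are illustrative assumptions rather than a model from the surveyed literature.

    import torch
    import torch.nn as nn

    class EEGConvNet(nn.Module):
        # Input trials are shaped (batch, 1, channels, samples).
        def __init__(self, n_channels=22, n_samples=256, n_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=(1, 32), padding=(0, 16)),  # temporal
                nn.BatchNorm2d(8),
                nn.Conv2d(8, 16, kernel_size=(n_channels, 1)),          # spatial
                nn.BatchNorm2d(16),
                nn.ELU(),
                nn.AvgPool2d((1, 4)),
                nn.Dropout(0.5),
            )
            # Infer the flattened feature size with a dummy forward pass.
            with torch.no_grad():
                n_flat = self.features(
                    torch.zeros(1, 1, n_channels, n_samples)).numel()
            self.classifier = nn.Linear(n_flat, n_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    model = EEGConvNet()
    logits = model(torch.randn(8, 1, 22, 256))  # 8 dummy trials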

8.2.2. Multilayer Perceptron (MLP)

A Multilayer Perceptron (MLP) [ 304 ] comprises an input layer, one or more hidden layers, and an output layer, with the input of each neuron linked to the outputs of the neurons in the preceding layer; the output layer neurons determine the classification of the input feature vector. MLPs are universal approximators, meaning they can approximate any continuous function if they have sufficient neurons and layers. The challenge with MLPs is that they are susceptible to overtraining, particularly with noisy and non-stationary data, so careful selection and regularization of the architecture are necessary. A perceptron, i.e., an MLP with no hidden layers, is comparable to LDA and has been used in BCI applications on occasion [ 293 ]. Sunny, M.S.H. et al. [ 305 ] used a Multilayer Perceptron (MLP) to distinguish distinct frequency bands of EEG signals in order to extract features more effectively.

8.2.3. Adaptive Classifiers

As new EEG data become available, adaptive classifiers’ parameters, such as the weights allocated to each feature in a linear discriminant hyperplane, are gradually re-estimated and updated. Adaptive classifiers can use supervised or unsupervised adaptation, that is, with or without knowledge of the true class labels of the incoming data. With supervised adaptation, the true class labels of the incoming EEG signals are known: the classifier is either retrained on the existing training data augmented with the newly labeled incoming data or updated solely on the new data. Supervised BCI adaptation requires labeled user sessions. With unsupervised adaptation, the labels of the incoming EEG data are unknown; adaptation is therefore based on class-unspecific updates, such as re-estimating a general EEG data mean or covariance matrix in the classifier model, or on estimating the class labels of the data for additional training [ 306 ].

8.3. Nonlinear Bayesian Classifiers

This section discusses the Bayes quadratic and hidden Markov models (HMM), two Bayesian classifiers used in BCI. Although Bayesian graphical networks (BGN) have been used for BCI, they are not covered here since they are not widely used [ 307 ].

8.3.1. Bayes Quadratic

The objective of Bayesian classification is to assign the most probable class to a feature vector. The Bayes rule is used to calculate the a posteriori probability of each class given the feature vector, and the class is then chosen with the MAP (maximum a posteriori) rule. Bayes quadratic classification assumes that each class follows a distinct normal distribution, which results in quadratic decision boundaries that justify the classifier’s name [ 308 ]. Although this classifier is not extensively utilized for BCI, it has been successfully used to classify motor imagery and mental tasks.

8.3.2. Hidden Markov Model

A Hidden Markov Model (HMM) is a Bayesian classifier that yields a nonlinear cost function. An HMM is a statistical algorithm that calculates the probability of observing a given sequence of feature vectors [ 309 ]; in the case of BCI, these observation probabilities are usually modeled with Gaussian Mixture Models (GMM) [ 310 ]. Since the EEG components used to control a BCI have particular time courses, HMMs can be used to classify temporal patterns of BCI features (Obermaier, B. et al. [ 302 ]) and even raw EEG data. Although HMMs are not widely used in the BCI world, research has demonstrated that they can be helpful for classifying EEG time series in BCI systems [ 311 ].

8.4. Nearest Neighbor Classifiers

This section describes classifiers based on distances to training examples. The K nearest neighbors (KNN) and Mahalanobis distance classifiers are common among them, as they are nonlinear discriminative classifiers [ 312 ].

8.4.1. K Nearest Neighbors

The K nearest neighbors method assigns an unseen point the dominant class among its k nearest neighbors within the training dataset. The nearest neighbors are typically found using a distance metric on the feature vectors acquired from the BCI signal. With a sufficiently high value of k and enough training data, KNN can approximate any function and can thus construct nonlinear decision boundaries. KNN algorithms are nevertheless used relatively little in the BCI field, as their sensitivity to the curse of dimensionality has caused them to fail in several BCI studies; with low-dimensional feature vectors, however, KNN can be efficient in BCI systems [ 313 ].

8.4.2. Mahalanobis Distance

For each class c , Mahalanobis distance-based classifiers [ 314 ] assume a Gaussian distribution $N(\mu_c, M_c)$ around a class prototype. A feature vector x is then allocated to the class whose prototype is closest in terms of the Mahalanobis distance $d_c(x) = \sqrt{(x-\mu_c)\,M_c^{-1}\,(x-\mu_c)^{T}}$.

This results in a simple yet reliable classifier that has been shown to work in multiclass and asynchronous BCI systems. Despite its good results, it is still rarely mentioned in the BCI literature [ 315 ].
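A minimal sketch of such a Mahalanobis distance classifier, assuming NumPy; the class API imitates the scikit-learn fit/predict convention purely for convenience.

    import numpy as np

    class MahalanobisClassifier:
        def fit(self, X, y):
            # One Gaussian prototype N(mu_c, M_c) per class; store the
            # (pseudo-)inverse covariance for distance computation.
            self.classes_ = np.unique(y)
            self.mu_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
            self.Minv_ = {c: np.linalg.pinv(np.cov(X[y == c].T))
                          for c in self.classes_}
            return self

        def predict(self, X):
            def d2(x, c):  # squared Mahalanobis distance to prototype c
                diff = x - self.mu_[c]
                return diff @ self.Minv_[c] @ diff
            # Assign each sample to the class with the nearest prototype.
            return np.array([min(self.classes_, key=lambda c: d2(x, c))
                             for x in X])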

8.5. Hybrid

In several BCI papers, classification is implemented with a single classifier; a current tendency, however, is to combine multiple classifiers in various ways [ 316 ]. The classifier combination strategies utilized in BCI systems are the following:

8.5.1. Boosting

Boosting uses multiple classifiers in a cascade, each focusing on the errors made by the one before it. It can combine numerous weak classifiers to form a powerful one and is therefore unlikely to overtrain; however, it is susceptible to mislabeling, which may explain why it failed in one BCI trial [ 293 ].

8.5.2. Voting

In voting, multiple classifiers are employed, each of which assigns the input feature vector to a class; the majority class becomes the final class. Voting is the most popular way of combining classifiers in BCI systems, owing to its simplicity and efficiency [ 293 ].

8.5.3. Stacking

Stacking uses multiple classifiers, known as level-0 classifiers, to categorize the input feature vector. The outputs of these classifiers are then fed into a “meta-classifier” (or “level-1 classifier”), which makes the final decision [ 293 ].

Beyond the classifiers described in this section, some other classifiers have been utilized in recent BCI research. Since 2016, transfer learning has been used for MI classification tasks [ 317 ]. Several ground-breaking architectures have been established in recent years, such as EEG-Inception, an end-to-end neural network [ 318 ]; a cluster-decomposing, multi-objective optimization-based ensemble learning framework [ 319 ]; and RFNet, a fusion network that learns attention weights and uses embedding-specific features for decision making [ 179 ].

For a better understanding of the performance of commonly known classifiers, results on some popular datasets are given in Table 9 .

Comparison of classifiers based on popular datasets and features.

9. Evaluation Measurement

To evaluate the performance of BCI systems, researchers employ several evaluation metrics. The most common is classification accuracy, together with its complement, the error rate. Because accuracy is not always an acceptable criterion under certain rigorous requirements, various other evaluation criteria have been proposed. An overview of BCI research evaluation criteria is provided below.

9.1. Generally Used Evaluation Metrics

In this section, we present the most commonly used evaluation metrics for measuring BCI system performance. They are explained in the following subsections.

9.1.1. The Confusion Matrix

The confusion matrix represents the relationship between the user-intended output classes and the actually predicted classes. From it, the true positive rate (TPR, also called sensitivity or recall), false negative rate (FNR), false positive rate (FPR, equal to 1 − specificity), positive predictive value (PPV, or precision), and negative predictive value (NPV) can be derived [ 325 ].

9.1.2. Classification Accuracy and Error Rate

Classification accuracy is one of the most important metrics in BCI systems; many studies evaluate performance using classification accuracy together with sensitivity and specificity. This measure determines how frequently the BCI makes the right selection, i.e., what proportion of all selections is correct. It is the most obvious indicator of BCI success, although accuracy generally improves with longer decision times, so accurate selection can take a long time. In terms of the confusion matrix counts, accuracy is calculated as $ACC = (TP + TN)/(TP + TN + FP + FN)$.

9.1.3. Information Transfer Rate

Shannon [ 326 ] proposed the Information Transfer Rate (ITR) as a rate that combines both speed and accuracy. It represents the quantity of information that can pass through the system per unit of time. In [ 327 ], the information transfer rate in bits per minute ( b i t s / m i n ) and the accuracy (ACC) in percent (%) were used to evaluate performance; the authors reported demographic data (age and gender) as well as the performance outcomes of 10 participants, and the ITR was computed from the bits per trial $B_t$ given by Formula ( 14 ): $B_t = \log_2 N + p \log_2 p + (1-p)\log_2\!\left(\frac{1-p}{N-1}\right)$ (14)

where N is the number of targets and p is the classification accuracy (ACC). With four cursor movements and the select command, this resulted in N = 5. $B_t$ is expressed in bits per trial and is scaled by the number of selections per minute to obtain the ITR.
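A minimal sketch of this computation, using the Wolpaw-style formula reconstructed in Equation (14); the example numbers are illustrative, not results from [ 327 ].

    import math

    def bits_per_trial(N, p):
        # Equation (14): bits per trial from N targets and accuracy p.
        if p >= 1.0:
            return math.log2(N)
        return (math.log2(N) + p * math.log2(p)
                + (1 - p) * math.log2((1 - p) / (N - 1)))

    def itr_bits_per_min(N, p, trial_seconds):
        # Scale bits per trial by the number of selections per minute.
        return bits_per_trial(N, p) * (60.0 / trial_seconds)

    print(itr_bits_per_min(N=5, p=0.9, trial_seconds=4.0))  # ~ 24.8 bits/min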

According to [ 328 ], the ITR also depends on several important parameters that are used to evaluate a BCI. They are described below:

  • Target detection accuracy: The accuracy of target identification may be enhanced by increasing the signal-to-noise ratio (SNR) and the separability of the classes. Several techniques, such as trial averaging, spatial filtering, and eliciting stronger task-related EEG signals, are employed in the preprocessing step to improve the SNR. Many applications utilize trial averaging to improve the performance of a single BCI [ 53 ].
  • Number of classes: Once a high ITR has been attained, more sophisticated applications can be built by raising the number of classes. TDMA, FDMA, and CDMA are among the stimulus coding techniques that have been adopted for BCI systems [ 243 , 329 ]: P300 spellers, for example, use TDMA to code the target stimulus, while FDMA and CDMA have been used in VEP-based BCI systems.
  • Target detection time: The detection time is the interval between when a user first expresses their intent and when the system makes a judgment. One of the goals of BCI systems is to improve the ITR by reducing the target detection time; adaptive techniques, such as the “dynamic stopping” method, can be used to minimize it [ 330 ].

9.1.4. Cohen’s Kappa Coefficient

Cohen’s kappa measures the agreement between two observers; in a BCI-based AAC system, it measures the agreement between the desired output and the actual command of the BCI. Cohen’s kappa coefficient resolves many of the objections to the accuracy measure [ 331 ]. The overall agreement $p_0 = ACC$, which is equivalent to the classification accuracy, and the chance agreement $p_e = \sum_i n_{i:} n_{:i} / N^2$, where $n_{:i}$ and $n_{i:}$ are the sums of column i and row i of the confusion matrix (the a posteriori and a priori marginals, respectively), are used to calculate the estimated kappa coefficient $K = (p_0 - p_e)/(1 - p_e)$ together with its standard error $e(K)$.

When there is no correlation between the predicted and actual classes, the kappa coefficient is zero, while a kappa coefficient of 1 indicates a perfect classification. If the kappa value is less than zero, the classifier agrees with the actual classes less often than chance would [ 332 ].
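A minimal sketch of the kappa computation from a confusion matrix, assuming NumPy; the toy matrix is illustrative only.

    import numpy as np

    def cohens_kappa(C):
        # C[i, j] counts trials of true class i predicted as class j.
        C = np.asarray(C, float)
        n = C.sum()
        p0 = np.trace(C) / n                                  # overall agreement (= ACC)
        pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / n ** 2   # chance agreement
        return (p0 - pe) / (1 - pe)

    C = np.array([[45, 5], [10, 40]])  # toy two-class confusion matrix
    print(cohens_kappa(C))             # 0.70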

9.2. Continuous BCI System Evaluation

Continuous BCI performance is measured using a variety of metrics, and different measures may be more appropriate depending on whether the study is conducted online or offline. This section goes through some of the most commonly used metrics in this field, including the correlation coefficient, accuracy, and Fitts’s law [ 333 ].

9.2.1. Correlation Coefficient

The correlation coefficient can be a useful statistic for determining whether an intracortical implant records task-relevant neurons. There are two essential caveats: first, the measure is scale-invariant, which implies that the cursor might miss the mark substantially while still generating high values as long as the signs of the actual and predicted movements coincide [ 334 ]; second, a decoder can yield a high value if it simply generates a signal that fluctuates with the repetitions [ 333 ].

9.2.2. Accuracy

Task characteristics such as target size and dwell time have a significant impact on accuracy. As a result, it is more a sign that the task is well matched to the subject and modality than a performance measure [ 333 ].

9.2.3. Fitts’s Law

Fitts’s law asserts that the time a person needs to move a cursor to a target is a function of the target’s distance divided by its size: the greater the distance and the narrower the target, the longer it takes [ 335 , 336 ]. Applying Fitts’s law requires a method to calculate the “index of difficulty” of a particular movement, commonly $ID = \log_2(2D/W)$ for a target of width W at distance D.

9.3. User-Centric BCI System Evaluation

Users are an essential element of the BCI product life cycle, and their interactions and experiences influence whether BCI systems are acceptable and viable. Four criteria, or User Experience (UX) factors, are used to evaluate user-centric BCI systems: usability, affect, ergonomics, and quality of life, discussed in the following subsections.

9.3.1. Usability

Usability refers to the extent to which a system can be utilized to fulfill specific objectives with effectiveness, efficiency, learnability, and satisfaction in a given context [ 337 ]. The usability measure includes four metrics:

  • Effectiveness or accuracy: It depicts the overall accuracy of the BCI system as experienced from the end user’s perspective [ 333 ].
  • Efficiency or information transfer rate: It refers to the speed and timing with which a task is accomplished, depicting the overall BCI system’s speed, throughput, and latency from the end user’s perspective [ 333 ].
  • Learnability: The BCI system can make users feel as if they can use the product effectively and quickly learn additional features. Both the end-user and the provider are affected by learnability [ 338 ].
  • Satisfaction: It is based on participants’ reactions to actual feelings while using BCI systems, showing the user’s favorable attitude regarding utilizing the system. To measure satisfaction, we can use rating scales or qualitative methods [ 333 ].

9.3.2. Affect

In the context of BCIs, affect refers to how comfortable the system is, particularly over long periods, and how pleasant or unpleasant the stimuli are. EEG event-related potentials, spectral characteristics, galvanic skin responses, or heart rate can be used to quantitatively monitor users’ exhaustion, valence, and arousal levels [ 339 ].

9.3.3. Ergonomics

Ergonomics is the study of how people interact with their environments. The load on the user’s memory is represented by the cognitive task load, a multidimensional quantity. In addition, physiological markers, including eye movement, EEG, ERPs, and spectral characteristics, can be employed to evaluate cognitive stress objectively [ 340 ].

9.3.4. Quality of Life

Quality of life expresses the user’s overall perception of the system’s utility and acceptance and its influence on their well-being. The Return on Investment (ROI) is an economic measure of the perceived benefit derived from the system, and the overall quality of experience is a measure of how satisfied a user is with it [ 333 ].

Other assessment methods, such as Mutual Information, Written symbol rate (WSR), and Practical bit rate (PBR), are utilized to a lesser extent.

10. Limitations and Challenges

The brain-computer interface is advancing towards a more dynamic and accurate bridge between brain and machine, but a few factors still stand in the way of this ultimate goal. We therefore analyzed several core BCI studies and list their limitations in Table 10 ; we then discuss the major challenges of the BCI domain.

A summary of some research papers proposing new methods of BCI.

The challenges and difficulties of the BCI domain are divided into three categories: challenges based on usability, technical challenges, and ethical challenges. The rest of the section briefly explains these challenges.

10.1. Based on Usability

This section describes the challenges users face in accepting BCI technology [ 350 ], including concerns relating to the training required for class discrimination.

10.1.1. Training Time

Training a user, whether by leading the user through the procedure or by having them work through the documented manual, takes time. Users usually also want the system to be simple to operate: they tend to dislike a complicated system that is difficult to manage, and creating a system that is sophisticated yet user-friendly is a challenging effort [ 351 ].

10.1.2. Fatigue

The majority of present BCIs generate a lot of fatigue, since they demand concentration, focus, and awareness of rapid and intermittent stimuli. Besides the weariness caused by the electrodes, a BCI may fail to operate because the user cannot maintain a sufficient degree of focus: in a BCI, mental activity is continually monitored, and the user’s point of attention serves as the input, so the concentration required on the stimuli entangles input and output [ 352 , 353 ]. Rather than relaxing, the user must concentrate on a single point as an input and then look at the outcome. At some point, the interaction acquires a forced quality rather than the natural quality it would have if the user could choose whichever part of the visual output to focus on [ 6 ].

10.1.3. Mobility to Users

In most situations, users are not allowed to move around while using a BCI. During test applications, users must stay motionless and quiet, ideally sitting down; in a real-world setting, however, a user may need to utilize the BCI while walking down the street, for example, to manage a smartphone. Additionally, BCIs cannot yet ensure user comfort: the EEG headset is usually neither lightweight nor easy to carry, which hampers the user experience.

10.1.4. Psychophysiological and Neurological Challenges

Emotional and mental mechanisms, cognition-related neurophysiology, and neurological variables, such as functionality and architecture, play vital roles in BCI performance, resulting in significant intra- and inter-individual heterogeneity. Immediate brain dynamics are influenced by psychological elements such as attention, memory load, weariness, and conflicting cognitive functions, as well as by users’ characteristics such as lifestyle, gender, and age. For example, participants with weaker empathy engage less emotionally in a P300-BCI paradigm and generate larger P300 wave amplitudes than participants with greater empathetic involvement [ 354 ].

10.2. Technical Challenges

Non-linearity, non-stationarity, and noise as well as limited training sets and the accompanying dimensionality curse are difficulties relating to the recorded electrophysiological characteristics of brain impulses.

10.2.1. Non-Linearity

The brain is a highly complex nonlinear system in which chaotic neuronal ensemble activity can be observed. EEG data are therefore better described by nonlinear dynamic methods than by linear ones.

10.2.2. Non-Stationarity

The non-stationarity of electrophysiological brain signals is a significant challenge in developing BCI systems for recognizing human cognition. It results in a constant shift of the signals over time, both within and between recording sessions. EEG signal variability can be influenced by the mental and emotional background state across sessions; various emotional states, such as sadness, happiness, anxiety, and fear, can vary on a daily basis, which reflects non-stationarity [ 355 ]. Noise is also a significant contributor to the non-stationarity problems that BCI technology faces: noise and other external interference are always present in raw EEG data, even in the most robust emotion recognition settings [ 356 ]. It comprises undesired signal components generated by changes in electrode position as well as noise from the surroundings [ 357 ].

10.2.3. Transfer Rate of Signals

In a BCI, the system must continuously adjust to the signals of the user, and this adjustment must be made quickly and precisely. Current BCIs have an extremely slow information transfer rate, taking almost two minutes to “digitalize” a single phrase, for example. Furthermore, BCI accuracy does not always reach a desirable level, particularly in visual stimulus-based BCIs. Actions must sometimes be repeated or undone, producing discomfort or even dissatisfaction with interactive systems that use this type of interface [ 358 ].

10.2.4. Signal Processing

Recently, a variety of decoding techniques, signal processing algorithms, and classification algorithms have been studied. Despite this, the information retrieved from EEG waves does not have a high enough signal-to-noise ratio to operate a device with several degrees of freedom, such as a prosthetic limb. More resilient, accurate, and fast algorithms are required to control a BCI.

10.2.5. Training Sets

In BCI, the training process is mainly impacted by usability concerns, and training sets are small in most cases. Although subjects find the training sessions time-consuming and challenging, the sessions give the user the required expertise to interact with the system and to learn to manage their neurophysiological signals. As a result, balancing the technological complexity of decoding the user’s brain activity against the amount of training required for the proper functioning of the interface is a crucial issue in building a BCI [ 359 ].

10.2.6. Lack of Data Analysis Method

Classifiers should be evaluated online, since every BCI ultimately operates in an online situation; they should also be validated to ensure low complexity and rapid calibration in real time. Domain adaptation and transfer learning could be an acceptable path toward calibration-free BCIs, where even the integration of distinctive feature sets, such as covariance matrices, with domain adaptation algorithms can strengthen the invariance properties of BCIs.

10.2.7. Performance Evaluation Metrics

A variety of performance evaluation measures are used to evaluate BCI systems. However, when different evaluation metrics are used to assess BCI systems, it is nearly impossible to compare systems. As a result, the BCI research community should establish a uniform and systematic approach to quantify a particular BCI application or a particular metric. For example, to test the efficiency of a BCI wheelchair control, the number of control commands, categories of control commands, total distance, time consumed, the number of collisions, classification accuracy, and the average success rate need to be evaluated, among other factors [ 360 ].

10.2.8. Low ITR of BCI Systems

The information transfer rate is one of the most extensively used performance evaluation metrics for BCI systems. The number of classes, the target detection accuracy, and the target detection time are all factors in this rate. Target detection accuracy can be improved by increasing the signal-to-noise ratio (SNR) [ 53 , 328 ], and several techniques are typically applied in the preprocessing phase to optimize the SNR. Once a high ITR has been attained, more complicated applications can be created by expanding the number of classes. CDMA, TDMA, and FDMA [ 243 , 361 ] are a few of the stimulus coding schemes that have been developed for BCI systems: TDMA is used with P300 to code the required stimuli, while CDMA and FDMA have been used with BCIs that react to VEPs. Finally, an essential aspect of BCIs is reducing the target recognition period, which helps to increase the ITR; adaptive techniques, such as “dynamic stopping”, can be an effective option for accomplishing this.

10.2.9. Specifically Allocated Lab for BCI Technology

Most BCI systems are trialed in a supervised lab rather than in the users’ actual surroundings. When designing a BCI system, it is essential to consider the environment in which the technology will be used; the system’s requirements, environmental factors, circumstances, and target users must be investigated thoroughly during the design phase.

10.3. Ethical Challenges

Many ethical issues surround BCI, as it touches physical, psychological, and social factors. Physically, a BCI always needs a human body from which to acquire signals, which requires attaching electrodes. Wearing these electrodes carries risk and can, in the worst case, harm the human body. BCI also requires strict constraints on the body during signal acquisition, so the subject must sit in place for a long time. In addition, a user or participant must act as the recording setup demands and cannot do anything at will. These factors can have a substantial impact on the participant.

11. Conclusions

The brain-computer interface is a communication pathway that directly joins the wired brain and external applications and devices. The BCI domain includes investigating, assisting, augmenting, and experimenting with brain signal activities. Thanks to abundant documentation, low-cost amplifiers, greater temporal resolution, and superior signal analysis methods, BCI technologies are now available to researchers in diverse domains; moreover, it is an interdisciplinary area that allows for research in biology, engineering, computer science, and applied mathematics. This article has exhibited an architectural and constructive investigation of the brain–computer interface, aimed at novices who would like to learn about the current state of BCI systems and methodologies. The fundamental principles of BCI techniques are discussed in detail; the article describes the architectural perspectives of certain unique taxa and gives a taxonomy of BCI systems. The paper also covers feature extraction, classification, and evaluation procedures and techniques, presents a summary of present methods for creating various types of BCI systems, and looks into the different datasets that are available for BCI systems. It also explains the challenges and limitations of the described BCI systems, along with possible solutions. Lastly, BCI technology advancement proceeds in four stages: primary scientific development, preclinical experimentation, clinical investigation, and commercialization. At present, most BCI techniques are in the preclinical and clinical phases, and the combined efforts of scientific researchers and the tech industry are needed to bring the benefits of this great domain to ordinary people through commercialization.

Acknowledgments

We would like to thank Bangladesh University of Business & Technology (BUBT), University of Asia Pacific (UAP), and University of Aizu (UoA) for supporting this research. Special thanks also go to the Advanced Machine Learning Lab, BUBT; the Computer Vision & Pattern Recognition Lab, UAP; and the Database System Lab, UoA, for providing facilities for research and publication.

Author Contributions

Conceptualization, M.F.M.; Data curation, M.F.M., S.C.D., M.M.K. and A.A.L.; Formal analysis, M.F.M.; Investigation, M.R.I. and Y.W.; Methodology, M.F.M., S.C.D., M.M.K., A.A.L., M.R.I. and Y.W.; Software, S.C.D., M.M.K. and A.A.L.; Supervision, M.R.I.; Validation, M.F.M., M.R.I. and Y.W.; Visualization, M.F.M., S.C.D., M.M.K. and A.A.L.; Writing—original draft, M.F.M., S.C.D., M.M.K., A.A.L., M.R.I. and Y.W.; Writing—review & editing, M.F.M., M.R.I. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Computer Science > Computation and Language

Title: Large Language Models: A Survey

Abstract: Large Language Models (LLMs) have drawn a lot of attention since the release of ChatGPT in November 2022, due to their strong performance on a wide range of natural language tasks. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build, and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.



A new design for quantum computers

Creating a quantum computer powerful enough to tackle problems we cannot solve with current computers remains a big challenge for quantum physicists. A well-functioning quantum simulator -- a specific type of quantum computer -- could lead to new discoveries about how the world works at the smallest scales. Quantum scientist Natalia Chepiga from Delft University of Technology has developed a guide on how to upgrade these machines so that they can simulate even more complex quantum systems. The study is now published in Physical Review Letters .

"Creating useful quantum computers and quantum simulators is one of the most important and debated topics in quantum science today, with the potential to revolutionise society," says researcher Natalia Chepiga. Quantum simulators are a type of quantum computer, Chepiga explains: "Quantum simulators are meant to address open problems of quantum physics to further push our understanding of nature. Quantum computers will have wide applications in various areas of social life, for example in finances, encryption and data storage."

Steering wheel

"A key ingredient of a useful quantum simulator is a possibility to control or manipulate it," says Chepiga. "Imagine a car without a steering wheel. It can only go forward but cannot turn. Is it useful? Only if you need to go in one particular direction, otherwise the answer will be 'no!'. If we want to create a quantum computer that will be able to discover new physics phenomena in the near-future, we need to build a 'steering wheel' to tune into what seems interesting. In my paper I propose a protocol that creates a fully controllable quantum simulator."

The protocol is a recipe -- a set of ingredients that a quantum simulator should have to be tunable. In the conventional setup of a quantum simulator, rubidium (Rb) or cesium (Cs) atoms are targeted by a single laser. As a result, these particles will take up electrons, and thereby become more energetic; they become excited. "I show that if we were to use two lasers with different frequencies or colours, thereby exciting these atoms to different states, we could tune the quantum simulators to many different settings," Chepiga explains.

The protocol offers an additional dimension of what can be simulated. "Imagine that you have only seen a cube as a sketch on a flat piece of paper, but now you get a real 3D cube that you can touch, rotate and explore in different ways," Chepiga continues. "Theoretically we can add even more dimensions by bringing in more lasers."

Simulating many particles

"The collective behaviour of a quantum system with many particles is extremely challenging to simulate," Chepiga explains. "Beyond a few dozens of particles, modelling with our usual computer or a supercomputer has to rely on approximations." When taking the interaction of more particles, temperature and motion into account, there are simply too many calculations to perform for the computer.

Quantum simulators are composed of quantum particles, which means that the components are entangled. "Entanglement is a sort of mutual information that quantum particles share between themselves. It is an intrinsic property of the simulator and therefore allows this computational bottleneck to be overcome."

Story Source:

Materials provided by Delft University of Technology . Note: Content may be edited for style and length.

Journal Reference :

  • Natalia Chepiga. Tunable Quantum Criticality in Multicomponent Rydberg Arrays . Physical Review Letters , 2024; 132 (7) DOI: 10.1103/PhysRevLett.132.076505

Cite This Page :

  • Anchors Holding Antarctic Land-Ice Shrinking
  • Compound Vital for All Life and Life's Origin
  • How Competition Can Help Promote Cooperation
  • Giant New Snake Species Identified in the Amazon
  • Chemists Synthesize Unique Anticancer Molecules
  • Neutron Star at Heart of Supernova Remnant
  • The Search for More Temperate Tatooines
  • Steering Light With Supercritical Coupling
  • Dramatic Improvements in Crohn's Patients
  • Record-Breaking Quasar Discovered

IMAGES

  1. ️ Computer related research paper topics. Research Paper Topics. 2019-02-22

    computer world research paper

  2. Computer Science Research Paper Publishing Journals : Pdf

    computer world research paper

  3. 😂 Computer generated paper. Essay on Politics. Research Paper on

    computer world research paper

  4. The Digital World

    computer world research paper

  5. Journal of Computer Science Template

    computer world research paper

  6. Essay Computers For And Against

    computer world research paper

VIDEO

  1. Fundamentals of information technology 1 semester model paper

  2. The Latest Internet Computer News & Charts You Need to See!

  3. World Computer System থেকে পিসি বিল্ড করলেই থাকছে নিশ্চিত উপহার 💯 #pcbuildbd #gamingpc #budgetpc

  4. Numbers / ComputerWorld

  5. The World’s Smallest Computer PSU? Using GaN! #pc #pcbuild

  6. World Computer Championship (1974). Master (Computer) vs Tell (Computer)

COMMENTS

  1. Computer Science

    Covers all theoretical and applied aspects at the intersection of computer science and game theory, including work in mechanism design, learning in games (which may overlap with Learning), foundations of agent modeling in games (which may overlap with Multiagent systems), coordination, specification and formal methods for non-cooperative computa...

  2. Computer science

    Computer science - Latest research and news | Nature nature subjects Computer science articles from across Nature Portfolio Atom RSS Feed Computer science is the study and development of...

  3. Deep learning in computer vision: A critical review of emerging

    Latest Stage (2019-now) and Research Trends. On the basis of the analysis of literature trends in the past 2 years, we summarized four research trends for future works. (1) Exploration of network types and architecture: The types of networks tend to be enriched. More types, such as Siamese neural network (SNN), Recurrent neural network (RNN ...

  4. Machine Learning: Algorithms, Real-World Applications and Research

    1 Mention Explore all metrics Abstract In the current age of the Fourth Industrial Revolution (4 IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc.

  5. 533984 PDFs

    Jan 2024 Dina Hussein Dina M. Ibrahim Norah Alajlan div> Since Tiny machine learning (TinyML) is a quickly evolving subject, it is crucial that internet of things (IoT) devices be able to...

  6. [2402.05929] An Interactive Agent Foundation Model

    The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training ...

  7. 40 years of quantum computing

    40 years of quantum computing Nature Reviews Physics 4 , 1 ( 2022) Cite this article 21k Accesses 8 Citations 52 Altmetric Metrics This year we celebrate four decades of quantum computing by...

  8. PDF MIT Open Access Articles Quantum computing

    To inform information systems (IS) research, this Fun‑ damental provides the fundamental concepts of quantum computing and depicts research opportunities. Therefore, we provide in our second section a brief overview of a quantum computer system and its three layers of a quantum com‑ puter: hardware, system software, and application layer. The

  9. Computer

    Computer. null | IEEE Xplore. Need Help? US & Canada: +1 800 678 4333 Worldwide: +1 732 981 0060 Contact & Support

  10. AI and science: what 1,600 researchers think

    The share of research papers that mention AI terms has risen in ... a computer scientist at Kansas State University in Manhattan. ... (CAS), is seeking global talents around the world. Beijing ...

  11. Computer Vision

    Stay informed on the latest trending ML papers with code, research developments, libraries, methods, and datasets. ... Or, discuss a change on Slack. Browse SoTA > Computer Vision Computer Vision. 4444 benchmarks • 1364 tasks • 2843 datasets • 42303 papers with code Semantic Segmentation ... Real-World Adversarial Attack. 11 papers with ...

  12. Five Hundred Most-Cited Papers in the Computer Sciences ...

    1 Citations Part of the Advances in Intelligent Systems and Computing book series (AISC,volume 1366) Abstract This study reveals common factors among highly cited papers in the computer sciences. The 500 most cited papers in the computer sciences published between January 2013 and December 2017 were downloaded from the Web of Science (WoS).

  13. computer science Latest Research Papers

    computer science Latest Research Papers | ScienceGate computer science Recently Published Documents TOTAL DOCUMENTS 14255 (FIVE YEARS 3547) H-INDEX 73 (FIVE YEARS 9) Latest Documents Most Cited Documents Contributed Authors Related Sources Related Keywords Hiring CS Graduates: What We Learned from Employers ACM Transactions on Computing Education

  14. Exploring Computer Science Around the World: Education, Research

    Computer science is a global phenomenon, with education, research, and applications taking place all aro und the world. As technology continues to advance, it will be essential for individuals ...

  15. Artificial Intelligence in Business: From Research and Innovation to

    Neha Soni et al. / Procedia Computer Science 167 (2020) 2200â€"2210 2203 Neha Soni et al./ Procedia Computer Science 00 (2019) 000â€"000 3 monitoring patient’s health, and AI playing games (e.g. Chess and Go) better than world champions are some of the technological innovations under AI. 2016 has been an amazing year for ...

  16. Tech Trends: 2022 Report

    IEEE Computer Society 2022 Report. In 2014, IEEE Computer Society's then-President Dejan Milojicic and a team of nine technologists surveyed the landscape and identified 23 game-changing technologies they felt would have the biggest impact on our world by 2022. Read the report to see how accurate their predictions have proven to be.

  17. Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

    Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability. In this work, we build Snap Video ...

  18. Computer Science and Engineering

    This conceptual research paper discusses the implementation of the A.D.A.B model in technology-based and technical subjects such as Computer Science, Engineering, and other technical fields ...

  19. [2402.13846] Large Language Models are Advanced Anonymizers

    Recent work in privacy research on large language models has shown that they achieve near human-level performance at inferring personal data from real-world online texts. With consistently increasing model capabilities, existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. This raises the question of how individuals can effectively ...

  20. 10 Research Papers Accepted to CVPR 2023

    Research from the department has been accepted to the 2023 Computer Vision and Pattern Recognition (CVPR) Conference. The annual event explores machine learning, artificial intelligence, and computer vision research and its applications. Accepted papers include "CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation."

  21. How Is Technology Changing the World, and How Should the World Change

    Technologies are becoming increasingly complicated and increasingly interconnected. Cars, airplanes, medical devices, financial transactions, and electricity systems all rely on more computer software than they ever have before, making them seem both harder to understand and, in some cases, harder to control. Government and corporate surveillance of individuals and information processing ...

  22. 10 Cutting Edge Research Papers In Computer Vision & Image ...

    January 24, 2019, by Mariya Yao. UPDATE: We've also summarized the top 2019 and top 2020 Computer Vision research papers. Ever since convolutional neural networks began outperforming humans in specific image recognition tasks, research in the field of computer vision has proceeded at breakneck pace.
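
    As a concrete instance of the convolutional networks entry 22 refers to, the sketch below runs a pretrained torchvision ResNet-18 on a random tensor standing in for a preprocessed image; the model choice and dummy input are illustrative assumptions, not taken from the article.

```python
# Image-classification sketch with a pretrained CNN (illustrative; not from the article).
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Stand-in for a normalized 224x224 RGB image; real use would preprocess a photo.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    probs = model(x).softmax(dim=1)
print(probs.argmax(dim=1).item())  # predicted ImageNet class index
```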

  23. Google Scholar reveals its most influential papers for 2020

    The journal, Nucleic Acids Research, while ranked outside the top 10 of Google Scholar's most influential journals, has more papers with 3,000+ citations each than The Lancet (ranked 4th).

  24. computer graphics Latest Research Papers

    Visuospatial representations of numbers and their relationships are widely used in mathematics education. These include drawn images, models constructed with concrete manipulatives, enactive/embodied forms, computer graphics, and more. This paper addresses the analytical limitations and ethical implications of methodologies ...

  25. OpenAI's Sora video-generating model can render video games, too

    The paper, titled "Video generation models as world simulators," co-authored by a host of OpenAI researchers, peels back the curtains on key aspects of Sora's architecture — for instance ...

  26. PDF Technology and Education: Computers, Software, and the Internet

    Technology and Education: Computers, Software, and the Internet. George Bulman and Robert W. Fairlie. NBER Working Paper No. 22237, May 2016. JEL No. I20, I24. Abstract: A substantial amount of money is spent on technology by schools, families and policymakers with the hope of improving educational outcomes.

  27. Brain-Computer Interface: Advancement and Challenges

    Abstract. Brain-Computer Interface (BCI) is an advanced, multidisciplinary, and active research domain based on neuroscience, signal processing, biomedical sensors, hardware, etc. Over the last few decades, a great deal of groundbreaking research has been conducted in this domain. Still, no comprehensive review that covers the BCI domain completely has been ...
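
    Entry 27 lists signal processing among BCI's building blocks, and a typical first step is isolating an EEG frequency band. The sketch below is a hypothetical example, not taken from the review: it band-passes a synthetic signal around the 8-12 Hz alpha band with SciPy, using an assumed 250 Hz sampling rate.

```python
# Band-pass filtering sketch for EEG-like data (synthetic signal; illustrative only).
import numpy as np
from scipy.signal import butter, filtfilt

fs = 250.0                     # assumed sampling rate in Hz
t = np.arange(0, 2, 1 / fs)
# Synthetic "EEG": a 10 Hz tone buried in noise.
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)

# 4th-order Butterworth band-pass for the 8-12 Hz alpha band.
b, a = butter(4, [8 / (fs / 2), 12 / (fs / 2)], btype="band")
alpha = filtfilt(b, a, eeg)    # zero-phase filtering avoids time shift
print(alpha[:5])
```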

  28. [2402.06196] Large Language Models: A Survey

    The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build and augment LLMs.
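
    To ground entry 28's mention of the GPT family, here is a minimal text-generation sketch using the Hugging Face transformers pipeline; the small open gpt2 checkpoint and the prompt are illustrative choices of this write-up, not the survey's.

```python
# Text-generation sketch with a small open LLM (illustrative; not from the survey).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])
```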

  29. A new design for quantum computers

    Creating a quantum computer powerful enough to tackle problems we cannot solve with current computers remains a big challenge for quantum physicists. A well-functioning quantum simulator -- a ...