diff --git a/mkdocs/docs/Electroencephalography/Electroencephalography.md b/mkdocs/docs/Electroencephalography/Electroencephalography.md
index fab8cfee..6ea5c59b 100644
--- a/mkdocs/docs/Electroencephalography/Electroencephalography.md
+++ b/mkdocs/docs/Electroencephalography/Electroencephalography.md
@@ -2,33 +2,3 @@
 ### Electroencephalography
 |Publish Date|Title|Authors|PDF|Code|Abstract|
 | :---: | :---: | :---: | :---: | :---: | :---: |
-|**2024-07-04**|**Geodesic Optimization for Predictive Shift Adaptation on EEG data**|Apolline Mellot et.al.|[2407.03878v1](http://arxiv.org/abs/2407.03878v1)|null|Electroencephalography (EEG) data is often collected from diverse contexts involving different populations and EEG devices. This variability can induce distribution shifts in the data $X$ and in the biomedical variables of interest $y$, thus limiting the application of supervised machine learning (ML) algorithms. While domain adaptation (DA) methods have been developed to mitigate the impact of these shifts, such methods struggle when distribution shifts occur simultaneously in $X$ and $y$. As state-of-the-art ML models for EEG represent the data by spatial covariance matrices, which lie on the Riemannian manifold of Symmetric Positive Definite (SPD) matrices, it is appealing to study DA techniques operating on the SPD manifold. This paper proposes a novel method termed Geodesic Optimization for Predictive Shift Adaptation (GOPSA) to address test-time multi-source DA for situations in which source domains have distinct $y$ distributions. GOPSA exploits the geodesic structure of the Riemannian manifold to jointly learn a domain-specific re-centering operator representing site-specific intercepts and the regression model. We performed empirical benchmarks on the cross-site generalization of age-prediction models with resting-state EEG data from a large multi-national dataset (HarMNqEEG), which included $14$ recording sites and more than $1500$ human participants. Compared to state-of-the-art methods, our results showed that GOPSA achieved significantly higher performance on three regression metrics ($R^2$, MAE, and Spearman's $\rho$) for several source-target site combinations, highlighting its effectiveness in tackling multi-source DA with predictive shifts in EEG data analysis. Our method has the potential to combine the advantages of mixed-effects modeling with machine learning for biomedical applications of EEG, such as multicenter clinical trials.|
-|**2024-07-03**|**MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition**|Yanjie Cui et.al.|[2407.03131v1](http://arxiv.org/abs/2407.03131v1)|null|Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information.However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively.We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively.|
-|**2024-07-02**|**EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More**|Xu Zheng et.al.|[2407.01884v1](http://arxiv.org/abs/2407.01884v1)|null|Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. The study in neuroscience unveils that the relationship between visual and textual stimulus in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose a novel large-scale multi-modal dataset, named EIT-1M, with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity of reflecting brain activities in simultaneously processing multi-modal information. To achieve this, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli from 60K natural images and category-specific texts. Common semantic categories are also included to elicit better reactions from participants' brains. Meanwhile, response-based stimulus timing and repetition across blocks and sessions are included to ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured from multi-modal stimuli across different categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual or textual stimuli or both and 2) EEG-to-visual generation.|
-|**2024-06-25**|**SincVAE: a New Approach to Improve Anomaly Detection on EEG Data Using SincNet and Variational Autoencoder**|Andrea Pollastro et.al.|[2406.17537v1](http://arxiv.org/abs/2406.17537v1)|null|Over the past few decades, electroencephalography (EEG) monitoring has become a pivotal tool for diagnosing neurological disorders, particularly for detecting seizures. Epilepsy, one of the most prevalent neurological diseases worldwide, affects approximately the 1 \% of the population. These patients face significant risks, underscoring the need for reliable, continuous seizure monitoring in daily life. Most of the techniques discussed in the literature rely on supervised Machine Learning (ML) methods. However, the challenge of accurately labeling variations in epileptic EEG waveforms complicates the use of these approaches. Additionally, the rarity of ictal events introduces an high imbalancing within the data, which could lead to poor prediction performance in supervised learning approaches. Instead, a semi-supervised approach allows to train the model only on data not containing seizures, thus avoiding the issues related to the data imbalancing. This work proposes a semi-supervised approach for detecting epileptic seizures from EEG data, utilizing a novel Deep Learning-based method called SincVAE. This proposal incorporates the learning of an ad-hoc array of bandpass filter as a first layer of a Variational Autoencoder (VAE), potentially eliminating the preprocessing stage where informative band frequencies are identified and isolated. Results indicate that SincVAE improves seizure detection in EEG data and is capable of identifying early seizures during the preictal stage as well as monitoring patients throughout the postictal stage.|
-|**2024-06-20**|**Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition**|Yimin Zhao et.al.|[2406.14014v1](http://arxiv.org/abs/2406.14014v1)|[link](https://github.com/ztony0712/MCA)|An objective and accurate emotion diagnostic reference is vital to psychologists, especially when dealing with patients who are difficult to communicate with for pathological reasons. Nevertheless, current systems based on Electroencephalography (EEG) data utilized for sentiment discrimination have some problems, including excessive model complexity, mediocre accuracy, and limited interpretability. Consequently, we propose a novel and effective feature fusion mechanism named Mutual-Cross-Attention (MCA). Combining with a specially customized 3D Convolutional Neural Network (3D-CNN), this purely mathematical mechanism adeptly discovers the complementary relationship between time-domain and frequency-domain features in EEG data. Furthermore, the new designed Channel-PSD-DE 3D feature also contributes to the high performance. The proposed method eventually achieves 99.49% (valence) and 99.30% (arousal) accuracy on DEAP dataset.|
-|**2024-06-19**|**System Immersion of a Driving Simulator Affects the Oscillatory Brain Activity**|Nikol Figalová et.al.|[2406.13570v1](http://arxiv.org/abs/2406.13570v1)|null|The technological properties of a system delivering simulation experience are a crucial dimension of immersion. To create a sense of presence and reproduce drivers behaviour as realistically as possible, we need reliable driving simulators that allow drivers to become highly immersed. This study investigates the impact of a system immersion of a driving simulator on the drivers' brain activity while operating a conditionally automated vehicle. Nineteen participants drove approximately 40 minutes while their brain activity was recorded using electroencephalography (EEG). We found a significant effect of the system immersion in the occipital and parietal areas, primarily in the high-Beta bandwidth. No effect was found in the Theta, Alpha, and low-Beta bandwidths. These findings suggest that the system immersion might influence the drivers' physiological arousal, consequently influencing their cognitive and emotional processes.|
-|**2024-06-12**|**Prediction of the Realisation of an Information Need: An EEG Study**|Niall McGuire et.al.|[2406.08105v3](http://arxiv.org/abs/2406.08105v3)|null|One of the foundational goals of Information Retrieval (IR) is to satisfy searchers' Information Needs (IN). Understanding how INs physically manifest has long been a complex and elusive process. However, recent studies utilising Electroencephalography (EEG) data have provided real-time insights into the neural processes associated with INs. Unfortunately, they have yet to demonstrate how this insight can practically benefit the search experience. As such, within this study, we explore the ability to predict the realisation of IN within EEG data across 14 subjects whilst partaking in a Question-Answering (Q/A) task. Furthermore, we investigate the combinations of EEG features that yield optimal predictive performance, as well as identify regions within the Q/A queries where a subject's realisation of IN is more pronounced. The findings from this work demonstrate that EEG data is sufficient for the real-time prediction of the realisation of an IN across all subjects with an accuracy of 73.5% (SD 2.6%) and on a per-subject basis with an accuracy of 90.1% (SD 22.1%). This work helps to close the gap by bridging theoretical neuroscientific advancements with tangible improvements in information retrieval practices, paving the way for real-time prediction of the realisation of IN.|
-|**2024-06-12**|**GAPses: Versatile smart glasses for comfortable and fully-dry acquisition and parallel ultra-low-power processing of EEG and EOG**|Sebastian Frey et.al.|[2406.07903v2](http://arxiv.org/abs/2406.07903v2)|null|Recent advancements in head-mounted wearable technology are revolutionizing the field of biopotential measurement, but the integration of these technologies into practical, user-friendly devices remains challenging due to issues with design intrusiveness, comfort, and data privacy. To address these challenges, this paper presents GAPSES, a novel smart glasses platform designed for unobtrusive, comfortable, and secure acquisition and processing of electroencephalography (EEG) and electrooculography (EOG) signals. We introduce a direct electrode-electronics interface with custom fully dry soft electrodes to enhance comfort for long wear. An integrated parallel ultra-low-power RISC-V processor (GAP9, Greenwaves Technologies) processes data at the edge, thereby eliminating the need for continuous data streaming through a wireless link, enhancing privacy, and increasing system reliability in adverse channel conditions. We demonstrate the broad applicability of the designed prototype through validation in a number of EEG-based interaction tasks, including alpha waves, steady-state visual evoked potential analysis, and motor movement classification. Furthermore, we demonstrate an EEG-based biometric subject recognition task, where we reach a sensitivity and specificity of 98.87% and 99.86% respectively, with only 8 EEG channels and an energy consumption per inference on the edge as low as 121 uJ. Moreover, in an EOG-based eye movement classification task, we reach an accuracy of 96.68% on 11 classes, resulting in an information transfer rate of 94.78 bit/min, which can be further increased to 161.43 bit/min by reducing the accuracy to 81.43%. The deployed implementation has an energy consumption of 24 uJ per inference and a total system power of only 16.28 mW, allowing for continuous operation of more than 12 h with a small 75 mAh battery.|
-|**2024-06-11**|**EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels**|Shuqi Zhu et.al.|[2406.07151v1](http://arxiv.org/abs/2406.07151v1)|[link](https://github.com/promise-z5q2sq/eeg-imagenet-dataset)|Identifying and reconstructing what we see from brain activity gives us a special insight into investigating how the biological visual system represents the world. While recent efforts have achieved high-performance image classification and high-quality image reconstruction from brain signals collected by Functional Magnetic Resonance Imaging (fMRI) or magnetoencephalogram (MEG), the expensiveness and bulkiness of these devices make relevant applications difficult to generalize to practical applications. On the other hand, Electroencephalography (EEG), despite its advantages of ease of use, cost-efficiency, high temporal resolution, and non-invasive nature, has not been fully explored in relevant studies due to the lack of comprehensive datasets. To address this gap, we introduce EEG-ImageNet, a novel EEG dataset comprising recordings from 16 subjects exposed to 4000 images selected from the ImageNet dataset. EEG-ImageNet consists of 5 times EEG-image pairs larger than existing similar EEG benchmarks. EEG-ImageNet is collected with image stimuli of multi-granularity labels, i.e., 40 images with coarse-grained labels and 40 with fine-grained labels. Based on it, we establish benchmarks for object classification and image reconstruction. Experiments with several commonly used models show that the best models can achieve object classification with accuracy around 60% and image reconstruction with two-way identification around 64%. These results demonstrate the dataset's potential to advance EEG-based visual brain-computer interfaces, understand the visual perception of biological systems, and provide potential applications in improving machine visual models.|
-|**2024-06-10**|**Leveraging Hyperscanning EEG and VR Omnidirectional Treadmill to Explore Inter-Brain Synchrony in Collaborative Spatial Navigation**|Chun-Hsiang Chuang et.al.|[2406.06327v1](http://arxiv.org/abs/2406.06327v1)|null|Navigating through a physical environment to reach a desired location involves a complex interplay of cognitive, sensory, and motor functions. When navigating with others, experiencing a degree of behavioral and cognitive synchronization is both natural and ubiquitous. This synchronization facilitates a harmonious effort toward achieving a common goal, reflecting how individuals instinctively align their actions and thoughts in collaborative settings. Collaborative spatial tasks, which are crucial in daily and professional settings, require coordinated navigation and problem-solving skills. This study explores the neural mechanisms underlying such tasks by using hyperscanning electroencephalography (EEG) technology to examine brain dynamics in dyadic route planning within a virtual reality setting. By analyzing intra- and inter-brain couplings across delta, theta, alpha, beta, and gamma EEG bands using both functional and effective connectivity measures, we identified significant neural synchronization patterns associated with collaborative task performance in both leaders and followers. Functional intra-brain connectivity analyses revealed distinct neural engagement across EEG frequency bands, with increased delta couplings observed in both leaders and followers. Theta connectivity was particularly enhanced in followers, whereas the alpha band exhibited divergent patterns that indicate role-specific neural strategies. Inter-brain analysis revealed increased delta causality between interacting members but decreased theta and gamma couplings from followers to leaders. Additionally, inter-brain analysis indicated decreased couplings in faster-performing dyads, especially in theta bands. These insights enhance our understanding of the neural mechanisms driving collaborative spatial navigation and demonstrate the effectiveness of hyperscanning in studying complex brain-to-brain interactions.|
-|**2024-06-07**|**Benchmarking Deep Jansen-Rit Parameter Inference: An in Silico Study**|Deepa Tilwani et.al.|[2406.05002v1](http://arxiv.org/abs/2406.05002v1)|[link](https://github.com/lina-usc/jansen-rit-model-benchmarking-deep-learning)|The study of effective connectivity (EC) is essential in understanding how the brain integrates and responds to various sensory inputs. Model-driven estimation of EC is a powerful approach that requires estimating global and local parameters of a generative model of neural activity. Insights gathered through this process can be used in various applications, such as studying neurodevelopmental disorders. However, accurately determining EC through generative models remains a significant challenge due to the complexity of brain dynamics and the inherent noise in neural recordings, e.g., in electroencephalography (EEG). Current model-driven methods to study EC are computationally complex and cannot scale to all brain regions as required by whole-brain analyses. To facilitate EC assessment, an inference algorithm must exhibit reliable prediction of parameters in the presence of noise. Further, the relationship between the model parameters and the neural recordings must be learnable. To progress toward these objectives, we benchmarked the performance of a Bi-LSTM model for parameter inference from the Jansen-Rit neural mass model (JR-NMM) simulated EEG under various noise conditions. Additionally, our study explores how the JR-NMM reacts to changes in key biological parameters (i.e., sensitivity analysis) like synaptic gains and time constants, a crucial step in understanding the connection between neural mechanisms and observed brain activity. Our results indicate that we can predict the local JR-NMM parameters from EEG, supporting the feasibility of our deep-learning-based inference approach. In future work, we plan to extend this framework to estimate local and global parameters from real EEG in clinically relevant applications.|
-|**2024-06-05**|**GET: A Generative EEG Transformer for Continuous Context-Based Neural Signals**|Omair Ali et.al.|[2406.03115v3](http://arxiv.org/abs/2406.03115v3)|null|Generating continuous electroencephalography (EEG) signals through advanced artificial neural networks presents a novel opportunity to enhance brain-computer interface (BCI) technology. This capability has the potential to significantly enhance applications ranging from simulating dynamic brain activity and data augmentation to improving real-time epilepsy detection and BCI inference. By harnessing generative transformer neural networks, specifically designed for EEG signal generation, we can revolutionize the interpretation and interaction with neural data. Generative AI has demonstrated significant success across various domains, from natural language processing (NLP) and computer vision to content creation in visual arts and music. It distinguishes itself by using large-scale datasets to construct context windows during pre-training, a technique that has proven particularly effective in NLP, where models are fine-tuned for specific downstream tasks after extensive foundational training. However, the application of generative AI in the field of BCIs, particularly through the development of continuous, context-rich neural signal generators, has been limited. To address this, we introduce the Generative EEG Transformer (GET), a model leveraging transformer architecture tailored for EEG data. The GET model is pre-trained on diverse EEG datasets, including motor imagery and alpha wave datasets, enabling it to produce high-fidelity neural signals that maintain contextual integrity. Our empirical findings indicate that GET not only faithfully reproduces the frequency spectrum of the training data and input prompts but also robustly generates continuous neural signals. By adopting the successful training strategies of the NLP domain for BCIs, the GET sets a new standard for the development and application of neural signal generation technologies.|
-|**2024-06-03**|**MAD: Multi-Alignment MEG-to-Text Decoding**|Yiqian Yang et.al.|[2406.01512v1](http://arxiv.org/abs/2406.01512v1)|[link](https://github.com/neuspeech/mad-meg2text)|Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current works under-investigated three points: 1) a predominant focus on EEG with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that can better generalize to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which could potentially constrain our capacity to comprehensively understand the intricate dynamics of brain activity.   This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first to introduce an end-to-end multi-alignment framework for totally unseen text generation directly from MEG signals. We achieve an impressive BLEU-1 score on the $\textit{GWilliams}$ dataset, significantly outperforming the baseline from 5.49 to 10.44 on the BLEU-1 metric. This improvement demonstrates the advancement of our model towards real-world applications and underscores its potential in advancing BCI research. Code is available at $\href{https://github.com/NeuSpeech/MAD-MEG2text}{https://github.com/NeuSpeech/MAD-MEG2text}$.|
-|**2024-06-03**|**Conditional Gumbel-Softmax for constrained feature selection with application to node selection in wireless sensor networks**|Thomas Strypsteen et.al.|[2406.01162v1](http://arxiv.org/abs/2406.01162v1)|null|In this paper, we introduce Conditional Gumbel-Softmax as a method to perform end-to-end learning of the optimal feature subset for a given task and deep neural network (DNN) model, while adhering to certain pairwise constraints between the features. We do this by conditioning the selection of each feature in the subset on another feature. We demonstrate how this approach can be used to select the task-optimal nodes composing a wireless sensor network (WSN) while ensuring that none of the nodes that require communication between one another have too large of a distance between them, limiting the required power spent on this communication. We validate this approach on an emulated Wireless Electroencephalography (EEG) Sensor Network (WESN) solving a motor execution task. We analyze how the performance of the WESN varies as the constraints are made more stringent and how well the Conditional Gumbel-Softmax performs in comparison with a heuristic, greedy selection method. While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology is generic and can readily be applied to node deployment in wireless sensor networks and constrained feature selection in other applications as well.|
-|**2024-05-31**|**Learning Exemplar Representations in Single-Trial EEG Category Decoding**|Jack Kilgallen et.al.|[2406.16902v1](http://arxiv.org/abs/2406.16902v1)|null|Within neuroimgaing studies it is a common practice to perform repetitions of trials in an experiment when working with a noisy class of data acquisition system, such as electroencephalography (EEG) or magnetoencephalography (MEG). While this approach can be useful in some experimental designs, it presents significant limitations for certain types of analyses, such as identifying the category of an object observed by a subject. In this study we demonstrate that when trials relating to a single object are allowed to appear in both the training and testing sets, almost any classification algorithm is capable of learning the representation of an object given only category labels. This ability to learn object representations is of particular significance as it suggests that the results of several published studies which predict the category of observed objects from EEG signals may be affected by a subtle form of leakage which has inflated their reported accuracies. We demonstrate the ability of both simple classification algorithms, and sophisticated deep learning models, to learn object representations given only category labels. We do this using two datasets; the Kaneshiro et al. (2015) dataset and the Gifford et al. (2022) dataset. Our results raise doubts about the true generalizability of several published models and suggests that the reported performance of these models may be significantly inflated.|
-|**2024-05-28**|**Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition**|Yihang Dong et.al.|[2405.19373v1](http://arxiv.org/abs/2405.19373v1)|null|Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a Pre-trained model based Multimodal Mood Reader for cross-subject emotion recognition that utilizes masked brain signal modeling and interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on large scale dataset, and employs Interlinked spatial-temporal attention mechanism to process Differential Entropy(DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from attention perspective, providing qualitative analysis of emotion-related brain areas, offering valuable insights for affective research in neural signal processing.|
-|**2024-05-24**|**Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification**|Yihe Wang et.al.|[2405.19363v1](http://arxiv.org/abs/2405.19363v1)|[link](https://github.com/dl4mhealth/medformer)|Medical time series data, such as Electroencephalography (EEG) and Electrocardiography (ECG), play a crucial role in healthcare, such as diagnosing brain and heart diseases. Existing methods for medical time series classification primarily rely on handcrafted biomarkers extraction and CNN-based models, with limited exploration of transformers tailored for medical time series. In this paper, we introduce Medformer, a multi-granularity patching transformer tailored specifically for medical time series classification. Our method incorporates three novel mechanisms to leverage the unique characteristics of medical time series: cross-channel patching to leverage inter-channel correlations, multi-granularity embedding for capturing features at different scales, and two-stage (intra- and inter-granularity) multi-granularity self-attention for learning features and correlations within and among granularities. We conduct extensive experiments on five public datasets under both subject-dependent and challenging subject-independent setups. Results demonstrate Medformer's superiority over 10 baselines, achieving top averaged ranking across five datasets on all six evaluation metrics. These findings underscore the significant impact of our method on healthcare applications, such as diagnosing Myocardial Infarction, Alzheimer's, and Parkinson's disease. We release the source code at \url{https://github.com/DL4mHealth/Medformer}.|
-|**2024-05-21**|**Study on spike-and-wave detection in epileptic signals using t-location-scale distribution and the K-nearest neighbors classifier**|Antonio Quintero-Rincón et.al.|[2405.14896v1](http://arxiv.org/abs/2405.14896v1)|null|Pattern classification in electroencephalography (EEG) signals is an important problem in biomedical engineering since it enables the detection of brain activity, particularly the early detection of epileptic seizures. In this paper, we propose a k-nearest neighbors classification for epileptic EEG signals based on a t-location-scale statistical representation to detect spike-and-waves. The proposed approach is demonstrated on a real dataset containing both spike-and-wave events and normal brain function signals, where our performance is evaluated in terms of classification accuracy, sensitivity, and specificity.|
-|**2024-05-18**|**Domain Generalization for Zero-calibration BCIs with Knowledge Distillation-based Phase Invariant Feature Extraction**|Zilin Liang et.al.|[2405.11163v1](http://arxiv.org/abs/2405.11163v1)|[link](https://github.com/ZilinL/KnIFE)|The distribution shift of electroencephalography (EEG) data causes poor generalization of braincomputer interfaces (BCIs) in unseen domains. Some methods try to tackle this challenge by collecting a portion of user data for calibration. However, it is time-consuming, mentally fatiguing, and user-unfriendly. To achieve zerocalibration BCIs, most studies employ domain generalization (DG) techniques to learn invariant features across different domains in the training set. However, they fail to fully explore invariant features within the same domain, leading to limited performance. In this paper, we present an novel method to learn domain-invariant features from both interdomain and intra-domain perspectives. For intra-domain invariant features, we propose a knowledge distillation framework to extract EEG phase-invariant features within one domain. As for inter-domain invariant features, correlation alignment is used to bridge distribution gaps across multiple domains. Experimental results on three public datasets validate the effectiveness of our method, showcasing stateof-the-art performance. To the best of our knowledge, this is the first domain generalization study that exploit Fourier phase information as an intra-domain invariant feature to facilitate EEG generalization. More importantly, the zerocalibration BCI based on inter- and intra-domain invariant features has significant potential to advance the practical applications of BCIs in real world.|
-|**2024-05-17**|**Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification**|Sion An et.al.|[2405.19346v1](http://arxiv.org/abs/2405.19346v1)|null|Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In contrast, resting state (RS) EEG signals are a viable alternative due to ease of acquisition with rich subject information. In this paper, we propose a novel subject-adaptive transfer learning strategy that utilizes RS EEG signals to adapt models on unseen subject data. Specifically, we disentangle extracted features into task- and subject-dependent features and use them to calibrate RS EEG signals for obtaining task information while preserving subject characteristics. The calibrated signals are then used to adapt the model to the target subject, enabling the model to simulate processing TS EEG signals of the target subject. The proposed method achieves state-of-the-art accuracy on three public benchmarks, demonstrating the effectiveness of our method in cross-subject EEG MI classification. Our findings highlight the potential of leveraging RS EEG signals to advance practical brain-computer interface systems.|
-|**2024-05-14**|**EEG-Features for Generalized Deepfake Detection**|Arian Beckmann et.al.|[2405.08527v1](http://arxiv.org/abs/2405.08527v1)|null|Since the advent of Deepfakes in digital media, the development of robust and reliable detection mechanisms is urgently called for. In this study, we explore a novel approach to Deepfake detection by utilizing electroencephalography (EEG) measured from the neural processing of a human participant who viewed and categorized Deepfake stimuli from the FaceForensics++ dataset. These measurements serve as input features to a binary support vector classifier, trained to discriminate between real and manipulated facial images. We examine whether EEG data can inform Deepfake detection and also if it can provide a generalized representation capable of identifying Deepfakes beyond the training domain. Our preliminary results indicate that human neural processing signals can be successfully integrated into Deepfake detection frameworks and hint at the potential for a generalized neural representation of artifacts in computer generated faces. Moreover, our study provides next steps towards the understanding of how digital realism is embedded in the human cognitive system, possibly enabling the development of more realistic digital avatars in the future.|
-|**2024-05-03**|**Precision Enhancement in Sustained Visual Attention Training Platforms: Offline EEG Signal Analysis for Classifier Fine-Tuning**|Maryam Norouzi et.al.|[2405.02422v2](http://arxiv.org/abs/2405.02422v2)|null|In this study, a novel open-source brain-computer interface (BCI) platform was developed to decode scalp electroencephalography (EEG) signals associated with sustained attention. The EEG signal collection was conducted using a wireless headset during a sustained visual attention task, where participants were instructed to discriminate between composite images superimposed with scenes and faces, responding only to the relevant subcategory while ignoring the irrelevant ones. Seven volunteers participated in this experiment. The data collected were subjected to analyses through event-related potential (ERP), Hilbert Transform, and Wavelet Transform to extract temporal and spectral features. For each participant, utilizing its extracted features, personalized Support Vector Machine (SVM) and Random Forest (RF) models with tuned hyperparameters were developed. The models aimed to decode the participant's attentional state towards the face and scene stimuli. The SVM models achieved a higher average accuracy of 80\% and an Area Under the Curve (AUC) of 0.86, while the RF models showed an average accuracy of 78\% and AUC of 0.8. This work suggests potential applications for the evaluation of visual attention and the development of closed-loop brainwave regulation systems in the future.|
-|**2024-05-03**|**EEG2TEXT: Open Vocabulary EEG-to-Text Decoding with EEG Pre-Training and Multi-View Transformer**|Hanwen Liu et.al.|[2405.02165v1](http://arxiv.org/abs/2405.02165v1)|null|Deciphering the intricacies of the human brain has captivated curiosity for centuries. Recent strides in Brain-Computer Interface (BCI) technology, particularly using motor imagery, have restored motor functions such as reaching, grasping, and walking in paralyzed individuals. However, unraveling natural language from brain signals remains a formidable challenge. Electroencephalography (EEG) is a non-invasive technique used to record electrical activity in the brain by placing electrodes on the scalp. Previous studies of EEG-to-text decoding have achieved high accuracy on small closed vocabularies, but still fall short of high accuracy when dealing with large open vocabularies. We propose a novel method, EEG2TEXT, to improve the accuracy of open vocabulary EEG-to-text decoding. Specifically, EEG2TEXT leverages EEG pre-training to enhance the learning of semantics from EEG signals and proposes a multi-view transformer to model the EEG signal processing by different spatial regions of the brain. Experiments show that EEG2TEXT has superior performance, outperforming the state-of-the-art baseline methods by a large margin of up to 5% in absolute BLEU and ROUGE scores. EEG2TEXT shows great potential for a high-performance open-vocabulary brain-to-text system to facilitate communication.|
-|**2024-05-02**|**Quantifying Spatial Domain Explanations in BCI using Earth Mover's Distance**|Param Rajpura et.al.|[2405.01277v1](http://arxiv.org/abs/2405.01277v1)|[link](https://github.com/HAIx-Lab/SpatialExplanations4BCI)|Brain-computer interface (BCI) systems facilitate unique communication between humans and computers, benefiting severely disabled individuals. Despite decades of research, BCIs are not fully integrated into clinical and commercial settings. It's crucial to assess and explain BCI performance, offering clear explanations for potential users to avoid frustration when it doesn't work as expected. This work investigates the efficacy of different deep learning and Riemannian geometry-based classification models in the context of motor imagery (MI) based BCI using electroencephalography (EEG). We then propose an optimal transport theory-based approach using earth mover's distance (EMD) to quantify the comparison of the feature relevance map with the domain knowledge of neuroscience. For this, we utilized explainable AI (XAI) techniques for generating feature relevance in the spatial domain to identify important channels for model outcomes. Three state-of-the-art models are implemented - 1) Riemannian geometry-based classifier, 2) EEGNet, and 3) EEG Conformer, and the observed trend in the model's accuracy across different architectures on the dataset correlates with the proposed feature relevance metrics. The models with diverse architectures perform significantly better when trained on channels relevant to motor imagery than data-driven channel selection. This work focuses attention on the necessity for interpretability and incorporating metrics beyond accuracy, underscores the value of combining domain knowledge and quantifying model interpretations with data-driven approaches in creating reliable and robust Brain-Computer Interfaces (BCIs).|
-|**2024-04-27**|**Empowering Mobility: Brain-Computer Interface for Enhancing Wheelchair Control for Individuals with Physical Disabilities**|Shiva Ghasemi et.al.|[2404.17895v1](http://arxiv.org/abs/2404.17895v1)|null|The integration of brain-computer interfaces (BCIs) into the realm of smart wheelchair (SW) technology signifies a notable leap forward in enhancing the mobility and autonomy of individuals with physical disabilities. BCIs are a technology that enables direct communication between the brain and external devices. While BCIs systems offer remarkable opportunities for enhancing human-computer interaction and providing mobility solutions for individuals with disabilities, they also raise significant concerns regarding security, safety, and privacy that have not been thoroughly addressed by researchers on a large scale. Our research aims to enhance wheelchair control for individuals with physical disabilities by leveraging electroencephalography (EEG) signals for BCIs. We introduce a non-invasive BCI system that utilizes a neuro-signal acquisition headset to capture EEG signals. These signals are obtained from specific brain activities that individuals have been trained to produce, allowing for precise control of the wheelchair. EEG-based BCIs are instrumental in capturing the brain's electrical activity and translating these signals into actionable commands. The primary objective of our study is to demonstrate the system's capability to interpret EEG signals and decode specific thought patterns or mental commands issued by the user. By doing so, it aims to convert these into accurate control commands for the wheelchair. This process includes the recognition of navigational intentions, such as moving forward, backward, or executing turns, specifically tailored for wheelchair operation. 
-Through this innovative approach, we aim to create a seamless interface between the user's cognitive intentions and the wheelchair's movements, enhancing autonomy and mobility for individuals with physical disabilities.|
-|**2024-04-26**|**Unveiling Thoughts: A Review of Advancements in EEG Brain Signal Decoding into Text**|Saydul Akbar Murad et.al.|[2405.00726v1](http://arxiv.org/abs/2405.00726v1)|null|The conversion of brain activity into text using electroencephalography (EEG) has gained significant traction in recent years. Many researchers are working to develop new models to decode EEG signals into text form. Although this area has shown promising developments, it still faces numerous challenges that necessitate further improvement. It's important to outline this area's recent developments and future research directions. In this review article, we thoroughly summarize the progress in EEG-to-text conversion. Firstly, we talk about how EEG-to-text technology has grown and what problems we still face. Secondly, we discuss existing techniques used in this field. This includes methods for collecting EEG data, the steps to process these signals, and the development of systems capable of translating these signals into coherent text. We conclude with potential future research directions, emphasizing the need for enhanced accuracy, reduced system constraints, and the exploration of novel applications across varied sectors. By addressing these aspects, this review aims to contribute to developing more accessible and effective Brain-Computer Interface (BCI) technology for a broader user base.|
-|**2024-04-26**|**EEG_RL-Net: Enhancing EEG MI Classification through Reinforcement Learning-Optimised Graph Neural Networks**|Htoo Wai Aung et.al.|[2405.00723v1](http://arxiv.org/abs/2405.00723v1)|null|Brain-Computer Interfaces (BCIs) rely on accurately decoding electroencephalography (EEG) motor imagery (MI) signals for effective device control. Graph Neural Networks (GNNs) outperform Convolutional Neural Networks (CNNs) in this regard, by leveraging the spatial relationships between EEG electrodes through adjacency matrices. The EEG_GLT-Net framework, featuring the state-of-the-art EEG_GLT adjacency matrix method, has notably enhanced EEG MI signal classification, evidenced by an average accuracy of 83.95% across 20 subjects on the PhysioNet dataset. This significantly exceeds the 76.10% accuracy rate achieved using the Pearson Correlation Coefficient (PCC) method within the same framework.   In this research, we advance the field by applying a Reinforcement Learning (RL) approach to the classification of EEG MI signals. Our innovative method empowers the RL agent, enabling not only the classification of EEG MI data points with higher accuracy, but effective identification of EEG MI data points that are less distinct. We present the EEG_RL-Net, an enhancement of the EEG_GLT-Net framework, which incorporates the trained EEG GCN Block from EEG_GLT-Net at an adjacency matrix density of 13.39% alongside the RL-centric Dueling Deep Q Network (Dueling DQN) block. The EEG_RL-Net model showcases exceptional classification performance, achieving an unprecedented average accuracy of 96.40% across 20 subjects within 25 milliseconds. This model illustrates the transformative effect of the RL in EEG MI time point classification.|
-|**2024-04-24**|**Introducing EEG Analyses to Help Personal Music Preference Prediction**|Zhiyu He et.al.|[2404.15753v1](http://arxiv.org/abs/2404.15753v1)|[link](https://github.com/hezy18/eeg_music)|Nowadays, personalized recommender systems play an increasingly important role in music scenarios in our daily life with the preference prediction ability. However, existing methods mainly rely on users' implicit feedback (e.g., click, dwell time) which ignores the detailed user experience. This paper introduces Electroencephalography (EEG) signals to personal music preferences as a basis for the personalized recommender system. To realize collection in daily life, we use a dry-electrodes portable device to collect data. We perform a user study where participants listen to music and record preferences and moods. Meanwhile, EEG signals are collected with a portable device. Analysis of the collected data indicates a significant relationship between music preference, mood, and EEG signals. Furthermore, we conduct experiments to predict personalized music preference with the features of EEG signals. Experiments show significant improvement in rating prediction and preference classification with the help of EEG. Our work demonstrates the possibility of introducing EEG signals in personal music preference with portable devices. Moreover, our approach is not restricted to the music scenario, and the EEG signals as explicit feedback can be used in personalized recommendation tasks.|
-|**2024-04-24**|**MDDD: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition**|Ting Luo et.al.|[2404.15615v1](http://arxiv.org/abs/2404.15615v1)|null|Emotion decoding using Electroencephalography (EEG)-based affective brain-computer interfaces represents a significant area within the field of affective computing. In the present study, we propose a novel non-deep transfer learning method, termed as Manifold-based Domain adaptation with Dynamic Distribution (MDDD). The proposed MDDD includes four main modules: manifold feature transformation, dynamic distribution alignment, classifier learning, and ensemble learning. The data undergoes a transformation onto an optimal Grassmann manifold space, enabling dynamic alignment of the source and target domains. This process prioritizes both marginal and conditional distributions according to their significance, ensuring enhanced adaptation efficiency across various types of data. In the classifier learning, the principle of structural risk minimization is integrated to develop robust classification models. This is complemented by dynamic distribution alignment, which refines the classifier iteratively. Additionally, the ensemble learning module aggregates the classifiers obtained at different stages of the optimization process, which leverages the diversity of the classifiers to enhance the overall prediction accuracy. The experimental results indicate that MDDD outperforms traditional non-deep learning methods, achieving an average improvement of 3.54%, and is comparable to deep learning methods. This suggests that MDDD could be a promising method for enhancing the utility and applicability of aBCIs in real-world scenarios.|
-|**2024-04-17**|**EEG_GLT-Net: Optimising EEG Graphs for Real-time Motor Imagery Signals Classification**|Htoo Wai Aung et.al.|[2404.11075v1](http://arxiv.org/abs/2404.11075v1)|null|Brain-Computer Interfaces connect the brain to external control devices, necessitating the accurate translation of brain signals such as from electroencephalography (EEG) into executable commands. Graph Neural Networks (GCN) have been increasingly applied for classifying EEG Motor Imagery signals, primarily because they incorporates the spatial relationships among EEG channels, resulting in improved accuracy over traditional convolutional methods. Recent advances by GCNs-Net in real-time EEG MI signal classification utilised Pearson Coefficient Correlation (PCC) for constructing adjacency matrices, yielding significant results on the PhysioNet dataset. Our paper introduces the EEG Graph Lottery Ticket (EEG_GLT) algorithm, an innovative technique for constructing adjacency matrices for EEG channels. It does not require pre-existing knowledge of inter-channel relationships, and it can be tailored to suit both individual subjects and GCN model architectures. Our findings demonstrated that the PCC method outperformed the Geodesic approach by 9.65% in mean accuracy, while our EEG_GLT matrix consistently exceeded the performance of the PCC method by a mean accuracy of 13.39%. Also, we found that the construction of the adjacency matrix significantly influenced accuracy, to a greater extent than GCN model configurations. A basic GCN configuration utilising our EEG_GLT matrix exceeded the performance of even the most complex GCN setup with a PCC matrix in average accuracy. Our EEG_GLT method also reduced MACs by up to 97% compared to the PCC method, while maintaining or enhancing accuracy. 
-In conclusion, the EEG_GLT algorithm marks a breakthrough in the development of optimal adjacency matrices, effectively boosting both computational accuracy and efficiency, making it well-suited for real-time classification of EEG MI signals that demand intensive computational resources.|
diff --git a/mkdocs/docs/Electromyography/Electromyography.md b/mkdocs/docs/Electromyography/Electromyography.md
index c72083e5..f6c62089 100644
--- a/mkdocs/docs/Electromyography/Electromyography.md
+++ b/mkdocs/docs/Electromyography/Electromyography.md
@@ -2,33 +2,3 @@
 ### Electromyography
 |Publish Date|Title|Authors|PDF|Code|Abstract|
 | :---: | :---: | :---: | :---: | :---: | :---: |
-|**2024-06-07**|**L-SFAN: Lightweight Spatially-focused Attention Network for Pain Behavior Detection**|Jorge Ortigoso-Narro et.al.|[2406.16913v1](http://arxiv.org/abs/2406.16913v1)|null|Chronic Low Back Pain (CLBP) afflicts millions globally, significantly impacting individuals' well-being and imposing economic burdens on healthcare systems. While artificial intelligence (AI) and deep learning offer promising avenues for analyzing pain-related behaviors to improve rehabilitation strategies, current models, including convolutional neural networks (CNNs), recurrent neural networks, and graph-based neural networks, have limitations. These approaches often focus singularly on the temporal dimension or require complex architectures to exploit spatial interrelationships within multivariate time series data. To address these limitations, we introduce \hbox{L-SFAN}, a lightweight CNN architecture incorporating 2D filters designed to meticulously capture the spatial-temporal interplay of data from motion capture and surface electromyography sensors. Our proposed model, enhanced with an oriented global pooling layer and multi-head self-attention mechanism, prioritizes critical features to better understand CLBP and achieves competitive classification accuracy. Experimental results on the EmoPain database demonstrate that our approach not only enhances performance metrics with significantly fewer parameters but also promotes model interpretability, offering valuable insights for clinicians in managing CLBP. This advancement underscores the potential of AI in transforming healthcare practices for chronic conditions like CLBP, providing a sophisticated framework for the nuanced analysis of complex biomedical data.|
-|**2024-05-29**|**Anatomical Region Recognition and Real-time Bone Tracking Methods by Dynamically Decoding A-Mode Ultrasound Signals**|Bangyu Lan et.al.|[2405.19542v2](http://arxiv.org/abs/2405.19542v2)|null|Accurate bone tracking is crucial for kinematic analysis in orthopedic surgery and prosthetic robotics. Traditional methods (e.g., skin markers) are subject to soft tissue artifacts, and the bone pins used in surgery introduce the risk of additional trauma and infection. For electromyography (EMG), its inability to directly measure joint angles requires complex algorithms for kinematic estimation. To address these issues, A-mode ultrasound-based tracking has been proposed as a non-invasive and safe alternative. However, this approach suffers from limited accuracy in peak detection when processing received ultrasound signals. To build a precise and real-time bone tracking approach, this paper introduces a deep learning-based method for anatomical region recognition and bone tracking using A-mode ultrasound signals, specifically focused on the knee joint. The algorithm is capable of simultaneously performing bone tracking and identifying the anatomical region where the A-mode ultrasound transducer is placed. It contains the fully connection between all encoding and decoding layers of the cascaded U-Nets to focus only on the signal region that is most likely to have the bone peak, thus pinpointing the exact location of the peak and classifying the anatomical region of the signal. The experiment showed a 97% accuracy in the classification of the anatomical regions and a precision of around 0.5$\pm$1mm under dynamic tracking conditions for various anatomical areas surrounding the knee joint. In general, this approach shows great potential beyond the traditional method, in terms of the accuracy achieved and the recognition of the anatomical region where the ultrasound has been attached as an additional functionality.|
-|**2024-05-23**|**An LSTM Feature Imitation Network for Hand Movement Recognition from sEMG Signals**|Chuheng Wu et.al.|[2405.19356v1](http://arxiv.org/abs/2405.19356v1)|null|Surface Electromyography (sEMG) is a non-invasive signal that is used in the recognition of hand movement patterns, the diagnosis of diseases, and the robust control of prostheses. Despite the remarkable success of recent end-to-end Deep Learning approaches, they are still limited by the need for large amounts of labeled data. To alleviate the requirement for big data, researchers utilize Feature Engineering, which involves decomposing the sEMG signal into several spatial, temporal, and frequency features. In this paper, we propose utilizing a feature-imitating network (FIN) for closed-form temporal feature learning over a 300ms signal window on Ninapro DB2, and applying it to the task of 17 hand movement recognition. We implement a lightweight LSTM-FIN network to imitate four standard temporal features (entropy, root mean square, variance, simple square integral). We then explore transfer learning capabilities by applying the pre-trained LSTM-FIN for tuning to a downstream hand movement recognition task. We observed that the LSTM network can achieve up to 99\% R2 accuracy in feature reconstruction and 80\% accuracy in hand movement recognition. Our results also showed that the model can be robustly applied for both within- and cross-subject movement recognition, as well as simulated low-latency environments. Overall, our work demonstrates the potential of the FIN modeling paradigm in data-scarce scenarios for sEMG signal processing.|
-|**2024-05-23**|**SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network**|Weiyu Guo et.al.|[2405.14398v1](http://arxiv.org/abs/2405.14398v1)|null|Surface electromyography (sEMG) based gesture recognition offers a natural and intuitive interaction modality for wearable devices. Despite significant advancements in sEMG-based gesture-recognition models, existing methods often suffer from high computational latency and increased energy consumption. Additionally, the inherent instability of sEMG signals, combined with their sensitivity to distribution shifts in real-world settings, compromises model robustness.   To tackle these challenges, we propose a novel SpGesture framework based on Spiking Neural Networks, which possesses several unique merits compared with existing methods: (1) Robustness: By utilizing membrane potential as a memory list, we pioneer the introduction of Source-Free Domain Adaptation into SNN for the first time. This enables SpGesture to mitigate the accuracy degradation caused by distribution shifts. (2) High Accuracy: With a novel Spiking Jaccard Attention, SpGesture enhances the SNNs' ability to represent sEMG features, leading to a notable rise in system accuracy. To validate SpGesture's performance, we collected a new sEMG gesture dataset which has different forearm postures, where SpGesture achieved the highest accuracy among the baselines ($89.26\%$). Moreover, the actual deployment on the CPU demonstrated a system latency below 100ms, well within real-time requirements. This impressive performance showcases SpGesture's potential to enhance the applicability of sEMG in real-world scenarios. The code is available at https://anonymous.4open.science/r/SpGesture.|
-|**2024-05-11**|**Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion**|Zhao Ren et.al.|[2405.08021v1](http://arxiv.org/abs/2405.08021v1)|null|Electromyography-to-Speech (ETS) conversion has demonstrated its potential for silent speech interfaces by generating audible speech from Electromyography (EMG) signals during silent articulations. ETS models usually consist of an EMG encoder which converts EMG signals to acoustic speech features, and a vocoder which then synthesises the speech signals. Due to an inadequate amount of available data and noisy signals, the synthesised speech often exhibits a low level of naturalness. In this work, we propose Diff-ETS, an ETS model which uses a score-based diffusion probabilistic model to enhance the naturalness of synthesised speech. The diffusion model is applied to improve the quality of the acoustic features predicted by an EMG encoder. In our experiments, we evaluated fine-tuning the diffusion model on predictions of a pre-trained EMG encoder, and training both models in an end-to-end fashion. We compared Diff-ETS with a baseline ETS model without diffusion using objective metrics and a listening test. The results indicated the proposed Diff-ETS significantly improved speech naturalness over the baseline.|
-|**2024-05-09**|**supDQN: Supervised Rewarding Strategy Driven Deep Q-Network for sEMG Signal Decontamination**|Ashutosh Jena et.al.|[2405.05883v1](http://arxiv.org/abs/2405.05883v1)|null|The presence of muscles throughout the active parts of the body such as the upper and lower limbs, makes electromyography-based human-machine interaction prevalent. However, muscle signals are stochastic and noisy. These noises can be regular and irregular. Irregular noises due to movements or electrical switching require dynamic filtering. Conventionally, filters are stacked, which trims and delays the signal unnecessarily. This study introduces a decontamination technique involving a supervised rewarding strategy to drive a deep Q-network-based agent (supDQN). It applies one of three filters to decontaminate a 1 s long surface electromyography signal, which is dynamically contaminated. A machine learning agent identifies whether the signal after filtering is clean or noisy. Accordingly, a reward is generated. The identification accuracy is enhanced by using a local interpretable model-agnostic explanation. The deep Q-network is guided by this reward to select the filter optimally while decontaminating a signal. The proposed filtering strategy is tested on four noise levels (-5 dB, -1 dB, +1 dB, +5 dB). supDQN filters the signal desirably when the signal-to-noise ratio (SNR) is between -5 dB and +1 dB. It filters less desirably at high SNR (+5 dB). A normalized root mean square (nRMSE) is formulated to depict the difference of filtered signal from ground truth. This is used to compare supDQN and conventional methods including wavelet denoising with Daubechies and Symlet wavelets, high order low pass filter, notch filter, and high pass filter. The proposed filtering strategy gives an average value nRMSE of 1.1974, which is lower than the conventional filters.|
-|**2024-05-06**|**MEET: Mixture of Experts Extra Tree-Based sEMG Hand Gesture Identification**|Naveen Gehlot et.al.|[2405.09562v1](http://arxiv.org/abs/2405.09562v1)|null|Artificial intelligence (AI) has made significant advances in recent years and opened up new possibilities in exploring applications in various fields such as biomedical, robotics, education, industry, etc. Among these fields, human hand gesture recognition is a subject of study that has recently emerged as a research interest in robotic hand control using electromyography (EMG). Surface electromyography (sEMG) is a primary technique used in EMG, which is popular due to its non-invasive nature and is used to capture gesture movements using signal acquisition devices placed on the surface of the forearm. Moreover, these signals are pre-processed to extract significant handcrafted features through time and frequency domain analysis. These are helpful and act as input to machine learning (ML) models to identify hand gestures. However, handling multiple classes and biases are major limitations that can affect the performance of an ML model. Therefore, to address this issue, a new mixture of experts extra tree (MEET) model is proposed to identify more accurate and effective hand gesture movements. This model combines individual ML models referred to as experts, each focusing on a minimal class of two. Moreover, a fully trained model known as the gate is employed to weigh the output of individual expert models. This amalgamation of the expert models with the gate model is known as a mixture of experts extra tree (MEET) model. In this study, four subjects with six hand gesture movements have been considered and their identification is evaluated among eleven models, including the MEET classifier. Results elucidate that the MEET classifier performed best among other algorithms and identified hand gesture movement accurately.|
-|**2024-04-17**|**Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning**|Simon Tam et.al.|[2404.15360v1](http://arxiv.org/abs/2404.15360v1)|null|Current electromyography (EMG) pattern recognition (PR) models have been shown to generalize poorly in unconstrained environments, setting back their adoption in applications such as hand gesture control. This problem is often due to limited training data, exacerbated by the use of supervised classification frameworks that are known to be suboptimal in such settings. In this work, we propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations. We use a Siamese Deep Convolutional Neural Network (SDCNN) and contrastive triplet loss to learn an EMG feature embedding space that captures the distribution of the different classes. A nearest-centroid approach is subsequently employed for inference, relying on how closely a test sample aligns with the established data distributions. We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions, i.e. false positives, especially when operating beyond the training data domain. We show our approach's efficacy by testing the trained SDCNN's predictions and confidence estimations on unseen data, both in and out of the training domain. The evaluation metrics include the accuracy-rejection curve and the Kullback-Leibler divergence between the confidence distributions of accurate and inaccurate predictions. Outperforming comparable models on both metrics, our results demonstrate that the proposed meta-learning approach improves the classifier's precision in active decisions (after rejection), thus leading to better generalization and applicability.|
-|**2024-04-17**|**Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis**|Weiyu Guo et.al.|[2404.11213v1](http://arxiv.org/abs/2404.11213v1)|null|Gesture recognition based on surface electromyography (sEMG) has been gaining importance in many 3D Interactive Scenes. However, sEMG is easily influenced by various forms of noise in real-world environments, leading to challenges in providing long-term stable interactions through sEMG. Existing methods often struggle to enhance model noise resilience through various predefined data augmentation techniques. In this work, we revisit the problem from a short term enhancement perspective to improve precision and robustness against various common noisy scenarios with learnable denoise using sEMG intrinsic pattern information and sliding-window attention. We propose a Short Term Enhancement Module(STEM) which can be easily integrated with various models. STEM offers several benefits: 1) Learnable denoise, enabling noise reduction without manual data augmentation; 2) Scalability, adaptable to various models; and 3) Cost-effectiveness, achieving short-term enhancement through minimal weight-sharing in an efficient attention mechanism. In particular, we incorporate STEM into a transformer, creating the Short Term Enhanced Transformer (STET). Compared with best-competing approaches, the impact of noise on STET is reduced by more than 20%. We also report promising results on both classification and regression datasets and demonstrate that STEM generalizes across different gesture recognition tasks.|
-|**2024-04-11**|**The Power of Properties: Uncovering the Influential Factors in Emotion Classification**|Tim Büchner et.al.|[2404.07867v1](http://arxiv.org/abs/2404.07867v1)|null|Facial expression-based human emotion recognition is a critical research area in psychology and medicine. State-of-the-art classification performance is only reached by end-to-end trained neural networks. Nevertheless, such black-box models lack transparency in their decision-making processes, prompting efforts to ascertain the rules that underlie classifiers' decisions. Analyzing single inputs alone fails to expose systematic learned biases. These biases can be characterized as facial properties summarizing abstract information like age or medical conditions. Therefore, understanding a model's prediction behavior requires an analysis rooted in causality along such selected properties. We demonstrate that up to 91.25% of classifier output behavior changes are statistically significant concerning basic properties. Among those are age, gender, and facial symmetry. Furthermore, the medical usage of surface electromyography significantly influences emotion prediction. We introduce a workflow to evaluate explicit properties and their impact. These insights might help medical professionals select and apply classifiers regarding their specialized data and properties.|
-|**2024-04-11**|**Efficient sEMG-based Cross-Subject Joint Angle Estimation via Hierarchical Spiking Attentional Feature Decomposition Network**|Xin Zhou et.al.|[2404.07517v1](http://arxiv.org/abs/2404.07517v1)|null|Surface electromyography (sEMG) has demonstrated significant potential in simultaneous and proportional control (SPC). However, existing algorithms for predicting joint angles based on sEMG often suffer from high inference costs or are limited to specific subjects rather than cross-subject scenarios. To address these challenges, we introduced a hierarchical Spiking Attentional Feature Decomposition Network (SAFE-Net). This network initially compresses sEMG signals into neural spiking forms using a Spiking Sparse Attention Encoder (SSAE). Subsequently, the compressed features are decomposed into kinematic and biological features through a Spiking Attentional Feature Decomposition (SAFD) module. Finally, the kinematic and biological features are used to predict joint angles and identify subject identities, respectively. Our validation on two datasets (SIAT-DB1 and SIAT-DB2) and comparison with two existing methods, Informer and Spikformer, demonstrate that SSAE achieves significant power consumption savings of 39.1% and 37.5% respectively over them in terms of inference costs. Furthermore, SAFE-Net surpasses Informer and Spikformer in recognition accuracy on both datasets. This study underscores the potential of SAFE-Net to advance the field of SPC in lower limb rehabilitation exoskeleton robots.|
-|**2024-03-29**|**Design, Fabrication and Evaluation of a Stretchable High-Density Electromyography Array**|Rejin John Varghese et.al.|[2403.20117v1](http://arxiv.org/abs/2403.20117v1)|[link](https://github.com/rejinjohnvarghese/stretchable-hmi-array)|The adoption of high-density electrode systems for human-machine interfaces in real-life applications has been impeded by practical and technical challenges, including noise interference, motion artifacts and the lack of compact electrode interfaces. To overcome some of these challenges, we introduce a wearable and stretchable electromyography (EMG) array, and present its design, fabrication methodology, characterisation, and comprehensive evaluation. Our proposed solution comprises dry-electrodes on flexible printed circuit board (PCB) substrates, eliminating the need for time-consuming skin preparation. The proposed fabrication method allows the manufacturing of stretchable sleeves, with consistent and standardised coverage across subjects. We thoroughly tested our developed prototype, evaluating its potential for application in both research and real-world environments. The results of our study showed that the developed stretchable array matches or outperforms traditional EMG grids and holds promise in furthering the real-world translation of high-density EMG for human-machine interfaces.|
-|**2024-03-27**|**An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition**|Yizhang Xia et.al.|[2403.18208v1](http://arxiv.org/abs/2403.18208v1)|null|Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its great potential in applications. Various manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most of existing algorithms require a lot of expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with the adaptive multimodel fusion (AMF-ENAS). Specifically, we design an encoding space that simultaneously considers fusion positions and ratios of the multimodal data, allowing for the automatic construction of multimodal networks with different architectures through decoding. Additionally, we consider three input streams corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC. To automatically adapt to various datasets, the ENAS framework is designed to automatically search a MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been utilized in MHGR to tackle issues related to the fusion position and ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.|
-|**2024-03-19**|**Multimodal wearable EEG, EMG and accelerometry measurements improve the accuracy of tonic-clonic seizure detection in-hospital**|Jingwei Zhang et.al.|[2403.13066v1](http://arxiv.org/abs/2403.13066v1)|null|Objective: Most current wearable tonic-clonic seizure (TCS) detection systems are based on extra-cerebral signals, such as electromyography (EMG) or accelerometry (ACC). Although many of these devices show good sensitivity in seizure detection, their false positive rates (FPR) are still relatively high. Wearable EEG may improve performance; however, studies investigating this remain scarce. This paper aims 1) to investigate the possibility of detecting TCSs with a behind-the-ear, two-channel wearable EEG, and 2) to evaluate the added value of wearable EEG to other non-EEG modalities in multimodal TCS detection. Method: We included 27 participants with a total of 44 TCSs from the European multicenter study SeizeIT2. The multimodal wearable detection system Sensor Dot (Byteflies) was used to measure two-channel, behind-the-ear EEG, EMG, electrocardiography (ECG), ACC and gyroscope (GYR). First, we evaluated automatic unimodal detection of TCSs, using performance metrics such as sensitivity, precision, FPR and F1-score. Secondly, we fused the different modalities and again assessed performance. Algorithm-labeled segments were then provided to a neurologist and a wearable data expert, who reviewed and annotated the true positive TCSs, and discarded false positives (FPs). Results: Wearable EEG outperformed the other modalities in unimodal TCS detection by achieving a sensitivity of 100.0% and a FPR of 10.3/24h (compared to 97.7% sensitivity and 30.9/24h FPR for EMG; 95.5% sensitivity and 13.9 FPR for ACC). The combination of wearable EEG and EMG achieved overall the most clinically useful performance in offline TCS detection with a sensitivity of 97.7%, a FPR of 0.4/24 h, a precision of 43.0%, and a F1-score of 59.7%. Subsequent visual review of the automated detections resulted in maximal sensitivity and zero FPs.|
-|**2024-03-12**|**Neural, Muscular, and Perceptual responses with shoulder exoskeleton use over Days**|Tiash Rana Mukherjee et.al.|[2403.08044v1](http://arxiv.org/abs/2403.08044v1)|null|Passive shoulder exoskeletons have been widely introduced in the industry to aid upper extremity movements during repetitive overhead work. As an ergonomic intervention, it is important to understand how users adapt to these devices over time and if these induce external stress while working. The study evaluated the use of an exoskeleton over a period of 3 days by assessing the neural, physiological, and perceptual responses of twenty-four participants by comparing a physical task against the same task with an additional cognitive workload. Over days adaptation to task irrespective of task and group were identified. Electromyography (EMG) analysis of shoulder and back muscles reveals lower muscle activity in the exoskeleton group irrespective of task. Functional connectivity analysis using functional near infrared spectroscopy (fNIRS) reveals that exoskeletons benefit users by reducing task demands in the motor planning and execution regions. Sex-based differences were also identified in these neuromuscular assessments.|
-|**2024-03-07**|**Comparison of gait phase detection using traditional machine learning and deep learning techniques**|Farhad Nazari et.al.|[2403.05595v1](http://arxiv.org/abs/2403.05595v1)|null|Human walking is a complex activity with a high level of cooperation and interaction between different systems in the body. Accurate detection of the phases of the gait in real-time is crucial to control lower-limb assistive devices like exoskeletons and prostheses. There are several ways to detect the walking gait phase, ranging from cameras and depth sensors to the sensors attached to the device itself or the human body. Electromyography (EMG) is one of the input methods that has captured lots of attention due to its precision and time delay between neuromuscular activity and muscle movement. This study proposes a few Machine Learning (ML) based models on lower-limb EMG data for human walking. The proposed models are based on Gaussian Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), Linear Discriminant Analysis (LDA) and Deep Convolutional Neural Networks (DCNN). The traditional ML models are trained on hand-crafted features or their reduced components using Principal Component Analysis (PCA). On the contrary, the DCNN model utilises convolutional layers to extract features from raw data. The results show up to 75% average accuracy for traditional ML models and 79% for Deep Learning (DL) model. The highest achieved accuracy in 50 trials of the training DL model is 89.5%.|
-|**2024-03-04**|**High-speed Low-consumption sEMG-based Transient-state micro-Gesture Recognition**|Youfang Han et.al.|[2403.06998v2](http://arxiv.org/abs/2403.06998v2)|null|Gesture recognition on wearable devices is extensively applied in human-computer interaction. Electromyography (EMG) has been used in many gesture recognition systems for its rapid perception of muscle signals. However, analyzing EMG signals on devices, like smart wristbands, usually needs inference models to have high performances, such as low inference latency, low power consumption, and low memory occupation. Therefore, this paper proposes an improved spiking neural network (SNN) to achieve these goals. We propose an adaptive multi-delta coding as a spiking coding method to improve recognition accuracy. We propose two additive solvers for SNN, which can reduce inference energy consumption and amount of parameters significantly, and improve the robustness of temporal differences. In addition, we propose a linear action detection method TAD-LIF, which is suitable for SNNs. TAD-LIF is an improved LIF neuron that can detect transient-state gestures quickly and accurately. We collected two datasets from 20 subjects including 6 micro gestures. The collection devices are two designed lightweight consumer-level sEMG wristbands (3 and 8 electrode channels respectively). Compared to CNN, FCN, and normal SNN-based methods, the proposed SNN has higher recognition accuracy. The accuracy of the proposed SNN is 83.85% and 93.52% on the two datasets respectively. In addition, the inference latency of the proposed SNN is about 1% of CNN, the power consumption is about 0.1% of CNN, and the memory occupation is about 20% of CNN. The proposed methods can be used for precise, high-speed, and low-power micro-gesture recognition tasks, and are suitable for consumer-level intelligent wearable devices, which is a general way to achieve ubiquitous computing.|
-|**2024-02-27**|**Complexity Assessment of Analog Security Primitives Using the Disentropy of Autocorrelation**|Paul Jimenez et.al.|[2402.17488v1](http://arxiv.org/abs/2402.17488v1)|null|The study of regularity in signals can be of great importance, typically in medicine to analyse electrocardiogram (ECG) or electromyography (EMG) signals, but also in climate studies, finance or security. In this work we focus on security primitives such as Physical Unclonable Functions (PUFs) or Pseudo-Random Number Generators (PRNGs). Such primitives must have a high level of complexity or entropy in their responses to guarantee enough security for their applications. There are several ways of assessing the complexity of their responses, especially in the binary domain. With the development of analog PUFs such as optical (photonic) PUFs, it would be useful to be able to assess their complexity in the analog domain when designing them, for example, before converting analog signals into binary. In this numerical study, we decided to explore the potential of the disentropy of autocorrelation as a measure of complexity for security primitives as PUFs or PRNGs with analog output or responses. We compare this metric to others used to assess regularities in analog signals such as Approximate Entropy (ApEn) and Fuzzy Entropy (FuzEn). We show that the disentropy of autocorrelation is able to differentiate between well-known PRNGs and non-optimised or bad PRNGs in the analog and binary domain with a better contrast than ApEn and FuzEn. Next, we show that the disentropy of autocorrelation is able to detect small patterns injected in PUFs responses and then we applied it to photonic PUFs simulations.|
-|**2024-02-18**|**Analysis of Fatigue-Induced Compensatory Movements in Bicep Curls: Gaining Insights for the Deployment of Wearable Sensors**|Ming Xuan Chua et.al.|[2402.11421v2](http://arxiv.org/abs/2402.11421v2)|null|A common challenge in Bicep Curls rehabilitation is muscle compensation, where patients adopt alternative movement patterns when the primary muscle group cannot act due to injury or fatigue, significantly decreasing the effectiveness of rehabilitation efforts. The problem is exacerbated by the growing trend toward transitioning from in-clinic to home-based rehabilitation, where constant monitoring and correction by physiotherapists are limited. Developing wearable sensors capable of detecting muscle compensation becomes crucial to address this challenge. This study aims to gain insights into the optimal deployment of wearable sensors through a comprehensive study of muscle compensation in Bicep Curls. We collect upper limb joint kinematics and surface electromyography signals (sEMG) from eight muscles in 12 healthy subjects during standard and fatigue stages. Two muscle synergies are derived from sEMG signals and are analyzed comprehensively along with joint kinematics. Our findings reveal a shift in the relative contribution of forearm muscles to shoulder muscles, accompanied by a significant increase in activation amplitude for both synergies. Additionally, more pronounced movement was observed at the shoulder joint during fatigue. These results suggest focusing on the shoulder muscle activities and joint motions when deploying wearable sensors to effectively detect compensatory movements.|
-|**2024-02-08**|**A Non-Intrusive Neural Quality Assessment Model for Surface Electromyography Signals**|Cho-Yuan Lee et.al.|[2402.05482v3](http://arxiv.org/abs/2402.05482v3)|null|In practical scenarios involving the measurement of surface electromyography (sEMG) in muscles, particularly those areas near the heart, one of the primary sources of contamination is the presence of electrocardiogram (ECG) signals. To assess the quality of real-world sEMG data more effectively, this study proposes QASE-net, a new non-intrusive model that predicts the SNR of sEMG signals. QASE-net combines CNN-BLSTM with attention mechanisms and follows an end-to-end training strategy. Our experimental framework utilizes real-world sEMG and ECG data from two open-access databases, the Non-Invasive Adaptive Prosthetics Database and the MIT-BIH Normal Sinus Rhythm Database, respectively. The experimental results demonstrate the superiority of QASE-net over the previous assessment model, exhibiting significantly reduced prediction errors and notably higher linear correlations with the ground truth. These findings show the potential of QASE-net to substantially enhance the reliability and precision of sEMG quality assessment in practical applications.|
-|**2024-02-06**|**SDEMG: Score-based Diffusion Model for Surface Electromyographic Signal Denoising**|Yu-Tung Liu et.al.|[2402.03808v2](http://arxiv.org/abs/2402.03808v2)|[link](https://github.com/tonyliu0910/sdemg)|Surface electromyography (sEMG) recordings can be influenced by electrocardiogram (ECG) signals when the muscle being monitored is close to the heart. Several existing methods use signal-processing-based approaches, such as high-pass filter and template subtraction, while some derive mapping functions to restore clean sEMG signals from noisy sEMG (sEMG with ECG interference). Recently, the score-based diffusion model, a renowned generative model, has been introduced to generate high-quality and accurate samples with noisy input data. In this study, we proposed a novel approach, termed SDEMG, as a score-based diffusion model for sEMG signal denoising. To evaluate the proposed SDEMG approach, we conduct experiments to reduce noise in sEMG signals, employing data from an openly accessible source, the Non-Invasive Adaptive Prosthetics database, along with ECG signals from the MIT-BIH Normal Sinus Rhythm Database. The experiment result indicates that SDEMG outperformed comparative methods and produced high-quality sEMG samples. The source code of SDEMG the framework is available at: https://github.com/tonyliu0910/SDEMG|
-|**2024-02-04**|**Smart Textile-Driven Soft Spine Exosuit for Lifting Tasks in Industrial Applications**|Kefan Zhu et.al.|[2402.02319v1](http://arxiv.org/abs/2402.02319v1)|null|Work related musculoskeletal disorders (WMSDs) are often caused by repetitive lifting, making them a significant concern in occupational health. Although wearable assist devices have become the norm for mitigating the risk of back pain, most spinal assist devices still possess a partially rigid structure that impacts the user comfort and flexibility. This paper addresses this issue by presenting a smart textile actuated spine assistance robotic exosuit (SARE), which can conform to the back seamlessly without impeding the user movement and is incredibly lightweight. The SARE can assist the human erector spinae to complete any action with virtually infinite degrees of freedom. To detect the strain on the spine and to control the smart textile automatically, a soft knitting sensor which utilizes fluid pressure as sensing element is used. The new device is validated experimentally with human subjects where it reduces peak electromyography (EMG) signals of lumbar erector spinae by around 32 percent in loaded and around 22 percent in unloaded conditions. Moreover, the integrated EMG decreased by around 24.2 percent under loaded condition and around 23.6 percent under unloaded condition. In summary, the artificial muscle wearable device represents an anatomical solution to reduce the risk of muscle strain, metabolic energy cost and back pain associated with repetitive lifting tasks.|
-|**2024-01-16**|**Effects of Virtual Hand Representation on the Typing Performance, Upper Extremity Angle, and Neck Muscle Activity during Virtual Reality Typing**|Mobasshira Zaman et.al.|[2401.08018v1](http://arxiv.org/abs/2401.08018v1)|null|This study evaluated the effect of virtual hand representation on the typing performance, upper extremity angle, neck muscle activity, and usability during virtual reality (VR) typing. A total of 15 participants (7 females and 8 males) performed a typing task with and without virtual hand representations. The optical motion capture data was utilized to capture the upper extremity angles, and electromyography device was used to collect the neck muscle activities (left and right splenius capitis). The results showed that the typing performance, upper extremity angle, neck muscle activity, and usability were significantly different with and without virtual hand representations. With the virtual hand representation, net typing speed (WPM) and usability increased significantly by 171.4% and 25% compared to the without hand representation. Without the virtual hand representation, participants showed increased wrist extension, and decreased right shoulder abduction angles. The variability of the neck rotation was increased while typing without the virtual hand representation. The neck muscle activities were increased with the virtual hand representation. The results suggest that typing with the virtual hand representation could positively affect the typing performance and usability and reduce the risk of the musculoskeletal disorders of the upper extremity. Future study could explore the effect of the virtual hand representation for users with varying levels of typing skills.|
-|**2024-01-11**|**Volume Transfer: A New Design Concept for Fabric-Based Pneumatic Exosuits**|Chendong Liu et.al.|[2401.05881v1](http://arxiv.org/abs/2401.05881v1)|null|The fabric-based pneumatic exosuit is now a hot research topic because it is lighter and softer than traditional exoskeletons. Existing research focused more on the mechanical properties of the exosuit (e.g., torque and speed), but less on its wearability (e.g., appearance and comfort). This work presents a new design concept for fabric-based pneumatic exosuits Volume Transfer, which means transferring the volume of pneumatic actuators beyond the garments profile to the inside. This allows for a concealed appearance and a larger stress area while maintaining adequate torques. In order to verify this concept, we develop a fabric-based pneumatic exosuit for knee extension assistance. Its profile is only 26mm and its stress area wraps around almost half of the leg. We use a mathematical model and simulation to determine the parameters of the exosuit, avoiding multiple iterations of the prototype. Experiment results show that the exosuit can generate a torque of 7.6Nm at a pressure of 90kPa and produce a significant reduction in the electromyography activity of the knee extensor muscles. We believe that Volume Transfer could be utilized prevalently in future fabric-based pneumatic exosuit designs to achieve a significant improvement in wearability.|
-|**2024-01-11**|**Face-GPS: A Comprehensive Technique for Quantifying Facial Muscle Dynamics in Videos**|Juni Kim et.al.|[2401.05625v1](http://arxiv.org/abs/2401.05625v1)|null|We introduce a novel method that combines differential geometry, kernels smoothing, and spectral analysis to quantify facial muscle activity from widely accessible video recordings, such as those captured on personal smartphones. Our approach emphasizes practicality and accessibility. It has significant potential for applications in national security and plastic surgery. Additionally, it offers remote diagnosis and monitoring for medical conditions such as stroke, Bell's palsy, and acoustic neuroma. Moreover, it is adept at detecting and classifying emotions, from the overt to the subtle. The proposed face muscle analysis technique is an explainable alternative to deep learning methods and a non-invasive substitute to facial electromyography (fEMG).|
-|**2024-01-06**|**Convergence Rate Maximization for Split Learning-based Control of EMG Prosthetic Devices**|Matea Marinova et.al.|[2401.03233v3](http://arxiv.org/abs/2401.03233v3)|null|Split Learning (SL) is a promising Distributed Learning approach in electromyography (EMG) based prosthetic control, due to its applicability within resource-constrained environments. Other learning approaches, such as Deep Learning and Federated Learning (FL), provide suboptimal solutions, since prosthetic devices are extremely limited in terms of processing power and battery life. The viability of implementing SL in such scenarios is caused by its inherent model partitioning, with clients executing the smaller model segment. However, selecting an inadequate cut layer hinders the training process in SL systems. This paper presents an algorithm for optimal cut layer selection in terms of maximizing the convergence rate of the model. The performance evaluation demonstrates that the proposed algorithm substantially accelerates the convergence in an EMG pattern recognition task for improving prosthetic device control.|
-|**2024-01-04**|**Estimating continuous data of wrist joint angles using ultrasound images**|Yo Kobayashi et.al.|[2401.02152v1](http://arxiv.org/abs/2401.02152v1)|null|Ultrasound imaging has recently been introduced as a sensing interface for joint motion estimation. The use of ultrasound images as an estimation method is expected to improve the control performance of assistive devices and human--machine interfaces. This study aimed to estimate continuous wrist joint angles using ultrasound images. Specifically, in an experiment, joint angle information was obtained during extension--flexion movements, and ultrasound images of the associated muscles were acquired. Using the features obtained from ultrasound images, a multivariate linear regression model was used to estimate the joint angles. The coordinates of the feature points obtained using optical flow from the ultrasound images were used as explanatory variables of the multivariate linear regression model. The model was trained and tested for each trial by each participant to verify the estimation accuracy. The results show that the mean and standard deviation of the estimation accuracy for all trials were root mean square error (RMSE)=1.82 $\pm$ 0.54 deg and coefficient of determination (R2)=0.985 $\pm$ 0.009. Our method achieves a highly accurate estimation of joint angles compared with previous studies using other signals, such as surface electromyography, while the multivariate linear regression model is simple and both computational and model training costs are low.|
-|**2023-12-21**|**Towards Non-contact Muscle Activity Estimation using FMCW Radar**|Kukhokuhle Tsengwa et.al.|[2312.14273v1](http://arxiv.org/abs/2312.14273v1)|null|Surface electromyography (sEMG) is a widely used muscle activity monitoring technique. sEMG measures muscle activity through monopolar and bipolar, multi-electrode electrodes. The surface electrodes are placed on the surface of the skin above the target muscle and the received signal can be used to infer the state of the muscle - active, inactive or fatigued - which serves as vital information during neurological and orthopaedic rehabilitation. Additionally, the sEMG signal can also be used for the control of prostheses. sEMG requires contact with the participant's skin and is thus a potentially uncomfortable method for the measurement of muscle activity. Moreover, the setup procedure has been termed time-consuming by sEMG experts and is listed as one of the main barriers to the clinical employment of the technique. Previous studies have shown that architectural changes, particularly muscle deformation, can provide information about the activity of the muscle, providing an alternative to sEMG. In all these studies, the muscle deformation signal is acquired using ultrasound imaging, an approach known as sonomyography (SMG). Despite its advantages, such as improved spatial resolution, SMG is still a contact based approach. In this paper, we propose a non-contact muscle activity monitoring approach that measures the muscle deformation signal using a Frequency Modulated Continuous Wave (FMCW) mmWave radar which we call radiomyography (RMG). In future, this system will enable muscle activation to be measured in an unconstrained and less cumbersome manner for both the person conducting the test and the individual being tested.|
-|**2023-12-20**|**SelfEEG: A Python library for Self-Supervised Learning in Electroencephalography**|Federico Del Pup et.al.|[2401.05405v1](http://arxiv.org/abs/2401.05405v1)|[link](https://github.com/medmaxlab/selfeeg)|SelfEEG is an open-source Python library developed to assist researchers in conducting Self-Supervised Learning (SSL) experiments on electroencephalography (EEG) data. Its primary objective is to offer a user-friendly but highly customizable environment, enabling users to efficiently design and execute self-supervised learning tasks on EEG data.   SelfEEG covers all the stages of a typical SSL pipeline, ranging from data import to model design and training. It includes modules specifically designed to: split data at various granularity levels (e.g., session-, subject-, or dataset-based splits); effectively manage data stored with different configurations (e.g., file extensions, data types) during mini-batch construction; provide a wide range of standard deep learning models, data augmentations and SSL baseline methods applied to EEG data.   Most of the functionalities offered by selfEEG can be executed both on GPUs and CPUs, expanding its usability beyond the self-supervised learning area. Additionally, these functionalities can be employed for the analysis of other biomedical signals often coupled with EEGs, such as electromyography or electrocardiography data.   These features make selfEEG a versatile deep learning tool for biomedical applications and a useful resource in SSL, one of the currently most active fields of Artificial Intelligence.|
-|**2023-12-20**|**EMG-based Control Strategies of a Supernumerary Robotic Hand for the Rehabilitation of Sub-Acute Stroke Patients: Proof of Concept**|Marina Gnocco et.al.|[2312.13009v1](http://arxiv.org/abs/2312.13009v1)|null|One of the most frequent and severe aftermaths of a stroke is the loss of upper limb functionality. Therapy started in the sub-acute phase proved more effective, mainly when the patient participates actively. Recently, a novel set of rehabilitation and support robotic devices, known as supernumerary robotic limbs, have been introduced. This work investigates how a surface electromyography (sEMG) based control strategy would improve their usability in rehabilitation, limited so far by input interfaces requiring to subjects some level of residual mobility. After briefly introducing the phenomena hindering post-stroke sEMG and its use to control robotic hands, we describe a framework to acquire and interpret muscle signals of the forearm extensors. We applied it to drive a supernumerary robotic limb, the SoftHand-X, to provide Task-Specific Training (TST) in patients with sub-acute stroke. We propose and describe two algorithms to control the opening and closing of the robotic hand, with different levels of user agency and therapist control. We experimentally tested the feasibility of the proposed approach on four patients, followed by a therapist, to check their ability to operate the hand. The promising preliminary results indicate sEMG-based control as a viable solution to extend TST to sub-acute post-stroke patients.|
diff --git a/mkdocs/docs/PPG/Photoplethysmography.md b/mkdocs/docs/PPG/Photoplethysmography.md
index d07a7af5..a78a8514 100644
--- a/mkdocs/docs/PPG/Photoplethysmography.md
+++ b/mkdocs/docs/PPG/Photoplethysmography.md
@@ -2,33 +2,3 @@
 ### Photoplethysmography
 |Publish Date|Title|Authors|PDF|Code|Abstract|
 | :---: | :---: | :---: | :---: | :---: | :---: |
-|**2024-07-04**|**Biometric Authentication Based on Enhanced Remote Photoplethysmography Signal Morphology**|Zhaodong Sun et.al.|[2407.04127v1](http://arxiv.org/abs/2407.04127v1)|[link](https://github.com/zhaodongsun/rppg_biometrics)|Remote photoplethysmography (rPPG) is a non-contact method for measuring cardiac signals from facial videos, offering a convenient alternative to contact photoplethysmography (cPPG) obtained from contact sensors. Recent studies have shown that each individual possesses a unique cPPG signal morphology that can be utilized as a biometric identifier, which has inspired us to utilize the morphology of rPPG signals extracted from facial videos for person authentication. Since the facial appearance and rPPG are mixed in the facial videos, we first de-identify facial videos to remove facial appearance while preserving the rPPG information, which protects facial privacy and guarantees that only rPPG is used for authentication. The de-identified videos are fed into an rPPG model to get the rPPG signal morphology for authentication. In the first training stage, unsupervised rPPG training is performed to get coarse rPPG signals. In the second training stage, an rPPG-cPPG hybrid training is performed by incorporating external cPPG datasets to achieve rPPG biometric authentication and enhance rPPG signal morphology. Our approach needs only de-identified facial videos with subject IDs to train rPPG authentication models. The experimental results demonstrate that rPPG signal morphology hidden in facial videos can be used for biometric authentication. The code is available at https://github.com/zhaodongsun/rppg_biometrics.|
-|**2024-07-03**|**Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model**|Jingyuan Hong et.al.|[2407.03274v1](http://arxiv.org/abs/2407.03274v1)|null|Blood pressure (BP) changes are linked to individual health status in both clinical and non-clinical settings. This study developed a deep learning model to classify systolic (SBP), diastolic (DBP), and mean (MBP) BP changes using photoplethysmography (PPG) waveforms. Data from the Vital Signs Database (VitalDB) comprising 1,005 ICU patients with synchronized PPG and BP recordings was used. BP changes were categorized into three labels: Spike (increase above a threshold), Stable (change within a plus or minus threshold), and Dip (decrease below a threshold). Four time-series classification models were studied: multi-layer perceptron, convolutional neural network, residual network, and Encoder. A subset of 500 patients was randomly selected for training and validation, ensuring a uniform distribution across BP change labels. Two test datasets were compiled: Test-I (n=500) with a uniform distribution selection process, and Test-II (n=5) without. The study also explored the impact of including second-deviation PPG (sdPPG) waveforms as additional input information. The Encoder model with a Softmax weighting process using both PPG and sdPPG waveforms achieved the highest detection accuracy--exceeding 71.3% and 85.4% in Test-I and Test-II, respectively, with thresholds of 30 mmHg for SBP, 15 mmHg for DBP, and 20 mmHg for MBP. Corresponding F1-scores were over 71.8% and 88.5%. These findings confirm that PPG waveforms are effective for real-time monitoring of BP changes in ICU settings and suggest potential for broader applications.|
-|**2024-06-21**|**Deep Imbalanced Regression to Estimate Vascular Age from PPG Data: a Novel Digital Biomarker for Cardiovascular Health**|Guangkun Nie et.al.|[2406.14953v2](http://arxiv.org/abs/2406.14953v2)|[link](https://github.com/Ngk03/Dist-Loss)|Photoplethysmography (PPG) is emerging as a crucial tool for monitoring human hemodynamics, with recent studies highlighting its potential in assessing vascular aging through deep learning. However, real-world age distributions are often imbalanced, posing significant challenges for deep learning models. In this paper, we introduce a novel, simple, and effective loss function named the Dist Loss to address deep imbalanced regression tasks. We trained a one-dimensional convolutional neural network (Net1D) incorporating the Dist Loss on the extensive UK Biobank dataset (n=502,389) to estimate vascular age from PPG signals and validate its efficacy in characterizing cardiovascular health. The model's performance was validated on a 40% held-out test set, achieving state-of-the-art results, especially in regions with small sample sizes. Furthermore, we divided the population into three subgroups based on the difference between predicted vascular age and chronological age: less than -10 years, between -10 and 10 years, and greater than 10 years. We analyzed the relationship between predicted vascular age and several cardiovascular events over a follow-up period of up to 10 years, including death, coronary heart disease, and heart failure. Our results indicate that the predicted vascular age has significant potential to reflect an individual's cardiovascular health status. Our code will be available at https://github.com/Ngk03/AI-vascular-age.|
-|**2024-06-04**|**Ghost imaging-based Non-contact Heart Rate Detection**|Jianming Yu et.al.|[2406.02640v1](http://arxiv.org/abs/2406.02640v1)|null|Remote heart rate measurement is an increasingly concerned research field, usually using remote photoplethysmography (rPPG) to collect heart rate information through video data collection. However, in certain specific scenarios (such as low light conditions, intense lighting, and non-line-of-sight situations), traditional imaging methods fail to capture image information effectively, that may lead to difficulty or inability in measuring heart rate. To address these limitations, this study proposes using ghost imaging as a substitute for traditional imaging in the aforementioned scenarios. The mean absolute error between experimental measurements and reference true values is 4.24 bpm.Additionally, the bucket signals obtained by the ghost imaging system can be directly processed using digital signal processing techniques, thereby enhancing personal privacy protection.|
-|**2024-05-29**|**Local nature of 0.1 Hz oscillations in microcirculation is confirmed by imaging photoplethysmography**|Irina A. Mizeva et.al.|[2405.18760v1](http://arxiv.org/abs/2405.18760v1)|null|Low-frequency oscillations in the human circulatory system is important for basic physiology and practical applications in clinical medicine. Our objective was to study which mechanism (central or local) is responsible for changes in blood flow fluctuations at around 0.1 Hz. We used the method of imaging photoplethysmography synchronized with electrocardiography to measure blood-flow response to local forearm heating of 18 healthy male volunteers. The dynamics of peripheral perfusion was revealed by a correlation processing of photoplethysmography data, and the central hemodynamics was assessed from the electrocardiogram. Wavelet analysis was used to estimate the dynamics of spectral components. Our results show that skin heating leads to multiple increase in local perfusion accompanied by drop in blood flow oscillations at 0.1 Hz, whereas no changes in heart rate variability was observed. After switching off the heating, perfusion remains at the high level, regardless decrease in skin temperature. The 0.1 Hz oscillations are smoothly recovered to the base level. In conclusion, we confirm the local nature of fluctuations in peripheral blood flow in the frequency band of about 0.1 Hz. A significant, but time-delayed, recovery of fluctuation energy in this frequency range after cessation of the skin warming was discovered. This study reveals a novel factor involved in the regulation microcirculatory vascular tone. A comprehensive study of hemodynamics using the new technique of imaging photoplethysmography synchronized with electrocardiography is a prerequisite for development of a valuable diagnostic tool.|
-|**2024-05-23**|**Deep Learning Classification of Photoplethysmogram Signal for Hypertension Levels**|Nida Nasir et.al.|[2405.14556v1](http://arxiv.org/abs/2405.14556v1)|null|Continuous photoplethysmography (PPG)-based blood pressure monitoring is necessary for healthcare and fitness applications. In Artificial Intelligence (AI), signal classification levels with the machine and deep learning arrangements need to be explored further. Techniques based on time-frequency spectra, such as Short-time Fourier Transform (STFT), have been used to address the challenges of motion artifact correction. Therefore, the proposed study works with PPG signals of more than 200 patients (650+ signal samples) with hypertension, using STFT with various Neural Networks (Convolution Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM), followed by machine learning classifiers, such as, Support Vector Machine (SVM) and Random Forest (RF). The classification has been done for two categories: Prehypertension (normal levels) and Hypertension (includes Stage I and Stage II). Various performance metrics have been obtained with two batch sizes of 3 and 16 for the fusion of the neural networks. With precision and specificity of 100% and recall of 82.1%, the LSTM model provides the best results among all combinations of Neural Networks. However, the maximum accuracy of 71.9% is achieved by the LSTM-CNN model. Further stacked Ensemble method has been used to achieve 100% accuracy for Meta-LSTM-RF, Meta- LSTM-CNN-RF and Meta- STFT-CNN-SVM.|
-|**2024-05-19**|**Uncertainty-Aware PPG-2-ECG for Enhanced Cardiovascular Diagnosis using Diffusion Models**|Omer Belhasin et.al.|[2405.11566v1](http://arxiv.org/abs/2405.11566v1)|null|Analyzing the cardiovascular system condition via Electrocardiography (ECG) is a common and highly effective approach, and it has been practiced and perfected over many decades. ECG sensing is non-invasive and relatively easy to acquire, and yet it is still cumbersome for holter monitoring tests that may span over hours and even days. A possible alternative in this context is Photoplethysmography (PPG): An optically-based signal that measures blood volume fluctuations, as typically sensed by conventional ``wearable devices''. While PPG presents clear advantages in acquisition, convenience, and cost-effectiveness, ECG provides more comprehensive information, allowing for a more precise detection of heart conditions. This implies that a conversion from PPG to ECG, as recently discussed in the literature, inherently involves an unavoidable level of uncertainty. In this paper we introduce a novel methodology for addressing the PPG-2-ECG conversion, and offer an enhanced classification of cardiovascular conditions using the given PPG, all while taking into account the uncertainties arising from the conversion process. We provide a mathematical justification for our proposed computational approach, and present empirical studies demonstrating its superior performance compared to state-of-the-art baseline methods.|
-|**2024-05-10**|**PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement**|Jiyao Wang et.al.|[2405.06201v1](http://arxiv.org/abs/2405.06201v1)|null|Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) attracted increasing attention in rPPG. However, when rPPG is extended to simultaneously measure more vital signs (e.g., respiration and blood oxygen saturation), achieving generalizability brings new challenges. Although partial features shared among different physiological signals can benefit multi-task learning, the sparse and imbalanced target label space brings the seesaw effect over task-specific feature learning. To resolve this problem, we designed an end-to-end Mixture of Low-rank Experts for multi-task remote Physiological measurement (PhysMLE), which is based on multiple low-rank experts with a novel router mechanism, thereby enabling the model to adeptly handle both specifications and correlations within tasks. Additionally, we introduced prior knowledge from physiology among tasks to overcome the imbalance of label space under real-world multi-task physiological measurement. For fair and comprehensive evaluations, this paper proposed a large-scale multi-task generalization benchmark, named Multi-Source Synsemantic Domain Generalization (MSSDG) protocol. Extensive experiments with MSSDG and intra-dataset have shown the effectiveness and efficiency of PhysMLE. In addition, a new dataset was collected and made publicly available to meet the needs of the MSSDG.|
-|**2024-05-04**|**Deep Pulse-Signal Magnification for remote Heart Rate Estimation in Compressed Videos**|Joaquim Comas et.al.|[2405.02652v2](http://arxiv.org/abs/2405.02652v2)|null|Recent advancements in data-driven approaches for remote photoplethysmography (rPPG) have significantly improved the accuracy of remote heart rate estimation. However, the performance of such approaches worsens considerably under video compression, which is nevertheless necessary to store and transmit video data efficiently. In this paper, we present a novel approach to address the impact of video compression on rPPG estimation, which leverages a pulse-signal magnification transformation to adapt compressed videos to an uncompressed data domain in which the rPPG signal is magnified. We validate the effectiveness of our model by exhaustive evaluations on two publicly available datasets, UCLA-rPPG and UBFC-rPPG, employing both intra- and cross-database performance at several compression rates. Additionally, we assess the robustness of our approach on two additional highly compressed and widely-used datasets, MAHNOB-HCI and COHFACE, which reveal outstanding heart rate estimation results.|
-|**2024-05-02**|**KID-PPG: Knowledge Informed Deep Learning for Extracting Heart Rate from a Smartwatch**|Christodoulos Kechris et.al.|[2405.09559v1](http://arxiv.org/abs/2405.09559v1)|[link](https://github.com/esl-epfl/KID-PPG)|Accurate extraction of heart rate from photoplethysmography (PPG) signals remains challenging due to motion artifacts and signal degradation. Although deep learning methods trained as a data-driven inference problem offer promising solutions, they often underutilize existing knowledge from the medical and signal processing community. In this paper, we address three shortcomings of deep learning models: motion artifact removal, degradation assessment, and physiologically plausible analysis of the PPG signal. We propose KID-PPG, a knowledge-informed deep learning model that integrates expert knowledge through adaptive linear filtering, deep probabilistic inference, and data augmentation. We evaluate KID-PPG on the PPGDalia dataset, achieving an average mean absolute error of 2.85 beats per minute, surpassing existing reproducible methods. Our results demonstrate a significant performance improvement in heart rate tracking through the incorporation of prior knowledge into deep learning models. This approach shows promise in enhancing various biomedical applications by incorporating existing expert knowledge in deep learning models.|
-|**2024-05-02**|**Evaluation of Video-Based rPPG in Challenging Environments: Artifact Mitigation and Network Resilience**|Nhi Nguyen et.al.|[2405.01230v1](http://arxiv.org/abs/2405.01230v1)|null|Video-based remote photoplethysmography (rPPG) has emerged as a promising technology for non-contact vital sign monitoring, especially under controlled conditions. However, the accurate measurement of vital signs in real-world scenarios faces several challenges, including artifacts induced by videocodecs, low-light noise, degradation, low dynamic range, occlusions, and hardware and network constraints. In this article, we systematically investigate comprehensive investigate these issues, measuring their detrimental effects on the quality of rPPG measurements. Additionally, we propose practical strategies for mitigating these challenges to improve the dependability and resilience of video-based rPPG systems. We detail methods for effective biosignal recovery in the presence of network limitations and present denoising and inpainting techniques aimed at preserving video frame integrity. Through extensive evaluations and direct comparisons, we demonstrate the effectiveness of the approaches in enhancing rPPG measurements under challenging environments, contributing to the development of more reliable and effective remote vital sign monitoring technologies.|
-|**2024-04-26**|**SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals**|Cheng Ding et.al.|[2404.17667v1](http://arxiv.org/abs/2404.17667v1)|[link](https://github.com/chengding0713/siamquality)|Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for developing foundation models for physiological data; such data are often noisy, incomplete, or inconsistent. The present work aims to provide a toolset for developing foundation models on physiological data. We leverage a large dataset of photoplethysmography (PPG) signals from hospitalized intensive care patients. For this data, we propose SimQuality, a novel self-supervised learning task based on convolutional neural networks (CNNs) as the backbone to enforce representations to be similar for good and poor quality signals that are from similar physiological states. We pre-trained the SimQuality on over 36 million 30-second PPG pairs and then fine-tuned and tested on six downstream tasks using external datasets. The results demonstrate the superiority of the proposed approach on all the downstream tasks, which are extremely important for heart monitoring on wearable devices. Our method indicates that CNNs can be an effective backbone for foundation models that are robust to training data quality.|
-|**2024-04-20**|**SiNC+: Adaptive Camera-Based Vitals with Unsupervised Learning of Periodic Signals**|Jeremy Speth et.al.|[2404.13449v1](http://arxiv.org/abs/2404.13449v1)|null|Subtle periodic signals, such as blood volume pulse and respiration, can be extracted from RGB video, enabling noncontact health monitoring at low cost. Advancements in remote pulse estimation -- or remote photoplethysmography (rPPG) -- are currently driven by deep learning solutions. However, modern approaches are trained and evaluated on benchmark datasets with ground truth from contact-PPG sensors. We present the first non-contrastive unsupervised learning framework for signal regression to mitigate the need for labelled video data. With minimal assumptions of periodicity and finite bandwidth, our approach discovers the blood volume pulse directly from unlabelled videos. We find that encouraging sparse power spectra within normal physiological bandlimits and variance over batches of power spectra is sufficient for learning visual features of periodic signals. We perform the first experiments utilizing unlabelled video data not specifically created for rPPG to train robust pulse rate estimators. Given the limited inductive biases, we successfully applied the same approach to camera-based respiration by changing the bandlimits of the target signal. This shows that the approach is general enough for unsupervised learning of bandlimited quasi-periodic signals from different domains. Furthermore, we show that the framework is effective for finetuning models on unlabelled video from a single subject, allowing for personalized and adaptive signal regressors.|
-|**2024-04-15**|**SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals**|Runze Yan et.al.|[2404.15353v1](http://arxiv.org/abs/2404.15353v1)|[link](https://github.com/runz96/squwa)|Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings. Conventional approaches typically discard corrupted segments or attempt to reconstruct original signals, allowing for the use of standard machine learning techniques. However, this reduces dataset size and introduces biases, compromising prediction accuracy and the effectiveness of continuous monitoring. We propose a novel deep learning model, Signal Quality Weighted Fusion of Attentional Convolution and Recurrent Neural Network (SQUWA), designed to learn how to retain accurate predictions from partially corrupted PPG. Specifically, SQUWA innovatively integrates an attention mechanism that directly considers signal quality during the learning process, dynamically adjusting the weights of time series segments based on their quality. This approach enhances the influence of higher-quality segments while reducing that of lower-quality ones, effectively utilizing partially corrupted segments. This approach represents a departure from the conventional methods that exclude such segments, enabling the utilization of a broader range of data, which has great implications for less disruption when monitoring of AF risks and more accurate estimation of AF burdens. Our extensive experiments show that SQUWA outperform existing PPG-based models, achieving the highest AUCPR of 0.89 with label noise mitigation. This also exceeds the 0.86 AUCPR of models trained with using both electrocardiogram (ECG) and PPG data.|
-|**2024-04-15**|**TransfoRhythm: A Transformer Architecture Conductive to Blood Pressure Estimation via Solo PPG Signal Capturing**|Amir Arjomand et.al.|[2404.15352v1](http://arxiv.org/abs/2404.15352v1)|null|Recent statistics indicate that approximately 1.3 billion individuals worldwide suffer from hypertension, a leading cause of premature death globally. Blood pressure (BP) serves as a critical health indicator for accurate and timely diagnosis and/or treatment of hypertension. Driven by recent advancements in Artificial Intelligence (AI) and Deep Neural Networks (DNNs), there has been a surge of interest in developing data-driven and cuff-less BP estimation solutions. In this context, current literature predominantly focuses on coupling Electrocardiography (ECG) and Photoplethysmography (PPG) sensors, though this approach is constrained by reliance on multiple sensor types. An alternative, utilizing standalone PPG signals, presents challenges due to the absence of auxiliary sensors (ECG), requiring the use of morphological features while addressing motion artifacts and high-frequency noise. To address these issues, the paper introduces the TransfoRhythm framework, a Transformer-based DNN architecture built upon the recently released physiological database, MIMIC-IV. Leveraging Multi-Head Attention (MHA) mechanism, TransfoRhythm identifies dependencies and similarities across data segments, forming a robust framework for cuff-less BP estimation solely using PPG signals. To our knowledge, this paper represents the first study to apply the MIMIC IV dataset for cuff-less BP estimation, and TransfoRhythm is the first MHA-based model trained via MIMIC IV for BP prediction. Performance evaluation through comprehensive experiments demonstrates TransfoRhythm's superiority over its state-of-the-art counterparts. Specifically, TransfoRhythm achieves highly accurate results with Root Mean Square Error (RMSE) of [1.84, 1.42] and Mean Absolute Error (MAE) of [1.50, 1.17] for systolic and diastolic blood pressures, respectively.|
-|**2024-04-14**|**Orientation-conditioned Facial Texture Mapping for Video-based Facial Remote Photoplethysmography Estimation**|Sam Cantrill et.al.|[2404.09378v3](http://arxiv.org/abs/2404.09378v3)|null|Camera-based remote photoplethysmography (rPPG) enables contactless measurement of important physiological signals such as pulse rate (PR). However, dynamic and unconstrained subject motion introduces significant variability into the facial appearance in video, confounding the ability of video-based methods to accurately extract the rPPG signal. In this study, we leverage the 3D facial surface to construct a novel orientation-conditioned facial texture video representation which improves the motion robustness of existing video-based facial rPPG estimation methods. Our proposed method achieves a significant 18.2% performance improvement in cross-dataset testing on MMPD over our baseline using the PhysNet model trained on PURE, highlighting the efficacy and generalization benefits of our designed video representation. We demonstrate significant performance improvements of up to 29.6% in all tested motion scenarios in cross-dataset testing on MMPD, even in the presence of dynamic and unconstrained subject motion, emphasizing the benefits of disentangling motion through modeling the 3D facial surface for motion robust facial rPPG estimation. We validate the efficacy of our design decisions and the impact of different video processing steps through an ablation study. Our findings illustrate the potential strengths of exploiting the 3D facial surface as a general strategy for addressing dynamic and unconstrained subject motion in videos. The code is available at https://samcantrill.github.io/orientation-uv-rppg/.|
-|**2024-04-12**|**Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography**|Hika Barki et.al.|[2404.08212v2](http://arxiv.org/abs/2404.08212v2)|null|Mental stress is a prevalent condition that can have negative impacts on one's health. Early detection and treatment are crucial for preventing related illnesses and maintaining overall wellness. This study presents a new method for identifying mental stress using a wearable biosensor worn in the ear. Data was gathered from 14 participants in a controlled environment using stress-inducing tasks such as memory and math tests. The raw photoplethysmography data was then processed by filtering, segmenting, and transforming it into scalograms using a continuous wavelet transform (CWT) which are based on two different mother wavelets, namely, a generalized Morse wavelet and the analytic Morlet (Gabor) wavelet. The scalograms were then passed through a convolutional neural network classifier, GoogLeNet, to classify the signals as stressed or non-stressed. The method achieved an outstanding result using the generalized Morse wavelet with an accuracy of 91.02% and an F1-score of 90.95%. This method demonstrates promise as a reliable tool for early detection and treatment of mental stress by providing real-time monitoring and allowing for preventive measures to be taken before it becomes a serious issue.|
-|**2024-04-12**|**Measuring Domain Shifts using Deep Learning Remote Photoplethysmography Model Similarity**|Nathan Vance et.al.|[2404.08184v1](http://arxiv.org/abs/2404.08184v1)|null|Domain shift differences between training data for deep learning models and the deployment context can result in severe performance issues for models which fail to generalize. We study the domain shift problem under the context of remote photoplethysmography (rPPG), a technique for video-based heart rate inference. We propose metrics based on model similarity which may be used as a measure of domain shift, and we demonstrate high correlation between these metrics and empirical performance. One of the proposed metrics with viable correlations, DS-diff, does not assume access to the ground truth of the target domain, i.e. it may be applied to in-the-wild data. To that end, we investigate a model selection problem in which ground truth results for the evaluation domain is not known, demonstrating a 13.9% performance improvement over the average case baseline.|
-|**2024-04-11**|**Resolve Domain Conflicts for Generalizable Remote Physiological Measurement**|Weiyu Sun et.al.|[2404.07855v1](http://arxiv.org/abs/2404.07855v1)|[link](https://github.com/swy666/rppg-doha)|Remote photoplethysmography (rPPG) technology has become increasingly popular due to its non-invasive monitoring of various physiological indicators, making it widely applicable in multimedia interaction, healthcare, and emotion analysis. Existing rPPG methods utilize multiple datasets for training to enhance the generalizability of models. However, they often overlook the underlying conflict issues across different datasets, such as (1) label conflict resulting from different phase delays between physiological signal labels and face videos at the instance level, and (2) attribute conflict stemming from distribution shifts caused by head movements, illumination changes, skin types, etc. To address this, we introduce the DOmain-HArmonious framework (DOHA). Specifically, we first propose a harmonious phase strategy to eliminate uncertain phase delays and preserve the temporal variation of physiological signals. Next, we design a harmonious hyperplane optimization that reduces irrelevant attribute shifts and encourages the model's optimization towards a global solution that fits more valid scenarios. Our experiments demonstrate that DOHA significantly improves the performance of existing methods under multiple protocols. Our code is available at https://github.com/SWY666/rPPG-DOHA.|
-|**2024-04-11**|**Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios**|Yuan Zhang et.al.|[2404.07484v1](http://arxiv.org/abs/2404.07484v1)|null|In the Massive Open Online Courses (MOOC) learning scenario, the semantic information of instructional videos has a crucial impact on learners' emotional state. Learners mainly acquire knowledge by watching instructional videos, and the semantic information in the videos directly affects learners' emotional states. However, few studies have paid attention to the potential influence of the semantic information of instructional videos on learners' emotional states. To deeply explore the impact of video semantic information on learners' emotions, this paper innovatively proposes a multimodal emotion recognition method by fusing video semantic information and physiological signals. We generate video descriptions through a pre-trained large language model (LLM) to obtain high-level semantic information about instructional videos. Using the cross-attention mechanism for modal interaction, the semantic information is fused with the eye movement and PhotoPlethysmoGraphy (PPG) signals to obtain the features containing the critical information of the three modes. The accurate recognition of learners' emotional states is realized through the emotion classifier. The experimental results show that our method has significantly improved emotion recognition performance, providing a new perspective and efficient method for emotion recognition research in MOOC learning scenarios. The method proposed in this paper not only contributes to a deeper understanding of the impact of instructional videos on learners' emotional states but also provides a beneficial reference for future research on emotion recognition in MOOC learning scenarios.|
-|**2024-04-10**|**SleepPPG-Net2: Deep learning generalization for sleep staging from photoplethysmography**|Shirel Attia et.al.|[2404.06869v1](http://arxiv.org/abs/2404.06869v1)|null|Background: Sleep staging is a fundamental component in the diagnosis of sleep disorders and the management of sleep health. Traditionally, this analysis is conducted in clinical settings and involves a time-consuming scoring procedure. Recent data-driven algorithms for sleep staging, using the photoplethysmogram (PPG) time series, have shown high performance on local test sets but lower performance on external datasets due to data drift. Methods: This study aimed to develop a generalizable deep learning model for the task of four class (wake, light, deep, and rapid eye movement (REM)) sleep staging from raw PPG physiological time-series. Six sleep datasets, totaling 2,574 patients recordings, were used. In order to create a more generalizable representation, we developed and evaluated a deep learning model called SleepPPG-Net2, which employs a multi-source domain training approach.SleepPPG-Net2 was benchmarked against two state-of-the-art models. Results: SleepPPG-Net2 showed consistently higher performance over benchmark approaches, with generalization performance (Cohen's kappa) improving by up to 19%. Performance disparities were observed in relation to age, sex, and sleep apnea severity. Conclusion: SleepPPG-Net2 sets a new standard for staging sleep from raw PPG time-series.|
-|**2024-04-09**|**RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos**|Bochao Zou et.al.|[2404.06483v1](http://arxiv.org/abs/2404.06483v1)|[link](https://github.com/zizheng-guo/rhythmmamba)|Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy and understanding the periodic patterns of rPPG among long contexts. This represents a trade-off between computational complexity and the ability to capture long-range dependencies, posing a challenge for rPPG that is suitable for deployment on mobile devices. Based on the in-depth exploration of Mamba's comprehension of spatial and temporal information, this paper introduces RhythmMamba, an end-to-end Mamba-based method that employs multi-temporal Mamba to constrain both periodic patterns and short-term trends, coupled with frequency domain feed-forward to enable Mamba to robustly understand the quasi-periodic patterns of rPPG. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with reduced parameters and lower computational complexity. The proposed RhythmMamba can be applied to video segments of any length without performance degradation. The codes are available at https://github.com/zizheng-guo/RhythmMamba.|
-|**2024-04-09**|**Exploring the limitations of blood pressure estimation using the photoplethysmography signal**|Felipe M. Dias et.al.|[2404.16049v1](http://arxiv.org/abs/2404.16049v1)|null|Hypertension, a leading contributor to cardiovascular morbidity, underscores the need for accurate and continuous blood pressure (BP) monitoring. Photoplethysmography (PPG) presents a promising approach to this end. However, the precision of BP estimates derived from PPG signals has been the subject of ongoing debate, necessitating a comprehensive evaluation of their effectiveness and constraints. We developed a calibration-based Siamese ResNet model for BP estimation, using a signal input paired with a reference BP reading. We compared the use of normalized PPG (N-PPG) against the normalized Invasive Arterial Blood Pressure (N-IABP) signals as input. The N-IABP signals do not directly present systolic and diastolic values but theoretically provide a more accurate BP measure than PPG signals since it is a direct pressure sensor inside the body. Our strategy establishes a critical benchmark for PPG performance, realistically calibrating expectations for PPG's BP estimation capabilities. Nonetheless, we compared the performance of our models using different signal-filtering conditions to evaluate the impact of filtering on the results. We evaluated our method using the AAMI and the BHS standards employing the VitalDB dataset. The N-IABP signals meet with AAMI standards for both Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), with errors of 1.29+-6.33mmHg for systolic pressure and 1.17+-5.78mmHg for systolic and diastolic pressure respectively for the raw N-IABP signal. In contrast, N-PPG signals, in their best setup, exhibited inferior performance than N-IABP, presenting 1.49+-11.82mmHg and 0.89+-7.27mmHg for systolic and diastolic pressure respectively. Our findings highlight the potential and limitations of employing PPG for BP estimation, showing that these signals contain information correlated to BP but may not be sufficient for predicting it accurately.|
-|**2024-04-07**|**Camera-Based Remote Physiology Sensing for Hundreds of Subjects Across Skin Tones**|Jiankai Tang et.al.|[2404.05003v1](http://arxiv.org/abs/2404.05003v1)|[link](https://github.com/health-hci-group/largest_rppg_dataset_evaluation)|Remote photoplethysmography (rPPG) emerges as a promising method for non-invasive, convenient measurement of vital signs, utilizing the widespread presence of cameras. Despite advancements, existing datasets fall short in terms of size and diversity, limiting comprehensive evaluation under diverse conditions. This paper presents an in-depth analysis of the VitalVideo dataset, the largest real-world rPPG dataset to date, encompassing 893 subjects and 6 Fitzpatrick skin tones. Our experimentation with six unsupervised methods and three supervised models demonstrates that datasets comprising a few hundred subjects(i.e., 300 for UBFC-rPPG, 500 for PURE, and 700 for MMPD-Simple) are sufficient for effective rPPG model training. Our findings highlight the importance of diversity and consistency in skin tones for precise performance evaluation across different datasets.|
-|**2024-04-05**|**Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features**|Alexander Vedernikov et.al.|[2404.04394v2](http://arxiv.org/abs/2404.04394v2)|null|Engagement measurement finds application in healthcare, education, services. The use of physiological and behavioral features is viable, but the impracticality of traditional physiological measurement arises due to the need for contact sensors. We demonstrate the feasibility of unsupervised remote photoplethysmography (rPPG) as an alternative for contact sensors in deriving heart rate variability (HRV) features, then fusing these with behavioral features to measure engagement in online group meetings. Firstly, a unique Engagement Dataset of online interactions among social workers is collected with granular engagement labels, offering insight into virtual meeting dynamics. Secondly, a pre-trained rPPG model is customized to reconstruct rPPG signals from video meetings in an unsupervised manner, enabling the calculation of HRV features. Thirdly, the feasibility of estimating engagement from HRV features using short observation windows, with a notable enhancement when using longer observation windows of two to four minutes, is demonstrated. Fourthly, the effectiveness of behavioral cues is evaluated when fused with physiological data, which further enhances engagement estimation performance. An accuracy of 94% is achieved when only HRV features are used, eliminating the need for contact sensors or ground truth signals; use of behavioral cues raises the accuracy to 96%. Facial analysis offers precise engagement measurement, beneficial for future applications.|
-|**2024-03-15**|**How Suboptimal is Training rPPG Models with Videos and Targets from Different Body Sites?**|Björn Braun et.al.|[2403.10582v1](http://arxiv.org/abs/2403.10582v1)|null|Remote camera measurement of the blood volume pulse via photoplethysmography (rPPG) is a compelling technology for scalable, low-cost, and accessible assessment of cardiovascular information. Neural networks currently provide the state-of-the-art for this task and supervised training or fine-tuning is an important step in creating these models. However, most current models are trained on facial videos using contact PPG measurements from the fingertip as targets/ labels. One of the reasons for this is that few public datasets to date have incorporated contact PPG measurements from the face. Yet there is copious evidence that the PPG signals at different sites on the body have very different morphological features. Is training a facial video rPPG model using contact measurements from another site on the body suboptimal? Using a recently released unique dataset with synchronized contact PPG and video measurements from both the hand and face, we can provide precise and quantitative answers to this question. We obtain up to 40 % lower mean squared errors between the waveforms of the predicted and the ground truth PPG signals using state-of-the-art neural models when using PPG signals from the forehead compared to using PPG signals from the fingertip. We also show qualitatively that the neural models learn to predict the morphology of the ground truth PPG signal better when trained on the forehead PPG signals. However, while models trained from the forehead PPG produce a more faithful waveform, models trained from a finger PPG do still learn the dominant frequency (i.e., the heart rate) well.|
-|**2024-03-14**|**rFaceNet: An End-to-End Network for Enhanced Physiological Signal Extraction through Identity-Specific Facial Contours**|Dali Zhu et.al.|[2403.09034v2](http://arxiv.org/abs/2403.09034v2)|null|Remote photoplethysmography (rPPG) technique extracts blood volume pulse (BVP) signals from subtle pixel changes in video frames. This study introduces rFaceNet, an advanced rPPG method that enhances the extraction of facial BVP signals with a focus on facial contours. rFaceNet integrates identity-specific facial contour information and eliminates redundant data. It efficiently extracts facial contours from temporally normalized frame inputs through a Temporal Compressor Unit (TCU) and steers the model focus to relevant facial regions by using the Cross-Task Feature Combiner (CTFC). Through elaborate training, the quality and interpretability of facial physiological signals extracted by rFaceNet are greatly improved compared to previous methods. Moreover, our novel approach demonstrates superior performance than SOTA methods in various heart rate estimation benchmarks.|
-|**2024-03-11**|**Advancing Generalizable Remote Physiological Measurement through the Integration of Explicit and Implicit Prior Knowledge**|Yuting Zhang et.al.|[2403.06947v1](http://arxiv.org/abs/2403.06947v1)|[link](https://github.com/keke-nice/greip)|Remote photoplethysmography (rPPG) is a promising technology that captures physiological signals from face videos, with potential applications in medical health, emotional computing, and biosecurity recognition. The demand for rPPG tasks has expanded from demonstrating good performance on intra-dataset testing to cross-dataset testing (i.e., domain generalization). However, most existing methods have overlooked the prior knowledge of rPPG, resulting in poor generalization ability. In this paper, we propose a novel framework that simultaneously utilizes explicit and implicit prior knowledge in the rPPG task. Specifically, we systematically analyze the causes of noise sources (e.g., different camera, lighting, skin types, and movement) across different domains and incorporate these prior knowledge into the network. Additionally, we leverage a two-branch network to disentangle the physiological feature distribution from noises through implicit label correlation. Our extensive experiments demonstrate that the proposed method not only outperforms state-of-the-art methods on RGB cross-dataset evaluation but also generalizes well from RGB datasets to NIR datasets. The code is available at https://github.com/keke-nice/Greip.|
-|**2024-02-23**|**Constraint Latent Space Matters: An Anti-anomalous Waveform Transformation Solution from Photoplethysmography to Arterial Blood Pressure**|Cheng Bian et.al.|[2402.17780v1](http://arxiv.org/abs/2402.17780v1)|null|Arterial blood pressure (ABP) holds substantial promise for proactive cardiovascular health management. Notwithstanding its potential, the invasive nature of ABP measurements confines their utility primarily to clinical environments, limiting their applicability for continuous monitoring beyond medical facilities. The conversion of photoplethysmography (PPG) signals into ABP equivalents has garnered significant attention due to its potential in revolutionizing cardiovascular disease management. Recent strides in PPG-to-ABP prediction encompass the integration of generative and discriminative models. Despite these advances, the efficacy of these models is curtailed by the latent space shift predicament, stemming from alterations in PPG data distribution across disparate hardware and individuals, potentially leading to distorted ABP waveforms. To tackle this problem, we present an innovative solution named the Latent Space Constraint Transformer (LSCT), leveraging a quantized codebook to yield robust latent spaces by employing multiple discretizing bases. To facilitate improved reconstruction, the Correlation-boosted Attention Module (CAM) is introduced to systematically query pertinent bases on a global scale. Furthermore, to enhance expressive capacity, we propose the Multi-Spectrum Enhancement Knowledge (MSEK), which fosters local information flow within the channels of latent code and provides additional embedding for reconstruction. Through comprehensive experimentation on both publicly available datasets and a private downstream task dataset, the proposed approach demonstrates noteworthy performance enhancements compared to existing methods. Extensive ablation studies further substantiate the effectiveness of each introduced module.|
-|**2024-02-22**|**Non-Contact Acquisition of PPG Signal using Chest Movement-Modulated Radio Signals**|Israel Jesus Santos Filho et.al.|[2402.14565v1](http://arxiv.org/abs/2402.14565v1)|null|We present for the first time a novel method that utilizes the chest movement-modulated radio signals for non-contact acquisition of the photoplethysmography (PPG) signal. Under the proposed method, a software-defined radio (SDR) exposes the chest of a subject sitting nearby to an orthogonal frequency division multiplexing signal with 64 sub-carriers at a center frequency 5.24 GHz, while another SDR in the close vicinity collects the modulated radio signal reflected off the chest. This way, we construct a custom dataset by collecting 160 minutes of labeled data (both raw radio data as well as the reference PPG signal) from 16 healthy young subjects. With this, we first utilize principal component analysis for dimensionality reduction of the radio data. Next, we denoise the radio signal and reference PPG signal using wavelet technique, followed by segmentation and Z-score normalization. We then synchronize the radio and PPG segments using cross-correlation method. Finally, we proceed to the waveform translation (regression) task, whereby we first convert the radio and PPG segments into frequency domain using discrete cosine transform (DCT), and then learn the non-linear regression between them. Eventually, we reconstruct the synthetic PPG signal by taking inverse DCT of the output of regression block, with a mean absolute error of 8.1294. The synthetic PPG waveform has a great clinical significance as it could be used for non-contact performance assessment of cardiovascular and respiratory systems of patients suffering from infectious diseases, e.g., covid19.|
diff --git a/mkdocs/docs/actigraphy/actigraphy.md b/mkdocs/docs/actigraphy/actigraphy.md
index 55dec3b1..047f0b46 100644
--- a/mkdocs/docs/actigraphy/actigraphy.md
+++ b/mkdocs/docs/actigraphy/actigraphy.md
@@ -2,33 +2,3 @@
 ### actigraphy
 |Publish Date|Title|Authors|PDF|Code|Abstract|
 | :---: | :---: | :---: | :---: | :---: | :---: |
-|**2024-07-04**|**Seamless Monitoring of Stress Levels Leveraging a Universal Model for Time Sequences**|Davide Gabrielli et.al.|[2407.03821v1](http://arxiv.org/abs/2407.03821v1)|null|Monitoring the stress level in patients with neurodegenerative diseases can help manage symptoms, improve patient's quality of life, and provide insight into disease progression. In the literature, ECG, actigraphy, speech, voice, and facial analysis have proven effective at detecting patients' emotions. On the other hand, these tools are invasive and do not integrate smoothly into the patient's daily life. HRV has also been proven to effectively indicate stress conditions, especially in combination with other signals. However, when HRV is derived from less invasive devices than the ECG, like smartwatches and bracelets, the quality of measurements significantly degrades. This paper presents a methodology for stress detection from a smartwatch based on a universal model for time series, UniTS, which we fine-tuned for the task. We cast the problem as anomaly detection rather than classification to favor model adaptation to individual patients and allow the clinician to maintain greater control over the system's predictions. We demonstrate that our proposed model considerably surpasses 12 top-performing methods on 3 benchmark datasets. Furthermore, unlike other state-of-the-art systems, UniTS enables seamless monitoring, as it shows comparable performance when using signals from invasive or lightweight devices.|
-|**2024-02-27**|**Advancing sleep detection by modelling weak label sets: A novel weakly supervised learning approach**|Matthias Boeker et.al.|[2402.17601v1](http://arxiv.org/abs/2402.17601v1)|null|Understanding sleep and activity patterns plays a crucial role in physical and mental health. This study introduces a novel approach for sleep detection using weakly supervised learning for scenarios where reliable ground truth labels are unavailable. The proposed method relies on a set of weak labels, derived from the predictions generated by conventional sleep detection algorithms. Introducing a novel approach, we suggest a novel generalised non-linear statistical model in which the number of weak sleep labels is modelled as outcome of a binomial distribution. The probability of sleep in the binomial distribution is linked to the outcomes of neural networks trained to detect sleep based on actigraphy. We show that maximizing the likelihood function of the model, is equivalent to minimizing the soft cross-entropy loss. Additionally, we explored the use of the Brier score as a loss function for weak labels. The efficacy of the suggested modelling framework was demonstrated using the Multi-Ethnic Study of Atherosclerosis dataset. A \gls{lstm} trained on the soft cross-entropy outperformed conventional sleep detection algorithms, other neural network architectures and loss functions in accuracy and model calibration. This research not only advances sleep detection techniques in scenarios where ground truth data is scarce but also contributes to the broader field of weakly supervised learning by introducing innovative approach in modelling sets of weak labels.|
-|**2023-07-07**|**A Bayesian Circadian Hidden Markov Model to Infer Rest-Activity Rhythms Using 24-hour Actigraphy Data**|Jiachen Lu et.al.|[2307.03832v1](http://arxiv.org/abs/2307.03832v1)|null|24-hour actigraphy data collected by wearable devices offer valuable insights into physical activity types, intensity levels, and rest-activity rhythms (RAR). RARs, or patterns of rest and activity exhibited over a 24-hour period, are regulated by the body's circadian system, synchronizing physiological processes with external cues like the light-dark cycle. Disruptions to these rhythms, such as irregular sleep patterns, daytime drowsiness or shift work, have been linked to adverse health outcomes including metabolic disorders, cardiovascular disease, depression, and even cancer, making RARs a critical area of health research.   In this study, we propose a Bayesian Circadian Hidden Markov Model (BCHMM) that explicitly incorporates 24-hour circadian oscillators mirroring human biological rhythms. The model assumes that observed activity counts are conditional on hidden activity states through Gaussian emission densities, with transition probabilities modeled by state-specific sinusoidal functions. Our comprehensive simulation study reveals that BCHMM outperforms frequentist approaches in identifying the underlying hidden states, particularly when the activity states are difficult to separate. BCHMM also excels with smaller Kullback-Leibler divergence on estimated densities. With the Bayesian framework, we address the label-switching problem inherent to hidden Markov models via a positive constraint on mean parameters. From the proposed BCHMM, we can infer the 24-hour rest-activity profile via time-varying state probabilities, to characterize the person-level RAR. We demonstrate the utility of the proposed BCHMM using 2011-2014 National Health and Nutrition Examination Survey (NHANES) data, where worsened RAR, indicated by lower probabilities in low-activity state during the day and higher probabilities in high-activity state at night, is associated with an increased risk of diabetes.|
-|**2023-03-14**|**Transfer Learning for Real-time Deployment of a Screening Tool for Depression Detection Using Actigraphy**|Rajanikant Ghate et.al.|[2303.07847v1](http://arxiv.org/abs/2303.07847v1)|null|Automated depression screening and diagnosis is a highly relevant problem today. There are a number of limitations of the traditional depression detection methods, namely, high dependence on clinicians and biased self-reporting. In recent years, research has suggested strong potential in machine learning (ML) based methods that make use of the user's passive data collected via wearable devices. However, ML is data hungry. Especially in the healthcare domain primary data collection is challenging. In this work, we present an approach based on transfer learning, from a model trained on a secondary dataset, for the real time deployment of the depression screening tool based on the actigraphy data of users. This approach enables machine learning modelling even with limited primary data samples. A modified version of leave one out cross validation approach performed on the primary set resulted in mean accuracy of 0.96, where in each iteration one subject's data from the primary set was set aside for testing.|
-|**2023-01-04**|**KIDS: kinematics-based (in)activity detection and segmentation in a sleep case study**|Omar Elnaggar et.al.|[2301.03469v1](http://arxiv.org/abs/2301.03469v1)|null|Sleep behaviour and in-bed movements contain rich information on the neurophysiological health of people, and have a direct link to the general well-being and quality of life. Standard clinical practices rely on polysomnography for sleep assessment; however, it is intrusive, performed in unfamiliar environments and requires trained personnel. Progress has been made on less invasive sensor technologies, such as actigraphy, but clinical validation raises concerns over their reliability and precision. Additionally, the field lacks a widely acceptable algorithm, with proposed approaches ranging from raw signal or feature thresholding to data-hungry classification models, many of which are unfamiliar to medical staff. This paper proposes an online Bayesian probabilistic framework for objective (in)activity detection and segmentation based on clinically meaningful joint kinematics, measured by a custom-made wearable sensor. Intuitive three-dimensional visualisations of kinematic timeseries were accomplished through dimension reduction based preprocessing, offering out-of-the-box framework explainability potentially useful for clinical monitoring and diagnosis. The proposed framework attained up to 99.2\% $F_1$-score and 0.96 Pearson's correlation coefficient in, respectively, the posture change detection and inactivity segmentation tasks. The work paves the way for a reliable home-based analysis of movements during sleep which would serve patient-centred longitudinal care plans.|
-|**2022-12-31**|**Definition and clinical validation of Pain Patient States from high-dimensional mobile data: application to a chronic pain cohort**|Jenna M. Reinen et.al.|[2301.00299v1](http://arxiv.org/abs/2301.00299v1)|null|The technical capacity to monitor patients with a mobile device has drastically expanded, but data produced from this approach are often difficult to interpret. We present a solution to produce a meaningful representation of patient status from large, complex data streams, leveraging both a data-driven approach, and use clinical knowledge to validate results. Data were collected from a clinical trial enrolling chronic pain patients, and included questionnaires, voice recordings, actigraphy, and standard health assessments. The data were reduced using a clustering analysis. In an initial exploratory analysis with only questionnaire data, we found up to 3 stable cluster solutions that grouped symptoms on a positive to negative spectrum. Objective features (actigraphy, speech) expanded the cluster solution granularity. Using a 5 state solution with questionnaire and actigraphy data, we found significant correlations between cluster properties and assessments of disability and quality-of-life. The correlation coefficient values showed an ordinal distinction, confirming the cluster ranking on a negative to positive spectrum. This suggests we captured novel, distinct Pain Patient States with this approach, even when multiple clusters were equated on pain magnitude. Relative to using complex time courses of many variables, Pain Patient States holds promise as an interpretable, useful, and actionable metric for a clinician or caregiver to simplify and provide timely delivery of care.|
-|**2022-12-21**|**A hidden Markov modeling approach combining objective measure of activity and subjective measure of self-reported sleep to estimate the sleep-wake cycle**|Semhar B. Ogbagaber et.al.|[2212.11224v1](http://arxiv.org/abs/2212.11224v1)|null|Characterizing the sleep-wake cycle in adolescents is an important prerequisite to better understand the association of abnormal sleep patterns with subsequent clinical and behavioral outcomes. The aim of this research was to develop hidden Markov models (HMM) that incorporate both objective (actigraphy) and subjective (sleep log) measures to estimate the sleep-wake cycle using data from the NEXT longitudinal study, a large population-based cohort study. The model was estimated with a negative binomial distribution for the activity counts (1-minute epochs) to account for overdispersion relative to a Poisson process. Furthermore, self-reported measures were dichotomized (for each one-minute interval) and subject to misclassification. We assumed that the unobserved sleep-wake cycle follows a two-state Markov chain with transitional probabilities varying according to a circadian rhythm. Maximum-likelihood estimation using a backward-forward algorithm was applied to fit the longitudinal data on a subject by subject basis. The algorithm was used to reconstruct the sleep-wake cycle from sequences of self-reported sleep and activity data. Furthermore, we conduct simulations to examine the properties of this approach under different observational patterns including both complete and partially observed measurements on each individual.|
-|**2022-08-30**|**Mediation analysis with densities as mediators with an application to iCOMPARE trial**|Jingru Zhang et.al.|[2208.13939v1](http://arxiv.org/abs/2208.13939v1)|null|Physical activity has long been shown to be associated with biological and physiological performance and risk of diseases. It is of great interest to assess whether the effect of an exposure or intervention on an outcome is mediated through physical activity measured by modern wearable devices such as actigraphy. However, existing methods for mediation analysis focus almost exclusively on mediation variable that is in the Euclidean space, which cannot be applied directly to the actigraphy data of physical activity. Such data is best summarized in the form of an histogram or density. In this paper, we extend the structural equation models (SEMs) to the settings where a density is treated as the mediator to study the indirect mediation effect of physical activity on an outcome. We provide sufficient conditions for identifying the average causal effects of density mediator and present methods for estimating the direct and mediating effects of density on an outcome. We apply our method to the data set from the iCOMPARE trial that compares flexible duty-hour policies and standard duty-hour policies on interns' sleep related outcomes to explore the mediation effect of physical activity on the causal path between flexible duty-hour policies and sleep related outcomes.|
-|**2021-11-29**|**Validating CircaCP: a Generic Sleep-Wake Cycle Detection Algorithm**|Shanshan Chen et.al.|[2111.14960v1](http://arxiv.org/abs/2111.14960v1)|null|Sleep-wake cycle detection is a key step when extrapolating sleep patterns from actigraphy data. Numerous supervised detection algorithms have been developed with parameters estimated from and optimized for a particular dataset, yet their generalizability from sensor to sensor or study to study is unknown. In this paper, we propose and validate an unsupervised algorithm -- CircaCP -- to detect sleep-wake cycles from minute-by-minute actigraphy data. It first uses a robust cosinor model to estimate circadian rhythm, then searches for a single change point (CP) within each cycle. We used CircaCP to estimate sleep/wake onset times (S/WOTs) from 2125 indviduals' data in the MESA Sleep study and compared the estimated S/WOTs against self-reported S/WOT event markers. Lastly, we quantified the biases between estimated and self-reported S/WOTs, as well as variation in S/WOTs contributed by the two methods, using linear mixed-effects models and variance component analysis.   On average, SOTs estimated by CircaCP were five minutes behind those reported by event markers, and WOTs estimated by CircaCP were less than one minute behind those reported by markers. These differences accounted for less than 0.2% variability in SOTs and in WOTs, taking into account other sources of between-subject variations. By focusing on the commonality in human circadian rhythms captured by actigraphy, our algorithm transferred seamlessly from hip-worn ActiGraph data collected from children in our previous study to wrist-worn Actiwatch data collected from adults. The large between- and within-subject variability highlights the need for estimating individual-level S/WOTs when conducting actigraphy research. The generalizability of our algorithm also suggests that it could be widely applied to actigraphy data collected by other wearable sensors.|
-|**2021-07-08**|**Circadian Rhythms are Not Captured Equal: Exploring Circadian Metrics Extracted by Different Computational Methods from Smartphone Accelerometer and GPS Sensors in Daily Life Tracking**|Congyu Wu et.al.|[2107.04135v1](http://arxiv.org/abs/2107.04135v1)|null|Circadian rhythm is the natural biological cycle manifested in human daily routines. A regular and stable rhythm is found to be correlated with good physical and mental health. With the wide adoption of mobile and wearable technology, many types of sensor data, such as GPS and actigraphy, provide evidence for researchers to objectively quantify the circadian rhythm of a user and further use these quantified metrics of circadian rhythm to infer the user's health status. Researchers in computer science and psychology have investigated circadian rhythm using various mobile and wearable sensors in ecologically valid human sensing studies, but questions remain whether and how different data types produce different circadian rhythm results when simultaneously used to monitor a user. We hypothesize that different sensor data reveal different aspects of the user's daily behavior, thus producing different circadian rhythm patterns. In this paper we focus on two data types: GPS and accelerometer data from smartphones. We used smartphone data from 225 college student participants and applied four circadian rhythm characterization methods. We found significant and interesting discrepancies in the rhythmic patterns discovered among sensors, which suggests circadian rhythms discovered from different personal tracking sensors have different levels of sensitivity to device usage and aspects of daily behavior.|
-|**2021-07-01**|**Long-Short Ensemble Network for Bipolar Manic-Euthymic State Recognition Based on Wrist-worn Sensors**|Ulysse Côté-Allard et.al.|[2107.00710v3](http://arxiv.org/abs/2107.00710v3)|[link](https://github.com/UlysseCoteAllard/LongShortNetworkBipolar)|Manic episodes of bipolar disorder can lead to uncritical behaviour and delusional psychosis, often with destructive consequences for those affected and their surroundings. Early detection and intervention of a manic episode are crucial to prevent escalation, hospital admission and premature death. However, people with bipolar disorder may not recognize that they are experiencing a manic episode and symptoms such as euphoria and increased productivity can also deter affected individuals from seeking help. This work proposes to perform user-independent, automatic mood-state detection based on actigraphy and electrodermal activity acquired from a wrist-worn device during mania and after recovery (euthymia). This paper proposes a new deep learning-based ensemble method leveraging long (20h) and short (5 minutes) time-intervals to discriminate between the mood-states. When tested on 47 bipolar patients, the proposed classification scheme achieves an average accuracy of 91.59% in euthymic/manic mood-state recognition.|
-|**2021-05-05**|**Activity-Aware Deep Cognitive Fatigue Assessment using Wearables**|Mohammad Arif Ul Alam et.al.|[2105.02824v1](http://arxiv.org/abs/2105.02824v1)|null|Cognitive fatigue has been a common problem among workers which has become an increasing global problem since the emergence of COVID-19 as a global pandemic. While existing multi-modal wearable sensors-aided automatic cognitive fatigue monitoring tools have focused on physical and physiological sensors (ECG, PPG, Actigraphy) analytic on specific group of people (say gamers, athletes, construction workers), activity-awareness is utmost importance due to its different responses on physiology in different person. In this paper, we propose a novel framework, Activity-Aware Recurrent Neural Network (\emph{AcRoNN}), that can generalize individual activity recognition and improve cognitive fatigue estimation significantly. We evaluate and compare our proposed method with state-of-art methods using one real-time collected dataset from 5 individuals and another publicly available dataset from 27 individuals achieving max. 19% improvement.|
-|**2021-04-28**|**Optimizing Rescoring Rules with Interpretable Representations of Long-Term Information**|Aaron Fisher et.al.|[2104.14291v1](http://arxiv.org/abs/2104.14291v1)|null|Analyzing temporal data (e.g., wearable device data) requires a decision about how to combine information from the recent and distant past. In the context of classifying sleep status from actigraphy, Webster's rescoring rules offer one popular solution based on the long-term patterns in the output of a moving-window model. Unfortunately, the question of how to optimize rescoring rules for any given setting has remained unsolved. To address this problem and expand the possible use cases of rescoring rules, we propose rephrasing these rules in terms of epoch-specific features. Our features take two general forms: (1) the time lag between now and the most recent [or closest upcoming] bout of time spent in a given state, and (2) the length of the most recent [or closest upcoming] bout of time spent in a given state. Given any initial moving window model, these features can be defined recursively, allowing for straightforward optimization of rescoring rules. Joint optimization of the moving window model and the subsequent rescoring rules can also be implemented using gradient-based optimization software, such as Tensorflow. Beyond binary classification problems (e.g., sleep-wake), the same approach can be applied to summarize long-term patterns for multi-state classification problems (e.g., sitting, walking, or stair climbing). We find that optimized rescoring rules improve the performance of sleep-wake classifiers, achieving accuracy comparable to that of certain neural network architectures.|
-|**2021-01-05**|**Bayesian Hierarchical Modeling and Analysis for Actigraph Data from Wearable Devices**|Pierfrancesco Alaimo Di Loro et.al.|[2101.01624v4](http://arxiv.org/abs/2101.01624v4)|[link](https://github.com/minmar94/EfficientTNNGPforActigraph)|The majority of Americans fail to achieve recommended levels of physical activity, which leads to numerous preventable health problems such as diabetes, hypertension, and heart diseases. This has generated substantial interest in monitoring human activity to gear interventions toward environmental features that may relate to higher physical activity. Wearable devices, such as wrist-worn sensors that monitor gross motor activity (actigraph units) continuously record the activity levels of a subject, producing massive amounts of high-resolution measurements. Analyzing actigraph data needs to account for spatial and temporal information on trajectories or paths traversed by subjects wearing such devices. Inferential objectives include estimating a subject's physical activity levels along a given trajectory; identifying trajectories that are more likely to produce higher levels of physical activity for a given subject; and predicting expected levels of physical activity in any proposed new trajectory for a given set of health attributes. Here, we devise a Bayesian hierarchical modeling framework for spatial-temporal actigraphy data to deliver fully model-based inference on trajectories while accounting for subject-level health attributes and spatial-temporal dependencies. We undertake a comprehensive analysis of an original dataset from the Physical Activity through Sustainable Transport Approaches in Los Angeles (PASTA-LA) study to ascertain spatial zones and trajectories exhibiting significantly higher levels of physical activity while accounting for various sources of heterogeneity.|
-|**2020-11-14**|**Using Convolutional Variational Autoencoders to Predict Post-Trauma Health Outcomes from Actigraphy Data**|Ayse S. Cakmak et.al.|[2011.07406v2](http://arxiv.org/abs/2011.07406v2)|null|Depression and post-traumatic stress disorder (PTSD) are psychiatric conditions commonly associated with experiencing a traumatic event. Estimating mental health status through non-invasive techniques such as activity-based algorithms can help to identify successful early interventions. In this work, we used locomotor activity captured from 1113 individuals who wore a research grade smartwatch post-trauma. A convolutional variational autoencoder (VAE) architecture was used for unsupervised feature extraction from four weeks of actigraphy data. By using VAE latent variables and the participant's pre-trauma physical health status as features, a logistic regression classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.64 to estimate mental health outcomes. The results indicate that the VAE model is a promising approach for actigraphy data analysis for mental health outcomes in long-term studies.|
-|**2020-08-06**|**Fatigue Assessment using ECG and Actigraphy Sensors**|Yang Bai et.al.|[2008.02871v2](http://arxiv.org/abs/2008.02871v2)|[link](https://github.com/baiyang4/Sjogrens_questionnaire)|Fatigue is one of the key factors in the loss of work efficiency and health-related quality of life, and most fatigue assessment methods were based on self-reporting, which may suffer from many factors such as recall bias. To address this issue, we developed an automated system using wearable sensing and machine learning techniques for objective fatigue assessment. ECG/Actigraphy data were collected from subjects in free-living environments. Preprocessing and feature engineering methods were applied, before interpretable solution and deep learning solution were introduced. Specifically, for interpretable solution, we proposed a feature selection approach which can select less correlated and high informative features for better understanding system's decision-making process. For deep learning solution, we used state-of-the-art self-attention model, based on which we further proposed a consistency self-attention (CSA) mechanism for fatigue assessment. Extensive experiments were conducted, and very promising results were achieved.|
-|**2019-06-03**|**Deep learning from wristband sensor data: towards wearable, non-invasive seizure forecasting**|Christian Meisel et.al.|[1906.00511v2](http://arxiv.org/abs/1906.00511v2)|null|Seizure forecasting may provide patients with timely warnings to adapt their daily activities and help clinicians deliver more objective, personalized treatments. While recent work has convincingly demonstrated that seizure risk assessment is possible, these early approaches relied largely on complex, often invasive setups including intracranial electrocorticography, implanted devices and multi-channel EEG, which limits translation of these methods to broad clinical application. To facilitate broader adaptation of seizure forecasting in clinical practice, non-invasive, easily applicable techniques that reliably assess seizure risk, in combination with clinical information, are crucial. Wristbands that continuously record physiological parameters, including electrodermal activity, body temperature, blood volume pressure and actigraphy, may afford monitoring of autonomous nervous system function and movement relevant for such a task, hence minimizing potential complications associated with invasive monitoring, and avoiding stigma associated with bulky external monitoring devices on the head. Here, we use deep learning to analyze long-term, multi-modal wristband sensor data from 50 patients with epilepsy (total duration $>$1400 hours) to assess its capability to distinguish preictal from interictal states. Prediction performance is assessed using area under the receiver operating charateristic (AUC) and improvement over chance (IoC) based on F1 scores. Using one- and two-dimensional convolutional neural networks, we identified better-than-chance predictability in out-of-sample test data in 60\% of the patients in leave-one-out and 43\% of patients in pseudo-prospective approaches. These results provide a step towards developing easier to apply, non-invasive methods for seizure risk assessments in patients with epilepsy.|
-|**2019-03-28**|**A Generic Algorithm for Sleep-Wake Cycle Detection using Unlabeled Actigraphy Data**|Shanshan Chen et.al.|[1904.05313v1](http://arxiv.org/abs/1904.05313v1)|null|One key component when analyzing actigraphy data for sleep studies is sleep-wake cycle detection. Most detection algorithms rely on accurate sleep diary labels to generate supervised classifiers, with parameters optimized for a particular dataset. However, once the actigraphy trackers are deployed in the field, labels for training models and validating detection accuracy are often not available.   In this paper, we propose a generic, training-free algorithm to detect sleep-wake cycles from minute-by-minute actigraphy. Leveraging a robust nonlinear parametric model, our proposed method refines the detection region by searching for a single change point within bounded regions defined by the parametric model. Challenged by the absence of ground truth labels, we also propose an evaluation metric dedicated to this problem. Tested on week-long actigraphy from 112 children, the results show that the proposed algorithm improves on the baseline model consistently and significantly (p<3e-15). Moreover, focusing on the commonality in human circadian rhythm captured by actigraphy, the proposed method is generic to data collected by various actigraphy trackers, circumventing the laborious label collection step in developing customized classifiers for sleep detection.|
-|**2019-02-10**|**Classifying attention deficit hyperactivity disorder in children with non-linearities in actigraphy**|Jeremi K. Ochab et.al.|[1902.03530v1](http://arxiv.org/abs/1902.03530v1)|null|Objective This study provides an objective measure based on actigraphy for Attention Deficit Hyperactivity Disorder (ADHD) diagnosis in children. We search for motor activity features that could allow further investigation into their association with other neurophysiological disordered traits.   Method The study involved $n=29$ (48 eligible) male participants aged $9.89\pm0.92$ years (8 controls, and 7 in each group: ADHD combined subtype, ADHD hyperactive-impulsive subtype, and autism spectrum disorder, ASD) wearing a wristwatch actigraph continuously for a week ($9\%$ losses in daily records) in two acquisition modes. We analyzed 47 quantities: from sleep duration or movement intensity to theory-driven scaling exponents or non-linear prediction errors of both diurnal and nocturnal activity. We used them in supervised classification to obtain cross-validated diagnostic performance.   Results We report the best performing measures, including a nearest neighbors 4-feature classifier providing $69.4\pm1.6\%$ accuracy, $78.0\pm2.2\%$ sensitivity and $60.8\pm2.6\%$ specificity in a binary ADHD vs control classification and $46.5\pm1.1\%$ accuracy (against $25\%$ baseline), $61.8\pm1.4\%$ sensitivity and $79.30 \pm0.43\%$ specificity in 4-class task (two ADHD subtypes, ASD, and control). The most informative feature is skewness of the shape of Zero Crossing Mode (ZCM) activity. Mean and standard deviation of nocturnal activity are among the least informative.   Conclusion Actigraphy causes only minor discomfort to the subjects and is inexpensive. The range of existing mathematical and machine learning tools also allow it to be a useful add-on test for ADHD or differential diagnosis between ADHD subtypes. The study was limited to a small, male sample without the inattentive ADHD subtype.|
-|**2018-12-03**|**A Hidden Markov Model Based Unsupervised Algorithm for Sleep/Wake Identification Using Actigraphy**|Xinyue Li et.al.|[1812.00553v2](http://arxiv.org/abs/1812.00553v2)|null|Actigraphy is widely used in sleep studies but lacks a universal unsupervised algorithm for sleep/wake identification. In this study, we proposed a Hidden Markov Model (HMM) based unsupervised algorithm that can automatically and effectively infer sleep/wake states. It is an individualized data-driven approach that analyzes actigraphy from each individual respectively to learn activity characteristics and further separate sleep and wake states. We used Actiwatch and polysomnography (PSG) data from 43 individuals in the Multi-Ethnic Study of Atherosclerosis to evaluate the performance of our method. Epoch-by-epoch comparisons were made between our HMM algorithm and that embedded in the Actiwatch software (AS). The percent agreement between HMM and PSG was 85.7%, and that between AS and PSG was 84.7%. Positive predictive values for sleep epochs were 85.6% and 84.6% for HMM and AS, respectively, and 95.5% and 85.6% for wake epochs. Both methods have similar performance and tend to overestimate sleep and underestimate wake compared to PSG. Our HMM approach is able to quantify the variability in activity counts that allow us to differentiate relatively active and sedentary individuals: individuals with higher estimated variabilities tend to show more frequent sedentary behaviors. In conclusion, our unsupervised data-driven HMM algorithm achieves slightly better performance compared to the commonly used algorithm in the Actiwatch software. HMM can help expand the application of actigraphy in large-scale studies and in cases where intrusive PSG is hard to acquire or unavailable. In addition, the estimated HMM parameters can characterize individual activity patterns that can be utilized for further analysis.|
-|**2018-08-20**|**Bayesian Function-on-Scalars Regression for High Dimensional Data**|Daniel R. Kowal et.al.|[1808.06689v2](http://arxiv.org/abs/1808.06689v2)|null|We develop a fully Bayesian framework for function-on-scalars regression with many predictors. The functional data response is modeled nonparametrically using unknown basis functions, which produces a flexible and data-adaptive functional basis. We incorporate shrinkage priors that effectively remove unimportant scalar covariates from the model and reduce sensitivity to the number of (unknown) basis functions. For variable selection in functional regression, we propose a decision theoretic posterior summarization technique, which identifies a subset of covariates that retains nearly the predictive accuracy of the full model. Our approach is broadly applicable for Bayesian functional regression models, and unlike existing methods provides joint rather than marginal selection of important predictor variables. Computationally scalable posterior inference is achieved using a Gibbs sampler with linear time complexity in the number of predictors. The resulting algorithm is empirically faster than existing frequentist and Bayesian techniques, and provides joint estimation of model parameters, prediction and imputation of functional trajectories, and uncertainty quantification via the posterior distribution. A simulation study demonstrates improvements in estimation accuracy, uncertainty quantification, and variable selection relative to existing alternatives. The methodology is applied to actigraphy data to investigate the association between intraday physical activity and responses to a sleep questionnaire.|
-|**2018-04-25**|**The Intelligent ICU Pilot Study: Using Artificial Intelligence Technology for Autonomous Patient Monitoring**|Anis Davoudi et.al.|[1804.10201v2](http://arxiv.org/abs/1804.10201v2)|null|Currently, many critical care indices are repetitively assessed and recorded by overburdened nurses, e.g. physical function or facial pain expressions of nonverbal patients. In addition, many essential information on patients and their environment are not captured at all, or are captured in a non-granular manner, e.g. sleep disturbance factors such as bright light, loud background noise, or excessive visitations. In this pilot study, we examined the feasibility of using pervasive sensing technology and artificial intelligence for autonomous and granular monitoring of critically ill patients and their environment in the Intensive Care Unit (ICU). As an exemplar prevalent condition, we also characterized delirious and non-delirious patients and their environment. We used wearable sensors, light and sound sensors, and a high-resolution camera to collected data on patients and their environment. We analyzed collected data using deep learning and statistical analysis. Our system performed face detection, face recognition, facial action unit detection, head pose detection, facial expression recognition, posture recognition, actigraphy analysis, sound pressure and light level detection, and visitation frequency detection. We were able to detect patient's face (Mean average precision (mAP)=0.94), recognize patient's face (mAP=0.80), and their postures (F1=0.94). We also found that all facial expressions, 11 activity features, visitation frequency during the day, visitation frequency during the night, light levels, and sound pressure levels during the night were significantly different between delirious and non-delirious patients (p-value<0.05). In summary, we showed that granular and autonomous monitoring of critically ill patients and their environment is feasible and can be used for characterizing critical care conditions and related environment factors.|
-|**2018-03-31**|**Continuous Circadian Phase Estimation Using Adaptive Notch Filter**|Wei Qiao et.al.|[1804.00115v1](http://arxiv.org/abs/1804.00115v1)|null|Actigraphy has been widely used for the analysis of circadian rhythm. Current practice applies regression analysis to data from multiple days to estimate the circadian phase. This paper presents a filtering method for online processing of biometric data to estimate the circadian phase. We apply the proposed method on actigraphy data of fruit flies (Drosophila melanogaster).|
-|**2018-02-22**|**Actigraphy-based Sleep/Wake Pattern Detection using Convolutional Neural Networks**|Lena Granovsky et.al.|[1802.07945v1](http://arxiv.org/abs/1802.07945v1)|null|Common medical conditions are often associated with sleep abnormalities. Patients with medical disorders often suffer from poor sleep quality compared to healthy individuals, which in turn may worsen the symptoms of the disorder. Accurate detection of sleep/wake patterns is important in developing personalized digital markers, which can be used for objective measurements and efficient disease management. Big Data technologies and advanced analytics methods hold the promise to revolutionize clinical research processes, enabling the effective blending of digital data into clinical trials. Actigraphy, a non-invasive activity monitoring method is heavily used to detect and evaluate activities and movement disorders, and assess sleep/wake behavior. In order to study the connection between sleep/wake patterns and a cluster headache disorder, activity data was collected using a wearable device in the course of a clinical trial. This study presents two novel modeling schemes that utilize Deep Convolutional Neural Networks (CNN) to identify sleep/wake states. The proposed methods are a sequential CNN, reminiscent of the bi-directional CNN for slot filling, and a Multi-Task Learning (MTL) based model. Furthermore, we expand standard "Sleep" and "Wake" activity states space by adding the "Falling asleep" and "Siesta" states. We show that the proposed methods provide promising results in accurate detection of the expanded sleep/wake states. Finally, we explore the relations between the detected sleep/wake patterns and onset of cluster headache attacks, and present preliminary observations.|
-|**2017-12-27**|**Co-Morbidity Exploration on Wearables Activity Data Using Unsupervised Pre-training and Multi-Task Learning**|Karan Aggarwal et.al.|[1712.09527v1](http://arxiv.org/abs/1712.09527v1)|null|Physical activity and sleep play a major role in the prevention and management of many chronic conditions. It is not a trivial task to understand their impact on chronic conditions. Currently, data from electronic health records (EHRs), sleep lab studies, and activity/sleep logs are used. The rapid increase in the popularity of wearable health devices provides a significant new data source, making it possible to track the user's lifestyle real-time through web interfaces, both to consumer as well as their healthcare provider, potentially. However, at present there is a gap between lifestyle data (e.g., sleep, physical activity) and clinical outcomes normally captured in EHRs. This is a critical barrier for the use of this new source of signal for healthcare decision making. Applying deep learning to wearables data provides a new opportunity to overcome this barrier.   To address the problem of the unavailability of clinical data from a major fraction of subjects and unrepresentative subject populations, we propose a novel unsupervised (task-agnostic) time-series representation learning technique called act2vec. act2vec learns useful features by taking into account the co-occurrence of activity levels along with periodicity of human activity patterns. The learned representations are then exploited to boost the performance of disorder-specific supervised learning models. Furthermore, since many disorders are often related to each other, a phenomenon referred to as co-morbidity, we use a multi-task learning framework for exploiting the shared structure of disorder inducing life-style choices partially captured in the wearables data. Empirical evaluation using actigraphy data from 4,124 subjects shows that our proposed method performs and generalizes substantially better than the conventional time-series symbolic representational methods and task-specific deep learning models.|
-|**2017-12-18**|**Activity and Circadian Rhythm of Sepsis Patients in the Intensive Care Unit**|Anis Davoudi et.al.|[1712.06631v1](http://arxiv.org/abs/1712.06631v1)|null|Early mobilization of critically ill patients in the Intensive Care Unit (ICU) can prevent adverse outcomes such as delirium and post-discharge physical impairment. To date, no studies have characterized activity of sepsis patients in the ICU using granular actigraphy data. This study characterizes the activity of sepsis patients in the ICU to aid in future mobility interventions. We have compared the actigraphy features of 24 patients in four groups: Chronic Critical Illness (CCI) sepsis patients in the ICU, Rapid Recovery (RR) sepsis patients in the ICU, non-sepsis ICU patients (control-ICU), and healthy subjects. We used several statistical and circadian rhythm features extracted from the patients' actigraphy data collected over a five-day period. Our results show that the four groups are significantly different in terms of activity features. In addition, we observed that the CCI and control-ICU patients show less regularity in their circadian rhythm compared to the RR patients. These results show the potential of using actigraphy data for guiding mobilization practices, classifying sepsis recovery subtype, as well as for tracking patients' recovery.|
-|**2017-11-02**|**Sleep Stage Classification Based on Multi-level Feature Learning and Recurrent Neural Networks via Wearable Device**|Xin Zhang et.al.|[1711.00629v1](http://arxiv.org/abs/1711.00629v1)|null|This paper proposes a practical approach for automatic sleep stage classification based on a multi-level feature learning framework and Recurrent Neural Network (RNN) classifier using heart rate and wrist actigraphy derived from a wearable device. The feature learning framework is designed to extract low- and mid-level features. Low-level features capture temporal and frequency domain properties and mid-level features learn compositions and structural information of signals. Since sleep staging is a sequential problem with long-term dependencies, we take advantage of RNNs with Bidirectional Long Short-Term Memory (BLSTM) architectures for sequence data learning. To simulate the actual situation of daily sleep, experiments are conducted with a resting group in which sleep is recorded in resting state, and a comprehensive group in which both resting sleep and non-resting sleep are included.We evaluate the algorithm based on an eight-fold cross validation to classify five sleep stages (W, N1, N2, N3, and REM). The proposed algorithm achieves weighted precision, recall and F1 score of 58.0%, 60.3%, and 58.2% in the resting group and 58.5%, 61.1%, and 58.5% in the comprehensive group, respectively. Various comparison experiments demonstrate the effectiveness of feature learning and BLSTM. We further explore the influence of depth and width of RNNs on performance. Our method is specially proposed for wearable devices and is expected to be applicable for long-term sleep monitoring at home. Without using too much prior domain knowledge, our method has the potential to generalize sleep disorder detection.|
-|**2017-05-10**|**Visualization of Wearable Data and Biometrics for Analysis and Recommendations in Childhood Obesity**|Michael Aupetit et.al.|[1705.03691v1](http://arxiv.org/abs/1705.03691v1)|null|Obesity is one of the major health risk factors be- hind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physical activity and sleep, social media, mobile and health data. In this paper we describe the design of a dashboard for the visualization of actigraphy and biometric data from a childhood obesity camp in Qatar. This dashboard allows quantitative discoveries that can be used to guide patient behavior and orient qualitative research.|
-|**2017-02-13**|**On multifractals: a non-linear study of actigraphy data**|Lucas Gabriel Souza França et.al.|[1702.03912v2](http://arxiv.org/abs/1702.03912v2)|[link](https://github.com/lucasfr/actiMF)|This work aimed, to determine the characteristics of activity series from fractal geometry concepts application, in addition to evaluate the possibility of identifying individuals with fibromyalgia. Activity level data were collected from 27 healthy subjects and 27 fibromyalgia patients, with the use of clock-like devices equipped with accelerometers, for about four weeks, all day long. The activity series were evaluated through fractal and multifractal methods. Hurst exponent analysis exhibited values according to other studies ($H>0.5$) for both groups ($H=0.98\pm0.04$ for healthy subjects and $H=0.97\pm0.03$ for fibromyalgia patients), however, it is not possible to distinguish between the two groups by such analysis. Activity time series also exhibited a multifractal pattern. A paired analysis of the spectra indices for the sleep and awake states revealed differences between healthy subjects and fibromyalgia patients. The individuals feature differences between awake and sleep states, having statistically significant differences for $\alpha_{q-} - \alpha_{0}$ in healthy subjects ($p = 0.014$) and $D_{0}$ for patients with fibromyalgia ($p = 0.013$). The approach has proven to be an option on the characterisation of such kind of signals and was able to differ between both healthy and fibromyalgia groups. This outcome suggests changes in the physiologic mechanisms of movement control.|
-|**2016-09-12**|**Hearables: Multimodal physiological in-ear sensing**|Valentin Goverdovsky et.al.|[1609.03330v2](http://arxiv.org/abs/1609.03330v2)|null|Future health systems require the means to assess and track the neural and physiological function of a user over long periods of time and in the community. Human body responses are manifested through multiple modalities, such as the mechanical, electrical and chemical; yet current physiological monitors (actigraphy, heart rate) largely lack in both the desired cross-modal and non-stigmatizing aspects. We address these challenges through an inconspicuous and comfortable earpiece, equipped with miniature multimodal sensors, which benefits from the relatively stable position of the ear canal with respect to vital organs to robustly measure the brain, cardiac and respiratory functions. Comprehensive experiments validate each modality within the proposed earpiece, while its potential in health monitoring is illustrated through case studies. We further demonstrate how combining data from multiple sensors within such an integrated wearable device improves both the accuracy of measurements and the ability to deal with artifacts in real-life scenarios.|