University of Illinois Urbana-Champaign
Abstract
Foundation models are reshaping EEG analysis, yet EEG tokenization remains an open challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time–frequency masking to capture robust motif representations. The framework is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks.
Our study demonstrates three key benefits. Accuracy: experiments on four diverse EEG benchmarks show consistent performance gains, with up to 11% improvement in Cohen's Kappa over strong baselines. Generalization: as a plug-and-play component, it consistently boosts the performance of diverse foundation models, including BIOT and LaBraM. Scalability: by operating at the single-channel level, our method is device-agnostic, and experiments on ear-EEG sleep staging show our tokenizer outperforms baselines by 14%.
Key Results
Consistent performance gains across four EEG benchmarks with up to +11% Cohen’s Kappa improvement over strong baselines in both single- and multi-dataset pretraining settings.
A plug-and-play component that boosts existing foundation models like BIOT and LaBraM—no architectural changes needed.
Device-agnostic design at the single-channel level. Outperforms baselines by +14% on ear-EEG sleep staging—a completely different signal format and device.
Method
TFM-Tokenizer uses a dual-path architecture that captures both temporal and frequency-domain features through localized spectral window encoding and temporal patching. Time–frequency masking during pretraining encourages robust, diverse motif representations.
static/images/framework.png
Figure 1. Overview of the TFM-Tokenizer framework. (a) TFM-Tokenizer pretraining with dual-path encoding and masked prediction. (b) Frequency band and temporal masking strategy. (c) Localized spectral window encoder. (d) Downstream transformer encoder pretraining with masked token prediction.
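To make the masking strategy in Figure 1(b) concrete, the following is a minimal NumPy sketch of time–frequency masking on a single-channel signal: whole frequency bands and whole temporal patches of a spectrogram are hidden at random, which is what the tokenizer is trained to reconstruct from. The window size, band and patch counts, and masking probability here are illustrative placeholders, not the paper's actual hyperparameters.

```python
import numpy as np

def stft_magnitude(x, win=64, hop=32):
    """Magnitude spectrogram of a 1-D signal via windowed FFT (time, freq)."""
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def time_frequency_mask(spec, freq_bands=4, time_patches=8, p=0.25, rng=None):
    """Zero out randomly chosen frequency bands and temporal patches.

    Returns the masked spectrogram and a boolean mask marking hidden cells.
    """
    rng = rng or np.random.default_rng(0)
    T, F = spec.shape
    mask = np.zeros((T, F), dtype=bool)
    # Frequency-band masking: hide entire bands across all time steps.
    band_edges = np.linspace(0, F, freq_bands + 1).astype(int)
    for b in range(freq_bands):
        if rng.random() < p:
            mask[:, band_edges[b]:band_edges[b + 1]] = True
    # Temporal patch masking: hide entire patches across all frequencies.
    patch_edges = np.linspace(0, T, time_patches + 1).astype(int)
    for t in range(time_patches):
        if rng.random() < p:
            mask[patch_edges[t]:patch_edges[t + 1], :] = True
    return np.where(mask, 0.0, spec), mask

# Single-channel EEG surrogate: a 10 Hz alpha-band sine plus noise at 256 Hz.
fs, secs = 256, 4
t = np.arange(fs * secs) / fs
x = np.sin(2 * np.pi * 10 * t) \
    + 0.1 * np.random.default_rng(1).standard_normal(fs * secs)
spec = stft_magnitude(x)
masked, mask = time_frequency_mask(spec)
```

During pretraining, the tokenizer would see `masked` and be asked to predict the content at the positions flagged by `mask`; unmasked cells are left untouched, so the model must rely on surrounding time–frequency context.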
Results
The learned tokens are class-discriminative, frequency-aware, and structurally consistent, capturing physiologically meaningful EEG motifs as discrete tokens.
static/images/token_analysis.png
Figure 2. Token analysis showing class-discriminative and frequency-aware structure of the learned discrete tokens.
Citation
If you find our work useful, please consider citing: