The MIDS class is sharing with the research community a first-of-its-kind dataset collected using consumer-grade brainwave-sensing headsets during an in-class group exercise, along with the software code and visual stimulus used to collect the data.
The audio-visual stimulus presented was 5 minutes and 20 seconds long. The dataset includes all subjects' readings during the stimulus presentation, as well as readings from before the start and after the end of the stimulus.
To download the dataset, along with all metadata and stimulus videos, please enter your name, institutional affiliation (optional), and email address:
We take four steps to protect the confidentiality of the participants. First, the data is collected anonymously. Each of the participants randomly selects a paper slip with a number, which they enter into the client software as their ID. No person has any knowledge of which ID was chosen by which student. Second, the ID is re-mapped so that the students do not know which ID in the final dataset corresponds to their own data. Third, the dataset contains data from 30 participants, which is 50% of the entire population of 60 students. Therefore, for any given student, there is a 50% probability that their data is part of the dataset. Finally, we do not hold data from the remainder of the population, and there is no way for us to reconstruct a dataset for all 60 participants.
The server receives one data packet every second from each Mindwave Mobile device, and stores each packet as one row entry with the following data fields:
id, indra_time, browser_latency, reading_time, attention_esense, meditation_esense, eeg_power, raw_values, signal_quality, createdAt, updatedAt
id: Integer value in the range of 1 to 30.
indra_time: The synchronized timing of the reading. See browser_latency below. Use this for most analyses. NOTE: The included CSV files are not necessarily sorted by time.
browser_latency: The difference between the time on the subject's computer and the time on our server. This value is used to calculate the synchronized indra_time, above. So, the time at which a row was sent to the server from the browser-based client software is indra_time - browser_latency.
reading_time: The time at which the Neurosky data passed over the Bluetooth connection onto the subject's computer. In ideal conditions, where there is zero latency between receiving a packet and sending its data to the server, reading_time = indra_time - browser_latency. So, you can use reading_time to estimate the delay between the actual reading and the time at which the reading was sent to the server.
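The timing arithmetic above can be checked directly. A minimal sketch in Python; the timestamp values here are made-up numbers chosen only to illustrate the relationships, not real dataset values:

```python
# Illustrative values for one row's timing fields (not real data).
indra_time = 100.70      # synchronized time of the reading (server clock)
browser_latency = 0.25   # client clock minus server clock
reading_time = 100.40    # time the packet arrived over Bluetooth

# Time the row was sent from the browser-based client:
send_time = indra_time - browser_latency

# Gap between the reading arriving on the client and being sent onward:
delay = send_time - reading_time
print(f"sent at {send_time:.2f}, estimated client-side delay {delay:.2f}s")
```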
The remaining five data fields are defined by the Neurosky SDK (createdAt and updatedAt are server-side record timestamps):
raw_values: Tuple containing raw sample values acquired by the sensor, at a sampling rate of 512Hz.
attention_esense and meditation_esense: Neurosky's eSense meters for Attention and Meditation levels, as integer values in the range of 0 to 100. The values represent the last values computed in the current period.
eeg_power: Tuple representing the magnitude of 8 commonly-recognized EEG frequency bands -- delta (0.5 - 2.75Hz), theta (3.5 - 6.75Hz), low-alpha (7.5 - 9.25Hz), high-alpha (10 - 11.75Hz), low-beta (13 - 16.75Hz), high-beta (18 - 29.75Hz), low-gamma (31 - 39.75Hz), and mid-gamma (41 - 49.75Hz). These values have no units and are only meaningful for comparison to the values for the other frequency bands within a sample.
signal_quality: A zero value indicates good signal quality. A value of 128 or greater indicates that the headset is not being worn properly.
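Because the eeg_power magnitudes are unitless and only comparable within a single sample, a common preprocessing step is to convert each sample's eight values into relative proportions. A minimal sketch in Python; the sample tuple below is illustrative, not taken from the dataset:

```python
# Band order follows the Neurosky definition quoted above.
BANDS = ["delta", "theta", "low-alpha", "high-alpha",
         "low-beta", "high-beta", "low-gamma", "mid-gamma"]

def relative_band_power(eeg_power):
    """Return each band's share of the sample's total power (shares sum to 1)."""
    total = sum(eeg_power)
    if total == 0:
        return {band: 0.0 for band in BANDS}
    return {band: value / total for band, value in zip(BANDS, eeg_power)}

# Illustrative eeg_power tuple for one sample:
sample = (120000, 80000, 30000, 25000, 20000, 15000, 8000, 6000)
shares = relative_band_power(sample)
print({band: round(share, 3) for band, share in shares.items()})
```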
In total, the dataset consists of 29,480 rows, an average of roughly 983 rows per participant.
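As noted above, the CSV files are not necessarily sorted by time, so a typical first step is to order rows by indra_time and drop poor-contact readings. A sketch using pandas; the rows here are illustrative (in practice, load a dataset file with pd.read_csv):

```python
import pandas as pd

# Illustrative rows; in practice: df = pd.read_csv(<dataset CSV>).
df = pd.DataFrame({
    "id": [3, 3, 3],
    "indra_time": [102.0, 100.0, 101.0],
    "signal_quality": [0, 0, 128],
})

# Sort by the synchronized timestamp, then keep only rows with
# good signal quality (0 = good headset contact).
df = df.sort_values("indra_time").reset_index(drop=True)
clean = df[df["signal_quality"] == 0]
print(len(clean), "good-quality rows of", len(df))
```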
For each participant, we also anonymously collected some metadata: (1) which session the participant attended, (2) whether or not they had previously seen the video displayed during the stimulus (a Super Bowl ad), (3) gender, (4) whether or not they saw hidden icons displayed during the color counting exercise, and (5) their chosen color during the color counting exercise.
Crucially, we also collected the timing (in indra_time) of all stimulus events for both session 1 and session 2. These times are included in the main dataset, below.
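Since stimulus events and readings share the same indra_time clock, each reading can be tagged with the most recent stimulus event preceding it. A sketch using pandas.merge_asof; the event labels and times below are illustrative placeholders, not the actual session timings:

```python
import pandas as pd

# Illustrative readings and stimulus events, both on the indra_time clock.
readings = pd.DataFrame({
    "indra_time": [10.0, 35.0, 70.0],
    "attention_esense": [40, 55, 62],
})
events = pd.DataFrame({
    "indra_time": [0.0, 30.0, 60.0],
    "event": ["baseline", "video_start", "color_exercise"],  # hypothetical labels
})

# Tag each reading with the most recent event at or before its timestamp.
tagged = pd.merge_asof(readings.sort_values("indra_time"),
                       events.sort_values("indra_time"),
                       on="indra_time", direction="backward")
print(tagged["event"].tolist())
```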
Please use the following citation if you publish your research results using this dataset or software code or stimulus file:
John Chuang, Nicholas Merrill, Thomas Maillart, and Students of the UC Berkeley Spring 2015 MIDS Immersion Class. "Synchronized Brainwave Recordings from a Group Presented with a Common Audio-Visual Stimulus (May 9, 2015)." May 2015.