Dataset

The dataset consists of keystroke samples from 64 students answering questions on 3 online exams over a semester. The first exam required students to type normally with both hands. For the second and third exams, students were required to type with only their left hand and only their right hand, respectively. This was done to simulate a serious handicap in which a user is able to type with only one hand. Each student provided at least 500 keystrokes for each sample.

The dataset is split into labeled and unlabeled samples. The goal is to identify the user who produced each unlabeled sample.

Important: not all of the users in the training dataset appear in the testing dataset.

Dataset repository. To request access, email contact@vmonaco.com

The training dataset contains 500-keystroke samples from 64 users under normal typing conditions. Columns in the training dataset are:

Training dataset columns
Column       Description
user         Unique label for each user
condition    Typing condition (both hands, left hand, right hand)
handedness   User handedness (left, right, ambidextrous)
typingstyle  User typing style (touch typist, hunt-and-peck, or hybrid)
timepress    Press timestamp in milliseconds
timerelease  Release timestamp in milliseconds
keyname      Name of the key
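
The two timestamp columns are enough to derive basic timing features for each keystroke. The following is a minimal sketch, not the method used by the competition organizers. It assumes the data is comma-separated with a header row matching the column names above. The sample rows, the user label "u01", and the function name timing_features are invented for illustration.

```python
import csv
import io

# Hypothetical excerpt in the column layout described above. The values are
# invented, and the real files may use a different delimiter or encoding.
SAMPLE_CSV = """user,condition,handedness,typingstyle,timepress,timerelease,keyname
u01,both hands,right,touch typist,0,95,t
u01,both hands,right,touch typist,140,230,h
u01,both hands,right,touch typist,260,340,e
"""


def timing_features(rows):
    """Return (hold_times, press_latencies) in milliseconds for one sample."""
    presses = [int(r["timepress"]) for r in rows]
    releases = [int(r["timerelease"]) for r in rows]
    # Hold time: how long each key stays down.
    hold_times = [rel - pr for pr, rel in zip(presses, releases)]
    # Press-press latency: time between consecutive key presses.
    press_latencies = [b - a for a, b in zip(presses, presses[1:])]
    return hold_times, press_latencies


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
holds, latencies = timing_features(rows)
print(holds)      # [95, 90, 80]
print(latencies)  # [140, 120]
```

Hold times and press-press latencies are common starting features in keystroke dynamics; which features work best under the one-handed conditions is left open here.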


The testing dataset contains 471 samples of 500 keystrokes each from the same population under three different typing conditions: normal typing with both hands, typing with just the left hand, and typing with just the right hand. All samples from the same user are at least 50 keystrokes apart to avoid classification by grammatical structures in the student's response. Timestamps are also normalized by subtracting the first keypress timestamp, which removes any correlation between the time of the attempt in the training and testing datasets. The columns in the testing dataset are:

Testing dataset columns
Column       Description
sample       Globally unique label for each sample
condition    Typing condition (both hands, left hand, right hand)
timepress    Press timestamp in milliseconds
timerelease  Release timestamp in milliseconds
keyname      Name of the key
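
The timestamp normalization described for the testing samples can be sketched as follows. This is an illustration under assumptions, not the organizers' code: it assumes each sample is a list of keystroke records ordered by press time, and the function name normalize_timestamps and the example values are hypothetical.

```python
def normalize_timestamps(keystrokes):
    """Shift press and release times so the first keypress falls at 0 ms.

    Assumes `keystrokes` is ordered by press time, so the first record
    holds the first keypress of the sample.
    """
    if not keystrokes:
        return []
    t0 = keystrokes[0]["timepress"]
    return [
        {
            **k,
            "timepress": k["timepress"] - t0,
            "timerelease": k["timerelease"] - t0,
        }
        for k in keystrokes
    ]


# Invented absolute timestamps in milliseconds.
raw = [
    {"timepress": 1_000_250, "timerelease": 1_000_340, "keyname": "a"},
    {"timepress": 1_000_400, "timerelease": 1_000_470, "keyname": "b"},
]
normalized = normalize_timestamps(raw)
print([(k["timepress"], k["timerelease"]) for k in normalized])
# [(0, 90), (150, 220)]
```

Hold times and latencies are unchanged by this shift, since both are differences between timestamps within the same sample.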

Submission format

The goal is to identify the user of each sample in the testing dataset. Submissions should contain a classification for each sample in the testing dataset. Submissions should be a CSV file with 2 columns and 472 lines (a header plus 471 sample classifications). The first column is the sample label and the second is the classification label, that is, the predicted user.
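
A submission file in this shape could be produced roughly as shown here. The header names "sample" and "user", the placeholder labels, and the function name write_submission are assumptions; the description above specifies only the column order and the line count.

```python
import csv
import io

EXPECTED_SAMPLES = 471  # number of testing samples stated above


def write_submission(predictions, out, expected=EXPECTED_SAMPLES):
    """Write a {sample: label} mapping to `out` as a two-column CSV."""
    if len(predictions) != expected:
        raise ValueError(
            f"expected {expected} classifications, got {len(predictions)}"
        )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "user"])  # header names are an assumption
    for sample, label in sorted(predictions.items()):
        writer.writerow([sample, label])


# Placeholder predictions: every hypothetical sample assigned to one user.
predictions = {f"s{i:03d}": "u01" for i in range(EXPECTED_SAMPLES)}
buffer = io.StringIO()
write_submission(predictions, buffer)
lines = buffer.getvalue().splitlines()
print(len(lines))  # 472
```

Checking the row count before writing catches a missing or duplicated sample early. To write to disk instead of a buffer, pass a file opened with newline="" as the second argument.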

Ground truth

The competition has ended and the correct labels for the test set are available in the dataset repository.