Datasets | Control Learning and Systems Optimization Group

PURACE VOLCANO DATASET

These data were extracted from the Colombian Geological Survey (SGC) and processed to create this dataset. The dataset is intended to test whether a learning algorithm can learn quickly and extract knowledge from data streams in real time.

Datasets: Purace5Days, Purace1Year
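One common way to check how quickly a learner adapts on a data stream is test-then-train (prequential) evaluation: each example is first used to test the model and only afterwards to update it. The sketch below is illustrative only. The `MajorityClass` learner, the `prequential_accuracy` helper, and the toy stream are hypothetical stand-ins and do not reflect the actual format or labels of the Purace files.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical placeholder learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any example has been seen there is nothing to predict.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test-then-train: score each example before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0


# Toy stream of (features, label) pairs; made-up values, not Purace data.
toy_stream = [((0.1,), "a"), ((0.2,), "a"), ((0.9,), "b"), ((0.3,), "a")]
accuracy = prequential_accuracy(toy_stream, MajorityClass())
```

Any incremental learner exposing `predict` and `learn` could be swapped in for the placeholder, with the stream read from Purace5Days or Purace1Year once their format is known.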

IMBALANCED SEMEION

This data set, created from the Semeion data set, has 1236 instances. Classes C0, C7 and C9 contain a quarter as many examples as the other classes (40 compared with 160).

Data set: semeionImbalanced.arff (weka)
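The class distribution described above can be checked directly from the ARFF file. The following is a minimal sketch using only the Python standard library. It assumes a dense (non-sparse) ARFF file whose class label is the last attribute on each data row, which has not been verified for semeionImbalanced.arff. The helper name `arff_class_counts` and the toy data are hypothetical.

```python
from collections import Counter
import io


def arff_class_counts(lines):
    """Count the labels in the last column of a dense ARFF file.

    Assumption: the class attribute is the last value on each data row.
    """
    counts = Counter()
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            # Header lines are ignored until the @data marker is reached.
            in_data = line.lower().startswith("@data")
            continue
        label = line.rsplit(",", 1)[-1].strip().strip("'\"")
        counts[label] += 1
    return counts


# Tiny in-memory stand-in for an ARFF file (made-up values).
sample = io.StringIO("""\
@relation toy
@attribute p1 numeric
@attribute class {C0,C1}
@data
0,C0
1,C1
1,C1
0,C1
1,C1
""")
counts = arff_class_counts(sample)
imbalance_ratio = max(counts.values()) / min(counts.values())
```

For the real file, an open file handle can be passed instead of the in-memory sample, for example `with open("semeionImbalanced.arff") as f: counts = arff_class_counts(f)`.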

Citation Request:
If you publish results based on this data set, please acknowledge its use, refer to the data set by the name “Imbalanced Semeion”, inform your readers of the current location of the data set, and point out that it was created from the Semeion data set.

ASISTENTUR

This data collection contains 1006 road traffic signs from Spain, distributed into 9 classes: no pedestrians, no turn, no waiting or stopping, no overtaking, and the road signs that set the maximum speed limit to 20, 40, 50, 60, and 100 kilometers per hour. The following figure presents the different classes and the number of instances per class.

Data set: asistentur.arff (weka)

Relevant Papers:

M. Paz Sesmero, Juan M. Alonso-Weber, German Gutierrez, Agapito Ledezma, Araceli Sanchis. An ensemble approach of dual base learners for multi-class classification problems, in Information Fusion 24 (2015) pp 122–136.

Citation Request:
If you publish results based on this data set, please acknowledge its use, refer to the data set by the name “ASISTENTUR”, and inform your readers of the current location of the data set.

2012 ICTSF

Dataset of the 2012 International Competition on Time Series Forecasting (ICTSF), at IEEE EAIS12.

Dataset: xls