Title: Collection of Machine Learning Data Sets for 'mlr3'
Version: 0.9.0
Description: A small collection of interesting and educational machine learning data sets which are used as examples in the 'mlr3' book (https://mlr3book.mlr-org.com), the use case gallery (https://mlr3gallery.mlr-org.com), or in other examples. All data sets are properly preprocessed and ready to be analyzed by most machine learning algorithms. Data sets are automatically added to the dictionary of tasks if 'mlr3' is loaded.
License: LGPL-3
URL: https://github.com/mlr-org/mlr3data
BugReports: https://github.com/mlr-org/mlr3data/issues
Depends: R (≥ 3.1.0)
Suggests: mlr3 (≥ 0.13.3)
Encoding: UTF-8
LazyData: true
LazyDataCompression: xz
NeedsCompilation: no
RoxygenNote: 7.3.2
Packaged: 2024-11-07 21:29:34 UTC; marc
Author: Michel Lang [ctb], Marc Becker [cre, aut]
Maintainer: Marc Becker <marcbecker@posteo.de>
Repository: CRAN
Date/Publication: 2024-11-08 00:30:02 UTC

mlr3data: Collection of Machine Learning Data Sets for 'mlr3'

Description

A small collection of interesting and educational machine learning data sets which are used as examples in the 'mlr3' book (https://mlr3book.mlr-org.com), the use case gallery (https://mlr3gallery.mlr-org.com), or in other examples. All data sets are properly preprocessed and ready to be analyzed by most machine learning algorithms. Data sets are automatically added to the dictionary of tasks if 'mlr3' is loaded.
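
The registration described above can be inspected from R. The following is a minimal sketch, assuming that both 'mlr3' and 'mlr3data' are installed and that loading the 'mlr3data' namespace is what triggers the registration; the variable names `data_sets` and `registered` are illustrative only. It lists the data sets shipped with the package and keeps those whose names also appear as keys in the `mlr_tasks` dictionary, without assuming which keys those are.

```r
# Runs only if both packages are available; otherwise the block is skipped.
if (requireNamespace("mlr3", quietly = TRUE) &&
    requireNamespace("mlr3data", quietly = TRUE)) {
  # Assumption: loading the namespace is enough to register the tasks.
  loadNamespace("mlr3data")

  # Names of all data sets shipped with 'mlr3data'.
  data_sets = data(package = "mlr3data")$results[, "Item"]

  # Data sets whose name is also a key in the task dictionary.
  registered = intersect(data_sets, mlr3::mlr_tasks$keys())
  print(registered)
}
```

A key found this way can then be passed to `mlr3::tsk()` to construct the corresponding task.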

Author(s)

Maintainer: Marc Becker marcbecker@posteo.de (ORCID)

Other contributors:

Michel Lang [contributor]

See Also

Useful links:

https://github.com/mlr-org/mlr3data

Report bugs at https://github.com/mlr-org/mlr3data/issues


House Sales in Ames, Iowa

Description

Regression task to predict house sale prices for Ames, Iowa.

Contains 80 features and 2930 observations. Target column is "Sale_Price".

Examples

data("ames_housing", package = "mlr3data")
str(ames_housing)
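
The stated dimensions can be checked directly. The sketch below uses a hypothetical helper, `describe_data_set()`, which is not part of 'mlr3data'; it assumes that every column other than the target counts as a feature.

```r
# Hypothetical helper (not part of 'mlr3data'): summarise the shape of a
# data set, counting every non-target column as a feature.
describe_data_set = function(df, target) {
  stopifnot(is.data.frame(df), target %in% names(df))
  list(
    n_obs = nrow(df),
    n_features = ncol(df) - 1L,
    target_class = class(df[[target]])[1L]
  )
}

# Applied to the Ames data only if 'mlr3data' is installed.
if (requireNamespace("mlr3data", quietly = TRUE)) {
  data("ames_housing", package = "mlr3data")
  str(describe_data_set(ames_housing, target = "Sale_Price"))
}
```

The same helper works for any of the other data sets by swapping in the data set and its target column.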

Bike Sharing Demand

Description

Regression data to predict the total count of bikes rented. Contains 13 features and 17379 observations. Target column is "count".

Pre-processing

Source

https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Examples

data("bike_sharing", package = "mlr3data")
str(bike_sharing)

Power Consumption of Kitchen Appliances in Ames, Iowa

Description

Data for power consumption of kitchen appliances in Ames, Iowa. Extends the ames_housing data set.

Contains 720 features and 2930 observations.

Examples

data("energy_usage", package = "mlr3data")
str(energy_usage)

Indian Liver Patient Dataset

Description

Classification data to predict whether or not a person is a liver patient. Obtained using the mlr3oml package. Contains 10 features and 583 observations. Target column is "diseased".

Pre-processing

Source

https://www.openml.org/d/1480

Examples

data("ilpd", package = "mlr3data")
str(ilpd)
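
For a classification data set it is often useful to look at the class distribution of the target before modelling. A minimal sketch follows; `class_balance()` is a hypothetical helper, not a function of 'mlr3data', and no particular distribution is assumed for "diseased".

```r
# Hypothetical helper: relative frequency of each target class,
# rounded to three decimals. Missing target values get their own cell.
class_balance = function(df, target) {
  stopifnot(is.data.frame(df), target %in% names(df))
  counts = table(df[[target]], useNA = "ifany")
  round(prop.table(counts), 3)
}

# Applied to the liver patient data only if 'mlr3data' is installed.
if (requireNamespace("mlr3data", quietly = TRUE)) {
  data("ilpd", package = "mlr3data")
  print(class_balance(ilpd, target = "diseased"))
}
```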

House Sales in King County

Description

Regression task to predict house sale prices for King County, including Seattle, between May 2014 and May 2015.

Contains 19 features and 21613 observations. Target column is "price".

Pre-processing

Source

https://www.kaggle.com/datasets/harlfoxem/housesalesprediction

Examples

data("kc_housing", package = "mlr3data")
str(kc_housing)
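
The data frame can also be turned into a regression task by hand. This is a sketch under the assumption that a version of 'mlr3' providing `as_task_regr()` is installed; the task id "kc_housing_manual" is an arbitrary label chosen for illustration.

```r
# Runs only if both packages are available; otherwise the block is skipped.
if (requireNamespace("mlr3", quietly = TRUE) &&
    requireNamespace("mlr3data", quietly = TRUE)) {
  data("kc_housing", package = "mlr3data")

  # Build a regression task with "price" as the target column.
  task = mlr3::as_task_regr(kc_housing, target = "price", id = "kc_housing_manual")
  print(task)
}
```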

Major League Baseball Statistics 1962-2012

Description

Regression data to predict the number of runs scored. Obtained using the mlr3oml package.

Contains 14 features and 1232 observations. Target column is "rs".

Pre-processing

Source

https://www.openml.org/d/41021

Examples

data("moneyball", package = "mlr3data")
str(moneyball)

Optical Recognition of Handwritten Digits

Description

Classification data to predict handwritten digits. Obtained using the mlr3oml package.

Binarized version of the original data set. The multi-class target column has been converted to a two-class nominal target column by re-labeling the majority class as positive ("P") and all others as negative ("N"). Originally converted by Quan Sun.

Contains 64 features and 5620 observations. Target column is "binaryclass".

Pre-processing

Source

https://www.openml.org/d/980

Examples

data("optdigits", package = "mlr3data")
str(optdigits)
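
The binarization described above, re-labeling the majority class as "P" and all other classes as "N", can be sketched as follows. This is an illustration of the idea on a toy vector, not the code originally used to convert the data set; `binarize_majority()` and `digits` are hypothetical names.

```r
# Hypothetical helper: map the most frequent label to "P" and every
# other label to "N". Ties are resolved in favour of the label that
# comes first in the sorted order used by table().
binarize_majority = function(y) {
  y = as.character(y)
  counts = table(y)
  majority = names(counts)[which.max(counts)]
  factor(ifelse(y == majority, "P", "N"), levels = c("P", "N"))
}

# Toy multi-class labels in which "3" is the majority class.
digits = c("3", "7", "3", "1", "3", "7")
print(binarize_majority(digits))
```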

Simplified Palmer Penguins Data Set

Description

Classification data to predict the species of penguins from the palmerpenguins package. A better alternative to the iris data set.

Pre-processing

Source

palmerpenguins

References

Gorman KB, Williams TD, Fraser WR (2014). “Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis).” PLoS ONE, 9(3), e90081. doi:10.1371/journal.pone.0090081.

https://github.com/allisonhorst/palmerpenguins

Examples

data("penguins_simple", package = "mlr3data")
str(penguins_simple)

Titanic

Description

Classification data to predict the fate of passengers on the ocean liner "Titanic". Contains 10 features and 1309 observations. Target column is "Survived".

Pre-processing

Source

The titanic package and https://www.kaggle.com/c/titanic/data

Examples

data("titanic", package = "mlr3data")
str(titanic)
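
Before fitting a model, it can help to check which columns, if any, contain missing values. The sketch below uses a hypothetical helper, `na_counts()`, which is not part of 'mlr3data'; it makes no assumption about whether this data set actually contains missing values.

```r
# Hypothetical helper: number of missing values per column, restricted
# to columns with at least one missing value, sorted in decreasing order.
na_counts = function(df) {
  stopifnot(is.data.frame(df))
  counts = vapply(df, function(col) sum(is.na(col)), integer(1L))
  sort(counts[counts > 0L], decreasing = TRUE)
}

# Applied to the Titanic data only if 'mlr3data' is installed.
if (requireNamespace("mlr3data", quietly = TRUE)) {
  data("titanic", package = "mlr3data")
  print(na_counts(titanic))
}
```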