
spark.fpGrowth {SparkR}

R Documentation

FP-growth

Description

A parallel FP-growth algorithm to mine frequent itemsets. spark.fpGrowth fits an FP-growth model on a SparkDataFrame. Users can call spark.freqItemsets to get frequent itemsets, spark.associationRules to get association rules, predict to make predictions on new data based on the generated association rules, and write.ml/read.ml to save and load fitted models. For more details, see FP-growth.
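The support of an itemset, the quantity that minSupport thresholds, is the fraction of transactions that contain every item in it. The base R sketch below computes it on a small hypothetical list of baskets. The `support` helper and the data are illustrative assumptions, not part of SparkR.

```r
# Hypothetical toy data: one character vector of items per basket.
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "c")
)

# Hypothetical helper: fraction of baskets that contain all items of `itemset`.
support <- function(itemset, transactions) {
  contains_all <- vapply(transactions,
                         function(basket) all(itemset %in% basket),
                         logical(1))
  mean(contains_all)
}

support(c("a", "b"), transactions)  # 3 of the 5 baskets contain both a and b
```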

Usage

spark.fpGrowth(data, ...)

spark.freqItemsets(object)

spark.associationRules(object)

## S4 method for signature 'SparkDataFrame'
spark.fpGrowth(data, minSupport = 0.3,
  minConfidence = 0.8, itemsCol = "items", numPartitions = NULL)

## S4 method for signature 'FPGrowthModel'
spark.freqItemsets(object)

## S4 method for signature 'FPGrowthModel'
spark.associationRules(object)

## S4 method for signature 'FPGrowthModel'
predict(object, newData)

## S4 method for signature 'FPGrowthModel,character'
write.ml(object, path,
  overwrite = FALSE)

Arguments

`data`	A SparkDataFrame for training.
`...`	additional argument(s) passed to the method.
`object`	a fitted FPGrowth model.
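`minSupport`	Minimal support level, a fraction in [0, 1]: an itemset is reported as frequent only if it appears in at least this fraction of the rows.
`minConfidence`	Minimal confidence level, in [0, 1], for generating association rules. It does not affect frequent-itemset mining.
`itemsCol`	Name of the column containing the items; each value must be an array of items.
`numPartitions`	Number of partitions used for fitting. If NULL, the number of partitions of the input data is used.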
`newData`	a SparkDataFrame for testing.
`path`	the directory where the model is saved.
`overwrite`	logical value indicating whether to overwrite the output path if it already exists. The default, FALSE, throws an exception if the output path exists.
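As a rough illustration of how a fractional minSupport becomes a count cutoff, the sketch below assumes the threshold is `ceiling(minSupport * n)` for `n` training rows, which appears to be how Spark's FP-growth implementation computes it. `min_count` is a hypothetical helper, not a SparkR function.

```r
# Hypothetical helper: minimum number of rows an itemset must appear in,
# assuming the cutoff is ceiling(minSupport * number of rows).
min_count <- function(min_support, n_rows) {
  stopifnot(min_support >= 0, min_support <= 1, n_rows >= 0)
  ceiling(min_support * n_rows)
}

min_count(0.3, 6)  # 2: the default minSupport on 6 rows
min_count(0.5, 5)  # 3: an itemset must appear in at least 3 of 5 rows
```

Because counts are integers, requiring `freq >= ceiling(minSupport * n)` is equivalent to requiring `freq / n >= minSupport`.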

Value

spark.fpGrowth returns a fitted FPGrowth model.

spark.freqItemsets returns a SparkDataFrame with frequent itemsets. The SparkDataFrame contains two columns: items (an array of the same type as the input column) and freq (the number of rows containing the itemset).
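To show the shape of this result, the sketch below enumerates frequent itemsets by brute force in base R and returns a data.frame with the same two columns, `items` (a list column) and `freq`. This is deliberately not FP-growth: it tests every candidate itemset, so it is exponential in the number of distinct items and only suitable for toy data. `frequent_itemsets` and the baskets are hypothetical, and the `ceiling(min_support * n)` cutoff is an assumption.

```r
# Hypothetical toy data: one character vector of items per basket.
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "c")
)

# Brute-force enumeration (NOT FP-growth): test every non-empty subset of the
# distinct items and keep those appearing in at least ceiling(min_support * n)
# baskets.
frequent_itemsets <- function(transactions, min_support) {
  items <- sort(unique(unlist(transactions)))
  threshold <- ceiling(min_support * length(transactions))
  found_items <- list()
  found_freq <- integer(0)
  for (k in seq_along(items)) {
    for (candidate in combn(items, k, simplify = FALSE)) {
      freq <- sum(vapply(transactions,
                         function(basket) all(candidate %in% basket),
                         logical(1)))
      if (freq >= threshold) {
        found_items[[length(found_items) + 1]] <- candidate
        found_freq <- c(found_freq, freq)
      }
    }
  }
  # Same column layout as spark.freqItemsets: items (list column), freq.
  out <- data.frame(freq = found_freq)
  out$items <- found_items
  out[, c("items", "freq")]
}

frequent_itemsets(transactions, min_support = 0.5)
```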

spark.associationRules returns a SparkDataFrame with association rules. The SparkDataFrame contains three columns: antecedent (an array of the same type as the input column), consequent (an array of the same type as the input column), and confidence (the confidence of the rule).
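The confidence of a rule `antecedent => consequent` is the number of rows containing both the antecedent and the consequent, divided by the number of rows containing the antecedent. The base R sketch below computes it and derives rules from one frequent itemset, assuming (as in Spark's implementation) that each rule has a single item as its consequent. The helpers and data are hypothetical.

```r
# Hypothetical toy data: one character vector of items per basket.
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "c")
)

# Number of baskets containing every item of `itemset`.
count_baskets <- function(itemset, transactions) {
  sum(vapply(transactions,
             function(basket) all(itemset %in% basket),
             logical(1)))
}

# confidence(X => Y) = count(X union Y) / count(X)
confidence <- function(antecedent, consequent, transactions) {
  count_baskets(union(antecedent, consequent), transactions) /
    count_baskets(antecedent, transactions)
}

# Rules with a single-item consequent from one itemset, kept only if their
# confidence reaches min_confidence.
rules_from_itemset <- function(itemset, min_confidence, transactions) {
  rules <- list()
  for (item in itemset) {
    antecedent <- setdiff(itemset, item)
    if (length(antecedent) == 0) next  # a rule needs a non-empty antecedent
    conf <- confidence(antecedent, item, transactions)
    if (conf >= min_confidence) {
      rules[[length(rules) + 1]] <- list(antecedent = antecedent,
                                         consequent = item,
                                         confidence = conf)
    }
  }
  rules
}

confidence("a", "b", transactions)  # 3 baskets with a and b / 4 with a
```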

predict returns a SparkDataFrame containing the columns of newData plus a prediction column with the items predicted by the applicable association rules.
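Prediction applies every rule whose antecedent is fully contained in a row's items and collects the consequents that the row does not already contain. The base R sketch below mirrors that logic for a single basket against a hypothetical rule list; it illustrates the semantics under that assumption and is not SparkR's implementation.

```r
# Hypothetical rules, in the antecedent/consequent layout of
# spark.associationRules.
rules <- list(
  list(antecedent = c("t"), consequent = c("s")),
  list(antecedent = c("t", "s"), consequent = c("x")),
  list(antecedent = c("z"), consequent = c("y"))
)

# Hypothetical helper: items predicted for one basket.
predict_items <- function(basket, rules) {
  # A rule applies when all of its antecedent items are in the basket.
  applicable <- Filter(function(r) all(r$antecedent %in% basket), rules)
  consequents <- as.character(unlist(lapply(applicable, `[[`, "consequent")))
  # Drop items already in the basket; setdiff also removes duplicates.
  setdiff(consequents, basket)
}

predict_items(c("t"), rules)       # "s"
predict_items(c("t", "s"), rules)  # "x": "s" is already in the basket
```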

Note

spark.fpGrowth since 2.2.0

spark.freqItemsets(FPGrowthModel) since 2.2.0

spark.associationRules(FPGrowthModel) since 2.2.0

predict(FPGrowthModel) since 2.2.0

write.ml(FPGrowthModel, character) since 2.2.0

Examples

## Not run: 
##D raw_data <- read.df(
##D   "data/mllib/sample_fpgrowth.txt",
##D   source = "csv",
##D   schema = structType(structField("raw_items", "string")))
##D 
##D data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
##D model <- spark.fpGrowth(data)
##D 
##D # Show frequent itemsets
##D frequent_itemsets <- spark.freqItemsets(model)
##D showDF(frequent_itemsets)
##D 
##D # Show association rules
##D association_rules <- spark.associationRules(model)
##D showDF(association_rules)
##D 
##D # Predict on new data
##D new_itemsets <- data.frame(items = c("t", "t,s"), stringsAsFactors = FALSE)
##D new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
##D showDF(predict(model, new_data))
##D 
##D # Save and load model
##D path <- "/path/to/model"
##D write.ml(model, path)
##D read.ml(path)
##D 
##D # Optional arguments
##D baskets_data <- selectExpr(raw_data, "split(raw_items, ' ') as baskets")
##D another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
##D                                 itemsCol = "baskets", numPartitions = 10)
## End(Not run)

[Package SparkR version 2.3.3 Index]