| spark.fpGrowth {SparkR} | R Documentation | 
A parallel FP-growth algorithm to mine frequent itemsets.
spark.fpGrowth fits an FP-growth model on a SparkDataFrame. Users can call
spark.freqItemsets to get frequent itemsets, spark.associationRules to get
association rules, predict to make predictions on new data based on the generated
association rules, and write.ml/read.ml to save/load fitted models.
For more details, see FP-growth.
spark.fpGrowth(data, ...)

spark.freqItemsets(object)

spark.associationRules(object)

## S4 method for signature 'SparkDataFrame'
spark.fpGrowth(data, minSupport = 0.3, minConfidence = 0.8,
  itemsCol = "items", numPartitions = NULL)

## S4 method for signature 'FPGrowthModel'
spark.freqItemsets(object)

## S4 method for signature 'FPGrowthModel'
spark.associationRules(object)

## S4 method for signature 'FPGrowthModel'
predict(object, newData)

## S4 method for signature 'FPGrowthModel,character'
write.ml(object, path, overwrite = FALSE)
| data | A SparkDataFrame for training. | 
| ... | additional argument(s) passed to the method. | 
| object | a fitted FPGrowth model. | 
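| minSupport | Minimal support level: the minimum fraction of transactions in which an itemset must appear to count as frequent. | 
| minConfidence | Minimal confidence level for generating association rules. | 
| itemsCol | Name of the column that holds the items (an array column). | 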
| numPartitions | Number of partitions used for fitting. | 
| newData | a SparkDataFrame for testing. | 
| path | the directory where the model is saved. | 
| overwrite | logical value indicating whether to overwrite if the output path already exists. Default is FALSE which means throw exception if the output path exists. | 
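The minSupport and minConfidence thresholds can be illustrated with a small base R sketch. This is only a brute-force illustration on hypothetical toy data and does not reflect how Spark computes these values; the `transactions` list and the helper names `support` and `confidence` are assumptions introduced here, not part of SparkR.

```r
# Toy transactions (hypothetical data, not taken from the SparkR examples).
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "d")
)

# Support: fraction of transactions that contain every item in `itemset`.
support <- function(itemset, transactions) {
  hits <- vapply(transactions, function(t) all(itemset %in% t), logical(1))
  mean(hits)
}

# Confidence of the rule antecedent => consequent:
# support of both sides together divided by support of the antecedent.
confidence <- function(antecedent, consequent, transactions) {
  support(c(antecedent, consequent), transactions) /
    support(antecedent, transactions)
}

support_ab <- support(c("a", "b"), transactions)
confidence_a_b <- confidence("a", "b", transactions)

# Compare against the default thresholds from the usage section.
min_support <- 0.3
min_confidence <- 0.8
is_frequent <- support_ab >= min_support
is_rule_kept <- confidence_a_b >= min_confidence
```

In this toy data the itemset {a, b} clears the default minSupport, while the rule a => b falls short of the default minConfidence, so it would only be reported with a lower minConfidence.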
spark.fpGrowth returns a fitted FPGrowth model.
spark.freqItemsets returns a SparkDataFrame with frequent itemsets.
The SparkDataFrame contains two columns:
items (an array of the same type as the input column)
and freq (frequency of the itemset).
spark.associationRules returns a SparkDataFrame with association rules.
The SparkDataFrame contains three columns:
antecedent (an array of the same type as the input column),
consequent (an array of the same type as the input column),
and confidence (confidence).
predict returns a SparkDataFrame containing predicted values.
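As a rough mental model of rule-based prediction, the base R sketch below collects, for one set of items, the consequents of every rule whose antecedent is fully contained in those items, and drops consequents that are already present. It is a simplified illustration under stated assumptions, not the SparkR implementation; the `rules` list and the `predict_items` helper are hypothetical names introduced here.

```r
# Hypothetical association rules, each an antecedent/consequent pair.
rules <- list(
  list(antecedent = c("t"), consequent = c("s")),
  list(antecedent = c("t", "s"), consequent = c("y")),
  list(antecedent = c("t"), consequent = c("y"))
)

# For one transaction, gather the consequents of all applicable rules.
predict_items <- function(items, rules) {
  predicted <- character(0)
  for (rule in rules) {
    # A rule applies when the transaction contains its whole antecedent.
    if (all(rule$antecedent %in% items)) {
      predicted <- c(predicted, rule$consequent)
    }
  }
  # Keep each predicted item once and drop items already in the transaction.
  setdiff(unique(predicted), items)
}

prediction_t <- predict_items(c("t"), rules)
prediction_ts <- predict_items(c("t", "s"), rules)
```

A transaction that matches no antecedent gets an empty prediction in this sketch.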
spark.fpGrowth since 2.2.0
spark.freqItemsets(FPGrowthModel) since 2.2.0
spark.associationRules(FPGrowthModel) since 2.2.0
predict(FPGrowthModel) since 2.2.0
write.ml(FPGrowthModel, character) since 2.2.0
## Not run: 
##D raw_data <- read.df(
##D   "data/mllib/sample_fpgrowth.txt",
##D   source = "csv",
##D   schema = structType(structField("raw_items", "string")))
##D 
##D data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
##D model <- spark.fpGrowth(data)
##D 
##D # Show frequent itemsets
##D frequent_itemsets <- spark.freqItemsets(model)
##D showDF(frequent_itemsets)
##D 
##D # Show association rules
##D association_rules <- spark.associationRules(model)
##D showDF(association_rules)
##D 
##D # Predict on new data
##D new_itemsets <- data.frame(items = c("t", "t,s"))
##D new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
##D predict(model, new_data)
##D 
##D # Save and load model
##D path <- "/path/to/model"
##D write.ml(model, path)
##D read.ml(path)
##D 
##D # Optional arguments
##D # itemsCol must name the array column of the training data ("baskets" here)
##D baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as baskets")
##D another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
##D                                 itemsCol = "baskets", numPartitions = 10)
## End(Not run)