| subset {SparkR} | R Documentation | 
Return subsets of a SparkDataFrame according to the given conditions.
subset(x, ...)

## S4 method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]]

## S4 replacement method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]] <- value

## S4 method for signature 'SparkDataFrame'
x[i, j, ..., drop = F]

## S4 method for signature 'SparkDataFrame'
subset(x, subset, select, drop = F, ...)
| x | a SparkDataFrame. | 
| ... | currently not used. | 
| i, subset | (optional) a logical expression to filter on rows. For the extract operator [[ and the replacement operator [[<-, the index (position or name) of a single Column. | 
| value | a Column, or an atomic vector of length 1 used as a literal value, or NULL. If NULL, the specified Column is dropped. | 
| j, select | an expression for a single Column, or a list of columns to select from the SparkDataFrame. | 
| drop | if TRUE, a Column will be returned if the resulting dataset has only one column; otherwise, a SparkDataFrame will always be returned. | 
A new SparkDataFrame containing only the rows that meet the condition, with the selected columns.
[[ since 1.4.0
[[<- since 2.1.1
[ since 1.4.0
subset since 1.5.0
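For illustration, a minimal sketch of the drop argument and the replacement operator described above (assuming a SparkDataFrame df with columns name and age; age2 is a hypothetical column name):

  # With the default drop = FALSE, extracting a single column keeps the
  # SparkDataFrame wrapper; with drop = TRUE a Column is returned instead.
  oneColDf <- df[, "age"]               # SparkDataFrame with one column
  ageCol   <- df[, "age", drop = TRUE]  # Column

  # Replacement: a Column expression or a length-1 literal sets a column;
  # NULL drops it (see the value argument above).
  df[["age2"]] <- df$age * 2  # hypothetical column derived from a Column expression
  df[["age2"]] <- 23          # length-1 literal assigned to every row
  df[["age2"]] <- NULL        # NULL drops the column again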
Other SparkDataFrame functions: SparkDataFrame-class, agg, alias, arrange, as.data.frame, attach, broadcast, cache, checkpoint, coalesce, collect, colnames, coltypes, createOrReplaceTempView, crossJoin, cube, dapplyCollect, dapply, describe, dim, distinct, dropDuplicates, dropna, drop, dtypes, exceptAll, except, explain, filter, first, gapplyCollect, gapply, getNumPartitions, group_by, head, hint, histogram, insertInto, intersectAll, intersect, isLocal, isStreaming, join, limit, localCheckpoint, merge, mutate, ncol, nrow, persist, printSchema, randomSplit, rbind, rename, repartitionByRange, repartition, rollup, sample, saveAsTable, schema, selectExpr, select, showDF, show, storageLevel, str, summary, take, toJSON, unionByName, union, unpersist, withColumn, withWatermark, with, write.df, write.jdbc, write.json, write.orc, write.parquet, write.stream, write.text
Other subsetting functions: filter, select
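For reference, subset(x, subset, select) behaves like composing those two relatives (a sketch, assuming a SparkDataFrame df with name and age columns):

  # Row filtering plus column selection in one call...
  subset(df, df$age > 20, select = "name")
  # ...is the composition of filter and select
  select(filter(df, df$age > 20), "name")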
## Not run: 
##D   # Columns can be selected using [[ and [
##D   df[[2]] == df[["age"]]
##D   df[,2] == df[,"age"]
##D   df[,c("name", "age")]
##D   # Or to filter rows
##D   df[df$age > 20,]
##D   # SparkDataFrame can be subset on both rows and Columns
##D   df[df$name == "Smith", c(1,2)]
##D   df[df$age %in% c(19, 30), 1:2]
##D   subset(df, df$age %in% c(19, 30), 1:2)
##D   subset(df, df$age %in% c(19), select = c(1,2))
##D   subset(df, select = c(1,2))
##D   # Columns can be selected and set
##D   df[["age"]] <- 23
##D   df[[1]] <- df$age
##D   df[[2]] <- NULL # drop column
## End(Not run)
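Beyond the schematic examples above, a minimal end-to-end sketch that runs against a local Spark installation (the sample data and column names are illustrative, not part of this page):

  library(SparkR)
  sparkR.session(master = "local[1]")

  people <- data.frame(name = c("Smith", "Jones", "Lee"),
                       age = c(19, 30, 45))
  df <- createDataFrame(people)

  # Filter rows and select columns; head() brings the
  # first rows back as a local R data.frame
  head(subset(df, df$age %in% c(19, 30), select = "name"))
  head(df[df$age > 20, c("name", "age")])

  sparkR.session.stop()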