Type: Package
Version: 0.0.3
Title: Package About Data Manipulation in Pure Base R
Description: Data manipulation in one package and in base R. Minimal. No dependencies. 'dplyr' and 'tidyr'-like in one place. Nothing else than base R to build the package.
Depends: R (≥ 3.4.4)
License: MIT + file LICENSE
URL: https://github.com/pv71u98h1/m61r/
BugReports: https://github.com/pv71u98h1/m61r/issues/
Encoding: UTF-8
NeedsCompilation: no
Packaged: 2022-05-06 14:33:12 UTC; jean-marie
Author: Jean-Marie Lepioufle [aut, cre]
Maintainer: Jean-Marie Lepioufle <pv71u98h1@gmail.com>
Repository: CRAN
Date/Publication: 2022-05-06 15:50:02 UTC
Arrange your data.frames
Description
Re-arrange the rows of a data.frame in ascending or descending order according to one or several columns.
Usage
arrange_(df, ...)
desange_(df, ...)
Arguments
df | data.frame
... | formula used for arranging the data.frame
Value
The functions return an object of the same type as df.
Properties:
Columns are not modified.
Rows appear in the order specified by argument '...'.
Data frame attributes are preserved.
Examples
tmp <- arrange_(CO2,~c(conc))
head(tmp)
tmp <- arrange_(CO2,~c(Treatment,conc,uptake))
head(tmp)
tmp <- desange_(CO2,~c(Treatment,conc,uptake))
head(tmp)
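For comparison, the same kind of ordering can be sketched with base R's order(). This is only an illustration of the behaviour described above, not the package's implementation, and it assumes that desange_() sorts every key in descending order.

```r
# Base R sketch (not m61r's code): order CO2 by several columns.
df <- as.data.frame(CO2)

# Ascending order on Treatment, then conc, then uptake.
asc <- df[order(df$Treatment, df$conc, df$uptake), ]

# Descending order on the same keys (assumed to mirror desange_()).
desc <- df[order(df$Treatment, df$conc, df$uptake, decreasing = TRUE), ]

head(asc)
head(desc)
```

Only the row order changes; the columns and their names are the same as in the input.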
Formula to be run on a data.frame given a group
Description
Evaluate a formula on the data.frame.
Usage
expression_(df, group=NULL, fun_expr)
Arguments
df | data.frame
group | formula that describes the group
fun_expr | formula that describes the expression to be run on the data.frame
Value
The function returns a list.
Each element of the list holds the result of the expression given in fun_expr, evaluated on the whole data frame df if group is left NULL, or on each group determined by group otherwise.
The class of each element depends on the output of the expression given in fun_expr.
Examples
expression_(CO2,fun_expr=~mean(conc))
expression_(CO2,fun_expr=~conc/uptake)
# with group
expression_(CO2,group=~Type,fun_expr=~mean(uptake))
expression_(CO2,group=~Type,fun_expr=~lm(uptake~conc))
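The idea of evaluating a one-sided formula on a data frame, once overall or once per group, can be sketched in base R as follows. The helper eval_formula() is hypothetical and written only for this illustration; it is not how the package is necessarily implemented.

```r
# Hypothetical helper (not part of m61r): evaluate the right-hand side of a
# one-sided formula inside a data frame, optionally once per group.
eval_formula <- function(df, fun_expr, group = NULL) {
  expr <- fun_expr[[2]]                      # right-hand side of ~expr
  if (is.null(group)) {
    return(list(eval(expr, df, environment(fun_expr))))
  }
  keys <- df[all.vars(group)]                # grouping columns
  lapply(split(df, keys, drop = TRUE),
         function(part) eval(expr, part, environment(fun_expr)))
}

df <- as.data.frame(CO2)
whole <- eval_formula(df, ~mean(conc))                  # list of length 1
res <- eval_formula(df, ~mean(uptake), group = ~Type)   # one element per Type
res
```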
filter a data.frame
Description
Filter rows of a data.frame with conditions.
Usage
filter_(df, subset = NULL)
Arguments
df | data.frame
subset | formula that describes the conditions
Value
The function returns an object of the same type as df.
Properties:
Columns are not modified.
Only rows satisfying the condition given in subset appear.
Data frame attributes are preserved.
Examples
tmp <- filter_(CO2,~Plant=="Qn1")
head(tmp)
tmp <- filter_(CO2,~Type=="Quebec")
head(tmp)
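A base R sketch of the same filtering, assuming the condition is supplied as a one-sided formula as in the examples above. It illustrates the properties listed under Value and does not claim to reproduce the package's code.

```r
# Base R sketch (not m61r's code): keep rows where a formula condition holds.
df <- as.data.frame(CO2)
cond <- ~Type == "Quebec"

# Evaluate the right-hand side of the formula with df's columns in scope.
keep <- eval(cond[[2]], df, environment(cond))

# Drop rows where the condition is FALSE or NA; columns are untouched.
out <- df[keep & !is.na(keep), , drop = FALSE]
head(out)
```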
group_by a data.frame by chosen columns
Description
Group a data.frame by chosen columns
Usage
group_by_(df, group = NULL)
Arguments
df | data.frame
group | formula that describes the group
Value
The function returns a list.
Each element of the list is a subset of the data frame df, determined by the variables given in group.
Each data frame has the following properties:
Columns are not modified.
Only the rows belonging to the subset appear.
Data frame attributes are preserved.
Examples
tmp <- group_by_(CO2,~c(Type,Treatment))
tmp[[1]]
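The list-of-subsets result described above resembles what base R's split() produces. The sketch below is an analogy under that assumption; the element names and ordering returned by group_by_() may differ.

```r
# Base R sketch (not m61r's code): split CO2 into one data frame per group.
df <- as.data.frame(CO2)
groups <- split(df, df[c("Type", "Treatment")], drop = TRUE)

names(groups)        # one name per Type/Treatment combination
head(groups[[1]])    # every element keeps all the columns of df
```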
Join two data.frames
Description
Join two data.frames.
Usage
left_join_(df, df2, by = NULL, by.x = NULL, by.y = NULL)
anti_join_(df, df2, by = NULL, by.x = NULL, by.y = NULL)
full_join_(df, df2, by = NULL, by.x = NULL, by.y = NULL)
inner_join_(df, df2, by = NULL, by.x = NULL, by.y = NULL)
right_join_(df, df2, by = NULL, by.x = NULL, by.y = NULL)
semi_join_(df, df2, by = NULL, by.x = NULL, by.y = NULL)
Arguments
df | data.frame
df2 | data.frame
by | names of the pivot columns when they are identical in df and df2; otherwise, use by.x and by.y
by.x | names of the pivot columns of df
by.y | names of the pivot columns of df2
Value
The functions return a data frame. The output has the following properties:
- For functions left_join_(), inner_join_(), full_join_(), and right_join_(), the output includes all df columns and all df2 columns. For columns with identical names in df and df2, the suffixes '.x' and '.y' are added. For left_join_(), all df rows with matching rows of df2. For inner_join_(), a subset of df rows matching rows of df2. For full_join_(), all df rows, with all df2 rows. For right_join_(), all df2 rows with matching rows of df.
- For functions semi_join_() and anti_join_(), the output includes columns of df only. For semi_join_(), all df rows with a match in df2. For anti_join_(), a subset of df rows not matching rows of df2.
Examples
books <- data.frame(
  name = I(c("Tukey", "Venables", "Tierney", "Ripley",
             "Ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"))
authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil", "Asimov")),
  nationality = c("US", "Australia", "US", "UK", "Australia", "US"),
  deceased = c("yes", rep("no", 4), "yes"))
tmp <- left_join_(books,authors, by.x = "name", by.y = "surname")
head(tmp)
tmp <- inner_join_(books,authors, by.x = "name", by.y = "surname")
head(tmp)
tmp <- full_join_(books,authors, by.x = "name", by.y = "surname")
head(tmp)
tmp <- right_join_(books,authors, by.x = "name", by.y = "surname")
head(tmp)
tmp <- semi_join_(books,authors, by.x = "name", by.y = "surname")
head(tmp)
tmp <- anti_join_(books,authors, by.x = "name", by.y = "surname")
head(tmp)
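The row and column rules of the six joins can be checked against base R's merge() and %in%. The sketch below uses a reduced version of the books and authors data and is meant as a point of comparison only; it is an assumption that the package's joins agree with merge() on row counts, and row order may differ.

```r
# Base R sketch (not m61r's code): the six joins on a reduced data set.
books <- data.frame(
  name = c("Tukey", "Ripley", "Ripley", "R Core"),
  title = c("Exploratory Data Analysis", "Spatial Statistics",
            "Stochastic Simulation", "An Introduction to R"),
  stringsAsFactors = FALSE)
authors <- data.frame(
  surname = c("Tukey", "Ripley", "Asimov"),
  nationality = c("US", "UK", "US"),
  stringsAsFactors = FALSE)

# Mutating joins: columns of both data frames.
inner <- merge(books, authors, by.x = "name", by.y = "surname")
left  <- merge(books, authors, by.x = "name", by.y = "surname", all.x = TRUE)
right <- merge(books, authors, by.x = "name", by.y = "surname", all.y = TRUE)
full  <- merge(books, authors, by.x = "name", by.y = "surname", all = TRUE)

# Filtering joins: columns of the first data frame only.
semi <- books[books$name %in% authors$surname, , drop = FALSE]
anti <- books[!books$name %in% authors$surname, , drop = FALSE]
```

Here "R Core" has no author record and "Asimov" has no book, so they appear only in the joins that keep unmatched rows of the corresponding side.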
Create m61r object
Description
Create an m61r object that makes it possible to run a sequence of operations on a data.frame.
Usage
m61r(df = NULL)
## S3 method for class 'm61r'
x[i, j, ...]
## S3 replacement method for class 'm61r'
x[i, j] <- value
## S3 method for class 'm61r'
print(x, ...)
## S3 method for class 'm61r'
names(x, ...)
## S3 method for class 'm61r'
dim(x, ...)
## S3 method for class 'm61r'
as.data.frame(x, ...)
## S3 method for class 'm61r'
rbind(x, ...)
## S3 method for class 'm61r'
cbind(x, ...)
Arguments
df | data.frame
x | object of class m61r
i | row
j | column
... | further arguments passed to or from other methods
value | value to be assigned
Value
The function m61r returns an object of type m61r.
Argument df is stored internally in the m61r object.
The internal data.frame is manipulated through internal functions similar to the ones the package m61r implements for data.frames, such as arrange, desange, filter, join and its relatives, mutate and transmutate, gather and spread, select, group_by, summarise, values and modify.
The result of the last action is stored internally in the m61r object until the internal function values is called.
It is thus possible to write a readable sequence of actions on a data.frame.
In addition,
- [.m61r returns a subset of the internal data.frame embedded in the m61r object.
- [<-.m61r assigns value to the internal data.frame embedded in the m61r object.
- print.m61r prints the internal data.frame embedded in the m61r object.
- names.m61r provides the column names of the internal data.frame embedded in the m61r object.
- dim.m61r provides the dimensions of the internal data.frame embedded in the m61r object.
- as.data.frame.m61r extracts the internal data.frame embedded in the m61r object.
- cbind.m61r combines two m61r objects by columns.
- rbind.m61r combines two m61r objects by rows.
- left_join, anti_join, full_join, inner_join, right_join and semi_join join two m61r objects.
Finally, it is possible to clone an m61r object into a new one by using the internal function clone.
Examples
# init
co2 <- m61r(df=CO2)
# filter
co2$filter(~Plant=="Qn1")
co2
co2$filter(~Type=="Quebec")
co2
# select
co2$select(~Type)
co2
co2$select(~c(Plant,Type))
co2
co2$select(~-Type)
co2
co2$select(variable=~-(Plant:Treatment))
co2
# mutate/transmutate
co2$mutate(z=~conc/uptake)
co2
co2$mutate(mean=~mean(uptake))
co2
co2$mutate(z1=~uptake/conc,y=~conc/100)
co2
co2$transmutate(z2=~uptake/conc,y2=~conc/100)
co2
# summarise
co2$summarise(mean=~mean(uptake),sd=~sd(uptake))
co2
co2$group_by(~c(Type,Treatment))
co2$summarise(mean=~mean(uptake),sd=~sd(uptake))
co2
# arrange/desange
co2$arrange(~c(conc))
co2
co2$arrange(~c(Treatment,conc,uptake))
co2
co2$desange(~c(Treatment,conc,uptake))
co2
# join
authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))
books <- data.frame(
  name = I(c("Tukey", "Venables", "Tierney", "Ripley",
             "Ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"))
## inner join
tmp <- m61r(df=authors)
tmp$inner_join(books, by.x = "surname", by.y = "name")
tmp
## left join
tmp$left_join(books, by.x = "surname", by.y = "name")
tmp
## right join
tmp$right_join(books, by.x = "surname", by.y = "name")
tmp
## full join
tmp$full_join(books, by.x = "surname", by.y = "name")
tmp
## semi join
tmp$semi_join(books, by.x = "surname", by.y = "name")
tmp
## anti join #1
tmp$anti_join(books, by.x = "surname", by.y = "name")
tmp
## anti join #2
tmp2 <- m61r(df=books)
tmp2$anti_join(authors, by.x = "name", by.y = "surname")
tmp2
## with two m61r objects
tmp1 <- m61r(books)
tmp2 <- m61r(authors)
tmp3 <- anti_join(tmp1,tmp2, by.x = "name", by.y = "surname")
tmp3
# Reshape
## gather
df3 <- data.frame(id = 1:4,
                  age = c(40,50,60,50),
                  dose.a1 = c(1,2,1,2),
                  dose.a2 = c(2,1,2,1),
                  dose.a14 = c(3,3,3,3))
df4 <- m61r::m61r(df3)
df4$gather(pivot = c("id","age"))
df4
## spread
df3 <- data.frame(id = 1:4,
                  age = c(40,50,60,50),
                  dose.a1 = c(1,2,1,2),
                  dose.a2 = c(2,1,2,1),
                  dose.a14 = c(3,3,3,3))
df4 <- m61r::gather_(df3,pivot = c("id","age"))
df4 <- rbind(df4,
             data.frame(id=5, age=20,parameters="dose.a14",values=8),
             data.frame(id=6, age=10,parameters="dose.a1",values=5))
tmp <- m61r::m61r(df4)
tmp$spread(col_name="parameters",col_values="values",pivot=c("id","age"))
tmp
# equivalence
co2 # is not equivalent to co2[]
co2[] # is equivalent to co2$values()
co2[1,] # is equivalent to co2$values(1,)
co2[,2:3] # is equivalent to co2$values(,2:3)
co2[1:10,1:3] # is equivalent to co2$values(1:10,1:3)
co2[1,"Plant"] # is equivalent to co2$values(1,"Plant")
# a modification of an m61r object only lasts for one step
co2[1,"conc"] <- 100
co2[1,] # temporary result
co2[1,] # back to normal
# WARNING:
# Keep the brackets to manipulate the internal data.frame
co2[] <- co2[-1,]
co2[1:3,] # temporary result
co2[1:3,] # back to normal
# ... OR you will destroy co2, and only keep the data.frame
# co2 <- co2[-1,]
# class(co2) # data.frame
# descriptive manipulation
names(co2)
dim(co2)
str(co2)
## cloning
# The following will only create a second variable that points to
# the same object (!= cloning)
foo <- co2
str(co2)
str(foo)
# Instead, clone into a new environment
foo <- co2$clone()
str(co2)
str(foo)
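The difference between a second name and a clone follows from R's reference semantics for environments. The sketch below uses a hypothetical constructor, new_box(), written only for this illustration; it assumes, as the comments above suggest, that an m61r object is backed by an environment, and it does not reproduce the package's internals.

```r
# Hypothetical constructor (not m61r's code): an environment that holds a
# data frame and offers a clone() method copying it into a fresh environment.
new_box <- function(df) {
  self <- new.env()
  self$df <- df
  self$drop_first_row <- function() {
    self$df <- self$df[-1, , drop = FALSE]
    invisible(self)
  }
  self$clone <- function() new_box(self$df)
  self
}

a <- new_box(as.data.frame(CO2))
alias <- a             # second name for the same environment
copy <- a$clone()      # independent copy in a new environment

a$drop_first_row()
nrow(alias$df)         # changed together with a
nrow(copy$df)          # unchanged
```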
Mutate and transmutate a data.frame
Description
Mutate and transmutate a data.frame.
Usage
mutate_(df, ...)
transmutate_(df, ...)
Arguments
df | data.frame
... | formulas used for mutating/transmutating the data.frame
Value
The functions return a data frame. The output has the following properties:
- For function mutate_(), the output includes all df columns. In addition, new columns are created according to argument '...' and placed after the others.
- For function transmutate_(), the output includes only the columns created according to argument '...'.
Examples
tmp <- mutate_(CO2,z=~conc/uptake)
head(tmp)
# Returns a warning: expression mean(uptake) get a result with 'nrow' different from 'df'
# tmp <- mutate_(CO2,mean=~mean(uptake))
tmp <- mutate_(CO2,z1=~uptake/conc,y=~conc/100)
head(tmp)
tmp <- transmutate_(CO2,z2=~uptake/conc,y2=~conc/100)
head(tmp)
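The column layout of the two outputs can be sketched in base R. This is a comparison under the assumption that the new columns are computed row by row; it is not the package's code.

```r
# Base R sketch (not m61r's code): mutate-like and transmutate-like results.
df <- as.data.frame(CO2)

# mutate-like: keep every column and append the new ones at the end.
mutated <- cbind(df, z1 = df$uptake / df$conc, y = df$conc / 100)

# transmutate-like: keep only the newly computed columns.
transmuted <- data.frame(z2 = df$uptake / df$conc, y2 = df$conc / 100)

head(mutated)
head(transmuted)
```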
Reshape a data.frame
Description
Reshape a data.frame.
Usage
gather_(df, new_col_name = "parameters", new_col_values = "values",
pivot)
spread_(df, col_name, col_values, pivot)
Arguments
df | data.frame
new_col_name | name of the new column 'parameters'
new_col_values | name of the new column 'values'
col_name | name of the column 'parameters'
col_values | name of the column 'values'
pivot | names of the columns used as pivot
Details
A data frame is said to be 'wide' if several of its columns describe connected information of the same record.
A data frame is said to be 'long' if two of its columns provide information about records, with one describing their name and the second their value.
Functions gather_() and spread_() reshape a data frame from a 'wide' format to a 'long' format, and vice versa.
Value
The functions return a data frame.
Output from function gather_() has the 'pivot' columns determined by argument pivot, and 'long' columns named according to arguments new_col_name and new_col_values.
Output from function spread_() has the 'pivot' columns determined by argument pivot, and 'wide' columns named according to the values in the column determined by argument col_name. For 'wide' columns, each row corresponds to values present in the column determined by argument col_values.
Examples
df3 <- data.frame(id = 1:4,
                  age = c(40,50,60,50),
                  dose.a1 = c(1,2,1,2),
                  dose.a2 = c(2,1,2,1),
                  dose.a14 = c(3,3,3,3))
gather_(df3,pivot = c("id","age"))
df4 <- gather_(df3,pivot = c("id","age"))
df5 <- rbind(df4,
             data.frame(id=5, age=20,parameters="dose.a14",values=8),
             data.frame(id=6, age=10,parameters="dose.a1",values=5))
spread_(df5,col_name="parameters",col_values="values",pivot=c("id","age"))
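The 'wide' and 'long' layouts described in Details can be reproduced with base R alone. The sketch below builds the long table by hand and goes back to wide with stats::reshape(); the column names produced by reshape() (prefixed with "values.") are a property of that function and are not claimed to match the output of spread_().

```r
# Base R sketch (not m61r's code): wide -> long -> wide round trip.
df3 <- data.frame(id = 1:4, age = c(40, 50, 60, 50),
                  dose.a1 = c(1, 2, 1, 2), dose.a2 = c(2, 1, 2, 1),
                  dose.a14 = c(3, 3, 3, 3))
pivot <- c("id", "age")
wide_cols <- setdiff(names(df3), pivot)

# wide -> long: one block of rows per non-pivot column.
long <- do.call(rbind, lapply(wide_cols, function(nm) {
  data.frame(df3[pivot], parameters = nm, values = df3[[nm]],
             stringsAsFactors = FALSE)
}))

# long -> wide: one column per distinct value of 'parameters'.
wide <- reshape(long, idvar = pivot, timevar = "parameters",
                v.names = "values", direction = "wide")
```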
select columns of a data.frame
Description
Select columns of a data.frame.
Usage
select_(df, variable = NULL)
Arguments
df | data.frame
variable | formula that describes the selection
Value
select_() returns a data frame.
Properties:
Only columns satisfying the condition given in variable appear.
Rows are not modified.
Examples
tmp <- select_(CO2,~Type)
head(tmp)
tmp <- select_(CO2,~c(Plant,Type))
head(tmp)
tmp <- select_(CO2,~-Type)
head(tmp)
tmp <- select_(CO2,variable=~-(Plant:Treatment))
head(tmp)
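The four selections in the examples have direct base R counterparts, sketched below for comparison. The use of subset() for the negated range is an analogy and not a statement about how select_() is implemented.

```r
# Base R sketch (not m61r's code): column selection on CO2.
df <- as.data.frame(CO2)

one <- df["Type"]                          # a single column, still a data frame
two <- df[c("Plant", "Type")]              # several columns
no_type <- df[setdiff(names(df), "Type")]  # everything except Type

# subset() accepts unquoted names, ranges and negation in 'select'.
no_range <- subset(df, select = -(Plant:Treatment))
names(no_range)
```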
Summarise formula on groups
Description
Summarise a data.frame with formulas, optionally by group.
Usage
summarise_(df, group = NULL, ...)
Arguments
df | data.frame
group | formula that describes the group
... | formulas to be evaluated
Value
summarise_() returns a data frame.
If argument group is not NULL, the first columns of the output are named after the variables given in argument group.
The following columns are named after the arguments passed in '...'.
Each row holds the results of the expressions given in '...' for one group determined by group, or over the whole data frame if group is NULL.
Examples
summarise_(CO2,a=~mean(uptake),b=~sd(uptake))
summarise_(CO2, group=~c(Type,Treatment),a=~mean(uptake),b=~sd(uptake))
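The layout described under Value (group columns first, then one column per summary) can be compared with a base R aggregate() call. The column names uptake.a and uptake.b below come from aggregate() and data.frame(); they are not the names summarise_() would return.

```r
# Base R sketch (not m61r's code): mean and sd of uptake, overall and by group.
df <- as.data.frame(CO2)

overall <- data.frame(a = mean(df$uptake), b = sd(df$uptake))

# aggregate() returns a matrix column when FUN returns several values;
# do.call(data.frame, ...) flattens it into ordinary columns.
agg <- aggregate(uptake ~ Type + Treatment, data = df,
                 FUN = function(u) c(a = mean(u), b = sd(u)))
by_group <- do.call(data.frame, agg)

overall
by_group
```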
get or assign a value to a data.frame
Description
Get or assign a value to a data.frame
Usage
value_(df, i, j)
'modify_<-'(df,i,j,value)
Arguments
df | data.frame
i | row
j | column
value | value to be assigned
Value
The functions value_ and 'modify_<-' return a data frame.
Properties:
Only rows determined by i appear. If i is missing, no row is filtered.
Only columns determined by j appear. If j is missing, no column is filtered.
Besides,
For function value_: If argument i is non-missing and argument j is missing, the function returns an object of the same type as df. If both arguments i and j are missing, the function returns an object of the same type as df.
For function 'modify_<-': The function returns an object of the same type as df.
Examples
tmp <- value_(CO2,1,2)
attributes(tmp) # data frame
tmp <- value_(CO2,1:2,2)
attributes(tmp) # data frame
tmp <- value_(CO2,1:2,2:4)
attributes(tmp) # data frame
tmp <- value_(CO2,,2)
attributes(tmp) # data frame
tmp <- value_(CO2,2)
attributes(tmp) # same as CO2
tmp <- value_(CO2)
attributes(tmp) # same as CO2
df3 <- data.frame(id = 1:4,
                  age = c(40,50,60,50),
                  dose.a1 = c(1,2,1,2),
                  dose.a2 = c(2,1,2,1),
                  dose.a14 = c(3,3,3,3))
'modify_<-'(df3,1,2,6)
'modify_<-'(df3,1:3,2:4,data.frame(c(20,10,90),c(9,3,4),c(0,0,0)))
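In base R, the "always a data frame" property corresponds to indexing with drop = FALSE, and the assignment corresponds to the usual [<- on a data frame. The sketch below illustrates both on the same data; it is a comparison and not the package's implementation.

```r
# Base R sketch (not m61r's code): get and assign cells of a data frame.
df <- as.data.frame(CO2)

cell <- df[1, 2, drop = FALSE]      # one cell, kept as a 1 x 1 data frame
block <- df[1:2, 2:4, drop = FALSE] # rows 1-2, columns 2-4
column <- df[, 2, drop = FALSE]     # all rows, one column

df3 <- data.frame(id = 1:4, age = c(40, 50, 60, 50),
                  dose.a1 = c(1, 2, 1, 2))
df3[1, 2] <- 6                      # assign a single value
df3
```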