| Title: | Use Raw Vectors to Minimize Memory Consumption of Factors |
| Version: | 0.1.0 |
| Description: | Uses raw vectors to minimize memory consumption of categorical variables with fewer than 256 unique values. Useful for analysis of large datasets involving variables such as age, years, states, countries, or education levels. |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.2.0 |
| Imports: | utils |
| Suggests: | data.table, tinytest |
| NeedsCompilation: | yes |
| Packaged: | 2023-11-17 05:59:45 UTC; hughp |
| Author: | Hugh Parsonage [aut, cre] |
| Maintainer: | Hugh Parsonage <hugh.parsonage@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2023-11-17 08:50:06 UTC |
Aggregating helpers
Description
Aggregating helpers
Usage
count_by256(DT, by = NULL, count_col = "N")
Arguments
DT |
A data.table. |
by |
(string) A column of DT. |
count_col |
(string) The name of the column in the result containing the counts. |
Value
count_by256: A tally of by, with the counts in the column named count_col.
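To make the shape of the tally concrete, here is a rough base-R analogue. count_by_sketch is a hypothetical helper, not the package's code: it does not use raw vectors, it returns a plain data.frame rather than a data.table, and its row order follows table(), which may differ from that of count_by256.

```r
# Hypothetical base-R analogue of count_by256(DT, by, count_col = "N").
# Not the package's implementation: no raw vectors, returns a data.frame.
count_by_sketch <- function(DT, by, count_col = "N") {
  tab <- table(DT[[by]])                      # counts per distinct value of `by`
  out <- data.frame(names(tab), as.integer(tab),
                    stringsAsFactors = FALSE)
  names(out) <- c(by, count_col)              # name columns after the inputs
  out
}

DT <- data.frame(state = c("NSW", "VIC", "NSW", "QLD", "NSW"))
count_by_sketch(DT, by = "state")
```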
Factors of fewer than 256 elements
Description
Whereas base R's factors are based on 32-bit integer vectors,
factor256 uses 8-bit raw vectors to minimize its memory footprint.
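The idea can be sketched in base R: store each element's position among the levels as a single byte, reserving 00 for values absent from levels (as the Value section below documents). encode_sketch and decode_sketch are hypothetical helpers for illustration only; the package's actual storage and attributes may differ.

```r
# Minimal base-R sketch of the factor256 idea (not the package's code):
# each element is stored as its level index in one byte of a raw vector.
encode_sketch <- function(x, levels = sort(unique(x))) {
  stopifnot(length(levels) < 256L)       # byte 00 is reserved for absent values
  idx <- match(x, levels, nomatch = 0L)  # values absent from levels -> 0
  structure(as.raw(idx), levels = levels)
}

decode_sketch <- function(f) {
  idx <- as.integer(f)
  idx[idx == 0L] <- NA_integer_          # 00 decodes to NA
  attr(f, "levels")[idx]
}

set.seed(1)
x <- sample(c("NSW", "VIC", "QLD", "WA"), 1e5, replace = TRUE)
f <- encode_sketch(x)
identical(decode_sketch(f), x)           # round trip recovers x
c(raw = as.numeric(object.size(f)), factor = as.numeric(object.size(factor(x))))
```

Since a raw vector uses one byte per element against four for an integer-backed factor, the raw version should be roughly a quarter of the size for long vectors.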
Usage
factor256(x, levels = NULL)
recompose256(f)
relevel256(x, levels)
## S3 method for class 'factor256'
levels(x)
is.factor256(x)
isntSorted256(x, strictly = FALSE)
as_factor(x)
factor256_in(x, tbl)
factor256_notin(x, tbl)
factor256_ein(x, tbl)
factor256_enotin(x, tbl)
tabulate256(f)
rank256(x)
order256(x)
unique256(x)
tabulate256_levels(x, nmax = NULL, dotInterval = 65535L)
Arguments
x |
An atomic vector with fewer than 256 unique elements. |
levels |
An optional character vector of, or representing, the unique values of x. |
f |
A raw vector of class factor256. |
strictly |
If |
tbl |
The table of values to look up in x. |
nmax, dotInterval |
( |
Value
factor256 is a class based on raw vectors.
Values in x absent from levels are mapped to 00.
In the following list, o is the result.
factor256: A raw vector of class factor256. recompose256 is the inverse operation.
factor256_e?(not)?in: A logical vector the same length as x, with o[i] = TRUE iff x[i] is among the values of tbl when converted to factor256. The _notin variants are the negation. The factor256_e variants will error if none of the values of tbl are present in x.
tabulate256: Takes a raw vector and counts the number of times each element occurs within it. The result always has length 256; an element that is absent has a count of zero.
tabulate256_levels: Similar to tabulate256 but with the optional arguments nmax and dotInterval.
as_factor: Converts from factor256 to factor.
order256: Same as order but supports raw vectors.
rank256: Same as rank with ties.method = "first" but supports raw vectors.
unique256: The unique elements of x.
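For intuition about what tabulate256, order256 and rank256 return, the same results can be obtained in base R by first converting the raw vector to integers. tabulate_sketch, order_sketch and rank_sketch are hypothetical helpers; the package presumably avoids this conversion, and its exact bin layout is assumed here to place 00 first.

```r
# Base-R sketches (hypothetical, not the package's code) of what
# tabulate256, order256 and rank256 compute on a raw vector.
tabulate_sketch <- function(f) {
  # raw value v (0..255) is counted in bin v + 1, so the result has length 256
  tabulate(as.integer(f) + 1L, nbins = 256L)
}
order_sketch <- function(x) order(as.integer(x))
rank_sketch  <- function(x) rank(as.integer(x), ties.method = "first")

r <- as.raw(c(3, 1, 2, 1))
tabulate_sketch(r)[1:4]   # counts of 00, 01, 02, 03: 0 2 1 1
order_sketch(r)           # 2 4 3 1
rank_sketch(r)            # 4 1 3 2
```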
Examples
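f10 <- factor256(1:10)
# letters[i] repeated i times: 26 distinct values, so one byte each suffices
fletters <- factor256(rep(letters, 1:26))
head(factor256_in(fletters, "g"))  # which elements equal "g"
head(tabulate256(fletters))        # length-256 vector of counts, one per raw value
head(recompose256(fletters))       # back to the original values
# "z" is not among the levels, so its elements are mapped to 00
gletters <- factor256(rep(letters, 1:26), levels = letters[1:25])
tail(tabulate256(gletters))
tabulate256_levels(gletters, nmax = 5L, dotInterval = 1L)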
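One plausible way to implement a membership test like factor256_in is to translate tbl into the same byte codes as x once, then compare codes rather than strings. The sketch below assumes the codes and levels are passed separately; in_sketch is a hypothetical helper, not the package's code, and its treatment of 00 (never matched) is an assumption.

```r
# Hypothetical sketch of the membership test behind factor256_in:
# map tbl onto the level codes, then compare byte codes, not strings.
in_sketch <- function(codes, levels, tbl) {
  tbl_codes <- match(tbl, levels, nomatch = 0L)  # 0 = value not among levels
  tbl_codes <- tbl_codes[tbl_codes != 0L]        # assumption: 00 never matches
  as.integer(codes) %in% tbl_codes
}

levs  <- c("a", "b", "c")
codes <- as.raw(match(c("b", "a", "c", "b"), levs))
in_sketch(codes, levs, c("b", "z"))   # TRUE FALSE FALSE TRUE
```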
Interlace raw vectors
Description
Some processes do not accept raw vectors, so it can be necessary to convert them to integers.
Usage
interlace256(w, x, y = NULL, z = NULL)
deinterlace256(u)
interlace256_columns(DT, new_colnames = 1L)
deinterlace256_columns(DT, new_colnames = 1L)
Arguments
w, x, y, z |
Raw vectors. A vector may be NULL. |
u |
An integer vector. |
DT |
A data.table. |
new_colnames |
A mechanism for producing the new columns. Currently only |
Value
interlace256 returns an integer vector that combines (interlaces) the raw vectors, storing their bytes together in one integer per element.
deinterlace256 is the inverse operation, returning a list of four raw vectors.
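The packing can be sketched as treating each element as four bytes of one number. interlace_sketch and deinterlace_sketch are hypothetical helpers, and the byte order (w in the most significant byte) is an assumption, not interlace256's documented layout. Unlike interlace256, which returns an integer vector, this sketch returns doubles so that a top byte of 0x80 or more cannot collide with R's NA_integer_.

```r
# Hypothetical sketch of byte interlacing (not interlace256's actual layout):
# pack up to four raw vectors into one number per element, w in the top byte.
interlace_sketch <- function(w, x, y = NULL, z = NULL) {
  n <- length(w)
  byte <- function(v) if (is.null(v)) numeric(n) else as.numeric(as.integer(v))
  byte(w) * 2^24 + byte(x) * 2^16 + byte(y) * 2^8 + byte(z)
}

deinterlace_sketch <- function(u) {
  # peel the bytes back off, most significant first
  list(as.raw((u %/% 2^24) %% 256),
       as.raw((u %/% 2^16) %% 256),
       as.raw((u %/% 2^8) %% 256),
       as.raw(u %% 256))
}

w <- as.raw(c(1, 255)); x <- as.raw(c(2, 0))
y <- as.raw(c(3, 128)); z <- as.raw(c(4, 7))
u <- interlace_sketch(w, x, y, z)
u[1]                                        # 16909060, i.e. 0x01020304
identical(deinterlace_sketch(u), list(w, x, y, z))
```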
setkey for raw columns
Description
setkey for raw columns
Usage
setkeyv256(DT, cols)
Arguments
DT |
A data.table. |
cols |
Column names, as in data.table::setkeyv. |
Value
Same as data.table::setkeyv, except that raw columns are
converted to factors (as data.table does not allow raw keys).
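The conversion step can be illustrated without data.table: raw columns are turned into factors, after which they can serve as sort keys. raw_cols_to_factor is a hypothetical helper, it uses the integer byte codes as factor levels (an assumption; setkeyv256 may use the factor256 levels instead), and base order() stands in for keying.

```r
# Hypothetical base-R sketch of the raw-to-factor conversion described for
# setkeyv256; levels are simply the integer byte codes (an assumption).
raw_cols_to_factor <- function(DF, cols) {
  for (nm in cols) {
    if (is.raw(DF[[nm]])) DF[[nm]] <- factor(as.integer(DF[[nm]]))
  }
  DF
}

DF  <- data.frame(a = as.raw(c(2, 1, 2)), b = c(3, 2, 1))
DF2 <- raw_cols_to_factor(DF, c("a", "b"))
# base order() as a stand-in for keying: sort by a, then by b
keyed <- DF2[do.call(order, unname(as.list(DF2[c("a", "b")]))), ]
keyed
```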