Package nuggets searches for patterns that can be
expressed as formulae in the form of elementary conjunctions, referred
to in this text as conditions. Conditions are constructed from
predicates, which correspond to data columns. The
interpretation of conditions depends on the choice of underlying
logic:
Crisp (Boolean) logic: each predicate takes values
TRUE (1) or FALSE (0). The truth value of a
condition is computed according to the rules of classical Boolean
algebra.
Fuzzy logic: each predicate is assigned a truth degree from the interval \([0, 1]\). The truth degree of a conjunction is then computed using a chosen triangular norm (t-norm). The package supports three common t-norms, which are defined for predicates’ truth degrees \(a, b \in [0, 1]\) as follows:

- Gödel (minimum) t-norm: \(\min(a, b)\);
- Goguen (product) t-norm: \(a \cdot b\);
- Łukasiewicz t-norm: \(\max(0, a + b - 1)\).
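These three t-norms can be written as a quick base-R sketch (the function names loosely mirror the package's t-norm option names, but this is our illustration, not the package's implementation):

```r
# Three common t-norms, vectorized over truth degrees in [0, 1]
goedel <- function(a, b) pmin(a, b)          # Gödel (minimum) t-norm
goguen <- function(a, b) a * b               # Goguen (product) t-norm
lukas  <- function(a, b) pmax(0, a + b - 1)  # Łukasiewicz t-norm

a <- 0.8
b <- 0.5
goedel(a, b)  # 0.5
goguen(a, b)  # 0.4
lukas(a, b)   # 0.3
```

Note that the Gödel t-norm is the most lenient and the Łukasiewicz t-norm the strictest: for any \(a, b\), \(\max(0, a + b - 1) \le a \cdot b \le \min(a, b)\).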
Before applying nuggets, data columns intended as
predicates must be prepared either by dichotomization
(conversion into dummy variables) or by transformation into
fuzzy sets. The package provides functions for both
transformations. See the Data
Preparation vignette for a comprehensive guide, or the section Data Preparation below for a quick
overview.
nuggets implements functions to search for pre-defined
types of patterns, for example:
- dig_associations() for association rules,
- dig_baseline_contrasts(), dig_complement_contrasts(), and dig_paired_baseline_contrasts() for various contrast patterns on numeric variables,
- dig_correlations() for conditional correlations.

See Pre-defined Patterns below for further details.
Discovered rules and patterns can be post-processed, visualized, and explored interactively. Section Post-processing and Visualization describes these features.
Finally, the package allows users to provide custom evaluation functions for conditions and to search for user-defined types of patterns:
- dig() is a general function for searching arbitrary pattern types.
- dig_grid() is a wrapper around dig() for patterns defined by conditions and a pair of columns evaluated by a user-defined function.

See Custom Patterns for more information.
Before applying nuggets, data columns intended as
predicates must be prepared either by dichotomization
(conversion into dummy variables) or by transformation into
fuzzy sets. The package provides the partition()
function for both transformations.
For a detailed guide to data preparation, including information about all available functions and advanced techniques, please see the Data Preparation vignette.
For crisp patterns, numeric columns are transformed to logical
(TRUE/FALSE) columns. Here’s a quick example
using the built-in mtcars dataset:
# Transform the whole dataset to crisp predicates
# (mutate() comes from dplyr, partition() from nuggets)
library(dplyr)
library(nuggets)

# First, convert cyl to a factor for illustration
crisp_mtcars <- mtcars |>
    mutate(cyl = factor(cyl, levels = c(4, 6, 8), labels = c("four", "six", "eight"))) |>
    partition(cyl, vs:gear, .method = "dummy") |>
    partition(mpg, .method = "crisp", .breaks = c(-Inf, 15, 20, 30, Inf)) |>
    partition(disp:carb, .method = "crisp", .breaks = 3)
head(crisp_mtcars, n = 3)
#> # A tibble: 3 × 32
#>   `cyl=four` `cyl=six` `cyl=eight` `vs=0` `vs=1` `am=0` `am=1` `gear=3` `gear=4`
#>   <lgl>      <lgl>     <lgl>       <lgl>  <lgl>  <lgl>  <lgl>  <lgl>    <lgl>   
#> 1 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE    
#> 2 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE    
#> 3 TRUE       FALSE     FALSE       FALSE  TRUE   FALSE  TRUE   FALSE    TRUE    
#>   `gear=5` `mpg=(-Inf;15]` `mpg=(15;20]` `mpg=(20;30]` `mpg=(30;Inf]`
#>   <lgl>    <lgl>           <lgl>         <lgl>         <lgl>         
#> 1 FALSE    FALSE           FALSE         TRUE          FALSE         
#> 2 FALSE    FALSE           FALSE         TRUE          FALSE         
#> 3 FALSE    FALSE           FALSE         TRUE          FALSE         
#>   `disp=(-Inf;205]` `disp=(205;338]` `disp=(338;Inf]` `hp=(-Inf;146]`
#>   <lgl>             <lgl>            <lgl>            <lgl>          
#> 1 TRUE              FALSE            FALSE            TRUE           
#> 2 TRUE              FALSE            FALSE            TRUE           
#> 3 TRUE              FALSE            FALSE            TRUE           
#>   `hp=(146;241]` `hp=(241;Inf]` `drat=(-Inf;3.48]` `drat=(3.48;4.21]`
#>   <lgl>          <lgl>          <lgl>              <lgl>             
#> 1 FALSE          FALSE          FALSE              TRUE              
#> 2 FALSE          FALSE          FALSE              TRUE              
#> 3 FALSE          FALSE          FALSE              TRUE              
#>   `drat=(4.21;Inf]` `wt=(-Inf;2.82]` `wt=(2.82;4.12]` `wt=(4.12;Inf]`
#>   <lgl>             <lgl>            <lgl>            <lgl>          
#> 1 FALSE             TRUE             FALSE            FALSE          
#> 2 FALSE             FALSE            TRUE             FALSE          
#> 3 FALSE             TRUE             FALSE            FALSE          
#>   `qsec=(-Inf;17.3]` `qsec=(17.3;20.1]` `qsec=(20.1;Inf]` `carb=(-Inf;3.33]`
#>   <lgl>              <lgl>              <lgl>             <lgl>             
#> 1 TRUE               FALSE              FALSE             FALSE             
#> 2 TRUE               FALSE              FALSE             FALSE             
#> 3 FALSE              TRUE               FALSE             TRUE              
#>   `carb=(3.33;5.67]` `carb=(5.67;Inf]`
#>   <lgl>              <lgl>            
#> 1 TRUE               FALSE            
#> 2 TRUE               FALSE            
#> 3 FALSE              FALSE

Now all columns are logical and can be used as predicates in crisp conditions.
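To see what "used as predicates in crisp conditions" means concretely: the truth value of a conjunction such as cyl=six & am=1 is just the elementwise logical AND of the corresponding columns. A base-R sketch with toy vectors (independent of the package):

```r
# Toy logical predicates for three rows
`cyl=six` <- c(TRUE, TRUE, FALSE)
`am=1`    <- c(TRUE, FALSE, TRUE)

# Truth value of the condition "cyl=six & am=1" on each row
cond <- `cyl=six` & `am=1`
cond
#> [1]  TRUE FALSE FALSE

# Support of the condition = relative frequency of rows satisfying it
mean(cond)
#> [1] 0.3333333
```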
Fuzzy predicates express the degree to which a condition is satisfied, with values in the interval \([0,1]\). This allows modeling of smooth transitions between categories:
# Start with fresh mtcars and transform to fuzzy predicates
fuzzy_mtcars <- mtcars |>
    mutate(cyl = factor(cyl, levels = c(4, 6, 8), labels = c("four", "six", "eight"))) |>
    partition(cyl, vs:gear, .method = "dummy") |>
    partition(mpg, .method = "triangle", .breaks = c(-Inf, 15, 20, 30, Inf)) |>
    partition(disp:carb, .method = "triangle", .breaks = 3) 
head(fuzzy_mtcars, n = 3)
#> # A tibble: 3 × 31
#>   `cyl=four` `cyl=six` `cyl=eight` `vs=0` `vs=1` `am=0` `am=1` `gear=3` `gear=4`
#>   <lgl>      <lgl>     <lgl>       <lgl>  <lgl>  <lgl>  <lgl>  <lgl>    <lgl>   
#> 1 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE    
#> 2 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE    
#> 3 TRUE       FALSE     FALSE       FALSE  TRUE   FALSE  TRUE   FALSE    TRUE    
#>   `gear=5` `mpg=(-Inf;15;20)` `mpg=(15;20;30)` `mpg=(20;30;Inf)`
#>   <lgl>                 <dbl>            <dbl>             <dbl>
#> 1 FALSE                     0             0.9               0.1 
#> 2 FALSE                     0             0.9               0.1 
#> 3 FALSE                     0             0.72              0.28
#>   `disp=(-Inf;71.1;272)` `disp=(71.1;272;472)` `disp=(272;472;Inf)`
#>                    <dbl>                 <dbl>                <dbl>
#> 1                  0.557                 0.443                    0
#> 2                  0.557                 0.443                    0
#> 3                  0.816                 0.184                    0
#>   `hp=(-Inf;52;194)` `hp=(52;194;335)` `hp=(194;335;Inf)`
#>                <dbl>             <dbl>              <dbl>
#> 1              0.592             0.408                  0
#> 2              0.592             0.408                  0
#> 3              0.711             0.289                  0
#>   `drat=(-Inf;2.76;3.84)` `drat=(2.76;3.84;4.93)` `drat=(3.84;4.93;Inf)`
#>                     <dbl>                   <dbl>                  <dbl>
#> 1                       0                   0.945                0.0550 
#> 2                       0                   0.945                0.0550 
#> 3                       0                   0.991                0.00917
#>   `wt=(-Inf;1.51;3.47)` `wt=(1.51;3.47;5.42)` `wt=(3.47;5.42;Inf)`
#>                   <dbl>                 <dbl>                <dbl>
#> 1                 0.434                 0.566                    0
#> 2                 0.304                 0.696                    0
#> 3                 0.587                 0.413                    0
#>   `qsec=(-Inf;14.5;18.7)` `qsec=(14.5;18.7;22.9)` `qsec=(18.7;22.9;Inf)`
#>                     <dbl>                   <dbl>                  <dbl>
#> 1                  0.533                    0.467                      0
#> 2                  0.4                      0.6                        0
#> 3                  0.0214                   0.979                      0
#>   `carb=(-Inf;1;4.5)` `carb=(1;4.5;8)` `carb=(4.5;8;Inf)`
#>                 <dbl>            <dbl>              <dbl>
#> 1               0.143            0.857                  0
#> 2               0.143            0.857                  0
#> 3               1                0                      0

Note that the cyl, vs, am, and
gear columns are still represented by dummy logical
columns, while the numeric columns are now represented by fuzzy sets.
This combination allows both crisp and fuzzy predicates to be used
together in pattern discovery.
The nuggets package provides powerful and flexible data
preparation tools. The Data
Preparation vignette covers these capabilities in depth,
including:
- the .span and .inc parameters for overlapping fuzzy sets,
- is_almost_constant() and remove_almost_constant() to identify and filter uninformative columns,
- dig_tautologies() to find always-true rules that can be used to prune search spaces.

For example, you can use quantile-based partitioning to ensure balanced predicates, or use raised-cosine fuzzy sets with custom labels to create meaningful linguistic terms like “very_low”, “low”, “medium”, “high”, and “very_high”. These preparation choices significantly impact the interpretability and usefulness of patterns discovered in subsequent analyses.
The package nuggets provides a set of functions for
discovering some of the best-known pattern types. These functions can
process Boolean data, fuzzy data, or both. Each function returns a
tibble, where every row represents one detected pattern.
Note: This section assumes that the data have already been preprocessed — i.e., transformed into a binarized or fuzzified form. See the previous section Data Preparation for details on how to prepare your dataset (for example, crisp_mtcars and fuzzy_mtcars).
For more advanced workflows — such as defining custom pattern types or computing user-defined measures — see the section Custom Patterns.
Association rules identify conditions (antecedents) under which a specific feature (consequent) is present very often.
\[ A \Rightarrow C \]
If condition A is satisfied, then the feature
C tends to be present.
For example,
university_edu & middle_age & IT_industry => high_income
can be read as:
People of middle age with a university education working in the IT industry are very likely to have a high income.
In practice, the antecedent A is a set of predicates,
and the consequent C is usually a single predicate.
For a set of predicates \(I\), let \(\text{supp}(I)\) denote the support — the relative frequency (for logical data) or the mean truth degree (for fuzzy data) of rows satisfying all predicates in \(I\). Using this notation, the basic quality measures of a rule \(A \Rightarrow C\) are:

- support: \(\text{supp}(A \cup \{C\})\), i.e., the support of the antecedent together with the consequent;
- confidence: \(\text{supp}(A \cup \{C\}) / \text{supp}(A)\), i.e., how often the consequent holds among rows satisfying the antecedent;
- coverage: \(\text{supp}(A)\), i.e., the support of the antecedent alone.
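On fuzzy data these measures can be computed by hand, as a sanity check. A base-R sketch using the Goguen (product) t-norm for the conjunction (toy truth degrees; the package computes this internally):

```r
# Toy fuzzy truth degrees for four rows
A <- c(1.0, 0.8, 0.4, 0.0)  # antecedent degrees
C <- c(0.9, 0.7, 0.1, 0.3)  # consequent degrees

# Support of a set of predicates = mean truth degree of their conjunction;
# with the Goguen t-norm, the conjunction is the elementwise product
supp_A  <- mean(A)
supp_AC <- mean(A * C)

coverage   <- supp_A            # 0.55
support    <- supp_AC           # 0.375
confidence <- supp_AC / supp_A  # ~0.682
```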
Optional additional measures ("lift",
"conviction", "added_value") can be computed
using the measures argument.
Before searching for rules, it is recommended to create a vector of disjoints, which specifies predicates that must not appear together in the same condition. This vector should have the same length as the number of dataset columns.
For example, columns representing gear=3 and
gear=4 are mutually exclusive, so their shared group label
in disj prevents meaningless conditions like
gear=3 & gear=4. You can conveniently generate this
vector with var_names():
disj <- var_names(colnames(fuzzy_mtcars))
print(disj)
#>  [1] "cyl"  "cyl"  "cyl"  "vs"   "vs"   "am"   "am"   "gear" "gear" "gear"
#> [11] "mpg"  "mpg"  "mpg"  "disp" "disp" "disp" "hp"   "hp"   "hp"   "drat"
#> [21] "drat" "drat" "wt"   "wt"   "wt"   "qsec" "qsec" "qsec" "carb" "carb"
#> [31] "carb"

The dig_associations() function searches for association
rules. Its main arguments are:
- x: the data matrix or data frame (logical or numeric);
- antecedent, consequent: tidyselect expressions selecting columns for each side of the rule;
- disjoint: a vector defining mutually exclusive predicates;
- min_support, min_confidence, min_coverage, and limits like min_length, max_length;
- measures, t_norm, and contingency_table.

In the following example, we search for fuzzy association rules in the dataset fuzzy_mtcars, such that:

- any column except those starting with "am" may appear in the antecedent;
- columns starting with "am" may appear in the consequent;
- minimum support is 0.02;
- minimum confidence is 0.8;
- additional quality measures "lift" and "conviction" are computed.
result <- dig_associations(fuzzy_mtcars,
                           antecedent = !starts_with("am"),
                           consequent = starts_with("am"),
                           disjoint = disj,
                           min_support = 0.02,
                           min_confidence = 0.8,
                           measures = c("lift", "conviction"),
                           contingency_table = TRUE)

The result is a tibble containing the discovered rules and their quality metrics. You can arrange them, for example, by decreasing support:
result <- arrange(result, desc(support))
print(result)
#> # A tibble: 526 × 14
#>    antecedent                     consequent support confidence coverage
#>    <chr>                          <chr>        <dbl>      <dbl>    <dbl>
#>  1 {gear=3}                       {am=0}       0.469      1        0.469
#>  2 {gear=3,vs=0}                  {am=0}       0.375      1        0.375
#>  3 {cyl=eight,gear=3,vs=0}        {am=0}       0.375      1        0.375
#>  4 {cyl=eight,vs=0}               {am=0}       0.375      0.857    0.438
#>  5 {cyl=eight,gear=3}             {am=0}       0.375      1        0.375
#>  6 {cyl=eight}                    {am=0}       0.375      0.857    0.438
#>  7 {mpg=(-Inf;15;20)}             {am=0}       0.327      0.847    0.387
#>  8 {drat=(-Inf;2.76;3.84)}        {am=0}       0.311      0.948    0.328
#>  9 {gear=3,mpg=(-Inf;15;20)}      {am=0}       0.309      1        0.309
#> 10 {drat=(-Inf;2.76;3.84),gear=3} {am=0}       0.307      1        0.307
#>    conseq_support count antecedent_length    pp    pn    np    nn  lift
#>             <dbl> <dbl>             <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1          0.594 15                    1 15    0      4     13    1.68
#>  2          0.594 12                    2 12    0      7     13    1.68
#>  3          0.594 12                    3 12    0      7     13    1.68
#>  4          0.594 12                    2 12    2      7     11    1.44
#>  5          0.594 12                    2 12    0      7     13    1.68
#>  6          0.594 12                    1 12    2      7     11    1.44
#>  7          0.594 10.5                  1 10.5  1.90   8.52  11.1  1.43
#>  8          0.594  9.96                 1  9.96 0.546  9.04  12.5  1.60
#>  9          0.594  9.88                 2  9.88 0      9.12  13.0  1.68
#> 10          0.594  9.82                 2  9.82 0      9.18  13    1.68
#>    conviction
#>         <dbl>
#>  1     Inf   
#>  2     Inf   
#>  3     Inf   
#>  4       2.84
#>  5     Inf   
#>  6       2.84
#>  7       2.65
#>  8       7.82
#>  9     Inf   
#> 10     Inf   
#> # ℹ 516 more rows

This example illustrates the typical workflow for mining association
rules with nuggets. The same structure and arguments apply
when analyzing either fuzzy or Boolean datasets.
TBD (dig_correlations)
TBD (dig_contrasts)
TBD
The nuggets package allows users to execute a custom callback function on each generated frequent condition. That way, custom types of patterns may be searched for. The following example replicates the search for association rules with a custom callback function. For that, a dataset has to be prepared and the disjoint vector created as in the Data Preparation section above.
As we want to search for association rules with some minimum support and confidence, we define variables to hold those thresholds. We also need to define a callback function that will be called for each frequent condition found. Its purpose is to generate the rules with the obtained condition as an antecedent:
min_support <- 0.02
min_confidence <- 0.8
f <- function(condition, support, foci_supports) {
    conf <- foci_supports / support
    sel <- !is.na(conf) & conf >= min_confidence & !is.na(foci_supports) & foci_supports >= min_support
    conf <- conf[sel]
    supp <- foci_supports[sel]
    
    lapply(seq_along(conf), function(i) { 
      list(antecedent = format_condition(names(condition)),
           consequent = format_condition(names(conf)[[i]]),
           support = supp[[i]],
           confidence = conf[[i]])
    })
}

The callback function f() defines three arguments:
condition, support, and foci_supports. These argument names are not arbitrary: the search algorithm decides what information to pass to the callback based on the names of its arguments. Here, condition is a named vector of column indices representing the conjunction of predicates in a condition (a predicate corresponds to a column of the source dataset). The support argument receives the relative frequency of the condition in the dataset. foci_supports is a vector of supports of special predicates, called “foci” (plural of “focus”), within the rows satisfying the condition. For association rules, foci are the potential rule consequents.
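To make the callback's contract concrete, it can be invoked by hand with values resembling a single frequent condition. This is a self-contained toy run (the numbers are made up, and format_condition is replaced by a minimal stand-in so the sketch does not depend on the package):

```r
# Minimal stand-in for nuggets::format_condition (illustration only)
format_condition <- function(x) paste0("{", paste(x, collapse = ","), "}")

min_support <- 0.02
min_confidence <- 0.8

f <- function(condition, support, foci_supports) {
    conf <- foci_supports / support
    sel <- !is.na(conf) & conf >= min_confidence &
        !is.na(foci_supports) & foci_supports >= min_support
    conf <- conf[sel]
    supp <- foci_supports[sel]
    lapply(seq_along(conf), function(i) {
        list(antecedent = format_condition(names(condition)),
             consequent = format_condition(names(conf)[[i]]),
             support = supp[[i]],
             confidence = conf[[i]])
    })
}

# A hand-crafted call: one condition {gear=3} and two foci;
# only "am=0" passes the confidence threshold
out <- f(condition = c("gear=3" = 8L),
         support = 0.469,
         foci_supports = c("am=0" = 0.469, "am=1" = 0))
str(out)
```

In real use, dig() supplies these arguments automatically for every frequent condition it enumerates.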
Now we can run the digging for rules:
result <- dig(fuzzy_mtcars,
              f = f,
              condition = !starts_with("am"),
              focus = starts_with("am"),
              disjoint = disj,
              min_length = 1,
              min_support = min_support)

As we return a list of lists from the callback function, we have to flatten the first level of lists in the result and bind it into a data frame:
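The shape of that flattening step can be mimicked with toy values (a self-contained base-R sketch; the rule values below are made up):

```r
# Toy "result": one list of rules per frequent condition
toy_result <- list(
    list(list(antecedent = "{gear=3}", consequent = "{am=0}",
              support = 0.469, confidence = 1.0)),
    list(list(antecedent = "{cyl=eight}", consequent = "{am=0}",
              support = 0.375, confidence = 0.857))
)

# Drop the outer level of nesting, then bind each rule into one row
flat  <- unlist(toy_result, recursive = FALSE)
rules <- do.call(rbind, lapply(flat, as.data.frame))
rules
```

Tidyverse equivalents such as purrr::flatten() followed by dplyr::bind_rows() achieve the same result more concisely.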