Type: | Package |
Title: | Identify Duplicated R Code in a Project |
Version: | 0.3.0 |
Author: | Russ Hyde, University of Glasgow |
Maintainer: | Russ Hyde <russ.hyde.data@gmail.com> |
Description: | Identifies code blocks that have a high level of similarity within a set of R files. |
URL: | https://github.com/russHyde/dupree |
BugReports: | https://github.com/russHyde/dupree/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Language: | en-GB |
LazyData: | true |
Suggests: | testthat (≥ 2.1.0), knitr, rmarkdown |
Imports: | dplyr, purrr, tibble, magrittr, methods, stringdist (≥ 0.9.5.5), lintr (≥ 2.0.0), rlang |
RoxygenNote: | 7.1.0 |
Collate: | 'utils.R' 'dupree.R' 'dupree_classes.R' 'dupree_data_validity.R' 'dupree_code_enumeration.R' 'dups-class.R' |
NeedsCompilation: | no |
Packaged: | 2020-03-30 17:27:58 UTC; russ |
Repository: | CRAN |
Date/Publication: | 2020-04-21 10:20:02 UTC |
An S4 class to represent the code blocks as strings of integers
Description
An S4 class to represent the code blocks as strings of integers
Slots
blocks
A tbl_df with columns 'file', 'block', 'start_line' and 'enumerated_code'
as.data.frame method for 'dups' class
Description
as.data.frame method for 'dups' class
Usage
## S3 method for class 'dups'
as.data.frame(x, ...)
Arguments
x |
any R object. |
... |
additional arguments to be passed to or from methods. |
convert a 'dups' object to a 'tibble'
Description
convert a 'dups' object to a 'tibble'
Usage
## S3 method for class 'dups'
as_tibble(x, ...)
Arguments
x |
A data frame, list, matrix, or other object that could reasonably be coerced to a tibble. |
... |
Other arguments passed on to individual methods. |
Detect code duplication between the code-blocks in a set of files
Description
This function identifies all code-blocks in a set of files and then computes a similarity score between those code-blocks to help identify functions / classes that have a high level of duplication, and could possibly be refactored.
Usage
dupree(files, min_block_size = 40, ...)
Arguments
files |
A set of files over which code-duplication should be measured. |
min_block_size |
|
... |
Unused at present. |
Details
Code-blocks under a size threshold are disregarded before analysis (the size
threshold is controlled by min_block_size
); and only top-level code
blocks are considered.
Every sufficiently large code-block in the input files will be present in the results at least once. If code-block X and code-block Y are present in a row of the resulting data-frame, then either X is the closest match to Y, or Y is the closest match to X (or possibly both) according to the similarity score; as such, some code-blocks may be present multiple times in the results.
Similarity between code-blocks is calculated using the
longest-common-subsequence (lcs
) measure from the package
stringdist
. This measure is applied to a tokenised version of the
code-blocks. That is, each function name / operator / variable in the code
blocks is converted to a unique integer so that a code-block can be
represented as a vector of integers and the lcs
measure is applied to
each pair of these vectors.
Value
A tibble
. Each row in the table summarises the
comparison between two code-blocks (block 'a' and block 'b') in the input
files. Each code-block in the pair is indicated by: i) the file
(file_a
/ file_b
) that contains it; ii) its position within
that file (block_a
/ block_b
; 1 being the first code-block in
a given file); and iii) the line where that code-block starts in that file
(line_a
/ line_b
). The pairs of code-blocks are ordered by
decreasing similarity. Any match that is returned is either the top hit for
block 'a' or for block 'b' (or both).
Examples
# To quantify duplication between the top-level code-blocks in a file
example_file <- system.file("extdata", "duplicated.R", package = "dupree")
dup <- dupree(example_file, min_block_size = 10)
dup
# For the block-pair with the highest duplication, we print the first four
# lines:
readLines(example_file)[dup$line_a[1] + c(0:3)]
readLines(example_file)[dup$line_b[1] + c(0:3)]
# The code-blocks in the example file are rather small, so if
# `min_block_size` is too large, none of the code-blocks will be analysed
# and the results will be empty:
dupree(example_file, min_block_size = 40)
Run duplicate-code detection over all R-files in a directory
Description
Run duplicate-code detection over all R-files in a directory
Usage
dupree_dir(
path = ".",
min_block_size = 40,
filter = NULL,
...,
recursive = TRUE
)
Arguments
path |
A directory (By default the current working directory). All files in this directory that have a ".R", ".r" or ".Rmd" extension will be checked for code duplication. |
min_block_size |
|
filter |
A pattern for use in grep - this is used to keep only particular files: eg, filter = "classes" would compare files with 'classes' in the filename |
... |
Further arguments for grep. For example, 'filter = "test", invert = TRUE' would disregard all files with 'test' in the file-path. |
recursive |
Should we consider files in subdirectories as well? |
See Also
dupree
Run duplicate-code detection over all files in the 'R' directory of a package
Description
The function fails if the path does not look like a typical R package (it should have both an R/ subdirectory and a DESCRIPTION file present).
Usage
dupree_package(package = ".", min_block_size = 40)
Arguments
package |
The name or path to the package that is to be checked (By default the current working directory). |
min_block_size |
|
See Also
dupree
print method for 'dups' class
Description
print method for 'dups' class
Usage
## S3 method for class 'dups'
print(x, ...)
Arguments
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- tibble