--- title: "From R to CFF" subtitle: "Crosswalk" description: > A comprehensive description of the internal mappings performed by the cffr package. author: Diego Hernangómez bibliography: REFERENCES.bib link-citations: true toc: true vignette: > %\VignetteIndexEntry{From R to CFF} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8} tbl-cap-location: bottom knitr: opts_chunk: collapse: true comment: "#>" code-fold: true code-summary: "**Example**" --- ```{r} #| include: false library(cffr) ``` The goal of this vignette is to provide an explicit map between the metadata fields used by **cffr** and each one of the valid keys of the [Citation File Format schema version 1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#valid-keys). ## Summary {#summary} We summarize here the fields that **cffr** can coerce and the original source of information for each one of them. The details on each key are presented in the next section of the document. The assessment of fields is based on the [Guide to Citation File Format schema version 1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#valid-keys) [@druskat_citation_2021]. ```{r} #| label: tbl-summary #| echo: false #| tbl-cap: "cffr and R packages: Conversion table" keys <- cff_schema_keys(sorted = TRUE) origin <- vector(length = length(keys)) origin[keys == "cff-version"] <- "argument on function" origin[keys == "type"] <- "Fixed value: 'software'" origin[keys == "identifiers"] <- "DESCRIPTION/CITATION files" origin[keys == "references"] <- "DESCRIPTION/CITATION files" origin[ keys %in% c( "message", "title", "version", "authors", "abstract", "repository", "repository-code", "url", "date-released", "contact", "keywords", "license", "commit" ) ] <- "DESCRIPTION file" origin[ keys %in% c( "doi", "preferred-citation" ) ] <- "CITATION file" origin[origin == FALSE] <- "Ignored by cffr" df <- data.frame( key = paste0("", keys, ""), source = origin ) knitr::kable(df, escape = FALSE) ``` ## Details ### abstract This key is extracted from the `"Description"` field of the `DESCRIPTION` file. ```{r} #| label: abstract library(cffr) # Create cffr for yaml cff_obj <- cff_create("rmarkdown") # Get DESCRIPTION of rmarkdown to check pkg <- desc::desc(file.path(find.package("rmarkdown"), "DESCRIPTION")) cat(cff_obj$abstract) cat(pkg$get("Description")) ``` Back to @tbl-summary. ### authors This key is coerced from the `"Authors"` or `"Authors@R"` field of the `DESCRIPTION` file. By default persons with the role `"aut"` or `"cre"` are considered, however this can be modified via the `authors_roles` argument. ```{r} #| label: authors # An example DESCRIPTION path <- system.file("examples/DESCRIPTION_many_persons", package = "cffr") pkg <- desc::desc(path) # See persons listed pkg$get_authors() # Default behaviour, use authors and creators (maintainers) cff_obj <- cff_create(path) cff_obj$authors # Use now Copyright holders and maintainers cff_obj_alt <- cff_create(path, authors_roles = c("cre", "cph")) cff_obj_alt$authors ``` Back to @tbl-summary. ### cff-version This key can be set via the arguments of the `cff_create()`/`cff_write()` functions: ```{r} #| label: cffversion cff_objv110 <- cff_create("jsonlite", cff_version = "v1.1.0") cat(cff_objv110$`cff-version`) ``` Back to @tbl-summary. ### commit This key is extracted from the `"RemoteSha"` field of the `DESCRIPTION` file. This is the case of packages installed using the [r-universe](https://r-universe.dev/search) or packages such as **remotes** or **pak**. ```{r} #| label: commit # An example DESCRIPTION path <- system.file("examples/DESCRIPTION_r_universe", package = "cffr") pkg <- desc::desc(path) # See RemoteSha pkg$get("RemoteSha") cff_read(path) ``` Back to @tbl-summary. ### contact This key is coerced from the `"Authors"` or `"Authors@R"` field of the `DESCRIPTION` file. Only persons with the role `"cre"` (i.e, the maintainer(s)) are considered. ```{r} #| label: contact cff_obj <- cff_create("rmarkdown") pkg <- desc::desc(file.path(find.package("rmarkdown"), "DESCRIPTION")) cff_obj$contact pkg$get_author() ``` Back to @tbl-summary. ### date-released This key is extracted from the `DESCRIPTION` file following this logic: - `"Date"` field or, - If not present, from `"Date/Publication"`. This is present on packages built on **CRAN** and **Bioconductor**. or, - If not present, from `"Packaged"`, that is present on packages built by the [r-universe](https://r-universe.dev/search). ```{r} #| label: date-released # From an installed package cff_obj <- cff_create("rmarkdown") pkg <- desc::desc(file.path(find.package("rmarkdown"), "DESCRIPTION")) cat(pkg$get("Date/Publication")) cat(cff_obj$`date-released`) # A DESCRIPTION file without a Date nodate <- system.file("examples/DESCRIPTION_basic", package = "cffr") tmp <- tempfile("DESCRIPTION") # Create a temporary file file.copy(nodate, tmp) pkgnodate <- desc::desc(tmp) cffnodate <- cff_create(tmp) # Won't appear cat(cffnodate$`date-released`) pkgnodate # Adding a Date desc::desc_set("Date", "1999-01-01", file = tmp) cat(cff_create(tmp)$`date-released`) ``` Back to @tbl-summary. ### doi {#doi} This key is coerced from the `"doi"` field of the [preferred-citation](#preferred-citation) object. If not present and the package is on **CRAN**, it would be populated with the doi provided by **CRAN** (e.g. ). ```{r} #| label: doi cff_doi <- cff_create("cffr") cat(cff_doi$doi) cat(cff_doi$`preferred-citation`$doi) ``` Back to @tbl-summary. ### identifiers This key includes all the possible identifiers of the package: - From the `DESCRIPTION` field, it includes all the URLs not included in [url](#url) or [repository-code](#repository-code). - From the `CITATION` file, it includes all the DOIs not included in [doi](#doi) and the identifiers (if any) not included in the `"identifiers"` key of [preferred-citation](#preferred-citation). - If the package is on **CRAN** and it has a `CITATION` file providing a doi, the doi provided by **CRAN** would be added as well. ```{r} #| label: identifiers file <- system.file("examples/DESCRIPTION_many_urls", package = "cffr") pkg <- desc::desc(file) cat(pkg$get_urls()) cat(cff_create(file)$url) cat(cff_create(file)$`repository-code`) cff_create(file)$identifiers ``` Back to @tbl-summary. ### keywords This key is extracted from the `DESCRIPTION` file. The keywords should appear in the `DESCRIPTION` as: ``` ... X-schema.org-keywords: keyword1, keyword2, keyword3 ``` ```{r} #| label: keyword # A DESCRIPTION file without keywords nokeywords <- system.file("examples/DESCRIPTION_basic", package = "cffr") tmp2 <- tempfile("DESCRIPTION") # Create a temporary file file.copy(nokeywords, tmp2) pkgnokeywords <- desc::desc(tmp2) cffnokeywords <- cff_create(tmp2) # Won't appear cat(cffnokeywords$keywords) pkgnokeywords # Adding Keywords desc::desc_set( "X-schema.org-keywords", "keyword1, keyword2, keyword3", file = tmp2 ) cat(cff_create(tmp2)$keywords) ``` Additionally, if the source code of the package is hosted on GitHub, **cffr** can retrieve the topics of the repository via the [GitHub API](https://docs.github.com/en/rest) and include those topics as keywords. This option is controlled via the `gh_keywords` argument: ```{r} #| label: ghkeyword # Get cff object from jsonvalidate jsonval <- cff_create("jsonvalidate") # Keywords are retrieved from the GitHub repo jsonval # Check keywords jsonval$keywords # The repo jsonval$`repository-code` ``` Back to @tbl-summary. ### license This key is extracted from the `"License"` field of the `DESCRIPTION` file. ```{r} #| label: license cff_obj <- cff_create("yaml") cat(cff_obj$license) pkg <- desc::desc(file.path(find.package("yaml"), "DESCRIPTION")) cat(pkg$get("License")) ``` Back to @tbl-summary. ### license-url This key is not extracted from the metadata of the package. See the description on the [Guide to CFF schema v1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#license-url). > - **description**: The URL of the license text under which the software or > dataset is licensed (only for non-standard licenses not included in the > [SPDX License > List](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#definitionslicense-enum)). > > - **usage**: > > ``` yaml > license-url: "https://obscure-licenses.com?id=1234" > ``` Back to @tbl-summary. ### message This key is extracted from the `DESCRIPTION` field, specifically as: ```{r} #| eval: false #| code-fold: false msg <- paste0( 'To cite package "', "NAME_OF_THE_PACKAGE", '" in publications use:' ) ``` ```{r} #| label: message cat(cff_create("jsonlite")$message) ``` Back to @tbl-summary. ### preferred-citation {#preferred-citation} This key is extracted from the `CITATION` file. If several references are provided, it would select the first citation as the `"preferred-citation"` and the rest of them as [references](#references). ```{r} #| label: preferred-citation cffobj <- cff_create("rmarkdown") cffobj$`preferred-citation` citation("rmarkdown")[1] ``` Back to @tbl-summary. ### references {#references} This key is extracted from the `CITATION` file if several references are provided. The first citation is considered as the [preferred-citation](#preferred-citation) and the rest of them as `"references"`. It also extracts the package dependencies and adds those to this fields using `citation(auto = TRUE)` on each dependency. ```{r} #| label: references cffobj <- cff_create("rmarkdown") cffobj$references citation("rmarkdown")[-1] ``` Back to @tbl-summary. ### repository This key is extracted from the `"Repository"` field of the `DESCRIPTION` file. Usually, this field is auto-populated when a package is hosted on a repository (like **CRAN** or the [r-universe](https://r-universe.dev/search)). For packages without this field on the `DESCRIPTION` (which is the typical case for an in-development package), **cffr** would try to search the package on any of the default repositories specified on `options("repos")`. In the case of [Bioconductor](https://bioconductor.org/) packages, those are identified if a ["biocViews"](https://contributions.bioconductor.org/description.html#biocviews) is present on the `DESCRIPTION` file. If **cffr** detects that the package is available on **CRAN**, it returns the canonical URL form of the package (i.e. ). ```{r} #| label: repository # Installed package inst <- cff_create("jsonlite") cat(inst$repository) # Demo file downloaded from the r-universe runiv <- system.file("examples/DESCRIPTION_r_universe", package = "cffr") runiv_cff <- cff_create(runiv) cat(runiv_cff$repository) desc::desc(runiv)$get("Repository") # For in development package norepo <- system.file("examples/DESCRIPTION_basic", package = "cffr") # No repo norepo_cff <- cff_create(norepo) cat(norepo_cff[["repository"]]) # Change the name to a known package on CRAN: ggplot2 tmp <- tempfile("DESCRIPTION") file.copy(norepo, tmp) # Change name desc::desc_set("Package", "ggplot2", file = tmp) cat(cff_create(tmp)[["repository"]]) ``` Back to @tbl-summary. ### repository-artifact This key is not extracted from the metadata of the package. See the description on the [Guide to CFF schema v1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#repository-artifact). > - **description**: The URL of the work in a build artifact/binary repository > (when the work is software). > > - **usage**: > > ``` yaml > repository-artifact: "https://search.maven.org/artifact/org.corpus-tools/cff-maven-plugin/0.4.0/maven-plugin" > ``` Back to @tbl-summary. ### repository-code {#repository-code} This key is extracted from the `"BugReports"` or `"URL"` fields on the `DESCRIPTION` file. **cffr** tries to identify the url of the source on the following repositories: - [GitHub](https://github.com/). - [GitLab](https://about.gitlab.com/). - [R-Forge](https://r-forge.r-project.org/). - [Bitbucket](https://bitbucket.org/). - [Codeberg](https://codeberg.org/). ```{r} #| label: repository-code # Installed package on GitHub cat(cff_create("jsonlite")$`repository-code`) # GitLab gitlab <- system.file("examples/DESCRIPTION_gitlab", package = "cffr") cat(cff_create(gitlab)$`repository-code`) # Codeberg codeberg <- system.file("examples/DESCRIPTION_codeberg", package = "cffr") cat(cff_create(codeberg)$`repository-code`) ``` Back to @tbl-summary. ### title This key is extracted from the `"Description"` field of the `DESCRIPTION` file. ```{r} #| eval: false #| code-fold: false title <- paste0("NAME_OF_THE_PACKAGE", ": ", "TITLE_OF_THE_PACKAGE") ``` ```{r} #| label: title # Installed package cat(cff_create("testthat")$title) ``` Back to @tbl-summary. ### type Fixed value equal to `"software"`. The other possible value is `"dataset"`. See the description on the [Guide to CFF schema v1.2.0](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#type). Back to @tbl-summary. ### url {#url} This key is extracted from the `"BugReports"` or `"URL"` fields on the `DESCRIPTION` file. It corresponds to the first url that is different to [repository-code](#repository-code). ```{r} #| label: url # Many urls manyurls <- system.file("examples/DESCRIPTION_many_urls", package = "cffr") cat(cff_create(manyurls)$url) # Check desc::desc(manyurls) ``` Back to @tbl-summary. ### version This key is extracted from the `"Version"` field on the `DESCRIPTION` file. ```{r} #| label: version #| code-fold: false # Should be (>= 3.0.0) cat(cff_create("testthat")$version) ``` Back to @tbl-summary. ## References