Type: Package
Title: Synthesize Bio API Wrapper
Version: 2.0.0
Description: Access Synthesize Bio models through their API (https://app.synthesize.bio/). This wrapper provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. This package enables researchers to easily access AI-generated transcriptomic data for various modalities, including bulk RNA-seq, single-cell RNA-seq, microarray data, and more.
URL: https://github.com/synthesizebio/rsynthbio
BugReports: https://github.com/synthesizebio/rsynthbio/issues
Imports: getPass, keyring, jsonlite, httr
Suggests: rmarkdown, knitr, testthat (≥ 3.0.0), mockery
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.2
VignetteBuilder: knitr
License: MIT + file LICENSE
NeedsCompilation: no
Packaged: 2025-06-02 20:48:23 UTC; candace
Author: Candace Savonen [aut, cre]
Maintainer: Candace Savonen <cansav09@gmail.com>
Repository: CRAN
Date/Publication: 2025-06-04 08:00:06 UTC

API Base URL

Description

Base URL for the Synthesize Bio API

Usage

API_BASE_URL

Format

An object of class character of length 1.


Model Modalities

Description

A nested list containing the supported modalities for different model versions. The modality name "sra" refers to bulk RNA-seq.

Usage

MODEL_MODALITIES

Format

A nested list with structure: model type > version > modalities
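Because `MODEL_MODALITIES` is nested as model type > version > modalities, flattening it into one row per modality can make it easier to browse. The sketch below runs on a hypothetical stand-in list (`example_modalities`, whose keys and values are made-up placeholders, not the real contents) so that it works without the package. Assuming the real object has the documented shape with character vectors at the leaves, you could pass `MODEL_MODALITIES` to `flatten_modalities()` instead.

```r
# Hypothetical stand-in with the documented shape:
# model type > version > character vector of modalities.
# Keys and values are placeholders, not the package's real contents.
example_modalities <- list(
  model_type_a = list(
    v1.0 = c("modality_x"),
    v2.0 = c("modality_x", "modality_y")
  )
)

# Flatten a model type > version > modalities list into a data frame
# with one row per (model, version, modality) combination
flatten_modalities <- function(x) {
  rows <- list()
  for (model in names(x)) {
    for (version in names(x[[model]])) {
      mods <- x[[model]][[version]]
      if (length(mods) == 0L) next  # skip versions with no modalities
      rows[[length(rows) + 1L]] <- data.frame(
        model = model,
        version = version,
        modality = mods,
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

flat <- flatten_modalities(example_modalities)
print(flat)
```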


Clear Synthesize Bio API Token

Description

Clears the Synthesize Bio API token from the environment for the current R session. This is useful for security purposes when you've finished working with the API or when switching between different accounts.

Usage

clear_synthesize_token(remove_from_keyring = FALSE)

Arguments

remove_from_keyring

Logical, whether to also remove the token from the system keyring if it's stored there. Defaults to FALSE.

Value

Invisibly returns TRUE.

Examples

## Not run: 
# Clear token from current session only
clear_synthesize_token()

# Clear token from both session and keyring
clear_synthesize_token(remove_from_keyring = TRUE)

## End(Not run)
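Conceptually, clearing the session token amounts to unsetting the SYNTHESIZE_API_KEY environment variable, optionally followed by deleting the keyring entry. The sketch below is not the package's implementation: `clear_token_sketch()` and its `service` name are hypothetical, and the keyring call is only reached when `remove_from_keyring = TRUE`, so the session-only path runs without touching a keyring.

```r
clear_token_sketch <- function(remove_from_keyring = FALSE,
                               service = "hypothetical_synthesize_service") {
  # Remove the token from the current R session
  Sys.unsetenv("SYNTHESIZE_API_KEY")

  if (remove_from_keyring) {
    # Hypothetical service name; the package's real keyring entry may differ.
    # Ignore the error raised when no matching entry exists.
    tryCatch(
      keyring::key_delete(service),
      error = function(e) invisible(NULL)
    )
  }
  invisible(TRUE)
}

Sys.setenv(SYNTHESIZE_API_KEY = "dummy-token")
clear_token_sketch()
Sys.getenv("SYNTHESIZE_API_KEY")  # "" once the variable is unset
```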

Extract Gene Expression Data from API Response

Description

Extracts the gene expression data from a parsed API response, combines it across samples into a single table, and associates it with the corresponding sample metadata.

Usage

extract_expression_data(parsed_content, as_counts = TRUE)

Arguments

parsed_content

The parsed API response list

as_counts

Logical, if FALSE, transforms the predicted expression counts into logCPM (default is TRUE, returning raw counts).

Value

A list with two components:
- metadata: tibble containing sample metadata
- expression: tibble containing combined gene expression data
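The exact layout of the API response is not documented on this page. Assuming a hypothetical parsed structure in which `parsed_content$outputs` is a list of samples, each holding a `metadata` list and a named numeric `expression` vector, combining the samples could look like the sketch below. `extract_sketch()`, `outputs`, and the per-sample field names are illustrative assumptions only; the log1p(CPM) step mirrors what `log_cpm()` is documented to do, treating each row as one sample.

```r
extract_sketch <- function(parsed_content, as_counts = TRUE) {
  outputs <- parsed_content[["outputs"]]  # hypothetical field name

  # One metadata row per sample
  metadata <- do.call(rbind, lapply(outputs, function(s) {
    as.data.frame(s[["metadata"]], stringsAsFactors = FALSE)
  }))

  # One expression row per sample, one column per gene
  expression <- do.call(rbind, lapply(outputs, function(s) s[["expression"]]))
  expression <- as.data.frame(expression)

  if (!as_counts) {
    # log1p(CPM): scale each sample (row) to one million counts, then log1p
    expression <- log1p(expression / rowSums(expression) * 1e6)
  }
  list(metadata = metadata, expression = expression)
}

# Hypothetical mock response with two samples and two genes
mock_response <- list(outputs = list(
  list(metadata = list(sample_id = "s1"),
       expression = c(geneA = 30, geneB = 70)),
  list(metadata = list(sample_id = "s2"),
       expression = c(geneA = 5, geneB = 15))
))
res <- extract_sketch(mock_response)
res$expression
```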


Get Valid Modalities

Description

Returns a vector of possible output modalities for the supported model. These modalities represent different types of gene expression data that can be generated by the Synthesize Bio API. Note that only the v2.0 model can be accessed with this version of the package. To use v1 models, install a 1.x.x version of this package.

Usage

get_valid_modalities()

Value

A character vector containing the valid modality strings.

Examples

# Get all supported modalities
modalities <- get_valid_modalities()
print(modalities)

# Check if a specific modality is supported
"bulk_rna-seq" %in% get_valid_modalities()
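When a script receives several modalities from elsewhere (for example, a config file), `intersect()` and `setdiff()` can separate the supported ones from the rest before any query is built. The sketch below uses a hypothetical `valid` vector in place of `get_valid_modalities()` so it runs without the package; only "bulk_rna-seq" comes from the example above, and "modality_y" is a placeholder.

```r
# Hypothetical stand-in for get_valid_modalities()
valid <- c("bulk_rna-seq", "modality_y")

requested <- c("bulk_rna-seq", "not_a_modality")

supported <- intersect(requested, valid)  # keep only supported modalities
unsupported <- setdiff(requested, valid)  # collect the rest for reporting

if (length(unsupported) > 0) {
  message("Skipping unsupported modalities: ",
          paste(unsupported, collapse = ", "))
}
supported
```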

Get Valid Modes

Description

Returns a vector of possible modes for the supported model. Modes control how the Synthesize Bio API generates gene expression data, for example "sample generation" or "mean estimation".

Usage

get_valid_modes()

Value

A character vector containing the valid mode strings.

Examples

# Get all supported modes
modes <- get_valid_modes()
print(modes)

# Check if a specific mode is supported
"sample generation" %in% get_valid_modes()

Get Valid Query Example

Description

Generates a sample query for prediction and validation for the v2.0 model. This function provides an example query structure that can be modified for specific needs. The sample query contains two example inputs: one for a cell line with CRISPR perturbation and another for a primary tissue sample with disease information.

Usage

get_valid_query()

Value

A list representing a valid query structure for v2.0.

Examples

# Get a sample query
query <- get_valid_query()

# Modify the query for a different modality
query$modality <- "bulk_rna-seq"

# Adjust the number of samples to generate
query$inputs[[1]]$num_samples <- 10
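To change a field across every input rather than only the first, `lapply()` over `query$inputs` works. The sketch below builds a small hypothetical query by hand so it runs without the package; only the `modality`, `mode`, `inputs`, and `num_samples` names come from this manual, and in practice you would start from `get_valid_query()`. `set_num_samples()` is a hypothetical helper, not part of the package.

```r
# Hypothetical query with the shape used in the examples above;
# in practice, start from get_valid_query().
query <- list(
  modality = "bulk_rna-seq",
  mode = "sample generation",
  inputs = list(
    list(num_samples = 5),
    list(num_samples = 5)
  )
)

# Set num_samples on every input, leaving other fields untouched
set_num_samples <- function(query, n) {
  query$inputs <- lapply(query$inputs, function(input) {
    input$num_samples <- n
    input
  })
  query
}

query <- set_num_samples(query, 10)
vapply(query$inputs, function(input) input$num_samples, numeric(1))
```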

Check if Synthesize Bio API Token is Set

Description

Checks whether a Synthesize Bio API token is currently set in the environment. Useful for conditional code that requires an API token.

Usage

has_synthesize_token()

Value

Logical, TRUE if token is set, FALSE otherwise.

Examples

## Not run: 
# Check if token is set
if (!has_synthesize_token()) {
  # Prompt for token if not set
  set_synthesize_token()
}

## End(Not run)
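Since the token lives in the SYNTHESIZE_API_KEY environment variable for the session, a check of this kind can reduce to testing whether that variable is non-empty. `has_token_sketch()` below is a hypothetical equivalent for illustration, not the package's source.

```r
has_token_sketch <- function() {
  # Sys.getenv() returns "" when the variable is unset
  nzchar(Sys.getenv("SYNTHESIZE_API_KEY"))
}

Sys.unsetenv("SYNTHESIZE_API_KEY")
has_token_sketch()  # FALSE

Sys.setenv(SYNTHESIZE_API_KEY = "dummy-token")
has_token_sketch()  # TRUE
```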

Load Synthesize Bio API Token from Keyring

Description

Loads the previously stored Synthesize Bio API token from the system keyring and sets it in the environment for the current session.

Usage

load_synthesize_token_from_keyring()

Value

Invisibly returns TRUE if successful, or FALSE if the token is not found in the keyring.

Examples

## Not run: 
# Load token from keyring
load_synthesize_token_from_keyring()

## End(Not run)
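Loading from the keyring can be sketched as: try to read the stored secret, and on failure return FALSE instead of raising an error. `load_token_sketch()` is hypothetical; its `getter` argument defaults to `keyring::key_get()` with a made-up service name, and can be replaced by a stub so the logic can be exercised without touching a real keyring.

```r
load_token_sketch <- function(
    getter = function() keyring::key_get("hypothetical_synthesize_service")) {
  # key_get() errors when no matching entry exists; treat that as "not found"
  token <- tryCatch(getter(), error = function(e) NULL)
  if (is.null(token) || !nzchar(token)) {
    return(invisible(FALSE))
  }
  Sys.setenv(SYNTHESIZE_API_KEY = token)
  invisible(TRUE)
}

# Exercise both branches with stub getters instead of a real keyring
load_token_sketch(getter = function() stop("not found"))  # invisibly FALSE
load_token_sketch(getter = function() "stored-token")     # invisibly TRUE
Sys.getenv("SYNTHESIZE_API_KEY")
```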

Log CPM Transformation

Description

Transforms raw count expression data into log1p(CPM), where CPM is counts per million. This common normalization for gene expression data accounts for differences in library size, and the log transformation compresses the range of values so that extreme values have less influence.

Usage

log_cpm(expression)

Arguments

expression

A data.frame containing raw count expression data.

Value

A data.frame containing log1p(CPM) transformed data.

Examples

# Create a sample expression matrix with raw counts
raw_counts <- data.frame(
  gene1 = c(100, 200, 300),
  gene2 = c(50, 100, 150),
  gene3 = c(10, 20, 30)
)

# Transform to log CPM
log_cpm_data <- log_cpm(raw_counts)
print(log_cpm_data)
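The transformation itself is short. Assuming, as in the example above, that samples are rows and genes are columns, each count is divided by its sample's total, scaled to one million, and passed through `log1p()`. `log_cpm_sketch()` is a hypothetical re-implementation for illustration; the package's `log_cpm()` may orient or handle the data differently.

```r
log_cpm_sketch <- function(expression) {
  counts <- as.matrix(expression)
  lib_size <- rowSums(counts)     # total counts per sample (row)
  cpm <- counts / lib_size * 1e6  # each row now sums to one million
  as.data.frame(log1p(cpm))       # log(1 + CPM)
}

raw_counts <- data.frame(
  gene1 = c(100, 200, 300),
  gene2 = c(50, 100, 150),
  gene3 = c(10, 20, 30)
)
log_cpm_sketch(raw_counts)
```

Because the three rows of `raw_counts` differ only by a constant factor, all three rows come out identical after the transformation, which is exactly the library-size correction CPM is meant to provide.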

Predict Gene Expression

Description

Sends a query to the Synthesize Bio API (v2.0) for prediction and retrieves gene expression samples. This function validates the query, sends it to the API, and processes the response into usable data frames.

Usage

predict_query(query, raw_response = FALSE, as_counts = TRUE)

Arguments

query

A list representing the query data to send to the API. Use 'get_valid_query()' to generate an example.

raw_response

Logical. Set this to TRUE to return the raw parsed JSON response instead of extracting the gene expression data from it. The default (FALSE) returns only the expression data and metadata.

as_counts

Passed to extract_expression_data(). Logical; if FALSE, transforms the predicted expression counts into logCPM (default is TRUE, returning raw counts).

Value

A list with two data frames:
- 'metadata': contains metadata for each sample
- 'expression': contains expression data for each sample
Throws an error if the API request fails or the response structure is invalid.

Examples

# Set your API key (in practice, use a more secure method)
## Not run: 

# To start using rsynthbio, you first need an account with Synthesize Bio.
# Go here to create one: https://app.synthesize.bio/

Sys.setenv(SYNTHESIZE_API_KEY = "your_api_key_here")

# Create a query
query <- get_valid_query()

# Request raw counts
result <- predict_query(query, as_counts = TRUE)

# Access the results
metadata <- result$metadata
expression <- result$expression

# Request log CPM transformed data
log_result <- predict_query(query, as_counts = FALSE)
log_expression <- log_result$expression

# Explore the top expressed genes in the first sample
# unlist() turns the one-row data frame into a named vector that sort() can order
head(sort(unlist(expression[1, ]), decreasing = TRUE))

## End(Not run)

Set Synthesize Bio API Token

Description

Securely prompts for and stores the Synthesize Bio API token in the environment. This function uses getPass to securely handle the token input without displaying it in the console. The token is stored in the SYNTHESIZE_API_KEY environment variable for the current R session.

Usage

set_synthesize_token(use_keyring = FALSE, token = NULL)

Arguments

use_keyring

Logical, whether to also store the token securely in the system keyring for future sessions. Defaults to FALSE.

token

Character, optional. If provided, uses this token instead of prompting. This parameter should only be used in non-interactive scripts.

Value

Invisibly returns TRUE if successful.

Examples

# Interactive prompt for token
## Not run: 
set_synthesize_token()

# Provide token directly (less secure, not recommended for interactive use)
set_synthesize_token(token = "your-token-here")

# Store in system keyring for future sessions
set_synthesize_token(use_keyring = TRUE)

## End(Not run)
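As documented above, an explicit `token` is used in place of the prompt, and the result lands in the SYNTHESIZE_API_KEY environment variable. The sketch below illustrates that precedence with `getPass::getPass()` for the masked prompt; `set_token_sketch()` is hypothetical, omits the keyring storage step, and its input checks are assumptions rather than the package's behavior.

```r
set_token_sketch <- function(token = NULL) {
  if (is.null(token)) {
    if (!interactive()) {
      stop("No token supplied and the session is not interactive.")
    }
    # Masked prompt; getPass() returns NULL if the prompt is cancelled
    token <- getPass::getPass(msg = "Enter your Synthesize Bio API token: ")
  }
  if (!is.character(token) || length(token) != 1L || !nzchar(token)) {
    stop("The token must be a single non-empty string.")
  }
  Sys.setenv(SYNTHESIZE_API_KEY = token)
  invisible(TRUE)
}

set_token_sketch(token = "dummy-token")
Sys.getenv("SYNTHESIZE_API_KEY")
```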

Validate Query Modality

Description

Validates that the modality specified in the query is allowed for the v2.0 model. This function checks that the 'modality' value is one of the supported modalities.

Usage

validate_modality(query)

Arguments

query

A list containing the query data.

Value

Invisibly returns TRUE if validation passes. Throws an error if the modality key is missing or if the selected modality is not allowed.

Examples

# Create a valid query
query <- get_valid_query()
validate_modality(query) # Passes validation

# Example with invalid modality
## Not run: 
invalid_query <- get_valid_query()
invalid_query$modality <- "unsupported_modality"
validate_modality(invalid_query) # Throws error for invalid modality

## End(Not run)
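A modality check of this kind comes down to two tests: the key is present, and its value is one of the allowed strings. `validate_modality_sketch()` below is hypothetical, takes the allowed set as an argument (in practice that would come from `get_valid_modalities()`), and uses a placeholder "modality_y" alongside the documented "bulk_rna-seq".

```r
validate_modality_sketch <- function(query, allowed) {
  modality <- query[["modality"]]  # [[ ]] avoids partial name matching
  if (is.null(modality)) {
    stop("The query is missing the 'modality' key.")
  }
  if (!is.character(modality) || length(modality) != 1L ||
      !(modality %in% allowed)) {
    stop("Invalid modality '", paste(modality, collapse = ", "),
         "'. Allowed modalities: ", paste(allowed, collapse = ", "))
  }
  invisible(TRUE)
}

allowed <- c("bulk_rna-seq", "modality_y")  # hypothetical allowed set
validate_modality_sketch(list(modality = "bulk_rna-seq"), allowed)
err <- tryCatch(
  validate_modality_sketch(list(modality = "unsupported_modality"), allowed),
  error = conditionMessage
)
err
```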

Validate Query Structure

Description

Validates the structure and contents of the query based on the v2.0 model. This function checks that the query is a list and contains all required keys.

Usage

validate_query(query)

Arguments

query

A list containing the query data.

Value

Invisibly returns TRUE if validation passes. Throws an error if the query structure is invalid or missing required keys.

Examples

# Create a valid query
query <- get_valid_query()
validate_query(query) # Passes validation

# Example with invalid query (missing required key)
## Not run: 
invalid_query <- list(inputs = list(), mode = "mean estimation")
validate_query(invalid_query) # Throws error for missing modality

## End(Not run)
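A structural check like this can be sketched with `is.list()` and `setdiff()` against a set of required names. The required keys are not listed on this page, so `validate_query_sketch()` (a hypothetical helper) assumes `inputs`, `mode`, and `modality`, the three keys that appear in the examples; the package's actual list may differ.

```r
validate_query_sketch <- function(query,
                                  required = c("inputs", "mode", "modality")) {
  if (!is.list(query)) {
    stop("The query must be a list.")
  }
  # Any required name absent from the query is reported together
  missing_keys <- setdiff(required, names(query))
  if (length(missing_keys) > 0) {
    stop("The query is missing required keys: ",
         paste(missing_keys, collapse = ", "))
  }
  invisible(TRUE)
}

invalid_query <- list(inputs = list(), mode = "mean estimation")
msg <- tryCatch(validate_query_sketch(invalid_query), error = conditionMessage)
msg  # "The query is missing required keys: modality"
```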