Help for package tfaddons

Type:

Package

Title:

Interface to 'TensorFlow SIG Addons'

Version:

0.10.0

Maintainer:

Turgut Abdullayev <turqut.a.314@gmail.com>

Description:

'TensorFlow SIG Addons' <https://www.tensorflow.org/addons> is a repository of community contributions that conform to well-established API patterns but implement new functionality not available in core 'TensorFlow'. 'TensorFlow' natively supports a large number of operators, layers, metrics, losses, optimizers, and more. However, in a fast-moving field like machine learning, there are many interesting new developments that cannot be integrated into core 'TensorFlow' (because their broad applicability is not yet clear, or because they are mostly used by a smaller subset of the community).

License:

Apache License 2.0

URL:

https://github.com/henry090/tfaddons

BugReports:

https://github.com/henry090/tfaddons/issues

SystemRequirements:

TensorFlow >= 2.0 (https://www.tensorflow.org/)

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.1.0

Imports:

reticulate, tensorflow, rstudioapi, keras, purrr

Suggests:

knitr, rmarkdown, testthat, dplyr

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2020-05-25 15:30:15 UTC; turgutabdullayev

Author:

Turgut Abdullayev [aut, cre]

Repository:

CRAN

Date/Publication:

2020-06-02 08:50:04 UTC

Gelu

Description

Gaussian Error Linear Unit.

Usage

activation_gelu(x, approximate = TRUE)

Arguments

x

A 'Tensor'. Must be one of the following types: 'float16', 'float32', 'float64'.

approximate

bool, whether to enable approximation.

Details

Computes the Gaussian error linear unit: '0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))' when approximation is enabled, or 'x * P(X <= x) = 0.5 * x * (1 + erf(x / sqrt(2)))', where X ~ N(0, 1), otherwise. See [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415) and [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
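
The two formulas can be compared numerically without TensorFlow. The following is a minimal base-R sketch, not the package implementation; 'gelu_ref' is a hypothetical helper name, and 'pnorm' stands in for 'P(X <= x)' with X ~ N(0, 1).

```r
# Base-R reference for the two GELU variants (illustrative, not the package code).
gelu_ref <- function(x, approximate = TRUE) {
  if (approximate) {
    # tanh-based approximation
    0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
  } else {
    # exact form: pnorm(x) is P(X <= x) for X ~ N(0, 1)
    x * pnorm(x)
  }
}

x <- c(-2, -1, 0, 1, 2)
# Largest absolute difference between the two variants on this grid
gelu_gap <- max(abs(gelu_ref(x, TRUE) - gelu_ref(x, FALSE)))
print(gelu_gap)
```

On this small grid the two variants should differ only in the fourth decimal place or later, which is why the approximation is usually an acceptable default.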

Value

A 'Tensor'. Has the same type as 'x'.

Examples


## Not run: 
library(keras)
library(tfaddons)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 10, kernel_size = c(3, 3), input_shape = c(28, 28, 1),
                activation = activation_gelu)

## End(Not run)

Hardshrink

Description

Hard shrink function.

Usage

activation_hardshrink(x, lower = -0.5, upper = 0.5)

Arguments

x

A 'Tensor'. Must be one of the following types: 'float16', 'float32', 'float64'.

lower

'float', lower bound for setting values to zeros.

upper

'float', upper bound for setting values to zeros.

Details

Computes hard shrink function: 'x if x < lower or x > upper else 0'.
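
As a quick check of the rule above, here is a base-R sketch on plain numeric vectors. 'hardshrink_ref' is a hypothetical helper used only for illustration; it is not how the package computes the activation.

```r
# Keep values strictly outside [lower, upper]; set everything else to zero.
hardshrink_ref <- function(x, lower = -0.5, upper = 0.5) {
  ifelse(x < lower | x > upper, x, 0)
}

hard_out <- hardshrink_ref(c(-1, -0.5, 0.2, 0.5, 1))
print(hard_out)  # -1 0 0 0 1
```

Note that values exactly equal to 'lower' or 'upper' are zeroed, because both comparisons are strict.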

Value

A 'Tensor'. Has the same type as 'x'.

Examples


## Not run: 
library(keras)
library(tfaddons)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 10, kernel_size = c(3, 3), input_shape = c(28, 28, 1),
                activation = activation_hardshrink)

## End(Not run)

Lisht

Description

LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function.

Usage

activation_lisht(x)

Arguments

x

A 'Tensor'. Must be one of the following types: 'float16', 'float32', 'float64'.

Details

Computes the linearly scaled hyperbolic tangent (LiSHT): 'x * tanh(x)'. See [LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks](https://arxiv.org/abs/1901.05894).

Value

A 'Tensor'. Has the same type as 'x'.

Examples


## Not run: 
library(keras)
library(tfaddons)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 10, kernel_size = c(3, 3), input_shape = c(28, 28, 1),
                activation = activation_lisht)

## End(Not run)

Mish

Description

Mish: A Self Regularized Non-Monotonic Neural Activation Function.

Usage

activation_mish(x)

Arguments

x

A 'Tensor'. Must be one of the following types: 'float16', 'float32', 'float64'.

Details

Computes the mish activation: 'x * tanh(softplus(x))'. See [Mish: A Self Regularized Non-Monotonic Neural Activation Function](https://arxiv.org/abs/1908.08681).
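
To make the formula concrete, the sketch below evaluates it in base R, assuming the usual definition softplus(x) = log(1 + exp(x)). 'mish_ref' is a hypothetical helper for illustration only.

```r
# softplus(x) = log(1 + exp(x)); log1p keeps precision when exp(x) is tiny.
mish_ref <- function(x) {
  x * tanh(log1p(exp(x)))
}

mish_out <- mish_ref(c(-3, 0, 1, 3))
print(mish_out)
```

The function is zero at the origin, slightly negative for negative inputs, and approaches the identity for large positive inputs.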

Value

A 'Tensor'. Has the same type as 'x'.

Rrelu

Description

Randomized leaky rectified linear unit (RReLU) function.

Usage

activation_rrelu(
  x,
  lower = 0.125,
  upper = 0.333333333333333,
  training = NULL,
  seed = NULL
)

Arguments

x

A 'Tensor'. Must be one of the following types: 'float16', 'float32', 'float64'.

lower

'float', lower bound for random alpha.

upper

'float', upper bound for random alpha.

training

'bool', indicating whether the 'call' is meant for training or inference.

seed

'int', this sets the operation-level seed.

Details

Computes rrelu function: 'x if x > 0 else random(lower, upper) * x' or 'x if x > 0 else x * (lower + upper) / 2' depending on whether training is enabled. See [Empirical Evaluation of Rectified Activations in Convolutional Network](https://arxiv.org/abs/1505.00853).
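
The difference between the training and inference branches can be sketched in base R. This is an illustration under stated assumptions, not the package code: 'rrelu_ref' is a hypothetical helper, 'runif' stands in for 'random(lower, upper)', one slope is drawn per element, and 'training' defaults to FALSE here (the package default is NULL).

```r
rrelu_ref <- function(x, lower = 0.125, upper = 1 / 3, training = FALSE) {
  alpha <- if (training) {
    # training: one random slope per element, drawn uniformly from (lower, upper)
    runif(length(x), min = lower, max = upper)
  } else {
    # inference: the fixed average slope
    rep((lower + upper) / 2, length(x))
  }
  ifelse(x > 0, x, alpha * x)
}

rrelu_infer <- rrelu_ref(c(-2, 3))                   # deterministic
set.seed(1)
rrelu_train <- rrelu_ref(c(-2, 3), training = TRUE)  # random slope for -2
print(rrelu_infer)
print(rrelu_train)
```

Positive inputs pass through unchanged in both modes; only the slope applied to non-positive inputs differs.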

Value

A 'Tensor'. Has the same type as 'x'.

Softshrink

Description

Soft shrink function.

Usage

activation_softshrink(x, lower = -0.5, upper = 0.5)

Arguments

x

A 'Tensor'. Must be one of the following types: 'float16', 'float32', 'float64'.

lower

'float', lower bound for setting values to zeros.

upper

'float', upper bound for setting values to zeros.

Details

Computes soft shrink function: 'x - lower if x < lower, x - upper if x > upper else 0'.
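
A base-R sketch of the same rule on plain numeric vectors follows. 'softshrink_ref' is a hypothetical helper for illustration and is not the package implementation.

```r
# Shift values outside [lower, upper] towards zero; zero everything inside.
softshrink_ref <- function(x, lower = -0.5, upper = 0.5) {
  ifelse(x < lower, x - lower,
         ifelse(x > upper, x - upper, 0))
}

soft_out <- softshrink_ref(c(-1, -0.5, 0.2, 1))
print(soft_out)  # -0.5 0 0 0.5
```

Unlike the hard shrink, surviving values are moved towards zero by the corresponding bound, so the output is continuous at 'lower' and 'upper'.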

Value

A 'Tensor'. Has the same type as 'x'.

Sparsemax

Description

Sparsemax activation function [1].

Usage

activation_sparsemax(logits, axis = -1L)

Arguments

logits

Input tensor.

axis

Integer, axis along which the sparsemax operation is applied.

Details

For each batch 'i' and class 'j' we have 'sparsemax[i, j] = max(logits[i, j] - tau(logits[i, :]), 0)'. [1]: https://arxiv.org/abs/1602.02068
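
The threshold 'tau' is chosen so that the outputs of each row sum to one. The base-R sketch below follows the sort-based procedure commonly given for sparsemax in [1], applied to a single vector of logits. 'sparsemax_ref' is a hypothetical helper for illustration only and is not the package implementation.

```r
sparsemax_ref <- function(z) {
  z_sorted <- sort(z, decreasing = TRUE)
  k <- seq_along(z_sorted)
  cum <- cumsum(z_sorted)
  # Support size: the largest k with 1 + k * z_(k) > sum of the k largest logits
  k_z <- max(k[1 + k * z_sorted > cum])
  # Threshold that makes the clipped outputs sum to one
  tau <- (cum[k_z] - 1) / k_z
  pmax(z - tau, 0)
}

sparse_peaked <- sparsemax_ref(c(2, 1, 0.1))    # one dominant logit
sparse_flat <- sparsemax_ref(c(0.5, 0.3, 0.1))  # logits close together
print(sparse_peaked)
print(sparse_flat)
```

With a dominant logit all probability mass lands on a single class, which is the sparsity that distinguishes sparsemax from softmax; with closer logits every class keeps some mass.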

Value

Tensor, output of sparsemax transformation. Has the same type and shape as 'logits'.

Raises

ValueError: In case 'dim(logits) == 1'.

Tanhshrink

Description

Applies the element-wise function: x - tanh(x)

Usage

activation_tanhshrink(x)

Arguments

x

A 'Tensor'. Must be one of the following types: 'float16', 'float32', 'float64'.

Value

A 'Tensor'. Has the same type as 'x'.

Bahdanau Attention

Description

Implements Bahdanau-style (additive) attention

Usage

attention_bahdanau(
  object,
  units,
  memory = NULL,
  memory_sequence_length = NULL,
  normalize = FALSE,
  probability_fn = "softmax",
  kernel_initializer = "glorot_uniform",
  dtype = NULL,
  name = "BahdanauAttention",
  ...
)

Arguments

object

Model or layer object

units

The depth of the query mechanism.

memory

The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].

memory_sequence_length

(optional): Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.

normalize

boolean. Whether to normalize the energy term.

probability_fn

(optional) string, the name of the function used to convert the attention score to probabilities. The default is 'softmax', which is tf.nn.softmax. The other option is 'hardmax', which is hardmax() within this module. Any other value results in a validation error.

kernel_initializer

(optional), the name of the initializer for the attention kernel.

dtype

The data type for the query and memory layers of the attention mechanism.

name

Name to use when creating ops.

...

A list that contains other common arguments for layer creation.

Details

This attention has two forms. The first is Bahdanau attention, as described in: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate." ICLR 2015. https://arxiv.org/abs/1409.0473

The second is the normalized form. This form is inspired by the weight normalization article: Tim Salimans, Diederik P. Kingma. "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks." https://arxiv.org/abs/1602.07868

To enable the second form, construct the object with parameter 'normalize=TRUE'.

Value

None

Bahdanau Monotonic Attention

Description

Monotonic attention mechanism with Bahdanau-style energy function.

Usage

attention_bahdanau_monotonic(
  object,
  units,
  memory = NULL,
  memory_sequence_length = NULL,
  normalize = FALSE,
  sigmoid_noise = 0,
  sigmoid_noise_seed = NULL,
  score_bias_init = 0,
  mode = "parallel",
  kernel_initializer = "glorot_uniform",
  dtype = NULL,
  name = "BahdanauMonotonicAttention",
  ...
)

Arguments

object

Model or layer object

units

The depth of the query mechanism.

memory

The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].

memory_sequence_length

(optional): Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.

normalize

boolean. Whether to normalize the energy term.

sigmoid_noise

Standard deviation of pre-sigmoid noise. See the docstring for '_monotonic_probability_fn' for more information.

sigmoid_noise_seed

(optional) Random seed for pre-sigmoid noise.

score_bias_init

Initial value for score bias scalar. It's recommended to initialize this to a negative value when the length of the memory is large.

mode

How to compute the attention distribution. Must be one of 'recursive', 'parallel', or 'hard'. See the docstring for tfa.seq2seq.monotonic_attention for more information.

kernel_initializer

(optional), the name of the initializer for the attention kernel.

dtype

The data type for the query and memory layers of the attention mechanism.

name

Name to use when creating ops.

...

A list that contains other common arguments for layer creation.

Details

This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory it can't attend to any prior points at subsequent output timesteps. It achieves this by using the _monotonic_probability_fn instead of softmax to construct its attention distributions. Since the attention scores are passed through a sigmoid, a learnable scalar bias parameter is applied after the score function and before the sigmoid. Otherwise, it is equivalent to BahdanauAttention. This approach is proposed in:

Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017. https://arxiv.org/abs/1704.00784

Value

None

Implements Luong-style (multiplicative) attention scoring.

Description

Implements Luong-style (multiplicative) attention scoring.

Usage

attention_luong(
  object,
  units,
  memory = NULL,
  memory_sequence_length = NULL,
  scale = FALSE,
  probability_fn = "softmax",
  dtype = NULL,
  name = "LuongAttention",
  ...
)

Arguments

object

Model or layer object

units

The depth of the attention mechanism.

memory

The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].

memory_sequence_length

(optional): Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.

scale

boolean. Whether to scale the energy term.

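probability_fn

(optional) string, the name of the function used to convert the attention score to probabilities, as for attention_bahdanau. Defaults to 'softmax'.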

dtype

The data type for the memory layer of the attention mechanism.

name

Name to use when creating ops.

...

A list that contains other common arguments for layer creation.

Details

This attention has two forms. The first is standard Luong attention, as described in: Minh-Thang Luong, Hieu Pham, Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015. The second is the scaled form inspired partly by the normalized form of Bahdanau attention. To enable the second form, construct the object with parameter 'scale=TRUE'.

Value

None

Monotonic attention mechanism with Luong-style energy function.

Description

Monotonic attention mechanism with Luong-style energy function.

Usage

attention_luong_monotonic(
  object,
  units,
  memory = NULL,
  memory_sequence_length = NULL,
  scale = FALSE,
  sigmoid_noise = 0,
  sigmoid_noise_seed = NULL,
  score_bias_init = 0,
  mode = "parallel",
  dtype = NULL,
  name = "LuongMonotonicAttention",
  ...
)

Arguments

object

Model or layer object

units

The depth of the query mechanism.

memory

The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].

memory_sequence_length

(optional): Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.

scale

boolean. Whether to scale the energy term.

sigmoid_noise

Standard deviation of pre-sigmoid noise. See the docstring for '_monotonic_probability_fn' for more information.

sigmoid_noise_seed

(optional) Random seed for pre-sigmoid noise.

score_bias_init

Initial value for score bias scalar. It's recommended to initialize this to a negative value when the length of the memory is large.

mode

How to compute the attention distribution. Must be one of 'recursive', 'parallel', or 'hard'. See the docstring for tfa.seq2seq.monotonic_attention for more information.

dtype

The data type for the query and memory layers of the attention mechanism.

name

Name to use when creating ops.

...

A list that contains other common arguments for layer creation.

Details

This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory it can't attend to any prior points at subsequent output timesteps. It achieves this by using the _monotonic_probability_fn instead of softmax to construct its attention distributions. Otherwise, it is equivalent to LuongAttention. This approach is proposed in [Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017.](https://arxiv.org/abs/1704.00784)

Value

None

Monotonic attention

Description

Compute monotonic attention distribution from choosing probabilities.

Usage

attention_monotonic(p_choose_i, previous_attention, mode)

Arguments

p_choose_i

Probability of choosing input sequence/memory element i. Should be of shape (batch_size, input_sequence_length), and should all be in the range [0, 1].

previous_attention

The attention distribution from the previous output timestep. Should be of shape (batch_size, input_sequence_length). For the first output timestep, previous_attention[n] should be [1, 0, 0, ..., 0] for all n in [0, ... batch_size - 1].

mode

How to compute the attention distribution. Must be one of 'recursive', 'parallel', or 'hard'.
* 'recursive' uses tf$scan to recursively compute the distribution. This is slowest but is exact, general, and does not suffer from numerical instabilities.
* 'parallel' uses parallelized cumulative-sum and cumulative-product operations to compute a closed-form solution to the recurrence relation defining the attention distribution. This makes it more efficient than 'recursive', but it requires numerical checks which make the distribution non-exact. This can be a problem in particular when input_sequence_length is long and/or p_choose_i has entries very close to 0 or 1.
* 'hard' requires that the probabilities in p_choose_i are all either 0 or 1, and subsequently uses a more efficient and exact solution.

Details

Monotonic attention implies that the input sequence is processed in an explicitly left-to-right manner when generating the output sequence. In addition, once an input sequence element is attended to at a given output timestep, elements occurring before it cannot be attended to at subsequent output timesteps. This function generates attention distributions according to these assumptions. For more information, see 'Online and Linear-Time Attention by Enforcing Monotonic Alignments'.
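
The recurrence behind the 'recursive' mode can be sketched in base R for a single sequence (no batch dimension). The sketch assumes the recurrence from the cited paper, q[i] = (1 - p_choose_i[i - 1]) * q[i - 1] + previous_attention[i] and attention[i] = p_choose_i[i] * q[i]. 'monotonic_attention_ref' is a hypothetical helper and not the package implementation.

```r
monotonic_attention_ref <- function(p_choose_i, previous_attention) {
  n <- length(p_choose_i)
  q <- numeric(n)
  carry <- 0
  for (i in seq_len(n)) {
    # Probability of not having stopped at the previous memory position
    keep <- if (i == 1) 1 else 1 - p_choose_i[i - 1]
    carry <- keep * carry + previous_attention[i]
    q[i] <- carry
  }
  p_choose_i * q
}

# First output timestep: previous attention is concentrated on the first entry
soft_attention <- monotonic_attention_ref(c(0.5, 0.5, 0.5), c(1, 0, 0))
hard_attention <- monotonic_attention_ref(c(0, 1, 0), c(1, 0, 0))
print(soft_attention)  # 0.5 0.25 0.125
print(hard_attention)  # 0 1 0
```

In the first call the result sums to 0.875 rather than 1; the remaining mass corresponds to the case where no memory element is chosen. With probabilities restricted to 0 and 1, as in the second call, the result is a one-hot vector.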

Value

A tensor of shape (batch_size, input_sequence_length) representing the attention distributions for each sequence in the batch.

Raises

ValueError: mode is not one of 'recursive', 'parallel', 'hard'.

Attention Wrapper

Description

Attention Wrapper

Usage

attention_wrapper(
  object,
  cell,
  attention_mechanism,
  attention_layer_size = NULL,
  alignment_history = FALSE,
  cell_input_fn = NULL,
  output_attention = TRUE,
  initial_cell_state = NULL,
  name = NULL,
  attention_layer = NULL,
  attention_fn = NULL,
  ...
)

Arguments

object

Model or layer object

cell

An instance of RNNCell.

attention_mechanism

A list of AttentionMechanism instances or a single instance.

attention_layer_size

A list of integers or a single integer, the depth of the attention (output) layer(s). If 'NULL' (default), use the context as attention at each time step. Otherwise, feed the context and cell output into the attention layer to generate attention at each time step. If attention_mechanism is a list, attention_layer_size must be a list of the same length. If attention_layer is set, this must be 'NULL'. If attention_fn is set, it must be guaranteed that the outputs of 'attention_fn' also meet the above requirements.

alignment_history

boolean, whether to store alignment history from all time steps in the final output state (currently stored as a time major TensorArray on which you must call stack()).

cell_input_fn

(optional) A callable. The default is equivalent to function(inputs, attention) tf$concat(list(inputs, attention), -1L), that is, the inputs and the attention are concatenated along the last axis.

output_attention

boolean. If TRUE (default), the output at each time step is the attention value. This is the behavior of Luong-style attention mechanisms. If FALSE, the output at each time step is the output of cell. This is the behavior of Bahdanau-style attention mechanisms. In both cases, the attention tensor is propagated to the next time step via the state and is used there. This flag only controls whether the attention mechanism is propagated up to the next cell in an RNN stack or to the top RNN output.

initial_cell_state

The initial state value to use for the cell when the user calls get_initial_state(). Note that if this value is provided now, and the user uses a batch_size argument of get_initial_state which does not match the batch size of initial_cell_state, proper behavior is not guaranteed.

name

Name to use when creating ops.

attention_layer

A list of tf$keras$layers$Layer instances or a single tf$keras$layers$Layer instance taking the context and cell output as inputs to generate attention at each time step. If 'NULL' (default), use the context as attention at each time step. If attention_mechanism is a list, attention_layer must be a list of the same length. If attention_layer_size is set, this must be 'NULL'.

attention_fn

An optional callable function that allows users to provide their own customized attention function, which takes input (attention_mechanism, cell_output, attention_state, attention_layer) and outputs (attention, alignments, next_attention_state). If provided, the attention_layer_size should be the size of the outputs of attention_fn.

...

Other keyword arguments to pass

Value

None

Note

If you are using the 'decoder_beam_search' with a cell wrapped in 'AttentionWrapper', then you must ensure that:
- The encoder output has been tiled to 'beam_width' via 'tile_batch' (NOT 'tf$tile').
- The 'batch_size' argument passed to the 'get_initial_state' method of this wrapper is equal to 'true_batch_size * beam_width'.
- The initial state created with 'get_initial_state' above contains a 'cell_state' value containing properly tiled final state from the encoder.

Attention Wrapper State

Description

'namedlist' storing the state of an 'attention_wrapper'.

Usage

attention_wrapper_state(
  object,
  cell_state,
  attention,
  alignments,
  alignment_history,
  attention_state
)

Arguments

object

Model or layer object

cell_state

The state of the wrapped RNNCell at the previous time step.

attention

The attention emitted at the previous time step.

alignments

A single or tuple of Tensor(s) containing the alignments emitted at the previous time step for each attention mechanism.

alignment_history

(if enabled) a single or tuple of TensorArray(s) containing alignment matrices from all time steps for each attention mechanism. Call stack() on each to convert to a Tensor.

attention_state

A single or tuple of nested objects containing attention mechanism state for each attention mechanism. The objects may contain Tensors or TensorArrays.

Value

None

Average Model Checkpoint

Description

Save the model after every epoch.

Usage

callback_average_model_checkpoint(
  filepath,
  update_weights,
  monitor = "val_loss",
  verbose = 0,
  save_best_only = FALSE,
  save_weights_only = FALSE,
  mode = "auto",
  save_freq = "epoch",
  ...
)

Arguments

filepath

string, path to save the model file.

update_weights

bool, whether to update weights or not.

monitor

quantity to monitor.

verbose

verbosity mode, 0 or 1.

save_best_only

if 'save_best_only=TRUE', the latest best model according to the quantity monitored will not be overwritten. If 'filepath' doesn't contain formatting options like '{epoch}' then 'filepath' will be overwritten by each new better model.

save_weights_only

if TRUE, then only the model's weights will be saved ('model$save_weights(filepath)'), else the full model is saved ('model$save(filepath)').

mode

one of auto, min, max. If 'save_best_only=TRUE', the decision to overwrite the current save file is made based on either the maximization or the minimization of the monitored quantity. For 'val_acc', this should be 'max', for 'val_loss' this should be 'min', etc. In 'auto' mode, the direction is automatically inferred from the name of the monitored quantity.

save_freq

'epoch' or integer. When using 'epoch', the callback saves the model after each epoch. When using an integer, the callback saves the model at the end of a batch at which this many samples have been seen since the last save. Note that if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable (it could reflect as little as 1 batch, since the metrics get reset every epoch). Defaults to 'epoch'.

...

Additional arguments for backwards compatibility. Possible key is 'period'.

Details

The callback that should be used with optimizers that extend AverageWrapper, i.e., MovingAverage and StochasticAverage optimizers. It saves and, optionally, assigns the averaged weights.

Value

None

For example

If 'filepath' is 'weights.{epoch:02d}-{val_loss:.2f}.hdf5', then the model checkpoints will be saved with the epoch number and the validation loss in the filename.
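
To see what such a template produces, the sketch below mimics the expansion of 'weights.{epoch:02d}-{val_loss:.2f}.hdf5' with base-R 'sprintf'. The callback itself expands the Python-style placeholders; 'checkpoint_name' is a hypothetical helper used only to preview the resulting file name.

```r
# {epoch:02d} -> zero-padded two-digit integer, {val_loss:.2f} -> two decimals
checkpoint_name <- function(epoch, val_loss) {
  sprintf("weights.%02d-%.2f.hdf5", as.integer(epoch), val_loss)
}

ckpt <- checkpoint_name(3, 0.4567)
print(ckpt)  # "weights.03-0.46.hdf5"
```

Because the epoch number and metric are part of the name, each save goes to a distinct file instead of overwriting the previous one.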

Time Stopping

Description

Time Stopping

Usage

callback_time_stopping(seconds = 86400, verbose = 0)

Arguments

seconds

maximum amount of time before stopping. Defaults to 86400 (1 day).

verbose

verbosity mode. Defaults to 0.

Details

Stop training when a specified amount of time has passed.

Value

None

Examples


## Not run: 
model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 4,
  validation_split = 0.2,
  verbose = 0,
  callbacks = callback_time_stopping(seconds = 6, verbose = 1)
)

## End(Not run)

TQDM Progress Bar

Description

TQDM Progress Bar

Usage

callback_tqdm_progress_bar(
  metrics_separator = " - ",
  overall_bar_format = NULL,
  epoch_bar_format = "{n_fmt}/{total_fmt}{bar} ETA: {remaining}s - {desc}",
  update_per_second = 10,
  leave_epoch_progress = TRUE,
  leave_overall_progress = TRUE,
  show_epoch_progress = TRUE,
  show_overall_progress = TRUE
)

Arguments

metrics_separator

(string) Custom separator between metrics. Defaults to ' - '

overall_bar_format

(string format) Custom bar format for overall (outer) progress bar, see https://github.com/tqdm/tqdm#parameters for more detail. By default: '{l_bar}{bar} {n_fmt}/{total_fmt} ETA: {remaining}s, {rate_fmt}{postfix}'

epoch_bar_format

(string format) Custom bar format for epoch (inner) progress bar, see https://github.com/tqdm/tqdm#parameters for more detail.

update_per_second

(int) Maximum number of updates in the epochs bar per second, this is to prevent small batches from slowing down training. Defaults to 10.

leave_epoch_progress

(bool) TRUE to leave epoch progress bars

leave_overall_progress

(bool) TRUE to leave overall progress bar

show_epoch_progress

(bool) FALSE to hide epoch progress bars

show_overall_progress

(bool) FALSE to hide overall progress bar

Details

TQDM Progress Bar for TensorFlow Keras.

Value

None

Examples


## Not run: 
model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 4,
  validation_split = 0.2,
  verbose = 0,
  callbacks = callback_tqdm_progress_bar()
)

## End(Not run)

CRF binary score

Description

Computes the binary scores of tag sequences.

Usage

crf_binary_score(tag_indices, sequence_lengths, transition_params)

Arguments

tag_indices

A [batch_size, max_seq_len] matrix of tag indices.

sequence_lengths

A [batch_size] vector of true sequence lengths.

transition_params

A [num_tags, num_tags] matrix of binary potentials.

Value

binary_scores: A [batch_size] vector of binary scores.

CRF decode

Description

Decode the highest scoring sequence of tags.

Usage

crf_decode(potentials, transition_params, sequence_length)

Arguments

potentials

A [batch_size, max_seq_len, num_tags] tensor of unary potentials.

transition_params

A [num_tags, num_tags] matrix of binary potentials.

sequence_length

A [batch_size] vector of true sequence lengths.

Value

decode_tags: A [batch_size, max_seq_len] matrix, with dtype 'tf.int32'. Contains the highest scoring tag indices.

best_score: A [batch_size] vector, containing the score of 'decode_tags'.
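
Decoding the highest scoring tag sequence is typically done with the Viterbi algorithm: a forward pass that records backpointers, followed by a backward pass that follows them. The base-R sketch below illustrates this for a single sequence. It is not the package implementation; 'crf_decode_ref' is a hypothetical helper, and tags are 1-based here because R indexes from 1 (the TensorFlow functions use 0-based tag indices).

```r
crf_decode_ref <- function(potentials, transition_params) {
  n_steps <- nrow(potentials)
  n_tags <- ncol(potentials)
  score <- potentials[1, ]
  backpointers <- matrix(0L, n_steps, n_tags)
  # Forward pass: best score of any path ending in each tag
  for (t in seq_len(n_steps)[-1]) {
    # candidates[from, to] = score[from] + transition_params[from, to]
    candidates <- score + transition_params
    backpointers[t, ] <- apply(candidates, 2, which.max)
    score <- apply(candidates, 2, max) + potentials[t, ]
  }
  # Backward pass: follow the backpointers from the best final tag
  tags <- integer(n_steps)
  tags[n_steps] <- which.max(score)
  for (t in rev(seq_len(n_steps)[-1])) {
    tags[t - 1] <- backpointers[t, tags[t]]
  }
  list(decode_tags = tags, best_score = max(score))
}

potentials <- matrix(c(1, 2,
                       3, 4,
                       5, 6), nrow = 3, byrow = TRUE)  # 3 steps, 2 tags
transition_params <- matrix(c(0.1, 0.2,
                              0.3, 0.4), nrow = 2, byrow = TRUE)
decoded <- crf_decode_ref(potentials, transition_params)
print(decoded$decode_tags)  # 2 2 2
print(decoded$best_score)   # 12.8
```

In this toy example the second tag has the higher unary potential at every step and the highest self-transition, so the best path stays on tag 2 throughout: 2 + 4 + 6 from the potentials plus 0.4 + 0.4 from the transitions.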

CRF decode backward

Description

Computes backward decoding in a linear-chain CRF.

Usage

crf_decode_backward(inputs, state)

Arguments

inputs

A [batch_size, num_tags] matrix of backpointers of the next step (in time order).

state

A [batch_size, 1] matrix of tag index of next step.

Value

new_tags: A [batch_size, num_tags] tensor containing the new tag indices.

CRF decode forward

Description

Computes forward decoding in a linear-chain CRF.

Usage

crf_decode_forward(inputs, state, transition_params, sequence_lengths)

Arguments

inputs

A [batch_size, num_tags] matrix of unary potentials.

state

A [batch_size, num_tags] matrix containing the previous step's score values.

transition_params

A [num_tags, num_tags] matrix of binary potentials.

sequence_lengths

A [batch_size] vector of true sequence lengths.

Value

backpointers: A [batch_size, num_tags] matrix of backpointers.

new_state: A [batch_size, num_tags] matrix of new score values.

CRF forward

Description

Computes the alpha values in a linear-chain CRF.

Usage

crf_forward(inputs, state, transition_params, sequence_lengths)

Arguments

inputs

A [batch_size, num_tags] matrix of unary potentials.

state

A [batch_size, num_tags] matrix containing the previous alpha values.

transition_params

A [num_tags, num_tags] matrix of binary potentials. This matrix is expanded into a [1, num_tags, num_tags] in preparation for the broadcast summation occurring within the cell.

sequence_lengths

A [batch_size] vector of true sequence lengths.

Details

See http://www.cs.columbia.edu/~mcollins/fb.pdf for reference.

Value

new_alphas: A [batch_size, num_tags] matrix containing the new alpha values.

CRF log likelihood

Description

Computes the log-likelihood of tag sequences in a CRF.

Usage

crf_log_likelihood(
  inputs,
  tag_indices,
  sequence_lengths,
  transition_params = NULL
)

Arguments

inputs

A [batch_size, max_seq_len, num_tags] tensor of unary potentials to use as input to the CRF layer.

tag_indices

A [batch_size, max_seq_len] matrix of tag indices for which we compute the log-likelihood.

sequence_lengths

A [batch_size] vector of true sequence lengths.

transition_params

A [num_tags, num_tags] transition matrix, if available.

Value

log_likelihood: A [batch_size] Tensor containing the log-likelihood of each example, given the sequence of tag indices.

transition_params: A [num_tags, num_tags] transition matrix. This is either provided by the caller or created in this function.

CRF log norm

Description

Computes the normalization for a CRF.

Usage

crf_log_norm(inputs, sequence_lengths, transition_params)

Arguments

inputs

A [batch_size, max_seq_len, num_tags] tensor of unary potentials to use as input to the CRF layer.

sequence_lengths

A [batch_size] vector of true sequence lengths.

transition_params

A [num_tags, num_tags] transition matrix.

Value

log_norm: A [batch_size] vector of normalizers for a CRF.
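
The normalizer is the log of the summed exponentiated scores of all possible tag sequences. It is usually computed with the forward (alpha) recursion rather than by enumeration. The base-R sketch below shows both for a single sequence and checks that they agree. It is an illustration only; 'log_sum_exp', 'crf_log_norm_ref' and 'brute_force_log_norm' are hypothetical helpers, not package functions.

```r
log_sum_exp <- function(v) {
  m <- max(v)
  m + log(sum(exp(v - m)))
}

# Forward recursion: alphas[j] is the log-sum of scores of all paths ending in tag j
crf_log_norm_ref <- function(inputs, transition_params) {
  alphas <- inputs[1, ]
  for (t in seq_len(nrow(inputs))[-1]) {
    # scores[from, to] = alphas[from] + transition_params[from, to]
    scores <- alphas + transition_params
    alphas <- apply(scores, 2, log_sum_exp) + inputs[t, ]
  }
  log_sum_exp(alphas)
}

# Enumerate every tag sequence; only feasible for tiny problems
brute_force_log_norm <- function(inputs, transition_params) {
  n_steps <- nrow(inputs)
  n_tags <- ncol(inputs)
  paths <- as.matrix(expand.grid(rep(list(seq_len(n_tags)), n_steps)))
  scores <- apply(paths, 1, function(tags) {
    unary <- sum(inputs[cbind(seq_len(n_steps), tags)])
    binary <- sum(transition_params[cbind(tags[-n_steps], tags[-1])])
    unary + binary
  })
  log_sum_exp(scores)
}

inputs <- matrix(c(1, 2,
                   3, 4,
                   5, 6), nrow = 3, byrow = TRUE)  # 3 steps, 2 tags
transition_params <- matrix(c(0.1, 0.2,
                              0.3, 0.4), nrow = 2, byrow = TRUE)
log_norm_forward <- crf_log_norm_ref(inputs, transition_params)
log_norm_brute <- brute_force_log_norm(inputs, transition_params)
print(c(log_norm_forward, log_norm_brute))
```

The recursion needs work proportional to max_seq_len * num_tags^2, whereas enumeration grows as num_tags^max_seq_len, which is why the forward form is the practical choice.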

CRF multitag sequence score

Description

Computes the unnormalized score of all tag sequences matching tag_bitmap.

Usage

crf_multitag_sequence_score(
  inputs,
  tag_bitmap,
  sequence_lengths,
  transition_params
)

Arguments

inputs

A [batch_size, max_seq_len, num_tags] tensor of unary potentials to use as input to the CRF layer.

tag_bitmap

A [batch_size, max_seq_len, num_tags] boolean tensor representing all active tags at each index for which to calculate the unnormalized score.

sequence_lengths

A [batch_size] vector of true sequence lengths.

transition_params

A [num_tags, num_tags] transition matrix.

Details

tag_bitmap enables more than one tag to be considered correct at each time step. This is useful when an observed output at a given time step is consistent with more than one tag, and thus the log likelihood of that observation must take into account all possible consistent tags. Using one-hot vectors in tag_bitmap gives results identical to crf_sequence_score.

Value

sequence_scores: A [batch_size] vector of unnormalized sequence scores.

CRF sequence score

Description

Computes the unnormalized score for a tag sequence.

Usage

crf_sequence_score(inputs, tag_indices, sequence_lengths, transition_params)

Arguments

inputs

A [batch_size, max_seq_len, num_tags] tensor of unary potentials to use as input to the CRF layer.

tag_indices

A [batch_size, max_seq_len] matrix of tag indices for which we compute the unnormalized score.

sequence_lengths

A [batch_size] vector of true sequence lengths.

transition_params

A [num_tags, num_tags] transition matrix.

Value

sequence_scores: A [batch_size] vector of unnormalized sequence scores.
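
The unnormalized score of a tag sequence is the sum of its unary potentials and the transition (binary) potentials between consecutive tags. The base-R sketch below computes it for a single sequence. 'crf_sequence_score_ref' is a hypothetical helper for illustration, and tags are 1-based here because R indexes from 1 (the TensorFlow functions use 0-based tag indices).

```r
crf_sequence_score_ref <- function(inputs, tags, transition_params) {
  steps <- seq_along(tags)
  # Unary part: potential of the chosen tag at each step
  unary <- sum(inputs[cbind(steps, tags)])
  # Binary part: transition potential between each pair of consecutive tags
  binary <- sum(transition_params[cbind(tags[-length(tags)], tags[-1])])
  unary + binary
}

inputs <- matrix(c(1, 2,
                   3, 4,
                   5, 6), nrow = 3, byrow = TRUE)  # 3 steps, 2 tags
transition_params <- matrix(c(0.1, 0.2,
                              0.3, 0.4), nrow = 2, byrow = TRUE)
seq_score <- crf_sequence_score_ref(inputs, c(1, 2, 2), transition_params)
print(seq_score)  # 11.6
```

For the tag sequence (1, 2, 2) the unary part is 1 + 4 + 6 = 11 and the binary part is 0.2 + 0.4 = 0.6.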

CRF unary score

Description

Computes the unary scores of tag sequences.

Usage

crf_unary_score(tag_indices, sequence_lengths, inputs)

Arguments

tag_indices

A [batch_size, max_seq_len] matrix of tag indices.

sequence_lengths

A [batch_size] vector of true sequence lengths.

inputs

A [batch_size, max_seq_len, num_tags] tensor of unary potentials.

Value

unary_scores: A [batch_size] vector of unary scores.

Dynamic decode

Description

Perform dynamic decoding with 'decoder'.

Usage

decode_dynamic(
  decoder,
  output_time_major = FALSE,
  impute_finished = FALSE,
  maximum_iterations = NULL,
  parallel_iterations = 32L,
  swap_memory = FALSE,
  training = NULL,
  scope = NULL,
  ...
)

Arguments

decoder

A 'Decoder' instance.

output_time_major

boolean. Default: 'FALSE' (batch major). If 'TRUE', outputs are returned as time major tensors (this mode is faster). Otherwise, outputs are returned as batch major tensors (this adds extra time to the computation).

impute_finished

boolean. If 'TRUE', then states for batch entries which are marked as finished get copied through and the corresponding outputs get zeroed out. This causes some slowdown at each time step, but ensures that the final state and outputs have the correct values and that backprop ignores time steps that were marked as finished.

maximum_iterations

'int32' scalar, maximum allowed number of decoding steps. Default is 'NULL' (decode until the decoder is fully done).

parallel_iterations

Argument passed to 'tf$while_loop'.

swap_memory

Argument passed to 'tf$while_loop'.

training

boolean. Indicates whether the layer should behave in training mode or in inference mode. Only relevant when 'dropout' or 'recurrent_dropout' is used.

scope

Optional variable scope to use.

...

A list, other keyword arguments for dynamic_decode. It might contain arguments for 'BaseDecoder' to initialize, which takes all tensor inputs during 'call()'.

Details

Calls 'initialize()' once and 'step()' repeatedly on the Decoder object.

Value

'(final_outputs, final_state, final_sequence_lengths)'.

Raises

TypeError: if 'decoder' is not an instance of 'Decoder'. ValueError: if 'maximum_iterations' is provided but is not a scalar.
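
Examples

The control flow described in Details (one call to 'initialize()', then repeated calls to 'step()' until the decoder reports that it is finished or 'maximum_iterations' is reached) can be sketched in base R. The toy decoder below is a plain list of closures and is purely hypothetical; it stands in for a real 'Decoder' object and handles a single sequence only.

```r
# Hypothetical toy decoder: finishes after three steps.
toy_decoder <- list(
  initialize = function() list(finished = FALSE, state = 0),
  step = function(time, state) {
    new_state <- state + 1
    list(output = new_state * 10, state = new_state, finished = new_state >= 3)
  }
)

# Sketch of the decoding loop; not the package implementation.
decode_sketch <- function(decoder, maximum_iterations = NULL) {
  init <- decoder$initialize()            # called once
  state <- init$state
  finished <- init$finished
  outputs <- c()
  time <- 0
  while (!finished &&
         (is.null(maximum_iterations) || time < maximum_iterations)) {
    res <- decoder$step(time, state)      # called repeatedly
    outputs <- c(outputs, res$output)
    state <- res$state
    finished <- res$finished
    time <- time + 1
  }
  list(final_outputs = outputs,
       final_state = state,
       final_sequence_lengths = time)
}

full <- decode_sketch(toy_decoder)
capped <- decode_sketch(toy_decoder, maximum_iterations = 2)
```

With 'maximum_iterations = NULL' the loop runs until the decoder is done; with a cap of 2 it stops early and returns the partial outputs.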

An RNN Decoder abstract interface object.

Description

An RNN Decoder abstract interface object.

Usage

decoder(...)

Arguments

...

arguments to pass

Details

- inputs: (structure of) tensors and TensorArrays that is passed as input to the RNNCell composing the decoder, at each time step.
- state: (structure of) tensors and TensorArrays that is passed to the RNNCell instance as the state.
- finished: boolean tensor telling whether each sequence in the batch is finished.
- training: boolean whether it should behave in training mode or in inference mode.
- outputs: Instance of BasicDecoderOutput. Result of the decoding, at each time step.

Value

None

Base Decoder

Description

An RNN Decoder that is based on a Keras layer.

Usage

decoder_base(object, cell, sampler, output_layer = NULL, ...)

Arguments

object

Model or layer object

cell

An RNNCell instance.

sampler

A Sampler instance.

output_layer

(Optional) An instance of tf$keras$layers$Layer, e.g., tf$keras$layers$Dense. Layer to apply to the RNN output prior to storing the result or sampling.

...

Other keyword arguments for layer creation.

Value

None

Basic Decoder

Description

Basic Decoder

Usage

decoder_basic(object, cell, sampler, output_layer = NULL, ...)

Arguments

object

Model or layer object

cell

An RNNCell instance.

sampler

A Sampler instance.

output_layer

(Optional) An instance of tf$keras$layers$Layer, e.g., tf$keras$layers$Dense. Layer to apply to the RNN output prior to storing the result or sampling.

...

Other keyword arguments for layer creation.

Value

None

Basic decoder output

Description

Basic decoder output

Usage

decoder_basic_output(rnn_output, sample_id)

Arguments

rnn_output

the output of RNN cell

sample_id

the 'id' of the sample

Value

None

BeamSearch sampling decoder

Description

BeamSearch sampling decoder

Usage

decoder_beam_search(
  object,
  cell,
  beam_width,
  embedding_fn = NULL,
  output_layer = NULL,
  length_penalty_weight = 0,
  coverage_penalty_weight = 0,
  reorder_tensor_arrays = TRUE,
  ...
)

Arguments

object

Model or layer object

cell

An RNNCell instance.

beam_width

integer, the number of beams.

embedding_fn

A callable that takes a vector tensor of ids (argmax ids).

output_layer

(Optional) An instance of tf$keras$layers$Layer, e.g., tf$keras$layers$Dense. Layer to apply to the RNN output prior to storing the result or sampling.

length_penalty_weight

Float weight to penalize length. Disabled with 0.0.

coverage_penalty_weight

Float weight to penalize the coverage of source sentence. Disabled with 0.0.

reorder_tensor_arrays

If 'TRUE', the elements of TensorArrays within the cell state will be reordered according to the beam search path. If the TensorArray can be reordered, the stacked form will be returned. Otherwise, the TensorArray will be returned as is. Set this flag to 'FALSE' if the cell state contains TensorArrays that are not amenable to reordering.

...

A list, other keyword arguments for initialization.

Value

None

Note

If you are using the 'BeamSearchDecoder' with a cell wrapped in 'AttentionWrapper', then you must ensure that:
- The encoder output has been tiled to 'beam_width' via 'tile_batch()' (NOT 'tf$tile').
- The 'batch_size' argument passed to the 'get_initial_state' method of this wrapper is equal to 'true_batch_size * beam_width'.
- The initial state created with 'get_initial_state' above contains a 'cell_state' value containing properly tiled final state from the encoder.

Beam Search Decoder Output

Description

Beam Search Decoder Output

Usage

decoder_beam_search_output(scores, predicted_ids, parent_ids)

Arguments

scores

The scores for each beam.

predicted_ids

The final prediction. A tensor of shape '[batch_size, T, beam_width]' (or '[T, batch_size, beam_width]' if 'output_time_major' is 'TRUE'). Beams are ordered from best to worst.

parent_ids

The parent ids of shape '[max_time, batch_size, beam_width]'.

Value

None

Beam Search Decoder State

Description

Beam Search Decoder State

Usage

decoder_beam_search_state(
  cell_state,
  log_probs,
  finished,
  lengths,
  accumulated_attention_probs
)

Arguments

cell_state

cell_state

log_probs

log_probs

finished

finished

lengths

lengths

accumulated_attention_probs

accumulated_attention_probs

Value

None

Final Beam Search Decoder Output

Description

Final outputs returned by the beam search after all decoding is finished.

Usage

decoder_final_beam_search_output(predicted_ids, beam_search_decoder_output)

Arguments

predicted_ids

The final prediction. A tensor of shape '[batch_size, T, beam_width]' (or '[T, batch_size, beam_width]' if 'output_time_major' is TRUE). Beams are ordered from best to worst.

beam_search_decoder_output

An instance of 'BeamSearchDecoderOutput' that describes the state of the beam search.

Value

None

Factory function returning an optimizer class with decoupled weight decay

Description

Factory function returning an optimizer class with decoupled weight decay

Usage

extend_with_decoupled_weight_decay(base_optimizer)

Arguments

base_optimizer

An optimizer class that inherits from tf$optimizers$Optimizer.

Details

The API of the new optimizer class slightly differs from the API of the base optimizer:

- The first argument to the constructor is the weight decay rate.
- 'minimize' and 'apply_gradients' accept the optional keyword argument 'decay_var_list', which specifies the variables that should be decayed. If NULL, all variables that are optimized are decayed.

Value

A new optimizer class that inherits from DecoupledWeightDecayExtension and base_optimizer.

Note

Note: this extension decays weights BEFORE applying the update based on the gradient, i.e. this extension only has the desired behaviour for optimizers which do not depend on the value of 'var' in the update step! Note: when applying a decay to the learning rate, be sure to manually apply the decay to the 'weight_decay' as well.

Examples


## Not run: 

### MyAdamW is a new class
MyAdamW = extend_with_decoupled_weight_decay(tf$keras$optimizers$Adam)
### Create a MyAdamW object
optimizer = MyAdamW(weight_decay = 0.001, learning_rate = 0.001)
#### update var1, var2 but only decay var1
optimizer$minimize(loss, var_list = list(var1, var2), decay_var_list = list(var1))


## End(Not run)
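
As a rough illustration of the Note above, the following base R sketch applies one plain SGD step with decoupled weight decay: the variable is first shrunk by 'weight_decay * var', and only then is the gradient update applied. The function name 'sgd_with_decoupled_decay' is hypothetical, and plain SGD is used here only because it is the simplest base update; this is not the package implementation.

```r
# One SGD step with decoupled weight decay (sketch under stated assumptions).
sgd_with_decoupled_decay <- function(var, grad, learning_rate, weight_decay) {
  # Decay is applied first and does not pass through the gradient
  decayed <- var - weight_decay * var
  # Then the ordinary gradient update of the base optimizer (here: SGD)
  decayed - learning_rate * grad
}

w <- c(1, -2)
g <- c(0.5, 0.5)
w_new <- sgd_with_decoupled_decay(w, g, learning_rate = 0.1, weight_decay = 0.01)
```

Because the decay term is separate from the learning rate, a learning rate schedule does not automatically scale 'weight_decay', which is why the Note recommends decaying both.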

Gather tree

Description

Gather tree

Usage

gather_tree(step_ids, parent_ids, max_sequence_lengths, end_token)

Arguments

step_ids

requires the step id

parent_ids

The parent ids of shape '[max_time, batch_size, beam_width]'.

max_sequence_lengths

get max_sequence_length across all beams for each batch.

end_token

'int32' scalar, the token that marks end of decoding.

Value

None

Gather tree from array

Description

Calculates the full beams for 'TensorArray's.

Usage

gather_tree_from_array(t, parent_ids, sequence_length)

Arguments

t

A stacked 'TensorArray' of size 'max_time' that contains 'Tensor's of shape '[batch_size, beam_width, s]' or '[batch_size * beam_width, s]' where 's' is the depth shape.

parent_ids

The parent ids of shape '[max_time, batch_size, beam_width]'.

sequence_length

The sequence length of shape '[batch_size, beam_width]'.

Value

A 'Tensor' which is a stacked 'TensorArray' of the same size and type as 't' and where beams are sorted in each 'Tensor' according to 'parent_ids'.

Hardmax

Description

Returns batched one-hot vectors.

Usage

hardmax(logits, name = NULL)

Arguments

logits

A batch tensor of logit values.

name

Name to use when creating ops.

Details

The depth index containing the '1' is that of the maximum logit value.

Value

A batched one-hot tensor.
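
Examples

A minimal base R sketch of the behaviour described in Details, for a 2-D batch of logits: each row gets a 1 at the position of its maximum and 0 elsewhere. The name 'hardmax_sketch' is hypothetical, and breaking ties by taking the first maximum is an assumption of this sketch.

```r
# Base R sketch; rows are batch entries, columns are the depth dimension.
hardmax_sketch <- function(logits) {
  # Column index of the maximum in each row (first one on ties)
  idx <- max.col(logits, ties.method = "first")
  out <- matrix(0, nrow = nrow(logits), ncol = ncol(logits))
  out[cbind(seq_len(nrow(logits)), idx)] <- 1
  out
}

logits <- matrix(c(0.1, 2.0, -1.0,
                   3.0, 0.5,  0.2), nrow = 2, byrow = TRUE)
one_hot <- hardmax_sketch(logits)
```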

Adjust hsv in yiq

Description

Adjust hue, saturation, value of an RGB image in YIQ color space.

Usage

img_adjust_hsv_in_yiq(
  image,
  delta_hue = 0,
  scale_saturation = 1,
  scale_value = 1,
  name = NULL
)

Arguments

image

RGB image or images. Size of the last dimension must be 3.

delta_hue

float, the hue rotation amount, in radians.

scale_saturation

float, factor to multiply the saturation by.

scale_value

float, factor to multiply the value by.

name

A name for this operation (optional).

Details

This is a convenience method that converts an RGB image to float representation, converts it to YIQ, rotates the color around the Y channel by delta_hue in radians, scales the chrominance channels (I, Q) by scale_saturation, scales all channels (Y, I, Q) by scale_value, converts back to RGB, and then back to the original data type. 'image' is an RGB image. The image hue is adjusted by converting the image to YIQ, rotating around the luminance channel (Y) by 'delta_hue' in radians, multiplying the chrominance channels (I, Q) by 'scale_saturation', and multiplying all channels (Y, I, Q) by 'scale_value'. The image is then converted back to RGB.

Value

Adjusted image(s), same shape and dtype as 'image'.

Angles to projective transforms

Description

Returns projective transform(s) for the given angle(s).

Usage

img_angles_to_projective_transforms(
  angles,
  image_height,
  image_width,
  name = NULL
)

Arguments

angles

A scalar angle to rotate all images by, or (for batches of images) a vector with an angle to rotate each image in the batch. The rank must be statically known (the shape is not 'TensorShape(NULL)').

image_height

Height of the image(s) to be transformed.

image_width

Width of the image(s) to be transformed.

name

name of the op.

Value

A tensor of shape (num_images, 8). Projective transforms which can be given to 'transform' op.

Blend

Description

Blend image1 and image2 using 'factor'.

Usage

img_blend(image1, image2, factor)

Arguments

image1

An image Tensor of shape (num_rows, num_columns, num_channels) (HWC), or (num_rows, num_columns) (HW), or (num_channels, num_rows, num_columns).

image2

An image Tensor of shape (num_rows, num_columns, num_channels) (HWC), or (num_rows, num_columns) (HW), or (num_channels, num_rows, num_columns).

factor

A floating point value or Tensor of type tf.float32 above 0.0.

Details

Factor can be above 0.0. A value of 0.0 means only image1 is used. A value of 1.0 means only image2 is used. A value between 0.0 and 1.0 means we linearly interpolate the pixel values between the two images. A value greater than 1.0 "extrapolates" the difference between the two pixel values, and we clip the results to values between 0 and 255.

Value

A blended image Tensor of tf$float32.
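
Examples

The interpolation and extrapolation described in Details can be sketched in base R on plain numeric matrices. The sketch assumes the blend is 'image1 + factor * (image2 - image1)' followed by clipping to [0, 255]; 'blend_sketch' is a hypothetical name, not a package function.

```r
# Base R sketch of linear blending with clipping to [0, 255].
blend_sketch <- function(image1, image2, factor) {
  blended <- image1 + factor * (image2 - image1)
  # Clipping only has an effect when factor extrapolates beyond the inputs
  pmin(pmax(blended, 0), 255)
}

image1 <- matrix(c(0, 100, 200, 250), nrow = 2)
image2 <- matrix(c(50, 100, 100, 50), nrow = 2)

only_first  <- blend_sketch(image1, image2, 0)    # image1 unchanged
halfway     <- blend_sketch(image1, image2, 0.5)  # midpoint of the two images
extrapolate <- blend_sketch(image1, image2, 2)    # overshoots, then clipped
```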

Compose transforms

Description

Composes the transforms tensors.

Usage

img_compose_transforms(transforms, name = NULL)

Arguments

transforms

List of image projective transforms to be composed. Each transform is length 8 (single transform) or shape (N, 8) (batched transforms). The shapes of all inputs must be equal, and at least one input must be given.

name

The name for the op.

Value

A composed transform tensor. When passed to 'transform' op, equivalent to applying each of the given transforms to the image in order.

Connected components

Description

Labels the connected components in a batch of images.

Usage

img_connected_components(images, name = NULL)

Arguments

images

A 2D (H, W) or 3D (N, H, W) Tensor of image (integer, floating point and boolean types are supported).

name

The name of the op.

Details

A component is a set of pixels in a single input image, which are all adjacent and all have the same non-zero value. Components are determined using a squared connectivity of one (all equal entries are joined with their neighbors above, below, left, and right). Components across all images have consecutive ids 1 through n. Components are labeled according to the first pixel of the component appearing in row-major order (lexicographic order by image_index_in_batch, row, col). Zero entries all have an output id of 0. This op is equivalent to 'scipy.ndimage.measurements.label' on a 2D array with the default structuring element (which is the connectivity used here).

Value

Components with the same shape as 'images'. Entries that evaluate to FALSE (e.g. 0, 0.0, FALSE) in 'images' have value 0, and all other entries map to a component id > 0.

Raises

TypeError: if 'images' is not 2D or 3D.
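
Examples

The labeling rule in Details can be illustrated with a small flood fill in base R for a single 2-D image: pixels are scanned in row-major order, each unlabeled non-zero pixel starts a new component, and equal-valued neighbors above, below, left, and right are joined to it. 'label_components_sketch' is a hypothetical helper written for clarity, not speed, and is not the package implementation.

```r
# Base R sketch of 4-connected component labeling on one 2-D matrix.
label_components_sketch <- function(img) {
  labels <- matrix(0L, nrow(img), ncol(img))
  next_id <- 0L
  neighbours <- list(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  for (j in seq_len(nrow(img))) {        # row-major scan: rows first
    for (i in seq_len(ncol(img))) {
      if (img[j, i] == 0 || labels[j, i] != 0L) next
      next_id <- next_id + 1L
      labels[j, i] <- next_id
      stack <- list(c(j, i))
      while (length(stack) > 0) {
        p <- stack[[length(stack)]]
        stack[[length(stack)]] <- NULL   # pop the last element
        for (d in neighbours) {
          q <- p + d
          if (q[1] < 1 || q[1] > nrow(img) || q[2] < 1 || q[2] > ncol(img)) next
          if (img[q[1], q[2]] == img[j, i] && labels[q[1], q[2]] == 0L) {
            labels[q[1], q[2]] <- next_id
            stack[[length(stack) + 1]] <- q
          }
        }
      }
    }
  }
  labels
}

img <- matrix(c(5, 5, 0,
                0, 0, 5,
                7, 0, 5), nrow = 3, byrow = TRUE)
labels <- label_components_sketch(img)
```

The two groups of 5s touch only diagonally, so they receive different ids.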

Cutout

Description

Apply cutout (https://arxiv.org/abs/1708.04552) to images.

Usage

img_cutout(
  images,
  mask_size,
  offset = list(0, 0),
  constant_values = 0,
  data_format = "channels_last"
)

Arguments

images

A tensor of shape (batch_size, height, width, channels) (NHWC) or (batch_size, channels, height, width) (NCHW).

mask_size

Specifies the size of the mask that is generated and applied to the images. The mask will be of size (mask_height x mask_width). Note: mask_size should be divisible by 2.

offset

A list of (height, width) or (batch_size, 2)

constant_values

What pixel value to fill in the images in the area that has the cutout mask applied to it.

data_format

A string, one of 'channels_last' (default) or 'channels_first'. The ordering of the dimensions in the inputs. 'channels_last' corresponds to inputs with shape '(batch_size, ..., channels)' while 'channels_first' corresponds to inputs with shape '(batch_size, channels, ...)'.

Details

This operation applies a (mask_height x mask_width) mask to a location within 'images' specified by 'offset'. The masked pixels are filled with the value 'constant_values'.

Value

An image Tensor.

Raises

InvalidArgumentError: if mask_size is not divisible by 2.

Dense image warp

Description

Image warping using per-pixel flow vectors.

Usage

img_dense_image_warp(image, flow, name = NULL)

Arguments

image

4-D float Tensor with shape [batch, height, width, channels].

flow

A 4-D float Tensor with shape [batch, height, width, 2].

name

A name for the operation (optional).

Details

Apply a non-linear warp to the image, where the warp is specified by a dense flow field of offset vectors that define the correspondences of pixel values in the output image back to locations in the source image. Specifically, the pixel value at output[b, j, i, c] is images[b, j - flow[b, j, i, 0], i - flow[b, j, i, 1], c]. The locations specified by this formula do not necessarily map to an int index. Therefore, the pixel value is obtained by bilinear interpolation of the 4 nearest pixels around (b, j - flow[b, j, i, 0], i - flow[b, j, i, 1]). For locations outside of the image, we use the nearest pixel values at the image boundary.

Value

A 4-D float 'Tensor' with shape '[batch, height, width, channels]' and same type as input image.

Raises

ValueError: if height < 2 or width < 2 or the inputs have the wrong number of dimensions.

Note

Note that image and flow can be of type tf$half, tf$float32, or tf$float64, and do not necessarily have to be the same type.

Examples


## Not run: 
flow_shape = list(1L, as.integer(input_img$shape[[2]]), as.integer(input_img$shape[[3]]), 2L)
init_flows = tf$random$normal(flow_shape) * 2.0
dense_img_warp = img_dense_image_warp(input_img, init_flows)
dense_img_warp = tf$squeeze(dense_img_warp, 0)

## End(Not run)
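
The sampling rule in Details can also be sketched in base R for a single-channel image stored as a matrix. Each output pixel reads the source at (j - flow_rows[j, i], i - flow_cols[j, i]), the location is clamped to the image, and the value is bilinearly interpolated from the 4 surrounding pixels. The helper 'dense_warp_sketch' and the split of the flow into two matrices are assumptions of this sketch; indices are 1-based, and the image must be at least 2 x 2.

```r
# Base R sketch of dense warping with bilinear interpolation (one channel).
dense_warp_sketch <- function(image, flow_rows, flow_cols) {
  h <- nrow(image)
  w <- ncol(image)
  out <- matrix(0, h, w)
  for (j in seq_len(h)) {
    for (i in seq_len(w)) {
      # Source location, clamped to the image boundary
      y <- min(max(j - flow_rows[j, i], 1), h)
      x <- min(max(i - flow_cols[j, i], 1), w)
      # Top-left corner of the 2 x 2 neighbourhood
      y0 <- min(floor(y), h - 1)
      x0 <- min(floor(x), w - 1)
      ay <- y - y0
      ax <- x - x0
      top    <- (1 - ax) * image[y0, x0]     + ax * image[y0, x0 + 1]
      bottom <- (1 - ax) * image[y0 + 1, x0] + ax * image[y0 + 1, x0 + 1]
      out[j, i] <- (1 - ay) * top + ay * bottom
    }
  }
  out
}

image <- matrix(as.numeric(1:9), nrow = 3, byrow = TRUE)
zero <- matrix(0, 3, 3)
unchanged <- dense_warp_sketch(image, zero, zero)
shifted <- dense_warp_sketch(image, zero, matrix(0.5, 3, 3))
```

A flow of 0.5 along the columns makes every output pixel the average of itself and its left neighbor, except in the first column, where the clamped location falls on the boundary pixel.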

Equalize

Description

Equalize image(s)

Usage

img_equalize(image, data_format = "channels_last", name = NULL)

Arguments

image

A tensor of shape (num_images, num_rows, num_columns, num_channels) (NHWC), or (num_images, num_channels, num_rows, num_columns) (NCHW), or (num_rows, num_columns, num_channels) (HWC), or (num_channels, num_rows, num_columns) (CHW), or (num_rows, num_columns) (HW). The rank must be statically known (the shape is not 'TensorShape(NULL)').

data_format

Either 'channels_first' or 'channels_last'

name

The name of the op.

Value

Image(s) with the same type and shape as 'image', equalized.

Examples


## Not run: 
img_equalize(img)

## End(Not run)

Euclidean dist transform

Description

Applies euclidean distance transform(s) to the image(s).

Usage

img_euclidean_dist_transform(images, dtype = tf$float32, name = NULL)

Arguments

images

A tensor of shape (num_images, num_rows, num_columns, 1) (NHWC), or (num_rows, num_columns, 1) (HWC) or (num_rows, num_columns) (HW).

dtype

DType of the output tensor.

name

The name of the op.

Value

Image(s) with the type 'dtype' and same shape as 'images', with the transform applied. If a tensor of all ones is given as input, the output tensor will be filled with the max value of the 'dtype'.

Raises

TypeError: If 'images' is not tf$uint8, or 'dtype' is not floating point. ValueError: If 'images' has more than one channel, or 'images' is not of rank between 2 and 4.

Examples


## Not run: 
img_path = tf$keras$utils$get_file('tensorflow.png','https://tensorflow.org/images/tf_logo.png')
img_raw = tf$io$read_file(img_path)
img = tf$io$decode_png(img_raw)
img = tf$image$convert_image_dtype(img, tf$float32)
img = tf$image$resize(img, c(500L,500L))
bw_img = 1.0 - tf$image$rgb_to_grayscale(img)
gray = tf$image$convert_image_dtype(bw_img,tf$uint8)
gray = tf$expand_dims(gray, 0L)
eucid = img_euclidean_dist_transform(gray)
eucid = tf$squeeze(eucid, c(0L, -1L))

## End(Not run)
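
For intuition only, a brute-force base R sketch of a Euclidean distance transform on one 2-D matrix follows. It assumes that each non-zero pixel receives the distance to the nearest zero pixel and that zero pixels receive 0; when there is no zero pixel it returns 'Inf', which plays the role of the "max value of the dtype" mentioned in Value. 'edt_sketch' is a hypothetical helper, not the package implementation.

```r
# Brute-force distance transform sketch (quadratic cost, small images only).
edt_sketch <- function(img) {
  zeros <- which(img == 0, arr.ind = TRUE)   # row/column positions of zeros
  out <- matrix(0, nrow(img), ncol(img))
  for (j in seq_len(nrow(img))) {
    for (i in seq_len(ncol(img))) {
      if (img[j, i] != 0) {
        if (nrow(zeros) == 0) {
          out[j, i] <- Inf
        } else {
          out[j, i] <- min(sqrt((zeros[, 1] - j)^2 + (zeros[, 2] - i)^2))
        }
      }
    }
  }
  out
}

img <- matrix(c(0, 1, 1,
                1, 1, 1,
                1, 1, 0), nrow = 3, byrow = TRUE)
dist_img <- edt_sketch(img)
```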

Flat transforms to matrices

Description

Converts projective transforms to affine matrices.

Usage

img_flat_transforms_to_matrices(transforms, name = NULL)

Arguments

transforms

Vector of length 8, or batches of transforms with shape '(N, 8)'.

name

The name for the op.

Details

Note that the output matrices map output coordinates to input coordinates. For the forward transformation matrix, call 'tf$linalg$inv' on the result.

Value

3D tensor of matrices with shape '(N, 3, 3)'. The output matrices map the *output coordinates* (in homogeneous coordinates) of each transform to the corresponding *input coordinates*.

Raises

ValueError: If 'transforms' have an invalid shape.
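
Examples

The relation between a flat transform and its matrix can be sketched in base R. The sketch assumes the usual layout of a flat projective transform, '[a0, a1, a2, b0, b1, b2, c0, c1]', where the 3 x 3 matrix is filled row by row and the last entry is fixed to 1. Both helper names are hypothetical; the second one shows the inverse direction by normalizing the last entry and dropping it.

```r
# Flat transform (length 8) -> 3 x 3 matrix, filled row by row.
flat_to_matrix_sketch <- function(transform) {
  matrix(c(transform, 1), nrow = 3, byrow = TRUE)
}

# 3 x 3 matrix -> flat transform (length 8), after scaling so m[3, 3] is 1.
matrix_to_flat_sketch <- function(m) {
  m <- m / m[3, 3]
  as.vector(t(m))[1:8]
}

# A pure translation: output point (x, y) reads input point (x + 5, y - 2)
flat <- c(1, 0, 5,
          0, 1, -2,
          0, 0)
m <- flat_to_matrix_sketch(flat)
input_point <- as.vector(m %*% c(2, 3, 1))   # homogeneous output point (2, 3)
round_trip <- matrix_to_flat_sketch(2 * m)   # scaling the matrix changes nothing
```

As noted in Details, the matrix maps output coordinates to input coordinates; inverting it (for example with 'solve(m)' in base R) gives the forward direction.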

From 4D image

Description

Convert back to an image with 'ndims' rank.

Usage

img_from_4D(image, ndims)

Arguments

image

4D tensor.

ndims

The original rank of the image.

Value

'ndims'-D tensor with the same type.

Get ndims

Description

Returns the number of dimensions (rank) of an image.

Usage

img_get_ndims(image)

Arguments

image

image

Value

dimensions of the image

Interpolate bilinear

Description

Similar to Matlab's interp2 function.

Usage

img_interpolate_bilinear(grid, query_points, indexing = "ij", name = NULL)

Arguments

grid

a 4-D float Tensor of shape [batch, height, width, channels].

query_points

a 3-D float Tensor of N points with shape [batch, N, 2].

indexing

whether the query points are specified as row and column (ij), or Cartesian coordinates (xy).

name

a name for the operation (optional).

Details

Finds values for query points on a grid using bilinear interpolation.

Value

values: a 3-D 'Tensor' with shape '[batch, N, channels]'

Raises

ValueError: if the indexing mode is invalid, or if the shape of the inputs is invalid.

Interpolate spline

Description

Interpolate signal using polyharmonic interpolation.

Usage

img_interpolate_spline(
  train_points,
  train_values,
  query_points,
  order,
  regularization_weight = 0,
  name = "interpolate_spline"
)

Arguments

train_points

'[batch_size, n, d]' float 'Tensor' of n d-dimensional locations. These do not need to be regularly-spaced.

train_values

'[batch_size, n, k]' float 'Tensor' of n k-dimensional values evaluated at train_points.

query_points

'[batch_size, m, d]' 'Tensor' of m d-dimensional locations where we will output the interpolant's values.

order

order of the interpolation. Common values are 1 for $\phi(r) = r$, 2 for $\phi(r) = r^2 \log(r)$ (thin-plate spline), or 3 for $\phi(r) = r^3$.

regularization_weight

weight placed on the regularization term. This will depend substantially on the problem, and it should always be tuned. For many problems, it is reasonable to use no regularization. If using a non-zero value, we recommend a small value like 0.001.

name

name prefix for ops created by this function

Details

The interpolant has the form $f(x) = \sum_{i=1}^n w_i \phi(||x - c_i||) + v^T x + b$. This is a sum of two terms: (1) a weighted sum of radial basis function (RBF) terms, with the centers $c_1, ... c_n$, and (2) a linear term with a bias. The $c_i$ vectors are 'training' points. In the code, b is absorbed into v by appending 1 as a final dimension to x. The coefficients w and v are estimated such that the interpolant exactly fits the value of the function at the $c_i$ points, the vector w is orthogonal to each $c_i$, and the vector w sums to 0. With these constraints, the coefficients can be obtained by solving a linear system.

$\phi$ is an RBF, parametrized by an interpolation order. Using order=2 produces the well-known thin-plate spline.

We also provide the option to perform regularized interpolation. Here, the interpolant is selected to trade off between the squared loss on the training data and a certain measure of its curvature ([details](https://en.wikipedia.org/wiki/Polyharmonic_spline)). Using a regularization weight greater than zero has the effect that the interpolant will no longer exactly fit the training data. However, it may be less vulnerable to overfitting, particularly for high-order interpolation.

Note the interpolation procedure is differentiable with respect to all inputs besides the order parameter. We support dynamically-shaped inputs, where batch_size, n, and m are NULL at graph construction time. However, d and k must be known.

Value

'[batch_size, m, k]' float 'Tensor' of query values. We use train_points and train_values to perform polyharmonic interpolation. The query values are the values of the interpolant evaluated at the locations specified in query_points.
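
Examples

The linear system behind the interpolant can be sketched in base R for one batch entry and no regularization. The sketch assumes the standard block form: the RBF matrix A (entries $\phi(||c_i - c_j||)$) and the matrix B of training points with a trailing column of ones are stacked so that A w + B v equals the training values and t(B) w equals 0. All function names are hypothetical, and this is not the package implementation.

```r
# Polyharmonic RBF: r^order for odd orders, r^order * log(r) for even orders.
phi <- function(r, order) {
  if (order %% 2 == 0) ifelse(r > 0, r^order * log(r), 0) else r^order
}

# Pairwise Euclidean distances between the rows of a and the rows of b.
pairwise_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  sqrt(pmax(sq, 0))
}

# Solve for the RBF weights w and the linear term v (bias absorbed into v).
fit_spline_sketch <- function(train_points, train_values, order = 1) {
  n <- nrow(train_points)
  d <- ncol(train_points)
  A <- phi(pairwise_dist(train_points, train_points), order)  # [n, n]
  B <- cbind(train_points, 1)                                  # [n, d + 1]
  lhs <- rbind(cbind(A, B), cbind(t(B), matrix(0, d + 1, d + 1)))
  rhs <- rbind(train_values, matrix(0, d + 1, ncol(train_values)))
  sol <- solve(lhs, rhs)
  list(w = sol[seq_len(n), , drop = FALSE],
       v = sol[n + seq_len(d + 1), , drop = FALSE],
       centers = train_points, order = order)
}

# Evaluate the interpolant at query points.
predict_spline_sketch <- function(fit, query_points) {
  rbf <- phi(pairwise_dist(query_points, fit$centers), fit$order)
  rbf %*% fit$w + cbind(query_points, 1) %*% fit$v
}

x <- matrix(c(0, 1, 2, 4), ncol = 1)   # n = 4 points, d = 1
y <- x^2                               # k = 1 value per point
fit <- fit_spline_sketch(x, y, order = 1)
fitted <- predict_spline_sketch(fit, x)
```

With zero regularization the fitted values reproduce the training values, and the weights w sum to 0 as stated in Details.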

Matrices to flat transforms

Description

Converts affine matrices to projective transforms.

Usage

img_matrices_to_flat_transforms(transform_matrices, name = NULL)

Arguments

transform_matrices

One or more affine transformation matrices, for the reverse transformation in homogeneous coordinates. Shape 'c(3, 3)' or 'c(N, 3, 3)'.

name

The name for the op.

Details

Note that we expect matrices that map output coordinates to input coordinates. To convert forward transformation matrices, call 'tf$linalg$inv' on the matrices and use the result here.

Value

2D tensor of flat transforms with shape '(N, 8)', which may be passed into 'transform' op.

Raises

ValueError: If 'transform_matrices' have an invalid shape.

Mean filter2d

Description

Perform mean filtering on image(s).

Usage

img_mean_filter2d(
  image,
  filter_shape = list(3, 3),
  padding = "REFLECT",
  constant_values = 0,
  name = NULL
)

Arguments

image

Either a 2-D Tensor of shape [height, width], a 3-D Tensor of shape [height, width, channels], or a 4-D Tensor of shape [batch_size, height, width, channels].

filter_shape

An integer or tuple/list of 2 integers, specifying the height and width of the 2-D mean filter. Can be a single integer to specify the same value for all spatial dimensions.

padding

A string, one of "REFLECT", "CONSTANT", or "SYMMETRIC". The type of padding algorithm to use, which is compatible with mode argument in tf.pad. For more details, please refer to https://www.tensorflow.org/api_docs/python/tf/pad.

constant_values

A scalar, the pad value to use in "CONSTANT" padding mode.

name

A name for this operation (optional).

Value

3-D or 4-D 'Tensor' of the same dtype as input.

Raises

ValueError: If 'image' is not 2, 3 or 4-dimensional, if 'padding' is other than "REFLECT", "CONSTANT" or "SYMMETRIC", or if 'filter_shape' is invalid.
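
Examples

A base R sketch of mean filtering on a single 2-D matrix with "REFLECT" padding follows. It assumes an odd, square filter size and that "REFLECT" mirrors the image around its edge pixel without repeating that pixel, as in 'tf$pad'. 'mean_filter_sketch' is a hypothetical helper, not the package implementation.

```r
# Mean filter sketch with REFLECT padding; requires size <= 2 * min(dim) - 1.
mean_filter_sketch <- function(image, size = 3) {
  pad <- (size - 1) / 2                  # assumes an odd filter size
  h <- nrow(image)
  w <- ncol(image)
  # Index vectors that mirror the image around its border rows and columns
  rows <- c(rev(seq_len(pad)) + 1, seq_len(h), h - seq_len(pad))
  cols <- c(rev(seq_len(pad)) + 1, seq_len(w), w - seq_len(pad))
  padded <- image[rows, cols, drop = FALSE]
  out <- matrix(0, h, w)
  for (j in seq_len(h)) {
    for (i in seq_len(w)) {
      window <- padded[j:(j + size - 1), i:(i + size - 1)]
      out[j, i] <- mean(window)
    }
  }
  out
}

image <- matrix(as.numeric(1:9), nrow = 3, byrow = TRUE)
smoothed <- mean_filter_sketch(image, size = 3)
```

Replacing 'mean(window)' with 'median(window)' gives the corresponding sketch for a median filter.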

Median filter2d

Description

Perform median filtering on image(s).

Usage

img_median_filter2d(
  image,
  filter_shape = list(3, 3),
  padding = "REFLECT",
  constant_values = 0,
  name = NULL
)

Arguments

image

Either a 2-D Tensor of shape [height, width], a 3-D Tensor of shape [height, width, channels], or a 4-D Tensor of shape [batch_size, height, width, channels].

filter_shape

An integer or tuple/list of 2 integers, specifying the height and width of the 2-D median filter. Can be a single integer to specify the same value for all spatial dimensions.

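padding

A string, one of "REFLECT", "CONSTANT", or "SYMMETRIC". The type of padding algorithm to use, which is compatible with mode argument in tf.pad. For more details, please refer to https://www.tensorflow.org/api_docs/python/tf/pad.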

constant_values

A scalar, the pad value to use in "CONSTANT" padding mode.

name

A name for this operation (optional)

Value

3-D or 4-D 'Tensor' of the same dtype as input.

Raises

ValueError: If 'image' is not 2, 3 or 4-dimensional, if 'padding' is other than "REFLECT", "CONSTANT" or "SYMMETRIC", or if 'filter_shape' is invalid.

Random cutout

Description

Apply cutout (https://arxiv.org/abs/1708.04552) to images.

Usage

img_random_cutout(
  images,
  mask_size,
  constant_values = 0,
  seed = NULL,
  data_format = "channels_last"
)

Arguments

images

A tensor of shape (batch_size, height, width, channels) (NHWC) or (batch_size, channels, height, width) (NCHW).

mask_size

Specifies the size of the mask that is generated and applied to the images. The mask will be of size (mask_height x mask_width). Note: mask_size should be divisible by 2.

constant_values

What pixel value to fill in the images in the area that has the cutout mask applied to it.

seed

An integer. Used in combination with 'tf$random$set_seed' to create a reproducible sequence of tensors across multiple calls.

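data_format

A string, one of 'channels_last' (default) or 'channels_first'. The ordering of the dimensions in the inputs. 'channels_last' corresponds to inputs with shape '(batch_size, ..., channels)' while 'channels_first' corresponds to inputs with shape '(batch_size, channels, ...)'.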

Details

This operation applies a (mask_height x mask_width) mask to a random location within 'images'. The masked pixels are filled with the value 'constant_values'. The location where the mask is applied is chosen uniformly at random over the whole image.

Value

An image Tensor.

Raises

InvalidArgumentError: if mask_size is not divisible by 2.

Random hsv in yiq

Description

Adjust hue, saturation, value of an RGB image randomly in YIQ color space.

Usage

img_random_hsv_in_yiq(
  image,
  max_delta_hue = 0,
  lower_saturation = 1,
  upper_saturation = 1,
  lower_value = 1,
  upper_value = 1,
  seed = NULL,
  name = NULL
)

Arguments

image

RGB image or images. Size of the last dimension must be 3.

max_delta_hue

float. Maximum value for the random delta_hue. Passing 0 disables adjusting hue.

lower_saturation

float. Lower bound for the random scale_saturation.

upper_saturation

float. Upper bound for the random scale_saturation.

lower_value

float. Lower bound for the random scale_value.

upper_value

float. Upper bound for the random scale_value.

seed

An operation-specific seed. It will be used in conjunction with the graph-level seed to determine the real seeds that will be used in this operation. Please see the documentation of set_random_seed for its interaction with the graph-level random seed.

name

A name for this operation (optional).

Details

Equivalent to 'img_adjust_hsv_in_yiq()' but uses a 'delta_hue' randomly picked in the interval '[-max_delta_hue, max_delta_hue]', a 'scale_saturation' randomly picked in the interval '[lower_saturation, upper_saturation]', and a 'scale_value' randomly picked in the interval '[lower_value, upper_value]'.

Value

3-D float tensor of shape '[height, width, channels]'.

Raises

ValueError: if 'max_delta_hue', 'lower_saturation', 'upper_saturation', 'lower_value', or 'upper_value' is invalid.

Examples


## Not run: 
delta = 0.5
lower_saturation = 0.1
upper_saturation = 0.9
lower_value = 0.2
upper_value = 0.8
rand_hsvinyiq = img_random_hsv_in_yiq(img, delta,
                                      lower_saturation, upper_saturation,
                                      lower_value, upper_value)

## End(Not run)

Resampler

Description

Resamples input data at user defined coordinates.

Usage

img_resampler(data, warp, name = NULL)

Arguments

data

Tensor of shape [batch_size, data_height, data_width, data_num_channels] containing 2D data that will be resampled.

warp

Tensor of minimum rank 2 containing the coordinates at which resampling will be performed. Since only bilinear interpolation is currently supported, the last dimension of the warp tensor must be 2, representing the (x, y) coordinate where x is the index for width and y is the index for height.

name

Optional name of the op.

Details

The resampler currently only supports bilinear interpolation of 2D data.

Value

Tensor of resampled values from 'data'. The output tensor shape is determined by the shape of the warp tensor. For example, if 'data' is of shape '[batch_size, data_height, data_width, data_num_channels]' and warp of shape '[batch_size, dim_0, ... , dim_n, 2]' the output will be of shape '[batch_size, dim_0, ... , dim_n, data_num_channels]'.

Raises

ImportError: if the wrapper generated during compilation is not present when the function is called.

Rotate

Description

Rotate image(s) counterclockwise by the passed angle(s) in radians.

Usage

img_rotate(images, angles, interpolation = "NEAREST", name = NULL)

Arguments

images

A tensor of shape (num_images, num_rows, num_columns, num_channels) (NHWC), (num_rows, num_columns, num_channels) (HWC), or (num_rows, num_columns) (HW).

angles

A scalar angle to rotate all images by, or (if images has rank 4) a vector of length num_images, with an angle for each image in the batch.

interpolation

Interpolation mode. Supported values: "NEAREST", "BILINEAR".

name

The name of the op.

Value

Image(s) with the same type and shape as 'images', rotated by the given angle(s). Empty space due to the rotation will be filled with zeros.

Raises

TypeError: If 'image' is an invalid type.

Sharpness

Description

Change sharpness of image(s)

Usage

img_sharpness(image, factor)

Arguments

image

an image

factor

A floating point value or Tensor above 0.0.

Value

Image(s) with the same type and shape as 'image', sharpened.

Shear x-axis

Description

Perform shear operation on an image (x-axis)

Usage

img_shear_x(image, level, replace)

Arguments

image

A 3D image Tensor.

level

A float denoting shear element along y-axis

replace

A one or three value 1D tensor to fill empty pixels.

Value

Transformed image along X or Y axis, with space outside image filled with replace.

Shear y-axis

Description

Perform shear operation on an image (y-axis)

Usage

img_shear_y(image, level, replace)

Arguments

image

A 3D image Tensor.

level

A float denoting shear element along x-axis

replace

A one or three value 1D tensor to fill empty pixels.

Value

Image sheared along the y-axis, with the space outside the image filled with 'replace'.

Sparse image warp

Description

Image warping using correspondences between sparse control points.

Usage

img_sparse_image_warp(
  image,
  source_control_point_locations,
  dest_control_point_locations,
  interpolation_order = 2,
  regularization_weight = 0,
  num_boundary_points = 0,
  name = "sparse_image_warp"
)

Arguments

image

'[batch, height, width, channels]' float 'Tensor'

source_control_point_locations

'[batch, num_control_points, 2]' float 'Tensor'

dest_control_point_locations

'[batch, num_control_points, 2]' float 'Tensor'

interpolation_order

polynomial order used by the spline interpolation

regularization_weight

weight on smoothness regularizer in interpolation

num_boundary_points

How many zero-flow boundary points to include at each image edge. Usage:
- num_boundary_points=0: don't add zero-flow points
- num_boundary_points=1: 4 corners of the image
- num_boundary_points=2: 4 corners and one in the middle of each edge (8 points total)
- num_boundary_points=n: 4 corners and n-1 along each edge

name

A name for the operation (optional).

Details

Apply a non-linear warp to the image, where the warp is specified by the source and destination locations of a (potentially small) number of control points. First, we use a polyharmonic spline ('tf$contrib$image$interpolate_spline') to interpolate the displacements between the corresponding control points to a dense flow field. Then, we warp the image using this dense flow field ('tf$contrib$image$dense_image_warp').

Let t index our control points. For regularization_weight=0, we have:

warped_image[b, dest_control_point_locations[b, t, 0], dest_control_point_locations[b, t, 1], :] = image[b, source_control_point_locations[b, t, 0], source_control_point_locations[b, t, 1], :]

For regularization_weight > 0, this condition is met approximately, since regularized interpolation trades off smoothness of the interpolant vs. reconstruction of the interpolant at the control points. See 'tf$contrib$image$interpolate_spline' for further documentation of the interpolation_order and regularization_weight arguments.

Value

warped_image: '[batch, height, width, channels]' float 'Tensor' with same type as input image.

flow_field: '[batch, height, width, 2]' float 'Tensor' containing the dense flow field produced by the interpolation.

To 4D image

Description

Convert 2/3/4D image to 4D image.

Usage

img_to_4D(image)

Arguments

image

2/3/4D tensor.

Value

4D tensor with the same type.

Examples


## Not run: 
img_to_4D(img)

## End(Not run)

Transform

Description

Applies the given transform(s) to the image(s).

Usage

img_transform(
  images,
  transforms,
  interpolation = "NEAREST",
  output_shape = NULL,
  name = NULL
)

Arguments

images

A tensor of shape (num_images, num_rows, num_columns, num_channels) (NHWC), (num_rows, num_columns, num_channels) (HWC), or (num_rows, num_columns) (HW).

transforms

Projective transform matrix/matrices. A vector of length 8 or tensor of size N x 8. If one row of transforms is [a0, a1, a2, b0, b1, b2, c0, c1], then it maps the output point (x, y) to a transformed input point (x', y') = ((a0 x + a1 y + a2) / k, (b0 x + b1 y + b2) / k), where k = c0 x + c1 y + 1. The transforms are inverted compared to the transform mapping input points to output points. Note that gradients are not backpropagated into transformation parameters.

interpolation

Interpolation mode. Supported values: "NEAREST", "BILINEAR".

output_shape

Output dimension after the transform, [height, width]. If NULL, output is the same size as input image.

name

The name of the op.

Value

Image(s) with the same type and shape as 'images', with the given transform(s) applied. Transformed coordinates outside of the input image will be filled with zeros.

Raises

TypeError: If 'images' is an invalid type.

ValueError: If 'output_shape' is not a 1-D int32 Tensor.

Examples


## Not run: 
transform = img_transform(img, c(1.0, 1.0, -250, 0.0, 1.0, 0.0, 0.0, 0.0))

## End(Not run)
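The mapping described for the 'transforms' argument can be checked by hand. The following is a minimal base-R sketch of that formula only; 'map_output_to_input' is a hypothetical helper name, not part of the package, and no interpolation or TensorFlow call is involved.

```r
# Hypothetical helper: map an output pixel (x, y) to its input location
# using one row of 'transforms': [a0, a1, a2, b0, b1, b2, c0, c1].
map_output_to_input <- function(transform, x, y) {
  stopifnot(length(transform) == 8)
  k <- transform[7] * x + transform[8] * y + 1
  c(x = (transform[1] * x + transform[2] * y + transform[3]) / k,
    y = (transform[4] * x + transform[5] * y + transform[6]) / k)
}

# The transform vector used in the Examples of this section.
tr <- c(1.0, 1.0, -250, 0.0, 1.0, 0.0, 0.0, 0.0)

# Output pixel (300, 50) is read from input location (100, 50).
map_output_to_input(tr, x = 300, y = 50)
```

Because the transform maps output points back to input points, the output pixel at (300, 50) takes its value from the input at (100, 50).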

Translate

Description

Translate image(s) by the passed vector(s).

Usage

img_translate(images, translations, interpolation = "NEAREST", name = NULL)

Arguments

images

A tensor of shape (num_images, num_rows, num_columns, num_channels) (NHWC), (num_rows, num_columns, num_channels) (HWC), or (num_rows, num_columns) (HW). The rank must be statically known (the shape is not TensorShape(None)).

translations

A vector representing [dx, dy] or (if images has rank 4) a matrix of length num_images, with a [dx, dy] vector for each image in the batch.

interpolation

Interpolation mode. Supported values: "NEAREST", "BILINEAR".

name

The name of the op.

Value

Image(s) with the same type and shape as 'images', translated by the given vector(s). Empty space due to the translation will be filled with zeros.

Raises

TypeError: If 'images' is an invalid type.

Translate xy dims

Description

Translates image in X or Y dimension.

Usage

img_translate_xy(image, translate_to, replace)

Arguments

image

A 3D image Tensor.

translate_to

A 1D tensor to translate [x, y]

replace

A one or three value 1D tensor to fill empty pixels.

Value

Translated image along X or Y axis, with space outside image filled with replace.

Raises

ValueError: if axis is neither 0 nor 1.

Translations to projective transforms

Description

Returns projective transform(s) for the given translation(s).

Usage

img_translations_to_projective_transforms(translations, name = NULL)

Arguments

translations

A 2-element list representing [dx, dy] or a matrix of 2-element lists representing [dx, dy] to translate for each image (for a batch of images). The rank must be statically known (the shape is not 'TensorShape(NULL)').

name

The name of the op.

Value

A tensor of shape c(num_images, 8) projective transforms which can be given to 'img_transform'.
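As a rough illustration of the returned shape, the sketch below builds one 8-element row per translation in base R. It assumes the output-to-input convention documented for 'img_transform', under which a shift by (dx, dy) means the output pixel (x, y) is read from (x - dx, y - dy); the negated offsets are an inference from that convention, and 'translation_to_transform' is a hypothetical name.

```r
# Hypothetical helper: one projective-transform row for a translation.
# Assumption: rows follow [a0, a1, a2, b0, b1, b2, c0, c1] and map
# output points to input points, so the offsets enter negated.
translation_to_transform <- function(dx, dy) {
  c(1, 0, -dx, 0, 1, -dy, 0, 0)
}

# Two translations, one per image in a batch of two.
translations <- rbind(c(10, 0), c(-5, 20))

# apply() returns an 8 x num_images matrix, so transpose it.
transforms <- t(apply(translations, 1, function(v) {
  translation_to_transform(v[1], v[2])
}))

dim(transforms)  # num_images x 8
```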

Unwrap

Description

Unwraps an image produced by wrap.

Usage

img_unwrap(image, replace)

Arguments

image

a 3D image Tensor with 4 channels, as produced by 'img_wrap'.

replace

a one or three value 1D tensor to fill empty pixels.

Details

Where there is a 0 in the last channel for every spatial position, the rest of the three channels in that spatial dimension are grayed (set to 128). Operations like translate and shear on a wrapped Tensor will leave 0s in empty locations. Some transformations look at the intensity of values to do preprocessing, and we want these empty pixels to assume the 'average' value, rather than pure black.

Value

a 3D image Tensor with 3 channels.
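The wrap and unwrap idea from the Details can be sketched on plain R arrays. This is an illustration of the logic only, assuming a height x width x channels array; 'wrap_array' and 'unwrap_array' are hypothetical names and do not call the package.

```r
# Hypothetical analogue of wrapping: append a channel of ones.
wrap_array <- function(image) {
  d <- dim(image)
  out <- array(1, dim = c(d[1], d[2], d[3] + 1))
  out[, , seq_len(d[3])] <- image
  out
}

# Hypothetical analogue of unwrapping: wherever the extra channel is 0,
# overwrite the remaining channels with 'replace', then drop the extra channel.
unwrap_array <- function(image, replace = 128) {
  d <- dim(image)
  n <- d[3] - 1
  flag <- image[, , d[3]]
  out <- image[, , seq_len(n), drop = FALSE]
  replace <- rep_len(replace, n)
  for (ch in seq_len(n)) {
    plane <- out[, , ch]
    plane[flag == 0] <- replace[ch]
    out[, , ch] <- plane
  }
  out
}

img <- array(10, dim = c(2, 2, 3))
wrapped <- wrap_array(img)
wrapped[1, 1, ] <- 0   # simulate an empty pixel left by a translate or shear
restored <- unwrap_array(wrapped, replace = 128)
restored[1, 1, ]       # the empty pixel is now gray
restored[2, 2, ]       # untouched pixels keep their values
```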

Wrap

Description

wrap an image array

Usage

img_wrap(image)

Arguments

image

a 3D image Tensor.

Value

'image' with an extra channel set to all 1s.

Install TensorFlow SIG Addons

Description

This function is used to install the 'TensorFlow SIG Addons' python module

Usage

install_tfaddons(version = NULL, ..., restart_session = TRUE)

Arguments

version

The specific version of 'TensorFlow SIG Addons' to install, e.g. "0.10.0"

...

other arguments passed to [reticulate::py_install()].

restart_session

Restart R session after installing (note this will only occur within RStudio).

Value

a python module 'tensorflow_addons'

Gaussian Error Linear Unit

Description

Gaussian Error Linear Unit

Usage

layer_activation_gelu(object, approximate = TRUE, ...)

Arguments

object

Model or layer object

approximate

(bool) Whether to apply approximation

...

additional parameters to pass

Details

A smoother version of ReLU, generally used in BERT and BERT-based architectures. Original paper: https://arxiv.org/abs/1606.08415

Value

A tensor

Note

Input shape: Arbitrary. Use the keyword argument 'input_shape' (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

Output shape: Same shape as the input.
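To see what the 'approximate' switch trades off, the two standard GELU forms can be compared in base R. This sketch uses the usual definitions (exact form via the normal CDF, approximate form via tanh); the function names are hypothetical and nothing here calls the layer itself.

```r
# Exact GELU: x * P(X <= x) for X ~ N(0, 1).
gelu_exact <- function(x) x * pnorm(x)

# Tanh-based approximation of the same function.
gelu_approx <- function(x) {
  0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
}

x <- seq(-3, 3, by = 0.5)

# Largest absolute difference between the two forms on this grid.
max(abs(gelu_exact(x) - gelu_approx(x)))
```

On this grid the two curves stay close, which is why the approximation is a common default.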

Correlation Cost Layer.

Description

Correlation Cost Layer.

Usage

layer_correlation_cost(
  object,
  kernel_size,
  max_displacement,
  stride_1,
  stride_2,
  pad,
  data_format,
  ...
)

Arguments

object

Model or layer object

kernel_size

An integer specifying the height and width of the patch used to compute the per-patch costs.

max_displacement

An integer specifying the maximum search radius for each position.

stride_1

An integer specifying the stride length in the input.

stride_2

An integer specifying the stride length in the patch.

pad

An integer specifying the paddings in height and width.

data_format

Specifies the data format. Possible values are:
- "channels_last": float [batch, height, width, channels]
- "channels_first": float [batch, channels, height, width]
Defaults to "channels_last".

...

additional parameters to pass

Details

This layer implements the correlation operation from "FlowNet: Learning Optical Flow with Convolutional Networks" (Fischer et al.): https://arxiv.org/abs/1504.06

Value

A tensor

FilterResponseNormalization

Description

Filter response normalization layer.

Usage

layer_filter_response_normalization(
  object,
  epsilon = 1e-06,
  axis = c(1, 2),
  beta_initializer = "zeros",
  gamma_initializer = "ones",
  beta_regularizer = NULL,
  gamma_regularizer = NULL,
  beta_constraint = NULL,
  gamma_constraint = NULL,
  learned_epsilon = FALSE,
  learned_epsilon_constraint = NULL,
  name = NULL
)

Arguments

object

Model or layer object

epsilon

Small positive float value added to variance to avoid dividing by zero.

axis

List of axes that should be normalized. This should represent the spatial dimensions.

beta_initializer

Initializer for the beta weight.

gamma_initializer

Initializer for the gamma weight.

beta_regularizer

Optional regularizer for the beta weight.

gamma_regularizer

Optional regularizer for the gamma weight.

beta_constraint

Optional constraint for the beta weight.

gamma_constraint

Optional constraint for the gamma weight.

learned_epsilon

(bool) Whether to add another learnable epsilon parameter or not.

learned_epsilon_constraint

Optional constraint for the learned epsilon parameter.

name

Optional name for the layer

Details

Filter Response Normalization (FRN) is a normalization method that enables models trained with per-channel normalization to achieve high accuracy. It performs better than all other normalization techniques for small batches and is on par with Batch Normalization for bigger batch sizes.

Value

A tensor

Note

Input shape: Arbitrary. Use the keyword argument 'input_shape' (list of integers, does not include the samples axis) when using this layer as the first layer in a model. This layer currently works on a 4-D tensor of shape [N X H X W X C]; the NCHW data format and FC layers are not yet supported.

Output shape: Same shape as input.

References: [Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks](https://arxiv.org/abs/1911.09737)

Group normalization layer

Description

Group normalization layer

Usage

layer_group_normalization(
  object,
  groups = 2,
  axis = -1,
  epsilon = 0.001,
  center = TRUE,
  scale = TRUE,
  beta_initializer = "zeros",
  gamma_initializer = "ones",
  beta_regularizer = NULL,
  gamma_regularizer = NULL,
  beta_constraint = NULL,
  gamma_constraint = NULL,
  ...
)

Arguments

object

Model or layer object

groups

Integer, the number of groups for Group Normalization. Can be in the range [1, N] where N is the input dimension. The input dimension must be divisible by the number of groups.

axis

Integer, the axis that should be normalized.

epsilon

Small float added to variance to avoid dividing by zero.

center

If TRUE, add offset of beta to normalized tensor. If FALSE, beta is ignored.

scale

If TRUE, multiply by gamma. If FALSE, gamma is not used.

beta_initializer

Initializer for the beta weight.

gamma_initializer

Initializer for the gamma weight.

beta_regularizer

Optional regularizer for the beta weight.

gamma_regularizer

Optional regularizer for the gamma weight.

beta_constraint

Optional constraint for the beta weight.

gamma_constraint

Optional constraint for the gamma weight.

...

additional parameters to pass

Details

Group Normalization divides the channels into groups and computes within each group the mean and variance for normalization. Empirically, its accuracy is more stable than batch norm in a wide range of small batch sizes, if learning rate is adjusted linearly with batch sizes. Relation to Layer Normalization: If the number of groups is set to 1, then this operation becomes identical to Layer Normalization. Relation to Instance Normalization: If the number of groups is set to the input dimension (number of groups is equal to number of channels), then this operation becomes identical to Instance Normalization.
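The relation between the number of groups and the two special cases can be sketched in base R for a single sample. This is a simplified illustration, assuming a matrix with one row per spatial position and one column per channel, and omitting the learned 'gamma' and 'beta'; 'group_normalize' is a hypothetical name.

```r
# Hypothetical helper: normalize one sample (positions x channels)
# with statistics computed per group of consecutive channels.
group_normalize <- function(x, groups, epsilon = 1e-3) {
  channels <- ncol(x)
  stopifnot(channels %% groups == 0)
  group_id <- rep(seq_len(groups), each = channels / groups)
  out <- x
  for (g in seq_len(groups)) {
    cols <- group_id == g
    vals <- x[, cols, drop = FALSE]
    mu <- mean(vals)
    v <- mean((vals - mu)^2)
    out[, cols] <- (vals - mu) / sqrt(v + epsilon)
  }
  out
}

set.seed(1)
x <- matrix(rnorm(5 * 4), nrow = 5, ncol = 4)

layer_like <- group_normalize(x, groups = 1)     # one group: all channels share statistics
instance_like <- group_normalize(x, groups = 4)  # one group per channel

mean(layer_like)         # close to 0 over the whole sample
colMeans(instance_like)  # close to 0 for every channel separately
```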

Value

A tensor

Instance normalization layer

Description

Instance normalization layer

Usage

layer_instance_normalization(
  object,
  groups = 2,
  axis = -1,
  epsilon = 0.001,
  center = TRUE,
  scale = TRUE,
  beta_initializer = "zeros",
  gamma_initializer = "ones",
  beta_regularizer = NULL,
  gamma_regularizer = NULL,
  beta_constraint = NULL,
  gamma_constraint = NULL,
  ...
)

Arguments

object

Model or layer object

groups

Integer, the number of groups for Group Normalization. Can be in the range [1, N] where N is the input dimension. The input dimension must be divisible by the number of groups.

axis

Integer, the axis that should be normalized.

epsilon

Small float added to variance to avoid dividing by zero.

center

If TRUE, add offset of 'beta' to normalized tensor. If FALSE, 'beta' is ignored.

scale

If TRUE, multiply by 'gamma'. If FALSE, 'gamma' is not used.

beta_initializer

Initializer for the beta weight.

gamma_initializer

Initializer for the gamma weight.

beta_regularizer

Optional regularizer for the beta weight.

gamma_regularizer

Optional regularizer for the gamma weight.

beta_constraint

Optional constraint for the beta weight.

gamma_constraint

Optional constraint for the gamma weight.

...

additional parameters to pass

Details

Instance Normalization is a specific case of 'GroupNormalization', since it normalizes all features of one channel. The group size is equal to the channel size. Empirically, its accuracy is more stable than batch norm in a wide range of small batch sizes, if learning rate is adjusted linearly with batch sizes.

Value

A tensor

References

[Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022)

Maxout layer

Description

Maxout layer

Usage

layer_maxout(object, num_units, axis = -1, ...)

Arguments

object

Model or layer object

num_units

Specifies how many features will remain after maxout in the axis dimension (usually channel). This must be a factor of number of features.

axis

The dimension where max pooling will be performed. Default is the last dimension.

...

additional parameters to pass

Details

"Maxout Networks" Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio. https://arxiv.org/abs/1302.4389 Usually the operation is performed in the filter/channel dimension. This can also be used after Dense layers to reduce number of features.

Value

A tensor
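A small base-R sketch may help with the 'num_units' constraint. It assumes that features are split into 'num_units' groups of consecutive entries and that the maximum of each group is kept; that grouping order is an assumption of this sketch, and 'maxout' here is a hypothetical stand-alone function, not the layer.

```r
# Hypothetical maxout over a feature vector: keep the maximum of each
# group of consecutive features, leaving 'num_units' values.
maxout <- function(x, num_units) {
  n <- length(x)
  stopifnot(n %% num_units == 0)   # num_units must be a factor of the feature count
  pieces <- matrix(x, nrow = n / num_units, ncol = num_units)
  apply(pieces, 2, max)
}

# Six features reduced to three: groups (1, 5), (2, 0), (-3, 4).
maxout(c(1, 5, 2, 0, -3, 4), num_units = 3)  # 5 2 4
```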

Keras-based multi head attention layer

Description

MultiHead Attention layer.

Usage

layer_multi_head_attention(
  object,
  head_size,
  num_heads,
  output_size = NULL,
  dropout = 0,
  use_projection_bias = TRUE,
  return_attn_coef = FALSE,
  kernel_initializer = "glorot_uniform",
  kernel_regularizer = NULL,
  kernel_constraint = NULL,
  bias_initializer = "zeros",
  bias_regularizer = NULL,
  bias_constraint = NULL,
  ...
)

Arguments

object

Model or layer object

head_size

int, dimensionality of the 'query', 'key' and 'value' tensors after the linear transformation.

num_heads

int, number of attention heads.

output_size

int, dimensionality of the output space, if 'NULL' then the input dimension of 'value' or 'key' will be used, default 'NULL'.

dropout

float, 'rate' parameter for the dropout layer that is applied to attention after softmax, default '0'.

use_projection_bias

bool, whether to use a bias term after the linear output projection.

return_attn_coef

bool, if 'TRUE', return the attention coefficients as an additional output argument.

kernel_initializer

initializer, initializer for the kernel weights.

kernel_regularizer

regularizer, regularizer for the kernel weights.

kernel_constraint

constraint, constraint for the kernel weights.

bias_initializer

initializer, initializer for the bias weights.

bias_regularizer

regularizer, regularizer for the bias weights.

bias_constraint

constraint, constraint for the bias weights.

...

additional parameters to pass

Details

Defines the MultiHead Attention operation as described in [Attention Is All You Need](https://arxiv.org/abs/1706.03762), which takes 'query', 'key' and 'value' tensors and returns the dot-product attention between them.

Value

A tensor

Examples


## Not run: 

mha = layer_multi_head_attention(head_size=128, num_heads=128)
query = tf$random$uniform(list(32L, 20L, 200L)) # (batch_size, query_elements, query_depth)
key = tf$random$uniform(list(32L, 15L, 300L)) # (batch_size, key_elements, key_depth)
value = tf$random$uniform(list(32L, 15L, 400L)) # (batch_size, key_elements, value_depth)
attention = mha(list(query, key, value)) # (batch_size, query_elements, value_depth)

# If `value` is not given then internally `value = key` will be used:
mha = layer_multi_head_attention(head_size=128, num_heads=128)
query = tf$random$uniform(list(32L, 20L, 200L)) # (batch_size, query_elements, query_depth)
key = tf$random$uniform(list(32L, 15L, 300L)) # (batch_size, key_elements, key_depth)
attention = mha(list(query, key)) # (batch_size, query_elements, key_depth)


## End(Not run)
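The core operation inside each head is the scaled dot-product attention from the cited paper. The following base-R sketch shows it for a single head and a single batch element, without the learned projections, dropout, or bias that the layer adds; all function names are hypothetical.

```r
# Row-wise softmax with the usual max subtraction for numerical stability.
softmax_rows <- function(m) {
  e <- exp(m - apply(m, 1, max))
  e / rowSums(e)
}

# Hypothetical single-head attention: q is (query_elements x depth),
# k is (key_elements x depth), v is (key_elements x value_depth).
scaled_dot_product_attention <- function(q, k, v) {
  scores <- q %*% t(k) / sqrt(ncol(q))
  weights <- softmax_rows(scores)
  list(output = weights %*% v, weights = weights)
}

set.seed(42)
q <- matrix(rnorm(20 * 8), 20, 8)
k <- matrix(rnorm(15 * 8), 15, 8)
v <- matrix(rnorm(15 * 4), 15, 4)

res <- scaled_dot_product_attention(q, k, v)
dim(res$output)   # query_elements x value_depth
dim(res$weights)  # query_elements x key_elements
```

The 'weights' matrix corresponds to what 'return_attn_coef = TRUE' is described as returning, here for one head only.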

Neural Architecture Search (NAS) recurrent network cell.

Description

Neural Architecture Search (NAS) recurrent network cell.

Usage

layer_nas_cell(
  object,
  units,
  projection = NULL,
  use_bias = FALSE,
  kernel_initializer = "glorot_uniform",
  recurrent_initializer = "glorot_uniform",
  projection_initializer = "glorot_uniform",
  bias_initializer = "zeros",
  ...
)

Arguments

object

Model or layer object

units

int, The number of units in the NAS cell.

projection

(optional) int, The output dimensionality for the projection matrices. If 'NULL', no projection is performed.

use_bias

(optional) bool, If 'TRUE' then use biases within the cell. This is 'FALSE' by default.

kernel_initializer

Initializer for kernel weight.

recurrent_initializer

Initializer for recurrent kernel weight.

projection_initializer

Initializer for projection weight, used when projection is not 'NULL'.

bias_initializer

Initializer for bias, used when 'use_bias' is 'TRUE'.

...

Additional keyword arguments.

Details

This implements the recurrent cell from the paper: https://arxiv.org/abs/1611.01578 Barret Zoph and Quoc V. Le. "Neural Architecture Search with Reinforcement Learning" Proc. ICLR 2017. The class uses an optional projection layer.

Value

A tensor

LSTM cell with layer normalization and recurrent dropout.

Description

LSTM cell with layer normalization and recurrent dropout.

Usage

layer_norm_lstm_cell(
  object,
  units,
  activation = "tanh",
  recurrent_activation = "sigmoid",
  use_bias = TRUE,
  kernel_initializer = "glorot_uniform",
  recurrent_initializer = "orthogonal",
  bias_initializer = "zeros",
  unit_forget_bias = TRUE,
  kernel_regularizer = NULL,
  recurrent_regularizer = NULL,
  bias_regularizer = NULL,
  kernel_constraint = NULL,
  recurrent_constraint = NULL,
  bias_constraint = NULL,
  dropout = 0,
  recurrent_dropout = 0,
  norm_gamma_initializer = "ones",
  norm_beta_initializer = "zeros",
  norm_epsilon = 0.001,
  ...
)

Arguments

object

Model or layer object

units

Positive integer, dimensionality of the output space.

activation

Activation function to use. Default: hyperbolic tangent ('tanh'). If you pass 'NULL', no activation is applied (i.e. "linear" activation: 'a(x) = x').

recurrent_activation

Activation function to use for the recurrent step. Default: sigmoid ('sigmoid'). If you pass 'NULL', no activation is applied (i.e. "linear" activation: 'a(x) = x').

use_bias

Boolean, whether the layer uses a bias vector.

kernel_initializer

Initializer for the 'kernel' weights matrix, used for the linear transformation of the inputs.

recurrent_initializer

Initializer for the 'recurrent_kernel' weights matrix, used for the linear transformation of the recurrent state.

bias_initializer

Initializer for the bias vector.

unit_forget_bias

Boolean. If TRUE, add 1 to the bias of the forget gate at initialization. Setting it to TRUE will also force 'bias_initializer="zeros"'. This is recommended in [Jozefowicz et al.](http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf)

kernel_regularizer

Regularizer function applied to the 'kernel' weights matrix.

recurrent_regularizer

Regularizer function applied to the 'recurrent_kernel' weights matrix.

bias_regularizer

Regularizer function applied to the bias vector.

kernel_constraint

Constraint function applied to the 'kernel' weights matrix.

recurrent_constraint

Constraint function applied to the 'recurrent_kernel' weights matrix.

bias_constraint

Constraint function applied to the bias vector.

dropout

Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.

recurrent_dropout

Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.

norm_gamma_initializer

Initializer for the layer normalization gain initial value.

norm_beta_initializer

Initializer for the layer normalization shift initial value.

norm_epsilon

Float, the epsilon value for normalization layers.

...

List, the other keyword arguments for layer creation.

Details

This class adds layer normalization and recurrent dropout to an LSTM unit. The layer normalization implementation is based on "Layer Normalization" (Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton), https://arxiv.org/abs/1607.06450, and is applied before the internal nonlinearities. Recurrent dropout is based on "Recurrent Dropout without Memory Loss" (Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth), https://arxiv.org/abs/1603.05118.

Value

A tensor

Project into the Poincare ball with norm <= 1.0 - epsilon

Description

Project into the Poincare ball with norm <= 1.0 - epsilon

Usage

layer_poincare_normalize(object, axis = 1, epsilon = 1e-05, ...)

Arguments

object

Model or layer object

axis

Axis along which to normalize. A scalar or a vector of integers.

epsilon

A small deviation from the edge of the unit sphere for numerical stability.

...

additional parameters to pass

Details

See https://en.wikipedia.org/wiki/Poincare_ball_model. Used in "Poincare Embeddings for Learning Hierarchical Representations" (Maximilian Nickel, Douwe Kiela): https://arxiv.org/pdf/1705.08039.pdf. For a 1-D tensor with axis = 0, computes

Value

A tensor
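The formula itself is not spelled out on this page, so the sketch below rests on an assumption drawn from the title: vectors whose norm exceeds 1 - epsilon are rescaled onto that radius, and all others are returned unchanged. 'poincare_project' is a hypothetical base-R function for a single vector.

```r
# Hypothetical projection of one vector into the ball of radius 1 - epsilon.
# Assumption: only vectors outside that radius are rescaled.
poincare_project <- function(x, epsilon = 1e-5) {
  norm <- sqrt(sum(x^2))
  limit <- 1 - epsilon
  if (norm > limit) x * (limit / norm) else x
}

inside <- poincare_project(c(0.3, 0.4))   # norm 0.5, left as is
outside <- poincare_project(c(3, 4))      # norm 5, pulled back to the boundary

sqrt(sum(inside^2))
sqrt(sum(outside^2))
```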

Sparsemax activation function

Description

Sparsemax activation function

Usage

layer_sparsemax(object, axis = -1, ...)

Arguments

object

Model or layer object

axis

Integer, axis along which the sparsemax normalization is applied.

...

additional parameters to pass

Details

The output shape is the same as the input shape. See: https://arxiv.org/abs/1602.02068

Value

A tensor

Examples


## Not run: 
model = keras_model_sequential() %>%
  layer_conv_2d(filters = 10, kernel_size = c(3,3),input_shape = c(28,28,1),
                activation = activation_gelu) %>%
  layer_sparsemax()

## End(Not run)
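For intuition about what sparsemax does to a vector of scores, here is a base-R sketch of the sorting-based procedure from the cited paper, applied to a single vector. It is an illustration only; the 'sparsemax' function below is hypothetical and independent of the layer.

```r
# Hypothetical sparsemax for one score vector: a Euclidean projection
# onto the probability simplex, which can produce exact zeros.
sparsemax <- function(z) {
  z_sorted <- sort(z, decreasing = TRUE)
  k <- seq_along(z)
  cumulative <- cumsum(z_sorted)
  support <- max(k[1 + k * z_sorted > cumulative])  # size of the non-zero set
  tau <- (cumulative[support] - 1) / support        # threshold
  pmax(z - tau, 0)
}

sparsemax(c(2, 1, 0.1))      # 1 0 0: all mass on the largest score
sparsemax(c(0.5, 0.4, 0.1))  # already on the simplex, so nearly unchanged
```

Unlike softmax, the output can contain exact zeros while still summing to one.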

Weight Normalization layer

Description

Weight Normalization layer

Usage

layer_weight_normalization(object, layer, data_init = TRUE, ...)

Arguments

object

Model or layer object

layer

a layer instance.

data_init

If 'TRUE' use data dependent variable initialization

...

additional parameters to pass

Details

This wrapper reparameterizes a layer by decoupling the weight's magnitude and direction. This speeds up convergence by improving the conditioning of the optimization problem. See "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks", Tim Salimans, Diederik P. Kingma (2016): https://arxiv.org/abs/1602.07868. The WeightNormalization wrapper works for keras and tf layers.
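The decoupling of magnitude and direction can be shown for a single weight vector in base R. This sketch follows the reparameterization from the cited paper, w = g * v / ||v||; 'weight_norm' is a hypothetical helper and does not reflect how the wrapper stores its variables.

```r
# Hypothetical reparameterization of one weight vector:
# 'v' carries the direction, the scalar 'g' carries the magnitude.
weight_norm <- function(v, g) g * v / sqrt(sum(v^2))

v <- c(3, 4)
w <- weight_norm(v, g = 2)

w                # 1.2 1.6: same direction as v
sqrt(sum(w^2))   # 2: the norm is set by g alone
```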

Value

A tensor

Examples


## Not run: 

model = keras_model_sequential() %>%
  layer_weight_normalization(
    layer_conv_2d(filters = 2, kernel_size = 2, activation = 'relu'),
    input_shape = c(32L, 32L, 3L))
model



## End(Not run)

Lookahead mechanism

Description

Lookahead mechanism

Usage

lookahead_mechanism(
  optimizer,
  sync_period = 6,
  slow_step_size = 0.5,
  name = "Lookahead",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

optimizer

The original optimizer that will be used to compute and apply the gradients.

sync_period

An integer. The synchronization period of lookahead. Enable lookahead mechanism by setting it with a positive value.

slow_step_size

A floating point value. The ratio for updating the slow weights.

name

Optional name for the operations created when applying gradients. Defaults to "Lookahead".

clipnorm

Clips gradients by norm.

clipvalue

Clips gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; using learning_rate instead is recommended.

Details

The mechanism was proposed by Michael R. Zhang et al. in the paper [Lookahead Optimizer: k steps forward, 1 step back](https://arxiv.org/abs/1907.08610v1). The optimizer iteratively updates two sets of weights: the "fast weights", whose search directions are chosen by the inner optimizer, and the "slow weights", which are updated every k steps based on the direction of the fast weights, after which the two sets of weights are synchronized. This method improves learning stability and lowers the variance of the inner optimizer.
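The interplay of the two weight sets can be sketched in base R with plain gradient descent as the inner optimizer. This is a toy illustration of the update rule from the cited paper, not the package's implementation; 'lookahead_sgd' and its 'lr' argument are hypothetical, while 'sync_period' and 'slow_step_size' mirror the arguments above.

```r
# Hypothetical Lookahead loop around plain SGD.
lookahead_sgd <- function(grad, start, steps, lr = 0.1,
                          sync_period = 6, slow_step_size = 0.5) {
  fast <- start
  slow <- start
  for (step in seq_len(steps)) {
    fast <- fast - lr * grad(fast)   # inner optimizer step on the fast weights
    if (step %% sync_period == 0) {
      # Every sync_period steps: move the slow weights part of the way
      # towards the fast weights, then reset the fast weights to them.
      slow <- slow + slow_step_size * (fast - slow)
      fast <- slow
    }
  }
  fast
}

# Minimize f(w) = sum(w^2), whose gradient is 2 * w.
result <- lookahead_sgd(function(w) 2 * w, start = c(4, -2), steps = 60)
result
```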

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 

learning_rate = 0.01 # illustrative value
opt = tf$keras$optimizers$SGD(learning_rate)
opt = lookahead_mechanism(opt)


## End(Not run)

Contrastive loss

Description

Computes the contrastive loss between 'y_true' and 'y_pred'.

Usage

loss_contrastive(
  margin = 1,
  reduction = tf$keras$losses$Reduction$SUM_OVER_BATCH_SIZE,
  name = "contrasitve_loss"
)

Arguments

margin

Float, margin term in the loss definition. Default value is 1.0.

reduction

(Optional) Type of tf$keras$losses$Reduction to apply. Default value is SUM_OVER_BATCH_SIZE.

name

(Optional) name for the loss.

Details

This loss encourages the embeddings of samples with the same label to be close to each other, and the embeddings of samples with different labels to be at least the margin constant apart. The Euclidean distances 'y_pred' between two embedding matrices 'a' and 'b' with shape [batch_size, hidden_size] can be computed as follows:

# y_pred = sqrt(sum_i (a[:, i] - b[:, i])^2)
y_pred = tf$linalg$norm(a - b, axis = 1L)

See: http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
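The per-sample loss can be sketched in base R. The formula used here is the common form from the cited Hadsell, Chopra and LeCun paper (squared distance for similar pairs, squared hinge on the margin for dissimilar pairs); it is an assumption that this matches the package's computation exactly, and 'contrastive_loss' is a hypothetical function.

```r
# Hypothetical per-sample contrastive loss.
# y_true: 1 for pairs with the same label, 0 otherwise.
# y_pred: Euclidean distance between the two embeddings of each pair.
contrastive_loss <- function(y_true, y_pred, margin = 1) {
  y_true * y_pred^2 + (1 - y_true) * pmax(margin - y_pred, 0)^2
}

a <- rbind(c(0, 0), c(1, 1), c(0, 0))
b <- rbind(c(0, 0.5), c(1, 1), c(3, 4))

y_pred <- sqrt(rowSums((a - b)^2))   # distances 0.5, 0, 5
y_true <- c(1, 0, 0)

contrastive_loss(y_true, y_pred)     # 0.25 1 0
```

The second pair is penalized because its embeddings coincide despite different labels; the third is already further apart than the margin and contributes nothing.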

Value

contrastive_loss: 1-D float 'Tensor' with shape [batch_size].

Examples


## Not run: 
keras_model_sequential() %>%
  layer_dense(4, input_shape = c(784)) %>%
  compile(
    optimizer = 'sgd',
    loss=loss_contrastive(),
    metrics='accuracy'
  )

## End(Not run)

Implements the GIoU loss function.

Description

GIoU loss was first introduced in the [Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression](https://giou.stanford.edu/GIoU.pdf). GIoU is an enhancement for models which use IoU in object detection.

Usage

loss_giou(
  mode = "giou",
  reduction = tf$keras$losses$Reduction$AUTO,
  name = "giou_loss"
)

Arguments

mode

One of 'giou' or 'iou'; selects whether the GIoU or the IoU loss is calculated.

reduction

(Optional) Type of tf$keras$losses$Reduction to apply. Default value is AUTO.

name

A name for the operation (optional).

Value

GIoU loss float 'Tensor'.
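For a single pair of boxes, the quantity behind both modes can be worked out in base R. The sketch follows the definition in the cited paper (GIoU = IoU minus the share of the enclosing box not covered by the union, loss = 1 - GIoU). The box layout [y_min, x_min, y_max, x_max] is an assumption of this sketch, as is the hypothetical name 'giou_loss_pair'; degenerate boxes with zero area are not handled.

```r
# Hypothetical GIoU / IoU loss for one pair of axis-aligned boxes.
# Assumed box layout: c(y_min, x_min, y_max, x_max).
giou_loss_pair <- function(b1, b2, mode = "giou") {
  area <- function(b) max(b[3] - b[1], 0) * max(b[4] - b[2], 0)
  inter <- c(max(b1[1], b2[1]), max(b1[2], b2[2]),
             min(b1[3], b2[3]), min(b1[4], b2[4]))
  intersection <- area(inter)
  union <- area(b1) + area(b2) - intersection
  iou <- intersection / union
  if (mode == "iou") return(1 - iou)
  # Smallest box enclosing both inputs.
  enclose <- c(min(b1[1], b2[1]), min(b1[2], b2[2]),
               max(b1[3], b2[3]), max(b1[4], b2[4]))
  giou <- iou - (area(enclose) - union) / area(enclose)
  1 - giou
}

giou_loss_pair(c(0, 0, 2, 2), c(1, 1, 3, 3))                # 68/63
giou_loss_pair(c(0, 0, 2, 2), c(1, 1, 3, 3), mode = "iou")  # 6/7
giou_loss_pair(c(0, 0, 2, 2), c(0, 0, 2, 2))                # 0 for identical boxes
```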

Hamming loss

Description

Computes hamming loss.

Usage

loss_hamming(
  mode,
  name = "hamming_loss",
  threshold = NULL,
  dtype = tf$float32,
  ...
)

Arguments

mode

multi-class or multi-label

name

(optional) String name of the metric instance.

threshold

Elements of 'y_pred' greater than threshold are converted to be 1, and the rest 0. If threshold is 'NULL', the argmax is converted to 1, and the rest 0.

dtype

(optional) Data type of the metric result. Defaults to 'tf$float32'.

...

additional arguments that are passed on to function 'fn'.

Details

Hamming loss is the fraction of wrong labels to the total number of labels. In multi-class classification, hamming loss is calculated as the hamming distance between 'actual' and 'predictions'. In multi-label classification, hamming loss penalizes only the individual labels.

Value

hamming loss: float

Examples

## Not run: 

# multi-class hamming loss
hl = loss_hamming(mode='multiclass', threshold=0.6)
actuals = tf$constant(list(as.integer(c(1, 0, 0, 0)),as.integer(c(0, 0, 1, 0)),
                       as.integer(c(0, 0, 0, 1)),as.integer(c(0, 1, 0, 0))),
                      dtype=tf$float32)
predictions = tf$constant(list(c(0.8, 0.1, 0.1, 0),
                           c(0.2, 0, 0.8, 0),
                           c(0.05, 0.05, 0.1, 0.8),
                           c(1, 0, 0, 0)),
                          dtype=tf$float32)
hl$update_state(actuals, predictions)
paste('Hamming loss: ', hl$result()$numpy()) # 0.25
# multi-label hamming loss
hl = loss_hamming(mode='multilabel', threshold=0.8)
actuals = tf$constant(list(as.integer(c(1, 0, 1, 0)),as.integer(c(0, 1, 0, 1)),
                       as.integer(c(0, 0, 0,1))), dtype=tf$int32)
predictions = tf$constant(list(c(0.82, 0.5, 0.90, 0),
                           c(0, 1, 0.4, 0.98),
                           c(0.89, 0.79, 0, 0.3)),
                          dtype=tf$float32)
hl$update_state(actuals, predictions)
paste('Hamming loss: ', hl$result()$numpy()) # 0.16666667


## End(Not run)
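The two results quoted in the example can be reproduced by hand in base R. The sketch below is a plain re-computation under the thresholding rule described for 'threshold', assuming one-hot rows in the multi-class case; the function names are hypothetical and no TensorFlow call is involved.

```r
# Hypothetical multi-class Hamming loss: a row counts as wrong unless the
# thresholded prediction hits the true class.
hamming_multiclass <- function(y_true, y_pred, threshold) {
  hard <- (y_pred > threshold) * 1
  mean(1 - rowSums(y_true * hard))
}

# Hypothetical multi-label Hamming loss: fraction of individual labels
# that differ, averaged over rows.
hamming_multilabel <- function(y_true, y_pred, threshold) {
  hard <- (y_pred > threshold) * 1
  mean(rowSums(y_true != hard) / ncol(y_true))
}

actuals_mc <- rbind(c(1, 0, 0, 0), c(0, 0, 1, 0), c(0, 0, 0, 1), c(0, 1, 0, 0))
preds_mc <- rbind(c(0.8, 0.1, 0.1, 0), c(0.2, 0, 0.8, 0),
                  c(0.05, 0.05, 0.1, 0.8), c(1, 0, 0, 0))
hamming_multiclass(actuals_mc, preds_mc, threshold = 0.6)  # 0.25

actuals_ml <- rbind(c(1, 0, 1, 0), c(0, 1, 0, 1), c(0, 0, 0, 1))
preds_ml <- rbind(c(0.82, 0.5, 0.90, 0), c(0, 1, 0.4, 0.98), c(0.89, 0.79, 0, 0.3))
hamming_multilabel(actuals_ml, preds_ml, threshold = 0.8)  # 0.1666667
```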

Lifted structured loss

Description

Computes the lifted structured loss.

Usage

loss_lifted_struct(margin = 1, name = NULL, ...)

Arguments

margin

Float, margin term in the loss definition.

name

Optional name for the op.

...

additional parameters to pass

Details

The loss encourages the positive distances (between a pair of embeddings with the same labels) to be smaller than any negative distances (between a pair of embeddings with different labels) in the mini-batch in a way that is differentiable with respect to the embedding vectors. See: https://arxiv.org/abs/1511.06452

Value

lifted_loss: tf$float32 scalar.

Npairs loss

Description

Computes the npairs loss between 'y_true' and 'y_pred'.

Usage

loss_npairs(name = "npairs_loss")

Arguments

name

Optional name for the op.

Value

npairs_loss: float scalar.

Npairs multilabel loss

Description

Computes the npairs loss between multilabel data 'y_true' and 'y_pred'.

Usage

loss_npairs_multilabel(name = "npairs_multilabel_loss")

Arguments

name

Optional name for the op.

Details

Npairs loss expects paired data where a pair is composed of samples from the same labels and each pair in the minibatch has different labels. The loss takes each row of the pair-wise similarity matrix, 'y_pred', as logits and the remapped multi-class labels, 'y_true', as labels.

To deal with multilabel inputs, the count of label intersection is computed as follows:

L_{i,j} = | set_of_labels_for(i) \cap set_of_labels_for(j) |

Each row of the count based label matrix is further normalized so that each row sums to one.

'y_true' should be a binary indicator for classes. That is, if 'y_true[i, j] = 1', then the 'i'th sample is in the 'j'th class; if 'y_true[i, j] = 0', then the 'i'th sample is not in the 'j'th class.

The similarity matrix 'y_pred' between two embedding matrices 'a' and 'b' with shape '[batch_size, hidden_size]' can be computed as follows:

# y_pred = a * b^T
y_pred = tf$matmul(a, b, transpose_a = FALSE, transpose_b = TRUE)
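The label-intersection counts and their row normalization can be written out in base R. This sketch covers only the construction of the target matrix and the similarity matrix described above, not the loss itself; 'label_targets' is a hypothetical name and the small matrices are made-up illustration data.

```r
# Hypothetical helper: count shared labels between every pair of samples,
# then normalize each row to sum to one.
label_targets <- function(y_true) {
  counts <- y_true %*% t(y_true)   # counts[i, j] = labels shared by samples i and j
  counts / rowSums(counts)
}

# Three samples, three classes (binary indicators).
y_true <- rbind(c(1, 1, 0), c(0, 1, 0), c(0, 0, 1))
targets <- label_targets(y_true)
targets

# Pair-wise similarity matrix between two embedding matrices a and b.
a <- rbind(c(1, 0), c(0, 1), c(1, 1))
b <- rbind(c(1, 0), c(0, 2), c(1, 1))
y_pred <- a %*% t(b)
dim(y_pred)   # batch_size x batch_size
```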

Value

npairs_multilabel_loss: float scalar.

See

http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf

Pinball loss

Description

Computes the pinball loss between 'y_true' and 'y_pred'.

Usage

loss_pinball(
  tau = 0.5,
  reduction = tf$keras$losses$Reduction$AUTO,
  name = "pinball_loss"
)

Arguments

tau

(Optional) Float in [0, 1] or a tensor taking values in [0, 1] and shape = [d0,..., dn]. It defines the slope of the pinball loss. In the context of quantile regression, the value of tau determines the conditional quantile level. When tau = 0.5, this amounts to l1 regression, an estimator of the conditional median (0.5 quantile).

reduction

(Optional) Type of tf$keras$losses$Reduction to apply to the loss. Default value is AUTO, which indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used with tf$distribute$Strategy outside of built-in training loops such as tf$keras 'compile' and 'fit', using AUTO or SUM_OVER_BATCH_SIZE will raise an error. Please see https://www.tensorflow.org/alpha/tutorials/distribute/training_loops for more details on this.

name

Optional name for the op.

Details

'loss = maximum(tau * (y_true - y_pred), (tau - 1) * (y_true - y_pred))'

In the context of regression, this loss yields an estimator of the tau conditional quantile. See: https://en.wikipedia.org/wiki/Quantile_regression

Usage (Python):

loss = pinball_loss([0., 0., 1., 1.], [1., 1., 1., 0.], tau=.1)
# loss = max(0.1 * (y_true - y_pred), (0.1 - 1) * (y_true - y_pred))
#      = (0.9 + 0.9 + 0 + 0.1) / 4
print('Loss: ', loss.numpy())  # Loss: 0.475
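The formula can be checked by hand with a few lines of base R. The helper name 'pinball_loss_r' is hypothetical and is not part of tfaddons; it averages the element-wise loss over all elements, which is an assumption that matches the worked numbers above.

```r
# Hypothetical base R helper implementing
# maximum(tau * (y_true - y_pred), (tau - 1) * (y_true - y_pred))
pinball_loss_r <- function(y_true, y_pred, tau = 0.5) {
  delta <- y_true - y_pred
  mean(pmax(tau * delta, (tau - 1) * delta))
}

loss <- pinball_loss_r(c(0, 0, 1, 1), c(1, 1, 1, 0), tau = 0.1)
print(loss)  # 0.475
```

With 'tau = 0.5' the same helper returns half of the mean absolute error, which is the l1 regression case mentioned in the description of 'tau'.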

Value

pinball_loss: 1-D float 'Tensor' with shape [batch_size].

References

- https://en.wikipedia.org/wiki/Quantile_regression
- https://projecteuclid.org/download/pdfview_1/euclid.bj/1297173840

Examples


## Not run: 
keras_model_sequential() %>%
  layer_dense(4, input_shape = c(784)) %>%
  compile(
    optimizer = 'sgd',
    loss=loss_pinball(),
    metrics='accuracy'
  )

## End(Not run)

Weighted cross-entropy loss for a sequence of logits.

Description

Weighted cross-entropy loss for a sequence of logits.

Usage

loss_sequence(...)

Arguments

...

A list of parameters

Value

None

Sigmoid focal crossentropy loss

Description

Sigmoid focal crossentropy loss

Usage

loss_sigmoid_focal_crossentropy(
  from_logits = FALSE,
  alpha = 0.25,
  gamma = 2,
  reduction = tf$keras$losses$Reduction$NONE,
  name = "sigmoid_focal_crossentropy"
)

Arguments

from_logits

If logits are provided then convert the predictions into probabilities

alpha

balancing factor.

gamma

modulating factor.

reduction

(Optional) Type of tf$keras$losses$Reduction to apply. Default value is NONE.

name

(Optional) name for the loss.

Value

Weighted loss float 'Tensor'. If 'reduction' is 'NONE', this has the same shape as 'y_true'; otherwise, it is scalar.

Examples


## Not run: 
keras_model_sequential() %>%
  layer_dense(4, input_shape = c(784)) %>%
  compile(
    optimizer = 'sgd',
    loss=loss_sigmoid_focal_crossentropy(),
    metrics='accuracy'
  )

## End(Not run)

Sparsemax loss

Description

Sparsemax loss function [1].

Usage

loss_sparsemax(
  from_logits = TRUE,
  reduction = tf$keras$losses$Reduction$SUM_OVER_BATCH_SIZE,
  name = "sparsemax_loss"
)

Arguments

from_logits

Whether y_pred is expected to be a logits tensor. Default is True, meaning y_pred is the logits.

reduction

(Optional) Type of tf$keras$losses$Reduction to apply to loss. Default value is SUM_OVER_BATCH_SIZE.

name

Optional name for the op.

Details

Computes the generalized multi-label classification loss for the sparsemax function. The implementation is a reformulation of the original loss function such that it uses the sparsemax probability output instead of the internal tau variable. However, the output is identical to the original loss function. [1]: https://arxiv.org/abs/1602.02068

Value

A 'Tensor'. Has the same type as 'logits'.

Triplet hard loss

Description

Computes the triplet loss with hard negative and hard positive mining.

Usage

loss_triplet_hard(margin = 1, soft = FALSE, name = NULL, ...)

Arguments

margin

Float, margin term in the loss definition. Default value is 1.0.

soft

Boolean, if set, use the soft margin version. Default value is False.

name

Optional name for the op.

...

additional arguments to pass

Value

triplet_loss: float scalar with dtype of y_pred.

Examples


## Not run: 
model = keras_model_sequential() %>%
  layer_conv_2d(filters = 64, kernel_size = 2, padding='same', input_shape=c(28,28,1)) %>%
  layer_max_pooling_2d(pool_size=2) %>%
  layer_flatten() %>%
  layer_dense(256, activation= NULL) %>%
  layer_lambda(f = function(x) tf$math$l2_normalize(x, axis = 1L))

model %>% compile(
  optimizer = optimizer_lazy_adam(),
  # apply triplet hard loss
  loss = loss_triplet_hard())

## End(Not run)

Triplet semihard loss

Description

Computes the triplet loss with semi-hard negative mining.

Usage

loss_triplet_semihard(margin = 1, name = NULL, ...)

Arguments

margin

Float, margin term in the loss definition. Default value is 1.0.

name

Optional name for the op.

...

additional arguments to pass

Value

triplet_loss: float scalar with dtype of y_pred.

Examples


## Not run: 
model = keras_model_sequential() %>%
  layer_conv_2d(filters = 64, kernel_size = 2, padding='same', input_shape=c(28,28,1)) %>%
  layer_max_pooling_2d(pool_size=2) %>%
  layer_flatten() %>%
  layer_dense(256, activation= NULL) %>%
  layer_lambda(f = function(x) tf$math$l2_normalize(x, axis = 1L))

model %>% compile(
  optimizer = optimizer_lazy_adam(),
  # apply triplet semihard loss
  loss = loss_triplet_semihard())

## End(Not run)

Computes Kappa score between two raters

Description

Computes Kappa score between two raters

Usage

metric_cohen_kappa(
  num_classes,
  name = "cohen_kappa",
  weightage = NULL,
  sparse_labels = FALSE,
  regression = FALSE,
  dtype = NULL
)

Arguments

num_classes

Number of unique classes in your dataset.

name

(optional) String name of the metric instance

weightage

(optional) Weighting to be considered for calculating kappa statistics. A valid value is one of [None, 'linear', 'quadratic']. Defaults to 'NULL'

sparse_labels

(bool) Valid only for the multi-class scenario. If TRUE, ground truth labels are expected to be integers and not one-hot encoded.

regression

(bool) If set, the problem is treated as a regression problem in which the predictions are regressed. **Note:** If you are regressing for the values, the output layer should contain a single unit.

dtype

(optional) Data type of the metric result. Defaults to 'NULL'

Details

The score lies in the range [-1, 1]. A score of -1 represents complete disagreement between two raters whereas a score of 1 represents complete agreement between the two raters. A score of 0 means agreement by chance.
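The three reference points mentioned above (-1, 0 and 1) can be reproduced with a base R sketch. It assumes the standard unweighted definition kappa = (p_observed - p_expected) / (1 - p_expected), which corresponds to 'weightage = NULL'; the helper 'cohen_kappa_r' is hypothetical and is not the implementation used by 'metric_cohen_kappa'.

```r
# Hypothetical helper: unweighted Cohen's kappa for classes coded 0..num_classes-1
cohen_kappa_r <- function(rater_a, rater_b, num_classes) {
  lv <- seq_len(num_classes) - 1
  cm <- table(factor(rater_a, levels = lv), factor(rater_b, levels = lv))
  n <- sum(cm)
  p_observed <- sum(diag(cm)) / n                      # observed agreement
  p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  (p_observed - p_expected) / (1 - p_expected)
}

agree    <- cohen_kappa_r(c(0, 1, 2, 0, 1, 2), c(0, 1, 2, 0, 1, 2), 3)
disagree <- cohen_kappa_r(c(0, 1, 0, 1), c(1, 0, 1, 0), 2)
chance   <- cohen_kappa_r(c(0, 0, 1, 1), c(0, 1, 0, 1), 2)
print(c(agree, disagree, chance))
```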

Value

Input tensor or list of input tensors.

Examples


## Not run: 
model = keras_model_sequential() %>%
  layer_dense(units = 10, input_shape = ncol(iris) - 1,activation = activation_lisht) %>%
  layer_dense(units = 3)

model %>% compile(loss = 'categorical_crossentropy',
                  optimizer = optimizer_radam(),
                  metrics = metric_cohen_kappa(3))

## End(Not run)

FBetaScore

Description

Computes F-Beta score.

Usage

metric_fbetascore(
  num_classes,
  average = NULL,
  beta = 1,
  threshold = NULL,
  name = "fbeta_score",
  dtype = tf$float32,
  ...
)

Arguments

num_classes

Number of unique classes in the dataset.

average

Type of averaging to be performed on data. Acceptable values are NULL, micro, macro and weighted. Default value is NULL.
- NULL: Scores for each class are returned.
- micro: True positives, false positives and false negatives are computed globally.
- macro: True positives, false positives and false negatives are computed for each class and their unweighted mean is returned.
- weighted: Metrics are computed for each class and the mean weighted by the number of true instances in each class is returned.

beta

Determines the weight given to precision and recall in the harmonic mean. Default value is 1.

threshold

Elements of y_pred greater than threshold are converted to be 1, and the rest 0. If threshold is None, the argmax is converted to 1, and the rest 0.

name

(optional) String name of the metric instance.

dtype

(optional) Data type of the metric result. Defaults to 'tf$float32'.

...

additional parameters to pass

Details

It is the weighted harmonic mean of precision and recall. Output range is [0, 1]. Works for both multi-class and multi-label classification.

F-Beta = (1 + beta^2) * (prec * recall) / ((beta^2 * prec) + recall)
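The effect of 'beta' can be seen in a small base R sketch for a single binary class. The helper 'fbeta_r' and the data are hypothetical; the sketch does not handle the case where precision or recall is undefined (no predicted or no actual positives).

```r
# Hypothetical helper: F-Beta for one binary class
fbeta_r <- function(y_true, y_pred, beta = 1) {
  tp <- sum(y_true == 1 & y_pred == 1)
  fp <- sum(y_true == 0 & y_pred == 1)
  fn <- sum(y_true == 1 & y_pred == 0)
  prec <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  (1 + beta^2) * (prec * recall) / ((beta^2 * prec) + recall)
}

y_true <- c(1, 1, 1, 0, 0, 1)
y_pred <- c(1, 0, 1, 1, 1, 1)   # precision = 0.6, recall = 0.75

f_half <- fbeta_r(y_true, y_pred, beta = 0.5)  # leans towards precision
f_one  <- fbeta_r(y_true, y_pred, beta = 1)    # plain harmonic mean
f_two  <- fbeta_r(y_true, y_pred, beta = 2)    # leans towards recall
print(c(f_half, f_one, f_two))
```

Because recall is higher than precision in this toy data, the score increases as 'beta' grows.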

Value

F-Beta Score: float

Raises

ValueError: If the 'average' has values other than [NULL, micro, macro, weighted].

Hamming distance

Description

Computes hamming distance.

Usage

metric_hamming_distance(actuals, predictions)

Arguments

actuals

actual value

predictions

predicted value

Details

Hamming distance is for comparing two binary strings. It is the number of bit positions in which two bits are different.
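The definition can be illustrated in base R with the same vectors used in the Examples below. Whether 'metric_hamming_distance' reports the raw count or the fraction of differing positions is not stated in this section, so the sketch computes both; the variable names are hypothetical.

```r
actuals     <- c(1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
predictions <- c(1, 0, 0, 0, 1, 0, 0, 1, 0, 1)

# Number of positions in which the two vectors differ
n_different <- sum(actuals != predictions)

# The same quantity expressed as a fraction of the vector length
fraction_different <- mean(actuals != predictions)

print(c(n_different, fraction_different))
```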

Value

hamming distance: float

Examples


## Not run: 

actuals = tf$constant(as.integer(c(1, 1, 0, 0, 1, 0, 1, 0, 0, 1)), dtype=tf$int32)
predictions = tf$constant(as.integer(c(1, 0, 0, 0, 1, 0, 0, 1, 0, 1)),dtype=tf$int32)
result = metric_hamming_distance(actuals, predictions)
paste('Hamming distance: ', result$numpy())


## End(Not run)

MatthewsCorrelationCoefficient

Description

Computes the Matthews Correlation Coefficient.

Usage

metric_mcc(
  num_classes = NULL,
  name = "MatthewsCorrelationCoefficient",
  dtype = tf$float32
)

Arguments

num_classes

Number of unique classes in the dataset.

name

(Optional) String name of the metric instance.

dtype

(Optional) Data type of the metric result. Defaults to 'tf$float32'.

Details

The statistic is also known as the phi coefficient. The Matthews correlation coefficient (MCC) is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The value of MCC lies between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction and -1 an inverse prediction.

MCC = ((TP * TN) - (FP * FN)) / ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))^(1/2)
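As a check on the formula, the binary case can be computed directly in base R using the data from the Examples below. The helper 'mcc_r' is hypothetical and covers only the binary case; it is not the implementation behind 'metric_mcc'.

```r
# Hypothetical helper: binary Matthews correlation coefficient
mcc_r <- function(actuals, preds) {
  tp <- sum(actuals == 1 & preds == 1)
  tn <- sum(actuals == 0 & preds == 0)
  fp <- sum(actuals == 0 & preds == 1)
  fn <- sum(actuals == 1 & preds == 0)
  ((tp * tn) - (fp * fn)) /
    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
}

mcc_value <- mcc_r(c(1, 1, 1, 0), c(1, 0, 1, 1))
print(mcc_value)  # -0.3333333
```

Here TP = 2, TN = 0, FP = 1 and FN = 1, so the numerator is -1 and the denominator is sqrt(3 * 3 * 1 * 1) = 3.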

Value

Matthews correlation coefficient: float

Examples


## Not run: 

actuals = tf$constant(list(1, 1, 1, 0), dtype=tf$float32)
preds = tf$constant(list(1,0,1,1), dtype=tf$float32)
# Matthews correlation coefficient
mcc = metric_mcc(num_classes=1)
mcc$update_state(actuals, preds)
paste('Matthews correlation coefficient is:', mcc$result()$numpy())
# Matthews correlation coefficient is : -0.33333334


## End(Not run)

MultiLabelConfusionMatrix

Description

Computes Multi-label confusion matrix.

Usage

metric_multilabel_confusion_matrix(
  num_classes,
  name = "Multilabel_confusion_matrix",
  dtype = tf$int32
)

Arguments

num_classes

Number of unique classes in the dataset.

name

(Optional) String name of the metric instance.

dtype

(Optional) Data type of the metric result. Defaults to 'tf$int32'.

Details

Class-wise confusion matrix is computed for the evaluation of classification. If multi-class input is provided, it will be treated as multilabel data. Consider a classification problem with two classes (i.e. num_classes = 2). The resultant matrix 'M' will have the shape (num_classes, 2, 2). Every class 'i' has a dedicated 2*2 matrix that contains:
- true negatives for class i in M(0,0)
- false positives for class i in M(0,1)
- false negatives for class i in M(1,0)
- true positives for class i in M(1,1)

# multilabel confusion matrix
y_true = tf$constant(list(as.integer(c(1, 0, 1)), as.integer(c(0, 1, 0))), dtype=tf$int32)
y_pred = tf$constant(list(as.integer(c(1, 0, 0)), as.integer(c(0, 1, 1))), dtype=tf$int32)
output = metric_multilabel_confusion_matrix(num_classes=3)
output$update_state(y_true, y_pred)
paste('Confusion matrix:', output$result())
# Confusion matrix: [[[1 0] [0 1]] [[1 0] [0 1]] [[0 1] [1 0]]]

# if multiclass input is provided
y_true = tf$constant(list(as.integer(c(1, 0, 0)), as.integer(c(0, 1, 0))), dtype=tf$int32)
y_pred = tf$constant(list(as.integer(c(1, 0, 0)), as.integer(c(0, 0, 1))), dtype=tf$int32)
output = metric_multilabel_confusion_matrix(num_classes=3)
output$update_state(y_true, y_pred)
paste('Confusion matrix:', output$result())
# Confusion matrix: [[[1 0] [0 1]] [[1 0] [1 0]] [[1 1] [0 0]]]
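The layout of 'M' can be reproduced in base R for the first pair of inputs above. The helper 'multilabel_confusion_r' is hypothetical and not part of the package. Note that R arrays are 1-indexed, so M(0,0) in the text corresponds to 'm[i, 1, 1]' in this sketch.

```r
# Hypothetical helper: one 2 x 2 confusion matrix per class (column)
multilabel_confusion_r <- function(y_true, y_pred) {
  num_classes <- ncol(y_true)
  m <- array(0L, dim = c(num_classes, 2, 2))
  for (i in seq_len(num_classes)) {
    truth <- y_true[, i]
    pred <- y_pred[, i]
    m[i, 1, 1] <- sum(truth == 0 & pred == 0)  # true negatives
    m[i, 1, 2] <- sum(truth == 0 & pred == 1)  # false positives
    m[i, 2, 1] <- sum(truth == 1 & pred == 0)  # false negatives
    m[i, 2, 2] <- sum(truth == 1 & pred == 1)  # true positives
  }
  m
}

y_true <- rbind(c(1, 0, 1), c(0, 1, 0))
y_pred <- rbind(c(1, 0, 0), c(0, 1, 1))
m <- multilabel_confusion_r(y_true, y_pred)
print(m[3, , ])  # the third class has one false positive and one false negative
```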

Value

MultiLabelConfusionMatrix: float

RSquare

Description

RSquare

This is also called the coefficient of determination. It indicates how close the data are to the fitted regression line. The highest possible score is 1.0, which indicates that the predictors perfectly account for variation in the target. A score of 0.0 indicates that the predictors do not account for variation in the target. The score can also be negative if the model is worse than always predicting the mean of the target.

Usage

metric_rsquare(
  name = "r_square",
  dtype = tf$float32,
  ...,
  multioutput = "uniform_average"
)

Arguments

name

(Optional) String name of the metric instance.

dtype

(Optional) Data type of the metric result. Defaults to 'tf$float32'.

...

additional arguments to pass

multioutput

one of the following: "raw_values", "uniform_average", "variance_weighted"

Value

r squared score: float

Examples


## Not run: 

actuals = tf$constant(c(1, 4, 3), dtype=tf$float32)
preds = tf$constant(c(2, 4, 4), dtype=tf$float32)
result = metric_rsquare()
result$update_state(actuals, preds)
paste('R^2 score is: ', result$result()$numpy()) # 0.57142866


## End(Not run)
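The value in the example above can be verified by hand with the usual definition R^2 = 1 - SS_res / SS_tot. The base R helper 'r_square_r' below is a hypothetical sketch for a single output and is not the implementation behind 'metric_rsquare'.

```r
# Hypothetical helper: coefficient of determination for one output
r_square_r <- function(actuals, preds) {
  ss_res <- sum((actuals - preds)^2)          # residual sum of squares
  ss_tot <- sum((actuals - mean(actuals))^2)  # total sum of squares
  1 - ss_res / ss_tot
}

actuals <- c(1, 4, 3)
r2 <- r_square_r(actuals, c(2, 4, 4))
print(r2)  # 0.5714286

# Predicting the mean of the target gives a score of 0
r2_mean <- r_square_r(actuals, rep(mean(actuals), 3))

# Predictions worse than the mean give a negative score
r2_bad <- r_square_r(actuals, c(4, 1, 1))
```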

F1Score

Description

Computes F-1 Score.

Usage

metrics_f1score(
  num_classes,
  average = NULL,
  threshold = NULL,
  name = "f1_score",
  dtype = tf$float32
)

Arguments

num_classes

Number of unique classes in the dataset.

average

Type of averaging to be performed on data. Acceptable values are NULL, micro, macro and weighted. Default value is NULL.
- NULL: Scores for each class are returned.
- micro: True positives, false positives and false negatives are computed globally.
- macro: True positives, false positives and false negatives are computed for each class and their unweighted mean is returned.
- weighted: Metrics are computed for each class and the mean weighted by the number of true instances in each class is returned.

threshold

Elements of y_pred above threshold are considered to be 1, and the rest 0. If threshold is NULL, the argmax is converted to 1, and the rest 0.

name

(optional) String name of the metric instance.

dtype

(optional) Data type of the metric result. Defaults to 'tf$float32'.

Details

It is the harmonic mean of precision and recall. Output range is [0, 1]. Works for both multi-class and multi-label classification.

F-1 = 2 * (precision * recall) / (precision + recall)

Value

F-1 Score: float

Raises

ValueError: If the 'average' has values other than [NULL, micro, macro, weighted].

Examples


## Not run: 
model = keras_model_sequential() %>%
  layer_dense(units = 10, input_shape = ncol(iris) - 1,activation = activation_lisht) %>%
  layer_dense(units = 3)

model %>% compile(loss = 'categorical_crossentropy',
                  optimizer = optimizer_radam(),
                  metrics = metrics_f1score(3))

## End(Not run)

Conditional Gradient

Description

Conditional Gradient

Usage

optimizer_conditional_gradient(
  learning_rate,
  lambda_,
  epsilon = 1e-07,
  use_locking = FALSE,
  name = "ConditionalGradient",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

learning_rate

A Tensor or a floating point value, or a schedule that is a tf$keras$optimizers$schedules$LearningRateSchedule. The learning rate.

lambda_

A Tensor or a floating point value. The constraint.

epsilon

A Tensor or a floating point value. A small constant for numerical stability when handling the case of norm of gradient to be zero.

use_locking

If True, use locks for update operations.

name

Optional name prefix for the operations created when applying gradients. Defaults to 'ConditionalGradient'.

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Optimizer that implements the Adam algorithm with weight decay

Description

This is an implementation of the AdamW optimizer described in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter (https://arxiv.org/abs/1711.05101, pdf: https://arxiv.org/pdf/1711.05101.pdf). It computes the update step of tf$keras$optimizers$Adam and additionally decays the variable. Note that this is different from adding L2 regularization on the variables to the loss: it regularizes variables with large gradients more than L2 regularization would, which was shown to yield better training loss and generalization error in the paper above.

Usage

optimizer_decay_adamw(
  weight_decay,
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  amsgrad = FALSE,
  name = "AdamW",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

weight_decay

A Tensor or a floating point value. The weight decay.

learning_rate

A Tensor or a floating point value. The learning rate.

beta_1

A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta_2

A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon

A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.

amsgrad

boolean. Whether to apply AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and beyond".

name

Optional name for the operations created when applying gradients. Defaults to "AdamW".

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 

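step = tf$Variable(0L, trainable = FALSE)
# piecewise constant schedule with boundaries at steps 10000 and 15000
schedule = tf$optimizers$schedules$PiecewiseConstantDecay(
  list(10000, 15000),
  list(1e-0, 1e-1, 1e-2)
)
lr = 1e-1 * schedule(step)
# weight decay as a function, so that it follows the schedule
wd = function() 1e-4 * schedule(step)
optimizer = optimizer_decay_adamw(learning_rate = lr, weight_decay = wd)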


## End(Not run)

Optimizer that implements the Momentum algorithm with weight_decay

Description

This is an implementation of the SGDW optimizer described in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter (https://arxiv.org/abs/1711.05101, pdf: https://arxiv.org/pdf/1711.05101.pdf). It computes the update step of tf$keras$optimizers$SGD and additionally decays the variable. Note that this is different from adding L2 regularization on the variables to the loss. Decoupling the weight decay from other hyperparameters (in particular the learning rate) simplifies hyperparameter search. For further information see the documentation of the SGD Optimizer.

Usage

optimizer_decay_sgdw(
  weight_decay,
  learning_rate = 0.001,
  momentum = 0,
  nesterov = FALSE,
  name = "SGDW",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

weight_decay

weight decay rate.

learning_rate

float hyperparameter >= 0. Learning rate.

momentum

float hyperparameter >= 0 that accelerates SGD in the relevant direction and dampens oscillations.

nesterov

boolean. Whether to apply Nesterov momentum.

name

Optional name prefix for the operations created when applying gradients. Defaults to 'SGDW'.

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 

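step = tf$Variable(0L, trainable = FALSE)
# piecewise constant schedule with boundaries at steps 10000 and 15000
schedule = tf$optimizers$schedules$PiecewiseConstantDecay(
  list(10000, 15000),
  list(1e-0, 1e-1, 1e-2)
)
lr = 1e-1 * schedule(step)
# weight decay as a function, so that it follows the schedule
wd = function() 1e-4 * schedule(step)
optimizer = optimizer_decay_sgdw(learning_rate = lr, weight_decay = wd)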


## End(Not run)

Layer-wise Adaptive Moments

Description

Layer-wise Adaptive Moments

Usage

optimizer_lamb(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-06,
  weight_decay_rate = 0,
  exclude_from_weight_decay = NULL,
  exclude_from_layer_adaptation = NULL,
  name = "LAMB",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

learning_rate

A 'Tensor' or a floating point value, or a schedule that is a 'tf$keras$optimizers$schedules$LearningRateSchedule'. The learning rate.

beta_1

A 'float' value or a constant 'float' tensor. The exponential decay rate for the 1st moment estimates.

beta_2

A 'float' value or a constant 'float' tensor. The exponential decay rate for the 2nd moment estimates.

epsilon

A small constant for numerical stability.

weight_decay_rate

weight decay rate.

exclude_from_weight_decay

List of regex patterns of variables excluded from weight decay. Variables whose names contain a substring matching the pattern will be excluded.

exclude_from_layer_adaptation

List of regex patterns of variables excluded from layer adaptation. Variables whose names contain a substring matching the pattern will be excluded.

name

Optional name for the operations created when applying gradients. Defaults to "LAMB".

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 
keras_model_sequential() %>%
  layer_dense(32, input_shape = c(784)) %>%
  compile(
    optimizer = optimizer_lamb(),
    loss='binary_crossentropy',
    metrics='accuracy'
  )

## End(Not run)

Lazy Adam

Description

Lazy Adam

Usage

optimizer_lazy_adam(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  amsgrad = FALSE,
  name = "LazyAdam",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

learning_rate

A Tensor or a floating point value, or a schedule that is a tf$keras$optimizers$schedules$LearningRateSchedule. The learning rate.

beta_1

A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta_2

A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon

A small constant for numerical stability. This epsilon is "epsilon hat" in Adam: A Method for Stochastic Optimization. Kingma et al., 2014 (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.

amsgrad

boolean. Whether to apply AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and beyond". Note that this argument is currently not supported and the argument can only be False.

name

Optional name for the operations created when applying gradients. Defaults to "LazyAdam".

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 
keras_model_sequential() %>%
  layer_dense(32, input_shape = c(784)) %>%
  compile(
    optimizer = optimizer_lazy_adam(),
    loss='binary_crossentropy',
    metrics='accuracy'
  )

## End(Not run)

Moving Average

Description

Moving Average

Usage

optimizer_moving_average(
  optimizer,
  sequential_update = TRUE,
  average_decay = 0.99,
  num_updates = NULL,
  name = "MovingAverage",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

optimizer

str or tf$keras$optimizers$Optimizer that will be used to compute and apply gradients.

sequential_update

Bool. If False, will compute the moving average at the same time as the model is updated, potentially doing benign data races. If True, will update the moving average after gradient updates.

average_decay

float. Decay to use to maintain the moving averages of trained variables.

num_updates

Optional count of the number of updates applied to variables.

name

Optional name for the operations created when applying gradients. Defaults to "MovingAverage".

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Details

Optimizer that computes a moving average of the variables. Empirically it has been found that using the moving average of the trained parameters of a deep network is better than using its trained parameters directly. This optimizer allows you to compute this moving average and swap the variables at save time so that any code outside of the training loop will use by default the average values instead of the original ones.
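The idea of keeping a moving average of trained values can be sketched in base R. The update rule below is the common exponential moving average, average = decay * average + (1 - decay) * value; that 'optimizer_moving_average' uses exactly this rule is an assumption, and the helper 'ema_update' and the toy numbers are hypothetical.

```r
# Hypothetical helper: one exponential moving average update
ema_update <- function(average, value, decay = 0.99) {
  decay * average + (1 - decay) * value
}

# Toy sequence of values of a single weight during training
weights <- c(4, 2, 0)

# Start the average at the first value, then fold in the later ones
average <- weights[1]
for (w in weights[-1]) {
  average <- ema_update(average, w, decay = 0.5)
}
print(average)  # 1.5, while the last raw weight is 0
```

A 'decay' close to 1, such as the default 'average_decay = 0.99', makes the average change slowly, so it smooths out the step-to-step noise of the raw weights.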

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 

opt = tf$keras$optimizers$SGD(learning_rate = 0.01)  # illustrative learning rate
opt = optimizer_moving_average(opt)


## End(Not run)

NovoGrad

Description

NovoGrad

Usage

optimizer_novograd(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  weight_decay = 0,
  grad_averaging = FALSE,
  amsgrad = FALSE,
  name = "NovoGrad",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

learning_rate

A 'Tensor' or a floating point value, or a schedule that is a 'tf$keras$optimizers$schedules$LearningRateSchedule'. The learning rate.

beta_1

A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta_2

A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon

A small constant for numerical stability.

weight_decay

A floating point value. Weight decay for each param.

grad_averaging

determines whether to use Adam style exponential moving averaging for the first order moments.

amsgrad

boolean. Whether to apply AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and beyond".

name

Optional name for the operations created when applying gradients. Defaults to "NovoGrad".

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 
keras_model_sequential() %>%
  layer_dense(32, input_shape = c(784)) %>%
  compile(
    optimizer = optimizer_novograd(),
    loss='binary_crossentropy',
    metrics='accuracy'
  )

## End(Not run)

Rectified Adam (a.k.a. RAdam)

Description

Rectified Adam (a.k.a. RAdam)

Usage

optimizer_radam(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  weight_decay = 0,
  amsgrad = FALSE,
  sma_threshold = 5,
  total_steps = 0,
  warmup_proportion = 0.1,
  min_lr = 0,
  name = "RectifiedAdam",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

learning_rate

A 'Tensor' or a floating point value, or a schedule that is a 'tf$keras$optimizers$schedules$LearningRateSchedule'. The learning rate.

beta_1

A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta_2

A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon

A small constant for numerical stability.

weight_decay

A floating point value. Weight decay for each param.

amsgrad

boolean. Whether to apply AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and beyond".

sma_threshold

A float value. The threshold for simple mean average.

total_steps

An integer. Total number of training steps. Enable warmup by setting a positive value.

warmup_proportion

A floating point value. The proportion of increasing steps.

min_lr

A floating point value. Minimum learning rate after warmup.

name

Optional name for the operations created when applying gradients. Defaults to "RectifiedAdam".

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Stochastic Weight Averaging

Description

Stochastic Weight Averaging

Usage

optimizer_swa(
  optimizer,
  start_averaging = 0,
  average_period = 10,
  name = "SWA",
  sequential_update = TRUE,
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

optimizer

The original optimizer that will be used to compute and apply the gradients.

start_averaging

An integer. Threshold to start averaging using SWA. Averaging only occurs at start_averaging iters, must be >= 0. If start_averaging = m, the first snapshot will be taken after the mth application of gradients (where the first iteration is iteration 0).

average_period

An integer. The synchronization period of SWA. The averaging occurs every average_period steps. Averaging period needs to be >= 1.

name

Optional name for the operations created when applying gradients. Defaults to 'SWA'.

sequential_update

Bool. If FALSE, will compute the moving average at the same time as the model is updated, potentially doing benign data races. If TRUE, will update the moving average after gradient updates.

clipnorm

Clip gradients by norm.

clipvalue

Clip gradients by value.

decay

Included for backward compatibility to allow time inverse decay of learning rate.

lr

Included for backward compatibility; it is recommended to use learning_rate instead.

Details

The Stochastic Weight Averaging mechanism was proposed by Pavel Izmailov et al. in the paper [Averaging Weights Leads to Wider Optima and Better Generalization](https://arxiv.org/abs/1803.05407). The optimizer implements averaging of multiple points along the trajectory of SGD. It expects an inner optimizer, which is used to apply the gradients to the variables, and itself computes a running average of the variables every k steps (which generally corresponds to the end of a cycle when a cyclic learning rate is employed). The number of steps after which averaging first happens can also be specified. For example, to average every k steps after the first m steps, a snapshot of the variables is taken after step m and the weights are then averaged at steps m + k, m + 2k and so on. The assign_average_vars function can be called at the end of training to obtain the averaged weights from the optimizer.
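The averaging schedule described above (snapshots at steps m, m + k, m + 2k, ...) can be sketched in base R on a toy trajectory. The helper 'swa_average' is hypothetical, and the exact step indexing of the underlying optimizer is an assumption; the sketch only illustrates which points of the trajectory enter the average.

```r
# Hypothetical helper: average the weights recorded at steps m, m + k, m + 2k, ...
swa_average <- function(weights_by_step, start_averaging, average_period) {
  avg <- NULL
  n_snapshots <- 0
  for (i in seq_along(weights_by_step)) {
    step <- i - 1  # steps are counted from 0
    due <- step >= start_averaging &&
      (step - start_averaging) %% average_period == 0
    if (due) {
      n_snapshots <- n_snapshots + 1
      w <- weights_by_step[[i]]
      # incremental mean of all snapshots seen so far
      avg <- if (is.null(avg)) w else avg + (w - avg) / n_snapshots
    }
  }
  avg
}

# Toy trajectory: the "weights" after step s are simply the number s
trajectory <- as.list(0:9)
swa_weights <- swa_average(trajectory, start_averaging = 2, average_period = 3)
print(swa_weights)  # mean of the snapshots at steps 2, 5 and 8
```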

Value

Optimizer for use with 'keras::compile()'

Examples


## Not run: 
# Illustrative values: learning rate 0.01, m = 10, k = 5
opt = tf$keras$optimizers$SGD(learning_rate = 0.01)
opt = optimizer_swa(opt, start_averaging = 10L, average_period = 5L)

## End(Not run)

Yogi

Description

Yogi

Usage

optimizer_yogi(
  learning_rate = 0.01,
  beta1 = 0.9,
  beta2 = 0.999,
  epsilon = 0.001,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0,
  initial_accumulator_value = 1e-06,
  activation = "sign",
  name = "Yogi",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)

Arguments

learning_rate

A Tensor or a floating point value. The learning rate.

beta1

A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta2

A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon

A constant trading off adaptivity and noise.

l1_regularization_strength

A float value, must be greater than or equal to zero.

l2_regularization_strength

A float value, must be greater than or equal to zero.

initial_accumulator_value

The starting value for accumulators. Only positive values are allowed.

activation

Use hard sign ("sign") or soft tanh ("tanh") to determine the sign.

name

Optional name for the operations created when applying gradients. Defaults to "Yogi".

clipnorm

Clips gradients by norm.

clipvalue

Clips gradients by value.

decay

Included for backward compatibility to allow time inverse decay of the learning rate.

lr

Included for backward compatibility; use learning_rate instead.

Value

Optimizer for use with 'keras::compile()'

Parse time

Description

Parse an input string according to the provided format string into a Unix time.

Usage

parse_time(time_string, time_format, output_unit)

Arguments

time_string

The input time string to be parsed.

time_format

The time format.

output_unit

The output unit of the parsed unix time. Can only be SECOND, MILLISECOND, MICROSECOND, NANOSECOND.

Details

Parses an input string according to the provided format string into a Unix time: the number of seconds / milliseconds / microseconds / nanoseconds elapsed since January 1, 1970 UTC.

Uses strftime()-like formatting options, with the same extensions as FormatTime(), but with the exceptions that %E#S is interpreted as %E*S, and %E#f as %E*f. %Ez and %E*z also accept the same inputs. %Y consumes as many numeric characters as it can, so the matching data should always be terminated with a non-numeric. %E4Y always consumes exactly four characters, including any sign.

Unspecified fields are taken from the default date and time of "1970-01-01 00:00:00.0 +0000". For example, parsing a string of "15:45" (%H:%M) will return a Unix time that represents "1970-01-01 15:45:00.0 +0000".

Note that ParseTime only heeds the fields year, month, day, hour, minute, (fractional) second, and UTC offset. Other fields, like weekday (%a or %A), while parsed for syntactic validity, are ignored in the conversion.

Date and time fields that are out-of-range will be treated as errors rather than normalizing them like 'absl::CivilSecond' does. For example, it is an error to parse the date "Oct 32, 2013" because 32 is out of range.

A leap second of ":60" is normalized to ":00" of the following minute with fractional seconds discarded. The following table shows how the given seconds and subseconds will be parsed:

  "59.x" -> 59.x  // exact
  "60.x" -> 00.0  // normalized
  "00.x" -> 00.x  // exact
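To make the notion of a Unix time in different output units concrete, here is a base-R sketch. It does not call parse_time; the helper to_unix_sketch is hypothetical and uses R's own as.POSIXct parser, which only supports standard strptime fields and fills unspecified date fields from the current date rather than from 1970-01-01. For that reason the example passes a fully specified timestamp.

```r
# Hypothetical base-R helper; not the tfaddons implementation.
to_unix_sketch <- function(time_string, time_format, output_unit = "SECOND") {
  scale <- c(SECOND = 1, MILLISECOND = 1e3, MICROSECOND = 1e6, NANOSECOND = 1e9)
  if (!output_unit %in% names(scale)) {
    stop("output_unit must be one of: ", paste(names(scale), collapse = ", "))
  }
  parsed <- as.POSIXct(time_string, format = time_format, tz = "UTC")
  if (is.na(parsed)) {
    stop("could not parse time_string with time_format")
  }
  # Seconds since 1970-01-01 00:00:00 UTC, rescaled to the requested unit
  as.numeric(parsed) * scale[[output_unit]]
}

# 15:45 on 1970-01-01 is 15 * 3600 + 45 * 60 seconds after the epoch.
seconds <- to_unix_sketch("1970-01-01 15:45:00", "%Y-%m-%d %H:%M:%S")
millis  <- to_unix_sketch("1970-01-01 15:45:00", "%Y-%m-%d %H:%M:%S", "MILLISECOND")
```

The sketch stops with an error for an unknown output_unit or an unparseable string, loosely mirroring the ValueError cases listed under Raises.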

Value

the number of seconds / milliseconds / microseconds / nanoseconds elapsed since January 1, 1970 UTC.

Raises

ValueError: If 'output_unit' is not a valid value, or if parsing 'time_string' according to 'time_format' failed.

Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

tensorflow: tf

Value

an alias for tensorflow::tf

Register all

Description

Usage

register_all(keras_objects = TRUE, custom_kernels = TRUE)

Arguments

keras_objects

boolean, 'TRUE' by default. If 'TRUE', registers all Keras objects with 'tf$keras$utils$register_keras_serializable(package="Addons")'. If 'FALSE', doesn't register any Keras objects of Addons in TensorFlow.

custom_kernels

boolean, 'TRUE' by default. If 'TRUE', loads all custom kernels of TensorFlow Addons with 'tf.load_op_library("path/to/so/file.so")'. Loading the SO files registers them automatically. If 'FALSE', doesn't load and register the shared object files. Note that it might be useful to turn this off if your installation of Addons doesn't work well with custom ops.

Details

When loading a Keras model that uses a TF Addons function, the Keras deserialization process must know about that function. There are two ways to do this. Either pass the function explicitly:

  tf$keras$models$load_model(
    "my_model.tf",
    custom_objects = list("LAMB" = tfaddons::optimizer_lamb)
  )

or register everything first:

  register_all()
  tf$keras$models$load_model("my_model.tf")

If the model contains custom ops (compiled ops) of TensorFlow Addons, and the graph is loaded with 'tf$saved_model$load', then the custom ops must be registered beforehand to avoid an error of the type:

  tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered '...' in binary running on ... Make sure the Op and Kernel are registered in the binary running in this process.

In this case, the only way to make sure that the ops are registered is to call this function:

  register_all()
  tf$saved_model$load("my_model.tf")

Note that you can call this function multiple times in the same process; it only has an effect the first time. Afterward, it is a no-op.

Value

None

Register custom kernels

Description

Usage

register_custom_kernels(...)

Arguments

...

parameters to pass

Value

None

Register keras objects

Description

Usage

register_keras_objects(...)

Arguments

...

parameters to pass

Value

None

Safe cumprod

Description

Computes cumprod of x in logspace using cumsum to avoid underflow.

Usage

safe_cumprod(x, ...)

Arguments

x

Tensor to take the cumulative product of.

...

Additional arguments passed on to cumsum; they are identical to those of cumprod.

Details

The cumprod function and its gradient can result in numerical instabilities when its argument has very small and/or zero values. As long as the argument is all positive, we can instead compute the cumulative product as exp(cumsum(log(x))). This function can be called identically to tf$cumprod.
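The identity behind this function, exp(cumsum(log(x))), can be checked in base R. The helper name safe_cumprod_sketch is hypothetical and works on plain numeric vectors rather than tensors; like the text above, it assumes all entries are strictly positive.

```r
# Base-R illustration of computing a cumulative product in log space.
safe_cumprod_sketch <- function(x) {
  stopifnot(all(x > 0))  # log() requires strictly positive input
  exp(cumsum(log(x)))
}

x <- c(0.5, 0.25, 2, 4)
safe <- safe_cumprod_sketch(x)  # log-space result
direct <- cumprod(x)            # direct product, for comparison

# With very small values the direct product underflows to zero,
# while the running sum of logs stays finite.
tiny <- rep(1e-200, 3)
direct_tiny <- cumprod(tiny)
log_running <- cumsum(log(tiny))
```

For well-scaled input the two results agree up to floating-point error. For tiny the direct product reaches zero by the second element, whereas log_running remains an ordinary finite number.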

Value

Cumulative product of x.

Bernoulli sample

Description

Samples from Bernoulli distribution.

Usage

sample_bernoulli(
  probs = NULL,
  logits = NULL,
  dtype = tf$int32,
  sample_shape = list(),
  seed = NULL
)

Arguments

probs

probabilities

logits

logits

dtype

the data type

sample_shape

a list/vector of integers

seed

integer, random seed

Value

a Tensor

Categorical sample

Description

Samples from categorical distribution.

Usage

sample_categorical(
  logits,
  dtype = tf$int32,
  sample_shape = list(),
  seed = NULL
)

Arguments

logits

logits

dtype

dtype

sample_shape

the shape of sample

seed

random seed: integer

Value

a Tensor

Sampler

Description

Interface for implementing sampling in seq2seq decoders.

Usage

sampler(...)

Arguments

...

Parameters to pass: batch_size, initialize, next_inputs, sample, sample_ids_dtype, sample_ids_shape.

Value

None

Base abstract class that allows the user to customize sampling.

Description

Base abstract class that allows the user to customize sampling.

Usage

sampler_custom(
  initialize_fn,
  sample_fn,
  next_inputs_fn,
  sample_ids_shape = NULL,
  sample_ids_dtype = NULL
)

Arguments

initialize_fn

callable that returns (finished, next_inputs) for the first iteration.

sample_fn

callable that takes (time, outputs, state) and emits tensor sample_ids.

next_inputs_fn

callable that takes (time, outputs, state, sample_ids) and emits (finished, next_inputs, next_state).

sample_ids_shape

Either a list of integers, or a 1-D Tensor of type int32, the shape of each value in the sample_ids batch. Defaults to a scalar.

sample_ids_dtype

The dtype of the sample_ids tensor. Defaults to int32.

Value

None

Greedy Embedding Sampler

Description

A sampler for use during inference.

Usage

sampler_greedy_embedding(embedding_fn = NULL)

Arguments

embedding_fn

An optional callable that takes a vector tensor of ids (argmax ids), or the params argument for embedding_lookup. The returned tensor will be passed to the decoder input. Defaults to tf$nn$embedding_lookup.

Details

Uses the argmax of the output (treated as logits) and passes the result through an embedding layer to get the next input.
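The greedy step can be illustrated with plain matrices in base R. This is a sketch of the idea only: the embedding table and logits below are made-up values, and ids are 1-based because R indexing is, whereas the underlying Python sampler uses 0-based ids.

```r
# Made-up embedding table: 3 tokens, embedding dimension 2.
embedding <- matrix(c(0.1, 0.2,
                      0.3, 0.4,
                      0.5, 0.6), nrow = 3, byrow = TRUE)

# Made-up decoder outputs treated as logits: batch of 2, vocabulary of 3.
logits <- matrix(c(0.1, 2.0, -1.0,
                   1.5, 0.2,  0.3), nrow = 2, byrow = TRUE)

# Greedy choice: the index of the largest logit in each row.
sample_ids <- max.col(logits, ties.method = "first")

# Embedding lookup: the rows of the table selected by the ids
# become the next decoder inputs.
next_inputs <- embedding[sample_ids, , drop = FALSE]
```

Here the first batch entry picks token 2 and the second picks token 1, so next_inputs holds the second and first rows of the embedding table.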

Value

None

Inference Sampler

Description

Inference Sampler

Usage

sampler_inference(
  sample_fn,
  sample_shape,
  sample_dtype = tf$int32,
  end_fn,
  next_inputs_fn = NULL,
  ...
)

Arguments

sample_fn

A callable that takes outputs and emits tensor sample_ids.

sample_shape

Either a list of integers, or a 1-D Tensor of type int32, the shape of each sample in the batch returned by sample_fn.

sample_dtype

the dtype of the sample returned by sample_fn.

end_fn

A callable that takes sample_ids and emits a bool vector shaped [batch_size] indicating whether each sample is an end token.

next_inputs_fn

(Optional) A callable that takes sample_ids and returns the next batch of inputs. If not provided, sample_ids is used as the next batch of inputs.

...

A list that contains other common arguments for layer creation.

Details

A helper to use during inference with a custom sampling function.

Value

None

Sample Embedding Sampler

Description

A sampler for use during inference.

Usage

sampler_sample_embedding(
  embedding_fn = NULL,
  softmax_temperature = NULL,
  seed = NULL
)

Arguments

embedding_fn

(Optional) A callable that takes a vector tensor of ids (argmax ids), or the params argument for embedding_lookup. The returned tensor will be passed to the decoder input.

softmax_temperature

(Optional) float32 scalar, value to divide the logits by before computing the softmax. Larger values (above 1.0) result in more random samples, while smaller values push the sampling distribution towards the argmax. Must be strictly greater than 0. Defaults to 1.0.

seed

(Optional) The sampling seed.

Details

Uses sampling (from a distribution) instead of argmax and passes the result through an embedding layer to get the next input.
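The effect of softmax_temperature can be sketched in base R: the logits are divided by the temperature before the softmax, and an id is then drawn from the resulting distribution. The helper name and the logits are illustrative assumptions, not part of the package.

```r
# Hypothetical helper: softmax of logits divided by a temperature.
softmax_with_temperature <- function(logits, temperature = 1.0) {
  stopifnot(temperature > 0)
  z <- logits / temperature
  z <- z - max(z)  # subtract the maximum for numerical stability
  exp(z) / sum(exp(z))
}

logits <- c(2, 1, 0)
p_sharp <- softmax_with_temperature(logits, temperature = 0.5)  # closer to argmax
p_flat  <- softmax_with_temperature(logits, temperature = 2.0)  # more random

# Draw one id (1-based in R) from the flatter distribution.
set.seed(1)
sample_id <- sample(seq_along(logits), size = 1, prob = p_flat)
```

A temperature below 1 concentrates probability on the largest logit, while a temperature above 1 spreads it more evenly, which matches the description of softmax_temperature above.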

Value

None

A training sampler that adds scheduled sampling

Description

A training sampler that adds scheduled sampling

Usage

sampler_scheduled_embedding_training(
  sampling_probability,
  embedding_fn = NULL,
  time_major = FALSE,
  seed = NULL,
  scheduling_seed = NULL
)

Arguments

sampling_probability

A float32 0-D or 1-D tensor: the probability of sampling categorically from the output ids instead of reading directly from the inputs.

embedding_fn

A callable that takes a vector tensor of ids (argmax ids), or the params argument for embedding_lookup.

time_major

bool. Whether the tensors in inputs are time major. If 'FALSE' (default), they are assumed to be batch major.

seed

The sampling seed.

scheduling_seed

The schedule decision rule sampling seed.

Value

Returns -1s for sample_ids where no sampling took place; valid sample id values elsewhere.
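The return convention can be sketched in base R. For each batch entry, a coin flip with probability sampling_probability decides whether an id is sampled from the outputs; entries that are not sampled get -1. The logits are made-up values, and the 0-based ids are an assumption chosen to mirror the Python side.

```r
set.seed(123)
sampling_probability <- 0.5

# Made-up decoder outputs: batch of 4, vocabulary of 3.
logits <- matrix(c(2, 0, 0,
                   0, 2, 0,
                   0, 0, 2,
                   1, 1, 1), nrow = 4, byrow = TRUE)

softmax <- function(z) {
  z <- z - max(z)
  exp(z) / sum(exp(z))
}

# Scheduling decision: TRUE where this batch entry samples from the outputs.
select_sample <- runif(nrow(logits)) < sampling_probability

# Categorical sample per row, shifted to 0-based ids.
sampled_ids <- apply(logits, 1, function(z) {
  sample(length(z), size = 1, prob = softmax(z))
}) - 1L

# -1 marks entries where no sampling took place.
sample_ids <- ifelse(select_sample, sampled_ids, -1L)
```

Entries with select_sample equal to FALSE would read their next input directly from the ground-truth inputs, which is why they carry the -1 marker instead of a valid id.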

Scheduled Output Training Sampler

Description

A training sampler that adds scheduled sampling directly to outputs.

Usage

sampler_scheduled_output_training(
  sampling_probability,
  time_major = FALSE,
  seed = NULL,
  next_inputs_fn = NULL
)

Arguments

sampling_probability

A float32 scalar tensor: the probability of sampling from the outputs instead of reading directly from the inputs.

time_major

bool. Whether the tensors in inputs are time major. If 'FALSE' (default), they are assumed to be batch major.

seed

The sampling seed.

next_inputs_fn

(Optional) callable to apply to the RNN outputs to create the next input when sampling. If 'NULL' (default), the RNN outputs will be used as the next inputs.

Value

FALSE for sample_ids where no sampling took place; TRUE elsewhere.

A Sampler for use during training.

Description

Only reads inputs.

Usage

sampler_training(time_major = FALSE)

Arguments

time_major

bool. Whether the tensors in inputs are time major. If 'FALSE' (default), they are assumed to be batch major.

Value

None

Skip gram sample

Description

Generates skip-gram token and label paired Tensors from the input tensor.

Usage

skip_gram_sample(
  input_tensor,
  min_skips = 1,
  max_skips = 5,
  start = 0,
  limit = -1,
  emit_self_as_target = FALSE,
  vocab_freq_table = NULL,
  vocab_min_count = NULL,
  vocab_subsampling = NULL,
  corpus_size = NULL,
  batch_size = NULL,
  batch_capacity = NULL,
  seed = NULL,
  name = NULL
)

Arguments

input_tensor

A rank-1 'Tensor' from which to generate skip-gram candidates.

min_skips

'int' or scalar 'Tensor' specifying the minimum window size to randomly use for each token. Must be >= 0 and <= 'max_skips'. If 'min_skips' and 'max_skips' are both 0, the only label outputted will be the token itself when 'emit_self_as_target = TRUE' - or no output otherwise.

max_skips

'int' or scalar 'Tensor' specifying the maximum window size to randomly use for each token. Must be >= 0.

start

'int' or scalar 'Tensor' specifying the position in 'input_tensor' from which to start generating skip-gram candidates.

limit

'int' or scalar 'Tensor' specifying the maximum number of elements in 'input_tensor' to use in generating skip-gram candidates. -1 means to use the rest of the 'Tensor' after 'start'.

emit_self_as_target

'bool' or scalar 'Tensor' specifying whether to emit each token as a label for itself.

vocab_freq_table

(Optional) A lookup table (subclass of 'lookup.InitializableLookupTableBase') that maps tokens to their raw frequency counts. If specified, any token in 'input_tensor' that is not found in 'vocab_freq_table' will be filtered out before generating skip-gram candidates. While this will typically map to integer raw frequency counts, it could also map to float frequency proportions. 'vocab_min_count' and 'corpus_size' should be in the same units as this.

vocab_min_count

(Optional) 'int', 'float', or scalar 'Tensor' specifying minimum frequency threshold (from 'vocab_freq_table') for a token to be kept in 'input_tensor'. If this is specified, 'vocab_freq_table' must also be specified - and they should both be in the same units.

vocab_subsampling

(Optional) 'float' specifying frequency proportion threshold for tokens from 'input_tensor'. Tokens that occur more frequently (based on the ratio of the token's 'vocab_freq_table' value to the 'corpus_size') will be randomly down-sampled. Reasonable starting values may be around 1e-3 or 1e-5. If this is specified, both 'vocab_freq_table' and 'corpus_size' must also be specified. See Eq. 5 in http://arxiv.org/abs/1310.4546 for more details.

corpus_size

(Optional) 'int', 'float', or scalar 'Tensor' specifying the total number of tokens in the corpus (e.g., sum of all the frequency counts of 'vocab_freq_table'). Used with 'vocab_subsampling' for down-sampling frequently occurring tokens. If this is specified, 'vocab_freq_table' and 'vocab_subsampling' must also be specified.

batch_size

(Optional) 'int' specifying batch size of returned 'Tensors'.

batch_capacity

(Optional) 'int' specifying batch capacity for the queue used for batching returned 'Tensors'. Only has an effect if 'batch_size' > 0. Defaults to 100 * 'batch_size' if not specified.

seed

(Optional) 'int' used to create a random seed for window size and subsampling. See 'set_random_seed' docs for behavior.

name

(Optional) A 'string' name or a name scope for the operations.

Details

Generates skip-gram '("token", "label")' pairs using each element in the rank-1 'input_tensor' as a token. The window size used for each token will be randomly selected from the range specified by '[min_skips, max_skips]', inclusive. See https://arxiv.org/abs/1301.3781 for more details about skip-gram.

For example, given 'input_tensor = ["the", "quick", "brown", "fox", "jumps"]', 'min_skips = 1', 'max_skips = 2', 'emit_self_as_target = FALSE', the output '(tokens, labels)' pairs for the token "quick" will be randomly selected from either '(tokens=["quick", "quick"], labels=["the", "brown"])' for 1 skip, or '(tokens=["quick", "quick", "quick"], labels=["the", "brown", "fox"])' for 2 skips.

If 'emit_self_as_target = TRUE', each token will also be emitted as a label for itself. From the previous example, the output will be either '(tokens=["quick", "quick", "quick"], labels=["the", "quick", "brown"])' for 1 skip, or '(tokens=["quick", "quick", "quick", "quick"], labels=["the", "quick", "brown", "fox"])' for 2 skips.

The same process is repeated for each element of 'input_tensor' and concatenated together into the two output rank-1 'Tensors' (one for all the tokens, another for all the labels).

If 'vocab_freq_table' is specified, tokens in 'input_tensor' that are not present in the vocabulary are discarded. Tokens whose frequency counts are below 'vocab_min_count' are also discarded. Tokens whose frequency proportions in the corpus exceed 'vocab_subsampling' may be randomly down-sampled. See Eq. 5 in http://arxiv.org/abs/1310.4546 for more details about subsampling.

Due to the random window sizes used for each token, the lengths of the outputs are non-deterministic, unless 'batch_size' is specified to batch the outputs to always return 'Tensors' of length 'batch_size'.
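The "quick" example can be reproduced with a few lines of base R. The helper pairs_for_token is hypothetical and handles a single token with a fixed window, whereas skip_gram_sample draws the window at random for every token and concatenates the results; positions are 1-based here.

```r
# Hypothetical helper: (token, label) pairs for the token at position i,
# using a fixed window size.
pairs_for_token <- function(tokens, i, window, emit_self_as_target = FALSE) {
  idx <- max(1, i - window):min(length(tokens), i + window)
  if (!emit_self_as_target) {
    idx <- idx[idx != i]  # drop the token itself from its labels
  }
  list(tokens = rep(tokens[i], length(idx)), labels = tokens[idx])
}

input <- c("the", "quick", "brown", "fox", "jumps")

one_skip  <- pairs_for_token(input, i = 2, window = 1)
two_skips <- pairs_for_token(input, i = 2, window = 2)
with_self <- pairs_for_token(input, i = 2, window = 1, emit_self_as_target = TRUE)
```

one_skip$labels is "the", "brown"; two_skips$labels is "the", "brown", "fox" (the window is clipped at the start of the input); and with_self$labels additionally contains "quick" itself.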

Value

A 'list' containing (token, label) 'Tensors'. Each output 'Tensor' is of rank-1 and has the same type as 'input_tensor'. The 'Tensors' will be of length 'batch_size'; if 'batch_size' is not specified, they will be of random length, though they will be in sync with each other as long as they are evaluated together.

Raises

ValueError: If 'vocab_freq_table' is not provided, but 'vocab_min_count', 'vocab_subsampling', or 'corpus_size' is specified. If 'vocab_subsampling' and 'corpus_size' are not both present or both absent.

Skip gram sample with text vocab

Description

Skip-gram sampling with a text vocabulary file.

Usage

skip_gram_sample_with_text_vocab(
  input_tensor,
  vocab_freq_file,
  vocab_token_index = 0,
  vocab_token_dtype = tf$string,
  vocab_freq_index = 1,
  vocab_freq_dtype = tf$float64,
  vocab_delimiter = ",",
  vocab_min_count = NULL,
  vocab_subsampling = NULL,
  corpus_size = NULL,
  min_skips = 1,
  max_skips = 5,
  start = 0,
  limit = -1,
  emit_self_as_target = FALSE,
  batch_size = NULL,
  batch_capacity = NULL,
  seed = NULL,
  name = NULL
)

Arguments

input_tensor

A rank-1 'Tensor' from which to generate skip-gram candidates.

vocab_freq_file

'string' specifying full file path to the text vocab file.

vocab_token_index

'int' specifying which column in the text vocab file contains the tokens.

vocab_token_dtype

'DType' specifying the format of the tokens in the text vocab file.

vocab_freq_index

'int' specifying which column in the text vocab file contains the frequency counts of the tokens.

vocab_freq_dtype

'DType' specifying the format of the frequency counts in the text vocab file.

vocab_delimiter

'string' specifying the delimiter used in the text vocab file.

vocab_min_count

'int', 'float', or scalar 'Tensor' specifying minimum frequency threshold (from 'vocab_freq_file') for a token to be kept in 'input_tensor'. This should correspond with 'vocab_freq_dtype'.

vocab_subsampling

(Optional) 'float' specifying frequency proportion threshold for tokens from 'input_tensor'. Tokens that occur more frequently will be randomly down-sampled. Reasonable starting values may be around 1e-3 or 1e-5. See Eq. 5 in http://arxiv.org/abs/1310.4546 for more details.

corpus_size

(Optional) 'int', 'float', or scalar 'Tensor' specifying the total number of tokens in the corpus (e.g., sum of all the frequency counts of 'vocab_freq_file'). Used with 'vocab_subsampling' for down-sampling frequently occurring tokens. If this is specified, 'vocab_freq_file' and 'vocab_subsampling' must also be specified. If 'corpus_size' is needed but not supplied, then it will be calculated from 'vocab_freq_file'. You might want to supply your own value if you have already eliminated infrequent tokens from your vocabulary files (where frequency < vocab_min_count) to save memory in the internal token lookup table. Otherwise, the unused tokens' variables will waste memory. The user-supplied 'corpus_size' value must be greater than or equal to the sum of all the frequency counts of 'vocab_freq_file'.

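min_skips

'int' or scalar 'Tensor' specifying the minimum window size to randomly use for each token. Must be >= 0 and <= 'max_skips'. If 'min_skips' and 'max_skips' are both 0, the only label outputted will be the token itself when 'emit_self_as_target = TRUE' - or no output otherwise.
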
max_skips

'int' or scalar 'Tensor' specifying the maximum window size to randomly use for each token. Must be >= 0.

start

'int' or scalar 'Tensor' specifying the position in 'input_tensor' from which to start generating skip-gram candidates.

limit

'int' or scalar 'Tensor' specifying the maximum number of elements in 'input_tensor' to use in generating skip-gram candidates. -1 means to use the rest of the 'Tensor' after 'start'.

emit_self_as_target

'bool' or scalar 'Tensor' specifying whether to emit each token as a label for itself.

batch_size

(Optional) 'int' specifying batch size of returned 'Tensors'.

batch_capacity

(Optional) 'int' specifying batch capacity for the queue used for batching returned 'Tensors'. Only has an effect if 'batch_size' > 0. Defaults to 100 * 'batch_size' if not specified.

seed

(Optional) 'int' used to create a random seed for window size and subsampling. See 'set_random_seed' docs for behavior.

name

(Optional) A 'string' name or a name scope for the operations.

Details

Wrapper around 'skip_gram_sample()' for use with a text vocabulary file. The vocabulary file is expected to be a plain-text file, with lines of 'vocab_delimiter'-separated columns. The 'vocab_token_index' column should contain the vocabulary term, while the 'vocab_freq_index' column should contain the number of times that term occurs in the corpus. For example, with a text vocabulary file of:

  bonjour,fr,42
  hello,en,777
  hola,es,99

you should set 'vocab_delimiter=","', 'vocab_token_index=0', and 'vocab_freq_index=2'. See 'skip_gram_sample()' documentation for more details about the skip-gram sampling process.
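To show how the column indices map onto such a file, the three example lines can be read with base R. This is an illustration only: the variable names are assumptions, the threshold of 50 is made up, and since the indices in this function are 0-based, 1 is added before indexing in R.

```r
vocab_text <- "bonjour,fr,42\nhello,en,777\nhola,es,99"

vocab_delimiter   <- ","
vocab_token_index <- 0  # 0-based: first column holds the tokens
vocab_freq_index  <- 2  # 0-based: third column holds the counts

vocab <- read.table(text = vocab_text, sep = vocab_delimiter,
                    header = FALSE, stringsAsFactors = FALSE)

tokens <- vocab[[vocab_token_index + 1]]
freqs  <- vocab[[vocab_freq_index + 1]]

# corpus_size as the sum of all frequency counts in the file
corpus_size <- sum(freqs)

# Tokens that would survive an illustrative vocab_min_count of 50
vocab_min_count <- 50
kept_tokens <- tokens[freqs >= vocab_min_count]
```

With this file the corpus size is 42 + 777 + 99, and "bonjour" falls below the illustrative threshold.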

Value

Raises

ValueError: If 'vocab_token_index' or 'vocab_freq_index' is less than 0 or exceeds the number of columns in 'vocab_freq_file'. If 'vocab_token_index' and 'vocab_freq_index' are both set to the same column. If any token in 'vocab_freq_file' has a negative frequency.

Version of TensorFlow SIG Addons

Description

Get the current version of TensorFlow SIG Addons

Usage

tfaddons_version()

Value

prints the version.

Tile batch

Description

Tile the batch dimension of a (possibly nested structure of) tensor(s) t.

Usage

tile_batch(t, multiplier, name = NULL)

Arguments

t

'Tensor' shaped '[batch_size, ...]'.

multiplier

Integer. The number of times each minibatch entry is repeated.

name

Name scope for any created operations.

Details

For each tensor t in a (possibly nested structure) of tensors, this function takes a tensor t shaped '[batch_size, s0, s1, ...]' composed of minibatch entries 't[0], ..., t[batch_size - 1]' and tiles it to have a shape '[batch_size * multiplier, s0, s1, ...]' composed of minibatch entries 't[0], t[0], ..., t[1], t[1], ...' where each minibatch entry is repeated 'multiplier' times.
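The resulting ordering of entries can be sketched for the 2-D case in base R. The helper tile_batch_sketch is hypothetical and handles only a single matrix whose rows are the batch entries, not nested structures or higher-rank tensors.

```r
# Hypothetical helper: repeat each row (batch entry) `multiplier` times,
# keeping the copies of an entry next to each other.
tile_batch_sketch <- function(t, multiplier) {
  t[rep(seq_len(nrow(t)), each = multiplier), , drop = FALSE]
}

# Batch of 3 entries, each of length 2: rows (1, 2), (3, 4), (5, 6).
t <- matrix(1:6, nrow = 3, byrow = TRUE)
tiled <- tile_batch_sketch(t, multiplier = 2)
```

tiled has shape 6 x 2 and its rows are ordered t[1], t[1], t[2], t[2], t[3], t[3] in R's 1-based indexing, matching the 't[0], t[0], ..., t[1], t[1], ...' layout described above.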

Value

A (possibly nested structure of) 'Tensor' shaped '[batch_size * multiplier, ...]'.

Raises

ValueError: if tensor(s) 't' do not have a statically known rank or the rank is < 1.

Viterbi decode

Description

Decode the highest scoring sequence of tags outside of TensorFlow.

Usage

viterbi_decode(score, transition_params)

Arguments

score

A [seq_len, num_tags] matrix of unary potentials.

transition_params

A [num_tags, num_tags] matrix of binary potentials.

Details

This should only be used at test time.

Value

viterbi: A [seq_len] list of integers containing the highest scoring tag indices.

viterbi_score: A float containing the score for the Viterbi sequence.
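For readers unfamiliar with the algorithm, the following base-R sketch shows a standard Viterbi decoding over unary and binary potentials. It is not the code that viterbi_decode runs; the helper name and the example matrices are assumptions, and tag indices are 1-based here, whereas the underlying Python function returns 0-based indices.

```r
# Hypothetical base-R Viterbi decoder.
# score: [seq_len, num_tags] unary potentials
# transition_params: [num_tags, num_tags], entry [i, j] scores moving from tag i to tag j
viterbi_decode_sketch <- function(score, transition_params) {
  n_steps <- nrow(score)
  num_tags <- ncol(score)
  trellis <- matrix(0, n_steps, num_tags)
  backpointers <- matrix(0L, n_steps, num_tags)
  trellis[1, ] <- score[1, ]
  for (t in seq_len(n_steps)[-1]) {
    # v[i, j]: best score ending in tag i at t - 1, then moving to tag j
    v <- trellis[t - 1, ] + transition_params
    trellis[t, ] <- score[t, ] + apply(v, 2, max)
    backpointers[t, ] <- apply(v, 2, which.max)
  }
  # Follow the backpointers from the best final tag.
  path <- integer(n_steps)
  path[n_steps] <- which.max(trellis[n_steps, ])
  for (t in rev(seq_len(n_steps)[-1])) {
    path[t - 1] <- backpointers[t, path[t]]
  }
  list(viterbi = path, viterbi_score = max(trellis[n_steps, ]))
}

score <- rbind(c(3, 1),
               c(0, 4),
               c(4, 0))
transition_params <- rbind(c( 1, 0),
                           c(-1, 1))
result <- viterbi_decode_sketch(score, transition_params)
```

For these made-up potentials the best path is tag 1, tag 2, tag 1: the unary scores 3 + 4 + 4 plus the transitions 0 and -1 give a total of 10, which beats staying in tag 1 throughout (total 9).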