Polyclonal selection is a breeding strategy developed for ancient grapevine varieties, but of more widespread interest. It seeks to select a group of different genotypes that collectively meet predefined desirable criteria for quantitative traits. Polyclonal selection, as opposed to the selection of a single genotype, helps to preserve genetic diversity and provides greater environmental stability, while still securing high genetic gains.
This approach relies on the predictors of the genetic effects obtained from the fitting of a mixed model. Rather than focusing on individual genotypes, polyclonal selection considers the group as the unit of selection, with the goal of maximizing the overall predicted genetic gain across multiple traits of interest. Identifying such a superior group is an optimization problem, in which the objective is to select a subset of genotypes that jointly maximize the expected genetic value, subject to constraints which satisfy agronomic and enological interests.
Most existing methods for genetic selection based on several traits, such as selection indices, are designed to maximize the genetic gain of selection of a single genotype. When the objective is to select a group of genotypes, rather than a single genotype, the conventional approach has been to simply choose the top-ranking genotypes. However, this strategy does not ensure that the selected group, as a whole, satisfies predefined breeding objectives when multiple traits are considered simultaneously.
This package implements a novel Integer Programming-based method developed by Surgy et al. (2025), designed specifically for group-based selection under multiple trait criteria. While the methodology is demonstrated using data from grapevine breeding trials, it is broadly applicable across different species and breeding programs, especially in scenarios where group performance is more relevant than individual merit alone.
For a complete description of the methodology, please refer to:
Surgy, S., Cadima, J. & Gonçalves, E. Integer programming as a powerful tool for polyclonal selection in ancient grapevine varieties. Theor Appl Genet 138, 122 (2025). https://doi.org/10.1007/s00122-025-04885-0
Each function in the package includes an example that reproduces scenarios described in the original publication, thereby facilitating both practical understanding and a direct application of the method. This document provides a more detailed explanation of the outputs generated by the various functions and offers guidance on how to interpret them effectively.
The package includes a real dataset - Gouveio - to ensure the reproducibility of the examples and to allow users to explore additional scenarios. This dataset contains Empirical Best Linear Unbiased Predictors (EBLUPs) of genotypic effects, obtained from the fitting of a linear mixed model to phenotypic data from a selection field trial involving 150 genotypes of the grapevine variety Gouveio, established according to a resolvable row-column design.
head(Gouveio)
#>   Clone           yd         pa         ta         ph          bw
#> 1 GV001 -0.005005668 0.23031296 -0.3811156 0.12110451 -0.02831195
#> 2 GV002 -0.486152063 0.41496698  0.0265791 0.02831156  0.12689991
#> 3 GV003 -0.402470892 0.03527182 -0.2086296 0.07848777 -0.08730846
#> 4 GV004 -0.316439937 0.15302600 -0.3562443 0.11582836  0.03532205
#> 5 GV005  0.341634591 0.42183408 -0.1215062 0.05332889  0.03196529
#> 6 GV006  0.575303862 0.62634443 -0.1553684 0.08762044  0.04216359The traits evaluated were:
The selection criteria are defined by the breeder. In this example, they are: to increase yield, potential alcohol, and total acidity, while decreasing pH and berry weight.
Below is a table describing the common arguments used across the functions in this package:
| Argument | Description | Used in functions | 
|---|---|---|
| traits | A vector with the names of the traits to be optimized, i.e., those included in the objective function | All | 
| ref | Name of the reference column (e.g., genotype ID). Defaults to the first column. | All | 
| clmin | Integer, indicating the minimum group size (default is 1). | All | 
| clmax | Integer, indicating the maximum group size. If omitted, equal to clmin | All | 
| dmg | Can be of one of two types. In the standard form, it will be a
dataframe with three columns defining constraints: trait names,
constraint relations ( ">=","<="), and
right-hand side values. Alternatively, it can be given the text
valuedmg = "base"which sets the right-hand side
of the constraints of all traits to zero. | polyclonal() | 
| constraints | Named numeric vector with traits to which constraints apply. If
omitted, all except refare used. | rmaxa() | 
| meanvec | Named numeric vector of the means for each trait. If omitted, data are assumed to be normalized (divided) by the mean. | All | 
| criteria | Named numeric vector with the criterion for each trait: 1 to increase, -1 to decrease. If omitted, all traits are assumed to be increasing. | All | 
| data | A dataframe with the predictors of genotypic effects for the several traits. Rows are genotypes; columns are traits. | All | 
If the gain for a group of a certain size is not displayed, it means that it was not possible to satisfy all the constraints for that group size. It is possible that constraints are not feasible for some group sizes but are feasible for others.
The genotypes selected in a group of a given size may differ substantially from those selected in a group of another size. Changing any condition — including the group size — results in a new optimisation problem.
The dataframe used for the dmg argument can have any
column names, but the order must be respected: first the trait, then the
relation (e.g., “>=”), and finally the right-hand side
value, that is, the value for the minimum desired gain in percentage of
the overall mean of the trait. It should be noted that positive values
are always understood as improvements, even for a criterion
value of -1. Thus, >= 3 with criterion value -1 is understood as a
decrease of at least 3% Consequently, minimum desired gains should be
defined consistently, regardless of the selection criterion
used.
If a loss in a trait is admissible, the maximum admissible loss
must be indicated as a negative number on the right-hand side of the
dmg argument. On the other hand, if the goal is to
establish an upper bound on the increase of the gain of a trait, the
relation in the dmg argument must be specified as
“<=”.
If no group of any requested size satisfies the specified conditions, the message ‘No possible solution!’ is returned.
polyclonal()This is the main function of this package.
In the example for the polyclonal() function, the
objective is to maximize the genetic gain of groups with sizes ranging
from 7 to 20 genotypes, while enforcing desired minimum gains for the
five traits (expressed as percentages of the overall mean of each trait
in the field trial): 20% for yield (yd), 3% for potential alcohol (pa),
3% for total acidity (ta), 1% for pH, and 2% for berry weight (bw).
# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653)
# Vector with the traits to be optimized
mytraits <- c("yd", "pa",  "ta", "ph", "bw")
# Named numeric vector with the selection criteria
mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1)
# Name of the reference column with the genotype labels
myref = "Clone"If meanvec is omitted, it is assumed that all trait
values have already been standardized by dividing by their respective
overall means. If criteria is omitted, the default
assumption is that an increase is desired for all traits. If
ref is omitted, the first column in the dataset is assumed
to contain the genotype labels.
In the polyclonal() function the constraints are
indicated by argument dmg (desired minimum gains). The
standard form of this argument is a data frame with three columns: the
first specifying the traits to be constrained; the second with the
constraint relation; the second with the relation of the constraint
(“>=” or “<=”); and the third with the right-hand side
values of the constraints. Column names are arbitrary, but their order
must be preserved.
# dataframe with the desired minimum gains
mydmg <- data.frame(
  lhs = c("yd", "pa", "ta", "ph", "bw"),
  rel = c(">=", ">=", ">=", ">=", ">="),
  rhs = c(20, 3, 3, 1, 2)
  )Note that the positive values of 1 and 2 for pH (ph) and berry weight (bw), which have selection criteria -1, mean a decrease of at least 1% in pH and a decrease of at least 3% in berry weight.
Now, the polyclonal function is called to perform the
selection based on the specified constraints:
# Using polyclonal() function
with_dmg <- polyclonal(
  traits = mytraits,
  clmin = 7,
  clmax = 20,
  dmg = mydmg,
  meanvec = mymeanvec,
  criteria = mycriteria,
  data = Gouveio
  )
#> No possible solution was found for some of the requested group sizes.Groups ranging from 7 to 20 genotypes were requested, as these cover
the most appropriate group sizes for practical applications of the
polyclonal methodology for grapevine selection. However, other group
sizes can also be specified. In fact, it is possible to request a single
group size by assigning the same value to both clmin and
clmax or by omitting clmax. Analysing the
output reveals how the imposed constraints influence the selection
results.
# Results
with_dmg
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain 
#>  Group.Size       yd       pa       ta       ph       bw
#>          10 21.35475 3.009712 3.321584 1.062815 2.314695
#>           8 20.97573 3.029873 4.004230 1.002937 2.130498
#>           7 24.20511 3.057488 3.278892 1.010023 2.255069
#> 
#> 
#> Selected genotypes (per group size)
#> 
#> $selected
#>     10     8     7
#>  GV053 GV060 GV038
#>  GV065 GV065 GV081
#>  GV080 GV088 GV088
#>  GV081 GV089 GV089
#>  GV088 GV095 GV127
#>  GV089 GV097 GV144
#>  GV095 GV144 GV146
#>  GV097 GV146      
#>  GV144            
#>  GV146The output displays the genetic gain achieved for each trait across the different group sizes. All values are expressed as percentages of the trait mean. For example, in the 10-genotype group, the gains obtained were approximately 21.4%, 3.0%, 3.3%, 1.1% and 2.3% for yield (yd), potential alcohol (pa), total acidity (ta), pH (ph), and berry weight (bw), respectively. Positive values indicate gains in accordance with the selection criteria.
In this case, although group sizes between 7 and 20 genotypes were
requested, the object gain only provides results for groups
of 7, 8, and 10 genotypes. This indicates that the desired minimum gains
could not be achieved for other group sizes within the specified
range.
Examining the lists of genotypes within each group reveals that genotype groups of different sizes are not necessarily nested. For example, genotype GV060 is not included in the group of 7 genotypes, appears in the group of 8 genotypes, and is then excluded from the group of 10 genotypes.
The summary() method can be used with
polyclonal() object to retrieve a summary of the initial
conditions and results.
# Summary results
summary(with_dmg)
#> Summary of Selection Results
#> -----------------------------------
#> 
#> Number of groups selected: 3 
#> Group sizes: 10, 8, 7 
#> 
#> Overview
#>    Crit   Mean DMG   MaxGain MaxGroup
#> yd    1  3.517  20 24.205112        7
#> pa    1 12.760   3  3.057488        7
#> ta    1  4.495   3  4.004230        8
#> ph   -1  3.927   1  1.062815       10
#> bw   -1  1.653   2  2.314695       10The summary() indicates how many groups were achieved
for those conditions and the related dimensions. In the table, the first
column gives the trait names, followed by the means, the criteria and
the desired minimum gain (DMG). The Last two columns, “MaxGain” and
“MaxGroup” show the maximum gain achieved for each trait and the group
size where that gain occurs, respectively.
The so-called base situation, as described in the original publication of the method (Surgy et al., 2025), consists in maximizing the predicted genetic gain of the target traits while ensuring that no trait decreases. To implement this, the right-hand side of all constraints is set to zero.
To simplify the computation of this scenario, the argument
dmg can be set to "base". In that case, all
traits present in the dataset are included as constraints, each required
to be greater than or equal to zero, except for the reference trait
specified in ref. If ref is omitted, the first
column of the dataset is assumed to correspond to the reference
trait.
It is important to note that when using the “base” mode, the user
must provide values for both meanvec and
criteria for all traits in the dataset. If the
user wishes to exclude any traits from the constraint set, the full
specification of dmg must be provided instead.
The following example illustrates the application of the base
situation. Groups of 7 to 12 clones are selected under these conditions.
The same values for traits, meanvec, and
criteria are used throughout.
# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653)
# Vector with the traits to be optimized
mytraits <- c("yd", "pa","ta", "ph", "bw")
# Named numeric vector with the selection criteria
mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1)
base_sit <- polyclonal(
  traits = mytraits,
  clmin = 7,
  clmax = 12,
  dmg = "base",
  meanvec = mymeanvec,
  criteria = mycriteria,
  data = Gouveio
  )Unlike what happens with a selection index, where increasing the group size typically involves the sequential inclusion of genotypes according to a fixed ranking, the method implemented here allows each group size to be independently optimized. As a result, groups of different sizes are not necessarily nested.
base_sit
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain 
#>  Group.Size       yd          pa       ta       ph       bw
#>          12 37.93654 0.087693734 3.145778 1.755908 3.933390
#>          11 39.23907 0.029947255 3.244531 1.886373 3.506967
#>          10 38.30817 0.148095205 3.356216 1.864944 4.434100
#>           9 40.16918 0.004116393 3.103095 1.857812 3.837374
#>           8 39.12182 0.148572474 3.225021 1.827456 5.037591
#>           7 40.82980 0.069781809 2.568590 1.646327 5.396038
#> 
#> 
#> Selected genotypes (per group size)
#> 
#> $selected
#>     12    11    10     9     8     7
#>  GV079 GV080 GV080 GV080 GV080 GV080
#>  GV080 GV088 GV088 GV088 GV088 GV088
#>  GV088 GV089 GV089 GV089 GV089 GV095
#>  GV089 GV093 GV095 GV093 GV095 GV127
#>  GV093 GV095 GV127 GV095 GV127 GV132
#>  GV094 GV127 GV132 GV127 GV137 GV144
#>  GV095 GV132 GV137 GV137 GV144 GV145
#>  GV127 GV137 GV144 GV144 GV145      
#>  GV137 GV144 GV145 GV145            
#>  GV144 GV145 GV146                  
#>  GV145 GV146                        
#>  GV146For example, consider the groups of 7 and 8 genotypes. In the 8-genotype group, not just one but two genotypes differ from the groups of 7: genotypes GV089 and GV137. On the other hand, genotype GV132, which is included in the group of 7 genotypes, is not present in the group of 8. This illustrates how the inclusion of constraints can lead to changes in group composition, rather than simply adding the next best-ranked individual.
In the base situation, summary() show all DMG
constraints as 0.
summary(base_sit)
#> Summary of Selection Results
#> -----------------------------------
#> 
#> Number of groups selected: 6 
#> Group sizes: 12, 11, 10, 9, 8, 7 
#> 
#> Overview
#>    Crit   Mean DMG    MaxGain MaxGroup
#> yd    1  3.517   0 40.8297959        7
#> pa    1 12.760   0  0.1485725        8
#> ta    1  4.495   0  3.3562158       10
#> ph   -1  3.927   0  1.8863728       11
#> bw   -1  1.653   0  5.3960377        7rmaxp()The package includes a function rmaxp(), which computes
the maximum possible gain for each trait. The maximum possible gain
corresponds to the value obtained by selecting the top genotypes for
each trait individually, without applying any constraints. The function
allows the user to request the gain for one or multiple traits
simultaneously. The output displays one column per trait; however, the
values in different columns are not linked to the same solution and
should be interpreted independently.
In the following example, it is asked the maximum possible gain for
yd and pa for group sizes from 9 to 20
genotypes.
# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760)
# Vector with the traits to be optimized
mytraits <- c("yd", "pa")The vector specifying the selection criteria can be omitted, as the default value of 1 is used for both traits. With this function, no constraints are applied.
# Using rmaxp()
max_pos_gain <- rmaxp(
  traits = mytraits,
  clmin = 9, 
  clmax = 20,
  meanvec = mymeanvec,
  data = Gouveio
  )
# Results
max_pos_gain
#> Maximum possible gains for each trait correspond to independently selected groups;
#> thus, gains are not directly comparable and should be interpreted separately.
#> See 'selected' for group details
#> 
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain
#>  Group.Size       yd       pa
#>          20 39.45423 5.915127
#>          19 40.21696 5.968099
#>          18 40.89337 6.024443
#>          17 41.62513 6.078201
#>          16 42.41884 6.135137
#>          15 43.13445 6.195106
#>          14 43.90097 6.256910
#>          13 44.58007 6.305906
#>          12 45.30719 6.359933
#>          11 45.96563 6.419047
#>          10 46.63716 6.483659
#>           9 47.43910 6.559512
#> 
#> 
#> Selected genotypes (per group)
#> 
#> $selected
#>     yd    pa Entry
#>  GV084 GV050    9 
#>  GV088 GV063    9 
#>  GV092 GV065    9 
#>  GV093 GV069    9 
#>  GV095 GV074    9 
#>  GV132 GV085    9 
#>  GV138 GV127    9 
#>  GV143 GV135    9 
#>  GV145 GV141    9 
#>  GV137 GV144    10
#>  GV079 GV133    11
#>  GV070 GV125    12
#>  GV080 GV142    13
#>  GV089 GV055    14
#>  GV091 GV053    15
#>  GV094 GV082    16
#>  GV090 GV068    17
#>  GV078 GV103    18
#>  GV040 GV073    19
#>  GV131 GV006    20As in previous outputs, one column per trait is shown. However, in this case, the gain reported in each column refers to an independent solution. For example, the group of 20 genotypes achieving a yield gain of 39.5% is not composed of the same genotypes as the group achieving a potential alcohol gain of 5.9%. This can be verified by examining the output stored in the selected object.
Since no constraints are applied in this scenario, the output represents a ranking of genotypes for each trait. Increasing the group size simply adds the next genotype in the ranking. Thus, each column corresponds to the top genotypes for a given trait. The final column, Entry, indicates the smallest group size at which each genotype is first included. It means that, for example, for yield, the group size of 9 is composed by genotypes from GV084 to GV145. Genotype GV137 appears for the first time in yield when the group size reaches 10 genotypes. When the group size increases to 11, all the previous 10 genotypes remain in the group and genotype GV079 is added, and so on.
No customized summary() is provided, given that the
conditions do not vary and no constraints are applied.
rmaxa()rmaxa(traits, ref = NULL, clmin = 2, clmax, constraints = NULL, meanvec = NULL, criteria = NULL, data)The package includes a function rmaxa(), which computes
the maximum admissible gains. The maximum admissible gain corresponds to
the value obtained by maximizing one trait while avoiding losses in all
others, enforced through constraints—predefined in this case as
>= 0. As with rmaxp(), the user may request
gains for one or multiple traits simultaneously. The output displays one
column per trait; however, the values in different columns correspond to
independent solutions and should be interpreted separately. This
distinguishes the output of the gain object from this function from that
of the polyclonal() function with
dmg = "base", where the sum of the values related to all
traits are optimized.
# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653)
# Vector with the traits to be optimized
mytraits <- c("yd", "pa")
# Named numeric vector with the selection criteria
mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1)In this function, if the traits to which the constraints apply are
not specified, all the traits are assumed to be greater or equal than
zero. If this vector is omitted, all columns in the dataset are
considered as constrained traits, except the one indicated in
ref or the first column if ref is omitted. The
criteria as well as the meanvec arguments must
indicate the criteria for all the traits involved in both maximization
and the constraints.
This example requests groups with 12 to 20 genotypes.
# Using rmaxa()
max_adm_gain <- rmaxa(
  traits = mytraits,
  clmin = 12,
  clmax = 20,
  meanvec = mymeanvec,
  data = Gouveio
  )
# Results
max_adm_gain
#> Maximum admissible gains for each trait correspond to independently selected groups;
#> thus, gains are not directly comparable and should be interpreted separately.
#> See 'selected' for group details
#> 
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain 
#>  Group.Size       yd       pa
#>          20 26.21528 5.659511
#>          19 26.63661 5.696647
#>          18 27.05655 5.755160
#>          17 27.44606 5.802291
#>          16 27.74894 5.833466
#>          15 28.17461 5.862092
#>          14 28.50718 5.905133
#>          13 28.85517 5.937964
#>          12 29.71520 6.011691
#> 
#> 
#> $ selected_yd 
#>     20    19    18    17    16    15    14    13    12
#>  GV005 GV006 GV006 GV006 GV006 GV006 GV006 GV006 GV006
#>  GV006 GV007 GV007 GV025 GV040 GV040 GV040 GV025 GV040
#>  GV007 GV039 GV040 GV040 GV049 GV049 GV049 GV040 GV049
#>  GV025 GV040 GV049 GV049 GV067 GV067 GV067 GV049 GV067
#>  GV040 GV049 GV067 GV067 GV075 GV088 GV079 GV067 GV079
#>  GV049 GV067 GV079 GV079 GV088 GV090 GV088 GV088 GV088
#>  GV067 GV079 GV088 GV088 GV090 GV093 GV090 GV090 GV093
#>  GV079 GV080 GV090 GV090 GV093 GV095 GV093 GV093 GV095
#>  GV080 GV088 GV093 GV093 GV095 GV112 GV095 GV095 GV127
#>  GV088 GV090 GV095 GV095 GV112 GV127 GV112 GV112 GV138
#>  GV090 GV093 GV112 GV112 GV127 GV132 GV127 GV127 GV140
#>  GV093 GV095 GV127 GV127 GV138 GV138 GV132 GV140 GV148
#>  GV095 GV112 GV138 GV140 GV140 GV140 GV140 GV145      
#>  GV112 GV127 GV140 GV144 GV145 GV146 GV148            
#>  GV127 GV138 GV144 GV145 GV146 GV148                  
#>  GV138 GV140 GV145 GV146 GV148                        
#>  GV140 GV145 GV146 GV148                              
#>  GV145 GV146 GV148                                    
#>  GV146 GV148                                          
#>  GV148                                                
#> 
#> $ selected_pa 
#>     20    19    18    17    16    15    14    13    12
#>  GV006 GV006 GV006 GV006 GV006 GV006 GV006 GV006 GV050
#>  GV050 GV050 GV050 GV050 GV050 GV050 GV050 GV053 GV053
#>  GV053 GV053 GV053 GV053 GV053 GV053 GV053 GV065 GV065
#>  GV060 GV060 GV060 GV060 GV065 GV065 GV065 GV069 GV069
#>  GV063 GV063 GV065 GV065 GV069 GV069 GV069 GV081 GV081
#>  GV065 GV065 GV069 GV069 GV073 GV081 GV074 GV085 GV125
#>  GV069 GV069 GV073 GV074 GV074 GV085 GV081 GV125 GV127
#>  GV073 GV074 GV074 GV081 GV081 GV097 GV097 GV127 GV133
#>  GV074 GV081 GV081 GV085 GV097 GV125 GV127 GV133 GV135
#>  GV081 GV085 GV085 GV097 GV125 GV127 GV133 GV135 GV141
#>  GV085 GV097 GV097 GV125 GV127 GV133 GV135 GV141 GV144
#>  GV097 GV125 GV125 GV127 GV133 GV135 GV141 GV144 GV146
#>  GV125 GV127 GV127 GV133 GV135 GV141 GV144 GV146      
#>  GV127 GV133 GV133 GV135 GV141 GV144 GV146            
#>  GV133 GV135 GV135 GV141 GV144 GV146                  
#>  GV135 GV140 GV141 GV144 GV146                        
#>  GV140 GV141 GV144 GV146                              
#>  GV141 GV144 GV146                                    
#>  GV144 GV146                                          
#>  GV146As in the rmaxp() gain output, one column per trait is
shown; however, the gain reported in each column corresponds to an
independent solution. For example, the group of 20 genotypes achieving a
gain of 26.2% for yd is not composed of the same genotypes as
the group achieving a gain of 5.7% for pa. Furthermore, the
gains obtained with rmaxa() for any trait are always less
than or equal to those obtained with rmaxp(), since
rmaxa() incorporates constraints during optimization.
Regarding the selected object, due to the constraints
applied in this function, the genotypes composing one group differ from
those in other groups. Consequently, there is one selected object per
trait, named as selected_<trait>.
This package offers a ready-to-use implementation of the integer programming method for maximizing the predicted genetic gains in polyclonal selection developed by Surgy et al. (2025). It allowing breeders to efficiently and consistently perform group-based multi-trait selection, which supports a balanced genetic gain across multiple traits in breeding programs. A dataset and examples are provided to assist in understanding and reproducing results.
Surgy, S., Cadima, J. & Gonçalves, E., 2025. Integer programming as a powerful tool for polyclonal selection in ancient grapevine varieties. Theor Appl Genet 138, 122. https://doi.org/10.1007/s00122-025-04885-0