Troubleshooting

A common issue when running spiec.easi is ending up with an empty network after StARS model selection.

For example:

library(SpiecEasi)
data(amgut1.filt)

pargs <- list(seed=10010)
se3 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=5e-1, nlambda=10, pulsar.params=pargs)
getOptInd(se3)
# [1] 1
sum(getRefit(se3))/2
# [1] 139

As the warning indicates, the network stability could not be determined from the lambda path. Looking at the stability along the lambda path, se3$select$stars$summary, we can see that the maximum value of the StARS summary statistic never crosses the default threshold (0.05).

The empty network can be fixed by lowering lambda.min.ratio to explore denser networks:
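
The threshold check described here can be sketched in base R. The helper name stars_crosses_threshold and both example vectors are hypothetical and made up for illustration. Only the 0.05 threshold and the $select$stars$summary accessor come from the text.

```r
# Hypothetical helper: does the StARS summary statistic ever reach the threshold?
stars_crosses_threshold <- function(stars_summary, thresh = 0.05) {
  stopifnot(is.numeric(stars_summary), length(stars_summary) > 0)
  max(stars_summary, na.rm = TRUE) >= thresh
}

# Made-up summary vectors for illustration only (not real SpiecEasi output)
too_sparse  <- c(0, 0.001, 0.004, 0.010, 0.021)
wide_enough <- c(0, 0.004, 0.021, 0.048, 0.093)

stars_crosses_threshold(too_sparse)   # FALSE: the path stops before the threshold
stars_crosses_threshold(wide_enough)  # TRUE

# With a fitted object this would be called as:
# stars_crosses_threshold(se3$select$stars$summary)
```

A FALSE result suggests that the lambda path does not extend far enough toward dense networks for StARS to find its target.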

se4 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-1, nlambda=10, pulsar.params=pargs)

We have now fit a network, but since we have only a rough, discrete sampling of networks along the lambda path, we should check how far we are from the target stability threshold (0.05):

getStability(se4)
# [1] 0.0003237095
sum(getRefit(se4))/2
# [1] 158

To get closer to the mark, we should increase nlambda to sample the lambda path more finely, which gives a denser network:

se5 <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-1, nlambda=100, pulsar.params=pargs)
getStability(se5)
# [1] 0.0003237095
sum(getRefit(se5))/2
# [1] 210
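
To see why both parameters matter, it can help to look at the grid they define. The sketch below assumes the lambda path is log-spaced between lambda.max and lambda.max * lambda.min.ratio. The helper lambda_path and the value lambda_max = 1 are illustrative choices, not SpiecEasi internals.

```r
# Hypothetical helper: log-spaced path from lambda_max down to
# lambda_max * lambda_min_ratio (an assumption about how the grid is built)
lambda_path <- function(lambda_max, lambda_min_ratio, nlambda) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

coarse <- lambda_path(1, 1e-1, 10)   # mirrors nlambda=10
fine   <- lambda_path(1, 1e-1, 100)  # mirrors nlambda=100

# Both grids cover the same range of lambda values
range(coarse)
range(fine)

# Ratio between neighbouring lambdas: closer to 1 means smaller steps,
# so the selected network can land nearer the stability threshold
coarse[2] / coarse[1]
fine[2] / fine[1]
```

Under this assumption, lambda.min.ratio controls how far the path reaches toward dense networks, while nlambda controls how finely that range is sampled.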

Common issues and solutions

1. Empty networks

Problem: After running spiec.easi, you get an empty network (no edges).

Solutions:

  • Lower lambda.min.ratio to explore denser networks
  • Increase nlambda for finer sampling of the lambda path
  • Check if your data has sufficient signal-to-noise ratio
  • Try different methods (‘mb’ vs ‘glasso’)

2. Very dense networks

Problem: The inferred network has too many edges.

Solutions:

  • Increase lambda.min.ratio to explore sparser networks
  • Adjust the StARS threshold in pulsar.params
  • Use cross-validation instead of StARS

3. Computational issues

Problem: The analysis takes too long or runs out of memory.

Solutions:

  • Use parallel processing by setting ncores in pulsar.params (Unix-like systems only)
  • Use the B-StARS method for large datasets
  • Reduce rep.num in pulsar.params
  • Use batch mode for HPC clusters

4. Windows parallel processing issues

Problem: Error “‘mc.cores’ > 1 is not supported on Windows”

Solutions:

  • Use ncores=1 for serial processing on Windows
  • Use a snow cluster for parallel processing on Windows:

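library(parallel)
library(SpiecEasi)
# "PSOCK" is the socket cluster type built into parallel and works on Windows;
# type = "SOCK" would additionally require the snow package to be installed
cl <- makeCluster(4, type = "PSOCK")
pargs.windows <- list(rep.num=50, seed=10010, cluster=cl)
# 'data' is a placeholder for your own count matrix or phyloseq object
se.windows <- spiec.easi(data, method='mb', pulsar.params=pargs.windows)
# Always release the worker processes when done
stopCluster(cl)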
  • Use batch mode which works on all platforms
  • Consider using WSL (Windows Subsystem for Linux) for Unix-like environment
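
One way to keep a single script portable across platforms is to choose the core count at run time. The helper safe_ncores below is a hypothetical name used for illustration. It relies only on .Platform$OS.type from base R and on ncores being passed through pulsar.params.

```r
# Hypothetical helper: fall back to one core on Windows, where
# fork-based parallelism (mc.cores > 1) is not supported
safe_ncores <- function(requested, os_type = .Platform$OS.type) {
  if (identical(os_type, "windows")) 1L else as.integer(requested)
}

# Same parameter list on every platform; only ncores differs
pargs.portable <- list(rep.num = 50, seed = 10010, ncores = safe_ncores(4))
pargs.portable$ncores
```

The resulting list can then be passed as pulsar.params in the same way as in the examples above.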

5. Convergence issues

Problem: The algorithm doesn’t converge or gives warnings.

Solutions:

  • Check data preprocessing and normalization
  • Ensure data doesn’t have constant columns
  • Try different starting values
  • Check for missing or infinite values
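
The data checks listed here can be sketched in base R before calling spiec.easi. The helper check_counts and the toy matrix are hypothetical. The sketch assumes a samples-by-taxa numeric matrix.

```r
# Hypothetical helper: flag common problems in a samples-by-taxa count matrix
check_counts <- function(X) {
  X <- as.matrix(X)
  list(
    n_missing  = sum(is.na(X)),
    n_infinite = sum(is.infinite(X)),
    n_negative = sum(X < 0, na.rm = TRUE),
    # a column is constant if its finite values are all identical
    constant_cols = which(apply(X, 2, function(col) {
      vals <- col[is.finite(col)]
      length(unique(vals)) <= 1
    }))
  )
}

# Toy data: taxonB is constant, taxonC has a missing and an infinite value
X <- cbind(taxonA = c(3, 0, 12, 7),
           taxonB = c(5, 5, 5, 5),
           taxonC = c(1, NA, 4, Inf))
str(check_counts(X))
```

Any non-zero count or non-empty constant_cols entry is worth resolving before fitting.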

6. Memory issues

Problem: R runs out of memory during analysis.

Solutions:

  • Use sparse matrices where possible
  • Reduce dataset size by filtering rare taxa
  • Use batch processing for large datasets
  • Increase system memory if available
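
Filtering rare taxa can be sketched as a simple prevalence filter. The helper filter_rare_taxa, the 10% cutoff, and the simulated counts are all illustrative assumptions. A suitable cutoff depends on the dataset.

```r
# Hypothetical helper: keep taxa (columns) that are observed (count > 0)
# in at least a fraction min_prev of the samples (rows)
filter_rare_taxa <- function(X, min_prev = 0.1) {
  prevalence <- colMeans(X > 0)
  X[, prevalence >= min_prev, drop = FALSE]
}

# Simulated sparse counts: 20 samples by 10 taxa (illustration only)
set.seed(1)
X <- matrix(rpois(200, lambda = 0.2), nrow = 20,
            dimnames = list(NULL, paste0("taxon", 1:10)))
X[, 1] <- 0  # a taxon that is never observed

X.filt <- filter_rare_taxa(X, min_prev = 0.1)
dim(X)
dim(X.filt)
```

Fewer taxa means a smaller problem at every point on the lambda path and in every StARS subsample, which reduces both memory use and run time.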

Platform-specific considerations

Windows users:

  • Default parallel processing (mc.cores > 1) is not supported
  • Use ncores=1 for serial processing
  • Use snow cluster for parallel processing
  • Consider batch mode for large datasets

Unix-like systems (Linux, macOS):

  • Full support for parallel processing with mc.cores
  • Can use ncores parameter directly
  • Both multicore and snow clusters available

Diagnostic functions

SpiecEasi provides several functions to help diagnose issues:

# Check stability along lambda path
getStability(se)

# Get optimal lambda index
getOptInd(se)

# Get summary statistics
se$select$stars$summary

# Check network density
sum(getRefit(se))/2

# Visualize stability curve
plot(se$select$stars$summary)

# Check platform information
.Platform$OS.type
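
The edge count from sum(getRefit(se))/2 is easier to compare across datasets when expressed as a fraction of all possible edges. The helper network_density below is a hypothetical name. It assumes a symmetric adjacency matrix with a zero diagonal and is shown on a toy matrix rather than on real output.

```r
# Hypothetical helper: edge count and density of a symmetric adjacency
# matrix with zero diagonal
network_density <- function(adj) {
  adj <- as.matrix(adj)
  p <- ncol(adj)
  n_edges <- sum(adj != 0) / 2  # each edge appears twice in a symmetric matrix
  c(edges = n_edges, density = n_edges / choose(p, 2))
}

# Toy 4-node network with edges 1-2 and 2-3
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 1
network_density(adj)  # 2 edges out of choose(4, 2) = 6 possible

# With a fitted object this would be called as:
# network_density(getRefit(se))
```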

Parameter tuning guidelines

For small datasets (< 100 samples, < 50 taxa):

  • lambda.min.ratio = 1e-2
  • nlambda = 20-50
  • rep.num = 20-50

For medium datasets (100-1000 samples, 50-200 taxa):

  • lambda.min.ratio = 1e-3
  • nlambda = 50-100
  • rep.num = 50-100
  • Use parallel processing (Unix-like systems only)

For large datasets (> 1000 samples, > 200 taxa):

  • lambda.min.ratio = 1e-4
  • nlambda = 100+
  • rep.num = 100+
  • Use B-StARS method
  • Consider batch processing

Windows-specific recommendations:

  • Use ncores=1 for serial processing
  • Use snow cluster for parallel processing
  • Consider batch mode for large datasets
  • Use B-StARS method to reduce computational time

Session info:

sessionInfo()
# R version 4.5.1 (2025-06-13)
# Platform: x86_64-pc-linux-gnu
# Running under: Ubuntu 24.04.3 LTS
# 
# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
# LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# time zone: Etc/UTC
# tzcode source: system (glibc)
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] phyloseq_1.55.0  igraph_2.2.1     Matrix_1.7-4     SpiecEasi_1.99.3
# [5] BiocStyle_2.39.0
# 
# loaded via a namespace (and not attached):
#  [1] gtable_0.3.6        shape_1.4.6.1       xfun_0.54          
#  [4] bslib_0.9.0         ggplot2_4.0.0       rhdf5_2.55.4       
#  [7] Biobase_2.71.0      lattice_0.22-7      vctrs_0.6.5        
# [10] rhdf5filters_1.23.0 tools_4.5.1         generics_0.1.4     
# [13] biomformat_1.39.0   stats4_4.5.1        parallel_4.5.1     
# [16] cluster_2.1.8.1     pkgconfig_2.0.3     huge_1.3.5         
# [19] data.table_1.17.8   RColorBrewer_1.1-3  S7_0.2.0           
# [22] S4Vectors_0.49.0    lifecycle_1.0.4     compiler_4.5.1     
# [25] farver_2.1.2        stringr_1.6.0       Biostrings_2.79.2  
# [28] Seqinfo_1.1.0       codetools_0.2-20    permute_0.9-8      
# [31] htmltools_0.5.8.1   sys_3.4.3           buildtools_1.0.0   
# [34] sass_0.4.10         yaml_2.3.10         glmnet_4.1-10      
# [37] crayon_1.5.3        jquerylib_0.1.4     MASS_7.3-65        
# [40] cachem_1.1.0        vegan_2.7-2         iterators_1.0.14   
# [43] foreach_1.5.2       nlme_3.1-168        digest_0.6.37      
# [46] stringi_1.8.7       reshape2_1.4.4      labeling_0.4.3     
# [49] maketools_1.3.2     splines_4.5.1       ade4_1.7-23        
# [52] fastmap_1.2.0       grid_4.5.1          cli_3.6.5          
# [55] magrittr_2.0.4      survival_3.8-3      ape_5.8-1          
# [58] withr_3.0.2         scales_1.4.0        rmarkdown_2.30     
# [61] XVector_0.51.0      multtest_2.67.0     pulsar_0.3.11      
# [64] VGAM_1.1-13         evaluate_1.0.5      knitr_1.50         
# [67] IRanges_2.45.0      mgcv_1.9-3          rlang_1.1.6        
# [70] Rcpp_1.1.0          glue_1.8.0          BiocManager_1.30.26
# [73] BiocGenerics_0.57.0 jsonlite_2.0.0      R6_2.6.1           
# [76] Rhdf5lib_1.33.0     plyr_1.8.9