Keywords are classified automatically via community detection, and several clustering algorithms can be used for this step. This vignette provides a benchmark that compares these algorithms on networks of several sizes.
First, we’ll load the needed packages.
Then, we prepare the needed data. The built-in data table biblio_data_table is used here.
Next, we define the combinations of network size and community detection algorithm to be tested:
seq(100, 300, by = 100) -> topn_sample  # three network sizes: top 100, 200 and 300 keywords
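# Collect the group_* functions exported by akc, minus three left out of the benchmark
ls("package:akc") %>% 
  str_extract("^group.+") %>%   # names that do not start with "group" become NA
  na.omit() %>% 
  setdiff(c("group_biconnected_component",
            "group_components",
            "group_optimal")) -> com_detect_fun_list
Finally, we'll implement the computation and record the results.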
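all = tibble()  # accumulates one row per algorithm and network size
for(i in com_detect_fun_list){
    for(j in topn_sample){
      # Time the grouping step for algorithm i on the top-j keywords
      system.time({
        clean_data %>% 
          keyword_group(top = j,com_detect_fun = get(i)) %>% 
          as_tibble -> grouped_network_table
      }) %>% na.omit -> time_info  # drop any NA timing entries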
      grouped_network_table %>% nrow -> node_no
      grouped_network_table %>% distinct(group) %>% nrow -> group_no
      grouped_network_table %>% 
        count(group) %>% 
        summarise(mean(n)) %>% 
        .[[1]] -> group_avg_node_no
      grouped_network_table %>% 
        count(group) %>% 
        summarise(sd(n)) %>% 
        .[[1]] -> group_sd_node_no
      # c() coerces every value to character; the numeric columns are restored afterwards
      c(com_detect_fun = i, 
        topn = j,
        node_no = node_no,group_no = group_no,
        avg = group_avg_node_no,
        sd = group_sd_node_no,time_info[1:3]) %>% 
        bind_rows(all,.) -> all
    }
}
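For readers unfamiliar with `system.time()`, the base-R sketch below shows what `time_info[1:3]` holds and why a row built with `c()` ends up as character values. The workload and the labels `"demo"` and `100` are placeholders chosen for illustration and are not part of the benchmark.

```r
# Time an arbitrary placeholder workload
timing <- system.time({
  total <- sum(sqrt(seq_len(1e5)))
})

# The first three entries are user CPU, system CPU and wall-clock seconds
first_three <- timing[1:3]
timing_names <- names(first_three)

# Combining a string with numbers in c() coerces everything to character
row <- c(com_detect_fun = "demo", topn = 100, first_three)
row_type <- typeof(row)

# as.numeric() recovers the numeric value from its character form
topn_back <- as.numeric(row[["topn"]])
```

Because two of the three timing names contain "self", a selection such as `-contains("self")` leaves only the elapsed time.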
# Convert the numeric columns back from character and round to two decimals
res = all %>% 
  mutate_at(2:9,function(x) as.numeric(x) %>% round(2)) %>% 
  distinct(com_detect_fun,node_no,.keep_all = TRUE) %>% 
  select(-topn,-contains("self")) %>%   # keep only the elapsed time
  setNames(c("com_detect_fun","No. of total nodes","No. of total groups",
             "Average node number in each group","Standard deviation of node number",
             "Computer running time for keyword_group function"))
The results are displayed in the following table. Running times are the elapsed seconds reported by system.time().
| com_detect_fun | No. of total nodes | No. of total groups | Average node number in each group | Standard deviation of node number | Computer running time for keyword_group function | 
|---|---|---|---|---|---|
| group_edge_betweenness | 103 | 36 | 2.86 | 9.17 | 0.50 | 
| group_edge_betweenness | 207 | 68 | 3.04 | 12.53 | 2.98 | 
| group_edge_betweenness | 326 | 89 | 3.66 | 13.12 | 10.03 | 
| group_fast_greedy | 103 | 5 | 20.60 | 8.17 | 0.17 | 
| group_fast_greedy | 207 | 5 | 41.40 | 24.36 | 0.18 | 
| group_fast_greedy | 326 | 6 | 54.33 | 34.77 | 0.19 | 
| group_infomap | 103 | 1 | 103.00 | NA | 0.17 | 
| group_infomap | 207 | 4 | 51.75 | 94.83 | 0.22 | 
| group_infomap | 326 | 6 | 54.33 | 114.98 | 0.34 | 
| group_label_prop | 103 | 1 | 103.00 | NA | 0.16 | 
| group_label_prop | 207 | 1 | 207.00 | NA | 0.17 | 
| group_label_prop | 326 | 1 | 326.00 | NA | 0.18 | 
| group_leading_eigen | 103 | 4 | 25.75 | 9.57 | 0.17 | 
| group_leading_eigen | 207 | 5 | 41.40 | 19.19 | 0.18 | 
| group_leading_eigen | 326 | 7 | 46.57 | 35.15 | 0.22 | 
| group_louvain | 103 | 5 | 20.60 | 12.14 | 0.16 | 
| group_louvain | 207 | 8 | 25.88 | 14.11 | 0.17 | 
| group_louvain | 326 | 9 | 36.22 | 19.08 | 0.18 | 
| group_spinglass | 103 | 5 | 20.60 | 5.13 | 1.66 | 
| group_spinglass | 207 | 8 | 25.88 | 13.38 | 4.04 | 
| group_spinglass | 326 | 8 | 40.75 | 12.07 | 7.30 | 
| group_walktrap | 103 | 103 | 1.00 | 0.00 | 0.16 | 
| group_walktrap | 207 | 207 | 1.00 | 0.00 | 0.17 | 
| group_walktrap | 326 | 326 | 1.00 | 0.00 | 0.17 | 
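The `NA` entries in the standard deviation column occur in runs that return a single group, because `sd()` of one value is undefined. A minimal base-R sketch reproduces the group-size summary on a made-up membership vector (the `membership` values are hypothetical and not taken from the benchmark):

```r
# Hypothetical group labels for ten nodes (not benchmark data)
membership <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)

# Group sizes, analogous to column n returned by count(group)
sizes <- as.vector(table(membership))

avg_size <- mean(sizes)  # average number of nodes per group
sd_size  <- sd(sizes)    # spread of the group sizes

# With a single group there is only one size, so sd() returns NA
single_group <- rep(1, 10)
single_sizes <- as.vector(table(single_group))
sd_single <- sd(single_sizes)
```

At the other extreme, when every node forms its own group, all sizes equal 1 and the standard deviation is 0, as in the group_walktrap rows.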
The session information is shown below:
sessionInfo()
#> R version 4.4.2 (2024-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                               
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8   
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C                               
#> [5] LC_TIME=Chinese (Simplified)_China.utf8    
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.49        
#>  [5] cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29   
#>  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
#> [13] compiler_4.4.2    rstudioapi_0.17.1 tools_4.4.2       evaluate_1.0.1   
#> [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9