The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves final vowels. It was created to help pair records across datasets when no unambiguous keys are available.
The stable version of the package can be installed with:

    install.packages("metaphonebr")

You can install the development version of metaphonebr from GitHub with:

    # install.packages("remotes")
    remotes::install_github("ipeadata-lab/metaphonebr")

This is a basic example which shows how to use the main function:

    example_names <- c("João da Silva", "Maria", "Marya",
                       "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
    phonetic_codes <- metaphonebr::metaphonebr(example_names)
    print(data.frame(original = example_names, metaphonebr = phonetic_codes))

The metaphoneBR phonetic encoding algorithm proceeds as
follows:

- LH is replaced by 1 (representing a palatal lateral approximant, as in “Filha” -> “FI1A”).
- NH is replaced by 3 (representing a palatal nasal, as in “Manhã” -> “MA3A”).
- CH is replaced by X (representing the /ʃ/ sound, as in “Chico” -> “XICO”).
- SH is replaced by X (for foreign names with the /ʃ/ sound, as in “Shirley” -> “XIRLEY”).
- SCH is replaced by X (approximating /ʃ/ or /sk/, as in “Schmidt” -> “XMIT”).
- PH is replaced by F (as in “Philip” -> “FILIP”).
- SC followed by E or I becomes S (as in “SCENA” -> “SENA”).
- SC followed by A, O, or U becomes SK (as in “ESCOVA” -> “ESKOVA”).
- QU or QÜ followed by E or I becomes K (e.g., “QUEIJO” -> “KEIJO”).
- GU or GÜ followed by E or I becomes G (the U is silent, e.g., “GUERRA” -> “GERRA”).
- QU becomes K (e.g., “QUANTO” -> “KANTO”).
- Ç is replaced by S.
- C followed by E or I is replaced by S (as in “CELSO” -> “SELSO”).
- C (not part of an already transformed digraph like CH or SC) is replaced by K (as in “CARLOS” -> “KARLOS”).
- G followed by E or I is replaced by J (as in “GELO” -> “JELO”; GUE/GUI are already handled).
- Q (that wasn’t part of QU) is replaced by K.
- W is replaced by V (common Brazilian Portuguese pronunciation, e.g., “WALTER” -> “VALTER”).
- Y is replaced by I (e.g., “YARA” -> “IARA”).
- Z is replaced by S (e.g., “ZEBRA” -> “SEBRA”).
- X preceded by S has the X removed (e.g., “EXCELENTE” -> “ESELENTE”, to avoid a double /s/ representation from SKS).
- N is replaced by M (e.g., “JOAQUIN” -> “JOAQUIM”).
- AO is replaced by OM (e.g., “JOÃO” -> “JOOM”).
- ÃES is replaced by AES (e.g., “MÃES” -> “MAES”).
- Duplicate adjacent letters (but not the digits 1 for LH or 3 for NH) are reduced to a single letter (e.g., “CARRO” might become “CARO”, and “LESSA” becomes “LESA”). This rule simplifies sounds like ‘RR’ and ‘SS’ to their single counterparts, which is a common Metaphone-style simplification.

The resulting code attempts to represent the phonetic signature of the name in a simplified, standardized way for a Brazilian Portuguese context. In particular, by construction it preserves final vowels, since they generally carry gender information in Brazilian names (e.g., ADRIANO and ADRIANA).
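
To make the substitution style concrete, here is a minimal base R sketch of two of the rule groups above: the digraph replacements and the duplicate-letter reduction. It is an illustration only, not the package's implementation. The helper names `sketch_digraphs()` and `sketch_collapse()` are hypothetical, the input is assumed to be uppercase with accents already removed, and applying SCH before CH and SH is a choice made for this sketch so that the longer pattern is matched first.

```r
# Illustrative sketch only: NOT the metaphonebr implementation.
# Assumes input is already uppercase and free of accents.

# Hypothetical helper: apply a few digraph rules in a fixed order.
sketch_digraphs <- function(x) {
  rules <- c(
    "SCH" = "X",  # applied first so the longer pattern wins (choice made here)
    "LH"  = "1",
    "NH"  = "3",
    "CH"  = "X",
    "SH"  = "X",
    "PH"  = "F"
  )
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x, fixed = TRUE)
  }
  x
}

# Hypothetical helper: collapse runs of the same letter.
# Only A-Z is matched, so the digits 1 and 3 are left untouched.
sketch_collapse <- function(x) {
  gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
}

sketch_digraphs(c("FILHA", "MANHA", "CHICO", "SHIRLEY", "PHILIP"))
# "FI1A" "MA3A" "XICO" "XIRLEY" "FILIP"

sketch_collapse(c("CARRO", "LESSA"))
# "CARO" "LESA"
```

Because later rules (such as C becoming K) are not part of this sketch, its output for a full name can differ from what `metaphonebr::metaphonebr()` returns.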

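Since the package is meant to help pair datasets that lack unambiguous keys, the sketch below shows one way a phonetic code could serve as a join key. The data frames, their column names, and the stand-in encoder `toy_key()` are all hypothetical; `toy_key()` applies only the W and Y rules listed above so that the example runs without the package. With metaphonebr installed, `metaphonebr::metaphonebr()` would take its place.

```r
# Illustrative sketch: pairing two tables on a phonetic key.
# toy_key() is a hypothetical stand-in for metaphonebr::metaphonebr();
# it only uppercases and applies the W -> V and Y -> I rules.
toy_key <- function(x) {
  x <- toupper(x)
  x <- gsub("W", "V", x, fixed = TRUE)
  x <- gsub("Y", "I", x, fixed = TRUE)
  x
}

# Hypothetical example data with spelling variants of the same names.
registry <- data.frame(
  id_a = 1:3,
  name = c("Maria", "Walter", "Yara"),
  stringsAsFactors = FALSE
)
survey <- data.frame(
  id_b = 101:103,
  name = c("Marya", "Valter", "Iara"),
  stringsAsFactors = FALSE
)

# Build the key on both sides, then join on it.
registry$key <- toy_key(registry$name)
survey$key <- toy_key(survey$name)

pairs <- merge(registry, survey, by = "key", suffixes = c("_a", "_b"))
print(pairs)
```

A match on the phonetic key is best treated as a candidate pair: different people can share a code, so further fields (for example, date of birth) would normally be compared before accepting a link.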
metaphonebr is developed by a team of researchers at Instituto de Pesquisa Econômica Aplicada (Ipea).