\documentclass[a4paper]{article} \usepackage[utf8]{inputenc} \usepackage{url} \newcommand{\strong}[1]{{\normalfont\fontseries{b}\selectfont #1}} \newcommand{\code}[1]{\mbox{\texttt{#1}}} \newcommand{\pkg}[1]{\strong{#1}} \newcommand{\proglang}[1]{\textsf{#1}} \newcommand{\acronym}[1]{\textsc{#1}} %% \VignetteIndexEntry{Introduction to the wordnet Package} \begin{document} <>= options(width = 75) @ \title{Introduction to the \pkg{wordnet} Package} \author{Ingo Feinerer} \maketitle \sloppy \begin{abstract} The \pkg{wordnet} package provides a \proglang{R} interface to the \proglang{WordNet} lexical database of English. \end{abstract} \section*{Introduction} The \pkg{wordnet} package provides a \proglang{R} via \proglang{Java} interface to the \proglang{WordNet}\footnote{\url{https://wordnet.princeton.edu/}} lexical database of English which is commonly used in linguistics and text mining. Internally \pkg{wordnet} uses \proglang{Jawbone}\footnote{\url{https://sites.google.com/site/mfwallace/jawbone}}, a \proglang{Java} \acronym{Api} to \proglang{WordNet}, to access the database. Thus, this package needs both a working \proglang{Java} installation, activated \proglang{Java} under \proglang{R} support, and a working \proglang{WordNet} installation. Since we simulate the behavior of \proglang{Jawbone}, its homepage is a valuable source of information for background information and details besides this vignette. \section*{Loading the Package} The package is loaded via <<>>= library("wordnet") @ \section*{Dictionary} A so-called \emph{dictionary} is the main structure for accessing the \proglang{WordNet} database. Before accessing the database the dictionary must be initialized with the path to the directory where the \proglang{WordNet} database has been installed (e.g., \code{/usr/local/WordNet-3.0/dict}). On start up the package searches environment variables (\code{WNHOME}) and default installation locations such that the \proglang{WordNet} installation is found automatically in most cases. On success the package stores internally a pointer to the \proglang{WordNet} dictionary which is used package wide by various functions. You can manually provide the path to the \proglang{WordNet} installation via \code{setDict()}. \section*{Filters} The package provides a set of filters in order to search for terms according to certain criteria. Available filter types can be listed via <<>>= getFilterTypes() @ A detailed description of available filters gives the \proglang{Jawbone} homepage. E.g., we want to search for words in the database which start with \code{car}. So we create the desired filter with \code{getTermFilter()}, and access the first five terms which are nouns via \code{getIndexTerms()}. So-called \emph{index term}s hold information on the word itself and related meanings (i.e., so-called \emph{synset}s). The function \code{getLemma()} extracts the word (so-called \emph{lemma} in \proglang{Jawbone} terminology). <>= filter <- getTermFilter("StartsWithFilter", "car", TRUE) terms <- getIndexTerms("NOUN", 5, filter) sapply(terms, getLemma) @ \begin{Soutput} [1] "car" "car-ferry" "car-mechanic" "car battery" [5] "car bomb" \end{Soutput} \section*{Synonyms} A very common usage is to find synonyms for a given term. Therefore, we provide the low-level function \code{getSynonyms()}. In this example we ask the database for the synonyms of the term \code{company}. <>= filter <- getTermFilter("ExactMatchFilter", "company", TRUE) terms <- getIndexTerms("NOUN", 1, filter) getSynonyms(terms[[1]]) @ \begin{Soutput} [1] "caller" "companionship" "company" "fellowship" [5] "party" "ship's company" "society" "troupe" \end{Soutput} In addition there is the high-level function \code{synonyms()} omitting special parameter settings. <>= synonyms("company", "NOUN") @ \begin{Soutput} [1] "caller" "companionship" "company" "fellowship" [5] "party" "ship's company" "society" "troupe" \end{Soutput} \section*{Related Synsets} Besides synonyms, synsets can provide information to related terms and synsets. Following code example finds the antonyms (i.e., opposite meaning) for the adjective \code{hot} in the database. <>= filter <- getTermFilter("ExactMatchFilter", "hot", TRUE) terms <- getIndexTerms("ADJECTIVE", 1, filter) synsets <- getSynsets(terms[[1]]) related <- getRelatedSynsets(synsets[[1]], "!") sapply(related, getWord) @ \begin{Soutput} [1] "cold" \end{Soutput} \end{document}