xmlSource                package:XML                R Documentation

_S_o_u_r_c_e _t_h_e _R _c_o_d_e, _e_x_a_m_p_l_e_s, _e_t_c. _f_r_o_m _a_n _X_M_L _d_o_c_u_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     This is the equivalent of a smart 'source' for extracting the R
     code elements from an XML document and evaluating them. This
     allows for a simple way to collect R function definitions or a
     sequence of (annotated) R code segments in an XML document along
     with other material such as notes, documentation, data, FAQ
     entries, etc., and still be able to access the R code directly
     from within an R session. The approach enables one to use the XML
     document as a container for a heterogeneous collection of related
     material, some of which is R code. In the literate programming
     parlance, this function essentially dynamically "tangles" the
     document within R, but can work on small subsets of it that are
     easily specified in the 'xmlSource' function call. This is a
     convenient way to annotate code in a rich way and work with source
     files in a new and potentially more effective manner.

     This style of authoring code supports mixing languages: we can
     put, for example, C and R code together in the same document.
     Indeed, one can use the document to store arbitrary
     content and still retrieve the R code.  The more structure there
     is, the easier it is to create tools to extract that information
     using XPath expressions.

     We can identify individual 'r:code' nodes in the document to
     process, i.e. evaluate. We do this using their 'id' attribute and
     specifying which to process via the 'ids' argument. Alternatively,
     if a document has a node 'r:codeIds' as a child of the top-level
     node (or within an 'invisible' node), we read its contents as a
     sequence of line-separated 'id' values, as if they had been
     specified via the 'ids' argument to this function.
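
     The selection by 'id' described above can be sketched with plain
     XPath calls. This is only an illustration of the idea, not the
     code 'xmlSource' itself runs: the document, its contents and the
     namespace URI bound to the "r" prefix are assumptions made for
     the example (see 'XML:::DefaultXPathNamespaces' for the URIs the
     package actually uses).

```r
library(XML)

# Hypothetical document: element names follow this page, content is made up.
txt <- '<doc xmlns:r="http://www.r-project.org">
<r:codeIds>
first
third
</r:codeIds>
<r:code id="first">a = 1</r:code>
<r:code id="second">b = 2</r:code>
<r:code id="third">c = a + 1</r:code>
</doc>'

doc <- xmlParse(txt, asText = TRUE)
ns <- c(r = "http://www.r-project.org")   # assumed URI for the "r" prefix

# Read the body of r:codeIds (a child of the top-level node), one id per line.
idsNode <- getNodeSet(doc, "/*/r:codeIds", namespaces = ns)[[1]]
wanted <- trimws(strsplit(xmlValue(idsNode), "\n", fixed = TRUE)[[1]])
wanted <- wanted[nzchar(wanted)]

# Collect the id and text of every r:code node, then keep the requested ones.
codeNodes <- getNodeSet(doc, "//r:code", namespaces = ns)
nodeIds <- sapply(codeNodes, xmlGetAttr, "id")
code <- sapply(codeNodes, xmlValue)
selected <- code[nodeIds %in% wanted]
print(selected)
```

     Here only the code of the nodes with ids "first" and "third" is
     kept, which mirrors passing 'ids = c("first", "third")'.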

     We can also use XSL to extract the code. See 'getCode.xsl' in the
     Omegahat XSL collection.

     This particular version (as opposed to other implementations) uses
     XPath to conveniently find the nodes of interest.

_U_s_a_g_e:

     xmlSource(url, ...,
               envir = globalenv(),
               xpath = character(),
               ids = character(),
               omit = character(),
               ask = FALSE,
               example = NA,
               fatal = TRUE, verbose = FALSE, echo = verbose, print = echo,
               xnodes = c("//r:function", 
                          "//r:init[not(@eval='false')]",
                          "//r:code[not(@eval='false')]",
                          "//r:plot[not(@eval='false')]"),
               namespaces = DefaultXPathNamespaces, section = character(),
               eval = TRUE)

_A_r_g_u_m_e_n_t_s:

     url: the name of the file or the URL containing the XML document,
          or a string containing the XML itself. This is passed to
          'xmlTreeParse', which is called with 'useInternalNodes =
          TRUE'.

     ...: additional arguments passed to 'xmlTreeParse'

   envir: the environment in which the code elements of the XML
          document are to be evaluated. By default, they are evaluated
          in the global environment so that assignments take place
          there. 

   xpath: a string giving an XPath expression which is used after
          parsing the document to filter the document to a particular
          subset of nodes.  This allows one to restrict the evaluation
          to a subset of the original document. One can do this
          directly by parsing the XML document, applying the XPath
          query and then passing the resulting node set to this
          'xmlSource' function's appropriate method.  This argument
          merely allows for a more convenient form of those steps,
          collapsing it into one action. 

     ids: a character vector.  XML nodes containing R code (e.g.
          'r:code', 'r:init', 'r:function', 'r:plot') can have an id
          attribute. This vector allows the caller to specify the
          subset of these nodes to process, i.e. whose code will be
          evaluated. The order is currently not important. It may be
          used in the future to specify the order in which the nodes
          are evaluated.

          If this is not specified and the document has an 'r:codeIds'
          node, either as an immediate child of the top-most node or
          contained within an 'invisible' node (so that it doesn't have
          to be filtered out when rendering the document), the id
          values of the r:code nodes to process are taken from the
          individual lines of the body of this node.

    omit: a character vector. The values of the id attributes of the
          nodes that we want to skip or omit from the evaluation. This
          allows us to specify the set that we don't want evaluated, in
          contrast to the 'ids' argument. The order is not important. 

     ask: logical

 example: a character or numeric vector specifying the values of the id
          attributes of any 'r:example' nodes in the document. A single
          document may contain numerous, separate examples and these
          can be marked uniquely using an 'id' attribute, e.g.
          '<r:example id='''.  This argument allows the caller to
          specify which example (or examples) to run. If this is not
          specified by the caller and there are r:example nodes in the
          document, the user is prompted to select an example via a
          (text-based) menu. If a character vector is given by the
          caller, we use partial matching against the collection of
          'id' attributes of the r:example nodes to identify the
          examples of interest. Alternatively, one can specify the
          example(s) to run by number. 

   fatal: (currently unused) a logical value. The idea is to control
          how we handle errors when evaluating individual code
          segments.  We could recover from errors and continue
          processing subsequent nodes.

 verbose: a logical value. If 'TRUE', information about what code
          segments are being evaluated is displayed on the console.
          'echo' controls whether the code itself is displayed, whereas
          'verbose' controls whether additional information is also
          displayed. See 'source'.

  xnodes: a character vector.  This is a collection of XPath
          expressions given as individual strings which find the nodes
          whose contents we evaluate. 

    echo: a logical value indicating whether to display the code before
          it is evaluated.

namespaces: a named character vector (i.e. name = value pairs of
          strings) giving the prefix - URI pairings for the namespaces
          used in the XPath expressions. The URIs must match those in
          the document, but the prefixes are local to the XPath
          expression. The default provides mappings for the prefixes
          "r", "omg", "perl", "py", and so on. See
          'XML:::DefaultXPathNamespaces'. 

 section: a vector of numbers or strings.  This allows the caller to
          specify that the function should only look for R-related
          nodes within the specified section(s). This is useful for
          easily processing only the code in a particular subset of
          the document identified by a DocBook 'section' node.  A
          string value is used to match the 'id' attribute of the
          'section' node. A number (assumed to be an integer) is used
          to index the set of 'section' nodes. These amount to XPath
          expressions of the form '//section[number]' and
          '//section[@id = string]'.

   print: a logical value indicating whether to print the results

    eval: a logical value indicating whether to evaluate the code in
          the specified nodes or to just return the result of parsing
          the text in each node.
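
     The 'xnodes', 'namespaces' and 'section' arguments are all
     expressed in terms of XPath, so their effect can be previewed
     with 'xpathSApply' before sourcing a document. The sketch below
     is an illustration under assumptions: the document is made up,
     the URI bound to "r" is assumed, and the combined section
     expressions only approximate how 'xmlSource' restricts its
     search.

```r
library(XML)

# Hypothetical DocBook-like document with R code in two sections.
txt <- '<article xmlns:r="http://www.r-project.org">
  <section id="setup">
    <r:code id="a">x = 1</r:code>
  </section>
  <section id="analysis">
    <r:code id="b" eval="false">y = 2</r:code>
    <r:code id="c">z = 3</r:code>
  </section>
</article>'
doc <- xmlParse(txt, asText = TRUE)

# One of the default 'xnodes' expressions: skips nodes with eval='false'.
ns <- c(r = "http://www.r-project.org")   # assumed URI for the "r" prefix
evalIds <- xpathSApply(doc, "//r:code[not(@eval='false')]",
                       xmlGetAttr, "id", namespaces = ns)

# The prefix is local to the XPath expression: only the URI has to match.
ns2 <- c(rr = "http://www.r-project.org")
otherPrefix <- xpathSApply(doc, "//rr:code[not(@eval='false')]",
                           xmlGetAttr, "id", namespaces = ns2)

# Restricting the search to one section, by id and by position.
byName <- xpathSApply(doc, "//section[@id = 'analysis']//r:code",
                      xmlGetAttr, "id", namespaces = ns)
byIndex <- xpathSApply(doc, "//section[1]//r:code",
                       xmlGetAttr, "id", namespaces = ns)

print(list(evalIds = evalIds, otherPrefix = otherPrefix,
           byName = byName, byIndex = byIndex))
```

     Checking an expression this way first makes it easier to see
     which nodes a given 'xnodes' or 'section' value would reach.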

_D_e_t_a_i_l_s:

     This evaluates the 'code', 'function' and 'example' elements in
     the XML content that have the appropriate namespace (i.e. r, s, or
     no namespace) and discards all others. It also discards r:output
     nodes from the text, along with processing instructions and
     comments. Finally, it resolves 'r:frag' or 'r:code' nodes that
     have a 'ref' attribute by identifying the corresponding 'r:code'
     node with the same value for its 'id' attribute and then
     evaluating that node in place of the reference.
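
     Conceptually, processing a node amounts to parsing its text and
     then evaluating the resulting expressions in 'envir'. The
     following is a rough sketch of that idea using base R, not the
     package's implementation; it ignores 'r:output' and 'r:frag'
     handling, and the document and namespace URI are assumptions.

```r
library(XML)

# Hypothetical document with two code segments.
txt <- '<doc xmlns:r="http://www.r-project.org">
  <r:code id="def">sq = function(x) x^2</r:code>
  <r:code id="use">sq(4)</r:code>
</doc>'
doc <- xmlParse(txt, asText = TRUE)
ns <- c(r = "http://www.r-project.org")   # assumed URI for the "r" prefix

# Extract the text of each selected node, in document order.
code <- xpathSApply(doc, "//r:code[not(@eval='false')]", xmlValue,
                    namespaces = ns)

# Parsing only: roughly what one would expect back with eval = FALSE.
exprs <- lapply(code, function(x) parse(text = x))

# Evaluating in a dedicated environment instead of the global one.
env <- new.env()
results <- lapply(exprs, eval, envir = env)
print(results[[2]])
```

     Using a separate environment, as with the 'envir' argument,
     keeps the assignments made by the code out of the global
     workspace.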

_V_a_l_u_e:

     An R object (typically a list) that contains the results of
     evaluating the content of the different selected code segments in
     the XML document.  We use 'sapply' to iterate over the nodes, so
     the result may be simplified (e.g. to a vector) if the results
     for all of the nodes have the same length.  Otherwise, the value
     is a list giving the pairs of expressions and evaluated objects
     for each of the different XML elements processed.

_A_u_t_h_o_r(_s):

     Duncan Temple Lang <duncan@wald.ucdavis.edu>

_S_e_e _A_l_s_o:

     'xmlTreeParse'

_E_x_a_m_p_l_e_s:

      xmlSource(system.file("exampleData", "Rsource.xml", package="XML"))

       # This illustrates using r:frag nodes.
       # The r:frag nodes are not processed directly, but only
       # if referenced in the contents/body of a r:code node
      f = system.file("exampleData", "Rref.xml", package="XML")
      xmlSource(f)

