newXMLDoc                package:XML                R Documentation

_C_r_e_a_t_e _i_n_t_e_r_n_a_l _X_M_L _n_o_d_e _o_r _d_o_c_u_m_e_n_t _o_b_j_e_c_t

_D_e_s_c_r_i_p_t_i_o_n:

     These are used to create internal 'libxml' nodes and top-level
     document objects that are used to write XML trees.  While the
     functions are available, their direct use is not encouraged.
     Instead, use 'xmlTree', because these functions must be used
     within a strict regime to avoid corrupting C-level structures.

     'xmlDoc' creates a new 'XMLInternalDocument' object by copying the
     given node and all of its descendants into a new document. This is
     useful for applying general tools that work on documents, e.g.
     XPath queries, to a sub-tree.
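     A minimal sketch, assuming the XML package is installed; the
     element names 'lib', 'book' and 'title' are illustrative only. It
     copies one sub-tree into a new document with 'xmlDoc' and runs an
     XPath query that sees only the copied nodes.

```r
library(XML)

# Build <lib> with two <book> children, each holding a <title>
lib <- newXMLNode("lib")
b1 <- newXMLNode("book", attrs = c(id = "1"), parent = lib)
newXMLNode("title", "First", parent = b1)
b2 <- newXMLNode("book", attrs = c(id = "2"), parent = lib)
newXMLNode("title", "Second", parent = b2)

# Copy the second <book> and all of its descendants into a new document
subdoc <- xmlDoc(b2)

# An XPath query on the new document only sees the copied sub-tree
titles <- xpathSApply(subdoc, "//title", xmlValue)
print(titles)
```

     Because 'xmlDoc' works on a copy, the original tree rooted at
     'lib' is left as it was.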

     'newXMLNode' creates a regular XML node with a name and
     attributes. New namespace definitions can be provided via
     'namespaceDefinitions'. They could also be given among the
     attributes in the more verbose form 'c("xmlns:prefix" =
     "http://...")', but the node then does not interpret them as
     namespace definitions, merely as attributes named 'xmlns:prefix'.
     Namespace definitions should therefore be specified via the
     'namespaceDefinitions' parameter.
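     A minimal sketch of declaring namespaces through
     'namespaceDefinitions', assuming the XML package is installed; the
     element name 'doc' and the 'omg' prefix are illustrative. It reads
     the declarations back with 'xmlNamespaceDefinitions' and shows
     that they appear as 'xmlns:prefix' declarations when serialized.

```r
library(XML)

# Declare two namespace prefixes on one node (names are illustrative)
node <- newXMLNode("doc",
                   namespaceDefinitions = c(r = "http://www.r-project.org",
                                            omg = "http://www.omegahat.org"))

# Read the declarations back as a named character vector (prefix -> URI)
defs <- xmlNamespaceDefinitions(node, simplify = TRUE)
print(defs)

# The serialized node carries them as xmlns:prefix="URI" declarations
txt <- saveXML(node)
cat(txt, "\n")
```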

     In addition to namespace definitions, a node can itself belong to
     a namespace.  This can be specified in the 'name' argument as
     'prefix:name', and 'newXMLNode' splits this into the namespace
     prefix and the local name.  Alternatively, the namespace can be
     given separately via the 'namespace' argument, either as a simple
     prefix or as an internal namespace object defined earlier.
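     A minimal sketch of the two ways of placing a node in a namespace
     declared on an ancestor, assuming the XML package is installed;
     the element names are illustrative. Both children are expected to
     serialize with the 'r:' prefix, because 'parent' lets the prefix
     be resolved against the declaration on 'top'.

```r
library(XML)

# The parent declares the prefix "r"
top <- newXMLNode("data",
                  namespaceDefinitions = c(r = "http://www.r-project.org"))

# 1) prefix given as part of the name, resolved through 'parent'
n1 <- newXMLNode("r:value", "1", parent = top)

# 2) prefix given separately via the 'namespace' argument
n2 <- newXMLNode("value", "2", namespace = "r", parent = top)

out <- saveXML(top)
cat(out, "\n")
```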

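     A default namespace is defined by giving its URI the name "" in
     'namespaceDefinitions', or by passing a single unnamed URI as the
     'namespace' argument (see 'namespace' under Arguments).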
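     A minimal sketch, assuming the XML package is installed, of
     declaring a default namespace through 'namespaceDefinitions' by
     giving the URI the empty name ""; the URI and element name are
     illustrative. Since 'c("" = ...)' is an error in R, 'structure()'
     is used here to attach the empty name.

```r
library(XML)

# Illustrative URI; the empty name "" marks it as the default namespace
uri <- "http://www.omegahat.org/XML"
node <- newXMLNode("doc", namespaceDefinitions = structure(uri, names = ""))

# The serialized node should carry an unprefixed xmlns="..." declaration
out <- saveXML(node)
cat(out, "\n")
```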

_U_s_a_g_e:

     xmlDoc(node)
     newXMLDoc(dtd, namespaces = NULL, addFinalizer = TRUE)
     newXMLNode(name, ..., attrs = NULL, namespace = "",
                namespaceDefinitions = character(),
                doc = NULL, .children = list(...), parent = NULL)
     newXMLTextNode(text, parent = NULL, doc = NULL)
     newXMLCDataNode(text, parent = NULL, doc = NULL)
     newXMLCommentNode(text, parent = NULL, doc = NULL)
     newXMLPINode(name, text, parent = NULL, doc = NULL)
     newXMLDTDNode(nodeName, externalID = character(),
                   systemID = character(), doc = NULL)

_A_r_g_u_m_e_n_t_s:

    node: an 'XMLInternalNode' object that will be copied to create a
          subtree for a new document.

     dtd: the name of the DTD to use for the XML document.

namespaces: a named character vector in which each element gives a
          namespace prefix (its name) and the corresponding URI (its
          value) for a namespace to be declared and used in the XML
          document, e.g. 'c(shelp =
          "http://www.omegahat.org/XML/SHelp")'

addFinalizer: a logical value indicating whether the default finalizer
          routine should be registered to free the internal xmlDoc when
          R no longer has a reference to this external pointer object. 

    name: the tag/element name for the XML node. For a Processing
          Instruction (PI) node, this is the "target", i.e. the
          identifier of the application for whose attention the PI
          node is intended.

     ...: the children of this node. These can be other nodes created
          earlier or R strings that are converted to text nodes and
          added as children to this newly created node.

   attrs: a named list of name-value pairs to be used as attributes
          for the XML node. One should not use this argument to define
          namespaces, i.e. attributes of the form
          'xmlns:prefix="http://..."'. Instead, such definitions
          should be specified via the 'namespaceDefinitions' argument
          (ideally) or the 'namespace' argument. The reason is that
          namespace definitions are special attributes that are shared
          across nodes, whereas regular attributes are particular to a
          node, so a namespace needs to be defined explicitly for the
          XML representation to recognize it as such.

namespace: a character vector specifying the namespace for this new
          node. Typically this is used to specify (i) the prefix of the
          namespace to use, (ii) one or more namespace definitions, or
          (iii) a combination of both. If this is a character vector
          (a) with one element, (b) with an empty 'names' attribute,
          and (c) whose value does not start with 'http:/' or 'ftp:/',
          then the value is assumed to be the prefix of a namespace
          defined in an ancestor node. To resolve this prefix to a
          namespace definition, 'parent' must be specified so that the
          chain of ancestor nodes can be traversed. If (c) does not
          hold, i.e. the string starts with 'http:/' or 'ftp:/', then
          this single element is taken to be a namespace definition
          and, since it has no name (b), it defines the default
          namespace for this new node, i.e. corresponding to
          'xmlns="http://..."'. (It is cumbersome to give "" as the
          name of an element of a character vector, since
          'c("" = "value")' raises an error.) Elements with names are
          expanded to namespace definitions with the name as the
          prefix and the value as the namespace URI.

     doc: the 'XMLInternalDocument' object created with 'newXMLDoc'
          that is used to root the node.

.children: a list containing XML nodes or content. This is an
          alternative to '...' for specifying the child nodes and is
          useful for programmatic use when the child content is
          already in a list rather than a loose collection of values.

    text: the text content for the new XML node.

nodeName: the name of the node to put in the DOCTYPE element that will
          appear as the top-most node in the XML document.

externalID: the PUBLIC identifier for the document type. This is a
          string of the form 'A//B//C//D', where A is either + or -; B
          identifies the person or institution that defined the format
          (i.e. the "creator"); C is the name of the format; and D is a
          code for the language, taken from ISO 639.

systemID: the SYSTEM identifier for the DTD of the document. This is a
          URI.

namespaceDefinitions: a character vector or a list with each element
          being a string. These give the URIs identifying the
          namespaces uniquely. The elements should have names which are
          used as prefixes. A default namespace has "" as the name.
          This argument can be used to remove any ambiguity that arises
          when specifying a single string with no names attribute as
          the value for 'namespace'. The values here are used only for
          defining new namespaces and not for determining the namespace
          to use for this particular node. 

  parent: the node which will act as the parent of this newly created
          node. This need not be specified and one can add the new node
          to another node in a separate operation via 'addChildren'.
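     A minimal sketch of how the 'attrs', '.children' and 'parent'
     arguments work together, assuming the XML package is installed;
     element names such as 'point', 'ol' and 'li' are illustrative
     only. It checks that attributes stay on their own node, that a
     list passed as '.children' becomes the child nodes, and that
     attaching via 'parent' or via 'addChildren' builds the same tree.

```r
library(XML)

# attrs: ordinary attributes, private to this node
p <- newXMLNode("point", attrs = c(x = "1", y = "abc"))
# parent: attach the child at creation time
lab <- newXMLNode("label", "origin", parent = p)

# .children: child nodes that are already collected in a list
items <- lapply(c("a", "b", "c"), function(x) newXMLNode("li", x))
ol <- newXMLNode("ol", .children = items)

# parent vs. addChildren: two ways to build <a><b/></a>
t1 <- newXMLNode("a")
newXMLNode("b", parent = t1)
t2 <- newXMLNode("a")
addChildren(t2, newXMLNode("b"))

print(xmlGetAttr(p, "y"))
print(is.null(xmlGetAttr(lab, "y")))   # the child does not inherit 'y'
print(sapply(xmlChildren(ol), xmlValue))
print(identical(saveXML(t1), saveXML(t2)))
```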

_D_e_t_a_i_l_s:

     These functions create internal C-level objects (structure
     instances) that can be added to a libxml DOM and subsequently
     inserted into other document objects or serialized to textual
     form.
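     A minimal sketch of the serialization step, assuming the XML
     package is installed; the element and attribute names are
     illustrative. A node is turned into text with 'saveXML' and the
     text is parsed back with 'xmlParse' to confirm nothing was lost.

```r
library(XML)

# Build a node, serialize it to text, then parse the text back
n <- newXMLNode("greeting", "hello", attrs = c(lang = "en"))
txt <- saveXML(n)
cat(txt, "\n")

root <- xmlRoot(xmlParse(txt, asText = TRUE))
print(xmlValue(root))
print(xmlGetAttr(root, "lang"))
```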

_V_a_l_u_e:

     Each function returns an R object that points to the C-level
     structure instance. 'xmlDoc' and 'newXMLDoc' return objects of
     class 'XMLInternalDocument'; the node constructors return objects
     of class 'XMLInternalNode'.
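     A minimal sketch, assuming the XML package is installed, that
     checks the classes of the returned objects with 'inherits'; the
     element name 'a' is illustrative.

```r
library(XML)

doc <- newXMLDoc()              # empty document
node <- newXMLNode("a", "text") # stand-alone node
copy <- xmlDoc(node)            # document built from a copy of 'node'

print(class(doc))
print(class(node))
print(class(copy))
```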

_N_o_t_e:

     These functions are used to build up an internal XML tree. Such a
     tree can be used with the Sxslt package (<URL:
     http://www.omegahat.org/Sxslt>) when creating content in R that is
     to be dynamically inserted into an XML document.

_A_u_t_h_o_r(_s):

     Duncan Temple Lang

_R_e_f_e_r_e_n_c_e_s:

     <URL: http://www.w3.org/XML>, <URL: http://www.xmlsoft.org>, <URL:
     http://www.omegahat.org>

_S_e_e _A_l_s_o:

     'xmlTree' 'saveXML'

_E_x_a_m_p_l_e_s:

      # Simple creation of an XML tree using these functions
     top = newXMLNode("a")
     newXMLNode("b", attrs = c(x = 1, y = 'abc'), parent = top)
     newXMLNode("c", "With some text", parent = top)
     d = newXMLNode("d", newXMLTextNode("With text as an explicit node"), parent = top)
     newXMLCDataNode("x <- 1\n x > 2", parent = d)

     newXMLPINode("R", "library(XML)", top)
     newXMLCommentNode("This is a comment", parent = top)

     o = newXMLNode("ol", parent = top)

     kids = lapply(letters[1:3],
                    function(x)
                       newXMLNode("li", x))
     addChildren(o, kids)

     cat(saveXML(top))



     x = summary(rnorm(1000))
     d = xmlTree()
     d$addNode("table", close = FALSE)

     d$addNode("tr", .children = sapply(names(x), function(x) d$addNode("th", x)))
     d$addNode("tr", .children = sapply(x, function(x) d$addNode("td", format(x))))

     d$closeNode()

     # Just doctype
     z = xmlTree("people", dtd = "people")
     # no public element
     z = xmlTree("people", dtd = c("people", "", "http://www.omegahat.org/XML/types.dtd"))
     # public and system
     z = xmlTree("people", dtd = c("people", "//a//b//c//d", "http://www.omegahat.org/XML/types.dtd"))

     # Using a DTD node directly.
     dtd = newXMLDTDNode(c("people", "", "http://www.omegahat.org/XML/types.dtd"))
     z = xmlTree("people", dtd = dtd)

     x = rnorm(3)
     z = xmlTree("r:data", namespaces = c(r = "http://www.r-project.org"))
     z$addNode("numeric", attrs = c("r:length" = length(x)), close = FALSE)
     # add one <el> per value; use the loop variable v, not the whole vector x
     lapply(x, function(v) z$addNode("el", format(v)))
     z$closeNode()
     # should give <r:data> containing <numeric r:length="3">,
     # which holds one <el> element per value of x

     # shows a namespace prefix on an attribute that differs from the one on the node
     z = xmlTree()
     z$addNode("r:data",  namespace = c(r = "http://www.r-project.org", omg = "http://www.omegahat.org"), close = FALSE)
     x = rnorm(3)
     z$addNode("r:numeric", attrs = c("omg:length" = length(x)))

     z = xmlTree("people", namespaces = list(r = "http://www.r-project.org"))
     z$setNamespace("r")

     z$addNode("person", attrs = c(id = "123"), close = FALSE)
     z$addNode("firstname", "Duncan")
     z$addNode("surname", "Temple Lang")
     z$addNode("title", "Associate Professor")
     z$addNode("expertise", close = FALSE)
     z$addNode("topic", "Data Technologies")
     z$addNode("topic", "Programming Language Design")
     z$addNode("topic", "Parallel Computing")
     z$addNode("topic", "Data Visualization")
     z$closeTag()
     z$addNode("address", "4210 Mathematical Sciences Building, UC Davis")
     z$closeTag()   # close the <person> node opened with close = FALSE

