AI & IT UG - Weblog
23.05.2020 0:09
r2x

Builds of R2X available

Builds of the R2X R package are available from our download page. To install a build directly from R, use

R
install.packages("https://ai-and-it.de/download/r2x_1.2.2.tar.gz", repos = NULL, type = "source")
20.05.2020 20:24

UOpt solver released

We have developed the UOpt library for deep learning and the training of neural networks. The solver is especially well suited to the unconstrained optimization problems that arise in machine learning. See the UOpt section of this site for more information.

08.05.2020 10:55

Company registered officially

I just received news that AI & IT UG has been entered into the commercial register in Aachen, so from a legal point of view the founding of the company is now complete!

04.05.2020 18:20
parser : r2x : xml

R2X: A star is born

As our first contribution to the world's open-source software, we would like to present R2X: a seamless XML-to-R bridge. Install it from GitHub with

devtools::install_github("rainac/r2x")

Then you can create XML from R with

R
library(r2x)
xml <- r2x(list(a = list(b = 1), a = list(bb = 2), a = list(cc = 3)))
<r2x ><a ><b >1</b></a><a ><bb >2</bb></a><a ><cc >3</cc></a></r2x>

For XML attributes you use, well, R attributes, and to get an XML document, pass the markup to xml2::read_xml:

R
library(xml2)  # provides read_xml
xml <- r2x(list(
  a = structure(list(b = structure(1, d = 4, e = 5)), d = 6, e = 7),
  a = list(bb = 2),
  a = list(cc = 3)
))
(doc <- read_xml(xml))
{xml_document}
<r2x>
[1] <a d="6" e="7">\n <b d="4" e="5">1</b>\n</a>
[2] <a>\n <bb>2</bb>\n</a>
[3] <a>\n <cc>3</cc>\n</a>
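
To make the mapping concrete, here is a toy serializer in base R. It is a hypothetical sketch (the name to_xml_sketch is ours), not how r2x is implemented: list names become tag names, R attributes other than the list names become XML attributes, and leaf values are pasted as text content. Unlike r2x it adds no <r2x> root element and does no escaping.

```r
# Hypothetical toy serializer illustrating the R-to-XML mapping.
# This is NOT r2x's implementation: no root element, no escaping.
to_xml_sketch <- function(x) {
  tags <- names(x)
  parts <- character(length(x))
  for (i in seq_along(x)) {
    child <- x[[i]]
    # R attributes (except the list names) become XML attributes
    attrs <- attributes(child)
    attrs <- attrs[names(attrs) != "names"]
    attr_str <- if (length(attrs) > 0) {
      paste0(" ", names(attrs), '="', unlist(attrs), '"', collapse = "")
    } else {
      ""
    }
    # Lists recurse into child elements; leaf values become text content
    body <- if (is.list(child)) to_xml_sketch(child) else paste(child, collapse = " ")
    parts[i] <- paste0("<", tags[i], attr_str, ">", body, "</", tags[i], ">")
  }
  paste(parts, collapse = "")
}

# returns <a d="6"><b d="4">1</b></a><a><bb>2</bb></a>
to_xml_sketch(list(a = structure(list(b = structure(1, d = 4)), d = 6), a = list(bb = 2)))
```

Repeated names such as the two a entries are no problem here, because R lists may carry the same name more than once.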

The inverse function is x2r. It uses XSLT to generate the R code for the result, converts strings to numbers where possible, and accepts either XML markup or a parsed document:

R
l1 <- x2r(doc)
l2 <- x2r(xml)
l2$a$b
[1] 1
attr(,"d")
[1] 4
attr(,"e")
[1] 5

We think the possibilities are endless! Think of scraping data, for example:

R
l3 <- x2r(read_xml('http://www.ai-and-it.de/'))
lapply(l3$body$div$div$article, nchar)
$h1
[1] 38

$div
 h1   p   p   p   p   p
  7 129 316 183  17  19

But watch out: $ only ever returns the first matching child. To get all of them, select by name instead:

R
lapply(l3$body$div[names(l3$body$div) == 'div'], nchar)
$div
article
    784

$div
article
    265
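
This is plain R behaviour rather than something specific to r2x: on a list with repeated names, $ returns only the first match, while logical indexing on names() keeps all of them. A self-contained illustration with a made-up list x standing in for the children of an XML element:

```r
# A made-up list with a repeated name, like the children of an XML element
x <- list(div = "first", p = "text", div = "second")

x$div                                 # "first": $ stops at the first match
x[names(x) == "div"]                  # both div entries, still as a list
unname(unlist(x[names(x) == "div"]))  # plain vector c("first", "second")
```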

Also, the result of x2r, that is, the lists that represent the XML, does not look that pretty, as you have already seen. This is simply because R prints the attributes after the value, whereas XML puts them first. For writing such lists by hand we therefore recommend the element function. To see the R code behind a converted document, use deparse:

R
cat(deparse(x2r(read_xml('<a b="B"><a c="C">42</a><a c="D">42</a></a>'))))

The output, tidied up by hand, looks like this:

R
a <- element(b = 'B', list(
  a = element(c = 'C', 42),
  a = element(c = 'D', 42)
))
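
To illustrate the idea that element is structure with its arguments reversed, here is a hypothetical stand-in, element_sketch. It is an assumption for illustration only; the real element function in r2x may be defined differently. Named arguments become attributes and the last argument is the value, so the result is identical to the corresponding structure() call, while deparse shows R's own value-first order.

```r
# Hypothetical stand-in for r2x's element(); NOT the package's own code.
# Named arguments become attributes, the last argument is the value.
element_sketch <- function(...) {
  args <- list(...)
  value <- args[[length(args)]]
  attrs <- args[-length(args)]
  do.call(structure, c(list(value), attrs))
}

e <- element_sketch(c = 'C', 42)
identical(e, structure(42, c = 'C'))  # TRUE
cat(deparse(e))                       # R's order: value first, attributes after
```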

This is just the structure function with its arguments reversed, but that alone makes it look much more like XML. R2X also converts numeric vectors to whitespace-separated lists of numbers and back. There is one catch: R2X ignores any text between elements, so an element can have either element children or text, but not both.

R
x2r(read_xml('<a> X <a>42 0x42</a> Y <a>abc def</a> Z </a>'))
$a
[1] 42 66

$a
[1] "abc def"

Some text is lost in translation, but all elements and all attributes will be there, as will any text node that is the only child of its element.
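
The number conversion seen above, where "42 0x42" came back as 42 66, can be approximated in a few lines of base R. This is a hypothetical helper (as_numbers_if_possible is our name), not x2r's XSLT-based code: it splits the text on whitespace and returns numbers only if every token parses, relying on as.numeric also accepting hexadecimal strings with a 0x prefix.

```r
# Hypothetical helper approximating x2r's "numbers where possible" rule;
# the real x2r does this conversion in XSLT and may differ in details.
as_numbers_if_possible <- function(text) {
  tokens <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  nums <- suppressWarnings(as.numeric(tokens))  # also parses hex such as "0x42"
  if (length(tokens) > 0 && !anyNA(nums)) nums else text
}

as_numbers_if_possible("42 0x42")   # numeric vector 42 66
as_numbers_if_possible("abc def")   # stays the string "abc def"

# And back: a numeric vector becomes whitespace-separated text
paste(c(42, 66), collapse = " ")    # "42 66"
```

Requiring every token to parse keeps mixed text such as "abc 42" as a string instead of silently producing NA values.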

The GitHub repository is at https://github.com/rainac/r2x.