<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
  <title>Rmarkdown on JLA Data</title>
  <link>https://www.jla-data.net/tags/rmarkdown/</link>
  <description>Recent content in Rmarkdown on JLA Data</description>
  <generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<copyright>Jindra Lacko</copyright>
<lastBuildDate>Fri, 21 Jun 2019 00:00:00 +0000</lastBuildDate>

<atom:link href="https://www.jla-data.net/tags/rmarkdown/index.xml" rel="self" type="application/rss+xml" />


<item>
  <title>Parametric Reports in R</title>
  <link>https://www.jla-data.net/cze/parametricke-reporty-v-erku/</link>
  <pubDate>Fri, 21 Jun 2019 00:00:00 +0000</pubDate>
  
<guid>https://www.jla-data.net/cze/parametricke-reporty-v-erku/</guid>
  <description>


&lt;p&gt;Producing reports in the &lt;em&gt;pdf&lt;/em&gt;, &lt;em&gt;html&lt;/em&gt; and &lt;em&gt;docx&lt;/em&gt; formats (that is, files readable in &lt;a href=&#34;https://en.wikipedia.org/wiki/Adobe_Acrobat&#34;&gt;Adobe Acrobat Reader&lt;/a&gt;, a &lt;a href=&#34;https://en.wikipedia.org/wiki/Web_browser&#34;&gt;web browser&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Microsoft_Word&#34;&gt;MS Word&lt;/a&gt;) is a fairly well-known strength of R.&lt;/p&gt;
&lt;p&gt;Less frequently used, but equally interesting, is the option of &lt;em&gt;parametric&lt;/em&gt; reporting. This somewhat more advanced technique is built on passing a certain value, a &lt;em&gt;parameter&lt;/em&gt;, to R Markdown when the report is generated. A single source markdown template can thus produce several finished documents.&lt;/p&gt;
&lt;p&gt;Typical use cases for parametrization are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reports identical in data and structure, but prepared as of a different date&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;a set of reports with the same structure but slightly different data (for example, as of the same date for several regions)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As the description shows, parametrization is a good way to eliminate boring, tedious and error-prone manual work.&lt;/p&gt;
&lt;p&gt;It is especially welcome when an originally one-off report becomes institutionalized. Which, particularly when working in a corporation, can happen…&lt;/p&gt;
&lt;p&gt;Building a parametric report is a multi-file affair; it requires at least two:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an RMarkdown template with a defined parameter&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;an R script that calls the template with a specific parameter value&lt;/li&gt;
&lt;/ul&gt;
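&lt;p&gt;A minimal sketch of what the calling script might prepare, assuming a template with two parameters. The template file name, the parameter names &lt;code&gt;region&lt;/code&gt; and &lt;code&gt;report_date&lt;/code&gt; and the region values are all hypothetical and are not taken from the example project; only &lt;code&gt;rmarkdown::render()&lt;/code&gt; and its &lt;code&gt;input&lt;/code&gt;, &lt;code&gt;params&lt;/code&gt; and &lt;code&gt;output_file&lt;/code&gt; arguments are real.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# hypothetical parameter values: one report per region, all as of the same date
regions &amp;lt;- c(&amp;quot;Praha&amp;quot;, &amp;quot;Brno&amp;quot;)
report_date &amp;lt;- as.Date(&amp;quot;2019-06-21&amp;quot;)

# one list of render() arguments per report
jobs &amp;lt;- lapply(regions, function(region) {
  list(input = &amp;quot;report-template.Rmd&amp;quot;, # hypothetical template file name
       params = list(region = region, report_date = report_date),
       output_file = paste0(region, &amp;quot;-&amp;quot;, format(report_date, &amp;quot;%Y-%m-%d&amp;quot;), &amp;quot;.pdf&amp;quot;))
})

# with the template in place, each job could be rendered like this:
# for (job in jobs) do.call(rmarkdown::render, job)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Preparing the argument lists first makes it easy to inspect the file names before any report is actually built.&lt;/p&gt;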
&lt;p&gt;When saving to &lt;em&gt;pdf&lt;/em&gt; it is often practical to add a LaTeX template as well.&lt;/p&gt;
&lt;p&gt;I offer an illustrative example of a parametric report, which demonstrates working with parameters in Rmd and passing them via &lt;code&gt;rmarkdown::render()&lt;/code&gt;. Because the example by its nature works with several files, it was not practical to publish it on these pages. Instead I have stored it on &lt;a href=&#34;https://github.com/jlacko/R4RPTG&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can quickly and easily clone the project from &lt;code&gt;https://github.com/jlacko/R4RPTG.git&lt;/code&gt; using the procedure described in my &lt;a href=&#34;https://www.jla-data.net/r4su/r4su-environment-setup/#rstudio-projekty&#34;&gt;Cesta erka (Way of R)&lt;/a&gt; guide.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://www.jla-data.net/CZE/2019-06-21-parametricke-reporty-v-erku_files/praha.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;For illustration I use my favourite time series: the price of beer by region, as published by the Czech Statistical Office (ČSÚ).&lt;/p&gt;
&lt;p&gt;In terms of further development, it is worth considering integrating report generation with the &lt;a href=&#34;https://cran.r-project.org/web/packages/cronR/vignettes/cronR.html&#34;&gt;cronR&lt;/a&gt; package for clearer scheduling of jobs in a Linux environment (i.e. in the context of the server version of RStudio).&lt;/p&gt;
&lt;p&gt;The next logical step is automating the distribution of the reports created this way, but that depends heavily on the specific infrastructure.&lt;/p&gt;
</description>
  </item>
  
<item>
  <title>Unbearable Lightness of SQL Code Chunks</title>
  <link>https://www.jla-data.net/eng/unbearable-lightness-of-sql-chunks/</link>
  <pubDate>Tue, 17 Apr 2018 00:00:00 +0000</pubDate>
  
<guid>https://www.jla-data.net/eng/unbearable-lightness-of-sql-chunks/</guid>
  <description>


&lt;p&gt;Using code chunks with R code in &lt;a href=&#34;https://rmarkdown.rstudio.com/&#34;&gt;RMarkdown&lt;/a&gt; documents is a well understood (and much appreciated!) topic. In this post I would like to draw attention to a slightly different aspect of RMarkdown: the option of writing code chunks in other programming languages.&lt;/p&gt;
&lt;p&gt;I have yet to find the need to mix and match R and Python code in a single document, but I have found it advantageous to use SQL code chunks.&lt;/p&gt;
&lt;p&gt;Like it or not, SQL is the de facto language of data, and SQL code is immediately clear to any old BI hand - much more so than a &lt;code&gt;dplyr&lt;/code&gt; pipeline. In addition, it allows me to use features of the SQL language that do not translate easily to R code.&lt;/p&gt;
&lt;p&gt;The first task is creating a database connection; this needs to be done in an R (or Python, but let us stick to R) code chunk.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(odbc)
con &amp;lt;- dbConnect(odbc::odbc(), 
                 driver = &amp;quot;PostgreSQL Unicode&amp;quot;, 
                 server = &amp;quot;db.jla-data.net&amp;quot;, 
                 port = 5432, 
                 uid = &amp;quot;babisobot&amp;quot;, # user babisobot has select rights only ...
                 password = &amp;quot;babisobot&amp;quot;, # ... so his password need not be too secret :)
                 database = &amp;quot;dbase&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next chunk is declared as SQL &lt;code&gt;{sql ... }&lt;/code&gt; and it is necessary to specify both &lt;code&gt;connection = con&lt;/code&gt; and &lt;code&gt;output.var = &amp;quot;frmVystup&amp;quot;&lt;/code&gt; in the header (i.e. in the curly braces). The quotation marks around the output variable name are important.&lt;/p&gt;
&lt;pre class=&#34;sql&#34;&gt;&lt;code&gt;select 
  date_trunc(&amp;#39;day&amp;#39;, saved) date,
  count(1) volume
from 
  babisobot 
group by 
  date_trunc(&amp;#39;day&amp;#39;, saved)
order by 
  2 desc
limit 5&lt;/code&gt;&lt;/pre&gt;
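&lt;p&gt;For readers curious about what the two header options do, a rough R equivalent of such a chunk is a plain DBI query whose result is assigned to a variable. This is only a sketch under stated assumptions: it needs the &lt;code&gt;RSQLite&lt;/code&gt; package, it uses an in-memory SQLite database with made-up rows in place of the PostgreSQL server, and the query uses SQLite&amp;#39;s &lt;code&gt;date()&lt;/code&gt; function because SQLite has no &lt;code&gt;date_trunc&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(DBI)

# assumption: an in-memory SQLite database stands in for the PostgreSQL server
con &amp;lt;- dbConnect(RSQLite::SQLite(), &amp;quot;:memory:&amp;quot;)

# made-up rows mimicking the babisobot table
dbWriteTable(con, &amp;quot;babisobot&amp;quot;,
             data.frame(saved = c(&amp;quot;2018-03-25 08:00:00&amp;quot;,
                                  &amp;quot;2018-03-25 17:30:00&amp;quot;,
                                  &amp;quot;2018-04-06 12:00:00&amp;quot;)))

# connection = con picks the database, output.var names the resulting data frame
frmVystup &amp;lt;- dbGetQuery(con, &amp;quot;
  select date(saved) as date, count(1) as volume
  from babisobot
  group by date(saved)
  order by 2 desc
  limit 5&amp;quot;)

dbDisconnect(con)&lt;/code&gt;&lt;/pre&gt;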
&lt;p&gt;Now that I have the result of the SQL script safely stored in the variable &lt;code&gt;frmVystup&lt;/code&gt; I can use it in my further work in R. For this proof of concept, showing the data frame in a simple &lt;code&gt;kable&lt;/code&gt; is enough.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(kableExtra)

kable(frmVystup, # the variable created in previous chunk
      format = &amp;#39;html&amp;#39;,
      booktabs = T,
      align = c(&amp;#39;l&amp;#39;, &amp;#39;r&amp;#39;)) %&amp;gt;%
  kable_styling(full_width = F) %&amp;gt;%
  column_spec(1, width = &amp;quot;6cm&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;table class=&#34;table&#34; style=&#34;width: auto !important; margin-left: auto; margin-right: auto;&#34;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left;&#34;&gt;
date
&lt;/th&gt;
&lt;th style=&#34;text-align:right;&#34;&gt;
volume
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;width: 6cm; &#34;&gt;
2018-03-25
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
3363
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;width: 6cm; &#34;&gt;
2018-04-06
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1753
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;width: 6cm; &#34;&gt;
2018-04-10
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1741
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;width: 6cm; &#34;&gt;
2018-03-27
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1537
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left;width: 6cm; &#34;&gt;
2018-04-11
&lt;/td&gt;
&lt;td style=&#34;text-align:right;&#34;&gt;
1505
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Last but not least, do not forget to close the database connection on exit.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dbDisconnect(con) # because it is good manners to shut the door and turn off the light&lt;/code&gt;&lt;/pre&gt;
</description>
  </item>
  
<item>
  <title>Parametrized R Markdown Reports</title>
  <link>https://www.jla-data.net/eng/parametrized-r-markdown-reports/</link>
  <pubDate>Wed, 10 Jan 2018 00:00:00 +0000</pubDate>
  
<guid>https://www.jla-data.net/eng/parametrized-r-markdown-reports/</guid>
  <description>


&lt;p&gt;Every business, no matter how big or small, simple or sophisticated, requires regular reports to run. RStudio, especially in its server flavor with the option of cron jobs, is eminently capable of producing these. Parametrized reports are thus able to perform the role of a &lt;a href=&#34;https://en.wikipedia.org/wiki/Gateway_drug_theory&#34;&gt;gateway drug&lt;/a&gt; and wean the analytic team off their beloved Excel sheets.&lt;/p&gt;
&lt;p&gt;In fact, if I were looking for a single feature to convince a die-hard Excel user to see the light and give up his VLOOKUP, I would stress the &lt;em&gt;ease&lt;/em&gt; of regular reporting with parametrized reports. It might not be a fancy ML / AI technique that catches the headlines, but it is one of the small things which take the pain out of everyday chores.&lt;/p&gt;
&lt;p&gt;This example will demonstrate creating parametrized reports using the well known and much loved &lt;em&gt;Iris&lt;/em&gt; dataset.&lt;/p&gt;
&lt;p&gt;It will show:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an &lt;em&gt;R Markdown&lt;/em&gt; template, with a single parameter &lt;code&gt;species&lt;/code&gt; defined&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;using the &lt;code&gt;knitr::kable&lt;/code&gt; function and the &lt;code&gt;kableExtra&lt;/code&gt; package to build a simple table with a calculated summary row and some basic formatting&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;a master &lt;em&gt;R&lt;/em&gt; script, calling &lt;code&gt;rmarkdown::render&lt;/code&gt; on the template to build the reports, iterating the value of the parameter &lt;code&gt;species&lt;/code&gt; over the unique values of species from the Iris dataset&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In its simplest form the R Markdown template needs just two parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;YAML header&lt;/li&gt;
&lt;li&gt;a single R chunk&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;---
title: &amp;quot;Iris *`r params$species`* are rather cute...&amp;quot; # a report looks better with the title set
params:  # this is the parameter declaration
  species: &amp;quot;setosa&amp;quot; # default value, overridden by the render function, but helpful for debugging
output:
  pdf_document:
    latex_engine: pdflatex
header-includes:
- \usepackage{booktabs}
- \usepackage{longtable}
- \usepackage{array}
- \usepackage{multirow}
- \usepackage[table]{xcolor}
- \usepackage{wrapfig}
- \usepackage{float}
- \usepackage{colortbl}
- \usepackage{pdflscape}
- \usepackage{tabu}
- \usepackage{threeparttable}
- \usepackage[normalem]{ulem}
---&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The YAML header needs to include a declaration of the parameters (indentation is, as is often the case with YAML, crucial). Including a default value is optional, but helpful in debugging.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;header-includes&lt;/code&gt; option loads the LaTeX packages necessary for table formatting; this list, &lt;a href=&#34;http://haozhu233.github.io/kableExtra/awesome_table_in_pdf.pdf&#34;&gt;helpfully provided by&lt;/a&gt; Hao Zhu (the author of the &lt;code&gt;kableExtra&lt;/code&gt; package), should keep the dreaded LaTeX error “environment &lt;em&gt;xyz&lt;/em&gt; undefined” at bay.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(knitr)
library(kableExtra)

src &amp;lt;- iris %&amp;gt;% # here you would normally load a file or connect to a database...
  filter(Species == params$species) %&amp;gt;%
  mutate(Species = as.character(Species)) %&amp;gt;% # factor would be a problem for summary row
  select(Species, Sepal.Length) %&amp;gt;% # just two columns for the sake of clarity...
  slice(1:5) # first five rows only, so that page space is not an issue

src &amp;lt;- rbind(src, # add summary row 
             c(&amp;quot;Grand total&amp;quot;, sum(src$Sepal.Length)))

kable(src,
      format = &amp;#39;latex&amp;#39;,
      booktabs = T,
      align = c(&amp;#39;l&amp;#39;,&amp;#39;r&amp;#39;)) %&amp;gt;%
      row_spec(nrow(src), bold = T) # make the last (summary) row bold&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The body chunk needs to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;load your libraries (note that &lt;code&gt;knitr&lt;/code&gt;, where &lt;code&gt;kable&lt;/code&gt; lives, is not a formal part of tidyverse - it is ‘just’ suggested - and needs to be loaded separately)&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;load your data (I have cheated a little, and used the pre-loaded Iris dataset) and&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;perform the necessary filtering / aggregating&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note how &lt;code&gt;params$species&lt;/code&gt; is applied as the filter condition, and how the summary row is created by binding a new row to the filtered dataset.&lt;/p&gt;
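&lt;p&gt;To experiment with the filtering and the summary row outside of knitting, the &lt;code&gt;params&lt;/code&gt; object can be defined by hand. The following is a base R sketch, not the code from the template: it mimics &lt;code&gt;params&lt;/code&gt; with a plain list (an assumption for interactive use) and binds the total as a one-row data frame, which keeps &lt;code&gt;Sepal.Length&lt;/code&gt; numeric instead of coercing it to character.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# assumption: stands in for the params object that knitting creates from the YAML header
params &amp;lt;- list(species = &amp;quot;setosa&amp;quot;)

# filter by the parameter, keep two columns and the first five rows
src &amp;lt;- iris[iris$Species == params$species, c(&amp;quot;Species&amp;quot;, &amp;quot;Sepal.Length&amp;quot;)]
src &amp;lt;- head(src, 5)
src$Species &amp;lt;- as.character(src$Species) # a factor would reject the new label

# summary row as a one-row data frame, so the numeric column stays numeric
total &amp;lt;- data.frame(Species = &amp;quot;Grand total&amp;quot;,
                    Sepal.Length = sum(src$Sepal.Length))
src &amp;lt;- rbind(src, total)&lt;/code&gt;&lt;/pre&gt;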
&lt;p&gt;The master script needs to do two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;construct a vector of unique Iris species, each of which will be passed as a parameter to the &lt;code&gt;render&lt;/code&gt; function to generate a report&lt;/li&gt;
&lt;li&gt;call the &lt;code&gt;render&lt;/code&gt; function from the &lt;code&gt;rmarkdown&lt;/code&gt; package, with a list of parameters as required by the template. In this simple case it is just a single parameter, ‘species’.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rmarkdown)

flowers &amp;lt;- unique(iris$Species) # setosa, versicolor, virginica - you know them all, don&amp;#39;t you?

for (i in seq_along(flowers)) {
  myIris &amp;lt;- flowers[i]  # my species - to be reused as 1) parameter &amp;amp; 2) file name
  render(&amp;quot;report-template.Rmd&amp;quot;, # the template
          params = list(species = myIris), # value of myIris passed to the species parameter
          output_file = paste(myIris, &amp;#39;.pdf&amp;#39;, sep = &amp;#39;&amp;#39;), # name of the output file - species name and pdf extension
          quiet = T,
          encoding = &amp;#39;UTF-8&amp;#39;)
}&lt;/code&gt;&lt;/pre&gt;
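&lt;p&gt;Before running the loop it can be handy to preview the file names it will produce. A small sketch, assuming you prefer to work with plain character values rather than the factor levels returned by &lt;code&gt;unique()&lt;/code&gt;; the variable &lt;code&gt;output_files&lt;/code&gt; is introduced here for illustration only.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# species names as character strings instead of a factor
flowers &amp;lt;- as.character(unique(iris$Species))

# one pdf file name per species, named by the species it belongs to
output_files &amp;lt;- paste0(flowers, &amp;quot;.pdf&amp;quot;)
names(output_files) &amp;lt;- flowers&lt;/code&gt;&lt;/pre&gt;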
&lt;p&gt;When you put it all together and source the master script, you should end up with three pdf files like this:&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
&lt;img src=&#34;https://www.jla-data.net/img/2018-01-10-iris-screenshot.png&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;You can download a working example of both the &lt;a href=&#34;https://www.jla-data.net/sample/par-temp.Rmd&#34;&gt;markdown document&lt;/a&gt; and &lt;a href=&#34;https://www.jla-data.net/sample/par-master.R&#34;&gt;master script&lt;/a&gt; directly from my pages.&lt;/p&gt;
&lt;p&gt;As a next step I recommend learning more about the &lt;a href=&#34;https://cran.r-project.org/web/packages/cronR/cronR.pdf&#34;&gt;cronR&lt;/a&gt; package - when teamed with the parametric report functionality you get a report that makes itself; a business analyst&amp;#39;s dream!&lt;/p&gt;
</description>
  </item>
  
</channel>
  </rss>