<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
  <title>Automation on JLA Data</title>
  <link>https://www.jla-data.net/tags/automation/</link>
  <description>Recent content in Automation on JLA Data</description>
  <generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<copyright>Jindra Lacko</copyright>
<lastBuildDate>Thu, 09 Oct 2025 00:00:00 +0000</lastBuildDate>

<atom:link href="https://www.jla-data.net/tags/automation/index.xml" rel="self" type="application/rss+xml" />


<item>
  <title>Extracting location from text with AI</title>
  <link>https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/</link>
  <pubDate>Thu, 09 Oct 2025 00:00:00 +0000</pubDate>
  
<guid>https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/</guid>
  <description>
&lt;link href=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/htmltools-fill/fill.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/htmlwidgets/htmlwidgets.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/jquery/jquery-3.6.0.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/leaflet/leaflet.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/leaflet/leaflet.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/leafletfix/leafletfix.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/proj4/proj4.min.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/Proj4Leaflet/proj4leaflet.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/rstudio_leaflet/rstudio_leaflet.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;https://www.jla-data.net/eng/extracting-location-from-text-with-gemini-ai/index_files/leaflet-binding/leaflet.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;In one of my projects I recently faced the need to summarize and geolocate a largish volume of short pieces of text. While the actual project is not that relevant for this post, I found the problem a fun excuse for using AI methods, and overall an interesting learning exercise.&lt;/p&gt;
&lt;p&gt;The method I ended up using is the &lt;a href=&#34;https://ai.google.dev/gemini-api/docs/structured-output&#34;&gt;structured output of the Gemini API&lt;/a&gt;, leveraged from R via the &lt;a href=&#34;https://jhk0530.github.io/gemini.R/&#34;&gt;gemini.R&lt;/a&gt; package. I chose Gemini because I have a working relationship with Google from using their geocoding and routing APIs extensively in my work, and structured output because it eases the conversion from plain text to the &lt;code&gt;{sf}&lt;/code&gt; flavor of data frame.&lt;/p&gt;
&lt;p&gt;Since my task was relatively simple and high volume I first tried the &lt;a href=&#34;https://ai.google.dev/gemini-api/docs/models#gemini-2.5-flash-lite&#34;&gt;Lite&lt;/a&gt; version of the current Gemini model, as it produces the fastest (and cheapest) output. I found the quality of the Lite model more than adequate for my needs, and so I stuck with the initial choice.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The first step is reading in the libraries required; no surprise here…&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(gemini.R)  # for accessing the Gemini API
library(dplyr)     # for the pipe and data frame handling
library(jsonlite)  # to make sense of the JSON results
library(leaflet)   # to visualize the output&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Gemini call is not overly complicated: it involves passing a prompt to a model, and handling the output. A &lt;a href=&#34;https://jhk0530.github.io/gemini.R/reference/gemini_structured.html&#34;&gt;&lt;code&gt;gemini.R::gemini_structured()&lt;/code&gt;&lt;/a&gt; call additionally takes a specification of the required output structure, which is well documented on the package website.&lt;/p&gt;
&lt;p&gt;Since the Gemini API is a paid service it is also necessary to register an API key; a step that I am omitting here for practical reasons (I suggest keeping the key in your &lt;code&gt;.Renviron&lt;/code&gt; file).&lt;/p&gt;
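&lt;p&gt;As a minimal sketch of that omitted step, assuming the key sits in &lt;code&gt;.Renviron&lt;/code&gt; under the name &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; (both the variable name and the helper &lt;code&gt;get_gemini_key()&lt;/code&gt; are assumptions made for illustration), the key can be read and sanity checked in base R before it is handed over to the package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# hypothetical helper: read the API key from an environment variable
# (the name GEMINI_API_KEY is an assumption; use whatever name you chose)
get_gemini_key &amp;lt;- function(var = &amp;quot;GEMINI_API_KEY&amp;quot;) {
   key &amp;lt;- Sys.getenv(var, unset = &amp;quot;&amp;quot;)
   if (!nzchar(key)) {
      stop(&amp;quot;Environment variable &amp;quot;, var, &amp;quot; is not set; add it to your .Renviron&amp;quot;)
   }
   key
}

# the key would then be registered with the package, for example via
# gemini.R::setAPI(get_gemini_key())
# (check the package documentation for the current function name)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Failing early with a clear message tends to be easier to debug than an authentication error coming back from the API.&lt;/p&gt;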
&lt;p&gt;The fun part is playing with various versions of a prompt; in my case it follows the structure of “you are an experienced &lt;em&gt;whatever&lt;/em&gt;, do give me &lt;em&gt;this &amp;amp; that&lt;/em&gt;” followed by the piece of text that needs summarizing and geocoding.&lt;/p&gt;
&lt;p&gt;With a little tuning it can be tweaked to geocode either all the locations mentioned, or only the most important one. Since my use case called for one (and only one) location per piece of text I am asking very specifically for the single most important location.&lt;/p&gt;
&lt;p&gt;To test my prompt I am using a piece of lyrics from Peter Sarstedt’s &lt;a href=&#34;https://en.wikipedia.org/wiki/Where_Do_You_Go_To_(My_Lovely)%3F&#34;&gt;Where Do You Go To&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# initial prompt
prompt_header &amp;lt;- &amp;quot;you are an experienced geographer; analyze this text and 
                  give me its single most important location as a name and 
                  as a POINT in simple features WKT format 
                  and state your confidence on a scale from 0 to 100 \n\n&amp;quot;

# text to be analyzed
text_input &amp;lt;- &amp;quot;You talk like Marlene Dietrich
               And you dance like Zizi Jeanmaire
               Your clothes are all made by Balmain
               And there&amp;#39;s diamonds and pearls in your hair, yes, there are
               You live in a fancy apartment
               Off the Boulevard St. Michel
               Where you keep your Rolling Stones records
               And a friend of Sacha Distel, yes, you do
               
               But where do you go to, my lovely
               When you&amp;#39;re alone in your bed?
               Tell me the thoughts that surround you
               I want to look inside your head, yes, I do
               
               I&amp;#39;ve seen all your qualifications
               You got from the Sorbonne
               And the painting you stole from Picasso
               Your loveliness goes on and on, yes, it does
               When you go on your summer vacation
               You go to Juan-les-Pins
               With your carefully designed topless swimsuit
               You get an even suntan on your back, and on your legs
               And when the snow falls you&amp;#39;re found in St. Moritz
               With the others of the jet set
               And you sip your Napoleon brandy
               But you never get your lips wet, no, you don&amp;#39;t
               
               But where do you go to, my lovely
               When you&amp;#39;re alone in your bed?
               Won&amp;#39;t you tell me the thoughts that surround you?
               I want to look inside your head, yes, I do&amp;quot;

# schema to give the output a firm structure
schema &amp;lt;- list(
   type = &amp;quot;ARRAY&amp;quot;,
   items = list(
      type = &amp;quot;OBJECT&amp;quot;,
      properties = list(
         name = list(type = &amp;quot;STRING&amp;quot;),
         location = list(type = &amp;quot;STRING&amp;quot;),
         confidence = list(type = &amp;quot;NUMBER&amp;quot;)
      ),
      propertyOrdering = c(&amp;quot;name&amp;quot;, &amp;quot;location&amp;quot;, &amp;quot;confidence&amp;quot;)
   )
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Having all the parts ready I place a call to the Gemini model:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# let Gemini perform its magic!
location &amp;lt;- gemini_structured(prompt = paste(prompt_header, text_input),
                              model = &amp;quot;2.5-flash-lite&amp;quot;, # for the cheapskates...
                              schema = schema)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Gemini is generating a structured response...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To give an overview of the result I first check the JSON returned, and then transform the output from JSON to the &lt;code&gt;{sf}&lt;/code&gt; data format via a regular data frame. The model evidently understands the logic of the simple features format well and uses EPSG:4326 coordinates by default. The transformation from well-known text (WKT) to &lt;code&gt;{sf}&lt;/code&gt; is thus not a complicated one.&lt;/p&gt;
&lt;p&gt;As a final step I pipe the &lt;code&gt;{sf}&lt;/code&gt; result to a &lt;code&gt;{leaflet}&lt;/code&gt; call, visualizing the location on the default OSM basemap.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# initial overview of the result as JSON object
prettify(location)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [
##     {
##         &amp;quot;name&amp;quot;: &amp;quot;Boulevard Saint-Michel&amp;quot;,
##         &amp;quot;location&amp;quot;: &amp;quot;POINT(2.3366434 48.8464368)&amp;quot;,
##         &amp;quot;confidence&amp;quot;: 95
##     }
## ]
## &lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# interpret the result as sf object
location %&amp;gt;% 
   jsonlite::fromJSON() %&amp;gt;%  
   sf::st_as_sf(wkt = &amp;quot;location&amp;quot;, crs = 4326) %&amp;gt;% 
   leaflet() %&amp;gt;% 
   addTiles() %&amp;gt;% 
   addCircleMarkers(label = ~ paste(name, &amp;quot;- confidence&amp;quot;, confidence, &amp;quot;of 100&amp;quot;),
                    color = &amp;quot;red&amp;quot;,
                    stroke = NA,
                    fillOpacity = 1)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;leaflet html-widget html-fill-item&#34; id=&#34;htmlwidget-1&#34; style=&#34;width:100%;height:480px;&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-1&#34;&gt;{&#34;x&#34;:{&#34;options&#34;:{&#34;crs&#34;:{&#34;crsClass&#34;:&#34;L.CRS.EPSG3857&#34;,&#34;code&#34;:null,&#34;proj4def&#34;:null,&#34;projectedBounds&#34;:null,&#34;options&#34;:{}}},&#34;calls&#34;:[{&#34;method&#34;:&#34;addTiles&#34;,&#34;args&#34;:[&#34;https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png&#34;,null,null,{&#34;minZoom&#34;:0,&#34;maxZoom&#34;:18,&#34;tileSize&#34;:256,&#34;subdomains&#34;:&#34;abc&#34;,&#34;errorTileUrl&#34;:&#34;&#34;,&#34;tms&#34;:false,&#34;noWrap&#34;:false,&#34;zoomOffset&#34;:0,&#34;zoomReverse&#34;:false,&#34;opacity&#34;:1,&#34;zIndex&#34;:1,&#34;detectRetina&#34;:false,&#34;attribution&#34;:&#34;&amp;copy; &lt;a href=\&#34;https://openstreetmap.org/copyright/\&#34;&gt;OpenStreetMap&lt;\/a&gt;,  &lt;a href=\&#34;https://opendatacommons.org/licenses/odbl/\&#34;&gt;ODbL&lt;\/a&gt;&#34;}]},{&#34;method&#34;:&#34;addCircleMarkers&#34;,&#34;args&#34;:[48.8464368,2.3366434,10,null,null,{&#34;interactive&#34;:true,&#34;className&#34;:&#34;&#34;,&#34;stroke&#34;:null,&#34;color&#34;:&#34;red&#34;,&#34;weight&#34;:5,&#34;opacity&#34;:0.5,&#34;fill&#34;:true,&#34;fillColor&#34;:&#34;red&#34;,&#34;fillOpacity&#34;:1},null,null,null,null,&#34;Boulevard Saint-Michel - confidence 95 of 100&#34;,{&#34;interactive&#34;:false,&#34;permanent&#34;:false,&#34;direction&#34;:&#34;auto&#34;,&#34;opacity&#34;:1,&#34;offset&#34;:[0,0],&#34;textsize&#34;:&#34;10px&#34;,&#34;textOnly&#34;:false,&#34;className&#34;:&#34;&#34;,&#34;sticky&#34;:true},null]}],&#34;limits&#34;:{&#34;lat&#34;:[48.8464368,48.8464368],&#34;lng&#34;:[2.3366434,2.3366434]}},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;p&gt;I was pleasantly surprised that the model did not fall for the red herrings of the French Riviera and St. Moritz, and placed the most significant location firmly on the Left Bank of Paris. The location returned is about 250 meters off the &lt;em&gt;actual&lt;/em&gt; Boulevard St. Michel, a level of accuracy that is more than adequate for my needs.&lt;/p&gt;
&lt;p&gt;Since my original use case involved multiple languages I next try a different call, using lyrics from Zhanna Bichevskaya’s &lt;a href=&#34;https://en.wikipedia.org/wiki/Po_dikim_stepyam_Zabaikalya&#34;&gt;The Vagabond&lt;/a&gt; in both an unfamiliar language and script:&lt;/p&gt;
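&lt;p&gt;A distance like the one quoted above can be checked without leaving R. The sketch below parses the WKT point with a regular expression and measures the great circle distance to a reference point using the haversine formula. The helper names and the reference coordinates are assumptions made for illustration (the reference is an arbitrary nearby point, not a surveyed location of the boulevard), and &lt;code&gt;sf::st_distance()&lt;/code&gt; would be the more natural tool inside an &lt;code&gt;{sf}&lt;/code&gt; workflow.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# pattern for a simple two dimensional WKT point, e.g. &amp;quot;POINT(2.34 48.85)&amp;quot;
wkt_point_pattern &amp;lt;- paste0(&amp;quot;^[[:space:]]*POINT[[:space:]]*\\([[:space:]]*&amp;quot;,
                            &amp;quot;(-?[0-9.]+)[[:space:]]+(-?[0-9.]+)&amp;quot;,
                            &amp;quot;[[:space:]]*\\)[[:space:]]*$&amp;quot;)

# hypothetical helper: WKT point to a named vector of lon / lat
parse_wkt_point &amp;lt;- function(wkt) {
   m &amp;lt;- regmatches(wkt, regexec(wkt_point_pattern, wkt))[[1]]
   if (length(m) != 3) stop(&amp;quot;not a simple WKT POINT: &amp;quot;, wkt)
   c(lon = as.numeric(m[2]), lat = as.numeric(m[3]))
}

# great circle distance in meters, on a sphere of radius 6371 km
haversine_m &amp;lt;- function(p1, p2, radius = 6371000) {
   to_rad &amp;lt;- pi / 180
   dlat &amp;lt;- (p2[[&amp;quot;lat&amp;quot;]] - p1[[&amp;quot;lat&amp;quot;]]) * to_rad
   dlon &amp;lt;- (p2[[&amp;quot;lon&amp;quot;]] - p1[[&amp;quot;lon&amp;quot;]]) * to_rad
   a &amp;lt;- sin(dlat / 2)^2 +
      cos(p1[[&amp;quot;lat&amp;quot;]] * to_rad) * cos(p2[[&amp;quot;lat&amp;quot;]] * to_rad) * sin(dlon / 2)^2
   2 * radius * asin(sqrt(a))
}

returned  &amp;lt;- parse_wkt_point(&amp;quot;POINT(2.3366434 48.8464368)&amp;quot;) # value returned by the model
reference &amp;lt;- c(lon = 2.34, lat = 48.85) # arbitrary nearby point, for illustration only

haversine_m(returned, reference) # distance in meters&lt;/code&gt;&lt;/pre&gt;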
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# text to be analyzed
text_input &amp;lt;- &amp;quot;По диким степям Забайкалья,
               Где золото роют в горах,
               Бродяга, судьбу проклиная,
               Тащился с сумой на плечах.
               
               Бежал из тюрьмы тёмной ночью,
               В тюрьме он за правду страдал.
               Идти дальше нет уже мочи –
               Пред ним расстилался Байкал.
               
               Бродяга к Байкалу подходит,
               Рыбацкую лодку берёт
               И грустную песню заводит,
               Про Родину что-то поёт.

               Бродяга Байкал переехал,
               Навстречу - родимая мать.
               &amp;#39;Ах, здравствуй, ах, здравствуй, мамаша,
               Здоров ли отец мой да брат?&amp;#39;
               
               &amp;#39;Отец твой давно уж в могиле,
               Землею сырою лежит,
               А брат твой давно уж в Сибири,
               Давно кандалами гремит.&amp;#39;&amp;quot;

# the same Gemini call, with the same prompt header &amp;quot;experienced geographer&amp;quot;
location &amp;lt;- gemini_structured(prompt = paste(prompt_header, text_input),
                              model = &amp;quot;2.5-flash-lite&amp;quot;, 
                              schema = schema)

# initial overview of the result as JSON object
prettify(location)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [
##     {
##         &amp;quot;name&amp;quot;: &amp;quot;Lake Baikal&amp;quot;,
##         &amp;quot;location&amp;quot;: &amp;quot;POINT(108.316667 53.616667)&amp;quot;,
##         &amp;quot;confidence&amp;quot;: 95
##     }
## ]
## &lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# interpret the result as sf object
location %&amp;gt;% 
   jsonlite::fromJSON() %&amp;gt;% 
   sf::st_as_sf(wkt = &amp;quot;location&amp;quot;, crs = 4326) %&amp;gt;% 
   leaflet() %&amp;gt;% 
   addTiles() %&amp;gt;% 
   addCircleMarkers(label = ~ paste(name, &amp;quot;- confidence&amp;quot;, confidence, &amp;quot;of 100&amp;quot;),
                    color = &amp;quot;red&amp;quot;,
                    stroke = NA,
                    fillOpacity = 1)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;leaflet html-widget html-fill-item&#34; id=&#34;htmlwidget-2&#34; style=&#34;width:100%;height:480px;&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-2&#34;&gt;{&#34;x&#34;:{&#34;options&#34;:{&#34;crs&#34;:{&#34;crsClass&#34;:&#34;L.CRS.EPSG3857&#34;,&#34;code&#34;:null,&#34;proj4def&#34;:null,&#34;projectedBounds&#34;:null,&#34;options&#34;:{}}},&#34;calls&#34;:[{&#34;method&#34;:&#34;addTiles&#34;,&#34;args&#34;:[&#34;https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png&#34;,null,null,{&#34;minZoom&#34;:0,&#34;maxZoom&#34;:18,&#34;tileSize&#34;:256,&#34;subdomains&#34;:&#34;abc&#34;,&#34;errorTileUrl&#34;:&#34;&#34;,&#34;tms&#34;:false,&#34;noWrap&#34;:false,&#34;zoomOffset&#34;:0,&#34;zoomReverse&#34;:false,&#34;opacity&#34;:1,&#34;zIndex&#34;:1,&#34;detectRetina&#34;:false,&#34;attribution&#34;:&#34;&amp;copy; &lt;a href=\&#34;https://openstreetmap.org/copyright/\&#34;&gt;OpenStreetMap&lt;\/a&gt;,  &lt;a href=\&#34;https://opendatacommons.org/licenses/odbl/\&#34;&gt;ODbL&lt;\/a&gt;&#34;}]},{&#34;method&#34;:&#34;addCircleMarkers&#34;,&#34;args&#34;:[53.616667,108.316667,10,null,null,{&#34;interactive&#34;:true,&#34;className&#34;:&#34;&#34;,&#34;stroke&#34;:null,&#34;color&#34;:&#34;red&#34;,&#34;weight&#34;:5,&#34;opacity&#34;:0.5,&#34;fill&#34;:true,&#34;fillColor&#34;:&#34;red&#34;,&#34;fillOpacity&#34;:1},null,null,null,null,&#34;Lake Baikal - confidence 95 of 100&#34;,{&#34;interactive&#34;:false,&#34;permanent&#34;:false,&#34;direction&#34;:&#34;auto&#34;,&#34;opacity&#34;:1,&#34;offset&#34;:[0,0],&#34;textsize&#34;:&#34;10px&#34;,&#34;textOnly&#34;:false,&#34;className&#34;:&#34;&#34;,&#34;sticky&#34;:true},null]}],&#34;limits&#34;:{&#34;lat&#34;:[53.616667,53.616667],&#34;lng&#34;:[108.316667,108.316667]}},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;p&gt;The model interprets the song accurately, and places the principal location in the middle of Lake Baikal as expected (you may need to zoom the map out a little to fully appreciate this).&lt;/p&gt;
&lt;p&gt;Given AI models’ well-documented eagerness to please, and the inevitable hallucinations which result from it, I finally wanted to test my prompt with a piece of text guaranteed to contain absolutely no usable information; Lewis Carroll’s &lt;a href=&#34;https://en.wikipedia.org/wiki/Jabberwocky&#34;&gt;Jabberwocky&lt;/a&gt; ensures that:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# text to be analyzed
text_input &amp;lt;- &amp;quot;&amp;#39;Twas brillig, and the slithy toves
               Did gyre and gimble in the wabe;
               All mimsy were the borogoves,
               And the mome raths outgrabe.
               
               &amp;#39;Beware the Jabberwock, my son!
               The jaws that bite, the claws that catch!
               Beware the Jubjub bird, and shun
               The frumious Bandersnatch!&amp;#39;
               
               He took his vorpal sword in hand:
               Long time the manxome foe he sought—
               So rested he by the Tumtum tree,
               And stood awhile in thought.
               
               And as in uffish thought he stood,
               The Jabberwock, with eyes of flame,
               Came whiffling through the tulgey wood,
               And burbled as it came!
               
               One, two! One, two! And through and through
               The vorpal blade went snicker-snack!
               He left it dead, and with its head
               He went galumphing back.
               
               &amp;#39;And hast thou slain the Jabberwock?
               Come to my arms, my beamish boy!
               O frabjous day! Callooh! Callay!&amp;#39;
               He chortled in his joy.
               
               &amp;#39;Twas brillig, and the slithy toves
               Did gyre and gimble in the wabe;
               All mimsy were the borogoves,
               And the mome raths outgrabe.&amp;quot;
               
# the same Gemini call, with the same prompt header &amp;quot;experienced geographer&amp;quot;
location &amp;lt;- gemini_structured(prompt = paste(prompt_header, text_input),
                              model = &amp;quot;2.5-flash-lite&amp;quot;, 
                              schema = schema)

# initial overview of the result as JSON object
prettify(location)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [
##     {
##         &amp;quot;name&amp;quot;: &amp;quot;The Wabe&amp;quot;,
##         &amp;quot;location&amp;quot;: &amp;quot;POINT(0 0)&amp;quot;,
##         &amp;quot;confidence&amp;quot;: 10
##     }
## ]
## &lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# interpret the result as sf object
location %&amp;gt;% 
   jsonlite::fromJSON() %&amp;gt;% 
   sf::st_as_sf(wkt = &amp;quot;location&amp;quot;, crs = 4326) %&amp;gt;% 
   leaflet() %&amp;gt;% 
   addTiles() %&amp;gt;% 
   addCircleMarkers(label = ~ paste(name, &amp;quot;- confidence&amp;quot;, confidence, &amp;quot;of 100&amp;quot;),
                    color = &amp;quot;red&amp;quot;,
                    stroke = NA,
                    fillOpacity = 1)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;leaflet html-widget html-fill-item&#34; id=&#34;htmlwidget-3&#34; style=&#34;width:100%;height:480px;&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-3&#34;&gt;{&#34;x&#34;:{&#34;options&#34;:{&#34;crs&#34;:{&#34;crsClass&#34;:&#34;L.CRS.EPSG3857&#34;,&#34;code&#34;:null,&#34;proj4def&#34;:null,&#34;projectedBounds&#34;:null,&#34;options&#34;:{}}},&#34;calls&#34;:[{&#34;method&#34;:&#34;addTiles&#34;,&#34;args&#34;:[&#34;https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png&#34;,null,null,{&#34;minZoom&#34;:0,&#34;maxZoom&#34;:18,&#34;tileSize&#34;:256,&#34;subdomains&#34;:&#34;abc&#34;,&#34;errorTileUrl&#34;:&#34;&#34;,&#34;tms&#34;:false,&#34;noWrap&#34;:false,&#34;zoomOffset&#34;:0,&#34;zoomReverse&#34;:false,&#34;opacity&#34;:1,&#34;zIndex&#34;:1,&#34;detectRetina&#34;:false,&#34;attribution&#34;:&#34;&amp;copy; &lt;a href=\&#34;https://openstreetmap.org/copyright/\&#34;&gt;OpenStreetMap&lt;\/a&gt;,  &lt;a href=\&#34;https://opendatacommons.org/licenses/odbl/\&#34;&gt;ODbL&lt;\/a&gt;&#34;}]},{&#34;method&#34;:&#34;addCircleMarkers&#34;,&#34;args&#34;:[0,0,10,null,null,{&#34;interactive&#34;:true,&#34;className&#34;:&#34;&#34;,&#34;stroke&#34;:null,&#34;color&#34;:&#34;red&#34;,&#34;weight&#34;:5,&#34;opacity&#34;:0.5,&#34;fill&#34;:true,&#34;fillColor&#34;:&#34;red&#34;,&#34;fillOpacity&#34;:1},null,null,null,null,&#34;The Wabe - confidence 10 of 100&#34;,{&#34;interactive&#34;:false,&#34;permanent&#34;:false,&#34;direction&#34;:&#34;auto&#34;,&#34;opacity&#34;:1,&#34;offset&#34;:[0,0],&#34;textsize&#34;:&#34;10px&#34;,&#34;textOnly&#34;:false,&#34;className&#34;:&#34;&#34;,&#34;sticky&#34;:true},null]}],&#34;limits&#34;:{&#34;lat&#34;:[0,0],&#34;lng&#34;:[0,0]}},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;p&gt;
&lt;/p&gt;
&lt;p&gt;The model, again as expected, responded by hallucinating a place called “The Wabe” and placing it on &lt;a href=&#34;https://en.wikipedia.org/wiki/Null_Island&#34;&gt;Null Island&lt;/a&gt;. But at least it had the good manners to acknowledge the poor quality of its output by giving it a rather low confidence value. In a real world scenario such low confidence locations would likely be filtered out.&lt;/p&gt;
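&lt;p&gt;Such a filter could look as follows; a minimal sketch in base R, assuming the parsed responses were bound into a single data frame. The cut-off of 50 is a hypothetical value, and the three rows simply reuse the outputs shown above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# results as they might look after binding several parsed responses together
results &amp;lt;- data.frame(
   name = c(&amp;quot;Boulevard Saint-Michel&amp;quot;, &amp;quot;Lake Baikal&amp;quot;, &amp;quot;The Wabe&amp;quot;),
   location = c(&amp;quot;POINT(2.3366434 48.8464368)&amp;quot;,
                &amp;quot;POINT(108.316667 53.616667)&amp;quot;,
                &amp;quot;POINT(0 0)&amp;quot;),
   confidence = c(95, 95, 10),
   stringsAsFactors = FALSE
)

min_confidence &amp;lt;- 50 # hypothetical cut-off; tune it on your own data

# flag anything sitting exactly on Null Island
null_island_pattern &amp;lt;- &amp;quot;^POINT[[:space:]]*\\([[:space:]]*0[[:space:]]+0[[:space:]]*\\)$&amp;quot;
is_null_island &amp;lt;- grepl(null_island_pattern, results$location)

# keep only confident rows that are not on Null Island
usable &amp;lt;- results[results$confidence &amp;gt;= min_confidence &amp;amp; !is_null_island, ]

usable$name&lt;/code&gt;&lt;/pre&gt;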
&lt;hr&gt;
&lt;p&gt;I believe I have shown the feasibility of summarizing and geocoding a piece of text by leveraging the Gemini API.&lt;/p&gt;
&lt;p&gt;The concept seems to be working, with better than expected accuracy. Considering the (possibly excessive, but who am I to judge?) resources invested in the AI toolchain recently it is not surprising that the rough edges were sorted out, and the process of calling the Gemini model API from the comfort of my R session is very smooth.&lt;/p&gt;
&lt;p&gt;And while the Gemini API is a paid service, the costs involved are very reasonable, especially considering the effort that manually processing such a volume of short texts would involve.&lt;/p&gt;
</description>
  </item>
  
<item>
  <title>Parametric Reports in R</title>
  <link>https://www.jla-data.net/cze/parametricke-reporty-v-erku/</link>
  <pubDate>Fri, 21 Jun 2019 00:00:00 +0000</pubDate>
  
<guid>https://www.jla-data.net/cze/parametricke-reporty-v-erku/</guid>
  <description>


&lt;p&gt;Producing reports in &lt;em&gt;pdf&lt;/em&gt;, &lt;em&gt;html&lt;/em&gt; and &lt;em&gt;docx&lt;/em&gt; formats (that is, files readable in &lt;a href=&#34;https://en.wikipedia.org/wiki/Adobe_Acrobat&#34;&gt;Adobe Acrobat Reader&lt;/a&gt;, a &lt;a href=&#34;https://en.wikipedia.org/wiki/Web_browser&#34;&gt;web browser&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Microsoft_Word&#34;&gt;MS Word&lt;/a&gt;) is a fairly well known strength of R.&lt;/p&gt;
&lt;p&gt;Less frequently used, but just as interesting, is the option of &lt;em&gt;parametric&lt;/em&gt; reporting. This somewhat more advanced technique is built on passing a value, a &lt;em&gt;parameter&lt;/em&gt;, to R Markdown when the report is generated. This makes it possible to create several finished documents from a single markdown source template.&lt;/p&gt;
&lt;p&gt;Typical use cases for parametrization are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reports identical in data source and structure, but compiled as of a different date&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;a set of reports with the same structure but slightly different data (for example for the same date across several regions)&lt;/li&gt;
&lt;/ul&gt;
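&lt;p&gt;The two use cases can also be combined. As a sketch, with made up dates and region names and a hypothetical file naming scheme, the full set of parameter combinations can be laid out in base R before any report is rendered:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# hypothetical parameter values: two reporting dates and three regions
report_dates &amp;lt;- c(&amp;quot;2019-05-31&amp;quot;, &amp;quot;2019-06-30&amp;quot;)
regions &amp;lt;- c(&amp;quot;Praha&amp;quot;, &amp;quot;Brno&amp;quot;, &amp;quot;Ostrava&amp;quot;)

# one row per report to be generated
grid &amp;lt;- expand.grid(region = regions,
                    date = report_dates,
                    stringsAsFactors = FALSE)

# output file name built from both parameters
grid$output_file &amp;lt;- paste0(&amp;quot;report-&amp;quot;, grid$region, &amp;quot;-&amp;quot;, grid$date, &amp;quot;.pdf&amp;quot;)

grid&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each row of the grid then corresponds to one call of &lt;code&gt;rmarkdown::render()&lt;/code&gt;, with the region and date passed via its &lt;code&gt;params&lt;/code&gt; argument.&lt;/p&gt;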
&lt;p&gt;As the description shows, parametrization is a good way to eliminate boring and tedious (and error-prone) manual work.&lt;/p&gt;
&lt;p&gt;It is most appreciated when a report that started as a one-off becomes institutionalized. Which, especially in a corporate setting, can happen…&lt;/p&gt;
&lt;p&gt;Building a parametric report is a multi-file affair; it requires at least two:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an R Markdown template with a defined parameter&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;an R script that calls the template with a specific parameter value&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When saving to &lt;em&gt;pdf&lt;/em&gt; it is often practical to add a LaTeX template as well.&lt;/p&gt;
&lt;p&gt;I offer an illustrative example of a parametric report, showing how to work with parameters in Rmd and how to pass them via &lt;code&gt;rmarkdown::render()&lt;/code&gt;. Because the example by its nature involves multiple files it was not practical to publish it on these pages. Instead I have stored it on &lt;a href=&#34;https://github.com/jlacko/R4RPTG&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can quickly and easily clone the project from &lt;code&gt;https://github.com/jlacko/R4RPTG.git&lt;/code&gt; using the procedure described in my &lt;a href=&#34;https://www.jla-data.net/r4su/r4su-environment-setup/#rstudio-projekty&#34;&gt;Cesta erka&lt;/a&gt; guide to R.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://www.jla-data.net/CZE/2019-06-21-parametricke-reporty-v-erku_files/praha.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;For illustration I use my favorite time series: the price of beer by region, as published by ČSÚ (the Czech Statistical Office).&lt;/p&gt;
&lt;p&gt;As for further development, it is worth considering integrating report generation with the &lt;a href=&#34;https://cran.r-project.org/web/packages/cronR/vignettes/cronR.html&#34;&gt;cronR&lt;/a&gt; package for more transparent scheduling of jobs in a Linux environment (i.e. in the context of the server version of RStudio).&lt;/p&gt;
&lt;p&gt;The next logical step is automating the distribution of reports created this way, but that depends heavily on the specific infrastructure.&lt;/p&gt;
</description>
  </item>
  
<item>
  <title>Parametrized R Markdown Reports</title>
  <link>https://www.jla-data.net/eng/parametrized-r-markdown-reports/</link>
  <pubDate>Wed, 10 Jan 2018 00:00:00 +0000</pubDate>
  
<guid>https://www.jla-data.net/eng/parametrized-r-markdown-reports/</guid>
  <description>


&lt;p&gt;Every business, no matter how big or small, simple or sophisticated, requires regular reports to run. RStudio, especially in its server flavor with the option of cron jobs, is eminently capable of producing these. Parametrized reports are thus able to perform the role of a &lt;a href=&#34;https://en.wikipedia.org/wiki/Gateway_drug_theory&#34;&gt;gateway drug&lt;/a&gt; and wean the analytic team off their beloved Excel sheets.&lt;/p&gt;
&lt;p&gt;In fact, if I were looking for a single feature to convince a die-hard Excel user to see the light and give up his VLOOKUP, I would stress the &lt;em&gt;ease&lt;/em&gt; of regular reporting with parametrized reports. It might not be a fancy ML / AI technique that catches the headlines, but it is one of the small things which take the pain out of everyday chores.&lt;/p&gt;
&lt;p&gt;This example will demonstrate creating parametrized reports using the well known and much loved &lt;em&gt;Iris&lt;/em&gt; dataset.&lt;/p&gt;
&lt;p&gt;It will show:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an &lt;em&gt;R Markdown&lt;/em&gt; template, with a single parameter &lt;code&gt;species&lt;/code&gt; defined&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;using the &lt;code&gt;knitr::kable&lt;/code&gt; function and the &lt;code&gt;kableExtra&lt;/code&gt; package to build a simple table with a calculated summary row and some basic formatting&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;a master &lt;em&gt;R&lt;/em&gt; script, calling &lt;code&gt;rmarkdown::render&lt;/code&gt; on the template to build the reports, iterating the value of the parameter &lt;code&gt;species&lt;/code&gt; over the unique values of species from the Iris dataset&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The R Markdown template in its simplest form needs just two parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;YAML header&lt;/li&gt;
&lt;li&gt;a single R chunk&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;---
title: &amp;quot;Iris *`r params$species`* are rather cute...&amp;quot; # a report looks better with the title set
params:  # this is the parameter declaration
  species: &amp;quot;setosa&amp;quot; # default value, overridden by the render function, but helpful for debugging
output:
  pdf_document:
    latex_engine: pdflatex
header-includes:
- \usepackage{booktabs}
- \usepackage{longtable}
- \usepackage{array}
- \usepackage{multirow}
- \usepackage[table]{xcolor}
- \usepackage{wrapfig}
- \usepackage{float}
- \usepackage{colortbl}
- \usepackage{pdflscape}
- \usepackage{tabu}
- \usepackage{threeparttable}
- \usepackage[normalem]{ulem}
---&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The YAML header needs to include a declaration of the parameters (indentation is, as is often the case with YAML, crucial). Including a default value is optional, but helpful in debugging.&lt;/p&gt;
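&lt;p&gt;The relationship between the default and the value supplied at render time can be mimicked in base R. This is an illustration of the effect only, not the code &lt;code&gt;rmarkdown&lt;/code&gt; runs internally:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# defaults as declared in the YAML header of the template
yaml_defaults &amp;lt;- list(species = &amp;quot;setosa&amp;quot;)

# values supplied at render time, as in render(..., params = list(species = &amp;quot;virginica&amp;quot;))
render_params &amp;lt;- list(species = &amp;quot;virginica&amp;quot;)

# illustration only: supplied values win over the declared defaults
params &amp;lt;- modifyList(yaml_defaults, render_params)
params$species&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the document is knitted on its own, with nothing supplied, the default is what the chunks see; that is what makes it handy for debugging.&lt;/p&gt;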
&lt;p&gt;The &lt;code&gt;header-includes&lt;/code&gt; option loads LaTeX macros necessary for table formatting; this list, &lt;a href=&#34;http://haozhu233.github.io/kableExtra/awesome_table_in_pdf.pdf&#34;&gt;helpfully provided by&lt;/a&gt; Hao Zhu (the author of the &lt;code&gt;kableExtra&lt;/code&gt; package) should keep the dreaded LaTeX error “environment &lt;em&gt;xyz&lt;/em&gt; undefined” at bay.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(knitr)
library(kableExtra)

src &amp;lt;- iris %&amp;gt;% # here you would normally load a file or connect to a database...
  filter(Species == params$species) %&amp;gt;%
  mutate(Species = as.character(Species)) %&amp;gt;% # factor would be a problem for summary row
  select(Species, Sepal.Length) %&amp;gt;% # just two columns for the sake of clarity...
  slice(1:5) # first five rows only, so that page space is not an issue

src &amp;lt;- rbind(src, # add summary row 
             c(&amp;quot;Grand total&amp;quot;, sum(src$Sepal.Length)))

kable(src,
      format = &amp;#39;latex&amp;#39;,
      booktabs = TRUE,
      align = c(&amp;#39;l&amp;#39;,&amp;#39;r&amp;#39;)) %&amp;gt;%
      row_spec(nrow(src), bold = TRUE) # make the last (summary) row bold&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The body chunk needs to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;declare your libraries (note that &lt;code&gt;knitr&lt;/code&gt;, where &lt;code&gt;kable&lt;/code&gt; lives, is not a formal part of tidyverse - it is ‘just’ suggested - and needs to be loaded separately)&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;load your data (I have cheated a little, and used a pre-loaded Iris dataset) and&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;perform the necessary filtering / aggregating&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note how &lt;code&gt;params$species&lt;/code&gt; is applied as filter condition, and how the summary row is created by binding a new row to the filtered dataset.&lt;/p&gt;
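&lt;p&gt;To see what the row binding does to the column types, here is a base R equivalent of the chunk body that runs outside of a report; the &lt;code&gt;params&lt;/code&gt; list is created by hand to stand in for the value normally passed by &lt;code&gt;render&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# stands in for the value passed by render(); created by hand for this demo
params &amp;lt;- list(species = &amp;quot;setosa&amp;quot;)

# filter by the parameter and keep two columns, first five rows only
src &amp;lt;- iris[iris$Species == params$species, c(&amp;quot;Species&amp;quot;, &amp;quot;Sepal.Length&amp;quot;)]
src$Species &amp;lt;- as.character(src$Species) # a factor would reject the new label
src &amp;lt;- head(src, 5)

# binding a character vector turns Sepal.Length into character as well
src &amp;lt;- rbind(src, c(&amp;quot;Grand total&amp;quot;, sum(src$Sepal.Length)))

str(src)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;Sepal.Length&lt;/code&gt; column comes out as character, which is harmless for a printed table but worth remembering if further calculations are planned.&lt;/p&gt;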
&lt;p&gt;The master script needs to do two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;construct a vector of unique Iris species, each of which will be passed as a parameter to the &lt;code&gt;render&lt;/code&gt; function to generate a report&lt;/li&gt;
&lt;li&gt;call the &lt;code&gt;render&lt;/code&gt; function from the &lt;code&gt;rmarkdown&lt;/code&gt; package, with a list of parameters as required by the template; in this simple case just a single parameter, ‘species’&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rmarkdown)

flowers &amp;lt;- unique(iris$Species) # setosa, versicolor, virginica - you know them all, don&amp;#39;t you?

for (i in seq_along(flowers)) {
  myIris &amp;lt;- flowers[i]  # my species - to be reused as 1) parameter &amp;amp; 2) file name
  render(&amp;quot;report-template.Rmd&amp;quot;, # the template
          params = list(species = myIris), # value of myIris passed to the species parameter
          output_file = paste(myIris, &amp;#39;.pdf&amp;#39;, sep = &amp;#39;&amp;#39;), # name of the output file - species name and pdf extension
          quiet = TRUE,
          encoding = &amp;#39;UTF-8&amp;#39;)
}&lt;/code&gt;&lt;/pre&gt;
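&lt;p&gt;With a longer list of parameter values it may be worth guarding against a single failing report stopping the whole run. A minimal sketch, in which &lt;code&gt;render_all()&lt;/code&gt; and &lt;code&gt;fake_render()&lt;/code&gt; are hypothetical helpers and the rendering function is passed in so that the loop can be tried without LaTeX installed:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# call render_fn once per value; an error in one report does not stop the rest
render_all &amp;lt;- function(values, render_fn) {
   vapply(values, function(v) {
      tryCatch({
         render_fn(v)
         &amp;quot;ok&amp;quot;
      }, error = function(e) paste(&amp;quot;failed:&amp;quot;, conditionMessage(e)))
   }, character(1))
}

# in real use render_fn would wrap rmarkdown::render(), for example:
# render_fn &amp;lt;- function(v) rmarkdown::render(&amp;quot;report-template.Rmd&amp;quot;,
#                                            params = list(species = v),
#                                            output_file = paste0(v, &amp;quot;.pdf&amp;quot;),
#                                            quiet = TRUE)

# dry run with a stand-in that fails for one species
fake_render &amp;lt;- function(v) if (v == &amp;quot;versicolor&amp;quot;) stop(&amp;quot;LaTeX error&amp;quot;) else invisible(v)

render_all(as.character(unique(iris$Species)), fake_render)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The named character vector returned gives a quick overview of which reports were built and which need attention.&lt;/p&gt;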
&lt;p&gt;When you put it all together and source the master script you should end up with three pdf files like this:&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
&lt;img src=&#34;https://www.jla-data.net/img/2018-01-10-iris-screenshot.png&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;You can download a working example of both the &lt;a href=&#34;https://www.jla-data.net/sample/par-temp.Rmd&#34;&gt;markdown document&lt;/a&gt; and &lt;a href=&#34;https://www.jla-data.net/sample/par-master.R&#34;&gt;master script&lt;/a&gt; directly from my pages.&lt;/p&gt;
&lt;p&gt;As a next step I recommend learning more about the &lt;a href=&#34;https://cran.r-project.org/web/packages/cronR/cronR.pdf&#34;&gt;cronR&lt;/a&gt; package; when teamed with the parametric report functionality you get a report that makes itself: a business analyst’s dream!&lt;/p&gt;
</description>
  </item>
  
</channel>
  </rss>