In one of my projects I recently faced the need to summarize and geolocate a largish volume of short pieces of text. While the actual project is not that relevant for this post, I found the problem a fun excuse to use AI methods, and overall an interesting learning exercise.

The method I ended up using is the structured output of the Gemini API, leveraged from R via the gemini.R package. I chose Gemini because I have a working relationship with Google from using their geocoding and routing APIs extensively in my work, and structured output in order to ease the conversion from plain text to the {sf} flavor of data frame.

Since my task was relatively simple and high volume, I first tried the Lite version of the current Gemini model, as it produces the fastest (and cheapest) output. I found the quality of the Lite model more than adequate for my needs, and so I stuck with the initial choice.


The first step is reading in the libraries required; no surprise here…

library(gemini.R)  # for accessing the Gemini API
library(dplyr)     # for the pipe and data frame handling
library(jsonlite)  # to make sense of the JSON results
library(leaflet)   # to visualize the output

The Gemini call is not overly complicated: it involves passing a prompt to a model and handling the output. A gemini.R::gemini_structured() call additionally requires a specification of the desired output structure, which is well documented on the package website.

Since the Gemini API is a paid service it is also necessary to register an API key; a step that I am omitting here for practical reasons (I suggest keeping the key in your .Renviron file).
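As a minimal sketch of that setup (the environment variable name GEMINI_API_KEY is my own choice, not a package requirement), the key can be registered like so:

```r
# in ~/.Renviron (remember to restart R afterwards):
# GEMINI_API_KEY=your-actual-key

# at the start of the session, pass the key to gemini.R
library(gemini.R)
setAPI(Sys.getenv("GEMINI_API_KEY"))
```

This way the key never appears in the script itself, which keeps it out of version control.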

The fun part is playing with various versions of the prompt; in my case it follows the structure of “you are an experienced whatever, do give me this & that”, followed by the piece of text that needs summarizing and geocoding.

With a little tuning it can be tweaked to geocode either all the locations mentioned, or only the most important one. Since my use case called for one (and only one) location per piece of text I am asking very specifically for the single most important location.

To test my prompt I am using a piece of lyrics from Peter Sarstedt’s Where Do You Go To (My Lovely):

# initial prompt
prompt_header <- "you are an experienced geographer; analyze this text and 
                  give me its single most important location as a name and 
                  as a POINT in simple features WKT format 
                  and state your confidence on a scale from 0 to 100 \n\n"

# text to be analyzed
text_input <- "You talk like Marlene Dietrich
               And you dance like Zizi Jeanmaire
               Your clothes are all made by Balmain
               And there's diamonds and pearls in your hair, yes, there are
               You live in a fancy apartment
               Off the Boulevard St. Michel
               Where you keep your Rolling Stones records
               And a friend of Sacha Distel, yes, you do
               
               But where do you go to, my lovely
               When you're alone in your bed?
               Tell me the thoughts that surround you
               I want to look inside your head, yes, I do
               
               I've seen all your qualifications
               You got from the Sorbonne
               And the painting you stole from Picasso
               Your loveliness goes on and on, yes, it does
               When you go on your summer vacation
               You go to Juan-les-Pins
               With your carefully designed topless swimsuit
               You get an even suntan on your back, and on your legs
               And when the snow falls you're found in St. Moritz
               With the others of the jet set
               And you sip your Napoleon brandy
               But you never get your lips wet, no, you don't
               
               But where do you go to, my lovely
               When you're alone in your bed?
               Won't you tell me the thoughts that surround you?
               I want to look inside your head, yes, I do"

# schema to give the output a firm structure
schema <- list(
   type = "ARRAY",
   items = list(
      type = "OBJECT",
      properties = list(
         name = list(type = "STRING"),
         location = list(type = "STRING"),
         confidence = list(type = "NUMBER")
      ),
      propertyOrdering = c("name", "location", "confidence")
   )
)

Having all the parts ready I place a call to the Gemini model:

# let Gemini perform its magic!
location <- gemini_structured(prompt = paste(prompt_header, text_input),
                              model = "2.5-flash-lite", # for the cheapskates...
                              schema = schema)
## Gemini is generating a structured response...

To give an overview of the result I first check the returned JSON, and then transform the output from JSON to the {sf} data format via a regular data frame. The model evidently understands the logic of the simple features format well and uses EPSG:4326 coordinates by default. The transformation from well-known text (WKT) to {sf} is thus not a complicated one.

As a final step I pipe the {sf} result to a {leaflet} call, visualizing the location on the default OSM basemap.

# initial overview of the result as JSON object
prettify(location)
## [
##     {
##         "name": "Boulevard Saint-Michel",
##         "location": "POINT(2.3366434 48.8464368)",
##         "confidence": 95
##     }
## ]
## 
# interpret the result as sf object
location %>% 
   jsonlite::fromJSON() %>%  
   sf::st_as_sf(wkt = "location", crs = 4326) %>% 
   leaflet() %>% 
   addTiles() %>% 
   addCircleMarkers(label = ~ paste(name, "- confidence", confidence, "of 100"),
                    color = "red",
                    stroke = NA,
                    fillOpacity = 1)

I was pleasantly surprised that the model did not fall for the red herrings of the Côte d’Azur and St. Moritz, and placed the most significant location firmly on the Left Bank of Paris. The location returned is about 250 meters off the actual Boulevard St. Michel, a level of accuracy that is more than adequate for my needs.
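The offset is easy to verify with sf::st_distance(); a quick sketch, where the reference coordinate is my own approximate hand-picked point on Boulevard St. Michel, not an authoritative location:

```r
library(sf)

# the point returned by the model
returned  <- st_sfc(st_point(c(2.3366434, 48.8464368)), crs = 4326)

# an approximate point on Boulevard St. Michel, picked by hand
reference <- st_sfc(st_point(c(2.3405, 48.8464)), crs = 4326)

# great circle distance in meters between the two points
st_distance(returned, reference)
```

The result lands in the same ballpark as the quarter-kilometer figure quoted above.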

Since my original use case spanned multiple languages, I next try out a different call, using lyrics from Zhanna Bichevskaya’s The Vagabond, in both an unfamiliar language and an unfamiliar script:

# text to be analyzed
text_input <- "По диким степям Забайкалья,
               Где золото роют в горах,
               Бродяга, судьбу проклиная,
               Тащился с сумой на плечах.
               
               Бежал из тюрьмы тёмной ночью,
               В тюрьме он за правду страдал.
               Идти дальше нет уже мочи –
               Пред ним расстилался Байкал.
               
               Бродяга к Байкалу подходит,
               Рыбацкую лодку берёт
               И грустную песню заводит,
               Про Родину что-то поёт.

               Бродяга Байкал переехал,
               Навстречу - родимая мать.
               'Ах, здравствуй, ах, здравствуй, мамаша,
               Здоров ли отец мой да брат?'
               
               'Отец твой давно уж в могиле,
               Землею сырою лежит,
               А брат твой давно уж в Сибири,
               Давно кандалами гремит.'"

# the same Gemini call, with the same prompt header "experienced geographer"
location <- gemini_structured(prompt = paste(prompt_header, text_input),
                              model = "2.5-flash-lite", 
                              schema = schema)

# initial overview of the result as JSON object
prettify(location)
## [
##     {
##         "name": "Lake Baikal",
##         "location": "POINT(108.316667 53.616667)",
##         "confidence": 95
##     }
## ]
## 
# interpret the result as sf object
location %>% 
   jsonlite::fromJSON() %>% 
   sf::st_as_sf(wkt = "location", crs = 4326) %>% 
   leaflet() %>% 
   addTiles() %>% 
   addCircleMarkers(label = ~ paste(name, "- confidence", confidence, "of 100"),
                    color = "red",
                    stroke = NA,
                    fillOpacity = 1)

The model interprets the song accurately, and places the principal location in the middle of Lake Baikal as expected (you may need to zoom the map out a little to fully appreciate this).

Given the well documented eagerness of AI models to please – and the inevitable hallucinations that result from it – I finally wanted to test my prompt with a piece of text guaranteed to contain absolutely no usable information; Lewis Carroll’s Jabberwocky ensures just that:

# text to be analyzed
text_input <- "'Twas brillig, and the slithy toves
               Did gyre and gimble in the wabe;
               All mimsy were the borogoves,
               And the mome raths outgrabe.
               
               'Beware the Jabberwock, my son!
               The jaws that bite, the claws that catch!
               Beware the Jubjub bird, and shun
               The frumious Bandersnatch!'
               
               He took his vorpal sword in hand:
               Long time the manxome foe he sought—
               So rested he by the Tumtum tree,
               And stood awhile in thought.
               
               And as in uffish thought he stood,
               The Jabberwock, with eyes of flame,
               Came whiffling through the tulgey wood,
               And burbled as it came!
               
               One, two! One, two! And through and through
               The vorpal blade went snicker-snack!
               He left it dead, and with its head
               He went galumphing back.
               
               'And hast thou slain the Jabberwock?
               Come to my arms, my beamish boy!
               O frabjous day! Callooh! Callay!'
               He chortled in his joy.
               
               'Twas brillig, and the slithy toves
               Did gyre and gimble in the wabe;
               All mimsy were the borogoves,
               And the mome raths outgrabe."
               
# the same Gemini call, with the same prompt header "experienced geographer"
location <- gemini_structured(prompt = paste(prompt_header, text_input),
                              model = "2.5-flash-lite", 
                              schema = schema)

# initial overview of the result as JSON object
prettify(location)
## [
##     {
##         "name": "The Wabe",
##         "location": "POINT(0 0)",
##         "confidence": 10
##     }
## ]
## 
# interpret the result as sf object
location %>% 
   jsonlite::fromJSON() %>% 
   sf::st_as_sf(wkt = "location", crs = 4326) %>% 
   leaflet() %>% 
   addTiles() %>% 
   addCircleMarkers(label = ~ paste(name, "- confidence", confidence, "of 100"),
                    color = "red",
                    stroke = NA,
                    fillOpacity = 1)

The model, again as expected, responded by hallucinating a place called “The Wabe” and placing it on Null Island. But at least it had the good manners to acknowledge the poor quality of its output by assigning it a rather low confidence value. In a real-world scenario such low-confidence locations would likely be filtered out.
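Such a filter is a one-liner in this setup; a minimal sketch, using an arbitrary cut-off of 50 (the threshold is my illustration, not a recommendation):

```r
library(dplyr)
library(jsonlite)

# parse the structured output and keep only the locations
# the model is reasonably sure about
locations_clean <- location %>% 
   fromJSON() %>% 
   filter(confidence >= 50)
```

With the confidence field baked into the schema, the quality gate becomes an ordinary data frame operation.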


I believe I have shown the feasibility of summarizing and geocoding a piece of text by leveraging the Gemini API.

The concept seems to work, with better than expected accuracy. Considering the (possibly excessive, but who am I to judge?) resources recently invested in the AI toolchain, it is not surprising that the rough edges have been sorted out; the process of calling the Gemini API from the comfort of my R session is very smooth.

And while the Gemini API is a paid service, the costs involved are very reasonable – especially considering the effort that manually processing such a volume of short texts would involve.