Downloading files from AEMET FTP server with Java and Apache Commons Net

Some time ago AEMET, the Spanish government's meteorological agency, released much of its data publicly: weather radar, weather stations, lightning, … The way they do it is simple and effective: they publish the contents on an FTP server and update them at regular intervals. Talking with a colleague, we decided to create (in our very scarce free time) a simple web page that lets people see the current values of the weather stations: a map of Spain with some dots representing weather stations and a graph showing data from the selected station.

The data

The kind of data we are looking for is stored in the ‘/datos_observacion/observaciones_diezminutales‘ folder of the FTP server. Within it we can find two types of folders, ending with ‘diezminutales‘ or ‘estaciones‘. We will work with the first one. For every day we will find a ‘YYYYMMDD_diezminutales‘ folder which contains the data for the day specified by YYYYMMDD. Inside it, new files named ‘YYYYMMDDHHmm_datos.csv.gz‘ are created, each containing the data of all the weather stations for that 10-minute interval. We also need to take one more thing into account: a file is not immutable. You can read a file with some data (a given file size) and one hour later it can contain more data (a bigger file size), because data belonging to that interval has been recovered in the meantime.
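As a small illustration of the naming scheme just described, the following sketch builds the folder and file names from a date. The class and method names (‘AemetNames‘, ‘dayFolder‘, ‘dataFile‘) are hypothetical and not part of the original program, and it uses the ‘java.time‘ API, which may differ from what the original code relied on.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Builds the folder and file names described above (hypothetical helper). */
class AemetNames {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter SLOT = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Name of the daily folder, following the YYYYMMDD_diezminutales pattern. */
    static String dayFolder(LocalDate day) {
        return DAY.format(day) + "_diezminutales";
    }

    /** Name of a ten-minute file; the timestamp is assumed to be already in UTC. */
    static String dataFile(LocalDateTime utc) {
        return SLOT.format(utc) + "_datos.csv.gz";
    }

    public static void main(String[] args) {
        System.out.println(dayFolder(LocalDate.of(2012, 9, 2)));
        System.out.println(dataFile(LocalDateTime.of(2012, 9, 2, 10, 30)));
    }
}
```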

The design

The idea is to create a simple program that, given a date (YYYYMMDD), synchronizes the corresponding remote ‘YYYYMMDD_diezminutales‘ folder with a local one. This way we can maintain a local, up-to-date copy of the files. Once the files are stored locally, a different process (that means a different post :)) is responsible for handling the data so that it can be used by the web application. In summary, the steps are:

  1. Given a YYYYMMDD, check if the remote ‘YYYYMMDD_diezminutales‘ folder exists. If so, synchronize it locally:
    1. Create a local ‘YYYYMMDD_diezminutales‘ folder if it does not exist.
    2. For every file in the remote folder:
      1. Download it if it doesn’t exist locally or its size has changed.
      2. Otherwise, ignore it.
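
The per-file decision in the steps above can be sketched without any FTP code at all. This is only an illustration: ‘SyncPlan‘ and ‘filesToDownload‘ are hypothetical names, the remote listing is represented as a plain map from file name to size, and the sketch assumes the local copy is stored in the same (compressed) form as the remote file so that the two sizes are comparable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Decides which remote files need to be fetched (hypothetical helper). */
class SyncPlan {
    /**
     * @param remoteSizes remote file name -> size in bytes, as reported by the server
     * @param localDir    local folder mirroring the remote one
     * @return names of the files that are missing locally or whose size differs
     */
    static List<String> filesToDownload(Map<String, Long> remoteSizes, Path localDir)
            throws IOException {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Long> remote : remoteSizes.entrySet()) {
            Path local = localDir.resolve(remote.getKey());
            boolean missing = !Files.exists(local);
            // A different size means the server has added data to that interval.
            if (missing || Files.size(local) != remote.getValue()) {
                pending.add(remote.getKey());
            }
        }
        return pending;
    }
}
```

With Commons Net, the map could be filled from the ‘getName()‘ and ‘getSize()‘ values of the ‘FTPFile‘ entries returned by ‘FTPClient.listFiles()‘.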

The implementation

Because I’m more experienced with Java, I have used it to code a little program that does the synchronization described above. To make the FTP requests I have used the Apache Commons Net project. From its web page we can read:

Apache Commons Net implements the client side of many basic Internet protocols.

The Main class

This is the starting point. The user must execute the program specifying the day to check as YYYY-MM-DD, something like:

Let’s talk a bit about the main program. The first thing to do is to ensure the user has specified a valid date:
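
A minimal sketch of such a check, assuming the argument is given as YYYY-MM-DD. The ‘DateArgument‘ class is hypothetical and uses ‘java.time‘, so it is not necessarily how the original program validates its input.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Validates the command-line day argument (hypothetical helper). */
class DateArgument {
    /** Returns the parsed day, or empty if the text is not a real YYYY-MM-DD date. */
    static Optional<LocalDate> parse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        try {
            // LocalDate.parse expects ISO-8601 and is strict: 2012-02-30 is rejected.
            return Optional.of(LocalDate.parse(text));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1 || !parse(args[0]).isPresent()) {
            System.err.println("Usage: <program> YYYY-MM-DD");
            System.exit(1);
        }
        System.out.println("Synchronizing day " + parse(args[0]).get());
    }
}
```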

Second, we create the folder structure which will hold the local copy of the files:
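
One possible way to do this with the standard library is shown below. ‘LocalFolders‘ and ‘ensureDayFolder‘ are hypothetical names used only for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Creates the local mirror folders (hypothetical helper). */
class LocalFolders {
    /** Returns the local folder for the given day, creating it if needed. */
    static Path ensureDayFolder(Path baseDir, String dayFolderName) throws IOException {
        // createDirectories also creates missing parents and does nothing
        // if the folder already exists, so it is safe to call on every run.
        return Files.createDirectories(baseDir.resolve(dayFolderName));
    }
}
```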

Finally, we get the files using the ‘AdquisicioObservaciones‘ class:

Putting it all together:

The Utils class

This is a helper class with some methods used throughout the code. Mainly it helps to create and set dates in UTC (because this is the format used by AEMET) and also contains a method that uncompresses GZIP files:
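
As an illustration of the GZIP part, here is a sketch based only on ‘java.util.zip‘. The ‘GzipFiles.uncompress‘ name is hypothetical; the original helper is ‘Utils.uncompressGzFile()‘ and its exact signature may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

/** GZIP helper (hypothetical stand-in for the Utils method). */
class GzipFiles {
    /** Expands gzFile into target, replacing target if it already exists. */
    static void uncompress(Path gzFile, Path target) throws IOException {
        try (InputStream raw = Files.newInputStream(gzFile);
             InputStream in = new GZIPInputStream(raw)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```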

The AdquisicioObservaciones class

In the class constructor we must specify the local folder where the files must be synchronized. Later, we can execute the ‘get‘ method as many times as we want and for any day we desire. The ‘get‘ method, as we mentioned, needs a date to work:

Mainly, it stores the date in a class attribute, prints some log information and delegates the hard work to the ‘handleFiles‘ method. The first thing this method does is to create an FTP connection and change the remote working directory to ‘datos_observacion/observaciones_diezminutales‘ and, within it, to the folder of the desired day:

Note we set the transfer mode to binary with:

which is important because the files are GZIP compressed. Then, if the local folder for the specified day doesn’t exist, it creates one:

Finally, here comes the big part. The code gets the list of remote files and checks which ones don’t exist locally or have changed in size:

The next important method is ‘downloadFile‘, which requires two parameters: the remote FTP file and the name of the local file where we want to store the data. The remote file is retrieved using the FTPClient’s ‘retrieveFile‘ method and uncompressed with the helper method Utils.uncompressGzFile():
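
The download step can be sketched as follows. To keep the example runnable without a server, it does not call Commons Net directly: ‘Retriever‘ is a hypothetical interface with the same shape as ‘FTPClient.retrieveFile(String, OutputStream)‘, so a connected client could be passed as ‘ftp::retrieveFile‘. Writing to a temporary ‘.part‘ file first is a design choice of this sketch, not something taken from the original code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Download helper (hypothetical sketch, not the original downloadFile). */
class Downloader {
    /** Same shape as Commons Net's FTPClient.retrieveFile(String, OutputStream). */
    @FunctionalInterface
    interface Retriever {
        boolean retrieve(String remoteName, OutputStream out) throws IOException;
    }

    /** Downloads remoteName into localFile; returns false if the transfer failed. */
    static boolean downloadFile(Retriever source, String remoteName, Path localFile)
            throws IOException {
        // Write to a sibling temporary file so a failed transfer never
        // leaves a truncated localFile behind.
        Path partial = localFile.resolveSibling(localFile.getFileName() + ".part");
        boolean ok;
        try (OutputStream out = Files.newOutputStream(partial)) {
            ok = source.retrieve(remoteName, out);
        } catch (IOException e) {
            Files.deleteIfExists(partial);
            throw e;
        }
        if (ok) {
            Files.move(partial, localFile, StandardCopyOption.REPLACE_EXISTING);
        } else {
            Files.deleteIfExists(partial);
        }
        return ok;
    }
}
```

Because a size change on the server triggers a new download, replacing the existing local file on success keeps the local copy in step with the remote one.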

Conclusions

The program is far from being a serious tool for maintaining a database with all the weather stations data, but it can show you how to download files from an FTP server using Java.

More work must be done to also get the metadata of the weather stations (name, location, etc.) and merge it with the observations. I hope that will be shown in a future post.

Finally, just to say that the code is full of log messages, which can make the code cumbersome but give a lot of information about the synchronization process. A typical output is something like:

Download

  • Source code: here. It is bundled as a NetBeans project but you can open it with any text editor.
  • Binaries: here. Remember that to execute the code you must set the day to check, like:

    If you execute the program again for the same day, only the new files or those whose size has changed will be processed again.


4 Responses

  1. Rafa September 2, 2012 / 02:32

    Wow! About a year ago I worked hard on a PHP script that did the same thing.
    Darn, if only I had read this earlier!

    • asantiago September 2, 2012 / 11:08

      Don’t worry man. Remember how good you felt when you finished your script :P
      I was also working on DatosHoy (http://acuriousanimal.com/datosHoy) but for the moment I have not released the code (it is PHP). Is anybody interested in it?

      • Rafa September 7, 2012 / 09:41

        OH! very nice. I’m going to follow the project
