ETL Tools
 
Section 7.0: ETL Tools

RAMADDA provides an interactive facility to transform CSV, JSON, XML or HTML files and create structured point data files and databases.

To start, upload the CSV, JSON, XML, or HTML file into RAMADDA. Then go to Main Menu->View->Convert Data to open the conversion interface shown below. Here is the convert page for the example below.

Image 1: CSV Conversion Interface
A pipeline of commands is defined for manipulating the rows and columns of the source file. These commands can be on the same line or on multiple lines. If they are on multiple lines, the intermediate processing results of each line of commands are available. Prefix a line with "#" to comment it out.
  • The button allows you to add a new command. Once added you can edit the command in place or right click on the command to bring up the command editor dialog (see below).
  • The button allows you to insert a reference to another file in RAMADDA for those commands (e.g., join) that require other files.
  • The button allows you to specify settings.
  • The button displays help.
  • The Header, Table, Records, etc., buttons run the commands and produce different output. The Save checkbox saves the command text when you run the commands so you can return to it later. The Do Commands checkbox applies the commands when you press one of the run buttons.
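As a rough illustration of the line-based command text described above, the following Python sketch splits command text into lines, drops lines commented out with "#", and tokenizes what remains. It is not RAMADDA's implementation: the function name parse_command_text and the shell-style quoting rules are assumptions, and -hypothetical_command and -another_hypothetical are made-up command names.

```python
import shlex


def parse_command_text(text):
    """Return one token list per non-empty, non-comment line (hypothetical helper)."""
    pipeline = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines commented out with "#".
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: quoted arguments follow shell-like quoting rules.
        pipeline.append(shlex.split(stripped))
    return pipeline


example = '''-html "skip 1"
# -hypothetical_command 0,1
-another_hypothetical "0,1,2"'''

steps = parse_command_text(example)
for number, tokens in enumerate(steps, start=1):
    print(number, tokens)
```

Each entry in steps corresponds to one line of commands, which mirrors the idea that intermediate results are available per line.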
Many of the commands below take a columns specification. This takes the form of:
"colA,colB,colC-colD,colE,..."
e.g.:
"0,1,2,7-10,12"

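To make the columns specification concrete, here is a minimal Python sketch that expands a numeric specification such as "0,1,2,7-10,12" into a list of column indices. It assumes that a range like 7-10 includes both end points and that all entries are integers; the function name expand_columns is hypothetical and this is not RAMADDA's own parser.

```python
def expand_columns(spec):
    """Expand a spec like "0,1,2,7-10,12" into a list of integer indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Assumption: ranges are inclusive at both ends.
            start, end = part.split("-", 1)
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


print(expand_columns("0,1,2,7-10,12"))
# prints [0, 1, 2, 7, 8, 9, 10, 12]
```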
The conversion service supports extracting data from an HTML table in an HTML page. Save the HTML page and upload it to RAMADDA as a CSV File entry type. Then go to the conversion page. The first command to enter is the -html command:
-html "name value arguments"
The name/value arguments include:
-html "skip <how many tables in the HTML file to skip>"
-html "removePattern <regexp pattern to remove>  removePattern2 <another pattern to remove>"
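The effect of the skip argument can be pictured with a small Python sketch that collects every table in a page and returns the one that follows the skipped tables. This only illustrates the idea and is not how RAMADDA parses HTML; the class TableCollector and the function table_after_skip are hypothetical names, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every table in a page (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_after_skip(html_text, skip=0):
    """Return the rows of the first table after skipping `skip` tables."""
    collector = TableCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tables[skip]


page = """
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><th>site</th><th>value</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
"""

rows = table_after_skip(page, skip=1)
print(rows)
```

With skip=1 the layout table at the top of the page is ignored and the rows of the second table are returned.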
Because escape characters pass through many levels of parsing, text to remove that contains the special regular expression characters [, ], (, ) and . should be written with the special tags _leftbracket_, _rightbracket_, _leftparen_, _rightparen_ and _dot_, e.g.:
-html "removePattern _leftbracket_.*?_rightbracket_ "
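The tag substitution can be sketched in Python as follows. The sketch assumes that each tag is replaced by the escaped form of its character before the pattern is applied as a regular expression; the mapping TAG_TO_REGEX and the function expand_tags are hypothetical, only the bracket and dot tags are shown (the parenthesis tags would be handled the same way), and the sample text is made up.

```python
import re

# Assumption: each tag stands for the escaped regular expression character.
TAG_TO_REGEX = {
    "_leftbracket_": r"\[",
    "_rightbracket_": r"\]",
    "_dot_": r"\.",
}


def expand_tags(pattern):
    """Replace the special tags in a pattern with escaped regex characters."""
    for tag, escaped in TAG_TO_REGEX.items():
        pattern = pattern.replace(tag, escaped)
    return pattern


regex = expand_tags("_leftbracket_.*?_rightbracket_")
print(regex)

# Strip bracketed footnote markers from a made-up cell value.
cleaned = re.sub(regex, "", "Population[1] 1,000[note 2]")
print(cleaned)
```

Here the expanded pattern removes each bracketed span, such as a footnote marker, without the brackets being interpreted as a character class.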