ETL Tools
 
Section 7.1: ETL Tools

-columns  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Only include the given columns

-notcolumns  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Don't include given columns

-delete  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Remove the columns

-cut  <rows> One or more rows. -1 to the end. e.g., 0-3,5,10,-1 

-include  <rows> one or more rows, -1 to the end 
Only include specified rows

-skip  <rows> How many rows to skip 
Skip number of rows

-copy  <column> 
	 <name> 

-insert  <column> Column to insert after 
	 <values> Single value or comma separated for multiple rows 
Insert new column values

-concat  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <delimiter> 
Create a new column from the given columns

-split  <column> 
	 <delimiter> What to split on 
	 <names> Comma separated new column names 
Split the column

-splat  <key col> 
	 <column> 
	 <delimiter> 
	 <name> new column name 
Create a new column from the values in the given column

-shift  <rows> Rows to apply to 
	 <column> Column to start at 
	 <count> 
Shift columns over by count for given rows

-addcell  <row> 
	 <column> 
	 <value> 
Add a new cell at row/column

-deletecell  <row> 
	 <column> 
Delete cell at row/column

-mergerows  <rows> 2 or more rows 
	 <delimiter> 
	 <close> 

-rowop  <keys> Key columns 
	 <values> Value columns 
	 <operator> Operator 
Apply an operator to columns and merge rows

-rotate 
Rotate the data

-flip 
Reverse the order of the rows except the header

-unfurl  <column> column to get new column header# 
	 <value columns> Columns to get values from 
	 <unique column> The unique value, e.g. date 
	 <other columns> Other columns to include 
Make columns from data values

-furl  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <header label> 
	 <value label> 
Use values in header to make new row

-explode  <column> 
Make separate files based on value of column

-join  <key columns> 
	 <value_columns> value columns 
	 <file> File to join with 
	 <source_columns> source key columns 
Join the 2 files together

7.1.1 Filter
-start  <start pattern> 
Start at pattern in source file

-stop  <stop pattern> 
End at pattern in source file

-rawlines  <lines> How many lines to pass through unprocesed 

-min  <min # columns> 
Only pass thorough lines that have at least this number of columns

-max  <max # columns> 
Only pass through lines that have no more than this number of columns

-pattern  <column> 
	 <pattern> 
Pass through rows that match the pattern

-notpattern  <column> 
	 <pattern> 
Pass through rows that don't match the pattern

-unique  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Pass through unique values

-dups  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Pass through duplicate values

-maxvalue  <key column> 
	 <value column> 

-eq  <column> 
	 <value> 
Extract rows that pass the expression

-ne  <column> 
	 <value> 
Extract rows that pass the expression

-gt  <column> 
	 <value> 
Extract rows that pass the expression

-ge  <column> 
	 <value> 
Extract rows that pass the expression

-lt  <column> 
	 <value> 
Extract rows that pass the expression

-le  <column> 
	 <value> 
Extract rows that pass the expression

-groupfilter  <column> 
	 <value_column> Value column 
	 <operator> 
	 <value> 
One row in each group has to match

-before  <column> 
	 <format> 
	 <date> 
	 <format2> 

-after  <column> 
	 <format> 
	 <date> 
	 <format2> 

-latest  <columns> Key columns 
	 <column> Date column 
	 <format> 

-countvalue  <column> 
	 <count> 
No more than count unique values

-decimate  <rows> # of start rows to include 
	 <skip> skip factor 
only include every <skip factor> row

-skipline  <pattern> 
Skip any line that matches the pattern

-change  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <pattern> 
	 <substitution string> use $1, $2, etc for pattern (...) matches 
Change columns

-changerow  <rows> Row indices, one per line. Can include ranges, e.g. 0-5 
	 <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <pattern> 
	 <substitution string> 
Change the values in the row/cols

-set  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <rows> Row indices, one per line. Can include ranges, e.g. 0-5 
	 <value> 
Write the value into the cells

-macro  <pattern> 
	 <template> 
	 <column label> 
Look for the pattern in the header and apply the template to make a new column, template: '{1} {2} ...', use 'none' for column name for no header

-setcol  <column> match col # 
	 <pattern> 
	 <write column> 
	 <value> 
Write the value into the write col for rows that match the pattern

-priorprefix  <column> 
	 <pattern> 
	 <delimiter> 
Append prefix from the previous element to rows that match pattern

-case  <type> 
	 <column> 
Change case of column

-width  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <size> 
Limit the string size of the columns

-prepend  <text> 
Add the text to the beginning of the file. use _nl_ to insert newlines

-pad  <count> 
	 <pad string> 
Add or remove columns to achieve the count

-prefix  <column> 
	 <prefix> 
Add prefix to column

-suffix  <column> 
	 <suffix> 
Add suffix to column

-js  <javascript> 
Define Javascript e.g., functions to use later in the -func call

-func  <names> New column names 
	 <javascript> javascript expression 
Apply the javascript function. Use _colname or _col#

-endswith  <column> 
	 <string> 
Ensure that each column ends with the string

-trim  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Trim the string values

-truncate  <column> 
	 <max length> 
	 <suffix> 

-extract  <column> 
	 <pattern> 
	 <replace with> use 'none' for no replacement 
	 <new column name> 
Extract text from column and make a new column

-map  <column> 
	 <new columns name> 
	 <value newvalue ...> 
Change values in column to new values

-combine  <column> 
	 <delimiter> 
	 <new column name> 
Combine columns with the delimiter. deleting columns

-combineinplace  <column> 
	 <delimiter> 
	 <new column name> 
Combine columns with the delimiter

-format  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <format> Decimal format  e.g. '##0.00' 

-denormalize  <file> From csv file 
	 <from id idx> 
	 <from value idx> 
	 <to idx> 
	 <new col name> 
	 <mode replace add> 
Read the id,value from file and substitute the value in the dest file col idx

-break  <label1> 
	 <label2> 
	 <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Break apart column values and make new rows

-md  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <type> 
Make a message digest of the column values

-uuid 
Add a UUID field

-number 
Add 1,2,3... as column

-letter 
Add 'A','B', ... as column

7.1.4 Lookup
-wikidesc  <column> 
	 <suffix> 
Add a description from wikipedia

-image  <column> 
	 <suffix> 
Search for an image

-imagefill  <querycolumn> 
	 <suffix> 
	 <imagecolumn> 
Search for an image with the query column text if the given image column is blank. Add the given suffix to the search.

-gender  <column> 
Figure out the gender of the name in the column

7.1.5 Dates
-convertdate  <column> 
	 <sourceformat> Source format 
	 <destformat> Target format 

-extractdate  <date column> 
	 <format> Date format 
	 <timezone> 
	 <what> What to extract 

-formatdate  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <intial date format> 
	 <target date format> 

7.1.6 Numeric
-scale  <column> 
	 <delta1> 
	 <scale> 
	 <delta2> 
Set value={value+delta1}*scale+delta2

-generate  <label> 
	 <start> 
	 <step> 
Add row values

-decimals  <column> 
	 <how many decimals to round to> 

-delta  <key columns> 
	 <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Add column that is the delta from the previous step

-operator  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <new col name> 
	 <operator +,-,*,/> 
Apply the operator to the given columns and create new one

-round  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
round the values

-sum  <key columns> 
	 <value columns> 
	 <carry over columns> 
Sum values keying on name column value. If no value columns specified then do a count

-percent  <columns to add> 

-increase  <column> 
	 <how far back> 
Calculate percent increase

-average  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <period> 
	 <label> 
Calculate a moving average

7.1.7 Geocode
-geocode  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <prefix> e.g., state: or county: 
	 <suffix> 

-geocodeaddressdb  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <prefix> 
	 <suffix> 

-geocodejoin  <column> 
	 <csv file> File to get lat/lon from 
	 <name idx> 
	 <lat idx> 
	 <lon idx> 
Geocode with file

-statename  <column> 
Add state name from state ID

-mercator  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Convert x/y to lon/lat

-region  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
Add the state's region

-population  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <prefix> e.g., state: or county: 
	 <suffix> 
Add in population from address

-sort  <column> Column to sort on 

-descsort  <column> Column to descending sort on 

-count 
Show count

-maxrows  <Max rows to print> 

-changeline  <from> 
	 <to> 
Change the line

-changeraw  <from> 
	 <to> 
Change input text

-crop  <columns> Column indices, one per line. Can include ranges, e.g. 0-5 
	 <pattern1,pattern2> 
Crop last part of string after any of the patterns

-strict 
Be strict on columns. any rows that are not the size of the other rows are dropped

-flag 
Be strict on columns. any rows that are not the size of the other rows are shown

-verify  <# columns> 
Throw error if a row has a different number of columns

-prop  <property> 
	 <value> start, end, etc 
Set a property

-comment  <string> 

-verify 
Verify that all of the rows have the same # of columns

7.1.9 Input
-delimiter  <delimiter> Use 'space' for space, 'tab' for tab 
Specify a delimiter

-tab 
Use tabs

-widths  <widths> w1,w2,...,wN 
Columns are fixed widths

-header  <header> Column names 
Raw header

-html  <skip> Number of tables to skip 
	 <pattern> Pattern to skip to 
	 <properties> Other attributes -    removeEntity false removePattern pattern 
Parse the table in the input html file

-htmlpattern  <columns> Column names 
	 <startPattern> 
	 <endPattern> 
	 <pattern> Row pattern. Use (...) to match columns 
Parse the input html file

-json  <arrayPath> Path to the array e.g., obj1.arr[2].obj2 
	 <objectPaths> One or more paths to the objects e.g. geometry,features 
Parse the input as json

-xml  <path> Path to the elements 
Parse the input as xml

-text  <comma separated header> 
	 <chunk pattern> 
	 <token pattern> 
Extract rows from the text

-text2  <comma separated header> 
	 <chunk pattern> 
	 <token pattern> 
Extract rows from the text

-text3  <comma separated header> 
	 <token pattern> 
Extract rows from the text

-tokenize  <header> header1,header2... 
	 <pattern> 
Tokenize the input from the pattern

-sql  <db> The database id (defined in the environment) 
	 <table> The table to select from 
	 <properties> 'columns' c1,c2,...  'where' c1,<|>|<>|like|notlike;... 
Connect to the given database

-prune  <bytes> Number of leading bytes to remove 
Prune out the first N bytes

7.1.10 Output
-print 
Output the rows

-template  <prefix> 
	 <template> Use ${0},${1}, etc for values 
	 <delimiter> Output between rows 
	 <suffix> 
Apply the template to make the output

-raw 
Print the file raw

-record 
Print records

-printheader 
Print the first line

-pointheader 
Generate the RAMADDA point properties

-addheader  <properties> name1 value1 ... nameN valueN 
Add the RAMADDA point properties

-deheader 
Strip off the point header

-db  <props> Name value pairs: 		table.id  table.name  table.cansearch  table.canlist  table.icon  		.id  .label  		.type  		.format  		.canlist  .cansearch  		install  		nukedb Generate the RAMADDA db xml from the header

-toxml  <tag> 
Generate XML

-run  <Name of process directory> 

-cat  <*.csv> 
One or more csv files

-script 
Generate the script to call

-args 
Generate the CSV file commands

-args2 
Print out the args