weka-virtualenv now with scripting support

Over the last few days, I’ve implemented a basic scripting framework in my weka-virtualenv tool. All previous command-line commands are available, plus special scripting commands. Scripting is done either through the script command or through a special tab in the user interface (choose Script from the main menu).

It has basic variable support, offering string and string array variables. You can simply set variables via set name=value and remove them with unset name.
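
As a rough mental model only (this is not how the tool is implemented), the `${name}` placeholders used in the example script further down can be thought of as plain text substitution from a table of variables. The Python sketch below assumes that behaviour. The `expand` helper, and the choice to leave unknown placeholders untouched, are hypothetical and only there for illustration.

```python
import re

# Hypothetical stand-in for the script's variable table (filled by "set name=value").
variables = {"searchdir": "/home/fracpete/development/datasets/uci/nominal"}

def expand(text, variables):
    """Replace each ${name} with its value; unknown names are left as-is (an assumption)."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        text,
    )

print(expand("search dir: ${searchdir}", variables))

# Rough equivalent of "unset searchdir": the name is removed from the table.
del variables["searchdir"]
print(expand("search dir: ${searchdir}", variables))
```

How the real tool treats a placeholder whose variable has been unset is not covered here; check the script documentation for that.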

You can perform simple looping with for loops and iteration using the foreach command, which iterates over a string array variable. Each iteration executes a nested block; blocks are defined by indentation, similar to Python, using either blanks or tabs.

Some scripting commands:

  • simple piping of output (stdout or stderr) through filters (grep, replace, tee)
  • read_lines for reading a text file into a string array variable
  • list_files and list_dirs for searching for files and directories
  • del_file and del_dir for deleting files and directories
  • calc for evaluating mathematical expressions

Here is an example that iterates through all ARFF files in a directory and cross-validates J48 on them, storing the filtered output (time taken, accuracy and RMSE) in a text file:

# setup
set searchdir=/home/fracpete/development/datasets/uci/nominal
set outfile=/home/fracpete/j48.txt
set env=weka392

# search for arff files
echo --message "\nsearch dir: ${searchdir}"
list_files --dir ${searchdir} --regexp ".*\.arff" --dest files

# evaluate J48 on each of the arff files and append the filtered output to the text file
foreach --iterate files --dest file
  echo --message "\n--> ${file}" | tee --stdout --append --output ${outfile}
  run ${env} --class weka.classifiers.trees.J48 -t ${file} \
     | grep --stdout --regexp ".*(Time taken|Correctly Classified|Root mean squared).*" \
     | tee --stdout --append --output ${outfile}
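
To see what the grep stage of this pipeline keeps, the same regular expression can be tried outside the tool. The Python sketch below applies it to a few lines shaped like Weka's evaluation output. The sample lines and their numbers are made up for illustration, and `keep_matching` is a hypothetical helper, not part of weka-virtualenv.

```python
import re

# Same pattern as in the script's grep stage.
PATTERN = re.compile(r".*(Time taken|Correctly Classified|Root mean squared).*")

def keep_matching(lines):
    """Return only the lines that match the pattern, in their original order."""
    return [line for line in lines if PATTERN.match(line)]

# Made-up lines in the style of Weka's output (values are illustrative only).
sample = [
    "Time taken to build model: 0.01 seconds",
    "Correctly Classified Instances         143               95.3333 %",
    "Incorrectly Classified Instances         7                4.6667 %",
    "Root mean squared error                  0.1586",
    "Total Number of Instances              150",
]

for line in keep_matching(sample):
    print(line)
```

In this Python sketch matching is case-sensitive, so the "Incorrectly Classified" line is dropped while the "Correctly Classified" line is kept.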

You can find documentation on the available script commands on GitHub:

Scripting is available starting with release 0.0.16.