WEKA is a popular machine learning workbench developed at the University of Waikato. It is written in Java and released under GPL v3 (previous releases used GPL v2). The software allows the user to analyze data from various sources (files, databases) using classification, regression, clustering and attribute selection. Visualizations of, e.g., data and classification errors are available as well. Its plugin architecture makes it easy to add custom algorithms, visualizations, etc., in case the supplied ones are not sufficient.

Between 2005 and 2007, I was the maintainer of this software while working as a research programmer at the university. Since then I’ve been contributing bug fixes, enhancements and new packages, as I continue to use it in other, larger research projects.

Here are some of my code contributions:

  • dynamic class discovery, allowing users to add new algorithms simply by adding jars
  • viewer for datasets in tabular format (“ArffViewer”)
  • SQL worksheet interface for loading in data from JDBC databases (“SqlViewer”)
  • support for storing datasets in JSON, LibSVM and Matlab formats
  • Maven build system for publication on Maven Central
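To give an idea of what one of these storage formats looks like, here is a minimal sketch of the LibSVM text format, in which each row is written as a label followed by sparse `index:value` pairs with 1-based indices and zero values omitted. The class `LibSvmLine` and its methods are hypothetical helpers written for this illustration. They are not part of WEKA and do not show how WEKA's own saver is implemented.

```java
/** Hypothetical helper (not part of WEKA): formats one dense row as a LibSVM line. */
final class LibSvmLine {

    /**
     * Formats a label and dense feature values as "<label> <index>:<value> ...".
     * Indices are 1-based and zero-valued features are omitted (sparse format).
     */
    static String format(double label, double[] features) {
        StringBuilder sb = new StringBuilder();
        sb.append(num(label));
        for (int i = 0; i < features.length; i++) {
            if (features[i] != 0.0) {
                sb.append(' ').append(i + 1).append(':').append(num(features[i]));
            }
        }
        return sb.toString();
    }

    /** Writes whole numbers without a trailing ".0"; everything else via Double.toString. */
    private static String num(double v) {
        if (v == Math.rint(v) && Math.abs(v) < 1e15) {
            return Long.toString((long) v);
        }
        return Double.toString(v);
    }

    public static void main(String[] args) {
        // A row with class label 1 and three features, the second of which is zero.
        System.out.println(format(1, new double[] {0.5, 0.0, 3.0})); // prints "1 1:0.5 3:3"
    }
}
```

Because zeros are skipped, the format is compact for sparse data, but the number of attributes has to be recovered from the largest index seen when reading a file back in.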

And some contributed packages (available through the WEKA Package Manager):

  • WekaExcel – loading/saving of datasets in MS Excel (.xls, .xlsx) format in the WEKA Explorer
  • WekaODF – loading/saving of datasets in ODF format (OpenOffice/LibreOffice; .ods) in the WEKA Explorer
  • gridSearch – parameter optimization for classification/regression
  • partialLeastSquares – PLS filter and classifier
  • scriptingClassifiers – enables users to write classifiers in Jython or Groovy

WEKA comes with an exploratory interface, called the WEKA Explorer:

Displaying errors (actual vs predicted) of a numeric dataset:

For comparing datasets and/or algorithms, the WEKA Experimenter can be used to perform paired t-tests:
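For readers unfamiliar with the test, the textbook paired t statistic can be sketched in a few lines of plain Java. The class `PairedTTest` is a hypothetical name used only for this illustration. It computes the standard statistic t = mean(d) / (sd(d) / sqrt(n)) over the per-run differences d, with n - 1 degrees of freedom. It makes no claim to reproduce the Experimenter's own testers, which may differ, for example by applying a variance correction.

```java
/** Hypothetical illustration (not WEKA code): textbook paired t statistic. */
final class PairedTTest {

    /**
     * Returns t = mean(d) / (sd(d) / sqrt(n)) for the paired differences
     * d[i] = a[i] - b[i], using the sample standard deviation (divisor n - 1).
     * If all differences are identical, sd is 0 and the result is infinite or NaN.
     */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = a.length;

        // Mean of the paired differences.
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;

        // Sum of squared deviations of the differences from their mean.
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            ss += dev * dev;
        }
        double sd = Math.sqrt(ss / (n - 1));

        return mean / (sd / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Made-up scores of two algorithms on the same four runs.
        double[] a = {5, 7, 9, 11};
        double[] b = {4, 5, 8, 8};
        System.out.println(tStatistic(a, b));
    }
}
```

The resulting value is compared against a t distribution with n - 1 degrees of freedom to decide whether the difference between the two algorithms is significant at the chosen level.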

WEKA also comes with a workflow engine that lets the user combine preprocessing and the output of evaluations in a single flow: