It’s been a while since the last release, and this one brings quite a number of bug fixes and additions (e.g., database access and text mining), so it is well worth upgrading. A major addition is the workflow component, which encapsulates much of the functionality that the python-weka-wrapper library offers in a convenient way. Check out the examples in the src/wekaexamples/flow sub-directory of the examples repository. The workflow was inspired by the one in the ADAMS framework, though it is much simpler. It is a pure-Python implementation and has nothing to do with Weka’s KnowledgeFlow.
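To give a rough feel for the actor-based idea behind such a workflow (a source emits items, transformers modify them, a sink consumes them), here is a minimal pure-Python sketch. It is an illustration under that assumption only: the classes `ListSource`, `Scale`, `Collect` and `Flow` are hypothetical and are not the API of the workflow component, so refer to the examples repository for the real usage.

```python
# Toy linear workflow. All class names are hypothetical and are NOT
# the python-weka-wrapper workflow API.


class ListSource:
    """Source actor: emits each element of a fixed list."""

    def __init__(self, items):
        self.items = list(items)

    def generate(self):
        yield from self.items


class Scale:
    """Transformer actor: multiplies each numeric item by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def process(self, item):
        return [item * self.factor]


class Collect:
    """Sink actor: stores everything it receives and emits nothing."""

    def __init__(self):
        self.received = []

    def process(self, item):
        self.received.append(item)
        return []


class Flow:
    """Pushes each item from the source through the actors in order."""

    def __init__(self, source, actors):
        self.source = source
        self.actors = actors

    def execute(self):
        for item in self.source.generate():
            pending = [item]
            for actor in self.actors:
                # An actor may emit several outputs (or none) per input.
                pending = [out for inp in pending for out in actor.process(inp)]


sink = Collect()
flow = Flow(ListSource([1, 2, 3]), [Scale(10), sink])
flow.execute()
print(sink.received)  # [10, 20, 30]
```

Passing items one at a time keeps the engine simple and lets a sink react to each result as soon as it is produced.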
Changes:
- added method “ndarray_to_instances” to “weka.core.converters” module for converting a 2-dimensional NumPy array into an “Instances” object
- added method “plot_learning_curve” to “weka.plot.classifiers” module for plotting learning curves of multiple classifiers for a specific metric
- added “plot_experiment” method to “weka.plot.experiments” module for plotting experiments
- “Instance.create_instance” method now takes a list of tuples (index, internal float value) when generating sparse instances
- added “weka.core.database” module for loading data from a database
- added “make_copy” class method to “Clusterer” class
- added “make_copy” class method to “Associator” class
- added “make_copy” class method to “Filter” class
- added “make_copy” class method to “DataGenerator” class
- most classes (such as Classifier and Filter) now have a default value for the classname parameter in their constructor
- added “TextDirectoryLoader” class to “weka.core.converters” module
- moved all methods from “weka.core.utils” to “weka.core.classes”
- fixed “Attribute.index_of” method for determining the index of a label
- fixed “Attribute.add_string_value” method (used incorrect JNI parameter)
- “create_instance” and “create_sparse_instance” methods of class “Instance” now ensure that list values are floats
- added “to_help” method to “OptionHandler” class which outputs a help string generated from the base class’s “globalInfo” and “listOptions” methods
- fixed “test_model” method of “Evaluation” class when supplying a “PredictionOutput” object (previously raised a “No dataset structure provided!” exception)
- added “batch_finished” method to “Filter” class for incremental filtering
- added “line_plot” method to “weka.plot.dataset” module for plotting a dataset using its internal format (one line per instance)
- added “is_serializable” property to “JavaObject” class
- added “has_class” convenience property to “Instance” class
- added “__repr__” method to “JavaObject” class (simply calls the “toString()” method)
- added “Stemmer” class in module “weka.core.stemmers”
- added “Stopwords” class in module “weka.core.stopwords”
- added “Tokenizer” class in module “weka.core.tokenizers”
- added “StringToWordVector” filter class in module “weka.filters”
- added simple workflow engine (see documentation on Flow)
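Conceptually, converting a 2-dimensional array into a dataset turns each row into one instance and each column into one numeric attribute. The pure-Python sketch below shows that mapping by rendering a nested list as ARFF text; `rows_to_arff` and the `att1`, `att2`, ... attribute names are hypothetical and this is not how “ndarray_to_instances” is implemented (a NumPy array could be passed via its `tolist()` method).

```python
def rows_to_arff(rows, relation="data"):
    """Render a 2-D list of numbers as ARFF text (hypothetical helper).

    Each row becomes one instance, each column one numeric attribute.
    """
    if not rows:
        raise ValueError("need at least one row")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    lines = [f"@relation {relation}", ""]
    # One numeric attribute declaration per column.
    lines += [f"@attribute att{i + 1} numeric" for i in range(width)]
    lines += ["", "@data"]
    # One comma-separated data line per row, values coerced to float.
    lines += [",".join(str(float(v)) for v in row) for row in rows]
    return "\n".join(lines)


print(rows_to_arff([[1, 2], [3.5, 4]]))
```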
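In a sparse instance only the non-zero attributes are listed, as (index, internal float value) pairs; for a nominal attribute the internal value is the index of the label. As a sketch of what such a list of tuples stands for, the hypothetical helper `sparse_to_dense` (not part of the library) expands the pairs into a full list of floats, with unlisted attributes defaulting to 0.0.

```python
def sparse_to_dense(pairs, num_attributes):
    """Expand (index, value) pairs into a dense list of floats.

    Hypothetical helper for illustration: unlisted attributes are 0.0.
    """
    dense = [0.0] * num_attributes
    for index, value in pairs:
        if not 0 <= index < num_attributes:
            raise IndexError(f"attribute index {index} out of range")
        # Coerce to float, mirroring how internal values are stored.
        dense[index] = float(value)
    return dense


print(sparse_to_dense([(0, 1), (3, 2.5)], 5))  # [1.0, 0.0, 0.0, 2.5, 0.0]
```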
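The new text-mining classes (tokenizer, stopwords, stemmer and the “StringToWordVector” filter) are typically combined into one pipeline: split text into tokens, drop stopwords, reduce the remaining words to stems and count them. The pure-Python sketch below illustrates that sequence only; the stopword list and the suffix-stripping `stem` function are toy assumptions and are not the algorithms Weka uses.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; Weka provides its own lists.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "are"}


def tokenize(text):
    """Split on non-alphanumeric characters and lowercase the tokens."""
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


def stem(token):
    """Toy suffix stripper, NOT one of Weka's stemmers."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def word_vector(text):
    """Tokenize, remove stopwords, stem, then count occurrences."""
    return Counter(stem(t) for t in tokenize(text) if t not in STOPWORDS)


print(word_vector("The miners are mining and the mines are open"))
```

Note how "mining" and "mines" collapse into the same stem, which is the point of stemming before building word counts.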