Until now, MEKA had no proper support for textual data: it was not possible to import text files the way Weka does with its TextDirectoryLoader. Prompted by a recent post on the MEKA mailing list, I spent my lunch break today putting together similar functionality within MEKA. The result is the meka.core.converters.MultiLabelTextDirectoryLoader.
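Example: a directory layout like this, with one sub-directory per class and, below each, one sub-directory per label value holding the text files: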
    /text-dataset
        /class1
            /0
                3.txt
                5.txt
            /1
                1.txt
                2.txt
                4.txt
        /class2
            /0
                1.txt
                4.txt
            /1
                2.txt
                3.txt
                5.txt
Will generate something like this:
    @relation 'example: -C 2'

    @attribute @@class-class1@@ {0,1}
    @attribute @@class-class2@@ {0,1}
    @attribute file-ID string
    @attribute text string

    @data
    1,0,1.txt,'file 1\n'
    1,1,2.txt,'file 2\n'
    0,1,3.txt,'file 3\n'
    1,0,4.txt,'file 4\n'
    0,1,5.txt,'file 5\n'
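To make the mapping from directories to label columns concrete, here is a minimal stand-alone Java sketch that only derives the label values and file IDs from such a layout. It is not the loader's actual code and does not use MEKA or Weka at all: the class name LabelDirectorySketch and its methods are made up for illustration, and writing `?` for a file that is missing under some class is my assumption, not documented behaviour of the loader.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of MEKA.
public class LabelDirectorySketch {

    /** Sorted names of the entries directly below dir: directories if wantDirs, else files. */
    static List<String> childNames(Path dir, boolean wantDirs) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isDirectory(p) == wantDirs) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    /** One row per file: the label value for each class (in sorted class order), then the file name. */
    static List<String> labelRows(Path root) throws IOException {
        List<String> classes = childNames(root, true);
        // file name -> label value per class; sorted by file name
        Map<String, String[]> labels = new TreeMap<>();
        for (int i = 0; i < classes.size(); i++) {
            Path classDir = root.resolve(classes.get(i));
            for (String value : childNames(classDir, true)) {
                for (String file : childNames(classDir.resolve(value), false)) {
                    String[] row = labels.computeIfAbsent(file, f -> {
                        String[] r = new String[classes.size()];
                        Arrays.fill(r, "?"); // assumption: unknown label shown as '?'
                        return r;
                    });
                    row[i] = value;
                }
            }
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String[]> e : labels.entrySet()) {
            rows.add(String.join(",", e.getValue()) + "," + e.getKey());
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LabelDirectorySketch /path/to/text-dataset
        for (String row : labelRows(Paths.get(args[0]))) {
            System.out.println(row);
        }
    }
}
```

Run against the example layout above, this yields the first three columns of each data row (the two label values and the file ID); the text column would simply be the content of the corresponding file.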
To make this user-friendly, I also added an Import text… menu item to the Explorer’s File menu.