MEKA didn’t have any proper support for dealing with textual data: until now, there was no way to import text files the way Weka does with its TextDirectoryLoader. Prompted by a recent post on the MEKA mailing list, I spent my lunch break today putting together similar functionality for MEKA. The result is meka.core.converters.MultiLabelTextDirectoryLoader.
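For example, the following directory structure, with one sub-directory per label, each containing a 0 directory (files without that label) and a 1 directory (files with that label):
/text-dataset
  /class1
    /0
      3.txt
      5.txt
    /1
      1.txt
      2.txt
      4.txt
  /class2
    /0
      1.txt
      4.txt
    /1
      2.txt
      3.txt
      5.txt
will generate something like this: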
@relation 'example: -C 2'
@attribute @@class-class1@@ {0,1}
@attribute @@class-class2@@ {0,1}
@attribute file-ID string
@attribute text string
@data
1,0,1.txt,'file 1\n'
1,1,2.txt,'file 2\n'
0,1,3.txt,'file 3\n'
1,0,4.txt,'file 4\n'
0,1,5.txt,'file 5\n'
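To make the mapping from directories to label values concrete, here is a minimal Java sketch of the idea. It is not the code of meka.core.converters.MultiLabelTextDirectoryLoader; the class and method names (TextDirToArffSketch, toArff) are hypothetical, and it assumes the root/<label>/<0|1>/<file> layout above, UTF-8 text files, and that a file name identifies the same document in every label directory. It relies on MEKA's convention that "-C n" in the relation name marks the first n attributes as class attributes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch, NOT MEKA's MultiLabelTextDirectoryLoader: turns a
 * root/<label>/<0|1>/<file> layout into MEKA-style ARFF text.
 */
final class TextDirToArffSketch {

    static String toArff(Path root, String relation) throws IOException {
        TreeSet<String> labels = new TreeSet<>();                  // label names, sorted
        Map<String, Map<String, String>> flags = new TreeMap<>();  // file name -> (label -> "0"/"1")
        Map<String, String> texts = new TreeMap<>();               // file name -> file content
        try (DirectoryStream<Path> labelDirs = Files.newDirectoryStream(root)) {
            for (Path labelDir : labelDirs) {
                if (!Files.isDirectory(labelDir)) continue;
                String label = labelDir.getFileName().toString();
                labels.add(label);
                for (String flag : new String[] {"0", "1"}) {
                    Path flagDir = labelDir.resolve(flag);
                    if (!Files.isDirectory(flagDir)) continue;
                    try (DirectoryStream<Path> files = Files.newDirectoryStream(flagDir)) {
                        for (Path file : files) {
                            String id = file.getFileName().toString();
                            flags.computeIfAbsent(id, k -> new TreeMap<>()).put(label, flag);
                            // Assumes a file name denotes the same document in every label directory.
                            texts.putIfAbsent(id, new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
                        }
                    }
                }
            }
        }
        StringBuilder out = new StringBuilder();
        // "-C n" in the relation name tells MEKA the first n attributes are the labels.
        out.append("@relation '").append(relation).append(": -C ").append(labels.size()).append("'\n");
        for (String label : labels) {
            out.append("@attribute @@class-").append(label).append("@@ {0,1}\n");
        }
        out.append("@attribute file-ID string\n@attribute text string\n@data\n");
        for (Map.Entry<String, Map<String, String>> row : flags.entrySet()) {
            for (String label : labels) {
                // A file absent from both of a label's directories is treated as 0 (an assumption).
                out.append(row.getValue().getOrDefault(label, "0")).append(',');
            }
            // Minimal ARFF escaping of the text; file names are assumed free of spaces, commas and quotes.
            String text = texts.get(row.getKey())
                    .replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n");
            out.append(row.getKey()).append(",'").append(text).append("'\n");
        }
        return out.toString();
    }
}
```

Run against the /text-dataset layout above (each N.txt containing "file N" plus a newline) with the relation name "example", the sketch produces the listing shown. Sorted maps are used so that the output does not depend on the order in which the file system lists directory entries.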
To make this user-friendly, I also added an Import text… menu item to the Explorer’s File menu.