Manual - Classifier

If you think anything is missing, please let me know and I will add it. You can find the code reference here.
Installing / Uninstalling

To install, download the library, run the .exe, and follow the instructions. To uninstall, use Add/Remove Programs in the Control Panel.

Getting started / Project settings

Using the C++ Text Classifier Library is simple. Here is a step-by-step guide to setting up the sample project in Microsoft Visual Studio.

  1. Download and install the library to an installation directory of your choice (the default is usually ‘c:\program files\C++ Text Classifier Library’). From now on I will refer to this directory as InstallDir.
  2. Create an empty win32 console project in Visual Studio.
  3. Add source files from the library to your newly created project. They are located under ‘InstallDir\include’ and ‘InstallDir\include\internal’.
  4. Add the sample source file (main.cpp) from ‘InstallDir\sample’ to your project.
  5. Right-click on the project and find ‘Properties->C/C++->General’. On the line ‘Additional Include Directories’ add ‘InstallDir\include’. Click ‘OK’.
  6. Right-click on the project and find ‘Properties->Linker->General’. On the line ‘Additional Library Directories’ add ‘InstallDir\lib’. Click ‘OK’.
  7. Right-click on the project and find ‘Properties->Linker->Input’. On the line ‘Additional Dependencies’ add ‘classifier.lib’. Click ‘OK’.
  8. Finally, copy the ‘classifier.dll’ file from ‘InstallDir\bin’ to the project’s working (output) directory.

Now you should be able to compile and run the sample!

Usage
The code given in this chapter assumes that the project has been set up properly, i.e. with the include directories, library directories and additional dependencies in place. See ‘Getting started / Project settings’ for how to do this.

Namespaces

A namespace expresses logical grouping: declarations that logically belong together can be put in the same namespace. All declarations of the C++ Text Classifier Library are accessible under the namespace codeode::classifier. To gain access to declarations within a namespace you can either write the namespace in front of each name or use the using keyword. For example:

codeode::classifier::Classifier<string> classifier;
or
using namespace codeode::classifier;
Classifier<string> classifier;

Creating a classifier

It’s possible to create two different classifiers, either one that works on ASCII strings or one that works on wide strings. When you create the classifier object you specify this as the template parameter.

Classifier<string> classifier; // creates a classifier for ASCII strings
Classifier<wstring> classifier; // creates a classifier for wide strings

Adding classes (bins)

Once you have created a classifier you need to add the classes you want it to work with. A class (sometimes called a bin) is a logical grouping of documents. A spam classifier, for example, might use these two classes:

classifier.addClass("legitimate");
classifier.addClass("spam");

Before you can train the classifier on a class you need to add it. It’s also important that you add all classes before you start to classify anything.

Training the classifier

Once you have added all classes you can begin to train the classifier. For every document you train it on, you need to specify which class the document belongs to.

classifier.train("legitimate", "This is a legitimate message.");
classifier.train("spam", "This is a spam message.");

Classifying

To classify a document just pass it to the classifier:

Classification result = classifier.classify("This is a legitimate message.");

The returned Classification is a map from each class to a probability. The class with the highest probability is the one the document most likely belongs to.

// Printing the result
cout << "Legitimate %: " << result["legitimate"] << endl
     << "Spam %: " << result["spam"] << endl;
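
If you only need the winning class, you can scan the result for the entry with the highest probability. The sketch below is not part of the library. It assumes that Classification behaves like a std::map from class name to probability, as the description above suggests, and it uses a stand-in type (ClassificationSketch) and a hypothetical helper (mostProbableClass) so that it compiles on its own.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Assumption: the library's Classification type behaves like a
// std::map from class name to probability. This alias is a stand-in
// used only for illustration.
using ClassificationSketch = std::map<std::string, double>;

// Hypothetical helper: returns the name of the class with the highest
// probability, or an empty string if the result holds no classes.
std::string mostProbableClass(const ClassificationSketch& result)
{
    if (result.empty()) {
        return std::string();
    }
    // Compare entries by their probability (the mapped value).
    auto best = std::max_element(
        result.begin(), result.end(),
        [](const ClassificationSketch::value_type& a,
           const ClassificationSketch::value_type& b) {
            return a.second < b.second;
        });
    return best->first;
}
```

With the real library you would presumably pass the object returned by classify, provided it offers the same map interface. Please check the reference to confirm.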

Implementing custom logging

By default, the classifier writes a log to standard out. You can change this by implementing the Callback class and passing a Callback object to the Classifier constructor.

MyCallback callback;
Classifier<string> classifier(callback);

Implementing the Callback class is straightforward; please see the reference for details.

Saving the training data

To save the training data, use the getTrainingData method to retrieve a char stream. This stream contains the data you have previously trained the classifier with. Once you have retrieved it, just save it to disk. Because the training data may contain personal information, you may want to encrypt it before saving.

std::vector<char> data;
classifier.getTrainingData(data);
// Save the data to disk here.
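
One possible way to fill in the "save to disk" step is to write the vector out as a binary file using the standard library. This is only a sketch: saveTrainingData and its path parameter are hypothetical names of my own, not part of the library, and no encryption is applied here.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not library API): writes the raw bytes in `data`
// to the file at `path`, replacing any existing file.
// Returns true if the file was opened and every byte was written.
bool saveTrainingData(const std::vector<char>& data, const std::string& path)
{
    // Binary mode prevents newline translation on Windows.
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    if (!data.empty()) {
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    }
    out.close();
    return !out.fail();
}
```

You would call it with the vector filled by getTrainingData and a file name of your choice, and check the return value before relying on the file.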

Loading the training data

To load previously saved training data into the classifier, first read it from disk, then pass it to the setTrainingData method to initialize the classifier.

std::vector<char> data;
// Read the previously saved training data file into the data vector

classifier.setTrainingData(data);
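
Reading the file into the vector could look like the following sketch, which uses only the standard library. loadTrainingData and its path parameter are hypothetical names of my own, not part of the library. If you encrypted the data before saving, you would need to decrypt it before handing it to setTrainingData.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not library API): reads the whole file at `path`
// into `data`, replacing its previous contents.
// Returns false if the file cannot be opened or fully read.
bool loadTrainingData(const std::string& path, std::vector<char>& data)
{
    // Open in binary mode, positioned at the end so tellg() gives the size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) {
        return false;
    }
    const std::streamsize size = in.tellg();
    if (size < 0) {
        return false;
    }
    in.seekg(0, std::ios::beg);
    data.resize(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(data.data(), size)) {
        data.clear();
        return false;
    }
    return true;
}
```

Check the return value before calling setTrainingData, so that the classifier is never initialized from a missing or partially read file.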