Section 5
Machine Learning Tutorial for the UKP Lab, June 10, 2011
Weka API

Series of experiments are laborious in the WEKA GUI. The API is simple and easy to use to design complex workflows, e.g. grid search / simulated annealing over the classifier hyperparameter space.

Major concepts:
- ARFF file
- Instances / Instance
- Classifier
Weka API

Very simple, self-explanatory code. Clear architecture / structure:
- weka.classifiers
  - weka.classifiers.bayes (BayesNet, Naive Bayes)
  - weka.classifiers.functions (Neural Net, Linear Regression, Logistic Regression/MaxEnt, SVM, ...)
  - weka.classifiers.lazy (Nearest Neighbor, ...)
  - weka.classifiers.rules (JRip, ...)
  - weka.classifiers.trees (C4.5, Random Forest, ...)
  - weka.classifiers.meta (Boosting, Bagging, Attribute Selection, Voting, ...)
- weka.clusterers

Quick prototyping, testing of many algorithms. Not all state of the art; not the most efficient (e.g. logistic regression is slow). Ideal for learning / teaching / starting up.
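Because every classifier in these packages implements the same interface, a full train-and-evaluate workflow is only a few lines. A minimal sketch, assuming weka.jar is on the classpath; the file name `train.arff` and the choice of J48 (C4.5) are illustrative, not from the slides:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;

import java.util.Random;

public class WekaSketch {
    public static void main(String[] args) throws Exception {
        // Load an ARFF file into an Instances object
        Instances data = DataSource.read("train.arff");
        data.setClassIndex(data.numAttributes() - 1);  // last attribute = class

        // Any classifier from weka.classifiers.* plugs in here unchanged,
        // which is what makes grid search over algorithms/parameters easy
        Classifier cls = new J48();  // C4.5 decision tree

        // 10-fold cross-validation
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(cls, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Swapping `new J48()` for any other classifier (or looping over candidate hyperparameter settings) leaves the rest of the code untouched.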
SS 2011 | Computer Science Department | UKP Lab - György Szarvas | 7
Section 6
Classification

Until now: classification
- finite set of (nominal) class labels
- classification units / instances were: tokens, token sequences, sentences, documents
Regression

Regression:
- approximate a real-valued target variable (also called function learning)
- error is measured as the difference between the predicted and the observed values
- usually based on real-valued features

Less typical problem setting for NLP. Methods worth considering / trying:
- linear regression
- support vector machine
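As a concrete instance of "approximating a real-valued target", here is a minimal sketch of ordinary least-squares linear regression for a single feature (pure Java, no Weka; data and names are illustrative):

```java
public class SimpleLinearRegression {
    // Fit y ≈ slope * x + intercept by ordinary least squares:
    // slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n; meanY /= n;
        double cov = 0, var = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            var += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = cov / var;
        double intercept = meanY - slope * meanX;
        return new double[] { slope, intercept };
    }

    public static void main(String[] args) {
        // Noise-free data from y = 2x + 1, so the fit recovers it exactly
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        double[] model = fit(x, y);
        System.out.println("slope=" + model[0] + " intercept=" + model[1]);
    }
}
```

The "error measured as difference between predicted and observed values" from the slide is exactly the squared residual this closed-form solution minimizes.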
Ranking

Preference learning:
- instead of classification, try to predict a total order over a set of possible labels (e.g. all possible actions at a time)
- research area of the KE group here
Semi-supervised learning

Exploit labeled + unlabeled data to improve models (or, likewise, to get a similar model with less labeled data). Examples:
- in SVM, maximize the margin (distance from decision boundary) taking unlabeled points into account
- use unlabeled data to calculate feature statistics
- use automatically labeled data to extend the training set
Semi-supervised learning

Self-training
- train a model; predicted instances that meet a predefined selection criterion (e.g. p(+) > 0.95) are added to the training pool, and then retrain
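The train / select-confident / retrain loop can be sketched on a toy problem. This is an illustrative implementation, not from the slides: the "model" is a 1-D nearest-centroid classifier, and the selection criterion "accept only if one centroid is at least twice as close as the other" stands in for p(+) > 0.95:

```java
import java.util.ArrayList;
import java.util.List;

public class SelfTraining {
    // Returns the total number of labeled points after self-training converges.
    public static int countLabeled(double[] seedPos, double[] seedNeg, double[] unlabeled) {
        List<Double> pos = new ArrayList<>();
        List<Double> neg = new ArrayList<>();
        for (double v : seedPos) pos.add(v);
        for (double v : seedNeg) neg.add(v);
        List<Double> pool = new ArrayList<>();
        for (double v : unlabeled) pool.add(v);

        boolean changed = true;
        while (changed) {
            changed = false;
            // "Train": the model is just the two class centroids
            double cPos = mean(pos), cNeg = mean(neg);
            for (int i = pool.size() - 1; i >= 0; i--) {
                double v = pool.get(i);
                double dPos = Math.abs(v - cPos), dNeg = Math.abs(v - cNeg);
                // Selection criterion: only accept confident predictions
                if (dPos < 0.5 * dNeg)      { pos.add(v); pool.remove(i); changed = true; }
                else if (dNeg < 0.5 * dPos) { neg.add(v); pool.remove(i); changed = true; }
            }
            // Then retrain (next loop iteration recomputes the centroids)
        }
        return pos.size() + neg.size();
    }

    private static double mean(List<Double> xs) {
        double s = 0;
        for (double v : xs) s += v;
        return s / xs.size();
    }

    public static void main(String[] args) {
        int labeled = countLabeled(new double[]{9.0, 10.0},
                                   new double[]{0.0, 1.0},
                                   new double[]{0.4, 0.9, 9.5, 5.0});
        System.out.println(labeled);  // 7: the ambiguous point 5.0 is never added
    }
}
```

Note how the point midway between the classes stays in the pool forever: a strict selection criterion trades coverage for not polluting the training set with wrong labels.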
Co-training
- train two different models, or the same model on two independent representations (e.g. spam filtering based on text and on links); predicted instances that meet a predefined selection criterion are added to the training pool of the other model, and then retrain both
Active learning
- train a model on a small initial set; instances that meet a predefined selection criterion (e.g. the model shows high uncertainty, p(+) ≈ p(−)) are sent for human labeling, and then retrain
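The "high uncertainty" criterion above is often implemented as uncertainty sampling: query the instance the current model is least sure about. A toy sketch (illustrative, not from the slides), again with a 1-D centroid model where uncertainty means the two class centroids are almost equally close:

```java
public class ActiveLearning {
    // Return the pool point whose distances to the two class centroids are
    // most similar, i.e. where p(+) ≈ p(-) for this toy model.
    public static double mostUncertain(double[] pool, double cPos, double cNeg) {
        double best = pool[0];
        double bestGap = Double.MAX_VALUE;
        for (double v : pool) {
            double gap = Math.abs(Math.abs(v - cPos) - Math.abs(v - cNeg));
            if (gap < bestGap) { bestGap = gap; best = v; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] pool = {0.3, 5.1, 9.8};
        // Current centroids of the labeled positive / negative sets (illustrative)
        System.out.println(mostUncertain(pool, 9.5, 0.5));  // 5.1, the boundary case
    }
}
```

The human annotator labels the returned point, it joins the training set, and the model is retrained; repeating this spends the labeling budget where the model learns the most.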
Distant supervision
- start with an assumption of positive / negative membership (e.g. for pairs in a knowledge base you know the label; look for texts containing that pair); generate potential positive/negative instances based on the assumption, and then train a model
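The instance-generation step can be sketched as follows (illustrative code, not from the slides): every sentence that mentions both members of a knowledge-base pair is assumed to express the relation and becomes a positive training instance.

```java
import java.util.ArrayList;
import java.util.List;

public class DistantSupervision {
    // Collect sentences that contain both members of some KB pair;
    // these are assumed (possibly noisily) to express the KB relation.
    public static List<String> positives(String[][] kbPairs, String[] sentences) {
        List<String> out = new ArrayList<>();
        for (String s : sentences) {
            for (String[] pair : kbPairs) {
                if (s.contains(pair[0]) && s.contains(pair[1])) {
                    out.add(s);
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] kb = { {"Paris", "France"} };  // e.g. capital-of pairs
        String[] sents = {
            "Paris is the capital of France.",
            "Berlin is a large city.",
        };
        System.out.println(positives(kb, sents).size());  // 1
    }
}
```

The assumption is noisy (a sentence can mention both entities without expressing the relation), which is exactly why distant supervision is treated as weak rather than gold-standard labeling.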
Train on errors
- having labeled data for an associated task, train on its errors (which are partly due to the lack of knowledge about your current problem); e.g. disease and associated symptom codes are never added to the same document, so learn D/S relationships from D and S labels/classifiers
Domain adaptation

When crossing domains, the texts (feature and/or label distributions) can change:
- this degrades ML performance (on a target domain with little training data, compared to the source domain with a large training set)
- try to tackle this domain impact to achieve reasonable performance in (almost) unseen domains