nels. In fact, certain kernels, such as the RBF kernel, implicitly map data into an infinite-dimensional space. As you might suspect, explicitly computing such a high-dimensional representation would be computationally daunting. A related idea, the kernel trick, lets us bypass this computation cheaply: instead of mapping points into the expanded space and taking inner products there, we evaluate the kernel function directly on pairs of original inputs and get the same value. Kernel functions are almost never used without this trick.
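A minimal sketch of the trick, using a degree-2 polynomial kernel (the explicit map `phi` is only written out here to verify the equality; in practice it is never materialized):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input x = (x1, x2):
    # phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2, sqrt(2)*x1, sqrt(2)*x2, 1)
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

def poly_kernel(x, y):
    # The kernel computes the inner product in the 6-D expanded space
    # while only ever touching the original 2-D vectors.
    return (np.dot(x, y) + 1.0) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(y))  # inner product after explicit mapping
via_kernel = poly_kernel(x, y)     # same value, no mapping needed
```

For the RBF kernel the expanded space is infinite-dimensional, so the kernel evaluation is not just cheaper but the only feasible route.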
#DictVectorizer()
>>> measurements = [
...     {'city': 'Dubai', 'temperature': 33.},
...     {'city': 'London', 'temperature': 12.},
...     {'city': 'San Francisco', 'temperature': 18.},
... ]
>>> from sklearn.feature_extraction import DictVectorizer
>>> vec = DictVectorizer()
>>> vec.fit_transform(measurements).toarray()
array([[ 1.,  0.,  0., 33.],
       [ 0.,  1.,  0., 12.],
       [ 0.,  0.,  1., 18.]])
>>> vec.get_feature_names()  # in scikit-learn >= 1.0, use get_feature_names_out()
['city=Dubai', 'city=London', 'city=San Francisco', 'temperature']
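Once fitted, the vectorizer can be reused on new records via `transform()`; a sketch (the 'Tokyo' record is an illustrative value not in the original example) showing that a category unseen during fitting simply gets no one-hot column:

```python
from sklearn.feature_extraction import DictVectorizer

measurements = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Francisco', 'temperature': 18.},
]
vec = DictVectorizer()
vec.fit(measurements)  # learns the vocabulary: 3 city columns + temperature

# 'Tokyo' was never seen during fit, so all three city columns stay 0;
# only the numeric 'temperature' feature carries through.
row = vec.transform([{'city': 'Tokyo', 'temperature': 21.}]).toarray()
```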
sklearn.feature_selection.VarianceThreshold(threshold=0.0)
>>> from sklearn.feature_selection import VarianceThreshold
>>> X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
>>> selector = VarianceThreshold()
>>> X_new = selector.fit_transform(X)  # drops every feature with variance <= threshold
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

clf = Pipeline([
    # An L1-penalised LinearSVC drives some coefficients to exactly zero;
    # SelectFromModel keeps only the features with non-zero weights.
    # (LinearSVC alone is not a transformer, so it must be wrapped;
    # penalty="l1" also requires dual=False.)
    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    ('classification', RandomForestClassifier())
])
clf.fit(X, y)