scikit-learn: 4.1. Pipeline and FeatureUnion: combining estimators (chaining features with predictors; combining features with features)

Writing this while sick, in an internet cafe... a little encouragement would be welcome.

http://scikit-learn.org/stable/modules/pipeline.html

1. What Pipeline and FeatureUnion are for:

Pipeline was introduced in an earlier post: it chains transformers with a final estimator.

FeatureUnion, as the name suggests, concatenates the output vectors of several transformers into one larger vector.


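2. The difference between the two:

Pipeline processes features serially: each transformer works on the features produced by the previous transformer.

FeatureUnion processes features in parallel: every transformer works on the same input, and all of their results are concatenated into one large feature vector.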
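
To make the serial versus parallel distinction concrete, here is a minimal sketch. The synthetic data and the choice of StandardScaler and PCA are my own assumptions for illustration, not something from the original documentation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(20, 5)  # synthetic data: 20 samples, 5 features (assumption)

# Serial: PCA receives the scaler's output, so the result has 2 columns.
serial = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
X_serial = serial.fit_transform(X)

# Parallel: both transformers see the original X and their outputs are
# stacked side by side, giving 5 + 2 = 7 columns.
parallel = FeatureUnion([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
X_parallel = parallel.fit_transform(X)

print(X_serial.shape)    # (20, 2)
print(X_parallel.shape)  # (20, 7)
```

The same two transformers give different output widths depending only on how they are wired together.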


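3. Pipeline: chaining estimators

Pipeline can be used to chain multiple estimators into one. Because data processing usually follows a fixed sequence of steps, such as feature selection, normalization, and classification, Pipeline serves two main purposes:

Convenience: a single call to fit and a single call to predict run the data through all the estimators.

Joint parameter selection: one grid search covers the parameters of all the estimators at once.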


Every estimator in a pipeline except the last must be a transformer (that is, it must have a transform method). The last estimator may be of any type (transformer, classifier, etc.).
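
A rough sketch of the convenience point, assuming the iris dataset and a PCA plus SVC chain (my choices for illustration): one fit call trains every step, and one predict call pushes new data through the whole chain.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# PCA is a transformer (non-final step); SVC is the final estimator.
clf = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])

clf.fit(X, y)                     # fits PCA, transforms X, then fits the SVC
predictions = clf.predict(X[:5])  # transforms with PCA, then predicts with the SVC
print(predictions.shape)
```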


Usage: a Pipeline is built from a list of (key, value) pairs, where the key is a name of your own choosing for the step and the value is an estimator object. For example:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> estimators = [('reduce_dim', PCA()), ('svm', SVC())]
>>> clf = Pipeline(estimators)
>>> clf
Pipeline(steps=[('reduce_dim', PCA(copy=True, n_components=None,
    whiten=False)), ('svm', SVC(C=1.0, cache_size=200, class_weight=None,
    coef0=0.0, degree=3, gamma=0.0, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False))])
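The estimators of each stage are stored in the steps attribute, and each one can be retrieved by index:
>>> clf.steps[0]
('reduce_dim', PCA(copy=True, n_components=None, whiten=False))
They can also be retrieved by name, as a dict in named_steps:
>>> clf.named_steps['reduce_dim']
PCA(copy=True, n_components=None, whiten=False)
To change a parameter of one of the estimators, use the <estimator>__<parameter> syntax, for example:

>>> clf.set_params(svm__C=10)
Pipeline(steps=[('reduce_dim', PCA(copy=True, n_components=None,
    whiten=False)), ('svm', SVC(C=10, cache_size=200, class_weight=None,
    coef0=0.0, degree=3, gamma=0.0, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False))])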
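
The printed output above comes from an older scikit-learn release, so here is a small version-independent sketch of the same idea. It checks the effect of set_params through named_steps and get_params instead of relying on the printed repr; setting n_components as well is my own addition for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

clf = Pipeline([("reduce_dim", PCA()), ("svm", SVC())])

# <step name>__<parameter name> addresses a parameter of one step.
clf.set_params(svm__C=10, reduce_dim__n_components=2)

print(clf.named_steps["svm"].C)                      # 10
print(clf.get_params()["reduce_dim__n_components"])  # 2
```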

The ultimate goal, grid searches:

>>> # GridSearchCV lived in sklearn.grid_search in releases before 0.18
>>> from sklearn.model_selection import GridSearchCV
>>> params = dict(reduce_dim__n_components=[2, 5, 10],
...               svm__C=[0.1, 10, 100])
>>> grid_search = GridSearchCV(clf, param_grid=params)
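
The snippet above only constructs the search. A minimal end-to-end sketch might look like the following, assuming the iris dataset (my choice). Since iris has only 4 features, the n_components candidates are reduced to values that PCA can actually produce on it, and cv=3 is an arbitrary setting to keep the run short.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = Pipeline([("reduce_dim", PCA()), ("svm", SVC())])

# Keys follow the <step>__<parameter> convention, so one grid spans both steps.
params = dict(reduce_dim__n_components=[2, 3], svm__C=[0.1, 10, 100])

grid_search = GridSearchCV(clf, param_grid=params, cv=3)
grid_search.fit(X, y)

print(grid_search.best_params_)
```

Every combination of PCA and SVC settings is refit inside each cross-validation fold, which is what makes searching over the whole pipeline different from tuning the steps one at a time.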

And now the classic example, text classification:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# define a pipeline combining a text feature extractor with a simple
# classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])

# uncommenting more parameters will give better exploring power but will
# increase processing time in a combinatorial way
parameters = {
    'vect__max_df': (0.5, 0.75, 1.0),
    #'vect__max_features': (None, 5000, 10000, 50000),
    'vect__ngram_range': ((1, 1), (1, 2)),  # unigrams or bigrams
    #'tfidf__use_idf': (True, False),
    #'tfidf__norm': ('l1', 'l2'),
    'clf__alpha': (0.00001, 0.000001),
    'clf__penalty': ('l2', 'elasticnet'),
    #'clf__max_iter': (10, 50, 80),  # named n_iter in old releases
}

if __name__ == "__main__":
    # multiprocessing requires the fork to happen in a __main__ protected
    # block

    # find the best parameters for both the feature extraction and the
    # classifier
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1)


Notes (quoted directly from the scikit-learn documentation):

Calling fit on the pipeline is the same as calling fit on each estimator in turn, transform the input and pass it on to the next step.

The pipeline has all the methods that the last estimator in the pipeline has, i.e. if the last estimator is a classifier, the Pipeline can be used as a classifier. If the last estimator is a transformer, again, so is the pipeline.
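
The first note can be checked by hand. The sketch below, assuming the iris dataset and a PCA plus SVC chain (my choices), fits a pipeline and then repeats the same steps manually; the two routes should give the same predictions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Route 1: let the pipeline do everything.
pipe = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])
pipe.fit(X, y)

# Route 2: fit and transform with the first step, then fit the last step
# on the transformed data.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
svm = SVC().fit(X_reduced, y)

# Compare predictions from the two routes.
same = np.array_equal(pipe.predict(X), svm.predict(pca.transform(X)))
print(same)
```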




4. FeatureUnion: composite feature spaces


Description of FeatureUnion, quoted directly from the documentation:

FeatureUnion combines several transformer objects into a new transformer that combines their output. A FeatureUnion takes a list of transformer objects. During fitting, each of these is fit to the data independently. For transforming data, the transformers are applied in parallel, and the sample vectors they output are concatenated end-to-end into larger vectors.

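Like Pipeline, FeatureUnion serves two purposes, convenience and joint parameter selection, and the two can be combined to build more complex models.

(FeatureUnion does not check whether two transformers produce identical features. It simply concatenates everything, so removing duplicates is still your job.)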
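
A small sketch of that caveat, using synthetic data and two identically configured PCA transformers (both are assumptions chosen to force duplication): the union happily returns the same two columns twice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion

rng = np.random.RandomState(0)
X = rng.rand(30, 4)  # synthetic data (assumption)

# Two transformers that compute exactly the same features.
union = FeatureUnion([
    ("pca_a", PCA(n_components=2)),
    ("pca_b", PCA(n_components=2)),
])
X_union = union.fit_transform(X)

print(X_union.shape)  # (30, 4)

# Columns 0-1 come from pca_a and columns 2-3 from pca_b.
duplicated = np.allclose(X_union[:, :2], X_union[:, 2:])
print(duplicated)
```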


Usage: a FeatureUnion is built from a list of (key, value) pairs, where the key is a name of your own choosing for the transformer and the value is an estimator object. For example:

>>> from sklearn.pipeline import FeatureUnion
>>> from sklearn.decomposition import PCA
>>> from sklearn.decomposition import KernelPCA
>>> estimators = [('linear_pca', PCA()), ('kernel_pca', KernelPCA())]
>>> combined = FeatureUnion(estimators)
>>> combined
FeatureUnion(n_jobs=1, transformer_list=[('linear_pca', PCA(copy=True,
    n_components=None, whiten=False)), ('kernel_pca', KernelPCA(alpha=1.0,
    coef0=1, degree=3, eigen_solver='auto', fit_inverse_transform=False,
    gamma=None, kernel='linear', kernel_params=None, max_iter=None,
    n_components=None, remove_zero_eig=False, tol=0))],
    transformer_weights=None)

Finally, an example:

http://scikit-learn.org/stable/auto_examples/feature_stacker.html#example-feature-stacker-py

With thanks to the original author:

# Author: Andreas Mueller <amueller@ais.uni-bonn.de>
#
# License: BSD 3 clause

from sklearn.pipeline import Pipeline, FeatureUnion
# GridSearchCV lived in sklearn.grid_search in releases before 0.18
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris = load_iris()

X, y = iris.data, iris.target

# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)

# Maybe some original features were good, too?
selection = SelectKBest(k=1)

# Build estimator from PCA and Univariate selection:

combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])

# Use combined features to transform dataset:
X_features = combined_features.fit(X, y).transform(X)

svm = SVC(kernel="linear")

# Do grid search over k, n_components and C:

pipeline = Pipeline([("features", combined_features), ("svm", svm)])

param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])

grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
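
The doubly nested keys in param_grid (features__pca__n_components and so on) come from nesting a FeatureUnion inside a Pipeline. As a hedged sketch, the valid key names can be listed with get_params; the step names below mirror the example, while the filter on two "__" separators is just my way of picking out the nested entries.

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

combined_features = FeatureUnion([("pca", PCA()), ("univ_select", SelectKBest())])
pipeline = Pipeline([("features", combined_features), ("svm", SVC(kernel="linear"))])

# Each "__" descends one level: pipeline step, then union member, then parameter.
all_params = pipeline.get_params()
nested = sorted(name for name in all_params if name.count("__") == 2)

for name in nested:
    print(name)
```

Printing the names before writing a grid is a cheap way to catch typos in the keys.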


That's all. It looks like feature extraction is going to take a lot less effort from now on...

Original link: https://www.f2er.com/javaschema/284617.html
