Pipeline
class sklearn.pipeline.Pipeline(steps, memory=None)
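Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be "transforms", that is, they must implement the fit and transform methods (the final estimator only needs to implement fit). The transformers in the pipeline can be cached using the memory argument.
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. To that end, it lets you set the parameters of each step using the step's name and the parameter name separated by "__". A step's estimator can be replaced entirely by setting the parameter named after the step to another estimator, and a transformer can be removed by setting it to None.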
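The "__" naming convention and step replacement described above can be sketched as follows. This is a minimal illustration, not part of the original reference: the step names `scale` and `clf`, the choice of estimators, and the toy data are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data, generated only for illustration.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# The step names 'scale' and 'clf' are arbitrary labels chosen for this sketch.
pipe = Pipeline([('scale', StandardScaler()), ('clf', SVC(kernel='linear'))])

# Set a parameter of one step: <step name>__<parameter name>.
pipe.set_params(clf__C=0.5)
c_after_set = pipe.get_params()['clf__C']

# Replace a step's estimator entirely by assigning to the step's name.
pipe.set_params(clf=LogisticRegression())
replaced_type = type(pipe.named_steps['clf']).__name__

# The modified pipeline is fitted like any other estimator.
pipe.fit(X, y)
```

Here `c_after_set` holds the value assigned through `clf__C`, and `replaced_type` is the class name of the estimator that now occupies the `clf` step.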
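Parameters:
steps: list
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
memory: None, str, or object with the joblib.Memory interface, optional
Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.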
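The cloning side effect of the memory parameter can be seen in a short sketch. It assumes a temporary directory as the cache location; the step names `reduce_dim` and `clf`, the estimators, and the toy data are illustrative choices rather than anything prescribed by the reference.

```python
import shutil
import tempfile

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data, generated only for illustration.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# A temporary directory serves as the cache location (an assumption of this sketch).
cachedir = tempfile.mkdtemp()
try:
    pca = PCA(n_components=2)
    pipe = Pipeline([('reduce_dim', pca), ('clf', SVC(kernel='linear'))],
                    memory=cachedir)
    pipe.fit(X, y)

    # With caching enabled, the pipeline fits a clone of `pca`,
    # so the instance passed in stays unfitted.
    original_is_fitted = hasattr(pca, 'components_')

    # The fitted transformer is reachable through named_steps.
    step_is_fitted = hasattr(pipe.named_steps['reduce_dim'], 'components_')
finally:
    # Remove the cache directory once it is no longer needed.
    shutil.rmtree(cachedir, ignore_errors=True)
```

After fitting, `original_is_fitted` is False while `step_is_fitted` is True, which is why the fitted transformer should be inspected through named_steps or steps.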
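Attributes:
named_steps: bunch object, a dictionary with attribute access
Read-only attribute to access any step by its user-given name. Keys are step names and values are the step objects.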
Methods:
| Method | Description |
|---|---|
| decision_function(X) | Apply transforms, and decision_function of the final estimator |
| fit(X[, y]) | Fit the model |
| fit_predict(X[, y]) | Applies fit_predict of last step in pipeline after transforms. |
| fit_transform(X[, y]) | Fit the model and transform with the final estimator |
| get_params([deep]) | Get parameters for this estimator. |
| predict(X) | Apply transforms to the data, and predict with the final estimator |
| predict_log_proba(X) | Apply transforms, and predict_log_proba of the final estimator |
| predict_proba(X) | Apply transforms, and predict_proba of the final estimator |
| score(X[, y, sample_weight]) | Apply transforms, and score with the final estimator |
| set_params(**kwargs) | Set the parameters of this estimator. |
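Several of the methods above are described as "apply transforms, then call the final estimator". The following sketch spells that out for predict by repeating the same two stages by hand. The step names `scale` and `clf`, the estimators, and the toy data are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, generated only for illustration.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression())])
pipe.fit(X, y)

# Pipeline.predict: transform with the intermediate step, then predict.
via_pipeline = pipe.predict(X)

# The same two stages carried out manually on the fitted steps.
X_scaled = pipe.named_steps['scale'].transform(X)
by_hand = pipe.named_steps['clf'].predict(X_scaled)

same = np.array_equal(via_pipeline, by_hand)

# predict_proba follows the same pattern with the final estimator's method.
proba = pipe.predict_proba(X)
```

The flag `same` records whether both routes give identical labels, and `proba` has one row per sample and one column per class.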
Examples:
>>> from sklearn import svm
>>> from sklearn.datasets import make_classification
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import f_regression
>>> from sklearn.pipeline import Pipeline
>>> # generate some data to play with
>>> X, y = make_classification(
...     n_informative=5, n_redundant=0, random_state=42)
>>> # ANOVA SVM-C
>>> anova_filter = SelectKBest(f_regression, k=5)
>>> clf = svm.SVC(kernel='linear')
>>> anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
>>> # You can set the parameters using the names issued
>>> # For instance, fit using a k of 10 in the SelectKBest
>>> # and a parameter 'C' of the svm
>>> anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
Pipeline(memory=None,
         steps=[('anova', SelectKBest(...)),
                ('svc', SVC(...))])
>>> prediction = anova_svm.predict(X)
>>> anova_svm.score(X, y)
0.829...
>>> # getting the selected features chosen by anova_filter
>>> anova_svm.named_steps['anova'].get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
        True, False,  True,  True, False,  True, False,  True,  True,
       False, False], dtype=bool)
>>> # Another way to get selected features chosen by anova_filter
>>> anova_svm.named_steps.anova.get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
        True, False,  True,  True, False,  True, False,  True,  True,
       False, False], dtype=bool)
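Because the steps are assembled into one estimator, their parameters can be tuned together under cross-validation. The sketch below is one way to do this with GridSearchCV, under a few assumptions: the parameter grid values and cv=3 are arbitrary, and f_classif is swapped in for f_regression as the scoring function because the target here consists of class labels.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Same kind of toy data as in the example above.
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

# f_classif is used here instead of f_regression (a substitution made for this sketch).
pipe = Pipeline([('anova', SelectKBest(f_classif)),
                 ('svc', SVC(kernel='linear'))])

# Grid keys use the <step name>__<parameter name> convention;
# the candidate values are illustrative only.
param_grid = {'anova__k': [5, 10], 'svc__C': [0.1, 1.0]}

# Every parameter combination is cross-validated on the whole pipeline,
# so feature selection is refitted inside each fold.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

best = search.best_params_
```

The dictionary `best` contains the selected value for each of the two tuned parameters.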