Tuesday, March 23, 2021

Why is the constructor of a sklearn transformer within a ColumnTransformer invoked twice, and why do the two invocations receive different parameters?

Three questions about the code below and its output:

  1. Why is the constructor of the MyDebug transformer invoked twice: first at line 26, and a second time at line 37?
  2. Why do the two invocations show different values for the parameter myname? The second invocation (line 37) is especially odd: it receives neither the value that was passed in nor the default value, but None, as the output shows.
  3. If you uncomment line 36 (ct1.fit), it also invokes the transformer's transform method, which I would expect only from ct1.fit_transform. Why?

Environment: Python 3.6.10, scikit-learn 0.22.1

   1 import numpy as np
   2 from sklearn.compose import ColumnTransformer
   3 from sklearn.preprocessing import Normalizer
   4 from sklearn.base import BaseEstimator,TransformerMixin
   5 from sklearn.pipeline import Pipeline
   6 from datetime import datetime
   7
   8
   9
  10 class MyDebug(BaseEstimator, TransformerMixin):
  11     def __init__(self, myname="HELP"):
  12         print(f"intialized with myname: {myname}")
  13         self._name = myname
  14         print (f"Debug.__init__ being invoked for {myname}, {self._name}, {id(self)}")
  15     def transform(self, X):
  16         print (f"in {self._name} transform with type: {type(X)}, shape: {X.shape} at {datetime.now()}")
  17         self.shape = X.shape
  18         # what other output you want
  19         return X
  20     def fit(self, X, y=None, **fit_params):
  21         print (f"in {self._name} fit with type: {type(X)}, shape: {X.shape} at {datetime.now()}")
  22         return self
  23
  24
  25 print("************************************************************")
  26 ct1 = ColumnTransformer(
  27     [("norm1", Pipeline(steps=[("norm", Normalizer(norm='l1')), ("debug", MyDebug("MYDEBUG_1"))]), [0, 1]),
  28      ("norm2", Pipeline(steps=[("norm", Normalizer(norm='l1')), ("debug", MyDebug("MYDEBUG_2"))]), slice(2, 10))])
  29
  30 print("************************************************************")
  31 print(f"id(ct1): {id(ct1)}")
  32 X = np.array([[0., 1., 2., 2., 0., 1., 2., 2.],
  33               [1., 1., 0., 1., 1., 1., 0., 1.]])
  34
  35 print("************************************************************")
  36 # ret = ct1.fit(X)
  37 ret = ct1.fit_transform(X)
  38 print("************************************************************")
  39 print(f"id(ct1): {id(ct1)}")
  40 print(f"type(ret): {type(ret)}")
  41 print(type(ct1.named_transformers_["norm1"]), id(ct1.named_transformers_["norm1"]), id(ct1.named_transformers_["norm2"]), "\n",
  42       type(ct1.named_transformers_["norm1"].named_steps["norm"]), id(ct1.named_transformers_["norm1"].named_steps["norm"]), id(ct1.named_transformers_["norm2"].named_steps["norm"]), "\n",
  43       type(ct1.named_transformers_["norm1"].named_steps["debug"]), id(ct1.named_transformers_["norm1"].named_steps["debug"]), id(ct1.named_transformers_["norm2"].named_steps["debug"]))

Output:

  ************************************************************
  intialized with myname: MYDEBUG_1
  Debug.__init__ being invoked for **MYDEBUG_1, MYDEBUG_1**, 140118618819160
  intialized with myname: MYDEBUG_2
  Debug.__init__ being invoked for **MYDEBUG_2, MYDEBUG_2**, 140118618819216
  ************************************************************
  id(ct1): 140118618819328
  ************************************************************
  intialized with myname: None
  Debug.__init__ being invoked for **None, None**, 140118618819944
  in None fit with type: <class 'numpy.ndarray'>, shape: (2, 2) at 2021-03-24 00:45:41.850603
  in None transform with type: <class 'numpy.ndarray'>, shape: (2, 2) at 2021-03-24 00:45:41.851159
  intialized with myname: None
  Debug.__init__ being invoked for **None, None**, 140118618820392
  in None fit with type: <class 'numpy.ndarray'>, shape: (2, 6) at 2021-03-24 00:45:41.852955
  in None transform with type: <class 'numpy.ndarray'>, shape: (2, 6) at 2021-03-24 00:45:41.852995
  ************************************************************
  id(ct1): 140118618819328
  type(ret): <class 'numpy.ndarray'>
  <class 'sklearn.pipeline.Pipeline'> 140118618819776 140118618820000
   <class 'sklearn.preprocessing._data.Normalizer'> 140118618819888 140118618820112
   <class '__main__.MyDebug'> 140118618819944 140118618820392
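
A possible explanation for the second constructor call and the None value, offered as a sketch and not as a definitive answer: ColumnTransformer appears to fit clones of the transformers it is given, and sklearn.base.clone rebuilds an estimator by reading its constructor parameters back through get_params() and passing them to a fresh constructor call. get_params() looks for an attribute with the same name as each __init__ argument, so a value stored as self._name instead of self.myname cannot be recovered. The snippet below imitates that round trip in plain Python. The helpers get_params_like and clone_like and the two small classes are hypothetical stand-ins written for illustration; they are not scikit-learn's implementation. Falling back to None for a missing attribute is an assumption chosen to match the output shown for version 0.22.1; other versions may behave differently.

```python
import inspect


def get_params_like(obj):
    """Hypothetical stand-in: read each __init__ argument back from an
    attribute of the same name, using None when no such attribute exists."""
    names = [name for name in inspect.signature(type(obj).__init__).parameters
             if name != "self"]
    return {name: getattr(obj, name, None) for name in names}


def clone_like(obj):
    """Hypothetical stand-in: build a new, unfitted copy by calling the
    constructor again with the parameters read back from obj."""
    return type(obj)(**get_params_like(obj))


class StoresUnderOtherName:
    def __init__(self, myname="HELP"):
        # Attribute name differs from the parameter name, as in MyDebug.
        self._name = myname


class StoresUnderSameName:
    def __init__(self, myname="HELP"):
        # Attribute name matches the parameter name.
        self.myname = myname


lost = clone_like(StoresUnderOtherName("MYDEBUG_1"))
kept = clone_like(StoresUnderSameName("MYDEBUG_1"))

print(lost._name)   # None
print(kept.myname)  # MYDEBUG_1
```

If this is what happens, storing the argument as self.myname = myname in MyDebug.__init__ should let the clone keep the name. The constructor would still run a second time, because the clone is a new object, which also matches the different id values in the output.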
https://stackoverflow.com/questions/66773570/why-the-constructor-for-sklearn-transformer-within-columntransformer-is-invoked March 24, 2021 at 09:35AM
