Three questions about the code below and its output:
- Why is the constructor of the MyDebug transformer invoked twice: first at line 26, and again at line 37?
- Why do the two invocations show different values for the parameter myname? The second invocation (line 37) is especially odd: it receives neither the passed argument nor the default value, but None, as shown in the output.
- If you uncomment line 36 (ct1.fit), it also invokes the transformer's transform function, which I would expect only from ct1.fit_transform. Why?
Environment: Python 3.6.10, scikit-learn 0.22.1.
 1  import numpy as np
 2  from sklearn.compose import ColumnTransformer
 3  from sklearn.preprocessing import Normalizer
 4  from sklearn.base import BaseEstimator, TransformerMixin
 5  from sklearn.pipeline import Pipeline
 6  from datetime import datetime
 7
 8
 9
10  class MyDebug(BaseEstimator, TransformerMixin):
11      def __init__(self, myname="HELP"):
12          print(f"intialized with myname: {myname}")
13          self._name = myname
14          print(f"Debug.__init__ being invoked for {myname}, {self._name}, {id(self)}")
15      def transform(self, X):
16          print(f"in {self._name} transform with type: {type(X)}, shape: {X.shape} at {datetime.now()}")
17          self.shape = X.shape
18          # what other output you want
19          return X
20      def fit(self, X, y=None, **fit_params):
21          print(f"in {self._name} fit with type: {type(X)}, shape: {X.shape} at {datetime.now()}")
22          return self
23
24
25  print("************************************************************")
26  ct1 = ColumnTransformer(
27      [("norm1", Pipeline(steps=[("norm", Normalizer(norm='l1')), ("debug", MyDebug("MYDEBUG_1"))]), [0, 1]),
28       ("norm2", Pipeline(steps=[("norm", Normalizer(norm='l1')), ("debug", MyDebug("MYDEBUG_2"))]), slice(2, 10))])
29
30  print("************************************************************")
31  print(f"id(ct1): {id(ct1)}")
32  X = np.array([[0., 1., 2., 2., 0., 1., 2., 2.],
33                [1., 1., 0., 1., 1., 1., 0., 1.]])
34
35  print("************************************************************")
36  # ret = ct1.fit(X)
37  ret = ct1.fit_transform(X)
38  print("************************************************************")
39  print(f"id(ct1): {id(ct1)}")
40  print(f"type(ret): {type(ret)}")
41  print(type(ct1.named_transformers_["norm1"]), id(ct1.named_transformers_["norm1"]), id(ct1.named_transformers_["norm2"]), "\n",
42        type(ct1.named_transformers_["norm1"].named_steps["norm"]), id(ct1.named_transformers_["norm1"].named_steps["norm"]), id(ct1.named_transformers_["norm2"].named_steps["norm"]), "\n",
43        type(ct1.named_transformers_["norm1"].named_steps["debug"]), id(ct1.named_transformers_["norm1"].named_steps["debug"]), id(ct1.named_transformers_["norm2"].named_steps["debug"]))

Output:
************************************************************
intialized with myname: MYDEBUG_1
Debug.__init__ being invoked for MYDEBUG_1, MYDEBUG_1, 140118618819160
intialized with myname: MYDEBUG_2
Debug.__init__ being invoked for MYDEBUG_2, MYDEBUG_2, 140118618819216
************************************************************
id(ct1): 140118618819328
************************************************************
intialized with myname: None
Debug.__init__ being invoked for None, None, 140118618819944
in None fit with type: <class 'numpy.ndarray'>, shape: (2, 2) at 2021-03-24 00:45:41.850603
in None transform with type: <class 'numpy.ndarray'>, shape: (2, 2) at 2021-03-24 00:45:41.851159
intialized with myname: None
Debug.__init__ being invoked for None, None, 140118618820392
in None fit with type: <class 'numpy.ndarray'>, shape: (2, 6) at 2021-03-24 00:45:41.852955
in None transform with type: <class 'numpy.ndarray'>, shape: (2, 6) at 2021-03-24 00:45:41.852995
************************************************************
id(ct1): 140118618819328
type(ret): <class 'numpy.ndarray'>
<class 'sklearn.pipeline.Pipeline'> 140118618819776 140118618820000
 <class 'sklearn.preprocessing._data.Normalizer'> 140118618819888 140118618820112
 <class '__main__.MyDebug'> 140118618819944 140118618820392

https://stackoverflow.com/questions/66773570/why-the-constructor-for-sklearn-transformer-within-columntransformer-is-invoked
March 24, 2021 at 09:35AM
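
A possible explanation for the first two questions, sketched in plain Python rather than with scikit-learn itself: ColumnTransformer fits clones of the transformers it is given, and cloning rebuilds each estimator from get_params(), which looks up every __init__ parameter as an attribute of the same name. MyDebug stores myname as self._name, so in this release the lookup finds nothing and the clone is constructed with None. The names SketchEstimator, sketch_clone, BrokenDebug and FixedDebug below are hypothetical stand-ins that only imitate this mechanism under that assumption; they are not scikit-learn's actual code.

```python
import inspect


class SketchEstimator:
    """Hypothetical, simplified stand-in for sklearn's BaseEstimator."""

    def get_params(self):
        # Read each constructor parameter back from an attribute with the
        # same name. A missing attribute yields None here, which is the
        # behaviour the output above suggests for scikit-learn 0.22.
        init_params = inspect.signature(type(self).__init__).parameters
        names = [name for name in init_params if name != "self"]
        return {name: getattr(self, name, None) for name in names}


def sketch_clone(estimator):
    """Build a fresh copy by calling the constructor with get_params()."""
    return type(estimator)(**estimator.get_params())


class BrokenDebug(SketchEstimator):
    def __init__(self, myname="HELP"):
        # Stored under a different name than the parameter, as in MyDebug.
        self._name = myname


class FixedDebug(SketchEstimator):
    def __init__(self, myname="HELP"):
        # Attribute name matches the parameter name, so get_params finds it.
        self.myname = myname


broken_copy = sketch_clone(BrokenDebug("MYDEBUG_1"))
fixed_copy = sketch_clone(FixedDebug("MYDEBUG_1"))

print(broken_copy._name)   # None: the clone's constructor received myname=None
print(fixed_copy.myname)   # MYDEBUG_1
```

If this is indeed the cause, storing the argument as self.myname = myname in MyDebug.__init__ (and reading self.myname in fit and transform) should make the fitted copies report MYDEBUG_1 and MYDEBUG_2. For the third question, ColumnTransformer.fit in this release appears to call fit_transform internally and discard the result, which would account for the transform call.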