Saturday, December 19, 2020

fastai.text NameError: name 'BaseTokenizer' is not defined

I am new to fastai and am trying to build a model by following the article "Using RoBERTa with fast.ai for NLP".

I tried to customize the tokenizer with the code below:

    from fastai.text import *
    from fastai.metrics import *
    from transformers import RobertaTokenizer

    class FastAiRobertaTokenizer(BaseTokenizer):
        """Wrapper around RobertaTokenizer to be compatible with fastai"""
        def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
            self._pretrained_tokenizer = tokenizer
            self.max_seq_len = max_seq_len
        def __call__(self, *args, **kwargs):
            return self
        def tokenizer(self, t:str) -> List[str]:
            """Adds Roberta bos and eos tokens and limits the maximum sequence length"""
            return [config.start_tok] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + [config.end_tok]

But I got this error message:

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-6-41070aae72d1> in <module>
    ----> 1 class FastAiRobertaTokenizer(BaseTokenizer):
          2     """Wrapper around RobertaTokenizer to be compatible with fastai"""
          3     def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
          4         self._pretrained_tokenizer = tokenizer
          5         self.max_seq_len = max_seq_len

    NameError: name 'BaseTokenizer' is not defined

  • fastai version: 2.1.8
  • torch version: 1.7.1
  • transformers version: 3.4.0
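
With these versions, one way to narrow the problem down might be to check which fastai submodules, if any, actually expose `BaseTokenizer` in the installed release. The tutorial appears to have been written against fastai v1, while fastai 2.x examples generally import from `fastai.text.all` rather than `fastai.text`. The sketch below is only a diagnostic: the helper `find_name` is hypothetical, and the candidate module paths are assumptions based on the v1 and v2 package layouts, so some of them may not exist in a given install.

```python
import importlib
from typing import List


def find_name(name: str, modules: List[str]) -> List[str]:
    """Return the candidate modules that import cleanly and expose `name`."""
    found = []
    for mod_name in modules:
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            # Module is absent in this install (or the package is missing); skip it.
            continue
        if hasattr(mod, name):
            found.append(mod_name)
    return found


# Assumed candidate locations; adjust to the fastai release that is installed.
candidates = [
    "fastai.text",
    "fastai.text.all",
    "fastai.text.core",
    "fastai.text.transform",
]
print(find_name("BaseTokenizer", candidates))
```

If the printed list is empty, the name is not available from any of these paths. If it names a module, importing from that module is worth trying, though the class may not have the same interface the tutorial expects.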

Has anyone run into this issue before?
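
Separately from the import error, the truncation logic inside the wrapper's `tokenizer` method can be checked on its own, without fastai or transformers. This is a minimal sketch under stated assumptions: `wrap_and_truncate` is a hypothetical helper, whitespace splitting stands in for `RobertaTokenizer.tokenize`, and `<s>` / `</s>` stand in for `config.start_tok` / `config.end_tok`, which the snippet in the question does not define.

```python
from typing import Callable, List

BOS_TOKEN = "<s>"   # assumption: stands in for config.start_tok
EOS_TOKEN = "</s>"  # assumption: stands in for config.end_tok


def wrap_and_truncate(
    tokenize: Callable[[str], List[str]],
    text: str,
    max_seq_len: int = 128,
) -> List[str]:
    """Tokenize, keep at most max_seq_len - 2 tokens, then add start/end markers."""
    tokens = tokenize(text)[: max_seq_len - 2]
    return [BOS_TOKEN] + tokens + [EOS_TOKEN]


# Whitespace splitting is a stand-in for the real subword tokenizer.
result = wrap_and_truncate(str.split, "a b c d e f", max_seq_len=5)
print(result)
```

Reserving two positions for the start and end markers keeps the final sequence within `max_seq_len`, which is what the slice `[:self.max_seq_len - 2]` in the question does.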

https://stackoverflow.com/questions/65373407/fastai-text-nameerror-name-basetokenizer-is-not-defined December 20, 2020 at 02:59AM
