Thursday, March 11, 2021

IndexError with huggingface pegasus tokenizer

I'm trying to run a Pegasus model and I receive the following error from self.model.generate(input_ids, max_length=10240, num_beams=5, early_stopping=True). I've tried increasing max_length and resizing the token embeddings, but neither has solved the issue.

    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    IndexError: index out of range in self

What solution is needed to fix this?

The code is fairly simple:

    self.tokenizer = PegasusTokenizer.from_pretrained(model_name)
    self.model = PegasusForConditionalGeneration.from_pretrained(model_name)
    input_ids = self.tokenizer(
        text_data, max_length=10240, truncation=True, return_tensors="pt",
    ).input_ids
    self.model.resize_token_embeddings(len(self.tokenizer))
    output = self.model.generate(
        input_ids, max_length=10240, num_beams=5, early_stopping=True,
    )
    summary = self.tokenizer.decode(output[0], skip_special_tokens=True)

Full traceback:

    Traceback (most recent call last):
      File "summarise_news.py", line 151, in summarise_news
        raw_article["content"]
      File "summarise_news.py", line 137, in _summarise_article_contents
        input_ids, max_length=10240, num_beams=5, early_stopping=True,
      File "/Users/Ï/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
        return func(*args, **kwargs)
      File "/Users/black/lib/python3.7/site-packages/transformers/generation_utils.py", line 847, in generate
        model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(input_ids, model_kwargs)
      File "/Users/black/lib/python3.7/site-packages/transformers/generation_utils.py", line 379, in _prepare_encoder_decoder_kwargs_for_generation
        model_kwargs["encoder_outputs"]: ModelOutput = encoder(input_ids, return_dict=True, **encoder_kwargs)
      File "/Users/black/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/Users/black/lib/python3.7/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 723, in forward
        embed_pos = self.embed_positions(input_shape)
      File "/Users/black/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/Users/black/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
        return func(*args, **kwargs)
      File "/Users/black/lib/python3.7/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 139, in forward
        return super().forward(positions)
      File "/Users/black/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "/Users/black/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    IndexError: index out of range in self
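The traceback passes through `self.embed_positions` in the encoder, which suggests the out-of-range index is a position index rather than a token id. That would also explain why `resize_token_embeddings` made no difference. Below is a minimal sketch of one possible workaround. It assumes that `model.config.max_position_embeddings` is the size of the position table for the checkpoint in use, and it truncates to that length instead of 10240. The helper names `clamp_max_length` and `summarise` are hypothetical and not part of the original code.

```python
def clamp_max_length(requested: int, position_limit: int) -> int:
    """Return the requested length, capped at the model's position limit."""
    return min(requested, position_limit)


def summarise(text_data: str, model_name: str, requested_max_length: int = 10240) -> str:
    """Summarise text_data, truncating the input to what the model can index."""
    # Imported lazily so clamp_max_length can be used without transformers installed.
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)

    # Assumption: max_position_embeddings is the longest sequence the
    # position embedding table can index for this checkpoint.
    limit = clamp_max_length(requested_max_length, model.config.max_position_embeddings)

    input_ids = tokenizer(
        text_data, max_length=limit, truncation=True, return_tensors="pt",
    ).input_ids
    output = model.generate(
        input_ids, max_length=limit, num_beams=5, early_stopping=True,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Truncation drops any text beyond the limit, so longer articles would need to be split into chunks and summarised separately if the whole content matters.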
https://stackoverflow.com/questions/66583106/indexerror-with-huggingface-pegasus-tokenizer March 11, 2021 at 09:02PM
