Sunday, February 7, 2021

How to extract the tree structure from NLTK tree without labels?

I'm using benepar to parse sentences in French. I would like to get a tree-structured syntactic representation that keeps the bracketing of constituents such as NP and PP but drops all labels.

For example

  • Original Sentence:

    A man with a red helmet on a small moped on a dirt road .

  • Desired output:

    ( ( ( A man ) ( with ( a red helmet ) ) ) ( on ( ( a small moped ) ( on ( a dirt road ) ) ) ) . )

  • Parsed output:

    (NP (NP (DT A) (NN man)) (PP (IN with) (NP (NP (DT a) (JJ red) (NN helmet)) (PP (IN on) (NP (DT a) (JJ small) (VBN moped))) (PP (IN on) (NP (DT a) (NN dirt) (NN road))))) (. .))

    (SENT (NP (DET Un) (NC homme) (PP (P avec) (NP (DET un) (NC casque) (AP (ADJ rouge))) (PP (P sur) (NP (DET une) (ADJ petite) (NC mobylette)))) (PP (P sur) (NP (DET un) (NC+ (NC chemin) (P de) (NC terre))))) (PONCT .))

The code I wrote to produce the parsed output:

    import spacy
    from benepar.spacy_plugin import BeneparComponent

    nlp = spacy.load('en')
    nlp.add_pipe(BeneparComponent('benepar_en'))
    doc = nlp('A man with a red helmet on a small moped on a dirt road .')

    sent = list(doc.sents)[0]
    print(sent._.parse_string)
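
One possible way to get the label-free bracketing is to work directly on the bracketed string that `parse_string` returns. The following is only a minimal sketch, not a benepar or NLTK feature: the helper names `read_node`, `render` and `strip_labels` are made up for illustration, and it assumes that every opening bracket is immediately followed by a label and that no token contains a parenthesis. Nodes with a single child (POS tags such as `(DT A)`, or unary chains such as `(AP (ADJ rouge))`) are collapsed so that only real groupings keep their brackets.

```python
import re

# Hypothetical helpers for illustration; they are not part of benepar or NLTK.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def read_node(tokens, pos):
    """Read one bracketed node starting at tokens[pos] == "(".

    Returns (children, next_pos). The label right after "(" is skipped,
    so the result is a nested list containing only words and sub-lists.
    """
    pos += 2  # skip "(" and the label (assumed to be present)
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = read_node(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return children, pos + 1  # step over the closing ")"


def render(node):
    """Format a label-free nested list, collapsing single-child nodes."""
    if isinstance(node, str):
        return node
    if len(node) == 1:
        return render(node[0])
    return "( " + " ".join(render(child) for child in node) + " )"


def strip_labels(parse_string):
    """Convert a labelled bracket string into an unlabelled one."""
    tokens = TOKEN.findall(parse_string)
    tree, _ = read_node(tokens, 0)
    return render(tree)


if __name__ == "__main__":
    parsed = (
        "(NP (NP (DT A) (NN man)) (PP (IN with) (NP (NP (DT a) (JJ red) "
        "(NN helmet)) (PP (IN on) (NP (DT a) (JJ small) (VBN moped))) "
        "(PP (IN on) (NP (DT a) (NN dirt) (NN road))))) (. .))"
    )
    print(strip_labels(parsed))
```

The result follows whatever attachment the parser chose, so it can differ from a hand-written target bracketing. If the tree is already an `nltk.Tree`, the same recursion as in `render` should apply to it directly, since a `Tree` is a list of children whose leaves are plain strings.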
https://stackoverflow.com/questions/66095091/how-to-extract-the-tree-structure-from-nltk-tree-without-labels February 08, 2021 at 10:07AM
