I'm using benepar (reference here) to parse sentences in French. I would like to get a tree-structured syntax representation that uses NP and PP constituents as the divisions, without any extra labels.
For example:
Original Sentence:
A man with a red helmet on a small moped on a dirt road .
Desired output:
( ( ( A man ) ( with ( a red helmet ) ) ) ( on ( ( a small moped ) ( on ( a dirt road ) ) ) ) . )
Parsed output (the English sentence, then its French counterpart):
(NP (NP (DT A) (NN man)) (PP (IN with) (NP (NP (DT a) (JJ red) (NN helmet)) (PP (IN on) (NP (DT a) (JJ small) (VBN moped))) (PP (IN on) (NP (DT a) (NN dirt) (NN road))))) (. .))
(SENT (NP (DET Un) (NC homme) (PP (P avec) (NP (DET un) (NC casque) (AP (ADJ rouge))) (PP (P sur) (NP (DET une) (ADJ petite) (NC mobylette)))) (PP (P sur) (NP (DET un) (NC+ (NC chemin) (P de) (NC terre))))) (PONCT .))
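One possible way to get from a labeled parse string to an unlabeled bracketing is to post-process the string itself. The following is a minimal sketch, assuming the input is a well-formed bracketed string like the two above. The helper names `parse_brackets` and `strip_labels` are hypothetical and are not part of benepar or NLTK. It collapses single-child nodes (for example `(DT A)` or `(AP (ADJ rouge))`) into the bare word and drops every label. It keeps the parser's attachment decisions, so where the parser attaches a PP differently from the desired output, the result will differ in the same way.

```python
import re


def parse_brackets(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    # Tokens are "(", ")", or any run of characters without spaces/parens.
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]  # first token after "(" is the node label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def strip_labels(node):
    """Render a parsed tree as an unlabeled bracketing."""
    if isinstance(node, str):
        return node
    _label, children = node
    if len(children) == 1:
        # Single-child node (POS tag or unary chain): keep only the child.
        return strip_labels(children[0])
    return "( " + " ".join(strip_labels(c) for c in children) + " )"


if __name__ == "__main__":
    parsed = (
        "(NP (NP (DT A) (NN man)) (PP (IN with) (NP (NP (DT a) (JJ red) "
        "(NN helmet)) (PP (IN on) (NP (DT a) (JJ small) (VBN moped))) "
        "(PP (IN on) (NP (DT a) (NN dirt) (NN road))))) (. .))"
    )
    print(strip_labels(parse_brackets(parsed)))
```

For the English parse above, this prints `( ( A man ) ( with ( ( a red helmet ) ( on ( a small moped ) ) ( on ( a dirt road ) ) ) ) . )`. Both PPs stay inside the "helmet" NP because that is where the parser attached them.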
The code I wrote to produce the parsed output:
import spacy
from benepar.spacy_plugin import BeneparComponent

# Load the English spaCy model and add the benepar constituency parser
nlp = spacy.load('en')
nlp.add_pipe(BeneparComponent('benepar_en'))

doc = nlp('A man with a red helmet on a small moped on a dirt road .')
sent = list(doc.sents)[0]
# Print the labeled bracketed parse of the first sentence
print(sent._.parse_string)

https://stackoverflow.com/questions/66095091/how-to-extract-the-tree-structure-from-nltk-tree-without-labels
February 08, 2021 at 10:07AM