I am trying to spot the differences between one sentence and the next in my dataset. I am using the following code:

    import pandas as pd

    data = ['An empty world', 'So the word is', 'So word is', 'No word is']
    df = pd.DataFrame(data, columns=['phrase'])

    bold = lambda x: f'<b>{x}</b>'

    def highlight_shared(string1, string2, format_func):
        shared_toks = set(string1.split(' ')) & set(string2.split(' '))
        return ' '.join([format_func(tok) if tok in shared_toks else tok for tok in string1.split(' ')])

    highlight_shared('the cat sat on the mat', 'the cat is fat', bold)

    df['previous_phrase'] = df.phrase.shift(1, fill_value='')
    df['tokens_shared_with_previous'] = df.apply(lambda x: apply_formats(x.phrase, x.previous_phrase), axis=1)

    from IPython.core.display import HTML
    HTML(df.loc[:, ['phrase', 'tokens_shared_with_previous']].to_html(escape=False))

I want to get these results by checking the similarity with fuzzywuzzy or cosine distance, in order to get information about which word changes position from one sentence to the next. For example, if I consider this dataset:

    SEQ
    An empty world
    So the word is
    So word is
    No word is

The similarity between the first row and the second one is 0. Rows 2 and 3 are similar: they contain almost the same words in the same positions. I would like to visualize this change (the missing word), if possible.
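To make the kind of output I am after more concrete, here is a rough sketch. It does not use fuzzywuzzy or cosine distance; it swaps in `SequenceMatcher` from Python's standard `difflib` module, applied to word tokens. The helper name `token_diff` and the `<del>`/`<ins>` markup are my own assumptions, chosen only for illustration.

```python
from difflib import SequenceMatcher


def token_diff(previous, current):
    """Return (similarity ratio, marked-up text) for two phrases.

    Hypothetical helper: words only in `previous` are wrapped in <del>,
    words only in `current` are wrapped in <ins>.
    """
    prev_toks, cur_toks = previous.split(), current.split()
    matcher = SequenceMatcher(None, prev_toks, cur_toks)
    parts = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == 'equal':
            parts.extend(cur_toks[j1:j2])
        if op in ('delete', 'replace'):
            parts.extend(f'<del>{tok}</del>' for tok in prev_toks[i1:i2])
        if op in ('insert', 'replace'):
            parts.extend(f'<ins>{tok}</ins>' for tok in cur_toks[j1:j2])
    return matcher.ratio(), ' '.join(parts)


phrases = ['An empty world', 'So the word is', 'So word is', 'No word is']
for previous, current in zip(phrases, phrases[1:]):
    ratio, marked = token_diff(previous, current)
    print(f'{ratio:.2f}  {marked}')
```

For rows 2 and 3 this marks `the` as the missing word, and for rows 1 and 2 the ratio is 0 because no word is shared.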
I am getting an error because of apply_formats (NameError: name 'apply_formats' is not defined). Do you know how I can replace it?
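One possible replacement, as a sketch only: I am assuming that `apply_formats` was meant to be a call to `highlight_shared` with `bold` as the formatter, since `apply_formats` is not defined anywhere in my code. The example below uses plain lists instead of the DataFrame so it runs without pandas; the `previous` list mimics `shift(1, fill_value='')`.

```python
def bold(tok):
    return f'<b>{tok}</b>'


def highlight_shared(string1, string2, format_func):
    # Format every token of string1 that also appears in string2.
    shared_toks = set(string1.split(' ')) & set(string2.split(' '))
    return ' '.join(format_func(tok) if tok in shared_toks else tok
                    for tok in string1.split(' '))


phrases = ['An empty world', 'So the word is', 'So word is', 'No word is']
# Same idea as df.phrase.shift(1, fill_value=''): each phrase paired with the one before it.
previous = [''] + phrases[:-1]

# Assumed stand-in for the undefined apply_formats(phrase, previous_phrase).
result = [highlight_shared(cur, prev, bold) for cur, prev in zip(phrases, previous)]
for line in result:
    print(line)
```

Under the same assumption, the failing line in the DataFrame version would become `df.apply(lambda x: highlight_shared(x.phrase, x.previous_phrase, bold), axis=1)`.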
https://stackoverflow.com/questions/65571899/how-to-replace-apply-formats-in-python-3-0 January 05, 2021 at 09:05AM