2021年1月2日星期六

include extra row before duplicated rows

I have two data frames SF and OF.

SF:        PartNumber  ParentPartNumber   Webname       Brand        Value_Size     Full Description               ImagePath                      Short Description     Weight       RetailPriceEUR    Isfeatured              2.5            2                Sidi         Si              S     Honeycomb elastic          https://link1,https://link2           Honey               2.3             331           yes       2.6            2                Sidi         Si              M     Honeycomb elastic          https://link1,https://link2            Honey              2.3             331           yes       2.7            2                Sidi         Si              L     Honeycomb elastic          https://link1,https://link2           Honey               2.3             331           yes       3.2            3                Shoei        Sho            S      E.Q.R.S.                   https://link3                          ERQS               1.5             331           yes       3.3            3                Shoei        Sho            M      E.Q.R.S.                   https://link3                          ERQS               1.5             331           yes       2.9            2                Sidi         Si              XL   Honeycomb elastic           https://link1,https://link2            Honey              2.3             331           yes  

OF:

SKU      Published         Name     Parent        Size     Full Description           ImagePath                          ShortDescription       Weight           RetailPriceEUR            Isfeatured           height   4          1              Bec   8          1              Lin      

What I want to do is to add an extra row before each duplicated row present in SF and append it to the OF data frame. For example, if there are duplicates in a parent like 2,2,3,3 the first row of 2 and the second row of 2 needs to be copied, and in addition to all rows, there have to be an extra one added before them with info as in the description. So the end result should look like

Result (SF rows appended in OF):

   SKU      Published         Name      Parent           Size             Full Description           ImagePath                          ShortDescription       Weight           RetailPriceEUR            Isfeatured           height   4          1              Bec   8          1              Lin   2          1              Sidi                        S,M,L,XL          Honeycomb elastic         https://link1,https://link2                                                                            yes   2.5        0              Honey        2               S                                                                                 Honey                2.3              331                        yes   2.6        0              Honey        2               M                                                                                 Honey                2.3              331                        yes   2.7        0              Honey        2               L                                                                                 Honey                2.3              331                        yes   2.9        0              Honey        2               XL                                                                                Honey                2.3              331                        yes   3 (extra)  1              Sho                         S,M                E.Q.R.S.                 https://link3                                                                                          yes   3          0              ERQS         3               S                                                                                 EQRS                 1.5               33                        yes   3          0              ERQS         3               M                                                                                 ERQS                 1.5               33                        yes         

This is what I have done so far

# create a DataFrame keep SF all duplicated rows  cond = SF.duplicated(keep=False)  df = SF.loc[cond, ['PartNumber', 'ParentPartNumber', 'Webname' , 'Value_Size']].copy()  df['Published'] = 0    # rename columns, and sorted columns  df.columns = ['SKU', 'Parent', 'Name', 'Size', 'Published']  df = df[OF.columns].copy()    # drop duplicates(keep the first duplicate row), and assign 1 to Published column  dfn = df.drop_duplicates(keep='first').copy()        # get rows with duplicated ParentPartNumber     SFs = SFs[SFs['ParentPartNumber'].duplicated(keep = False)]    #group by size  gk =( SFs.groupby('ParentPartNumber'['Size'].agg(','.join).reindex(SFs['ParentPartNumber'].unique()).reset_indx())    # drop duplicates(keep the first duplicate row),  SFs = SFs.drop_duplicates(subset='ParentPartNumber', keep='first').copy()  dfn['SKU'] = SFs['ParentPartNumber'].values  dfn['Name'] = SFs['WebName'].values  dfn['Attribute 1 value(s)'] = gk['Size'].values  dfn['Published'] = 1    # append first duplicate row to df, and sort index to keep row order  dfn = dfn.append(df).sort_index()    # append the duplicates rows to OF  print(OF.append(dfn))       

The issue is it works well with small sample size, however, for large files, it's not matching ParentPartNumber well and producing results where SKU = 3 should be below 3(extra) not above it

SKU      Published         Name      Parent           Size   4          1              Bec   8          1              Lin   2          1              Sidi                        S,M,L,XL   2.5        0              Honey        2               S   2.6        0             Honey        2               M   2.7        0              Honey        2               L   2.9        0              Honey        2               XL   *3          0              Sho         3               S*   3 (extra)  1              ERQS                         S,M   3          0              ERQS         3               M  

EDIT:

I have a source file and have to perform some operations given below to obtain the dataframe shown above. Please note that some of the variables are missing from the sample file. The operations are:

Create new row in OF with values as follows   OF Type = variable  OF SKU = SF ParentPartNumber  OF Name = SF WebName  OF YMM = ,0,0,,  OF Published = 1  OF Is featured = 0  OF Visibility in catalog = visible  OF Description = SF FullDescription  OF Tax status = taxable  OF In stock? = 1  OF Backorders allowed? = 0  OF Sold individually? = 0  OF Allow customer reviews? = 1  OF Images = SF ImagePath (when there is more than 1 link in SF replace | with , (comma))  OF Position = 0  OF Attribute = Size  OF Attribute 1 value(s) = all values from SF Size that belong to the same ParentPartNumber separated with comma  OF Attribute 1 visible = 0  OF Attribute 1 global = 0    then below that row copy all rows that belong to this ParentPartNumber as follows     OF Type = variation   OF SKU = SF PartNumber  OF Name = SF ShortDescription   OF YMM = ,0,0,,  OF Published = 0  OF Is featured = 0  OF Visibility in catalog = visible  OF Tax status = taxable  OF Tax class = parent  OF In Stock = 1   OF Backorders allowed? = 0  OF Sold individually? = 0  OF Weight (kg) = SF Weight  OF Allow customer reviews? = 0  OF Regular price = SF RetailPriceEUR  OF Parent = SF ParentPartNumber  
https://stackoverflow.com/questions/65546310/include-extra-row-before-duplicated-rows January 03, 2021 at 11:09AM

没有评论:

发表评论