I have two data frames SF and OF.
SF: PartNumber ParentPartNumber Webname Brand Value_Size Full Description ImagePath Short Description Weight RetailPriceEUR Isfeatured 2.5 2 Sidi Si S Honeycomb elastic https://link1,https://link2 Honey 2.3 331 yes 2.6 2 Sidi Si M Honeycomb elastic https://link1,https://link2 Honey 2.3 331 yes 2.7 2 Sidi Si L Honeycomb elastic https://link1,https://link2 Honey 2.3 331 yes 3.2 3 Shoei Sho S E.Q.R.S. https://link3 ERQS 1.5 331 yes 3.3 3 Shoei Sho M E.Q.R.S. https://link3 ERQS 1.5 331 yes 2.9 2 Sidi Si XL Honeycomb elastic https://link1,https://link2 Honey 2.3 331 yes
OF:
SKU Published Name Parent Size Full Description ImagePath ShortDescription Weight RetailPriceEUR Isfeatured height 4 1 Bec 8 1 Lin
What I want to do is to add an extra row before each duplicated row present in SF and append it to the OF data frame. For example, if there are duplicates in a parent like 2,2,3,3 the first row of 2 and the second row of 2 needs to be copied, and in addition to all rows, there have to be an extra one added before them with info as in the description. So the end result should look like
Result (SF rows appended in OF):
SKU Published Name Parent Size Full Description ImagePath ShortDescription Weight RetailPriceEUR Isfeatured height 4 1 Bec 8 1 Lin 2 1 Sidi S,M,L,XL Honeycomb elastic https://link1,https://link2 yes 2.5 0 Honey 2 S Honey 2.3 331 yes 2.6 0 Honey 2 M Honey 2.3 331 yes 2.7 0 Honey 2 L Honey 2.3 331 yes 2.9 0 Honey 2 XL Honey 2.3 331 yes 3 (extra) 1 Sho S,M E.Q.R.S. https://link3 yes 3 0 ERQS 3 S EQRS 1.5 33 yes 3 0 ERQS 3 M ERQS 1.5 33 yes
This is what I have done so far
# create a DataFrame keep SF all duplicated rows cond = SF.duplicated(keep=False) df = SF.loc[cond, ['PartNumber', 'ParentPartNumber', 'Webname' , 'Value_Size']].copy() df['Published'] = 0 # rename columns, and sorted columns df.columns = ['SKU', 'Parent', 'Name', 'Size', 'Published'] df = df[OF.columns].copy() # drop duplicates(keep the first duplicate row), and assign 1 to Published column dfn = df.drop_duplicates(keep='first').copy() # get rows with duplicated ParentPartNumber SFs = SFs[SFs['ParentPartNumber'].duplicated(keep = False)] #group by size gk =( SFs.groupby('ParentPartNumber'['Size'].agg(','.join).reindex(SFs['ParentPartNumber'].unique()).reset_indx()) # drop duplicates(keep the first duplicate row), SFs = SFs.drop_duplicates(subset='ParentPartNumber', keep='first').copy() dfn['SKU'] = SFs['ParentPartNumber'].values dfn['Name'] = SFs['WebName'].values dfn['Attribute 1 value(s)'] = gk['Size'].values dfn['Published'] = 1 # append first duplicate row to df, and sort index to keep row order dfn = dfn.append(df).sort_index() # append the duplicates rows to OF print(OF.append(dfn))
The issue is it works well with small sample size, however, for large files, it's not matching ParentPartNumber well and producing results where SKU = 3 should be below 3(extra) not above it
SKU Published Name Parent Size 4 1 Bec 8 1 Lin 2 1 Sidi S,M,L,XL 2.5 0 Honey 2 S 2.6 0 Honey 2 M 2.7 0 Honey 2 L 2.9 0 Honey 2 XL *3 0 Sho 3 S* 3 (extra) 1 ERQS S,M 3 0 ERQS 3 M
EDIT:
I have a source file and have to perform some operations given below to obtain the dataframe shown above. Please note that some of the variables are missing from the sample file. The operations are:
Create new row in OF with values as follows OF Type = variable OF SKU = SF ParentPartNumber OF Name = SF WebName OF YMM = ,0,0,, OF Published = 1 OF Is featured = 0 OF Visibility in catalog = visible OF Description = SF FullDescription OF Tax status = taxable OF In stock? = 1 OF Backorders allowed? = 0 OF Sold individually? = 0 OF Allow customer reviews? = 1 OF Images = SF ImagePath (when there is more than 1 link in SF replace | with , (comma)) OF Position = 0 OF Attribute = Size OF Attribute 1 value(s) = all values from SF Size that belong to the same ParentPartNumber separated with comma OF Attribute 1 visible = 0 OF Attribute 1 global = 0 then below that row copy all rows that belong to this ParentPartNumber as follows OF Type = variation OF SKU = SF PartNumber OF Name = SF ShortDescription OF YMM = ,0,0,, OF Published = 0 OF Is featured = 0 OF Visibility in catalog = visible OF Tax status = taxable OF Tax class = parent OF In Stock = 1 OF Backorders allowed? = 0 OF Sold individually? = 0 OF Weight (kg) = SF Weight OF Allow customer reviews? = 0 OF Regular price = SF RetailPriceEUR OF Parent = SF ParentPartNumber
https://stackoverflow.com/questions/65546310/include-extra-row-before-duplicated-rows January 03, 2021 at 11:09AM
没有评论:
发表评论