2021年5月4日星期二

Transposing partially only some elements of a dataframe conditioned to matching two specific variables

Good evening Stackoverflow experts.  

How can I transpose only a part of a dataframe to another part of the same dataframe conditioned to some particular values of some cells in the dataframe?

I have a dataframe in r with the following structure:

str(df)    # 'data.frame': 20 obs. of  11 variables:  #  $ Code1    : int  1 1 1 1 2 2 3 3 4 4 ...  #  $ SpeciesG1: chr  "Sp1" "Sp1" "Sp1" "Sp1" ...  #  $ Family   : chr  "Fam1" "Fam2" "Fam3" "Fam4" ...  #  $ Pct_feed : num  55.4 12.56 1.35 30.69 62.93 ...  #  $ Code2    : int  0 0 0 0 0 0 0 0 0 0 ...  #  $ SpeciesG2: chr  "0" "0" "0" "0" ...  #  $ Fam1     : int  0 0 0 0 0 0 0 0 0 0 ...  #  $ Fam2     : int  0 0 0 0 0 0 0 0 0 0 ...  #  $ Fam3     : int  0 0 0 0 0 0 0 0 0 0 ...  #  $ Fam4     : int  0 0 0 0 0 0 0 0 0 0 ...  #  $ Fam5     : int  0 0 0 0 0 0 0 0 0 0 ...  

The data looks as follows (this is only a small sample of a df that is several hundreds rows long):

df  #    Code1 SpeciesG1 Family Pct_feed Code2 SpeciesG2 Fam1 Fam2 Fam3 Fam4 Fam5  # 1      1       Sp1   Fam1    55.40     0         0    0    0    0    0    0  # 2      1       Sp1   Fam2    12.56     0         0    0    0    0    0    0  # 3      1       Sp1   Fam3     1.35     0         0    0    0    0    0    0  # 4      1       Sp1   Fam4    30.69     0         0    0    0    0    0    0  # 5      2       Sp1   Fam2    62.93     0         0    0    0    0    0    0  # 6      2       Sp1   Fam3    37.07     0         0    0    0    0    0    0  # 7      3       Sp2   Fam4    17.49     0         0    0    0    0    0    0  # 8      3       Sp2   Fam5    82.51     0         0    0    0    0    0    0  # 9      4       Sp2   Fam1    36.97     0         0    0    0    0    0    0  # 10     4       Sp2   Fam2    51.46     0         0    0    0    0    0    0  # 11     4       Sp2   Fam3    11.57     0         0    0    0    0    0    0  # 12     5       Sp3   Fam1    41.81     0         0    0    0    0    0    0  # 13     5       Sp3   Fam2     9.64     0         0    0    0    0    0    0  # 14     5       Sp3   Fam3    31.74     0         0    0    0    0    0    0  # 15     5       Sp3   Fam4     5.12     0         0    0    0    0    0    0  # 16     5       Sp3   Fam5    11.69     0         0    0    0    0    0    0  # 17     5       Sp4   Fam2    41.16     0         0    0    0    0    0    0  # 18     5       Sp4   Fam3    40.04     0         0    0    0    0    0    0  # 19     5       Sp4   Fam4     4.32     0         0    0    0    0    0    0  # 20     5       Sp4   Fam5    14.48     0         0    0    0    0    0    0  

I need to replace some of the zero values of columns 7-11 ('Fam1' ... 'Fam5'), in accordance to the following 3-step procedure:

Step 1: For all possible pairs of 'Code1' and 'SpeciesG1' I want the Pct_feed values replacing the zero values **matching ** the family names columns in columns 7→11 (i.e., 'Fam1', 'Fam2', 'Fam3', 'Fam4'  and' Fam5'), but expanding those values horizontally (or transposing, that is, all values in the same row).    Step 2: In that same single row created in Step 1, the values of 'Code1' and 'SpeciesG1' should replace the '0' values under columns 'Code2' and ' SpeciesG2'.    Step 3: All the rows, except the single row created in Step 1, with the same pair of values of 'Code1' and 'SpeciesG1', should be deleted.    The end result of this 3-step operation, as applied to df, should look as follows:    #Code1   SpeciesG1   Family  Pct_feed   Code2   SpeciesG2    Fam1    Fam2    Fam3    Fam4    Fam5  #1        Sp1        Fam1     55.40     1       Sp1          55.40   12.56  1.35     30.69    0  #2        Sp1        Fam2     62.93     2       Sp1          0       62.93  37.07    0        0  #3        Sp2        Fam4     6.11      3       Sp2          0       0      0        17.49    82.51  #4        Sp2        Fam1     36.97     4       Sp2          36.97   51.46  11.57    0        0  #5        Sp3        Fam1     41.81     5       Sp3          41.81   9.64   31.74    5.12     11.69  #5        Sp4        Fam2     41.16     5       Sp4          0       41.16  40.04    4.32     14.48    NOTE1: As a check, the sum of all Pct_feed values of a given pair of values of 'Code1' and 'SpeciesG1' should be 100.  NOTE2: I have looked into the dplyr package but I couldn't find a way of obtaining the results I need.    I would very much appreciate any help provided to solve this need.    Thank you very much    Jorge  
https://stackoverflow.com/questions/67394189/transposing-partially-only-some-elements-of-a-dataframe-conditioned-to-matching May 05, 2021 at 09:47AM

没有评论:

发表评论