Thursday, March 18, 2021

Assistance with bash script to correctly pull the required column and set proper header using filename regex

I have a folder with multiple files named 1000T.quant.sf, 1000G.quant.sf, 1001T.quant.sf, 1001G.quant.sf, and so on. My script pulls the first column once from one file, then loops over every file in the directory and pulls column 5 from each, building one overall matrix from those columns. The part that needs fixing is the header generation line: I want each column's header to be the string before .quant.sf in the corresponding filename, but the output currently has a double header. How can I resolve this?

Snippet:

cut -f 1 `ls *quant.sf | head -1` > tmp
for x in *quant.sf; do
  printf "\t" >> tsamples
  printf `echo $x | cut -d. -f 1` >> tsamples
  cut -f 5 $x | paste tmp - > tmp2
  mv tmp2 tmp
done
echo "" >> tsamples
cat tsamples tmp > transcipts.numreads
rm tsamples tmp

Current output

       1001G   1001T   1005G   1005T   1006G
Name    NumReads        NumReads        NumReads        NumReads        NumReads
ENST00000456328.2       12.090  0.000   0.000   0.000   1.545
ENST00000450305.2       0.000   0.000   0.000   0.000   0.000
ENST00000488147.1       620.145 204.533 451.949 250.643 437.618
ENST00000619216.1       0.000   0.000   0.000   0.000   0.000
ENST00000473358.1       0.000   3.680   0.000   1.000   0.000
ENST00000469289.1       4.990   0.000   0.000   0.000   0.000
ENST00000607096.1       0.000   0.000   0.000   0.000   0.000
ENST00000417324.1       0.000   0.000   0.000   0.000   0.000

Desired output:

Name                    1001G   1001T   1005G   1005T   1006G
ENST00000456328.2       12.090  0.000   0.000   0.000   1.545
ENST00000450305.2       0.000   0.000   0.000   0.000   0.000
ENST00000488147.1       620.145 204.533 451.949 250.643 437.618
ENST00000619216.1       0.000   0.000   0.000   0.000   0.000
ENST00000473358.1       0.000   3.680   0.000   1.000   0.000
ENST00000469289.1       4.990   0.000   0.000   0.000   0.000
ENST00000607096.1       0.000   0.000   0.000   0.000   0.000
ENST00000417324.1       0.000   0.000   0.000   0.000   0.000

One input file contents:

$ head 1005T.salmon_quant.sf
Name    Length  EffectiveLength TPM     NumReads
ENST00000456328.2       1657    1441.000        0.000000        0.000
ENST00000450305.2       632     417.000 0.000000        0.000
ENST00000488147.1       1351    1170.738        4.987413        250.643
ENST00000619216.1       68      69.000  0.000000        0.000
ENST00000473358.1       712     512.539 0.045452        1.000
ENST00000469289.1       535     323.000 0.000000        0.000
ENST00000607096.1       138     18.000  0.000000        0.000
ENST00000417324.1       1187    971.000 0.000000        0.000
ENST00000461467.1       590     376.000 0.000000        0.000
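
For reference, the double header seems to come from two places: cut -f 1 and cut -f 5 both keep each file's own header row (Name, NumReads), and the tsamples line then adds a second row of sample names on top. One possible way around it, shown here only as a minimal sketch, is to drop the separate tsamples file and instead swap each column's NumReads header for the sample name before pasting. The function name build_matrix and its two parameters are hypothetical, and the sketch assumes every file is tab-separated, has exactly one header row, and lists the same transcripts in the same order.

```shell
#!/usr/bin/env bash
# Sketch only: build a "Name + one NumReads column per sample" matrix.
# build_matrix, dir and out are hypothetical names, not part of the original script.
# Assumes tab-separated files with one header row and identical row order,
# because paste aligns rows purely by position.

build_matrix() {
    local dir=$1 out=$2
    local tmp tmp2 first=1 f sample
    tmp=$(mktemp)
    tmp2=$(mktemp)

    for f in "$dir"/*quant.sf; do
        [ -e "$f" ] || continue            # glob matched nothing

        sample=$(basename "$f")
        sample=${sample%%.*}               # text before the first dot, e.g. 1005T

        if [ "$first" -eq 1 ]; then
            cut -f 1 "$f" > "$tmp"         # first column, header "Name" included
            first=0
        fi

        # Emit the sample name in place of this file's "NumReads" header,
        # then the data rows of column 5, and append as a new column.
        { printf '%s\n' "$sample"; cut -f 5 "$f" | tail -n +2; } \
            | paste "$tmp" - > "$tmp2"
        mv "$tmp2" "$tmp"
    done

    cat "$tmp" > "$out"
    rm -f "$tmp" "$tmp2"
}
```

Under those assumptions, calling build_matrix . transcipts.numreads from the folder holding the files should write a single header row (Name followed by the sample names) above the data rows. Because the sample name is taken as everything before the first dot, a file such as 1005T.salmon_quant.sf would still be labelled 1005T.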
https://stackoverflow.com/questions/66701410/assistance-with-bash-script-to-correctly-pull-the-required-column-and-set-proper March 19, 2021 at 09:31AM
