I have a folder with multiple files. The files follow the naming convention 1000T.quant.sf, 1000G.quant.sf, 1001T.quant.sf, 1001G.quant.sf, and so on. The script I wrote needs a modification to the line that generates the header. Basically, the script pulls the first column once from the first file, then loops over all the files in the directory and pulls column 5 from each to build an overall matrix of those columns. The problem I ran into is setting the column headers properly. I want each column's header to be the string before .quant.sf in the file name, but currently I get a double header. How can I resolve this?
Snippet:
cut -f 1 `ls *quant.sf | head -1` > tmp
for x in *quant.sf; do
    printf "\t" >> tsamples
    printf `echo $x | cut -d. -f 1` >> tsamples
    cut -f 5 $x | paste tmp - > tmp2
    mv tmp2 tmp
done
echo "" >> tsamples
cat tsamples tmp > transcipts.numreads
rm tsamples tmp
Current output:
                   1001G    1001T    1005G    1005T    1006G
Name               NumReads NumReads NumReads NumReads NumReads
ENST00000456328.2  12.090   0.000    0.000    0.000    1.545
ENST00000450305.2  0.000    0.000    0.000    0.000    0.000
ENST00000488147.1  620.145  204.533  451.949  250.643  437.618
ENST00000619216.1  0.000    0.000    0.000    0.000    0.000
ENST00000473358.1  0.000    3.680    0.000    1.000    0.000
ENST00000469289.1  4.990    0.000    0.000    0.000    0.000
ENST00000607096.1  0.000    0.000    0.000    0.000    0.000
ENST00000417324.1  0.000    0.000    0.000    0.000    0.000
Desired output:
Name               1001G    1001T    1005G    1005T    1006G
ENST00000456328.2  12.090   0.000    0.000    0.000    1.545
ENST00000450305.2  0.000    0.000    0.000    0.000    0.000
ENST00000488147.1  620.145  204.533  451.949  250.643  437.618
ENST00000619216.1  0.000    0.000    0.000    0.000    0.000
ENST00000473358.1  0.000    3.680    0.000    1.000    0.000
ENST00000469289.1  4.990    0.000    0.000    0.000    0.000
ENST00000607096.1  0.000    0.000    0.000    0.000    0.000
ENST00000417324.1  0.000    0.000    0.000    0.000    0.000
One input file contents:
$ head 1005T.salmon_quant.sf
Name               Length  EffectiveLength  TPM       NumReads
ENST00000456328.2  1657    1441.000         0.000000  0.000
ENST00000450305.2  632     417.000          0.000000  0.000
ENST00000488147.1  1351    1170.738         4.987413  250.643
ENST00000619216.1  68      69.000           0.000000  0.000
ENST00000473358.1  712     512.539          0.045452  1.000
ENST00000469289.1  535     323.000          0.000000  0.000
ENST00000607096.1  138     18.000           0.000000  0.000
ENST00000417324.1  1187    971.000          0.000000  0.000
ENST00000461467.1  590     376.000          0.000000  0.000
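One possible direction, as a sketch rather than a definitive fix: the second header row appears because each file's own first line (Name / NumReads) passes through cut and paste along with the data. Dropping that line with tail -n +2 before cutting, and writing a single header line that starts with Name, should give the desired layout. The function name build_matrix and the output file name used below are hypothetical. The sketch also assumes that every file is tab-separated and lists the same transcripts in the same order, which the original paste approach already relies on.

```shell
# Hypothetical helper: build_matrix OUTFILE FILE...
# Writes a tab-separated matrix: transcript IDs plus column 5 of each FILE.
# Runs in a subshell ( ... ) so its variables do not leak into the caller.
build_matrix() (
    out=$1; shift
    tmp=$(mktemp) || exit 1
    tab=$(printf '\t')

    # Transcript IDs from the first file, skipping its "Name" header line.
    tail -n +2 "$1" | cut -f 1 > "$tmp"

    header="Name"
    for x in "$@"; do
        # Sample label = file name (without directory) up to the first dot,
        # e.g. 1001G.quant.sf -> 1001G
        base=${x##*/}
        header="$header$tab${base%%.*}"

        # Column 5 (NumReads) without its header line, added as a new column.
        tail -n +2 "$x" | cut -f 5 | paste "$tmp" - > "$tmp.new" &&
            mv "$tmp.new" "$tmp"
    done

    # Single header line first, then the data rows.
    { printf '%s\n' "$header"; cat "$tmp"; } > "$out"
    rm -f "$tmp"
)
```

It would be called from the directory holding the files, for example as: build_matrix transcripts.numreads *quant.sf. Building the header in a variable instead of appending to a tsamples file also avoids leftover header text if an earlier run was interrupted.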
https://stackoverflow.com/questions/66701410/assistance-with-bash-script-to-correctly-pull-the-required-column-and-set-proper March 19, 2021 at 09:31AM