Tuesday, March 9, 2021

Parallel BLAST program takes too long

When I run the following code I do not get even one BLAST result. Could someone let me know if they spot a bug?

from Bio.Blast import NCBIWWW
from Bio import SeqIO
from Bio.Blast import NCBIXML
from multiprocessing import Pool
import time


def blast_sequences_parallel(seq_record):
    result_handle = NCBIWWW.qblast("blastn", "nt", seq_record.seq, entrez_query='txid10239[viruses]')
    blast_records = NCBIXML.parse(result_handle)
    return blast_records


if __name__ == "__main__":
    file = "file.fa"
    get_number_of_seqs(file)
    seq_records = SeqIO.parse(file, "fasta")
    t1 = time.time()
    p = Pool()
    results = p.map(blast_sequences_parallel, seq_records)
    p.close()
    p.join()

    print("Pool took:", time.time() - t1)
    print(results)
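One concrete problem with the code above: `NCBIXML.parse` returns a generator, and `Pool.map` must pickle each worker's return value to send it back to the parent process. Generators cannot be pickled, so even when a qblast request succeeds, the result never makes it back intact; consuming the generator inside the worker (e.g. `return list(NCBIXML.parse(result_handle))`) avoids this. A minimal stdlib-only sketch of the pickling issue, with a plain generator standing in for the Biopython one (no network needed):

```python
import pickle


def records():
    # Stand-in for NCBIXML.parse(result_handle), which also returns a generator.
    yield "blast_record_1"
    yield "blast_record_2"


gen = records()
try:
    # multiprocessing does exactly this to every worker return value
    pickle.dumps(gen)
    print("picklable")
except TypeError:
    print("not picklable")  # generators cannot be pickled
```

Materializing the records into a list before returning makes the return value picklable and therefore safe to pass back through `Pool.map`.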

I have 73,000 sequences to run, so I was trying to make it faster. I am running it on a supercomputer. Any suggestions for how much memory I would need and how many cores/nodes? I have also tried the following command in the shell:
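Rather than one qblast request per sequence (73,000 separate HTTP submissions, which NCBI's servers will throttle regardless of how many cores you have), you can submit several sequences per request: `NCBIWWW.qblast` accepts a FASTA-formatted string containing multiple records. A hedged, stdlib-only sketch of a batching helper; the name `fasta_batches` and the batch size are my own choices, not from the question:

```python
def fasta_batches(path, batch_size=50):
    """Yield multi-record FASTA strings of up to batch_size sequences each."""
    batch, count = [], 0
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if count == batch_size:
                    yield "".join(batch)  # emit the completed batch
                    batch, count = [], 0
                count += 1
            batch.append(line)
    if batch:
        yield "".join(batch)  # final partial batch
```

Each yielded string could then be passed as the sequence argument, e.g. `NCBIWWW.qblast("blastn", "nt", batch, entrez_query='txid10239[viruses]')`, cutting the request count by the batch size.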

blastn -query file.fa -remote  

But I get an error message suggesting that I need to download a database. Is there a way to use the online server for the search? If there is, can I search only against virus genomes?
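The error is likely because no `-db` was given, so blastn looks for a local default database. For a remote search you name the database explicitly, and BLAST+ provides an `-entrez_query` option (valid only with `-remote`) that can restrict the search to a taxon. A sketch, assuming BLAST+ is installed; the exact Entrez field syntax (`[ORGN]`) and output file name are my assumptions:

```shell
# Remote BLAST against NCBI's nt database, restricted to viruses (taxid 10239).
# -entrez_query requires -remote; -outfmt 5 writes XML, matching the Python workflow.
blastn -query file.fa -db nt -remote \
       -entrez_query "txid10239[ORGN]" \
       -outfmt 5 -out results.xml
```

This sends the search to NCBI's servers, so no local database download is needed, but the same server-side throttling applies as with qblast.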

https://stackoverflow.com/questions/66558554/parallel-blast-program-takes-too-long March 10, 2021 at 12:50PM
