When I run the following code I do not get even a single BLAST result. Could someone let me know if they spot a bug?
```python
from Bio.Blast import NCBIWWW
from Bio import SeqIO
from Bio.Blast import NCBIXML
from multiprocessing import Pool
import time

def blast_sequences_parallel(seq_record):
    result_handle = NCBIWWW.qblast("blastn", "nt", seq_record.seq,
                                   entrez_query='txid10239[viruses]')
    blast_records = NCBIXML.parse(result_handle)
    return blast_records

if __name__ == "__main__":
    file = "file.fa"
    get_number_of_seqs(file)
    seq_records = SeqIO.parse(file, "fasta")
    t1 = time.time()
    p = Pool()
    results = p.map(blast_sequences_parallel, seq_records)
    p.close()
    p.join()
    print("Pool took:", time.time() - t1)
    print(results)
```

I have 73,000 sequences to run, so I was trying to make it faster. I am running it on a supercomputer. Any suggestions for how much memory I would need and how many cores/nodes? I have also tried the following command in the shell:
```
blastn -query file.fa -remote
```

But I get an error message suggesting that I need to download a database. Is there a way to use the online server for the search? If there is, can I search only against virus genomes?
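If I understand the BLAST+ docs correctly (I have not verified this against 73,000 sequences), the error is probably because no `-db` was given: with `-remote`, `-db nt` refers to NCBI's copy of the database rather than a local download, and `-entrez_query` can restrict the search to viruses. A sketch of the invocation, where the taxid filter and output filename are my assumptions:

```shell
# -remote sends the search to NCBI's servers; -db nt names their hosted
# database (nothing is downloaded locally). -entrez_query limits the
# search set; txid10239 is the Viruses taxon. results.txt is arbitrary.
blastn -query file.fa -db nt -remote \
    -entrez_query "txid10239[Organism:exp]" \
    -out results.txt
```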
https://stackoverflow.com/questions/66558554/parallel-blast-program-takes-too-long March 10, 2021 at 12:50PM