
Picard Matequery Slows Process To A Crawl

I'm looking to iterate through an indexed BAM file using Picard and perform various tests on both a read and its mate. For some of these tests I need the full SAMRecord for the mate, so I can't just use the getMate() methods of the record. I can read through the file using the iterator, but once I add a line that fetches the mate record the program slows to a crawl.

I've tried the SAMFileReader methods queryMate() and query(). The rest of the query methods are variants of those (queryContained, queryAlignmentStart, queryOverlapping) and all end up calling the same code path. I've traced one source of the slowdown to the creation of the BAMFileSpan in BAMFileReader.createIndexIterator(). It also appears that no matter what index regions are created, an iterator is created that must re-traverse the whole file rather than seeking to a file offset, as would be the case with samtools mpileup.

Is there a way to resolve this? Currently, adding one line to my read loop changes the normal read time of about 20 seconds to not completing within hours. A bare bones version of the code is below. As is, it should run very quickly, on the order of minutes even for large BAM files, while if you uncomment either of the methods for finding the mate record, it does not finish under ...
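For reference, here is a rough sketch of one common way to sidestep a per-read mate lookup. It is not the code from this post and it does not use the Picard API. The idea is to make a single pass and hold each read in a map keyed by read name until its mate shows up. The `Read` class, its fields, and `pairInOnePass` are hypothetical stand-ins for SAMRecord and the read loop, and the sketch assumes each read name occurs at most twice (no secondary or supplementary alignments).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatePairing {
    // Hypothetical stand-in for a SAMRecord: only what pairing needs.
    static final class Read {
        final String name;
        final boolean firstOfPair;
        final int start;

        Read(String name, boolean firstOfPair, int start) {
            this.name = name;
            this.firstOfPair = firstOfPair;
            this.start = start;
        }
    }

    // Single pass: park each read until a read with the same name arrives,
    // then emit the two as {firstOfPair, secondOfPair}.
    static List<Read[]> pairInOnePass(Iterable<Read> reads) {
        Map<String, Read> pending = new HashMap<>();
        List<Read[]> pairs = new ArrayList<>();
        for (Read r : reads) {
            Read mate = pending.remove(r.name);
            if (mate == null) {
                pending.put(r.name, r); // mate not seen yet
            } else if (r.firstOfPair) {
                pairs.add(new Read[] {r, mate});
            } else {
                pairs.add(new Read[] {mate, r});
            }
        }
        // Reads still in 'pending' here never met a mate in this input.
        return pairs;
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
            new Read("a", true, 100),
            new Read("b", false, 120),
            new Read("a", false, 300),
            new Read("b", true, 500));
        for (Read[] p : pairInOnePass(reads)) {
            System.out.println(p[0].name + ": " + p[0].start + " / " + p[1].start);
        }
    }
}
```

The trade-off is memory: the map grows with the number of reads whose mates lie far downstream, so on a coordinate-sorted file with many distant or cross-chromosome mates it can get large.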
