Picard Matequery Slows Process To A Crawl

August 6, 2013, 1:55 pm

I'm looking to iterate through an indexed BAM file using Picard and perform various tests on both a read and its mate. For some tests I need the full SAMRecord for the mate, so I can't just use the getMate() methods of the record. I can read through the file using the iterator, but once I add a line that retrieves the mate record, the program slows to a crawl.

I've tried the SAMFileReader methods queryMate() and query(). The other query methods (queryContained, queryAlignmentStart, queryOverlapping) are variants of these and all end up calling the same code path. I've traced one source of the slowdown to the creation of the BAMFileSpan in BAMFileReader.createIndexIterator(). It also appears that no matter which index regions are created, the resulting iterator must re-traverse the whole file rather than seeking to an offset, as samtools mpileup does.

Is there a way to resolve this? Currently, adding one line to my read loop changes the normal read time of about 20 seconds to not completing within hours. A bare-bones version of the code is below. As is, it should run very quickly (still on the order of minutes for large BAM files), while if you uncomment either of the methods for finding the mate record, it does not finish under ...
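
A common way around one random-access lookup per read is a single pass that holds each read in a map keyed by its read name until its mate turns up. The sketch below shows that idea in Python over SAM text lines (for example, the output of samtools view); it is not Picard code, and the function name is hypothetical, but the same pattern could be written in Java with a HashMap<String, SAMRecord> keyed on getReadName(). Memory use depends on how many reads are waiting for their mate: small for name-sorted input, larger for coordinate-sorted input with distant or inter-chromosomal mates.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def iter_mate_pairs(sam_lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (first_seen, second_seen) field lists for each read pair.

    One pass only: a read is buffered by QNAME until its mate arrives,
    so no per-read index query is needed.
    """
    pending: Dict[str, List[str]] = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x1:
            continue  # read is not paired
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        mate = pending.pop(qname, None)
        if mate is None:
            pending[qname] = fields  # wait for the mate
        else:
            yield mate, fields
```

Each yielded pair gives both complete records, so any test that needs the full mate record can run inside the loop over iter_mate_pairs().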

ChIPseq paired-end bowtie2 concern regarding biological replicates

August 1, 2014, 1:28 pm

I am working with paired-end ChIP-seq data where there is concern that one or more of the biological replicates may not be very good, but it is unknown which replicate may have a problem (I suspect at least one replicate in the data is poor). The first part of my question is very simple: what do you recommend I do to find this replicate, so that I can either discard it or fix it with some quality trimming on the ends?

For the moment, I tried using bowtie2 to trim 10 bp from the 5' and 3' ends of the reads in each of my samples, just to see whether this fixes my problem. By "problem" I mean that my final results (a gene list) do not come out as I would expect: there are no genes of a certain type that I am looking for, based on my biological intuition about what I should be seeing.

When I run bowtie2 with the trimming options set, I do get my .sam files, but my error file tells me:

(ERR): bowtie2-align died with signal 2 (INT)
20172305 reads; of these:
  20172305 (100.00%) were paired; of these:
    3064536 (15.19%) aligned concordantly 0 times
    13699211 (67.91%) aligned concordantly exactly 1 time
    3408558 (16.90%) aligned concordantly >1 times
    ----
    3064536 pairs aligned concordantly 0 times; of these:
      807633 (26.35%) aligned discordantly 1 time
    ----
    2256903 pairs aligned 0 times concordantly or discordantly; of these:
      4513806 mates make up the pairs; of these: ...
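
One rough way to compare replicates is to put their bowtie2 alignment summaries side by side; a replicate whose concordant rates stand apart from the others may deserve a closer look, though this is only one signal and not a full QC. The hypothetical helper below pulls the concordant percentages out of a summary like the one above. Note that signal 2 is SIGINT (an interrupt, as sent by Ctrl-C), so it is worth confirming that each run actually finished before comparing numbers.

```python
import re
from typing import Dict

# Patterns for the per-pair concordant lines of a bowtie2 summary.
PATTERNS = {
    "concordant_0": r"\(([\d.]+)%\) aligned concordantly 0 times",
    "concordant_1": r"\(([\d.]+)%\) aligned concordantly exactly 1 time",
    "concordant_multi": r"\(([\d.]+)%\) aligned concordantly >1 times",
}


def concordant_rates(summary: str) -> Dict[str, float]:
    """Return the concordant-alignment percentages found in a bowtie2 summary."""
    rates: Dict[str, float] = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, summary)
        if match:
            rates[key] = float(match.group(1))
    return rates
```

Running this on each replicate's error file and printing the results in one table makes an outlier easier to spot.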

Merging Illumina Paired End Reads

July 9, 2013, 12:40 am

Dear All,

I have a FASTQ dataset containing forward and reverse sequences obtained with the paired-end module of the Illumina platform. I am trying to merge these paired-end reads, and I have a question I would like to clear up before I proceed further: do I need to reverse-complement the reverse-read dataset in order to carry out the paired-end merging?

I have consulted a few papers and tutorials on this, but they do not mention anything about doing a reverse complement. I am a bit confused about this step. Kindly help me out!

Responses are highly appreciated!
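
Conceptually, the reverse read is sequenced from the opposite strand, so it has to be reverse-complemented before its sequence can be compared with the forward read's 3' end. Dedicated mergers such as FLASH or PEAR take the two files as delivered and do this internally. The toy sketch below (hypothetical function names, exact-match overlaps only, no quality handling, unlike real mergers) just illustrates that step.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge R1 with reverse-complemented R2 if R1's 3' end matches R2's start exactly.

    Tries the longest possible overlap first; returns None if no overlap of
    at least min_overlap bases matches.
    """
    r2_rc = reverse_complement(r2)
    for overlap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-overlap:] == r2_rc[:overlap]:
            return r1 + r2_rc[overlap:]
    return None
```

Without the reverse complement, R1 and R2 would only overlap by chance, which is why tools that expect the raw files should not be given a pre-reverse-complemented R2.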

BWA MEM mate pair rescue

August 28, 2014, 7:06 am

Greetings,

Can someone spell out what this option means? I have several guesses, but would rather ask than guess.

-P In the paired-end mode, perform SW to rescue missing hits only but do not try to find hits that fit a proper pair.

Does Every Read In Paired-End SAM File Have The 0x0001 Flag?

March 25, 2014, 4:40 pm

Hello everyone, I have a simple, generic, question about SAM format flags in paired-end Illumina data. Does every read in a SAM file from a paired-end sequencing run automatically have the 0x0001 flag?

Thanks!
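
Per the SAM specification, bit 0x1 means the template has multiple segments in sequencing. Whether every record carries it depends on how the reads were aligned (reads aligned in single-end mode, for instance, would not have it), so a quick empirical check is to count records with and without the bit. A minimal sketch over SAM text lines; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def flag_paired_counts(sam_lines: Iterable[str]) -> Counter:
    """Count alignment records with and without the 0x1 (paired) flag bit."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        counts["paired" if flag & 0x1 else "unpaired"] += 1
    return counts
```

If counts["unpaired"] is zero for a whole file, every record in it has the flag.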

How To Read Maq Paired-End Alignment Data Using Bioconductor Packages

February 21, 2013, 8:03 pm

Hi all, I have maq paired-end alignment files that I want to read into R. I have browsed several packages, and they all seem to depend on the ShortRead package from Bioconductor, which does not currently support paired-end reads. Does anybody know of any Bioconductor packages that support paired-end alignment data? Bioconductor has a lot of good coverage-plotting functions that I would like to use, so it would be great if someone could suggest a package that can read paired-end alignments.

I have also tried my own scripts for plotting coverage, and I want to overlay the coverage of all the chromosomes in one graph (using different colors, etc.). It would be great to know if anybody has implemented this too. Since the files are very large, plotting the data for all the chromosomes at the same time takes a lot of time. Below is code I found and modified for my needs; I use it for a single chromosome. How can I plot the data for all the chromosomes in one plot, and more efficiently?

data <- read.table(file = "croppedpileup.out", sep = "\t", header = FALSE)
colnames(data) <- c("pos", "consensus", "coverage")
depth <- mean(data[, "coverage"])
# depth now holds the mean (overall) coverage
# set the bin size
window <- 101
rangefrom <- 0
rangeto <- length(data[, "pos"])
data.smoothed<-runmed(data[,"coverage"],k=wi ...
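
To overlay all chromosomes, one option is to reduce the pileup to mean coverage per fixed-size bin for each chromosome before plotting, so the plot draws thousands of points instead of millions. The sketch below is in Python rather than R and is not the code from the post: it swaps the running median (runmed) for simple bin means, the function name is hypothetical, and it assumes each pileup row includes a chromosome name, which the three-column croppedpileup.out above does not have.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def binned_coverage(
    rows: Iterable[Tuple[str, int, int]], bin_size: int = 1000
) -> Dict[str, List[Tuple[int, float]]]:
    """Mean coverage per fixed-size bin, computed separately for each chromosome.

    rows: (chromosome, 1-based position, coverage) tuples, e.g. from a pileup.
    Returns {chrom: [(bin_start, mean_coverage), ...]} sorted by bin_start.
    """
    sums: Dict[str, Dict[int, List[int]]] = defaultdict(dict)
    for chrom, pos, cov in rows:
        start = ((pos - 1) // bin_size) * bin_size + 1  # first position of the bin
        acc = sums[chrom].setdefault(start, [0, 0])     # [coverage sum, count]
        acc[0] += cov
        acc[1] += 1
    return {
        chrom: [(start, total / n) for start, (total, n) in sorted(bins.items())]
        for chrom, bins in sums.items()
    }
```

Each chromosome's list can then be drawn as one colored line on a shared plot.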

Using Paired End And Orphaned Singles For De Novo Assembly

September 27, 2013, 9:14 am

I have been using FastX to process reads prior to de novo assembly and mapping. What I have discovered, and few have pointed out, is that FastX deletes reads in a way that leaves reads unpaired, which changes the order of the separate paired FASTQ files. While it is difficult to know whether this affects assembly with Trinity, it is definitely a problem for assembly with Velvet/Oases and for mapping with Bowtie or BWA: because low-quality reads have been deleted, the remaining reads are no longer in matching order and will not map as pairs. There are some workarounds provided by sfg.stanford.edu and others that separate the reads that are still paired and place the orphaned reads in a separate file.

But here is the problem: I would like to use paired reads in combination with single reads for de novo assembly. In Trinity, one specifies either --left/--right or --single, but you cannot do both.

Q1: Can any assembler use both paired and single reads at the same time for de novo assembly?
Q2: Has anyone else run into this problem? Here is a related post: http://seqanswers.com/forums/showthread.php?t=24076
Q3: This issue is going to eliminate FastX from my pipeline of assembly and mapping. It seems like this should be a bigger issue, but there is fairly little out there about it. Am I doing something wrong with FastX that is causin ...
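
The workaround mentioned above (keep the reads that are still paired, move orphans to a separate file) can be sketched as follows. These are hypothetical helpers, not the sfg.stanford.edu scripts: they index R2 by read name (dropping a trailing /1 or /2), then walk R1 so that the paired outputs stay in R1 order. All of R2 is held in memory, which may be too much for very large files.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (pair_id, four_lines) per FASTQ record.

    The pair ID is the first word of the header with a trailing /1 or /2
    removed. A truncated final record (fewer than 4 lines) is ignored.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        name = record[0][1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, record


def split_pairs(
    r1_lines: Iterable[str], r2_lines: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (paired R1 lines, paired R2 lines, orphan lines)."""
    mates: Dict[str, List[str]] = dict(read_fastq(r2_lines))
    out1: List[str] = []
    out2: List[str] = []
    singles: List[str] = []
    for name, record in read_fastq(r1_lines):
        mate = mates.pop(name, None)
        if mate is None:
            singles.extend(record)  # R1 read whose mate was removed
        else:
            out1.extend(record)
            out2.extend(mate)
    for record in mates.values():  # R2 reads whose mate was removed
        singles.extend(record)
    return out1, out2, singles
```

Open file handles can be passed directly as r1_lines and r2_lines, and each returned list written back out with writelines().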

Macs Raises Error: No Such File Or Directory

November 28, 2013, 12:02 pm

I am trying to run macs14 with sam files from paired-end data + control. Macs14 returns "No such file":

sb7904313:line2 $ macs14 -t /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam -c /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam 
-bash: macs14 -t /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam -c /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam : No such file or directory

Yet the files exist:

sb7904313:line2 $ ls -all /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam
-rw-r--r--  1 nn  staff  24776263204 28 Nov 11:51 /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam
sb7904313:line2 $ ls -all /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam
-rw-r--r--  1 nn  staff  14223812120 28 Nov 12:19 /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam

Macs14 seems to be correctly installed:

sb7904313:line2 $ macs14 --version
macs14 1.4.2 20120305

I am just stumped. Can you give advice?
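
In the error above, bash printed the entire command line as the name it could not find, which suggests the shell treated the whole line as a single word instead of splitting it at the spaces. One possible cause (an assumption, not confirmed in the post) is that the separators are look-alike characters such as a non-breaking space, which macOS types with Option+Space and which can also come along when a command is pasted from a formatted document; wrapping the whole line in quotes would produce the same message. The hypothetical helper below lists any unusual whitespace in a pasted command.

```python
import unicodedata
from typing import List, Tuple


def unusual_whitespace(command: str) -> List[Tuple[int, str]]:
    """Return (index, Unicode name) for whitespace other than space, tab, or newline."""
    return [
        (i, unicodedata.name(ch, hex(ord(ch))))
        for i, ch in enumerate(command)
        if ch.isspace() and ch not in " \t\n"
    ]
```

An empty list means the separators are ordinary spaces, so the cause lies elsewhere.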

Warnings In Bowtie Mapping

October 16, 2013, 10:23 pm

Hello, I am trying to use bowtie for short-read mapping on small synthetic data. My command is:

./bowtie -p 8 -t -S hg19 -1 synthetic_sample1.fq -2 synthetic_sample2.fq > bowtie.sam

The alignment stats are really bad:

Seeded quality full-index search: 00:03:12
# reads processed: 1000003
# reads with at least one reported alignment: 129 (0.01%)
# reads that failed to align: 999874 (99.99%)
Reported 129 paired-end alignments to 1 output stream(s)
Time searching: 00:03:17
Overall time: 00:03:17
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!

My error log contains the following warning for almost every read, I think, so it is a long list of similar messages:

Warning: Exhausted best-first chunk memory for read chrY_25607312_25607822_7:0:0_4:0:0_13d4/1 (patid 986138); skipping read

When I googled the warning, I saw suggestions to use --chunkmbs when running bowtie. I am not really sure what that does; I couldn't understand it from the manual. Still, I used it as suggested in one of the forums:

./bowtie -p 8 -t --chunkmbs 256 -S hg19 -1 synthetic_sample1.fq -2 synthetic_sample2.fq > bowtie.sam

Then my error log says:

Time loading reference: 00:00:01
Time loading forward index: 00:00:01
Time loading mirror index: 00:00:02
Error: Could not allocate ChunkPool of 268435456 bytes
Warning: Exhausted best-first chunk memory for read Error: Could not allocate ChunkPool ...
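
The failed allocation size can be checked against the option: 268435456 bytes is exactly 256 × 2^20, i.e. the 256 MB requested with --chunkmbs 256. The bowtie manual describes --chunkmbs as the memory given to each thread, so with -p 8 the total request would be about 2 GB; that per-thread reading is my interpretation of the manual, so treat the arithmetic below as a sketch under that assumption. The function names are hypothetical.

```python
def chunk_pool_bytes(chunkmbs: int) -> int:
    """Size of one chunk pool in bytes, treating 1 MB as 2**20 bytes."""
    return chunkmbs * 2**20


def total_chunk_mb(chunkmbs: int, threads: int) -> int:
    """Total chunk memory in MB, assuming each thread (-p) gets its own pool."""
    return chunkmbs * threads


print(chunk_pool_bytes(256), total_chunk_mb(256, 8))  # 268435456 2048
```

Under that assumption, lowering either -p or --chunkmbs reduces the total memory bowtie tries to allocate.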

How To Extract Information From Fastq Pair End Files

January 8, 2014, 2:40 am

dear BioStars users,

From my paired-end FASTQ files, I would like to extract how many times each read (sequence) occurs in the file.

The output could look like this:

my read (sequence) - how many times I found it in the FASTQ file:

CCGGCTCGC - 140x
CTTCGCGCC - 2x

I tried to use awk to compare all reads to each other, but it does not work very well :-(

Is there any tool or idea for comparing all reads to each other and extracting how many times each read occurs in the FASTQ file?

Thank you so much for any ideas and help! I hope my question is clear.

Paul.
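
Comparing every read with every other read grows quadratically with the number of reads; counting with a hash table needs only one pass. A minimal sketch in Python, assuming standard four-line FASTQ records (run it on each file of the pair separately); memory grows with the number of distinct sequences, and the function names are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, List


def count_sequences(lines: Iterable[str]) -> Counter:
    """Count how often each sequence occurs in FASTQ lines (4 lines per record)."""
    # Lines 1, 5, 9, ... (0-based) are the sequence lines.
    return Counter(line.strip() for line in islice(lines, 1, None, 4))


def format_counts(counts: Counter, top: int = 10) -> List[str]:
    """Format the most frequent sequences as 'SEQUENCE - Nx'."""
    return [f"{seq} - {n}x" for seq, n in counts.most_common(top)]
```

With a file handle, format_counts(count_sequences(open("reads_1.fastq"))) produces lines in the format shown above (the file name is only an example).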

Filter Paired-End SAM File For XT:A:U

April 13, 2012, 2:47 am

Dear all,

I have a sam file (BWA output, paired-end reads). I would like to retain only reads which are "properly paired". This I would do by:

samtools view -f 0x002 file.sam > file_filtered.sam

Additionally, I would like to retain only those pairs of reads where both reads have the XT:A:U tag. It is important to me that after the filtering step the pairs are still together (read1, read2, read1, read2, ...).

Any ideas how to do so?

Thanks for any help! Stefanie
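
A minimal sketch, assuming the SAM is in BWA sampe order with the two mates of each pair on consecutive lines (as in sampe output before any sorting): read two records at a time and keep both only when each has the proper-pair bit (0x2) and an XT:A:U tag. The input would be SAM text, for example from samtools view -h file.bam, and the output keeps the read1, read2 order. The function name is hypothetical.

```python
from typing import Iterable, Iterator, List


def both_unique_proper(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines, then both records of each pair where both mates
    are properly paired (0x2) and carry XT:A:U.

    Assumes mates are on adjacent lines; raises ValueError otherwise.
    """
    pending: List[str] = []
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        pending.append(line)
        if len(pending) < 2:
            continue
        first, second = pending
        pending = []
        a = first.rstrip("\n").split("\t")
        b = second.rstrip("\n").split("\t")
        if a[0] != b[0]:
            raise ValueError(f"mates not adjacent: {a[0]} vs {b[0]}")
        # Optional tags start at column 12 (index 11).
        if all(int(r[1]) & 0x2 and "XT:A:U" in r[11:] for r in (a, b)):
            yield first
            yield second
```

Secondary or supplementary records would break the adjacency assumption, which is why the function checks read names and stops rather than silently mis-pairing.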

Bwa Sampe Segmentation Fault

June 12, 2012, 9:40 am

Hi everyone, I'm running bwa in sampe mode, and after successfully processing >10M reads it fails with a segmentation fault (as follows) on what appears to be a set of poorly alignable reads. Any suggestions on how to overcome this problem would be much appreciated. Many thanks!

# chunk processed ok
[bwa_read_seq] 2.8% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (492, 529, 561)
[infer_isize] low and high boundaries: 354 and 699 for estimating avg and std
[infer_isize] inferred external isize from 55618 pairs: 523.901 +/- 55.549
[infer_isize] skewness: -0.798; kurtosis: 0.914; ap_prior: 2.38e-04
[infer_isize] inferred maximum insert size: 897 (6.71 sigma)
[bwa_sai2sam_pe_core] time elapses: 9.46 sec
[bwa_sai2sam_pe_core] changing coordinates of 9766 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 76039 out of 91346 Q17 singletons are mated.
[bwa_paired_sw] 5151 out of 16392 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 52.09 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 4.60 sec
[bwa_sai2sam_pe_core] print alignments... 1.93 sec
[bwa_sai2sam_pe_core] 11010048 sequences have been processed.

# failed chunk
[bwa_read_seq] 3.1% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] fail to infer insert size: too few good pairs
[bwa_sai2sam_pe_core] ti ...

Take A Subset Of A Fastq Paired-End Sample

March 19, 2013, 11:36 am

Hi,

I have two compressed paired-end FASTQ files from a HiSeq RNA-Seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take just a few thousand or a few million reads from each of them (keeping the pairs together) and use those files for testing and debugging.

The results should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!
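
A minimal sketch, assuming pair.1.fastq.gz and pair.2.fastq.gz list the same reads in the same order: read both files in lockstep, four lines per record, and keep each pair with a fixed probability using a seeded random generator, so the two outputs stay in sync and the subset is reproducible. The function name is hypothetical. For comparison, seqtk sample run on each file with the same -s seed is a commonly used tool for the same job.

```python
import gzip
import random
from itertools import islice


def subsample_pairs(in1: str, in2: str, out1: str, out2: str,
                    fraction: float, seed: int = 42) -> int:
    """Keep each read pair with probability `fraction`; return the number kept."""
    rng = random.Random(seed)
    kept = 0
    with gzip.open(in1, "rt") as f1, gzip.open(in2, "rt") as f2, \
            gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
        while True:
            rec1 = list(islice(f1, 4))  # one FASTQ record from each file
            rec2 = list(islice(f2, 4))
            if len(rec1) < 4 or len(rec2) < 4:
                break  # end of either file
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

For example, subsample_pairs("pair.1.fastq.gz", "pair.2.fastq.gz", "pair.test.1.fastq.gz", "pair.test.2.fastq.gz", 0.01) keeps roughly 1% of the pairs.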

What Does requireBothEndsMapped From The Rsubread Package Mean?

February 13, 2014, 7:05 am

Hi,

I am using featureCounts from the Rsubread package, and I am trying to understand what the requireBothEndsMapped option does. The manual says:

"logical indicating if both ends from the same fragment are required to be successfully aligned before the fragment can be assigned to a feature or meta-feature. This parameter is only appliable when isPairedEnd is TRUE."

My data is paired-end and I am counting miRNAs, so I provided the miRNA annotation as a GTF file and my BAM file as input.

Is requireBothEndsMapped applicable in my case? Can you explain it in simple terms?

Paired End Vs Single End Detection

May 30, 2012, 8:47 am

Hi, I am wondering if anyone has a subroutine or function (hopefully in Perl or pseudocode) to detect whether a fastq file is a shuffled paired-end or not. I think I've seen a few different syntaxes on headers and so I feel like it requires a little more knowledge on their formats than what I have right now.

Yes, I am still looking into it, but I am wondering if this has already been done, to save me time...
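
A heuristic sketch (in Python rather than Perl): read the first records and check that consecutive headers share a name and are tagged as mate 1 then mate 2. It recognizes two common Illumina header styles, the older @name/1 suffix and the CASAVA 1.8+ comment @name 1:N:0:index; other header styles (for example, some SRA downloads) are not covered, and the function names are hypothetical.

```python
import re
from itertools import islice
from typing import Iterable, Optional, Tuple

_SUFFIX = re.compile(r"^@(\S+)/([12])\b")         # @name/1
_CASAVA = re.compile(r"^@(\S+)\s+([12]):[YN]:")   # @name 1:N:0:index


def mate_tag(header: str) -> Optional[Tuple[str, str]]:
    """Return (read name, mate number) from a FASTQ header, or None if unrecognized."""
    for pattern in (_SUFFIX, _CASAVA):
        m = pattern.match(header)
        if m:
            return m.group(1), m.group(2)
    return None


def looks_interleaved(lines: Iterable[str], n_pairs: int = 1000) -> bool:
    """True if the first n_pairs record pairs alternate mate 1 / mate 2 of the same read."""
    headers = [line for i, line in enumerate(islice(lines, 8 * n_pairs)) if i % 4 == 0]
    if len(headers) < 2:
        return False
    for first, second in zip(headers[0::2], headers[1::2]):
        a, b = mate_tag(first), mate_tag(second)
        if a is None or b is None or a[0] != b[0] or (a[1], b[1]) != ("1", "2"):
            return False
    return True
```

Because it only samples the start of the file, a True result is suggestive rather than a guarantee for the whole file.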

Paired-End Bam Files

February 6, 2014, 12:55 am

Hi,

Given two BAM files from NGS data, how can one check whether they are the left and right BAM files from a paired-end mapping of the same sample? Thanks for the help.
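
One possible check: if the two BAM files came from mapping the left and right reads of the same sample separately, their read names (after dropping any /1 or /2 suffix) should largely coincide. The hypothetical helpers below collect names from SAM text (for example, the output of samtools view on each BAM) and report the shared fraction; a value near 1 is consistent with a matching pair of files, not proof of one. Both name sets are held in memory.

```python
from typing import Iterable, Set


def read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect QNAMEs from SAM text, dropping a trailing /1 or /2."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        name = line.split("\t", 1)[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        names.add(name)
    return names


def shared_fraction(a: Set[str], b: Set[str]) -> float:
    """Fraction of names in the smaller set that also occur in the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

Comparing header lines such as @RG and @PG between the two files can add supporting evidence.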

Forum: Mapping Of Ngs Short Reads

September 2, 2012, 7:45 am

This is a simple explanation of how short-read mapping works: http://www.youtube.com/watch?v=1ZyoI-4ObSA&feature=related (see the first 16 minutes). It helped me a lot to understand the basic idea of short-read mapping.

Collect Read Pairs Where At Least One Read Is Mapped

June 24, 2013, 4:27 pm

My question may be worded like another post, "Filtering multiple flags with SAMtools", but I think I mean the opposite:

I am trying to remove paired-end reads from a .SAM file where neither segment is mapped

But rather than removing anything, I want to collect all the read pairs where the forward read, the reverse read, or both reads are mapped. Then I will use bam2fastq to get the reads and assemble them.

I think there are pieces missing from my reference. I will use all these reads to try to assemble a better reference. So if one read maps to the reference, but its pair does not, that is a good read for me; the reverse read is possibly part of the sequence that is missing from my reference.

What SAM flags should I set?
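
What the question describes is excluding only records where both the read (0x4, read unmapped) and its mate (0x8, mate unmapped) are unmapped. Note that samtools view -F 12 does something different: it drops a record if either bit is set, keeping only pairs where both mates mapped. Recent samtools releases have a -G option that excludes records only when all of the given bits are set, so -G 12 may do this directly if your version has it. A minimal Python sketch of the same rule over SAM text; the function name is hypothetical.

```python
from typing import Iterable, Iterator

BOTH_UNMAPPED = 0x4 | 0x8  # read unmapped + mate unmapped


def at_least_one_mapped(sam_lines: Iterable[str]) -> Iterator[str]:
    """Keep header lines and every record except those where both the read
    and its mate are unmapped."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t", 2)[1])
        if (flag & BOTH_UNMAPPED) != BOTH_UNMAPPED:
            yield line
```

Records whose mate is unmapped (flag bit 0x8 only) are kept, which is exactly the case of interest for recovering sequence missing from the reference.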

X And Y Chromosome Crossover Position Alignments

March 25, 2013, 12:21 am

How do I find the genotypes on the X chromosome that match the Y SNPs listed in raw data from 23andMe, Ancestry, or FTDNA?

Is There An Elegant Way To Extract Only The Properly-Paired Reads In A Sam/Bam File?

April 29, 2013, 7:28 am

I know I should be filtering for the following flags: 99, 163, 83, 147, and I know that samtools would work to get all the pairs, for example: samtools view -f 99 -b in.bam. I was wondering if there was a more elegant way to do this than running samtools four times, once for each flag. It also occurred to me that I would probably have to sort the BAM files afterward to ensure that the pairs were in the same order, which means running the sort function four times as well.

I would appreciate knowing if there was a better way to do this.
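
All four values share the proper-pair bit: 99 = 64+32+2+1, 163 = 128+32+2+1, 83 = 64+16+2+1 and 147 = 128+16+2+1. So a single filter on bit 0x2 covers them (samtools view -b -f 0x2 in.bam > out.bam), and because a filter keeps records in their input order, no re-sorting is needed. One caveat: an aligner may set 0x2 on other flag combinations too, so this can return slightly more than the four values. The same single-pass logic in Python over SAM text, with a hypothetical function name:

```python
from typing import Iterable, Iterator


def properly_paired(sam_lines: Iterable[str]) -> Iterator[str]:
    """Keep header lines and records with the proper-pair bit (0x2) set,
    in their original order (like samtools view -h -f 0x2)."""
    for line in sam_lines:
        if line.startswith("@") or int(line.split("\t", 2)[1]) & 0x2:
            yield line
```

Since nothing is reordered, mates stay wherever the input put them, so a name-sorted input gives name-sorted output.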
