Coverage For Pair-End Rna-Seq, Extend Reads Or Not?

February 27, 2013, 8:02 pm

Hi all,

I have a question about computing region coverage from paired-end RNA-Seq data. As shown in the sketch below, should I compute region coverage from the reads as actually mapped, or from reads extended to span the whole fragment? The coverage of the impacted region may differ between the two, and I wonder which is more reasonable. I want to compute and compare coverage for small regions of about 100 bp, not regions as large as a gene. Also, one of the samples being compared was sequenced paired-end and the other single-end. Which way of computing coverage is more suitable in this case?

I have a RIP-Seq dataset and want to use this RNA-Seq dataset as a control to call peaks. The genome would be binned into 100 bp regions, and the coverage of each region would be computed for both RIP-Seq and RNA-Seq. Fisher's exact test would then be used to select regions significantly enriched in RIP-Seq. The regions are predefined, so what I want to ask is: "How many fragments were sequenced from this region?" If one of my regions happens to lie between the two ends of a fragment, its RNA-Seq coverage would be 0 if only the mapped tags are used, even though the region is surely covered by the fragment.

Thank you very much!

Reference                       ========--------------------------------------------------===========
Actually mapped reads            ^^^^                                                            $$$$
Extended reads                   ^^^^^^^                                                  $$$$$$$$$$$
Impacted r ...
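
The "how many fragments were sequenced from this region" idea can be sketched as follows. This is a minimal illustration, not a peak caller: the function names and the example coordinates are hypothetical, and it assumes each fragment is already known as a 0-based, half-open (start, end) interval, for example the leftmost mapped position of a proper pair and that position plus the template length.

```python
from collections import Counter

BIN_SIZE = 100  # bin width in bp, as described in the post


def fragment_bins(start, end, bin_size=BIN_SIZE):
    """Return the indices of all bins overlapped by the fragment [start, end)."""
    return range(start // bin_size, (end - 1) // bin_size + 1)


def count_fragments_per_bin(fragments, bin_size=BIN_SIZE):
    """Count, for each bin index, how many fragments overlap that bin."""
    counts = Counter()
    for start, end in fragments:
        for b in fragment_bins(start, end, bin_size):
            counts[b] += 1
    return counts


# Hypothetical fragments: one spans bins 0-3, the other bins 1-2.
fragments = [(0, 350), (120, 260)]
counts = count_fragments_per_bin(fragments)
print(sorted(counts.items()))
```

Note that for RNA-Seq a fragment whose mates fall in different exons would also be counted over the intron between them, so extending to the full fragment is only reasonable where the insert is not spliced. For the single-end sample the fragment end is unknown and would have to be estimated.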

↧

Sorting Fastq Files After Trimming (Orphans And Pe)

December 23, 2012, 2:37 pm

I have a batch of Illumina PE data that has been run through the FASTX trimmer and clipper. I am ready to map these reads, but I need to create two files for the paired-end reads (left and right reads in separate files) and a third file for the orphaned reads. Of course, the two paired-end files need to have their reads in the same order.

This has to be a common problem, but I can't seem to find a tool that parses fastq files in this way (I swear I searched the Biostar forum).

Any help would be greatly appreciated.
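
One way to do the split described above, as a minimal sketch: index the right-hand reads by name, walk the left-hand reads in order, and send anything without a mate to the orphan list. All function names here are hypothetical. The sketch assumes plain four-line FASTQ records and read names that differ only by a trailing /1 or /2 or by a comment after a space, and it holds the right-hand file in memory, which may not suit very large files.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)  # skip the '+' separator line
        qual = next(it)
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def read_key(header):
    """Drop the leading '@', any comment after a space, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs(left_records, right_records):
    """Return (paired_left, paired_right, orphans); pairs share the same order."""
    right_by_key = {read_key(r[0]): r for r in right_records}
    paired_left, paired_right, orphans = [], [], []
    for rec in left_records:
        mate = right_by_key.pop(read_key(rec[0]), None)
        if mate is None:
            orphans.append(rec)
        else:
            paired_left.append(rec)
            paired_right.append(mate)
    orphans.extend(right_by_key.values())  # right reads whose mate was dropped
    return paired_left, paired_right, orphans


# Tiny made-up example: r1 lost its right mate, r3 lost its left mate.
left = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII"]
right = ["@r2/2", "TTAA", "+", "IIII", "@r3/2", "CCGG", "+", "IIII"]
paired_left, paired_right, orphans = split_pairs(read_fastq(left), read_fastq(right))
print([r[0] for r in paired_left], [r[0] for r in paired_right], [r[0] for r in orphans])
```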

↧

Trimming Adapters For Paired-End Sequences

February 6, 2013, 10:41 am

Hi all,

I have Illumina paired-end FASTQ files. I was told to trim the first ~20 to 30 bp of read 2 because of the WGA adapters.

Can we find the adapters by looking at the quality scores? Which tool is good for trimming adapters while keeping the reads paired? Will there be any issues if I then use bowtie2 to align the trimmed sequences (with only one read of each pair trimmed) to the reference?

Following is the example of read2:

@HWI-ST1162:139:C0H7WACXX:5:1101:1865:1112 2:N:0:CGATGTA
GTCATGGTGTCTCTTCACAACAATGGAAACCCTAACTAAGACAAAGACTAATAGAAGTGTTTTTTTAGGAA

<9;>;>?1=>;=9=?########################################################

Thanks, Deepthi
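
Trimming a fixed number of bases from the start of read 2 does not by itself break pairing, because no record is removed or reordered. A minimal sketch of that operation is shown here; the function name and the example strings are hypothetical, and the number of bases to trim is whatever the sequencing provider recommends.

```python
def trim_read_start(seq, qual, n):
    """Remove the first n bases and their quality scores from one read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[n:], qual[n:]


# Made-up 10 bp read with the first 4 bases trimmed.
trimmed_seq, trimmed_qual = trim_read_start("ACGTACGTAC", "IIIIHHHHGG", 4)
print(trimmed_seq, trimmed_qual)
```

Applied to every record of the read 2 file, this leaves both files with the same number of records in the same order. Reads that end up empty or very short would need a separate decision, such as dropping both mates together.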

↧

How To Check If Illumina Fastq Is Single Or Paired End With Minimal Sequence Id

March 14, 2014, 8:26 am

Hi all, I am trying to check whether a FASTQ file is single-end or paired-end. From Wikipedia I gathered that the default format looks like this:

@HWUSI-EAS100R:6:73:941:1973#0/1

but in my case the sequence id is like

@HWUSI-EAS100R:6:73:941:1973

with the # part missing. Can I assume that it is single-end? I could not find a good source to learn about this. Could you also point me to one? Thanks
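
As a rough sketch of what can be read from the header alone, the hypothetical helper below looks for the two common Illumina conventions: a trailing /1 or /2 on the read name, and the newer style where the mate number starts the comment after a space (as in "2:N:0:CGATGTA"). It returns None when neither is present.

```python
import re


def mate_number(header):
    """Return 1 or 2 if the FASTQ header names a mate, otherwise None."""
    parts = header.split()
    # Older style: read name ends in /1 or /2, e.g. '...:1973#0/1'
    m = re.search(r"/([12])$", parts[0])
    if m:
        return int(m.group(1))
    # Newer style: comment after the space starts with '1:' or '2:'
    if len(parts) > 1:
        m = re.match(r"([12]):[YN]:", parts[1])
        if m:
            return int(m.group(1))
    return None


print(mate_number("@HWUSI-EAS100R:6:73:941:1973#0/1"))
print(mate_number("@HWUSI-EAS100R:6:73:941:1973"))
```

A result of None does not prove the run was single-end, since the suffix may simply have been stripped, and a result of 1 does not prove it was paired. Checking whether a second file with matching read names exists is usually more informative.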

↧

Download Large Paired-End Rna-Seq And Microarray Data Of The Same Sample (> 10 Gb Of *.Sra Files)?

June 28, 2012, 9:47 am

Hi,

I have been trying to find large paired-end RNA-Seq and microarray data from the same sample, but I wasn't able to find as much data as I wanted.

Could somebody please point out where I can find such data?

I am hoping for a data set that has both paired-end RNA-Seq data and microarray data for the same sample.

I am thinking of data sets with at least 10 GB of *.sra files.

Thanks!

↧

How To Interpret Crossbow Data

September 29, 2013, 6:36 pm

Hello,

How should I interpret Crossbow output? Is there any way to build a SAM or BAM file from it? I am looking into genome mapping. I know there are many tools available for genome mapping on a Hadoop cluster, but I like the Crossbow interface, which is very easy to use.

PS: I don't have a background in bioinformatics. I am an IT student working on Hadoop infrastructure to assess its performance for an organization.

Thanks

↧

Should We Dump Illumina Pair-End Mapping Results In Sam With Mapq=0, But Good Template Length

December 18, 2012, 5:23 am

Hi everyone! I am working on Illumina paired-end sequencing.

After mapping with bwa, I got a pair of reads with MAPQ=0, where both reads map to more than one place. However, the template length is fine, and this pair appears only once in the SAM file.

So, should we discard this mapped pair?

SPECIFIC_ID 83 Y 58900709 0 92M = 58900679 -122 AGTGCATTCCATTCCAGTCTCTTCAGTTCGATTCCATTCCATTCGTTTCGATTCCTTTCCATTCCAGCCCATTCCATTCCATTGCATTCCTT DDDCDCDDD DDDDDDDDDDDDEEC?FFFFFHHHHHJJIJIHJIJJJJIJJJJJJIJHGJJIHFJJJIGIJJIHIJIJHJJIJJIJJIFHFFH XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:9 X1:i:16 XM:i:1 XO:i:0 XG:i:0 MD:Z:83C8

SPECIFIC_ID 163 Y 58900679 0 92M = 58900709 122 AGAACCTTCCATTACACTCCCTTCCATTCCAGTGCATTCCATTCCAGTCTCTTCAGTTCGATTCCATTCCATTCGTTTCGATTCCTTTCCAT FHGHHIIJJ JGJIIJIJJIIJJIIJEIGEGHIIJJIJIGIIIFCD XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:2 X1:i:18 XM:i:0 XO:i:0 XG:i:0 MD:Z:92

Thanks very much!

↧

In Paired-End Data, If One Read Of The Pair Is Unmapped, Is That Pair Automatically Improper?

March 25, 2014, 4:44 pm

Hello everyone, I have a simple question about generic paired-end Illumina data. If one read of a pair is unmapped, does this automatically mean that the pair is improper and is (un)flagged in a SAM file?

Thanks!
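
The relevant bits of the SAM FLAG field can be inspected directly. The sketch below decodes the four bits that matter for this question; the bit values come from the SAM specification, while the function name and the example flags are only illustrative. Whether an aligner ever sets the proper-pair bit when one mate is unmapped is aligner-dependent, so it is worth checking on real output rather than assuming.

```python
PAIRED = 0x1         # template has multiple segments (read is paired)
PROPER_PAIR = 0x2    # each segment properly aligned according to the aligner
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def describe_pair(flag):
    """Decode the pairing-related bits of a SAM FLAG value."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "read_unmapped": bool(flag & UNMAPPED),
        "mate_unmapped": bool(flag & MATE_UNMAPPED),
    }


# 83: mapped read in a proper pair; 73: mapped read whose mate is unmapped;
# 133: the unmapped mate itself.
for flag in (83, 73, 133):
    print(flag, describe_pair(flag))
```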

↧

Paired-End Reads Alignment For Variant Calling ?

July 17, 2013, 2:55 pm

I'm trying to do variant calling (SNPs, indels) from exome-sequencing data, and the sequencing was done with paired-end reads. I would like to use BWA for mapping/alignment, followed by Picard and GATK for variant calling.

The question now is how to do the sequence alignment with BWA. Should I use the short paired-end reads to generate a single SAM file, like this:

bwa mem -M -v 1 -t 4 human_genome_ref.fasta read_For.fastq.gz read_Rev.fastq.gz > read_PE.sam

Is this okay? Or should I map the reads of each file to the reference separately?

Thanks a lot for your reply.

↧

Paired-End Protocol For Micrornaseq

July 5, 2013, 3:42 am

In another post, someone wanted to know how to analyze paired-end data and use it to predict microRNAs.

I have never heard of a paired-end protocol for miRNA-Seq and would be interested in more information. Does anyone know this protocol?

My questions would be:

Since the reads from mature miRNAs are very short, do the two mates completely overlap each other?
Is this protocol still strand specific? Is the first mate always the one on the correct strand?
I don't think that recent tools like miRDeep, or miRanalyzer can handle this information. Are there tools which can?
...

I would be really thankful for some input! :)

↧

Resampling Fastq Sequences Without Replacement

March 15, 2013, 12:35 pm

Hello, I want to extract a random sample (without replacement) of 7.5 million FASTQ sequences from Illumina sequencing data that contains about 30 million sequences in each of the two read files. I want to extract the same sequences from both files (e.g., if I extract sequence no. 31 from read 1, I want to extract sequence no. 31 from read 2 as well). How can I do this? Is there a script or module I can use? Any help would be appreciated. Thanks
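
A minimal sketch of paired sampling without replacement: draw one set of record positions, then keep exactly those positions from both files. The function names are hypothetical, and the sketch assumes both files list the pairs in the same order and that the total number of records is known in advance (it can be counted in a first pass).

```python
import random


def sample_indices(total, k, seed=0):
    """Pick k distinct record positions out of `total`, without replacement."""
    rng = random.Random(seed)
    return set(rng.sample(range(total), k))


def subsample(records, keep):
    """Yield only the records whose position is in `keep`, preserving order."""
    for i, rec in enumerate(records):
        if i in keep:
            yield rec


# Tiny made-up example: 10 pairs, keep 3 of them.
reads1 = [f"r{i}/1" for i in range(10)]
reads2 = [f"r{i}/2" for i in range(10)]
keep = sample_indices(10, 3, seed=42)
sub1 = list(subsample(reads1, keep))
sub2 = list(subsample(reads2, keep))
print(sub1, sub2)
```

Because the same `keep` set is applied to both files, the mates stay matched. For 7.5 million positions the set fits in memory on most machines, though it is not free. Fixing the seed makes the sample reproducible.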

↧

Difference Between "Mate Pair" And "Pair-End"

July 24, 2013, 2:55 am

As the title says, I can't tell the difference between these two concepts. :) Waiting for your help.

↧

Understanding Samtools Flagstat Output

October 23, 2013, 10:12 pm

The following is the output of the samtools flagstat command on a paired-end BAM file generated after running Picard MarkDuplicates.

7417232 + 0 in total (QC-passed reads + QC-failed reads)
287618 + 0 duplicates
4534962 + 0 mapped (61.14%:-nan%)
7417232 + 0 paired in sequencing
3708616 + 0 read1
3708616 + 0 read2
4528278 + 0 properly paired (61.05%:-nan%)
4534962 + 0 with itself and mate mapped

I am having difficulty understanding whether the duplicates are counted as pairs or as single reads. If there are a total of 7417232 pairs and 287618 of them are duplicates, then about 3.9% of the reads in my data are duplicates. Is my understanding correct?
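
The arithmetic can be checked from the numbers in the output above. In this output the read1 and read2 lines add up to the "in total" line, which suggests that every line counts individual reads rather than pairs. The short sketch below uses only those numbers; the variable names are made up for illustration.

```python
total_reads = 7417232  # "in total" line
duplicates = 287618    # "duplicates" line
read1 = 3708616        # "read1" line
read2 = 3708616        # "read2" line

pairs = total_reads // 2
dup_fraction = duplicates / total_reads

print(read1 + read2 == total_reads)  # True: the total counts reads
print(f"{pairs} pairs, {dup_fraction:.2%} of reads flagged as duplicates")
```

Under that reading, the file holds 3708616 pairs, and the 287618 duplicates are reads, not pairs.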

↧

Merging Illumina Paired End Reads

July 9, 2013, 12:40 am

Dear All,

I have a FASTQ dataset containing forward and reverse sequences obtained with the paired-end module of the Illumina platform. I am trying to merge these paired-end reads, and I have a question I would like cleared up before I proceed. Do I need to reverse complement the reverse reads before carrying out the paired-end merging?

I have consulted a few papers and tutorials on this, but none of them mention reverse complementing. I am a bit confused about this step. Kindly help me out!

Responses are highly appreciated!!!
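
To see why the reverse complement matters, here is a toy sketch of exact-match merging. The reverse read is sequenced from the opposite strand, so it only lines up with the forward read after being reverse complemented. Both functions and the example fragment are hypothetical, and real merging tools additionally tolerate mismatches and use quality scores. Such tools commonly take the two files as sequenced and do this step internally, which may be why tutorials do not mention it; the tool's documentation is the place to confirm.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=4):
    """Merge two mates on their longest exact overlap, or return None."""
    rc = reverse_complement(reverse)
    for n in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-n:] == rc[:n]:
            return forward + rc[n:]
    return None


# Made-up 18 bp fragment read 12 bp from each end.
fragment = "ATGCAGTCGATCGTTAGC"
forward = fragment[:12]
reverse = reverse_complement(fragment[-12:])
merged = merge_pair(forward, reverse)
print(merged == fragment)
```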

↧

Wgsim Mutations In Output After Setting Everything To 0

April 5, 2013, 3:55 pm

I was just wondering, is there any useful information on wgsim? A tutorial? Anything? I have been stuck with it for the last 2 weeks and I'm really not sure how to use it. I need it for a project of mine. For example, I downloaded a genome from NCBI and call wgsim like this:

./wgsim -e 0 -s 0 -N 1000 -1 30 -2 30 -r 0 -R 0 -X 0 -A 0 test_genome_one_row.fa read1.fa read2.fa

With this, I would expect all reads to be identical to parts of the genome, since I set all the error parameters to 0. But somehow I get reads with mutations (or something else, because they don't occur in the original genome). What is going on here? Can somebody please explain wgsim's arguments and how I can really control its behaviour? Thanks!

↧

join non-overlapping paired-end reads

June 17, 2014, 7:48 am

I have a batch of illumina paired end reads with a shorter than anticipated reverse sequence for a ~550bp amplicon.

The joining tools which I have used before (i.e. USEARCH, ea-utils) talk about merging the overlaps and discarding other sequences but is there a way to join forwards and reverses leaving a gapped interval?

Expected output:

>seq1

ATGCAGTCGATCGATCGACTGCATGCATCGATCGAATCGATCG--------------TGCAGTCGATCGTACG

>seq2

CAGTCTAGCTACGATCGATCGACTGCATACGTACGTACG-------------------CTGCATGCATCGTAG

>seq3

CTAGACTGATCGCTAGCCTAGCTACGATCGATCGACACTGCTGTCGACTGACTGATCGATCGATCGACTGCAT

Thanks
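
A gapped join like the expected output above can be sketched in a few lines, assuming the amplicon length is known so that the gap size can be computed. The function names, the gap character, and the example reads are all hypothetical, and whether the reverse read should be reverse complemented before joining depends on what the downstream tool expects.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_with_gap(forward, reverse, amplicon_length, gap_char="-"):
    """Join forward read, a gap of the missing length, and the reverse read."""
    rc = reverse_complement(reverse)
    gap = amplicon_length - len(forward) - len(rc)
    if gap < 0:
        raise ValueError("reads overlap; merge them instead of gap-joining")
    return forward + gap_char * gap + rc


# Made-up reads on a 16 bp amplicon: 6 + 6 bases read, 4 missing in between.
joined = join_with_gap("ATGCAG", "AAACCC", 16)
print(joined)
```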

↧

Aligning Paired-End Reads In Single-End Mode

February 15, 2013, 4:28 pm

Hello,

I have a question on how to align paired-end reads.

With very large FASTQ files, which make aligners like TopHat crash on a server with limited RAM, I have seen people align one mate of each pair at a time to prevent TopHat from crashing. I can't come up with an argument against doing this, but I am tempted to reason that one should always align paired-end reads in "paired-end mode".

My question is: How are downstream analyses affected if one aligns paired-end reads one pair mate at a time, setting BWA or Bowtie/TopHat to single-end mode?

Thanks, G.

↧

Samtools Count Paired-End Reads

April 15, 2013, 10:05 am

Hi, I used TopHat to align paired-end reads from an RNA-Seq experiment and obtained an accepted_hits.bam alignment file. Using accepted_hits.bam, I'd like to count the number of properly aligned paired-end reads (i.e. both ends are aligned and the alignment makes sense; I'm not sure of the best definition here). How should I count the number of properly aligned paired-end fragments? I'd like the count in terms of fragments instead of reads (i.e., each fragment should be counted once if both of its reads aligned properly). Suggestions are welcome. I am new to processing paired-end data, so I thought I would ask here. I just tried this:

Sample_5> samtools flagstat accepted_hits.bam
46656617 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
46656617 + 0 mapped (100.00%:nan%)
46656617 + 0 paired in sequencing
24411125 + 0 read1
22245492 + 0 read2
19907436 + 0 properly paired (42.67%:nan%)
36273430 + 0 with itself and mate mapped
10383187 + 0 singletons (22.25%:nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

There seems to be some issue with how TopHat reports the alignments, but I'm not sure: http://seqanswers.com/forums/showthread.php?t=8186. Also, let me see if I understand this output correctly: it seems there are 46 million reads (is that counting reads from end 1 plus reads from end 2?). Thanks! ...
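
One way to count fragments rather than reads is to count only the first mate of each properly paired alignment and skip secondary alignments. The sketch below does this on SAM text lines, such as those printed by samtools view; the function name and the example records are made up. It should give roughly what `samtools view -c -f 0x42 -F 0x100 accepted_hits.bam` reports. If the aligner reports multi-mapped reads several times without marking them as secondary, the count would still be inflated.

```python
def count_proper_fragments(sam_lines):
    """Count properly paired fragments, one per first-in-pair primary record."""
    n = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        is_proper = bool(flag & 0x2)
        is_first = bool(flag & 0x40)
        is_secondary = bool(flag & 0x100)
        if is_proper and is_first and not is_secondary:
            n += 1
    return n


# Made-up records (only the first columns): a proper pair (83/163),
# a read with an unmapped mate (73), and a secondary alignment (339).
sam = [
    "@HD\tVN:1.0",
    "frag1\t83\tchr1\t200",
    "frag1\t163\tchr1\t100",
    "frag2\t73\tchr1\t500",
    "frag1\t339\tchr2\t900",
]
print(count_proper_fragments(sam))
```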

↧

Hg19 Strand Information Of Sam Output.

March 14, 2014, 3:33 am

Hi all,

I have paired-end Illumina data mapped against human hg19. When viewing the SAM output, how can I check whether a pair mapped to the forward or the reverse strand of the hg19 genome sequence?
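
In SAM, strand is recorded per read in the FLAG field (second column): bit 0x10 means the read aligned to the reverse strand, and bit 0x20 means its mate did. A minimal sketch with hypothetical function names follows. For a typical proper pair the two mates sit on opposite strands, so "the strand of the pair" usually means the strand of the first mate.

```python
def read_strand(flag):
    """Strand of this read: '-' if the reverse-strand bit (0x10) is set."""
    return "-" if flag & 0x10 else "+"


def mate_strand(flag):
    """Strand of the mate: '-' if the mate-reverse bit (0x20) is set."""
    return "-" if flag & 0x20 else "+"


# Flags of a typical proper pair: 83 (first mate) and 163 (second mate).
for flag in (83, 163):
    print(flag, read_strand(flag), mate_strand(flag))
```

These bits are only meaningful for mapped reads; an unmapped read (bit 0x4) has no strand.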

↧

Need help using Shrimp2 on paired end color-space SOLiD data.

April 15, 2014, 8:20 am

Hi, I have SOLiD reads which are paired-end (75 bp and 35 bp) in .csfasta and .QV.qual format. I would like to use Shrimp2 to align them, but so far I have been having trouble. I used the following command:

gmapper -1 Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta -2 Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta $SCRATCH/human_hg19.fa -N 32 -p opp-in  > Sample.sam 2> Logs/Sample.log

This is my log file and the error is shown at the bottom. I'm not sure what that means.

- Processing genome file [/Refs/human_hg19.fa]
- Processing contig chr1
- Processing contig chr2
- Processing contig chr3
- Processing contig chr4
- Processing contig chr5
- Processing contig chr6
- Processing contig chr7
- Processing contig chr8
- Processing contig chr9
- Processing contig chr10
- Processing contig chr11
- Processing contig chr12
- Processing contig chr13
- Processing contig chr14
- Processing contig chr15
- Processing contig chr16
- Processing contig chr17
- Processing contig chr18
- Processing contig chr19
- Processing contig chr20
- Processing contig chr21
- Processing contig chr22
- Processing contig chrX
- Processing contig chrY
- Processing contig chrM

Loaded Genome
note: detected fastq format in input file [Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta]
- Processing read files [Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta , Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta]

note: quality v ...

↧