Counting sense reads in bacterial paired-end RNA-seq data

June 26, 2014, 12:37 am

Hi,

I'm trying to count reads mapping to the sense strand, and I'm not sure which counts file from this pipeline I should choose. I think it is "plate_R.counts", because it has more reads counted in total. Am I right?

Library creation kit: E7420S NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®

I would also appreciate a good tutorial for understanding Illumina paired-end library preparation, alignment, and counting.

Thanks!

P.S. I read a previous post asking similar questions, but I still have doubts: https://www.biostars.org/p/92935/

#################################################
#BWA HT.seq bacterial paired-end RNA-seq pipeline
#################################################
# Get the genome file from the command line
genome_file=$1
# Get the R1 fastq file from the command line
fastq_file_R1=$2
# Get the R2 fastq file from the command line
fastq_file_R2=$3
#get gff
GFF=$6
#BWA index (default settings)
bwa index $genome_file
#BWA align
bwa mem -t 8 $genome_file $fastq_file_R1 $fastq_file_R2 | gzip -3 > P_S1_L001_aln-pe.sam.gz
#Flagstat
#Convert .sam to .bam to input to Flagstat
samtools view -b -S -o P_S1_L001_aln-pe.bam P_S1_L001_aln-pe.sam.gz
samtools flagstat P_S1_L001_aln-pe.bam
#Count reads mapped with htseq-count
samtools sort -n P_S1_L001_aln-pe.bam plate.sorted
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s yes plate.sorted.bam $GFF | awk 'n>=5 { print a[n%5] } ...

↧

Existing Tools For Post-Processing After Aligning Paired-End Reads With Blat?

August 9, 2012, 8:45 am

Hi,

I'm wondering if there are existing tools that can post-process BLAT output for paired-end reads.

More specifically, I'd like the tool to "merge" the alignments of each pair's two ends, removing the inconsistent alignments while keeping the consistent ones. The tool should also produce output in PSL, SAM, or BAM.

Thanks!

↧

Tophat - Understated Number Of Reads In The "Align_Summary.Txt" File

February 13, 2014, 7:42 am

Hi all. I'm working with paired-end RNA-seq data to assemble the transcriptome of my species of interest. I've just realized that TopHat reports far fewer reads than I actually have and supplied in the input files for the tophat command. Here is a fragment of TopHat's progress report:

[2014-01-22 19:29:06] Beginning TopHat run (v2.0.10)
-----------------------------------------------
[2014-01-22 19:29:06] Checking for Bowtie
          Bowtie version:     2.1.0.0
[2014-01-22 19:29:06] Checking for Samtools
        Samtools version:     0.1.19.0
[2014-01-22 19:29:06] Checking for Bowtie index files (genome)..
[2014-01-22 19:29:06] Checking for reference FASTA file
    Warning: Could not find FASTA file bowtie/tritrypdb_tcongolense.fa
[2014-01-22 19:29:06] Reconstituting reference FASTA file from Bowtie index
  Executing: /usr/local/bin/bowtie2-inspect bowtie/tritrypdb_tcongolense > tophat/tmp/tritrypdb_tcongolense.fa
[2014-01-22 19:29:08] Generating SAM header for bowtie/tritrypdb_tcongolense
[2014-01-22 19:29:09] Reading known junctions from GTF file
[2014-01-22 19:29:10] Preparing reads
     left reads: min. length=100, max. length=100, 56927836 kept reads (17504 discarded)
    right reads: min. length=100, max. length=100, 56919726 kept reads (25614 discarded)

And here is the content of "align_summary.txt" file:

Left reads:
          Input     :   3877069
           Mapped   :   3102050 (80.0% of input)
            of these:    528309 (1 ...

↧

How To Assemble Genome Generated With Bac Clones?

July 30, 2013, 12:44 pm

I have 2 FASTQ files from Illumina with a read length of 250 bp. The sequences in one file were obtained by sequencing from the "right" and those in the other from the "left", i.e. this is paired-end sequencing. As this is a whole-genome shotgun technique, both FASTQ files contain vector sequence, the target sequence in the BAC, E. coli DNA, and maybe plasmid DNA. So how can I assemble just my target DNA from the BAC? What tools or assemblers should I use?

↧

How To Extract Information From Fastq Pair End Files

January 8, 2014, 2:40 am

dear BioStars users,

I would like to extract from my paired-end FASTQ files how many times each read occurs in the file.

So the output could look like this (read sequence, then how many times it was found in the FASTQ file):

CCGGCTCGC - 140x
CTTCGCGCC - 2x

I tried to use awk to compare all reads to each other, but it does not work very well :-(

Is there any tool or idea for comparing all reads to each other and extracting how many times each read occurs in the FASTQ file?

Thank you so much for any idea and help! I hope my question is clear.

Paul.
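
One possible approach to the counting described above is a hash-based tally rather than an all-against-all comparison. The following is a minimal Python sketch, not a tested tool: it assumes standard 4-line FASTQ records (no wrapped sequence lines) and that all distinct sequences fit in memory. The function name `count_sequences` and the demo data are made up for illustration.

```python
import io
from collections import Counter
from itertools import islice


def count_sequences(handle):
    """Count how often each read sequence occurs in a FASTQ stream.

    Assumes every record is exactly 4 lines, so the sequence is the
    2nd line of each record (line indices 1, 5, 9, ...).
    """
    return Counter(line.strip() for line in islice(handle, 1, None, 4))


# Tiny in-memory demo (hypothetical reads); for a real file, pass
# open("reads.fastq") or gzip.open("reads.fastq.gz", "rt") instead.
demo = io.StringIO(
    "@r1\nCCGGCTCGC\n+\nIIIIIIIII\n"
    "@r2\nCTTCGCGCC\n+\nIIIIIIIII\n"
    "@r3\nCCGGCTCGC\n+\nIIIIIIIII\n"
)
for seq, n in count_sequences(demo).most_common():
    print(f"{seq} - {n}x")
```

For paired-end data, each of the two files can be counted separately in the same way. Exact-match counting treats reads that differ by a single base as different sequences.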

↧

Download Large Paired-End Rna-Seq And Microarray Data Of The Same Sample (> 10 Gb Of *.Sra Files)?

June 28, 2012, 9:47 am

Hi,

I was trying to find out whether there is any large paired-end RNA-Seq and microarray data from the same sample, but I wasn't able to find as much data as I wanted.

Could somebody please point out where I can find such data?

I hope to find a data set that has both paired-end RNA-Seq data and microarray data from the same sample.

What I'm thinking about are data sets with at least 10 GB of *.sra files.

Thanks!

↧

Aligning Reads To Specific Chromosome Using Bwa

February 27, 2013, 8:13 am

Hello Everyone,

I have whole-genome Illumina paired-end reads, and I want to align them to a specific chromosome (chr 21) using BWA.

I was thinking of aligning all the reads to the FASTA file of human chromosome 21. Is this an appropriate way to solve my problem, or does BWA have a specific command for this?

Any kind of help is appreciated.

Thanks, Suz

↧

Filter Paired-End Sam File For Xt:A:U

April 13, 2012, 2:47 am

Dear all,

I have a sam file (BWA output, paired-end reads). I would like to retain only reads which are "properly paired". This I would do by:

samtools view -f 0x002 file.sam > file_filtered.sam

Additionally, I would like to retain only those pairs of reads where both reads have the XT:A:U tag. It is important to me that after the filtering step the pairs are still together (so read1, read2, read1, read2, ...).

Any ideas how to do so?

Thanks for any help! Stefanie
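
The pair-wise filter described above could be sketched in plain Python as follows. This is only an illustration under stated assumptions: the SAM file is grouped by read name (mates adjacent, as in unsorted aligner output), each pair has exactly two primary records, and XT:A:U appears as a literal optional field. The names `filter_pairs` and `_flush` are made up for this sketch.

```python
PROPER_PAIR = 0x2       # SAM flag bit: read mapped in proper pair
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment


def _flush(pending):
    """Yield both mates only if each is properly paired and tagged XT:A:U."""
    primary = [(f, l) for f, l in pending
               if not int(f[1]) & (SECONDARY | SUPPLEMENTARY)]
    if len(primary) == 2 and all(
        int(f[1]) & PROPER_PAIR and "XT:A:U" in f[11:] for f, _ in primary
    ):
        for _, line in primary:
            yield line


def filter_pairs(lines):
    """Stream SAM lines; pass headers through and keep only qualifying pairs.

    Assumes records with the same read name are adjacent in the input.
    """
    pending = []  # (fields, raw line) for the current read name
    for line in lines:
        if line.startswith("@"):  # header lines; a QNAME cannot start with '@'
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if pending and pending[0][0][0] != fields[0]:
            yield from _flush(pending)
            pending = []
        pending.append((fields, line))
    yield from _flush(pending)
```

A possible usage would be `with open("file.sam") as src, open("file_filtered.sam", "w") as dst: dst.writelines(filter_pairs(src))`. Because both mates are emitted together or not at all, the read1, read2 order of the input is preserved.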

↧

Pair End Sequencing Problem

May 28, 2013, 3:43 pm

Dear all,

I have a quick question on paired-end sequencing. I used to work with Illumina single-end reads, and I have difficulty understanding how paired-end reads work.

In the "old" system, the opposite strand was removed because its primer had a cleavable site. How does this work with paired reads? Do you still remove the opposite strand? If so, how do you "flip" the DNA to read from the opposite side? Or is neither primer cut anymore, so that one strand is sequenced using primer 1 and the opposite strand using primer 2?

↧

ChIPseq paired-end bowtie2 concern regarding biological replicates

August 1, 2014, 1:28 pm

I am working with ChIP-seq paired-end data where there is concern that one or more of the biological replicates may not be very good, but it is unknown which replicate has a problem (I suspect there is at least one poor replicate in the data).

The first part of my question is very simple: what do you recommend I do to find this replicate, so that I can either toss it or fix it with some quality trimming on the ends? For the moment, I tried using bowtie2 to trim 10 bp from the 5' and 3' ends of the reads in each of my samples, just to see whether this fixes my problem. To define what I mean by "problem": my final results (gene list) do not come out as I would expect (there are no genes of a certain type that, based on my biological intuition, I should be seeing).

When I run bowtie2 with the trimming options set, I do get my .sam files, but my error file tells me:

(ERR): bowtie2-align died with signal 2 (INT)
20172305 reads; of these:
  20172305 (100.00%) were paired; of these:
    3064536 (15.19%) aligned concordantly 0 times
    13699211 (67.91%) aligned concordantly exactly 1 time
    3408558 (16.90%) aligned concordantly >1 times
    ----
    3064536 pairs aligned concordantly 0 times; of these:
      807633 (26.35%) aligned discordantly 1 time
    ----
    2256903 pairs aligned 0 times concordantly or discordantly; of these:
      4513806 mates make up the pairs; of these: ...

↧

High level of duplicate in one reads of paired-end data

July 21, 2014, 6:52 am

Hi,

We are doing some transcriptomic analysis on bovine immune blood cells, and we seem to have a problem with high levels of duplicates in our data. Our libraries were prepared with the Illumina TruSeq stranded kit.

First, we sequenced 50 bp single-end to test our libraries, and we found a high level of duplicated reads (~80%) and around 2% of A and T stretches. We thought the problem was our libraries, so we sent the RNA to our sequencing facility so they could do all the work (except RNA extraction).

So the sequencing facility did the library preparation and the sequencing. To be sure that our data would be usable, we sequenced these datasets 100 bp paired-end (3 samples in total). To our surprise, the high level of duplicates we saw in our first sequencing experiment was back, but only in one read of the pair, and in the same read for all three samples. The other read had around 10% duplicates in each sample.

Since our RNA seems OK when tested on an Agilent Bioanalyzer (RIN > 9) and the libraries were prepared by a sequencing facility we trust, I'm here to ask you what could be wrong with our data.

Thanks a lot!
Olivier.

↧

Take A Subset Of A Fastq Paired-End Sample

March 19, 2013, 11:36 am

Hi,

I have two compressed paired-end FASTQ files coming from a HiSeq RNA-Seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take just a few thousand or a few million reads from each of them (keeping the respective pairs together) and use those files for testing and debugging purposes.

The results should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!
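
One way to take such a paired subset is to read both files in lockstep and reservoir-sample whole pairs, so that mates never get separated. The sketch below is only an illustration: it assumes 4-line FASTQ records, that both files list the reads in the same order, and that the sampled pairs fit in memory. The function names (`read_records`, `sample_pairs`, `subsample`) are made up for this example.

```python
import gzip
import random
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped records)."""
    while True:
        record = tuple(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def sample_pairs(handle1, handle2, k, seed=0):
    """Reservoir-sample k read pairs (Algorithm R) from two FASTQ streams."""
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(read_records(handle1), read_records(handle2))
    for i, pair in enumerate(pairs):
        if i < k:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = pair
    return reservoir


def subsample(in1, in2, out1, out2, k, seed=0):
    """Write k randomly chosen pairs from two gzipped FASTQ files."""
    with gzip.open(in1, "rt") as f1, gzip.open(in2, "rt") as f2:
        pairs = sample_pairs(f1, f2, k, seed)
    with gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
        for rec1, rec2 in pairs:
            o1.writelines(rec1)
            o2.writelines(rec2)
```

With the file names from the question, a call might look like `subsample("pair.1.fastq.gz", "pair.2.fastq.gz", "pair.test.1.fastq.gz", "pair.test.2.fastq.gz", k=1000000)`. If a random sample is not needed, simply taking the first k records from each file is cheaper, although the first reads of a run tend to be of lower quality.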

↧

Macs Raises Error: No Such File Or Directory

November 28, 2013, 12:02 pm

I am trying to run macs14 with sam files from paired-end data + control. Macs14 returns "No such file":

sb7904313:line2 $ macs14 -t /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam -c /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam 
-bash: macs14 -t /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam -c /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam : No such file or directory

Yet the files exist:

sb7904313:line2 $ ls -all /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam
-rw-r--r--  1 nn  staff  24776263204 28 Nov 11:51 /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam
sb7904313:line2 $ ls -all /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam
-rw-r--r--  1 nn  staff  14223812120 28 Nov 12:19 /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam

Macs14 seems to be correctly installed:

sb7904313:line2 $ macs14 --version
macs14 1.4.2 20120305

I am just stumped. Can you give advice?

↧

Trimming Adapters For Paired-End Sequences

February 6, 2013, 10:41 am

Hi all,

I got Illumina paired-end FASTQ files. I was told to trim ~20 to 30 bp from the beginning of read 2 because of the WGA adapters.

Can we find the adapters by looking at the quality? Which tool is good for trimming adapters while keeping the paired nature of the sequences? Will any issues arise if I use bowtie2 downstream to align the trimmed sequences (with only one read of each pair trimmed) to the reference?

The following is an example of read 2:

@HWI-ST1162:139:C0H7WACXX:5:1101:1865:1112 2:N:0:CGATGTA
GTCATGGTGTCTCTTCACAACAATGGAAACCCTAACTAAGACAAAGACTAATAGAAGTGTTTTTTTAGGAA

<9;>;>?1=>;=9=?########################################################

Thanks, Deepthi

↧

How To Read Maq Paired-End Alignment Data Using Bioconductor Packages

February 21, 2013, 8:03 pm

Hi all,

I have MAQ paired-end alignment files that I want to read into R. I have browsed several packages, and they all seem to depend on the ShortRead package of Bioconductor, which does not currently support paired-end reads. Does anybody know of any Bioconductor packages that support paired-end alignment data? Since Bioconductor has a lot of good coverage-plotting functions, I would like to use them, so it would be great if someone could suggest a package that can read paired-end alignments.

I have also tried to use my own scripts for plotting coverage, and I want to overlay the coverage of all the chromosomes in one graph (using different colors, etc.). It would be great to know if anybody has tried to implement this too. Because the files are very big, plotting the data for all the chromosomes at the same time takes a lot of time.

Following is code that I found somewhere and modified to my needs; I use it for a single chromosome. I am wondering how I can plot the data for all the chromosomes in one plot, and more efficiently.

data <-read.table(file="croppedpileup.out",sep="\t",header=F)
colnames(data)<-c("pos","consensus","coverage")
depth<-mean(data[,"coverage"])
# depth now has the mean (overall)coverage
#set the bin-size
window<-101
rangefrom<-0
rangeto<-length(data[,"pos"])
data.smoothed<-runmed(data[,"coverage"],k=wi ...

↧

How To Rearrange Paired End Bam File?

May 16, 2013, 10:17 am

Hello all,

I have a paired-end BAM file that I want to use with bedtools. After merging, the alignments of the two reads in a pair no longer lie next to each other, which causes problems in the bedtools step. Is there any tool available to rearrange the paired-end read alignments in a BAM file?

Thanks, Deeps

↧

Paired End Vs Single End Detection

May 30, 2012, 8:47 am

Hi, I am wondering if anyone has a subroutine or function (hopefully in Perl or pseudocode) to detect whether a fastq file is a shuffled paired-end or not. I think I've seen a few different syntaxes on headers and so I feel like it requires a little more knowledge on their formats than what I have right now.

Yes, I am still looking into it, but I am wondering if this has already been done, to save me time...
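
As a starting point for such a check, here is a rough Python sketch (the question mentions Perl or pseudocode; Python is used here only for illustration). It is a heuristic under stated assumptions: it covers just the two common Illumina header styles, the older `/1` and `/2` suffix and the Casava 1.8+ style with the mate number at the start of the comment field, and it assumes 4-line records. The names `parse_header` and `looks_interleaved` are made up for this sketch.

```python
import re
from itertools import islice

# Older headers:  @HS23_6814:1:1101:1592:2250#4/1
OLD_STYLE = re.compile(r"^@(\S+)/([12])(?:\s|$)")
# Casava 1.8+:    @HWI-ST1162:139:C0H7WACXX:5:1101:1865:1112 2:N:0:CGATGTA
NEW_STYLE = re.compile(r"^@(\S+)\s+([12]):")


def parse_header(header):
    """Return (read_name, mate), with mate None if no mate number is found."""
    header = header.strip()
    for pattern in (OLD_STYLE, NEW_STYLE):
        match = pattern.match(header)
        if match:
            return match.group(1), match.group(2)
    return header.split()[0].lstrip("@"), None


def looks_interleaved(handle, max_pairs=1000):
    """Heuristic: do the first records alternate mate 1 / mate 2 of one read?"""
    # Header lines are every 4th line, starting with the first.
    headers = list(islice(handle, 0, max_pairs * 8, 4))
    if len(headers) < 2 or len(headers) % 2:
        return False
    for first, second in zip(headers[0::2], headers[1::2]):
        name1, mate1 = parse_header(first)
        name2, mate2 = parse_header(second)
        if name1 != name2 or (mate1, mate2) != ("1", "2"):
            return False
    return True
```

Headers that carry no mate number at all (for example after some tools strip the suffix) are reported as not interleaved by this sketch, so a negative result is not conclusive.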

↧

How To Determine If Paired–End Illumina Rnaseq Reads Are Strand–Specific

March 16, 2013, 7:59 am

I've been provided with more than a billion reads of RNAseq data for a poorly annotated nematode species. They appear to be 2x100 paired-end Illumina reads – I currently know frustratingly little about the RNAseq protocol used, but need to perform assemblies using Trinity.

Trinity demands that I specify whether or not the reads are strand–specific, and also which strand is which through the --SS_lib_type parameter, which needs to be either FR or RF.

For each tissue sample, I have been given paired fwd and rev FASTQ files. How can I tell i) whether the data is indeed strand-specific, and ii) which strand is which, so that I know whether to use FR or RF with Trinity?

Any thoughts much appreciated. Here are the top four lines from two corresponding FASTQ files I've been given:

head -n 4 Tmuris_adult_R4*
==> Tmuris_adult_R4_fwd.fastq <==
@HS23_6814:1:1101:1592:2250#4/1
GCGGTATCAGTTGGTAAACCCTGCAGGCGCTCGCATAACGGTCGAAGGCTTTTTGCGGATCGTCGTCATTGTCGTTGACCTCAGCATCGCNCACCTCCTC
+
B3:64JGADLBACJHH3EACD@DJAHLJDIENFEKIJJ6LE-HFJH57H7L9=BAFI8@FK>,GBDH764,5,4A='+G+,+,*E++@+2!+:+1>1=+4

==> Tmuris_adult_R4_rev.fastq <==
@HS23_6814:1:1101:1592:2250#4/2
CGAACCCNGTATNTTTGCGCTACTNTGTCTCCTACGCCTTTGTCTGTCTTGCCTGCATGGCTAACACTGCCCTGTTGGTTCAAGTGTCGTCTGCCGGAAG
+
:ABEGGH!G8EJ!8EJE6IEFBIH!HF8EKDD66FFAMDCKE/5>D5LD?E=?AHG>=AE5@E5I@CGB<KK@GG<B2E:H@2I9ICI?C@HC2@2:0@2

↧

Identifying mutations from Paired-End Sequencing data

April 29, 2014, 5:06 am

Hello! I'm trying to identify mutations from paired-end reads aligned with BWA, using SAMtools. Coverage is about 16,000. Generally it works fine, but in one fragment (TGGGC) I see that reads sequenced from left to right show a deletion of G (TGGC) in 12,000 out of 14,000 cases, whereas reads sequenced from right to left do not show this mutation at all. So is this a heterozygous deletion, a sequencing problem (sequencing was carried out on an Ion Torrent PGM), or an alignment problem?

↧

Align Paired End Reads Using Blast

March 24, 2014, 3:17 pm

Hi all,

Has anyone aligned Illumina paired-end reads using BLAST? I used gsnap to do the alignment first, then used BLAST to align the reads that were not mapped by gsnap. It seems that BLAST can only align single-end reads, so I aligned the two files separately and got results. But I don't know exactly how to deal with those BLAST results:

1. Should I include the paired reads from the two files, or include all the reads as results?
2. How do I merge the results with the SAM file? Because I want to do assembly as the next step, I want to merge the BLAST results with the gsnap results.

Any comments would be appreciated. Thanks.

↧