Counting sense reads in bacterial paired-end RNA-seq data
Existing Tools For Post-Processing After Aligning Paired-End Reads With BLAT?
Hi,
I'm wondering if there are existing tools that can do post-processing on paired-end reads' BLAT output.
More specifically, I'd like the tool to "merge" the alignments of the two ends of each pair, removing inconsistent alignments while keeping consistent ones. The tool should also produce output in PSL, SAM, or BAM format.
Thanks!
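Here is a minimal Python sketch of the pairing step. It assumes the BLAT output is standard 21-column PSL and that mate names end in /1 and /2; other header styles would need adjusting. Two alignments count as consistent when they hit the same target on opposite strands and their outer span is at most a hypothetical max_insert of 1000 bp. Those criteria are illustrative assumptions, not a standard. The function returns pairs of the original PSL lines, so the output stays PSL. Conversion to SAM/BAM is not covered.

```python
from collections import defaultdict
from itertools import product


def parse_psl(line):
    """Pick the PSL columns this sketch needs (0-based column indices)."""
    f = line.rstrip("\n").split("\t")
    return {"strand": f[8], "qname": f[9], "tname": f[13],
            "tstart": int(f[15]), "tend": int(f[16]),
            "line": line.rstrip("\n")}


def pair_psl(lines, max_insert=1000):
    """Return (mate1_line, mate2_line) tuples for consistent mate alignments.

    Assumptions: mate names end in /1 and /2; "consistent" means same target,
    opposite strands, and an outer span of at most max_insert bases.
    """
    hits = defaultdict(lambda: {"1": [], "2": []})
    for line in lines:
        if not line.strip() or not line[0].isdigit():
            continue  # skip blank lines and psLayout header lines
        rec = parse_psl(line)
        base, _, mate = rec["qname"].rpartition("/")
        if mate in ("1", "2"):
            hits[base][mate].append(rec)

    pairs = []
    for mates in hits.values():
        # try every combination of a mate-1 hit with a mate-2 hit
        for a, b in product(mates["1"], mates["2"]):
            span = max(a["tend"], b["tend"]) - min(a["tstart"], b["tstart"])
            if (a["tname"] == b["tname"] and a["strand"] != b["strand"]
                    and span <= max_insert):
                pairs.append((a["line"], b["line"]))
    return pairs
```

A read whose mates each have several hits can yield more than one consistent combination. You may still want to pick the best-scoring one.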
TopHat - Understated Number Of Reads In The "align_summary.txt" File
[2014-01-22 19:29:06] Beginning TopHat run (v2.0.10)
-----------------------------------------------
[2014-01-22 19:29:06] Checking for Bowtie
Bowtie version: 2.1.0.0
[2014-01-22 19:29:06] Checking for Samtools
Samtools version: 0.1.19.0
[2014-01-22 19:29:06] Checking for Bowtie index files (genome)..
[2014-01-22 19:29:06] Checking for reference FASTA file
Warning: Could not find FASTA file bowtie/tritrypdb_tcongolense.fa
[2014-01-22 19:29:06] Reconstituting reference FASTA file from Bowtie index
Executing: /usr/local/bin/bowtie2-inspect bowtie/tritrypdb_tcongolense > tophat/tmp/tritrypdb_tcongolense.fa
[2014-01-22 19:29:08] Generating SAM header for bowtie/tritrypdb_tcongolense
[2014-01-22 19:29:09] Reading known junctions from GTF file
[2014-01-22 19:29:10] Preparing reads
left reads: min. length=100, max. length=100, 56927836 kept reads (17504 discarded)
right reads: min. length=100, max. length=100, 56919726 kept reads (25614 discarded)
And here is the content of the "align_summary.txt" file:
Left reads:
Input : 3877069
Mapped : 3102050 (80.0% of input)
of these: 528309 (1 ...
How To Assemble A Genome Generated From BAC Clones?
I have two FASTQ files from Illumina with a read length of 250 bp. The reads in one file were sequenced from the "right" end and those in the other from the "left" end; this is paired-end sequencing. Because this is whole-genome shotgun sequencing, both FASTQ files contain vector sequence, the target sequence in the BAC, E. coli DNA, and possibly plasmid DNA. How can I assemble just my target DNA from the BAC? Which tools or assemblers should I use?
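One option is to screen out contaminating read pairs before assembly. The Python sketch below is an illustration, not any particular tool's method. It drops a pair if either mate shares exact k-mers with contaminant references you supply, for example the vector, E. coli and plasmid sequences. The reference set, k = 31 and max_hits = 0 are assumptions you would need to tune. The surviving pairs could then go to an assembler of your choice.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_contaminant_index(contaminant_seqs, k=31):
    """k-mers from both strands of every contaminant reference sequence."""
    index = set()
    for s in contaminant_seqs:
        index |= kmers(s, k)
        index |= kmers(revcomp(s), k)
    return index


def keep_pair(read1, read2, index, k=31, max_hits=0):
    """Keep the pair only if neither mate shares more than max_hits k-mers
    with the contaminant index, so mates are kept or dropped together."""
    return (len(kmers(read1, k) & index) <= max_hits
            and len(kmers(read2, k) & index) <= max_hits)
```

BAC insert reads that happen to share k-mers with a contaminant would also be dropped. It is worth checking how many pairs are removed.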
How To Extract Information From Paired-End FASTQ Files
Dear Biostars users,
I would like to extract from my paired-end FASTQ files how many times each read sequence occurs.
The output could look like this:
read sequence - number of times it occurs in the FASTQ file:
CCGGCTCGC - 140x
CTTCGCGCC - 2x
I tried using awk to compare all reads to each other, but it does not work very well :-(
Is there a tool or approach for comparing all reads to each other and counting how many times each read occurs in the FASTQ file?
Thank you so much for any ideas and help! I hope my question is clear.
Paul.
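Here is a minimal Python sketch of the counting. It assumes plain 4-line FASTQ records with no wrapped sequence lines. Instead of comparing every read to every other read, it counts the sequence line of each record with collections.Counter. The cost is linear in the number of reads, and memory grows with the number of distinct sequences. The helper names are hypothetical.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_sequences(handle):
    """Count how often each read sequence occurs (line 2 of each 4-line record)."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:
            counts[line.strip()] += 1
    return counts


def format_counts(counts):
    """Lines in the 'SEQUENCE - Nx' style from the question, most frequent first."""
    return [f"{seq} - {n}x" for seq, n in counts.most_common()]
```

To pool both mate files, count each one separately and add the resulting Counters (counts1 + counts2).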
Download Large Paired-End RNA-Seq And Microarray Data Of The Same Sample (> 10 GB Of *.sra Files)?
Hi,
I was trying to find if there is any large size paired-end RNA-Seq and microarray data of the same sample, but I wasn't able to find as much data as I wanted.
Could somebody please point out where I can find such data?
I'm hoping for a data set that has both paired-end RNA-Seq data and microarray data from the same sample.
What I have in mind are data sets of at least 10 GB of *.sra files.
Thanks!
Aligning Reads To A Specific Chromosome Using BWA
Hello Everyone,
I have whole-genome Illumina paired-end reads and I want to align them to a specific chromosome (chr21) using BWA.
I was thinking of aligning all the reads to a FASTA file containing only human chromosome 21. Is this the appropriate way to solve my problem, or is there a specific BWA command for this?
Any kind of help is appreciated.
Thanks, Suz
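If you do restrict the reference to one chromosome, the minimal Python sketch below pulls that record out of a multi-FASTA file. The record name is an assumption that depends on the reference: "chr21" in UCSC-style files, "21" in Ensembl-style files. samtools faidx can do the same extraction. The resulting file would then be indexed with bwa index before alignment.

```python
def extract_record(handle, name):
    """Yield the lines of the FASTA record whose ID (first word after '>')
    equals name; all other records are skipped."""
    keep = False
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == name
        if keep:
            yield line


# Example use (file names are hypothetical):
# with open("genome.fa") as src, open("chr21.fa", "w") as dst:
#     dst.writelines(extract_record(src, "chr21"))
```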
Filter Paired-End SAM File For XT:A:U
Dear all,
I have a SAM file (BWA output, paired-end reads). I would like to retain only reads that are "properly paired", which I would do with:
samtools view -f 0x002 file.sam > file_filtered.sam
Additionally, I would like to retain only those pairs in which both reads have the XT:A:U tag. It is important to me that the pairs stay together after the filtering step (read1, read2, read1, read2, ...).
Any ideas how to do so?
Thanks for any help! Stefanie
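Here is a minimal Python sketch of a pair-aware filter. It assumes uncompressed SAM in which the two mates of each pair sit on adjacent lines, as in typical BWA paired-end output, with no secondary or supplementary records in between. A pair is kept only if both mates have the 0x2 (properly paired) flag bit and an XT:A:U tag. Header lines are passed through unchanged.

```python
def _passes(fields):
    """Properly paired (FLAG bit 0x2) and tagged XT:A:U in the optional fields."""
    return bool(int(fields[1]) & 0x2) and "XT:A:U" in fields[11:]


def filter_pairs(lines):
    """Yield header lines, then only adjacent mate pairs where both mates pass.

    Assumes the two mates of a pair are on consecutive lines; a record whose
    neighbour has a different QNAME is treated as unpaired and dropped.
    """
    pending = None  # (original line, split fields) of a record awaiting its mate
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if pending is None or pending[1][0] != fields[0]:
            pending = (line, fields)
            continue
        prev_line, prev_fields = pending
        pending = None
        if _passes(prev_fields) and _passes(fields):
            yield prev_line
            yield line
```

For a file, something like dst.writelines(filter_pairs(src)) with both files opened as text would write the filtered SAM.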
Paired-End Sequencing Problem
Dear all,
I have a quick question about paired-end sequencing. I used to work with Illumina single-end reads, and I have difficulty understanding how paired-end reads work.
In the "old" system, the opposite strand was removed because its primer had a cleavable site. How does this work with paired reads? Is the opposite strand still removed? If so, how is the DNA "flipped" so that it can be read from the other side? Or is one of the primers no longer cut, so that one strand is sequenced with primer 1 and the opposite strand with primer 2?
ChIPseq paired-end bowtie2 concern regarding biological replicates
High level of duplicates in one read of paired-end data
Hi,
We are doing transcriptomic analysis on bovine immune blood cells, and we seem to have a problem with high levels of duplicates in our data. Our libraries were prepared with the Illumina TruSeq Stranded kit.
First, we sequenced 50 bp single-end reads to test our libraries and found a high level of duplicated reads (~80%) and around 2% of A and T stretches. We thought the problem was our libraries, so we sent the RNA to our sequencing facility so that they could do all the work (except RNA extraction).
The sequencing facility prepared the libraries and did the sequencing. To make sure the data would be usable, we sequenced these datasets as 100 bp paired-end reads (3 samples in total). To our surprise, the high level of duplicates from our first experiment was back, but only in one read of the pair, and it was the same read in all three samples. The other read had around 10% duplicates in each sample.
Since our RNA looks fine on the Agilent Bioanalyzer (RIN > 9) and the libraries were prepared by a sequencing facility we trust, I'm asking here: what could be wrong with our data?
Thanks a lot!
Olivier.
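To compare the two mates directly, the minimal Python sketch below computes an exact-sequence duplicate fraction for one FASTQ file. Run it separately on read 1 and read 2 of the same sample. One assumption matters here: a duplicate means an identical read sequence. That is not the same definition used by alignment-based duplicate marking, which works from mapping positions.

```python
def fastq_sequences(handle):
    """Yield the sequence line (line 2) of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def duplicate_fraction(sequences):
    """Fraction of reads whose exact sequence already occurred earlier."""
    seen = set()
    total = dups = 0
    for seq in sequences:
        total += 1
        if seq in seen:
            dups += 1
        else:
            seen.add(seq)
    return dups / total if total else 0.0
```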
Take A Subset Of A Paired-End FASTQ Sample
Hi,
I have two compressed paired-end FASTQ files from a HiSeq RNA-Seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.
The files are very large, so I want to take just a few thousand to a few million reads from them, keeping the pairs together, and use that subset for testing and debugging.
The results should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.
I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!
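Here is a minimal Python sketch that reads both gzipped files in lockstep and keeps each pair with probability fraction, so mates stay together and in the same order. It assumes 4-line records and that both files list the pairs in the same order. The seed makes the subset reproducible. The function name and defaults are hypothetical.

```python
import gzip
import random
from itertools import islice


def read_records(handle):
    """Yield 4-line FASTQ records as lists of strings."""
    while True:
        rec = list(islice(handle, 4))
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def subsample_pairs(in1, in2, out1, out2, fraction, seed=42):
    """Keep each read pair with probability `fraction`; return pairs kept."""
    rng = random.Random(seed)
    kept = 0
    with gzip.open(in1, "rt") as f1, gzip.open(in2, "rt") as f2, \
            gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
        # zip stops at the shorter file; mismatched files are not reported here
        for r1, r2 in zip(read_records(f1), read_records(f2)):
            if rng.random() < fraction:
                o1.writelines(r1)
                o2.writelines(r2)
                kept += 1
    return kept
```

For example, subsample_pairs("pair.1.fastq.gz", "pair.2.fastq.gz", "pair.test.1.fastq.gz", "pair.test.2.fastq.gz", 0.01) would keep roughly 1% of the pairs.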
MACS Raises Error: No Such File Or Directory
I am trying to run macs14 with SAM files from paired-end data plus a control. macs14 returns "No such file or directory":
sb7904313:line2 $ macs14 -t /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam -c /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam
-bash: macs14 -t /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam -c /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam : No such file or directory
Yet the files exist:
sb7904313:line2 $ ls -all /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam
-rw-r--r-- 1 nn staff 24776263204 28 Nov 11:51 /Volumes/Data/G6L2_G6L3/s5/clean_paired_sample.sam
sb7904313:line2 $ ls -all /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam
-rw-r--r-- 1 nn staff 14223812120 28 Nov 12:19 /Volumes/Data/G6L2_G6L3/s11/clean_paired_sample.sam
Macs14 seems to be correctly installed:
sb7904313:line2 $ macs14 --version
macs14 1.4.2 20120305
I am just stumped. Can you give advice?
Trimming Adapters For Paired-End Sequences
Hi all,
I have Illumina paired-end FASTQ files. I was told to trim the first ~20 to 30 bp of read 2 because of the WGA adapters.
Can we find the adapters by looking at the quality scores? Which tool is good for trimming adapters while keeping the paired nature of the sequences? Will any issues arise if I use bowtie2 downstream to align the trimmed sequences (trimming only one read of each pair) to the reference?
Here is an example of read 2:
@HWI-ST1162:139:C0H7WACXX:5:1101:1865:1112 2:N:0:CGATGTA
GTCATGGTGTCTCTTCACAACAATGGAAACCCTAACTAAGACAAAGACTAATAGAAGTGTTTTTTTAGGAA
+
<9;>;>?1=>;=9=?########################################################
Thanks, Deepthi
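Here is a minimal Python sketch of fixed-length 5' trimming for read 2, assuming 4-line records. It shortens every read rather than discarding any. The read-1 and read-2 files therefore keep the same number of records in the same order, which is what a paired-end aligner such as bowtie2 expects. The trim length n is yours to choose (for example 20 to 30, per the advice above). Reads shorter than n end up empty, which you may want to guard against.

```python
from itertools import islice


def trim_5prime(record, n):
    """Remove the first n bases and their quality values from one 4-line record."""
    header, seq, plus, qual = (line.rstrip("\n") for line in record)
    return [header + "\n", seq[n:] + "\n", plus + "\n", qual[n:] + "\n"]


def trim_file(in_handle, out_handle, n):
    """Trim every record, never dropping one, so mate order is preserved."""
    count = 0
    while True:
        record = list(islice(in_handle, 4))
        if not record:
            return count
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        out_handle.writelines(trim_5prime(record, n))
        count += 1
```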
How To Read MAQ Paired-End Alignment Data Using Bioconductor Packages
data <- read.table(file="croppedpileup.out", sep="\t", header=FALSE)
colnames(data) <- c("pos","consensus","coverage")
depth <- mean(data[,"coverage"])
# depth now holds the mean (overall) coverage
# set the bin size
window<-101
rangefrom<-0
rangeto<-length(data[,"pos"])
data.smoothed<-runmed(data[,"coverage"],k=wi ...
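For comparison, here is a rough Python equivalent of the visible part of the R snippet. It reads the three pileup columns, takes the mean coverage, and smooths the coverage with a centered running median. The window of 101 mirrors the window variable above. The end handling is an assumption: the window simply shrinks near the edges, which differs from R's runmed end rules, so values near the ends will not match exactly. This direct version costs O(n*k) and is meant for illustration.

```python
import statistics


def read_pileup(lines):
    """Parse the pos, consensus and coverage columns used in the R snippet."""
    positions, coverage = [], []
    for line in lines:
        pos, _consensus, cov = line.rstrip("\n").split("\t")[:3]
        positions.append(int(pos))
        coverage.append(float(cov))
    return positions, coverage


def running_median(values, k=101):
    """Centered running median over an odd window k.

    Near the ends the window is truncated to the available values, which is
    an assumption of this sketch, not R's runmed end rule.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    half = k // 2
    return [statistics.median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]

# depth = statistics.mean(coverage) gives the overall mean coverage.
```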
How To Rearrange A Paired-End BAM File?
Hello all,
I have a paired-end BAM file and I want to use bedtools on it. After merging, the paired-end read alignments no longer lie next to each other, which causes problems in the bedtools step. Is there a tool that can rearrange the alignments in the BAM file so that mates are adjacent again?
Thanks, Deeps
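samtools sort -n sorts a BAM file by read name, which places mates next to each other. The Python sketch below shows the same idea for a small uncompressed SAM held in memory: header lines first, then records sorted by QNAME, with mate 1 (flag 0x40) before mate 2. It is an illustration only. It does not update the SO: field in the @HD header, and it does not scale to large files.

```python
def _name_key(line):
    """Sort key: read name, then mate 1 (flag 0x40) before mate 2."""
    qname, flag = line.split("\t", 2)[:2]
    return qname, not (int(flag) & 0x40)


def sort_sam_by_name(lines):
    """Return header lines followed by alignment records sorted by QNAME."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]
    return header + sorted(records, key=_name_key)
```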
Paired-End Vs Single-End Detection
Hi, I am wondering if anyone has a subroutine or function (preferably in Perl or pseudocode) to detect whether a FASTQ file contains shuffled (interleaved) paired-end reads. I've seen a few different header syntaxes, so I suspect this requires more knowledge of these formats than I have right now.
Yes, I am still looking into it, but I am wondering if this has already been done, to save me time...
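Perl was requested, but for consistency this sketch is in Python. It handles the two header conventions that appear in this thread, "@name/1" and "@name 1:N:0:index". It calls a file interleaved if its first records come in consecutive pairs that share a name, with mate 1 followed by mate 2 (or identical names with no mate marker). Other header styles would need extra rules. The number of records inspected is an arbitrary choice.

```python
import re
from itertools import islice


def base_name(header):
    """Return (shared read name, mate number or None) for one FASTQ header."""
    h = header.strip()
    if h.startswith("@"):
        h = h[1:]
    parts = h.split()
    name = parts[0]
    m = re.match(r"^(.*)/([12])$", name)          # older style: name/1, name/2
    if m:
        return m.group(1), m.group(2)
    if len(parts) > 1 and re.match(r"^[12]:[YN]:", parts[1]):
        return name, parts[1][0]                  # Casava 1.8+: "1:N:0:index"
    return name, None


def looks_interleaved(headers):
    """True if headers form consecutive mate pairs with matching names."""
    if len(headers) < 2 or len(headers) % 2:
        return False
    for h1, h2 in zip(headers[0::2], headers[1::2]):
        n1, m1 = base_name(h1)
        n2, m2 = base_name(h2)
        if n1 != n2 or (m1, m2) not in {("1", "2"), (None, None)}:
            return False
    return True


def first_headers(handle, n_records=1000):
    """Header lines of the first n_records 4-line FASTQ records."""
    return [line.rstrip("\n")
            for i, line in enumerate(islice(handle, 4 * n_records)) if i % 4 == 0]
```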
How To Determine If Paired-End Illumina RNA-seq Reads Are Strand-Specific
I've been provided with more than a billion reads of RNA-seq data for a poorly annotated nematode species. They appear to be 2x100 paired-end Illumina reads; I currently know frustratingly little about the RNA-seq protocol used, but I need to perform assemblies using Trinity.
Trinity requires me to specify whether the reads are strand-specific and, if so, which strand is which, through the --SS_lib_type parameter, which must be either FR or RF.
For each tissue sample, I have been given paired fwd and rev FASTQ files. How can I tell i) whether the data is indeed strand-specific, and ii) which strand is which, so that I know whether to use FR or RF with Trinity.
Any thoughts much appreciated. Here are the top four lines from two corresponding FASTQ files I've been given:
head -n 4 Tmuris_adult_R4*
==> Tmuris_adult_R4_fwd.fastq <==
@HS23_6814:1:1101:1592:2250#4/1
GCGGTATCAGTTGGTAAACCCTGCAGGCGCTCGCATAACGGTCGAAGGCTTTTTGCGGATCGTCGTCATTGTCGTTGACCTCAGCATCGCNCACCTCCTC
+
B3:64JGADLBACJHH3EACD@DJAHLJDIENFEKIJJ6LE-HFJH57H7L9=BAFI8@FK>,GBDH764,5,4A='+G+,+,*E++@+2!+:+1>1=+4
==> Tmuris_adult_R4_rev.fastq <==
@HS23_6814:1:1101:1592:2250#4/2
CGAACCCNGTATNTTTGCGCTACTNTGTCTCCTACGCCTTTGTCTGTCTTGCCTGCATGGCTAACACTGCCCTGTTGGTTCAAGTGTCGTCTGCCGGAAG
+
:ABEGGH!G8EJ!8EJE6IEFBIH!HF8EKDD66FFAMDCKE/5>D5LD?E=?AHG>=AE5@E5I@CGB<KK@GG<B2E:H@2I9ICI?C@HC2@2:0@2
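One way to check is to align a subset of read 1 to some transcripts or genes whose orientation you trust, then tally whether read 1 lands on the same strand as the transcript. This assumes such transcripts are available. In Trinity's convention, FR means read 1 is on the sense strand, and RF (the setting for dUTP libraries) means read 1 is antisense. The Python sketch below does only the tally. The 0.8 threshold is a hypothetical cutoff.

```python
def infer_ss_lib_type(strand_pairs, threshold=0.8):
    """Classify library orientation from (read1_strand, transcript_strand) tuples.

    Returns (label, fraction of read-1 alignments on the transcript's strand).
    The threshold is a hypothetical cutoff, not a Trinity setting.
    """
    if not strand_pairs:
        raise ValueError("no informative alignments")
    sense = sum(1 for read_strand, tx_strand in strand_pairs
                if read_strand == tx_strand)
    frac_sense = sense / len(strand_pairs)
    if frac_sense >= threshold:
        return "FR", frac_sense          # read 1 mostly on the sense strand
    if frac_sense <= 1 - threshold:
        return "RF", frac_sense          # read 1 mostly antisense
    return "unstranded", frac_sense
```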
Identifying mutations from Paired-End Sequencing data
Hello! I'm trying to call mutations from paired-end reads aligned with BWA, using SAMtools. Coverage is about 16,000. Generally it works fine, but in one fragment (TGGGC), reads sequenced from left to right show a deletion of one G (TGGC) in 12,000 out of 14,000 cases, while reads sequenced from right to left do not show this mutation at all. Is this a heterozygous deletion, a sequencing problem (sequencing was done on an Ion Torrent PGM), or an alignment problem?
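A common way to quantify this kind of strand bias is Fisher's exact test on a 2x2 table of deletion versus reference counts on each strand. The sketch below is a self-contained two-sided implementation in Python. The forward-strand counts would come from the 12,000 of 14,000 reads above, but the reverse-strand depth is not given, so no real numbers are filled in. A tiny p-value only says the two strands disagree. It does not by itself tell you which strand is right.

```python
from math import exp, lgamma


def _log_choose(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. [[fwd_del, fwd_ref], [rev_del, rev_ref]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def log_p(x):
        # hypergeometric log-probability of x in the top-left cell
        return _log_choose(r1, x) + _log_choose(r2, c1 - x) - _log_choose(n, c1)

    observed = log_p(a)
    total = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        lp = log_p(x)
        if lp <= observed + 1e-7:  # tables at least as extreme as observed
            total += exp(lp)
    return min(1.0, total)
```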
Align Paired-End Reads Using BLAST
Hi all,
Has anyone aligned Illumina paired-end reads using BLAST? I used GSNAP for the initial alignment, then used BLAST to align the reads that GSNAP did not map. BLAST seems to align only single-end reads, so I aligned the two files separately and got results, but I don't know exactly how to handle them:
1. Should I include only the reads that are paired across the two files, or include all the reads in the results?
2. How can I merge the results with the SAM file? Because I want to do an assembly as the next step, I want to merge the BLAST results with the GSNAP results.
Any comments would be appreciated. Thanks.
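For question 1, here is a minimal Python sketch that splits BLAST tabular output (-outfmt 6, where the first two columns are the query and subject IDs) into two groups: reads whose two mates hit at least one common subject, and all other reads. It assumes mate names differ only by a trailing /1 or /2. The "common subject" criterion is an illustrative assumption and ignores positions and orientation.

```python
from collections import defaultdict


def _subjects(lines):
    """Map read name (mate suffix stripped) to the set of subjects it hit."""
    subjects = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid = line.split("\t")[:2]
        name = qseqid[:-2] if qseqid.endswith(("/1", "/2")) else qseqid
        subjects[name].add(sseqid)
    return subjects


def classify_blast_hits(lines1, lines2):
    """Return (reads whose mates share a subject, all other reads with hits)."""
    s1, s2 = _subjects(lines1), _subjects(lines2)
    concordant = {n for n in s1.keys() & s2.keys() if s1[n] & s2[n]}
    others = (s1.keys() | s2.keys()) - concordant
    return concordant, others
```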