Hi all. I'm working with paired-end rna-seq data to assemble transcriptome of my species of interest. I've just realized that Tophat is understating the number of reads that I actually have and supplied in the input files for running tophat command. Here is a fragment of Tophat's progress report:
[2014-01-22 19:29:06] Beginning TopHat run (v2.0.10)
-----------------------------------------------
[2014-01-22 19:29:06] Checking for Bowtie
Bowtie version: 2.1.0.0
[2014-01-22 19:29:06] Checking for Samtools
Samtools version: 0.1.19.0
[2014-01-22 19:29:06] Checking for Bowtie index files (genome)..
[2014-01-22 19:29:06] Checking for reference FASTA file
Warning: Could not find FASTA file bowtie/tritrypdb_tcongolense.fa
[2014-01-22 19:29:06] Reconstituting reference FASTA file from Bowtie index
Executing: /usr/local/bin/bowtie2-inspect bowtie/tritrypdb_tcongolense > tophat/tmp/tritrypdb_tcongolense.fa
[2014-01-22 19:29:08] Generating SAM header for bowtie/tritrypdb_tcongolense
[2014-01-22 19:29:09] Reading known junctions from GTF file
[2014-01-22 19:29:10] Preparing reads
left reads: min. length=100, max. length=100, 56927836 kept reads (17504 discarded)
right reads: min. length=100, max. length=100, 56919726 kept reads (25614 discarded)
And here is the content of "align_summary.txt" file:
Left reads:
Input : 3877069
Mapped : 3102050 (80.0% of input)
of these: 528309 (1 ...