<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent posts to Discussion</title><link>https://sourceforge.net/p/bbmap/discussion/</link><description>Recent posts to Discussion</description><atom:link href="https://sourceforge.net/p/bbmap/discussion/feed.rss" rel="self" type="application/rss+xml"/><language>en</language><lastBuildDate>Wed, 05 Feb 2025 21:16:04 -0000</lastBuildDate><item><title>filterbyname</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/3daaa5762d/?limit=25#cf10</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Does filterbyname remove all reads in the fastq files that match a read name in the txt file? Are duplicates in the input files an issue?&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Samantha M Zarnick</dc:creator><pubDate>Wed, 05 Feb 2025 21:16:04 -0000</pubDate><guid>https://sourceforge.netc20ec0a407f35874b25972f5644105559afee34b</guid></item><item><title>suggested bbmerge improvements</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/b838879b5e/?limit=25#1981</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I was looking for a java-based alternative to either cutadapt or ngmerge, and bbmerge seems to be able to do much of what both other tools can do (primarily I was looking for fixed read trimming, quality trimming, and paired read dovetail trimming). A few suggested improvements:&lt;br/&gt;
* Allow simultaneous fixed trimming and q-trimming. Right now it seems like q-trimming supersedes fixed trimming if both are specified, regardless of which end you want to trim in each case; but there are situations where e.g. one might want to trim a fixed number of bases from the 5' end and quality-trim the 3' end. Additionally, one might want to trim &lt;em&gt;at least&lt;/em&gt; X bases from one end, but perform additional quality trimming beyond that if necessary. I'd recommend just implementing an order of operations, e.g. fixed trimming &amp;gt; quality trimming &amp;gt; adapter trimming &amp;gt; merging or dovetail trimming (similar to what cutadapt uses).&lt;br/&gt;
* Enable an option for q-trimming that properly handles 2-color chemistry (where 3' G bases may represent null cycles beyond the end of the template but nevertheless have high quality scores, so conventional trimming algorithms do not trim them) - cutadapt implemented something like this a while back.  Also relevant for quality trimming in BBMap.&lt;br/&gt;
* Add the ability to put fixed trimmed sequences or trimmed adapter sequences in the fastq comment as a properly formatted SAM tag (this is purely just a nice-to-have and maybe beyond scope, but cutadapt allows a lot of both read and read-name editing like this that really enhances its flexibility for downstream applications).&lt;/p&gt;
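To illustrate the 2-color-chemistry point above: a minimal sketch in plain Python (not BBMerge or cutadapt code; the function name and threshold are made up, and cutadapt's actual --nextseq-trim instead treats trailing G bases as low-quality during quality trimming). It simply drops a sufficiently long trailing G run regardless of its reported quality:

```python
def trim_poly_g(seq, qual, min_run=10):
    # 2-color chemistry artifact: cycles past the template end read as
    # high-confidence G, so quality-based trimming never removes them.
    # Here we drop a long trailing G run outright, ignoring its scores.
    i = len(seq)
    while i > 0 and seq[i - 1] in "Gg":
        i -= 1
    if len(seq) - i >= min_run:
        return seq[:i], qual[:i]
    return seq, qual

# A 12-base trailing G run is removed despite its Q40 ('I') scores
print(trim_poly_g("ACGTACGT" + "G" * 12, "I" * 20))  # ('ACGTACGT', 'IIIIIIII')
```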
&lt;p&gt;Thanks!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Eric Boyden</dc:creator><pubDate>Wed, 12 Jun 2024 21:29:52 -0000</pubDate><guid>https://sourceforge.net4ea0b2d18a6445baf7c34ff295b0a4a2509ad294</guid></item><item><title>suggested improvements</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/a6bf092e5d/?limit=25#699a</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I was looking for a java-based global short-read aligner and this software performs excellently, with lots of neat features that many other aligners lack. In descending order of priority, here are some ideas for improvement:&lt;br/&gt;
* Update SAM spec to v1.6 (currently 1.3/1.4), primarily to produce an updated @HD line to indicate query-grouped output (GO:query). Many downstream tools (e.g. some fgbio tools) rely on this and won't work unless it is explicitly specified, and re-sorting just to add this annotation is time-consuming. (Also, according to the SAM spec, =/X CIGAR strings were added in v1.3, not v1.4: &lt;a href="https://samtools.github.io/hts-specs/SAMv1.pdf" rel="nofollow"&gt;https://samtools.github.io/hts-specs/SAMv1.pdf&lt;/a&gt;; this option probably shouldn't be tied to the SAM spec version at all but should be renamed.)&lt;br/&gt;
* &lt;code&gt;killbadpairs&lt;/code&gt; and &lt;code&gt;requirecorrectstrand&lt;/code&gt; should prevent paired reads from mapping to different chromosomes/contigs; currently such reads are passed through since they're technically not on "opposite" strands.&lt;br/&gt;
* Add functionality to move fastq read comments (everything after the first whitespace) into a SAM tag, possibly as an alternative/add-on to &lt;code&gt;trimreaddescriptions&lt;/code&gt;.  This feature is offered by several other aligners, including BWA, Bowtie2, MiniMap2, and SNAP.&lt;br/&gt;
* Add support for ubam input, so that realigning bams doesn't require reverting them back to fastqs (which may also be complicated to do without losing ubam metadata, although adding the ability to move fastq comments to sam tags will help).&lt;br/&gt;
Also FWIW, the default &lt;code&gt;minid=0.76&lt;/code&gt; seems to be a bit too sensitive for human PE150 data, and allows a fairly high rate of nonspecific alignments of junk reads in NTCs, at least with our data. Raising this value to 0.85 cleaned everything up, with output similar to that of other common aligners (BWA, Bowtie2).  Unsurprisingly, this error rate is also more consistent with Bowtie2's default error tolerance in global mode.&lt;br/&gt;
Thanks!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Eric Boyden</dc:creator><pubDate>Wed, 12 Jun 2024 21:13:12 -0000</pubDate><guid>https://sourceforge.netc3754746501e23006d841df4622866875fd2c068</guid></item><item><title>GC content for paired-reads</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/03c7258449/?limit=25#e1c4</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi, I recently read on biostars &lt;span&gt;[https://www.biostars.org/p/9546248/]&lt;/span&gt; that I could use your tools to determine the GC content of my reads. My reads are paired reads though, and I wanted to adjust this to determine the GC content for each chromosome. I was able to manually split my bam file into the various chromosomes but am unsure how best to use reformat and stats on these files. When I tried to tell reformat that it was paired, and provide two output file names, it didn't work. It runs properly if I say they are unpaired, but I'm worried that if I say the reads are unpaired, that will change the answer for the overall chromosome GC content. Thoughts? Thank you in advance!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Priscilla Glenn</dc:creator><pubDate>Wed, 06 Dec 2023 19:47:00 -0000</pubDate><guid>https://sourceforge.net2398aace3e9d4f66f00ecf3c9aac5807144e2de2</guid></item><item><title>finding references (phiX174 and adapters) for bbduk</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/2738b79bda/?limit=25#3af2</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello,&lt;/p&gt;
&lt;p&gt;I am currently trying to make a more reproducible workflow (removing weird paths to files mostly) and I am having trouble figuring out the shortcut that can be used for phiX174 as a reference for BBDuk.&lt;/p&gt;
&lt;p&gt;It looks like the adapters file can be called like this: "ref=adapters", eliminating the extra path before it. Is there a similar way to write the phiX174_ill.ref.fa.gz file?&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jay Osvatic</dc:creator><pubDate>Mon, 20 Mar 2023 11:49:13 -0000</pubDate><guid>https://sourceforge.net6ddcdfecdb11b159dd5763f57a9d4e578dee955b</guid></item><item><title>Quality filtering on minimum average quality</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/e3cdab2085/?limit=25#d46b</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Seems like it is done with&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nv"&gt;math&lt;/span&gt;.&lt;span class="nv"&gt;log&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;sum&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;[&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;ord&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;c&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nv"&gt;c&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;line4&lt;/span&gt;]&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;len&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;line4&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;,&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;rather than just&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;sum&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;[&lt;span class="nv"&gt;ord&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;c&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nv"&gt;c&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;line4&lt;/span&gt;]&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;len&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;line4&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
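To make the difference concrete, here is the same comparison as a runnable sketch (plain Python; the example read is made up). Averaging in probability space lets a single very low-quality base drag the result far below the naive Phred average:

```python
import math

def avg_quality_prob(qual_string):
    # bbduk-style: convert each Phred+33 char to an error probability,
    # average the probabilities, then convert back to a Phred score
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual_string]
    return -10 * math.log10(sum(probs) / len(probs))

def avg_quality_naive(qual_string):
    # naive: arithmetic mean of the Phred scores themselves
    return sum(ord(c) - 33 for c in qual_string) / len(qual_string)

# Hypothetical read: 99 bases at Q40 ('I') plus a single Q2 ('#') base
line4 = "I" * 99 + "#"
print(avg_quality_naive(line4))  # approx 39.6
print(avg_quality_prob(line4))   # approx 21.9 -- the one bad base dominates
```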

&lt;p&gt;So, it's the quality score of the average probability.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jake</dc:creator><pubDate>Wed, 21 Sep 2022 18:02:45 -0000</pubDate><guid>https://sourceforge.net3eb15986ecf3fa2c7047de72ce303f009c3d6430</guid></item><item><title>Quality filtering on minimum average quality</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/e3cdab2085/?limit=25#edf3</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I've been using bbduk for quality filtering and I pretty much just took it for granted. However, I've been looking at it because I usually use minavequality=15, which removes about 15-20% of our reads; minavequality=30 removes almost 100% of them.&lt;/p&gt;
&lt;p&gt;However when I do the quality averaging by hand, almost all of my reads have an average quality above 30. I've been taking the ASCII values of the quality scores and subtracting 33, summing them up and dividing by the length. &lt;/p&gt;
&lt;p&gt;How is bbduk computing the average quality?&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jake</dc:creator><pubDate>Fri, 16 Sep 2022 22:16:32 -0000</pubDate><guid>https://sourceforge.net2f277afec509f16136f82f7e50b00dacc7bb32d6</guid></item><item><title>Java issue</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/3c86b3b2c7/?limit=25#6c6b</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I am trying to run bbsplit:&lt;br/&gt;
bbsplit.sh build=1 threads=12 ref_x=$ref_Human ref_y=$ref_Mouse path=$indexded_ref&lt;/p&gt;
&lt;p&gt;I am running into the following error:&lt;br/&gt;
Error: Could not find or load main class align2.BBSplitter&lt;br/&gt;
Caused by: java.lang.ClassNotFoundException: align2.BBSplitter&lt;br/&gt;
srun: error: b001: task 0: Exited with exit code 1&lt;/p&gt;
&lt;p&gt;Is there a way to solve this issue?&lt;/p&gt;
&lt;p&gt;Thank you.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Muku</dc:creator><pubDate>Wed, 03 Aug 2022 19:40:30 -0000</pubDate><guid>https://sourceforge.net893722121fe5a29d12c5bafbf581a5a0d287c40e</guid></item><item><title>Gene coverage detection output from pileup.sh</title><link>https://sourceforge.net/p/bbmap/discussion/general/thread/8390587524/?limit=25#8d37</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I am currently using pileup.sh to determine the coverage of genes in my metagenomes by comparing my read mapping alignment file to the gene predictions of the contigs identified by Prodigal. I was wondering if anyone could tell me the output format of the gene_coverage.tmp file, and whether it has the same headers as the contig_coverage.tmp files? I am not sure if I should use the avgDepth value or the depthSum for my coverage value when I go to normalize these counts to counts per million (CPM).&lt;/p&gt;
&lt;p&gt;I am adding a screenshot that shows the column headers in this gene_coverage tmp file. Thanks for your help and for making such a great set of tools!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Hannah Freund</dc:creator><pubDate>Mon, 20 Dec 2021 17:04:07 -0000</pubDate><guid>https://sourceforge.net126ca574b62ab34d6aea5e6fc2f1f108e2e030e4</guid></item></channel></rss>