Velvet Advisor


Questions

I have million reads.

They are reads.

Each read is base-pairs long.

I estimate my genome size to be megabases (million bases).

I would like to have about fold k-mer coverage for my assembly (defined below, suggest between 10 and 30)


Answer

You have a yield of megabases.

You have about fold nucleotide coverage of your genome.

We recommend trying k= for your Velvet assembly.

The Velvet sequence type is:
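
The arithmetic behind these answers can be sketched in a few lines of Python. This is an illustration only, not the advisor's actual code: the function name `advise` and its parameters are hypothetical, and the rounding rule (round k down, then step down to the nearest odd number, since Velvet only works with odd k) is an assumption about how a fractional k might reasonably be handled.

```python
def advise(reads_millions, read_length, genome_mb, target_kmer_cov):
    """Return (yield in Mb, nucleotide coverage, suggested k).

    Hypothetical helper: names and rounding policy are assumptions.
    """
    # Million reads times bases per read gives megabases of sequence.
    yield_mb = reads_millions * read_length
    # Nucleotide coverage C is the yield divided by the genome size.
    nuc_cov = yield_mb / genome_mb
    # Solve Ck = C * (L - k + 1) / L for k.
    k = read_length + 1 - target_kmer_cov * read_length / nuc_cov
    # Round down so the achieved k-mer coverage is not below the target.
    k = int(k)
    # Velvet only works with odd k, so step down if k is even.
    if k % 2 == 0:
        k -= 1
    if k < 1:
        raise ValueError("not enough coverage to reach the target k-mer coverage")
    return yield_mb, nuc_cov, k


if __name__ == "__main__":
    # 10 million reads of 100 bp, a 5 Mb genome, 20-fold k-mer coverage wanted.
    print(advise(10, 100, 5, 20))
```

The suggested k still has to fit within the MAXKMERLENGTH your Velvet was compiled with.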


Tips

If you are using paired reads, you need to either (1) remember to interleave the two reads files first with shuffleSequences.pl which comes with Velvet; or (2) use the new "-separate" option.

Check the MAXKMERLENGTH your Velvet was compiled with by running velveth with no arguments; k cannot exceed this value.

Read the Velvet-Users mailing list and ask questions if you need help.

Try using VelvetK to suggest a good k-value before you assemble.

Try using VelvetOptimiser to help find the best k-value automatically.

We recommend using the -exp_cov auto and -cov_cutoff auto options to velvetg when first exploring your data.


What is k-mer coverage?

All coverage values in Velvet are given as k-mer coverage, i.e. how many times a k-mer has been seen among the reads. The relation between k-mer coverage Ck and standard (nucleotide-wise) coverage C is Ck = C * (L - k + 1) / L, where k is your hash length and L is your read length.
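
As a quick worked example of the relation above, the following Python sketch evaluates Ck for given C, L and k. The function name `kmer_coverage` is made up for illustration and is not part of Velvet.

```python
def kmer_coverage(nuc_cov, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L.

    Hypothetical helper for illustration; not part of Velvet.
    """
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    # A read of length L contains L - k + 1 k-mers.
    return nuc_cov * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # 200-fold nucleotide coverage, 100 bp reads, k = 91
    print(kmer_coverage(200, 100, 91))
```

With 100 bp reads and k = 91, each read contributes only 10 k-mers, so 200-fold nucleotide coverage shrinks to 20-fold k-mer coverage.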

The choice of k is a trade-off between specificity and sensitivity. Longer k-mers bring you more specificity (i.e. fewer spurious overlaps) but reduce depth of coverage. The sweet spot is related to the properties of your genome and to the error rate in the reads. See Titus Brown's blog post for a good description.

Experience shows that k-mer coverage should be above 10 to start getting decent results. If Ck is above 20, you might be "wasting" coverage. Experience also shows that empirical tests with different values of k are not costly to run.
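
To choose which values of k to test, one option is to list every odd k whose k-mer coverage falls in the 10 to 20 range mentioned above. The sketch below does that; the function name `candidate_ks` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def candidate_ks(nuc_cov, read_length, low=10, high=20):
    """List (k, Ck) for odd k where low <= Ck <= high.

    Hypothetical helper; the inclusive bounds are an assumption.
    """
    results = []
    # Only odd k are considered, since Velvet only works with odd k.
    for k in range(1, read_length + 1, 2):
        ck = nuc_cov * (read_length - k + 1) / read_length
        if low <= ck <= high:
            results.append((k, ck))
    return results


if __name__ == "__main__":
    # 50-fold nucleotide coverage with 100 bp reads
    for k, ck in candidate_ks(50, 100):
        print(f"k={k}  Ck={ck:.1f}")
```

An empty list means no odd k reaches the requested range for that combination of coverage and read length.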


Written by Torsten Seemann, Victorian Bioinformatics Consortium, Monash University, AUSTRALIA.