Seq — a language for bioinformatics

Seq is a programming language for computational genomics and bioinformatics. With a Python-compatible syntax and a host of domain-specific features and optimizations, Seq makes writing high-performance genomics software as easy as writing Python code, and achieves performance comparable to (and in many cases better than) C/C++.

Source code is available on GitHub. You can also learn more about Seq from our paper or by visiting our Gitter chatroom.

What’s new in 0.10?

Version 0.10 brings a slew of improvements to the language and compiler, including:

  • Nearly all of Python’s syntax is now supported, including empty collections ([], {}), lambda functions (lambda), *args/**kwargs, None and much more

  • Compiler error messages now pinpoint exactly where an error occurred, with compile-time backtraces

  • Runtime exceptions now include backtraces with file names and line numbers in debug mode

  • GDB and LLDB support

  • Various syntax updates to further close the gap with Python

  • Numerous standard library improvements
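As a rough illustration of the Python constructs named in the list above (empty collections, lambda, *args/**kwargs and None), here is a small sketch written as ordinary Python. The function names count_bases and gc_fractions are hypothetical and are not part of the Seq standard library. That Seq 0.10 compiles this exact snippet unchanged is an assumption based on the list, not something verified against seqc.

```python
# Ordinary Python exercising the constructs listed above.
# Assumption: Seq 0.10 accepts this as-is; it has not been checked with seqc.

def count_bases(*reads, **options):
    """Count bases across all reads, skipping any listed in `ignore`."""
    ignore = options.get("ignore", None)  # None as a default value
    counts = {}                           # empty dict literal
    for read in reads:
        for base in read:
            if ignore is not None and base in ignore:
                continue
            counts[base] = counts.get(base, 0) + 1
    return counts


def gc_fractions(reads):
    """Return the GC fraction of each read (0.0 for an empty read)."""
    fractions = []                        # empty list literal
    gc = lambda s: (s.count("G") + s.count("C")) / len(s) if s else 0.0
    for read in reads:
        fractions.append(gc(read))
    return fractions


counts = count_bases("ACGT", "GGNC", ignore="N")
fractions = gc_fractions(["ACGT", "GGCC", ""])
print(counts)
print(fractions)
```

The same file also runs under a standard Python interpreter, which makes it a convenient way to compare behavior between the two.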

Caution

The default compilation and execution mode is now “debug”, which disables most optimizations. Pass the -release argument to seqc to enable optimizations.

Frequently Asked Questions

What is the goal of Seq?

A main goal of Seq is to bridge the gap between usability and performance in bioinformatics and computational genomics, fields with an unfortunate reputation for hard-to-use, buggy or generally poorly written software. Seq aims to make writing high-performance genomics or bioinformatics software substantially easier, and to provide a common, unified framework for developing such software.

Why do we need a whole new language? Why not a library?

There are many great bioinformatics libraries available today, including Biopython for Python, SeqAn for C++ and BioJulia for Julia. In fact, Seq offers much of the same functionality found in these libraries. The advantage of a domain-specific language and compiler, however, lies in higher-level constructs and optimizations like Pipelines, Sequence matching, Inter-sequence alignment and Genomic index prefetching. These are difficult to replicate in a library because they often involve large-scale program transformations and optimizations. A domain-specific language also allows us to explore different backends like GPU, TPU or FPGA in a systematic way, in conjunction with these constructs and optimizations; this is ongoing work.

What about interoperability with other languages and frameworks?

Interoperability is and will continue to be a priority for the Seq project. Using Seq should not prevent you from using the other great frameworks and libraries that exist. Seq already supports interoperability with C/C++ and Python (see interop).

I want to contribute! How do I get started?

Great! Check out our contribution guidelines and open issues to get started. Also don’t hesitate to drop by our Gitter chatroom if you have any questions.

What is planned for the future?

See the roadmap for information about this.