
CLC Assembly Cell

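Read mapping and de novo assembly of next-generation sequencing data
  • Supports read mapping and assembly of paired-end sequencing data
  • Supports assembly of both short and long reads
  • Supports scaffold construction during assembly of paired-end data
  • Supports cluster architectures
  • Fully integrated with CLC Genomics Workbench
CLC Assembly Cell is a high-performance computing solution for read mapping and de novo assembly of next-generation sequencing (NGS) data. It uses SIMD instructions to parallelize and accelerate the assembly algorithms, making it the fastest NGS assembly tool on the market. Its command-line interface makes it easy to include in scripts and other NGS workflows.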
Cat No./ID: 832240
CLC Assembly Cell, Desktop License

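CLC Assembly Cell is intended for molecular biology applications. This product is not intended for the diagnosis, prevention, or treatment of a disease.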


Benchmarking
We compared the performance of the industry-standard HGAP1, run on a high-performance computer, with that of a De Novo Assembly workflow in CLC Assembly Cell. Please note that our De Novo Assembly workflow was run on a standard laptop for this comparison.
Performance
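High performance
CLC Assembly Cell is the fastest NGS assembly tool on the market; it uses SIMD instructions to parallelize and accelerate the assembly algorithms. For large assembly jobs, such as de novo assembly of paired-end data from human or plant genomes, the CPU time required is typically less than one tenth of that needed by other assemblers. A distinctive feature of CLC Assembly Cell is that it adapts automatically to the available memory, so large data sets can be processed even on computers with only a few GB of RAM. For example, reads can be mapped to the human genome with less than 4 GB of RAM.
Support for paired-end and mixed data sets
CLC Assembly Cell supports assembly of paired-end data from all sequencing platforms, as well as assembly of mixed data sets that combine paired-end and single-end reads produced by different methods. In addition, it can join contigs derived from paired-end data into scaffolds.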

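Support for cluster architectures
CLC Assembly Cell supports processing on multi-core computers. For additional assembly jobs running in parallel on multiple computers in the same cluster, CLC bio can provide customized solutions for distributing the work. Because assembly results can be merged flexibly, distributing assembly jobs across multiple computers is straightforward.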

Visualization and downstream analysis
CLC Assembly Cell is fully integrated with CLC Genomics Workbench, giving access to a wide range of downstream analyses and advanced visualization of assembly results.
Applications
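The main functions of CLC Assembly Cell are read mapping and de novo assembly. Read mapping supports both long and short reads, with gapped or ungapped alignment of short reads. De novo assembly supports assembly and scaffolding of paired-end data. Other supported analyses include report generation, duplicate read removal, quality trimming, and SNP detection. CLC Assembly Cell is fully integrated with CLC Genomics Workbench, and its output can be imported directly into CLC Genomics Workbench for further analysis.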
Services
Cluster support

Multiple CLC Assembly Cells can be run in parallel on a multi-node cluster.
In practice, almost every cluster is set up differently, and we therefore don’t provide an off-the-shelf solution that is guaranteed to work on your computer cluster. Instead we provide a free to download, free to use, and free to modify Perl script, as an example.

Job node distribution for CLC Assembly Cell

Multiple CLC Assembly Cells can be run in parallel on a multi-node cluster. Because almost every cluster is set up differently, we provide the free-to-download, free-to-use, and free-to-modify Perl script below as an example. Please note that this is not an off-the-shelf solution that is guaranteed to work on your computer cluster, but you are welcome to adjust it to fit your needs.

The script cluster_schedule distributes the jobs defined in the schedule file across a number of nodes. One example is the distribution of CLC Assembly Cell reference assembly jobs. This requires an installation of CLC Assembly Cell on each node, and the best performance is reached if the reference sequence is stored locally on each node.

Each job is a list of commands which cluster_schedule will run in order on one node. If one of the commands in a job fails (its error code is not zero), no more commands in the job are executed and the job is considered failed. If all commands in a job complete successfully (all error codes are zero), the job is a success.
The nodes the jobs are run on can be defined on the command line or in the schedule_file. Nodes defined on the command line replace all nodes defined in the schedule_file.
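
The job semantics and node precedence described above can be illustrated with a minimal Python sketch. This is not the Perl script itself and makes no assumption about the schedule file format; the names `run_job`, `resolve_nodes`, and `shell_runner` are hypothetical and exist only for this illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical type: a runner takes one command string and returns its exit code.
Runner = Callable[[str], int]


def shell_runner(command: str) -> int:
    """Run a command through the local shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def run_job(commands: Sequence[str], runner: Runner = shell_runner) -> bool:
    """Run the commands of one job in order.

    The job fails at the first non-zero exit code, and the remaining
    commands are not executed. It succeeds only if every command returns zero.
    """
    for command in commands:
        if runner(command) != 0:
            return False  # stop here: later commands are skipped
    return True


def resolve_nodes(cli_nodes: Sequence[str], file_nodes: Sequence[str]) -> List[str]:
    """Nodes given on the command line replace all nodes from the schedule file."""
    return list(cli_nodes) if cli_nodes else list(file_nodes)
```

Passing the runner as a parameter is a design choice of this sketch: it lets the stop-on-first-failure behaviour be checked without executing real commands.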

Each job is run on one node and each command is executed on the node using ssh.
Therefore, to use cluster_schedule make sure that all nodes are set up to use automatic ssh authentication.
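
As a rough illustration of running each job on one node over ssh, the following Python sketch hands jobs to free nodes and executes every command through `ssh`. It is not the cluster_schedule script: the function names (`ssh_argv`, `run_over_ssh`, `distribute`) are hypothetical, and the sketch assumes that a node runs one job at a time and that OpenSSH is the ssh client. The `BatchMode=yes` option makes ssh fail rather than prompt for a password, so a node without automatic authentication shows up as a failed job instead of a hung one.

```python
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def ssh_argv(node: str, command: str) -> List[str]:
    """Build the ssh invocation that runs one command on one node."""
    return ["ssh", "-o", "BatchMode=yes", node, command]


def run_over_ssh(node: str, command: str) -> int:
    """Execute the command on the node via ssh and return the exit code."""
    return subprocess.run(ssh_argv(node, command)).returncode


def distribute(
    jobs: Dict[str, Sequence[str]],
    nodes: Sequence[str],
    run: Callable[[str, str], int] = run_over_ssh,
) -> Dict[str, bool]:
    """Run every job on one free node; return job name -> success flag."""
    if not nodes:
        raise ValueError("at least one node is required")

    free: "queue.Queue[str]" = queue.Queue()
    for node in nodes:
        free.put(node)

    def run_one(name: str) -> bool:
        node = free.get()  # wait until some node is free
        try:
            # all() stops at the first non-zero exit code, so the
            # remaining commands of a failed job are never executed.
            return all(run(node, cmd) == 0 for cmd in jobs[name])
        finally:
            free.put(node)  # hand the node back for the next job

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(run_one, jobs))
    return dict(zip(jobs, results))
```

Before a real run, a quick check such as calling `run_over_ssh(node, "true")` for each node would show whether key-based authentication is in place.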
 