[Venue Changed] Seminar Schedule for August 13, 2014
Bioinformatics Seminar Notice
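The Seoul National University Bioinformatics Institute and the Interdisciplinary Program in Bioinformatics are jointly hosting a special seminar, as detailed below. We hope many of you will attend.
Date and time: Wednesday, August 13, 2014, 11:00
Speaker: Dr. Youngik Yang (J. Craig Venter Institute)
Venue: Conference Room 625, Building 220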
Title: SFA-SPA: a suffix array based short peptide assembler for metagenomic data
Metagenomics extends the power of genomic analysis to entire microbial communities, bypassing the need to isolate and culture individual community members. In this paradigm, DNA is extracted and sequenced directly from an environmental sample. Because of its cost-effectiveness, next-generation sequencing (NGS) technology is routinely used in metagenomics studies, and the deep coverage of NGS reads compensates for their short length. The massive volume of sequence data from an environmental sample provides rich information for understanding the microbial community. At the same time, it exposes many new challenges, such as unknown taxonomic origins and abundance distributions, sequence variation, and non-uniform sequencing depth. Because of these confounding factors, reconstructing genomes from metagenomic reads is computationally very difficult: nucleotide assembly often yields fragmented contigs, and many reads remain unassembled. This poor assembly makes subsequent downstream analyses, including analysis of the protein sequences in these data, difficult and unreliable.
We previously introduced an algorithm and software for accurately reconstructing protein sequences from short peptides identified on the nucleotide reads of a metagenomic dataset. Tested on multiple simulated and real datasets, it outperformed a competing strategy (gene prediction after nucleotide assembly) on multiple criteria, including accuracy, fraction of reads assembled, and chimera rate.
Here we present significant computational improvements to the short peptide assembly algorithm that make it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads while maintaining accuracy. The gain in efficiency comes from a suffix array data structure that allows fast querying during assembly, and from a redesign of the assembly steps that also enables multithreaded execution.
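The suffix-array idea mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the peptides are concatenated with a `$` separator and answers one kind of query an assembler might issue: which peptides contain a given substring (for example, a candidate overlap seed). The names `PeptideIndex`, `find_peptides`, and the separator choice are hypothetical. This is a generic suffix-array lookup, not SFA-SPA's actual code, and it does not cover the multithreaded assembly steps.

```python
from bisect import bisect_right


def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive O(n^2 log n) construction, for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


class PeptideIndex:
    """Hypothetical suffix-array index over a set of short peptides."""

    SEP = "$"  # assumed separator; never occurs in amino-acid sequences

    def __init__(self, peptides: list[str]) -> None:
        self.peptides = list(peptides)
        self.starts: list[int] = []  # offset of each peptide within self.text
        parts = []
        offset = 0
        for pep in self.peptides:
            self.starts.append(offset)
            parts.append(pep + self.SEP)
            offset += len(pep) + 1
        self.text = "".join(parts)
        self.sa = build_suffix_array(self.text)

    def _prefix(self, rank: int, m: int) -> str:
        # First m characters of the suffix at position `rank` in the suffix array
        start = self.sa[rank]
        return self.text[start:start + m]

    def find_peptides(self, pattern: str) -> list[int]:
        """Return indices of peptides that contain `pattern` as a substring."""
        if not pattern or self.SEP in pattern:
            return []
        m = len(pattern)
        # Lower bound: first suffix whose m-character prefix is >= pattern
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        left = lo
        # Upper bound: first suffix whose m-character prefix is > pattern
        hi = len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, m) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        # Every suffix in sa[left:lo] starts with pattern; map each hit back
        # to the peptide containing it. The separator keeps hits inside one peptide.
        hits = {bisect_right(self.starts, self.sa[r]) - 1 for r in range(left, lo)}
        return sorted(hits)
```

For the peptides `MKVLAT`, `VLATGG`, and `GGHHKP`, `find_peptides("VLAT")` returns `[0, 1]`, which flags the first two as overlap candidates. After the index is built, each lookup takes O(m log n) character comparisons, which is why a suffix array suits the many repeated queries an assembler makes. At the scale of hundreds of millions of reads, the naive `sorted` construction here would need to be replaced by a faster builder.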
Education: Ph.D. in Computer Science, Indiana University, 2010
Recent research activities
Development of a novel short peptide sequence assembler for metagenomic sequence analysis.
Human oral metagenome/metatranscriptome sequence analysis using next generation sequencing data.
Genome/Metagenome projects
Jointly hosted by the Seoul National University Bioinformatics Institute and the Interdisciplinary Program in Bioinformatics