- Job Title: Senior Software Developer / Bioinformatician (Genome Annotation and Analysis)
- Post Number: 1004177
- Closing Date: 6 Jun 2022
- Grade: SC5
- Starting Salary: £41,193 - £50,448
- Hours per week: 37
- Expected/Ideal Start Date: 20 Jun 2022
- Months Duration: 36
Job Description
Main Purpose of the Job
Develop accurate and scalable computational pipelines and software tools for the annotation and analysis of complex eukaryotic genomes. The postholder will be responsible for leading the development of EI's genome annotation toolkit and supporting large-scale annotation projects. This will involve collaborations with EI faculty and with biologists and computational scientists from UK and international institutions. Develop methods that are robust to different genome characteristics, assembly contiguity, and the amount, quality and types of data available. Build pipelines to support pan-genome analysis and annotation of alternative haplotypes. Utilise modern workflow management and containerisation systems to automate complex workflows. Expand EI’s gene annotation pipelines to incorporate data from the latest sequencing technologies. Prepare manuscripts for submission and present results at internal and external meetings. Keep up to date with modern software engineering best practices, data analysis best practices and new technologies. The postholder will also have the opportunity to mentor junior bioinformaticians, interns and visitors.
Key Relationships
The postholder will report to David Swarbreck (Head of Core Bioinformatics) and interact daily with members of the Core Bioinformatics Group. They will also interact with all EI staff and external collaborators, particularly group leaders, domain experts and leads in relevant genomics research.
Main Activities & Responsibilities
- Design and implement new workflows within the framework of EI's genome annotation toolkit REAT (Robust and Extendable eukaryotic Annotation Toolkit, https://github.com/EI-CoreBioinformatics/reat): 20%
- Develop tools to project, consolidate, validate, and resolve differences in gene annotations across clades, subspecies, and haplotypes: 20%
- Develop approaches for integrating data from a variety of sequencing technologies, and maintain and develop existing pipelines / annotation tools: 15%
- Engage in research projects and analysis, developing bespoke tools and scripts: 15%
- Collaborate with EI faculty groups and UK and international partners on large-scale annotation and analysis projects: 15%
- Manage the release of data and tools to public repositories: 5%
- Communicate the work of EI through oral presentations (at internal seminars and external conferences) and written work, and publish novel findings and technology/tool development in scientific journals: 5%
- Keep up to date with modern software engineering best practices, data analysis best practices and new technologies: 5%
- As agreed with line manager, any other duties commensurate with the nature of the post
Person Profile
Education & Qualifications
- Requirement: Importance
- PhD or equivalent experience in a relevant subject area (e.g. bioinformatics, computational biology, computer science): Essential
Specialist Knowledge & Skills
- Requirement: Importance
- Excellent programming skills in various languages (ideally including Python, R, shell scripting), including the ability to interpret legacy code: Essential
- Proficiency with UNIX tools and version control (such as git, subversion): Essential
- Familiarity with cluster compute environments and workflow management systems: Essential
- Understanding of testing frameworks and best practices: Essential
- Knowledge of methods for genome annotation: Essential
- Knowledge of pipeline workflow programming: Essential
- Broad and in-depth understanding of bioinformatics tools: Essential
Relevant Experience
- Requirement: Importance
- Experience designing and implementing computational methods, tools and algorithms for large-scale data analysis: Essential
- Experience of workflow management systems, e.g. Snakemake, Cromwell, Nextflow: Essential
- Experience developing software in Python, C/C++ or Java: Essential
- Experience using High Performance Computing (HPC) resources and job scheduling systems: Essential
- Experience with version control: Essential
- Experience with next generation sequencing (RNA-Seq, DNA-Seq) and large-scale data analysis: Desirable
- Experience of working in a high-throughput genomics environment with disciplined delivery of high-quality, usable data sets: Desirable
- Experience with genome annotation tools and approaches: Desirable
Interpersonal & Communication Skills
- Requirement: Importance
- Excellent leadership skills: Essential
- Capable of working with domain specialists to gather system requirements: Essential
- Strong organisational and record keeping skills: Essential
- Able to communicate clearly with EI staff at all levels: Essential
- Able to work independently: Essential
- Capable of writing academic papers and grants: Essential
- Good communication skills, both written and verbal: Essential
- Good interpersonal skills, with the ability to work well as part of a team: Essential
- Networking and influencing skills and the ability to build effective collaborative links: Desirable
Additional Requirements
- Requirement: Importance
- Attention to detail: Essential
- Promotes equality and values diversity: Essential
- Willingness to participate in occasional training of other institute members and visitors, and as part of formal training programmes hosted at the institute: Essential
- Willingness to embrace the expected values and behaviours of all staff at the Institute, ensuring it is a great place to work: Essential
- Willingness to work outside standard working hours when required: Essential
- Able to present a positive image of self and the Institute, promoting both the international reputation and public engagement aims of the Institute: Essential
Who We Are
Earlham Institute
Earlham Institute is a vibrant, contemporary research institute and registered charity, working in an area of rapid technological development and innovation.
Earlham Institute is strategically funded by the BBSRC to lead the development of a skill base in bioinformatics and a genomics technology platform for UK bioscience. The Institute is located on the Norwich Research Park, together with its partners: the John Innes Centre, the Institute of Food Research, The Sainsbury Laboratory, the University of East Anglia and the Norfolk and Norwich University Hospital. The research park has an excellent reputation for research in plant and microbial sciences, interdisciplinary environmental science and food, diet and health, to which Earlham Institute contributes strengths in genomics and bioinformatics. Close links exist between the NRP partners and new opportunities for collaboration in exciting new initiatives are under development. The NRP recently received £26M of government investment to facilitate innovation and further develop infrastructure to attract science and technology companies to the Park to enhance the vibrant environment and realise economic impact from research investment.
Earlham Institute is a UK hub for innovative Bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. This has been boosted recently by an e-Infrastructure grant to expand the data storage capacity to a multi-petabyte unit, deploying a high performance cluster and large-memory server enabling the allocation of processes requiring several terabytes of computing memory.
Earlham Institute’s state-of-the-art DNA sequencing facility operates multiple complementary technologies for data generation that provide the foundation for analyses furthering our fundamental understanding of genomes and how they function. We aim to be at the forefront of technological advances and are developing and implementing technologies to generate and analyse new types of data. We also develop novel platforms that give multiple academic and industrial users access to computational tools and processing capacity, and that promote applications of computational bioscience. Earlham Institute has one wholly owned subsidiary, Genome Enterprise Ltd (GEL), through which it offers genomic and bioinformatics services on a trading basis and works with commercial providers on a partnership basis. Earlham Institute also receives specific funding for knowledge exchange programmes, which are supported across the institute's teams.
Department
Digital Biology
Group Details
The Core Bioinformatics Group led by David Swarbreck provides high quality computational analyses of a range of sequence data types and develops software and analysis pipelines that support the Institute's Core Strategic Programmes and National Capability in Genomics and Single Cell Analysis (NCGS). The National Capability provides access to state of the art genomics technologies for research groups throughout the UK as well as delivering sequencing and genomics services to the associated Science Faculty groups and commercial customers. The group is involved in the analysis of sequence data from a wide variety of sequencing platforms and works alongside Earlham research groups to establish new software tools and analysis methods.
Management and Leadership
- Requirement: Importance
- Ability to lead projects to a successful conclusion: Essential
- Coordinate work with others on collaborative projects: Essential
Advertisement
Senior Software Developer / Bioinformatician (Genome Annotation and Analysis)
Applications are invited for a Senior Software Developer/Bioinformatician to join the Core Bioinformatics Group at the Earlham Institute (EI), based in Norwich, UK.
The role:
This post will focus on developing novel computational tools and pipelines for large scale annotation of genes and other genomic features across a diverse range of species including plants, mammals, fish and protists.
This role will have responsibility for leading the development of EI's genome annotation pipelines and supporting large scale annotation / analysis projects. A key focus will be continuing the development of REAT - EI's Robust and Extendable eukaryotic Annotation Toolkit (https://github.com/EI-CoreBioinformatics/reat), building new workflows to address specific challenges relevant to the annotation of protist and plant genomes including annotation of non-culturable protists using single cell data.
The postholder will develop scalable and robust methods to annotate eukaryotic species with large, repeat-rich and polyploid genomes, using data from cutting-edge sequencing technologies including PacBio IsoSeq and Nanopore.
As part of the Core Bioinformatics Group you will:
• Design and develop accurate and scalable computational pipelines for the annotation and analysis of complex eukaryotic genomes
• Continue development of existing pipelines / annotation tools
• Develop tools to support pan-genome analysis and annotation of alternative haplotypes
• Develop approaches for integrating data from a variety of sequencing technologies
• Work with the core bioinformatics team to produce high-quality, evidence-based annotation of protein-coding genes, lncRNAs and pseudogenes
• Utilise modern workflow management and containerisation systems to automate complex workflows
• Keep up to date with modern software engineering best practices, data analysis best practices and new technologies
• Mentor junior bioinformaticians, interns and visitors
• Collaborate with EI faculty groups and UK and international partners on large-scale projects, including the Darwin Tree of Life (DToL) project
The ideal candidate:
The successful candidate will possess a PhD or equivalent experience in computational biology, bioinformatics, computer science or a similar subject. They will also have a strong background in developing computational tools / pipelines and experience of large scale data analysis.
Candidates should possess a high level of skill and demonstrable experience of building tools and pipelines in Python, along with R and Bash scripting. Familiarity with cluster compute environments and workflow management systems is also essential.
The ideal candidate will also be highly collaborative and have demonstrable experience of managing their time successfully to deliver analyses and contribute to multiple projects.
Additional information:
Salary on appointment will be within the range £41,193 to £50,448 per annum depending on qualifications and experience. A market supplement may also apply depending on skills and experience. This is a full-time post for a contract of 3 years.
As a Disability Confident employer, we guarantee to offer an interview to all disabled applicants who meet the essential criteria for this vacancy.
The closing date for applications will be 6 June 2022.