Research Software Engineer (Collaborative Open Omics and Darwin Tree of Life)
Post Details
Job Title
Research Software Engineer (Collaborative Open Omics and Darwin Tree of Life)
Post Number
Closing Date
26 Feb 2020
Starting Salary
£39,990 - £48,775
Hours per week
Project Title
Collaborative Open Plant Omics (COPO): a community-driven platform for plant science
Months Duration

Job Description

Main Purpose of the Job

Develop and maintain the Collaborative Open Omics (COPO) platform: a data management and deposition tool.

Continue to develop COPO into a single point of access, for end users and via APIs, to a range of web services that enable scientists to attribute metadata, deposit data, ascribe accessions to these first-class research objects, and link these rich bundles with their publications.

Manage the strategic development and delivery of high-performance middleware that provides essential tracking and lifecycle processing of data managed by the COPO platform.

Devise novel approaches to improve large-scale data description, public deposition and access across the UK and international life science research community.

Contribute to, promote, and implement FAIR standards for data-driven informatics in the biosciences.

Engage with life scientists, particularly in EI’s strategic remit of non-biomedical big data research, by attending national and international meetings to promote data management and related infrastructure, and by providing training on the COPO platform.

Key Relationships

Internal: Reporting to the Head of Research e-Infrastructure, with a key role to play in the development and delivery of the COPO infrastructure. Working with the EI Capabilities in Genomics, e-Infrastructure and Training to support data management, develop materials, and deliver training on the COPO platform.
External: Developers of the ISA Tools suite at University of Oxford; data repositories at the European Bioinformatics Institute (EBI); representatives of the UK and international bioscience community; UK and international data management scientists; API and service providers for data and analysis management services such as CyVerse and CyVerse UK, Galaxy, CKAN, figshare, etc.

Main Activities & Responsibilities

Work with the Head of Research e-Infrastructure and colleagues in the group to develop the Collaborative Open Omics platform, as well as EI’s tools for data and metadata management that are driven by analytical demands, technological advances and institute strategy needs.
Reduce the burden of data management for researchers, enabling them to share their data and results more effectively by establishing interoperability middleware between a range of existing data services.
Establish associated best practice guidelines for bioscience data management, flow, exposure and publication strategies.
Provide an efficient and cost-effective data management platform to exploit EI’s hardware and software expertise, promoting EI as a National Capability for UK bioscience.
Communicate the work of EI in oral and written presentations and publish novel technological developments in scientific journals.
Network widely with academic and industrial groups on the NRP, throughout the UK and internationally, including the GARNET plant community, the European Bioinformatics Institute, other genomics facilities nationally and internationally, and with bioinformaticians globally, such as the CyVerse and Galaxy efforts.
As agreed with line manager, any other duties commensurate with the nature of the role.

Person Profile

Education & Qualifications

BSc in a relevant area, equivalent professional qualification, or strong evidence of a practical role in a suitable area of computer science or software development.
MSc and/or PhD in a relevant discipline, e.g. computer science, bioinformatics.

Specialist Knowledge & Skills

Strong experience in software development in Python 3, with a focus on Django, and associated API and web-based UI development.
Proficient in the JavaScript programming language, web application and UI frameworks, and associated libraries.
Knowledge of ontologies.
Experience with MongoDB NoSQL database management systems.
Experience in platform- and software-as-a-service (PaaS, SaaS) architectures, i.e. the “Cloud”, especially with Docker.
Experience in websockets and event queues.
Experience in usage of high-performance computing infrastructures.

Relevant Experience

Strong experience in open source software engineering and associated tools and practices, e.g. version control (Git/GitHub), agile practices, code review, etc.
Experience of working in a rapidly changing technological environment.
Experience of working in a high throughput genomics environment.
Experience in data management and metadata attribution.
Knowledge of research and funding within the UK science base.
Track record in data management, preferably in the field of bioinformatics or another “big data” discipline.

Interpersonal & Communication Skills

Excellent communication skills, both written and verbal.
Excellent interpersonal skills, with the ability to work well as part of a team.
Excellent time management and organisational skills.
Promotes and strives for continuous improvement.
Interest in and understanding of research into user experience (UX).

Additional Requirements

Attention to detail
Promotes equality and values diversity
Willingness to embrace the expected values and behaviours of all staff at the Institute, ensuring it is a great place to work
Able to present a positive image of self and the Institute, promoting both the international reputation and public engagement aims of the Institute
Ability to maintain confidentiality and security of information where appropriate
Willingness to work outside standard working hours when required

Who We Are

Earlham Institute

Earlham Institute is a vibrant, contemporary research institute and registered charity, working in an area of rapid technological development and innovation.

Earlham Institute is strategically funded by the BBSRC to lead the development of a skill base in bioinformatics and a genomics technology platform for UK bioscience. The Institute is located on the Norwich Research Park, together with its partners: the John Innes Centre, the Institute of Food Research, The Sainsbury Laboratory, the University of East Anglia and the Norfolk and Norwich University Hospital. The research park has an excellent reputation for research in plant and microbial sciences, interdisciplinary environmental science and food, diet and health, to which Earlham Institute contributes strengths in genomics and bioinformatics. Close links exist between the NRP partners and new opportunities for collaboration in exciting new initiatives are under development. The NRP recently received £26M of government investment to facilitate innovation and further develop infrastructure to attract science and technology companies to the Park to enhance the vibrant environment and realise economic impact from research investment.

Earlham Institute is a UK hub for innovative Bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. This has been boosted recently by an e-Infrastructure grant to expand the data storage capacity to a multi-petabyte unit, deploying a high performance cluster and large-memory server enabling the allocation of processes requiring several terabytes of computing memory.

Earlham Institute’s state-of-the-art DNA sequencing facility operates multiple complementary technologies for data generation that provide the foundation for analyses furthering our fundamental understanding of genomes and how they function. We aim to be at the forefront of technological advances and are developing and implementing technologies to generate and analyse new types of data. We also develop novel platforms to provide access to computational tools and processing capacity for multiple academic and industrial users, and promote applications of computational bioscience. Earlham Institute has one fully owned subsidiary, Genome Enterprise Ltd (GEL), via which it offers genomic and bioinformatics services on a trading basis and works with commercial providers on a partnership basis. Earlham Institute also receives specific funding to enable knowledge exchange programmes, which are supported across the institute teams.


Digital Biology

Group Details

The Davey group focuses on research into understanding how best to manage, represent and analyse data for open science.

We explore new hardware, algorithms and methodologies to develop tools for life science informatics, such as:

• Large-scale data visualisation
• Bioinformatics training
• Assembly algorithms for microbial metagenomics
• Novel infrastructure platforms for disseminating/publishing data and software

The group develops novel open software frameworks to enable scientists to describe, deposit and use research data, currently around exemplar use cases in plants. We lead the Collaborative Open Plant Omics (COPO) project, a web platform and API (Application Programming Interface) for grouping, describing and publishing raw and processed ‘omics data, and research objects such as workflows, software, manuscripts, posters, presentations and images. We are active members of the international Wheat Information System (WheatIS) initiative, where we are developing the Grassroots infrastructure to expose EI’s wheat data, notably our recent release of the new wheat assembly (plus several others) as a single BLAST portal powered by our National Capability HPC architecture. We use the Grassroots architecture to contribute to the yellow rust Field Pathogenomics project, the CerealsDB project, and collaborate closely with URGI, INRA on the WheatIS.

We deploy community-focused analytical web platforms by leading the Galaxy Development and Training project and the computational infrastructure and informatics node for CyVerse UK. We aim to offer these services to the UK community, and provide training and best practice around research data management.

Our infrastructure resources are built on open principles and are designed to be queried programmatically with concise APIs, broadening their availability and reuse.

We also hold current grants relating to metagenomics and technology development. The MetaCortex project seeks to develop assem


Advert Text

Applications are invited for a Research Software Engineer to join the Group of Dr Rob Davey at the Earlham Institute (EI), based in Norwich, UK, to work on information systems for data brokering.


The Collaborative Open Omics (COPO) project is a community-driven platform for making life science data openly available. Whilst COPO was originally developed for plant scientists (BB/L024055/1), EI has extended its reach to other life science domains and data management practices. As such, COPO helps life science researchers to implement data and metadata management practices to submit their research outputs to public repositories. For example, EI is providing resources for sequencing and data management to the recently awarded Darwin Tree of Life project, a hugely exciting endeavour to sequence all eukaryotic organisms within the UK as part of the Earth BioGenome Project. This role will be instrumental in developing powerful systems to drive the data sharing and reuse aspects of the project.

EI is supported by the BBSRC to ensure that biological science in the UK has access to a skill base in genomics technologies and bioinformatics to deliver programmes leading to improved food security, advances in industrial biotechnology, and improved human health and wellbeing.

The role:

This role will be central to the development of COPO strategy to help manage the sample metadata and data submission for the Darwin Tree of Life project, as well as to help EI and other UK researchers make their research data better described and publicly available. This is therefore an exciting role with international collaborators and a global reach. The post holder will develop potentially ground-breaking advances to improve access to, and understanding of, data deposition and dissemination services, enabling the life science research community to manage and publicise their data effectively.

Working within a vibrant research group with a diversity of backgrounds and interests in data management, processing, and visualisation, the post holder will work with other Research Software Engineers and DevOps staff to develop and maintain COPO to support the life science community, and deploy COPO within EI’s cloud environment, CyVerse UK.

Through interactions with the EI Advanced Training team, the post holder will develop suitable training programmes around COPO and promote the benefits of accurate and timely data management through user experience research, workshops and other training events.

The ideal candidate:

Candidates should possess a BSc in a relevant computational or informatics/data science subject, or have an established background of working in a suitable area of computer science or software development. They should have extensive experience of software development in Python 3, potentially with experience in the Django web framework. Proficiency in JavaScript and web-based UI frameworks is essential. Knowledge of ontologies and data management processes is desirable. Candidates should also expect to work in a fun, open, and rapidly changing technological environment.

Additional information:

Salary on appointment will be within the range £39,990 to £48,775 per annum depending on qualifications and experience. This is a full-time post for a contract of three years.

We especially welcome applications from underrepresented minorities in the computational sector. As a Disability Confident employer, we guarantee to offer an interview to all disabled applicants who meet the essential criteria for this vacancy.