In early 2013, the Bioinformatics Resource Australia – EMBL (BRAEMBL) carried out a survey of Australian life scientists to identify areas in which it could help researchers make optimal use of bioinformatics capabilities. More than 200 responses were received from across Australia, representing 750 researchers from all areas of biology.
Overall, the picture is one of ubiquitous use of bioinformatics tools and data, with a clear indication that bioinformatics is no longer solely in the hands of specialist bioinformaticians but is widely used by laboratory scientists.
Lack of expertise was identified as the single biggest difficulty facing researchers in their bioinformatics activities, and training as the most valuable thing that BRAEMBL could do to support those activities. Dry-lab researchers also highlighted a need for better bioinformatics community networks.
Key conclusions of the survey were:
- Bioinformatics is a key activity in Australian research, as evidenced not only by the content of responses but also simply by the number of responses.
- The areas of interest reflect the “central dogma” of molecular biology.
- Not only bioinformaticians but also laboratory scientists see bioinformatics as core to their work.
- Geographic location imposes significant but not crippling limitations on exploitation of bioinformatics.
- Users are more likely to report satisfactory service (hardware, software and support) if it is provided within their own group.
- There is a very marked concern about lack of expertise and access to expertise in bioinformatics.
- Training and community building are the most sought-after services.
- There is a significant demand for training of a more general nature, in computer programming and statistics.
The Bioinformatics Resource Australia – EMBL (BRAEMBL) was formed in early 2013 as an extension of the Australian EBI mirror project, an initiative designed to remove barriers of geographical remoteness for Australian bioinformatics. BRAEMBL also incorporates the Specialised Facility in Bioinformatics, a National Computational Infrastructure project to make compute resources available to bioinformaticians on a competitive basis.
The evolution to BRAEMBL required the missions of these projects to be reconsidered and consolidated into three major goals, one of which was for BRAEMBL to enable optimal exploitation of the tools and data of bioinformatics by Australian scientists. To support this goal, it was necessary to identify the range of bioinformatics activities and needs in Australia, and so the BRAEMBL Community Survey was carried out to gather as much input as possible from all potential users of bioinformatics.
The survey ran throughout February 2013, during which time it was advertised as widely as possible through mailing lists, professional networks, social media, conferences, seminars and websites. Responses were collated and analysed at the end of the month, although the survey has remained active since then to allow a continued opportunity for members of the community to provide input. The survey consisted of a mixture of multiple choice and free text responses, all of which were optional.
A total of 210 responses were received by the initial analysis date, representing the views of a self-reported 750 people. Responses came from every Australian state and from the ACT, but not from the Northern Territory, with the majority coming from Victoria, Queensland and NSW (together just over 75% of responses).
Figure 1 – Distribution of survey respondents by state and organisation type
Respondents were evenly split between wet-lab and dry-lab researchers, with a small number in scientific support, and were largely from universities and other academic research institutes. Just under half described themselves as ‘researchers’ and about a quarter as ‘students’, with ‘principal investigators’ as the third largest group. Survey respondents were active across practically all fields of biology, with the most common areas being bioinformatics research, genomics, molecular biology, cell biology and genetics.
We are optimistic that the broad distribution of demographics suggests that the sampled population is relatively representative of the Australian life sciences research community as a whole. However, it should be recognised that the nature of advertising the survey means that the number of people aware of it is unknown, and also that those completing the survey were a self-selected population.
The number of responses alone is also indicative of the value Australian scientists place on bioinformatics as a research tool. A similar survey in Europe garnered a little over four times as many responses in total from a community perhaps fifteen times larger than that in Australia.
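The comparison above can be made explicit with a little arithmetic. The following is a minimal sketch that takes the two approximate ratios quoted in the text ("a little over four times" and "perhaps fifteen times") at face value; the function and variable names are illustrative only, and the result should be read as a rough indication rather than a measured response rate.

```python
def relative_response_rate(response_ratio, community_ratio):
    """Return how many times higher the per-capita response rate is
    in the smaller community than in the larger one.

    response_ratio:  responses in the larger survey / responses in the smaller survey
    community_ratio: size of the larger community / size of the smaller community
    """
    if response_ratio <= 0 or community_ratio <= 0:
        raise ValueError("ratios must be positive")
    return community_ratio / response_ratio


# Approximate figures from the text: about 4x the responses from a
# community about 15x larger (both values are rough estimates).
australia_vs_europe = relative_response_rate(response_ratio=4.0, community_ratio=15.0)
print(f"{australia_vs_europe:.2f}")  # prints 3.75
```

On these rough figures, the Australian response rate per head of the research community was a little under four times that of the European survey.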
Use of bioinformatics resources
The vast majority of survey respondents were existing users of bioinformatics tools, with 85% of wet-lab scientists, and essentially all dry-lab scientists, reporting using bioinformatics tools at least occasionally. Over 90% of researchers using bioinformatics tools (both wet- and dry-lab) also made use of remote databases.
Figure 2 – Frequency of bioinformatics usage for wet- and dry-lab researchers
Database usage reflected the research area distribution of respondents, with gene, genome and expression data being the most popular, and protein resources less so. Similarly, GenBank was the most popular bioinformatics resource, with Ensembl and the UCSC browser also frequently used. Pathway and interaction databases were popular resources, highlighting the importance of systems biology approaches to many researchers.
Figure 3 – Usefulness of database types, normalised to percentage of total responses
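For readers unfamiliar with the normalisation mentioned in this and later figure captions, the sketch below shows one plausible reading of it: the raw counts for each rating are divided by the total number of responses for that item. The function name, the rating labels and the counts are placeholders chosen for illustration and are not survey data.

```python
def normalise_to_percent(counts):
    """Convert raw response counts per rating into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to normalise")
    return {label: 100.0 * n / total for label, n in counts.items()}


# Placeholder counts for one hypothetical database type (NOT survey data).
example_counts = {"very useful": 30, "somewhat useful": 15, "not useful": 5}
example_percent = normalise_to_percent(example_counts)
print(example_percent)
```

Normalising in this way allows items that received different numbers of answers to be compared on the same scale, which matters here because every survey question was optional.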
In addition to their use of public databases, respondents also reported generating significant quantities of their own data: 53% produced between 1 GB and 1 TB of data each month, with 25% producing more and 22% less. These figures were similar for both wet-lab and dry-lab researchers.
Access to infrastructure resources
Respondents were generally satisfied with their access to software, databases and high-performance computing resources, with over 75% describing their level of support in these areas as adequate. In contrast, fewer than 50% described bioinformatics support staff levels as adequate. There was also a substantial minority of people who wanted bioinformatics support, but felt that they had no available access from in-house, collaborative, or even external sources.
For all four types of infrastructure resources, levels of satisfaction were highest when provision was within the group or organisation. This is typified by the fact that, despite the rapid development of national and cloud compute infrastructure resources, most access to high-performance computing and bioinformatics software was still provided from within the organisation.
However, BRAEMBL clearly cannot place support staff in every group, nor does it believe that local implementation of technology is sustainable or best practice. The challenge for BRAEMBL, then, is to establish and implement a way to make remote resources as easy to use as local resources.
Figure 4 – Level of satisfaction with bioinformatics infrastructure resource types related to the location of those resources. Blue indicates adequate, red is inadequate
Disadvantaged by geography
One of the principal goals of the EBI mirror, the precursor of BRAEMBL, was to reduce the perceived disadvantage to researchers caused by their physical isolation from the major bioinformatics resources available to North American and European scientists, in particular database access. The survey results supported this premise, with one third of respondents reporting that their access to data was disadvantaged by geography. Access to IT resources and bioinformatics expertise was considered more affected (40% and 54% respectively reporting some level of disadvantage).
Views were comparable across most states other than Western Australia, where location was felt to be much more of a disadvantage for bioinformatics access. This may reflect the comparative isolation of WA even within Australia, or it may be an artefact caused by the smaller number of responses from that state. Nonetheless, there is a clear role for BRAEMBL across the country in helping to reduce the impacts of this geographical disadvantage.
Figure 5 – Level of research disadvantage experienced as a result of geographical isolation from biological databases, IT resources, and bioinformatics expertise
The most emphatic outcome of the survey was the overwhelming demand for bioinformatics training and concern about lack of bioinformatics expertise within the Australian life science community. Only four respondents (<2%) indicated that training would not be at all useful to them, with 75% of the remainder identifying at least one area of training that would be ‘very useful’. Statistical analysis training was the most requested, with 95% of respondents identifying it as somewhat or very useful. Next-generation sequencing and network/pathway analysis were other popular areas for training.
Figure 6 – Level of interest for training in different bioinformatics areas, normalised to percentage of total responses
Tools and databases
Respondents were asked to list up to five of their most important bioinformatics tools or databases, and on average they named about two each. In total 165 different tools were identified, although only about one third (57) were mentioned more than once. Despite having been initially developed over twenty years ago, BLAST remains one of the most popular bioinformatics tools, second only to the UCSC genome browser. The statistical program R was also frequently mentioned, reiterating the recognition that bioinformatics, and indeed life science generally, should be based on a strong statistical foundation.
Figure 7 – The most popular bioinformatics tools and resources – only those listed by four or more respondents are included
Most important issues
Finally, respondents were asked what their biggest single difficulty was in bioinformatics, and what would be the most useful thing that BRAEMBL could offer them. Analysis of these questions relied on a semi-subjective interpretation of free-text responses, but overall the main problem described was a lack of expertise in bioinformatics (40% of all responses), while the most useful thing was, by a long way, to offer training (50% of all responses).
Figure 8 – What is your biggest bioinformatics difficulty (blue chart on left) and what is the most useful thing that BRAEMBL could do for you (red chart on right). Semi-subjective categorisation of free-text responses normalised to percentage of total responses
A breakdown by wet- and dry-lab researchers identified a third area of concern to the latter group (who are most likely to be full-time bioinformaticians), namely that they also felt a need for a better bioinformatics community and network.
Figure 9 – The three main areas of bioinformatics needs in Australia separated by researcher type
While training was clearly considered to be the most beneficial activity of BRAEMBL, we recognise that this demand may in fact be a symptom of the current lack of expertise and access to support. Other activities, such as providing easier access to bioinformatics support capabilities or developing simpler bioinformatics tools and pre-prepared analysis pipelines, may provide an alternative solution without all biologists necessarily having to train in bioinformatics.
The BRAEMBL team is now working on ways to best deliver the bioinformatics capabilities which the survey has identified as most needed by the research community. Specific areas of action are:
- Training: BRAEMBL will work with BioPlatforms Australia, CSIRO and other bioinformatics training providers to develop and offer more bioinformatics and statistics training options in Australia.
- Community building: BRAEMBL will support the role of the Australian Bioinformatics Network in establishing a virtual community for bioinformaticians and other bioinformatics-oriented researchers.
- User support: Australian users clearly need help in exploiting the tools and data of bioinformatics. The scale and complexity of today’s data resources and services make it difficult or impossible to copy them all to local systems. While recent IT advances greatly facilitate remote access to tools and services, users are still more comfortable when their bioinformatics needs are satisfied within their own group. A major role of BRAEMBL will be to support users through the inevitable trend towards remote access to tools.