The Pistoia Alliance, a global, not-for-profit alliance that advocates for greater collaboration in life sciences R&D, today launches the second phase of its DataFAIRy: Bioassay project, which aims to convert bioassay data into machine-readable formats that adhere to the FAIR guiding principles of Findable, Accessible, Interoperable and Reusable. The current pilot phase has been sponsored by AstraZeneca, Bristol Myers Squibb, Novartis and Roche, and has successfully annotated 496 assays using a Natural Language Processing model that has been custom-built to recognize life sciences language. This second phase aims to scale the annotation process by 10- to 100-fold, and eventually promote the data model to become the industry standard.
Biological assays are analytical methods that are crucial for testing compounds being considered for new drugs, as well as for monitoring environmental toxicity. More than 1.3 million biological assay protocols currently exist only in plain-text formats, such as published papers or vendor notes. Selecting and validating assays currently requires a labor-intensive search, taking scientists up to 12 weeks per assay. Adhering to the DataFAIRy model will reduce the time scientists spend searching and planning assay experiments. In addition, assay metadata is a popular data type for post-hoc data mining, but most of these published data and metadata are not in a form suitable for automated mining. They are partially annotated in public data banks, but the volume, depth and quality of these annotations are inadequate for addressing many current and future business questions. Meanwhile, Gartner predicts that 85 percent of AI projects will deliver erroneous outcomes due to data issues, for example information not being machine-readable. Projects such as DataFAIRy are therefore crucial to the successful adoption of AI in the life sciences.
“For the duration of my career, which has spanned the last thirty years, unstructured data has been a major problem for scientists. As the volume, variety and complexity of assay information continue to increase, organizations must manage their data more effectively, so that researchers can make the most out of their time and organizations can fully realize the benefits of digital transformation,” explains Dr Vladimir Makarov, Project Manager of The Pistoia Alliance AI and ML Centre of Excellence. “The DataFAIRy model we have developed will not only reduce the time bench scientists spend searching for assay information. It may also allow them to skip experiments known to have failed in the past. In turn, this will decrease the costs for companies and accelerate vital research.”
Although digitalization has made companies more aware of the importance of robust data management, the lack of industry standards is still a barrier to successful annotation and management of protocols, including assays. Adopting the FAIR principles is the first step towards enabling greater data sharing between organizations and helping scientists cope with the growing volume and complexity of data generated. Additionally, current data models are not built to recognize scientific language, so a new model must be created to automate the annotation of these valuable resources. The second stage of the DataFAIRy project will further develop a model of this kind through community-wide collaboration.
“AI and Natural Language Processing tools need to be built with scientific terminology in mind in order to be successful,” continues Dr Makarov. “The DataFAIRy model we have built will automate the annotation process so that assays are searchable and reusable, speeding up valuable research. We hope that this model will become the community standard for the publication of new assays and for the management of existing assays across vendors, regulatory agencies, and publishers, in addition to pharma and biotech.”
If you are interested in supporting the next stage of the DataFAIRy project, please contact email@example.com. For assistance adopting FAIR in your organization, you can download the Pistoia Alliance’s free FAIR Toolkit, which contains method tools, training and use cases, allowing organizations to learn from industry successes.