Informatics Collaborations

KU Informatics research and software engineering begin with our own experience and deep expertise with biological collections information, but we intentionally develop our activities as multidisciplinary, open-source partnerships with the aim of producing effective, generalizable and sustainable collections cyberinfrastructure.

Biodiversity informatics is an integrative, multidisciplinary research area that spans computer science, data networking, software engineering, several environmental biology disciplines, and education. To meet research challenges with computational methods, biodiversity informaticists integrate across those areas and at several levels: theory, models, concepts, methods, and data.

The Informatics Division supports biological specimen, tissue, and DNA repositories with robust software for processing and publishing the data amassed in collections. We champion the open-source software model, which enables our software engineering efforts to be leveraged by research collections around the globe.

Our three largest informatics research collaborations are:

Specify Collections Consortium

The information associated with the billions of plant and animal specimens curated in museums around the globe is the empirical foundation for research in many areas of biology, including species discovery, phylogenetics, macroecology, and phylogeography. Information from museum voucher specimens is also important in applied contexts, e.g., for conservation planning and for predicting the impacts of climate change on plant and animal species.

For several decades, biological collections have been building databases of information about the specimens, tissues, and DNA samples they curate. They continue that work enthusiastically today in order to provide internet access to previously sequestered specimen information. But keeping current with information management technologies, software engineering, electronic security, and technical support is expensive, and most museums have neither the focus nor the financial resources to become centers of software engineering. Nor can most afford for-profit commercial software platforms. The biodiversity collections community needed an economic model for specimen data management software in which the long-term costs and benefits of modern collections information systems could be shared. In 2019, we founded the Specify Collections Consortium to create that organizational and economic framework for technology support of biological collections computing.

The Specify Collections Consortium (SCC), headquartered in the KU Biodiversity Institute Informatics Division, is an international membership organization of natural history museums, federal agencies, and related research repositories, organized under KU's administrative umbrella and non-profit status. Member institutions are committed to the open-source economic model for the collaborative development of software that processes data associated with biological specimens. The Consortium strives to advance the science mission of biological repositories in two ways: by handling collection data management efficiently, and by bringing collection data into broader research communities and initiatives.

As of 2022, the Consortium counts more than 80 collection institutions from 24 countries as members. Together they manage more than 300 discrete collections of plants, birds, fish, insects, amphibians, reptiles, mammals, and fossil plant, invertebrate, and vertebrate specimens. Member institutions range in size from networks of national museums (Natural History Museums of Denmark; SANBI-supported museums, South Africa) to federal government agencies (CSIRO, Australia; NIWA, New Zealand) to university museums and free-standing research centers in the U.S. and internationally. The Specify Consortium is governed by a Board of Members, with advisory committees, and employs six staff situated on the KU campus. It is the first self-sustaining, international, institutional membership initiative within the biodiversity collections community focused on software platforms and tools for research processing of biological collections information. Visit the Specify Collections Consortium website to learn more.

The Lifemapper/BiotaPhy Project

Lifemapper is a suite of Informatics Division research projects focused on using scalable, high-throughput computing to create distribution models for very large species data sets across very large continental and global geographic extents. Species distribution models are projections of where a species should be present, based on algorithms that use georeferenced data (latitude and longitude) for species localities vouchered by specimens in museums. Since its launch in 2000, the Lifemapper Project has prospered with numerous research and education collaborators, co-funded by the US National Science Foundation, to produce software tools, workflows, and platforms, as well as high-school curricula about the range and diversity of terrestrial plant and animal species. Today, we are applying Lifemapper computational technologies to compute global-scale patterns of biological diversity that integrate evolutionary information (phylogenetics) in a three-way collaboration we call “BiotaPhy”. That engagement is a six-year campaign with colleagues at the Universities of Michigan, Florida, and New Mexico, and at Mississippi State University.

BiotaPhy has created a ‘standing wave’ of computational capacity for community support of data-intensive research that harvests the massive stores of online, global-scale biodiversity, ecological, and evolutionary data. Mid-scale computation platforms such as BiotaPhy function as ‘science expediters’ in two ways. First, they scale to process heterogeneous big data with high-performance computing, reducing the software and skill-set impediments to cross-domain data integration and computing at scale. Second, they expedite the dissemination of transformative analytical concepts and computational paradigms by making them more readily accessible, intellectually and practically, for learning, experimentation, and reuse.

Responding to the maturity of the biodiversity data community, we designed the BiotaPhy software platform to address the software engineering and computational scaling challenges faced by biodiversity and conservation scientists interested in multidisciplinary research and heterogeneous, big-data computing. The BiotaPhy Platform and its open-source software components democratize the biodiversity research community’s access to computational infrastructure for transformative, data-intensive research with collections data. Visit the BiotaPhy GitHub wiki to learn more.

DataONE

The Biodiversity Institute is a founding partner of DataONE, a distributed data repository for diverse kinds of earth science data. Biodiversity Institute Research Professor David Vieglais is DataONE’s Director. DataONE’s vision is to ensure the preservation of environmental science data from atmospheric, ecological, hydrological, and oceanographic sources and to provide secure, long-term access to those data for scientists, land managers, policymakers, students, educators, and the public. Visit the DataONE website to learn more.