Investing in a national infrastructure to digitize Swedish Natural History collections

Published by GBIF-Sweden

On 9 January 2014, GBIF Sweden, in collaboration with Digisam, held a well-attended symposium on the planned digitization of Swedish Natural History collections. The symposium attracted interest from governmental representatives, researchers and digitization experts from Sweden, Norway, Finland, the Netherlands and Germany.

Fredrik Ronquist started the day by outlining the challenges and status of the digitization of Swedish Natural History collections. His presentation highlighted the benefits of investing in a digitization infrastructure that can support ongoing and new research projects in systematics and ecology. With new mass-digitization techniques, the goal of completely digitizing Swedish Natural History collections is now attainable at high speed and at acceptable costs, which are outweighed by the research enabled by a national infrastructure providing historic biodiversity information. GBIF Sweden is currently preparing a proposal for a project to start a national digitization initiative for Swedish Natural History collections. (Presentation – PDF)

Rolf Källman presented the role of Digisam in the existing effort to digitize Swedish national heritage. All Swedish agencies and institutions are targeted by this national strategy for digitization, digital access and digital preservation, and are tasked with developing plans for their own digitization. His presentation highlighted that the digitization of Swedish Natural History collections is an integral part of this larger national effort and can build on existing experiences and infrastructures. (Presentation – PDF)

In one of several presentations addressing the value of biodiversity information, Victor Galaz reflected on the importance of biological information in general for the governance of environmental systems. He explained that biodiversity information is not only about monitoring, early warnings and learning; it is also about the restoration of systems, stewardship of nature and innovation – it is about finding smart ways of using the data, with a wider range of actors. New information stores such as historic biodiversity information may enable new actors and information uses not previously anticipated.

Hedvig Kjellström explored the Big Data perspective of this historic biodiversity information store. Once natural history collections are digitized, automated exploration becomes possible. Digitization opens up possibilities to mine the collections for emergent information that is not directly apparent to a human observing the data sample by sample. Information that has been hidden in the sheer number of data points can suddenly become visible; visualizing the large amount of data, for example, lets us see new information. (Presentation – PDF)

Olaf Bánki presented the role of GBIF International as an intergovernmental organisation in facilitating the publication and mediation of global biodiversity information. He highlighted the global reach of national biodiversity information stores and presented examples of the broad range of research supported by GBIF data. (Presentation – PDF)

Matthias Obst presented research building on both current and historic biodiversity information. With reference to Swedish marine fauna, he concluded that historic biodiversity information makes it possible to document changes in ecosystem diversity over large geographical areas and time-scales, allowing a more meaningful assessment of the severity of the loss of diversity in this ecosystem.

Riitta Tegelberg presented the Finnish digitization centre Digitarium. The strategy for the mass-digitization of natural history collections in Finland is driven by a growing demand for open access data and cost-effective digitization. Digitarium aims to provide an infrastructure for processes and research in 2015 and to have the most important collections digitized in 2025. (Presentation – PDF)

Suzanne de Jong-Kole presented Naturalis Biodiversity Center in the Netherlands. Naturalis centralizes Dutch Natural History collections and provides the digitization infrastructure for various types of collections. Her presentation focused on the “Digistreets” established for herbarium sheets, which have a digitization throughput of 35,000 herbarium sheets every day. (Presentation – PREZI)

Claes Persson presented the herbarium at the University of Gothenburg and discussed the benefits of large-scale digitization of botanical collections, such as the facilitation of taxonomic studies, the discovery of new species, and a higher visibility, accessibility and utilisation of herbaria. He presented the herbarium's participation in a successful digitization pilot initiated by GBIF Sweden and run at the Media Conversion Centre (MKC) of the Swedish National Archives in Fränsta. (Presentation – PDF)

This was followed by MKC’s perspective on this pilot project, presented by Adam Rönnlund. He introduced MKC’s existing processes and infrastructure designed for high-throughput digitization of print media. Adam explained the special challenges of digitizing herbarium sheets, which led digitization experts at MKC to develop an optimized process for this pilot. (Presentation – PDF)

Tove von Euler presented the results of a second GBIF-initiated digitization pilot, run locally at the herbarium of the Museum of Evolution in Uppsala. Several digitization approaches were tested using a high-throughput scanner. In the fastest setup, one image of each sheet was taken with three sheets being scanned simultaneously, resulting in 220 scanned sheets per hour. She concluded that this method proved to be efficient and user-friendly, with the advantage of requiring no transport or freezing costs. (Presentation – PDF)

Several of the previous presentations pointed out that the transcription of label information from, for example, herbarium sheets presents the major challenge in any digitization project; automatic text recognition techniques are a possible solution. Anders Brun provided an overview of the state of the art of OCR and HTR (handwritten text recognition). He explained that text recognition remains a major technical and research challenge, but that existing advances may make it possible to extract at least part of the information from herbarium sheets and similar material, which could either support transcription efforts or make it possible to explore image information beyond textual content. (Presentation – PDF)

The challenge of digitizing entomological collections was presented by Johannes Bergsten and Oleksandr Holovachov. They explored insect digitization at both the drawer and the specimen level, each of which can deliver valuable information for research or collection management. While whole-drawer imaging allows very fast digitization of entomological collections, the label data will often be captured incompletely, whereas individual specimen digitization will deliver complete label information but remains a time-consuming exercise. (Presentation – PDF)

Sven Milton concluded the day by presenting several robotic and optical solutions used in other domains that could be employed to speed up digitization rates for Natural History collections. He emphasized the importance of reaching out to and collaborating with technical experts to develop tailored solutions.

The symposium presentations underlined that technical advances in digitization make large-scale digitization of Swedish Natural History collections a feasible exercise, achievable not only within ambitious time-frames but also at costs that are far outweighed by the benefits of the resulting information infrastructure.

The complete symposium program with abstracts can be found here.