Data-driven science is more than just a buzzword; it’s big business. The sheer volume of data produced every day is transforming the way we do business across the globe. Those who can organise that data into meaningful, actionable chunks in real time will be the ones able to exploit these petascale data sets, spotting predictive trends and finding new ways to solve problems.

What is big data?

Big data refers to the large volumes of structured and unstructured data typically generated by scientific instruments and cameras, medical equipment, genome sequencers, and multi-dimensional capture devices. You might recall the image of a black hole recently captured by the Event Horizon Telescope. That image was derived from roughly 5 petabytes of data spread across many files, stored on about half a ton of hard drives: the equivalent of about 1.39 billion copies of David Bowie’s “Space Oddity”. That’s big data.
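As a rough sanity check on that comparison (the ~3.6 MB per song copy is an assumption about a typical MP3 of the roughly five-minute track, not a figure from the source):

```python
# Back-of-the-envelope: how many copies of "Space Oddity" fit in 5 PB?
# Assumption: one MP3 copy of the song is about 3.6 MB.
EHT_DATASET_BYTES = 5 * 10**15   # 5 petabytes (decimal prefixes)
SONG_BYTES = 3.6 * 10**6         # ~3.6 MB per copy (assumed)

copies = EHT_DATASET_BYTES / SONG_BYTES
print(f"{copies / 1e9:.2f} billion copies")  # ~1.39 billion
```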

At the same time, as instruments advance, you might also hear “big data” used to describe the size of an individual file. A single file produced by a genome sequencer, for example, can be a terabyte in size. This, too, is big data.

Source: Event Horizon Telescope Collaboration et al.

Big data for the common good

Science is undergoing a renaissance in the data age. Data-driven science is proving its power to detect previously undiscovered correlations and to find new solutions in health and disease management, environmental resilience, and more.

Reports suggest that some researchers now account for 5 TB of data each, and some of the largest universities have more than 1,000 such researchers. To put that in perspective, the 2009 3D film Avatar generated about 1 petabyte (1,000 terabytes) of data, at the time more than any other movie in history.
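The scale-up implied by those figures is easy to work through (the 5 TB per researcher and 1,000-researcher headcount are the reported numbers above, not measurements of any particular institution):

```python
# Back-of-the-envelope: institutional storage vs. Avatar's data footprint.
TB_PER_RESEARCHER = 5      # reported per-researcher data footprint
RESEARCHERS = 1_000        # large-university headcount from the reports
AVATAR_TB = 1_000          # Avatar (2009): ~1 petabyte

total_tb = TB_PER_RESEARCHER * RESEARCHERS
print(total_tb, "TB =", total_tb / 1_000, "PB")    # 5000 TB = 5.0 PB
print("That is", total_tb / AVATAR_TB, "Avatars")  # 5.0 Avatars
```

In other words, a large university can already hold around five times the data of the most data-intensive film of its era.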

Unlike the movie industry, however, research funding is increasingly contingent on long-term data storage. Becoming a petabyte-scale institution requires cost-effective, innovative forms of information processing that enable better insight, decision making, and process automation.

How can we help take your university or research institution into the data age?

We are experts in delivering systems for life-science computing at petabyte scale. Our Mediaflux platform provides an automated research data management system that curates, ingests, and tags scientific data with metadata, making data easier to find, reuse, and access, whether it is held in primary storage or deep archive.

Case study

Arcitecta is working with one of the world’s largest pharmaceutical companies, Novartis NIBR, on one of the biggest storage virtualisation projects ever undertaken. This transformational change will help researchers spend less time managing data and more time on science, while significantly reducing NIBR’s overall storage costs now and into the future. To prove its credentials, Arcitecta demonstrated the capability of its Mediaflux software on a small subset of 2.3 billion files and ten petabytes of data. This gave Novartis insights never before available, and a pathway to saving tens of millions of dollars in storage costs.

Want to know more about Mediaflux?