Latest News
  • 22 December 2015

    UNSW Delivers Safe, Searchable and Shareable Research Data Archive with Mediaflux and SGI

    UNSW Australia (The University of New South Wales) has launched its Research Data Archive, an institution-wide long-term storage platform designed specifically for researchers to securely search and share data with colleagues, and to comply with research data policies and codes of practice. The platform is integrated with UNSW’s Research Data Management Plans and takes advantage of the advanced metadata management capabilities of Mediaflux and the extensible storage architecture of SGI to make storage both safe and smart.

    The Data Archive allows researchers to keep a complete and traceable copy of their data in a durable and accessible manner. Once files are uploaded, they are locked and versioned. Researchers can use the store to track the full history of their evolving projects, and its integration with the self-service portal gives them direct control over who accesses their data. It also gives the University visibility into which research areas are generating and storing large amounts of data.

    The Data Archive is a key element of UNSW’s long-term data service strategy, which goes beyond simply making storage available to providing a storage service aligned with institutional data management practice and other smart data capabilities. The move to a metadata-based store, using Mediaflux, takes care of many of the repetitive data-tagging tasks faced by individual researchers and research projects, such as automatically linking project information to individual data files. Advanced metadata tools also make searching easier, improving re-use of valuable datasets.

    “One of our goals over the next couple of years is to give researchers better tools to mine their own data and to aid data discovery across projects and disciplines,” said Luc Betbeder-Matibet, Director Faculty IT Services at UNSW. “The aim is not just to provide file storage but to support research practice at all stages of the research lifecycle. Starting with Archive Data, which is a common problem for all the projects on campus, and aligning this service with our Data Management approaches is one more step we are taking towards making UNSW a great place to carry out data-intensive research work.”

    The service is available to both researchers and UNSW’s Higher Degree Research candidates. It is free to use and does not impose any quotas. It is a joint service launched by UNSW Research Division, the UNSW Library, and UNSW IT.

    “By taking a strategic view of how its data is managed, UNSW is not only building on its reputation as one of the top research-intensive universities globally,” said Jason Lohrey, chief technology officer of Arcitecta. “The University is also enhancing its linkages with industry and embedding data integrity and lifecycle management into its research culture.”

    Using the Data Archive

    Accessing the UNSW Data Archive is simple. Researchers:

    • complete a Research Data Management Plan, in which they decide who will access their data and how it will be managed;
    • log in to the Data Archive service using their preferred method (a number of interfaces are supported, including a web browser/HTML5, Mediaflux Desktop, an SFTP client, and the ATERM command-line script); and
    • upload their data to, or download it from, folders created by their Research Data Management Plans.

    Files uploaded to the Data Archive are automatically tagged with key project metadata. Researchers can also add their own tags and take advantage of the automatic metadata extraction capabilities of Mediaflux to make their data easier to find.
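    The details of Mediaflux’s metadata schema are not given here, but the pattern described above, combining project-level information with attributes extracted from each file on upload, can be sketched in generic Python. The field names (such as `project_id`) are illustrative assumptions, not the actual Data Archive schema:

```python
import hashlib
import os
from datetime import datetime, timezone

def build_archive_tags(path, project_metadata):
    """Combine project-level metadata with attributes extracted from
    the file itself, mimicking the automatic tagging applied on upload.
    All field names here are illustrative, not a real Mediaflux schema."""
    stat = os.stat(path)
    with open(path, "rb") as f:
        checksum = hashlib.sha256(f.read()).hexdigest()
    extracted = {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "sha256": checksum,
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }
    # Project metadata (e.g. drawn from the Research Data Management
    # Plan) is merged with the per-file attributes.
    return {**project_metadata, **extracted}

# Example: tag a small demo file with hypothetical project details.
with open("demo.txt", "w") as f:
    f.write("sequence data")
tags = build_archive_tags("demo.txt", {"project_id": "RDMP-0001"})
print(sorted(tags))
# → ['archived_at', 'filename', 'project_id', 'sha256', 'size_bytes']
```

    Researchers’ own tags would simply be additional keys merged in the same way, which is what makes every file findable by both project context and content attributes.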

    How Researchers Interact with the Data Archive

    A key factor in the rollout of the UNSW Data Archive service has been the importance of providing researchers with a number of options for uploading and downloading data.

    Often, the size and frequency of data moves will determine which approach is preferred. UNSW provides detailed online guidelines on the Data Archive support website to assist researchers.

    About the University of NSW

    UNSW Australia is one of the country's leading research and teaching universities. Established in 1949, it is ranked among the top 50 universities in the world, renowned for the quality of its graduates and its world-class research.

    UNSW is a founding member of the Group of Eight, a coalition of Australia's leading research-intensive universities, and of the prestigious international network Universitas 21. With more than 50,000 students from over 120 countries, it is one of Australia’s most cosmopolitan universities.

    The main UNSW campus is located on a 38-hectare site at Kensington, seven kilometres from the centre of Sydney. Other major campuses are UNSW Art & Design in the Sydney suburb of Paddington and UNSW Canberra at the Australian Defence Force Academy (ADFA).

    In addition to UNSW Canberra at ADFA, UNSW has eight Faculties (Art & Design; Arts & Social Sciences; Built Environment; Business School; Engineering; Law; Medicine; and Science), which offer an extensive range of undergraduate, postgraduate and research programs.

    Visit www.dataarchive.unsw.edu.au/ for more information.

    Download the UNSW case study

  • 15 December 2015

    Mediaflux® Cluster Edition Adds Scale-Out Capability to Parallelize I/O

    Arcitecta has added a significant enhancement to the Mediaflux data management platform with the release of Mediaflux Cluster Edition, a scale-out capability that enables multiple Mediaflux servers to work in parallel to accommodate high-throughput requirements for large data environments.

    The addition of clustering capabilities to Mediaflux software means that as data sets grow and throughput requirements increase over time, customers can add additional Mediaflux Cluster Nodes as needed to increase the I/O (input/output) capacity of their existing Mediaflux system.

    In this way, IT administrators can future-proof their system, increasing throughput incrementally as needed without impacting existing systems or user workflows.

    Mediaflux is used in data environments where rapidly growing volumes of data require the system to increase throughput to keep ahead of that growth.

    With Mediaflux Cluster Edition, the throughput of a customer’s existing system can be increased by simply adding as many Mediaflux Cluster Nodes as required.

    This ability to incrementally expand an existing system protects Mediaflux customers’ investment, and enables them to proactively accommodate future requirements without disruption to users.
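    The scale-out arithmetic behind this incremental expansion can be illustrated with a back-of-the-envelope sketch. The per-node rate used below is a made-up figure for illustration, not a published Mediaflux benchmark, and real scaling is near-linear only until the shared storage or network saturates:

```python
import math

def nodes_required(target_tb_per_hour, per_node_tb_per_hour):
    """Estimate how many cluster nodes are needed to reach a target
    aggregate throughput, assuming near-linear scale-out."""
    return math.ceil(target_tb_per_hour / per_node_tb_per_hour)

# Hypothetical example: if each node sustained 2 TB/hour, reaching an
# aggregate of 30 TB/hour would take:
print(nodes_required(30, 2))  # → 15
```

    The operational point is that capacity planning becomes additive: when requirements grow, the administrator adds nodes rather than replacing the system.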

    Mediaflux software has long been used in environments that depend on its ability to provide global management of data across disparate data types, across otherwise incompatible storage environments, and across multiple locations.

    Leveraging the power of metadata, Mediaflux enables users to find and act upon data in ways that would otherwise be difficult or impossible in such heterogeneous environments using conventional approaches.

    The addition of Mediaflux Cluster Edition enhances this capability by adding virtually unlimited scale-out capacity to the system. Existing Mediaflux customers, for example, can expand I/O on current systems by simply adding as many Mediaflux Cluster Nodes as needed to keep ahead of evolving requirements.

    As with any Mediaflux installation, the cluster node software runs on almost any computer, small or large, and on standard operating systems. Hardware choices may therefore be tuned to the specific I/O load and customer’s preference.

    Cluster nodes may be added or removed at any time, using any mixture of hardware as required to achieve the throughput requirements. This flexibility means that customers can repurpose existing servers, or minimize the need for new servers, to dramatically increase I/O throughput with the existing storage infrastructure.

    Originally released to selected customers one year ago, Mediaflux Cluster Edition has since been refined to deliver a multi-fold increase in performance across all data types and networks. Arcitecta has deployments where throughputs of tens of terabytes per hour are now possible with a relatively small cluster. When these workflows require additional throughput, more Cluster Nodes can be added at any time to keep ahead of requirements.

    Journaling and other improvements to the underlying Mediaflux XODB NoSQL database dramatically increase transaction rates and, together with clustering, expand Mediaflux’s ability to manage many billions of digital assets in a live system.

    “Democratizing access to data is a key tenet of Mediaflux,” said Jason Lohrey, Arcitecta’s chief technology officer. “The scale-out capability of Mediaflux Cluster Edition enhances our ability to deliver on this promise by enabling much greater flexibility and scalability to support growing customer workflows.”

  • 14 December 2015

    Meeting the demands of modern genomic studies

    For many leading genome labs, access to more data can deepen their insights into, and analyses of, individual and community responses to disease.

    Genomics remains an important investigative tool for researchers seeking to understand how our genetic makeup shapes our responses to different diseases and treatment regimes.

    However, genomic labs across the world face two similar challenges:

    • the volumes and velocity of data created from gene sequences, and
    • the diversity of data sources researchers need to access and analyse.

    Often, as data volumes and types grow, researchers spend more time manually searching archives, emails and databases to locate the data they need. Once located, these data must be collated before the real work can begin. All of this takes valuable time that could be spent analysing trends, testing hypotheses and gaining insights.

    As a result, leading researchers in Australia and internationally are using Mediaflux to improve the management of, and access to, the rapidly increasing volumes of data generated by genomic studies.

    Now, once sequencing is complete, researchers use the powerful metadata extraction capabilities and workflow tools within Mediaflux to:

    • automatically extract key attributes about each study, such as the experiment number, sequencing techniques, individual phenotypes, and sample types,
    • immediately find out if the required files are online, or if they have been moved to lower cost storage, and
    • more quickly find and retrieve data they need so they can begin their real work.
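    The exact attributes extracted depend on each lab’s naming and recording conventions, which Mediaflux is configured to match. As a hedged illustration only, a hypothetical sequencing filename convention could be parsed like this (both the pattern and the field names are assumptions, not a real Mediaflux configuration):

```python
import re

# Hypothetical convention: <experiment>_<technique>_<phenotype>_<sample>.fastq.gz
PATTERN = re.compile(
    r"(?P<experiment>EXP\d+)_"
    r"(?P<technique>[A-Za-z0-9]+)_"
    r"(?P<phenotype>[A-Za-z0-9-]+)_"
    r"(?P<sample>S\d+)\.fastq\.gz$"
)

def extract_study_attributes(filename):
    """Pull key study attributes out of a sequencing filename,
    returning None if the name does not match the convention."""
    match = PATTERN.search(filename)
    return match.groupdict() if match else None

print(extract_study_attributes("EXP0042_RNAseq_wild-type_S07.fastq.gz"))
# → {'experiment': 'EXP0042', 'technique': 'RNAseq', 'phenotype': 'wild-type', 'sample': 'S07'}
```

    Once attributes like these are captured as searchable metadata, “find every wild-type sample from experiment 42” becomes a query rather than a manual trawl through archives and email.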

    This means researchers can now home in on the data they need in hours instead of days or even weeks, accelerating their efforts to devise solutions to some of the toughest biomedical problems we face today.

    The result is that researchers can now focus their attention on significant research, rather than data wrangling.