Our presentations will discuss data management concepts critical to making effective use of high-performance computing. We hope to see you at one of our scheduled presentations or in the Exhibit area.
Presented by Graham Beasley
High-performance computing (HPC) systems are often used to process vast amounts of data in
scientific research, engineering, and other fields. As data is generated at ever greater rates and
complexity, managing it becomes a significant challenge for HPC users. Imagine trying to find a
deceased researcher's data files from seventeen years ago. Now suppose you find the data, only to
discover 400 data sets stored as sequentially numbered binary files, with no indication of which
results are of interest to you.
The importance of metadata in HPC moving forward cannot be overstated. It is a fundamental
technology that enables researchers to better understand their data, making it easier to search,
access, and analyze. Metadata can describe the structure, content, and context of data,
empowering researchers to quickly find and access the information they need without having to
search through vast amounts of data. It can also be used to ensure the quality and accuracy of
data, including tracking the data's provenance and usage.
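As a rough sketch of the idea, the hypothetical Python snippet below attaches a small descriptive and provenance record to an otherwise opaque binary result file. The field names, file names, and sidecar convention are assumptions made purely for illustration, not any particular product's schema.

import json
from datetime import datetime, timezone
from pathlib import Path

def write_metadata_sidecar(data_file: Path, description: str, instrument: str,
                           processing_steps: list[str]) -> Path:
    """Write a small JSON sidecar describing an otherwise opaque binary file.

    The field names here are ad hoc illustrations; a real project would follow
    an agreed community or institutional schema instead.
    """
    record = {
        "file": data_file.name,
        "size_bytes": data_file.stat().st_size,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "description": description,            # content: what the data represents
        "instrument": instrument,               # context: where the data came from
        "processing_steps": processing_steps,   # provenance: how it was derived
    }
    sidecar = data_file.with_suffix(data_file.suffix + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar

# Example: annotate one of many sequentially numbered binary outputs so it can
# later be found by searching the sidecars rather than opening every file.
# write_metadata_sidecar(Path("run_0042.bin"),
#                        description="Pressure field snapshot, timestep 42",
#                        instrument="CFD solver v3.1 (simulation output)",
#                        processing_steps=["mesh refinement", "FFT filter"])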
Metadata is especially useful in large-scale scientific research projects where data is shared
among multiple researchers and institutions. Evolving technologies like Digital Object
Identifiers (DOIs) have been as revolutionary for digital data as the Dewey Decimal System was
for books in libraries. Metadata can help researchers understand how the data was generated,
what assumptions were made, and what processing steps were taken. It can also be used to ensure
data consistency and accuracy across different locations, allowing researchers to collaborate
more effectively and deliver results sooner.
And metadata saves money. HPC systems often require massive amounts of storage, which can be
difficult to manage without an efficient and scalable data management solution. Metadata can be
used to organize data and make it more manageable, reducing the amount of storage required and
freeing you to choose the storage technology that best fits your workflow. This can lead to
significant cost savings for institutions, as well as improved data management and processing.
A discussion between industry experts, including:
Graham Beasley, Chief Operating Officer of the international software company Arcitecta, Louisville, CO.
Torey Battelle, Associate Director, Arizona State University Knowledge Enterprise Research Technology Office.
Greg Madden, Chief Information Officer, NCAR.
A student career panel where you will hear from HPC professionals about how they entered the HPC workforce and what they are looking for when hiring new employees. This will be a one-hour session broken into presentations, with time allocated for questions.
Presented by Graham Beasley
As HPC workloads continue to grow in size and complexity, efficient data management becomes
increasingly crucial for the overall performance of the system. This presentation explores the
impact of data management on HPC workloads and how critical it is becoming for efficient HPC.
Frequently, test results cannot be duplicated, data sets get misplaced, and the proper handling
of data is compromised during storage upgrades or staff changes. The datasets for research
projects are continuously expanding, demanding efficient data and storage management.
Groundbreaking discoveries often come with their own unique data management challenges. This
includes the need to handle large volumes of data, manage data access and sharing, and ensure
data consistency and integrity. Increasingly, research is done collaboratively and often across
continents. There are often constraints on the data arising from privacy requirements, the terms
under which its collection was funded, and access, security, and storage costs. The importance of
achieving the right data
management strategy becomes paramount as the size and complexity of HPC datasets continue to
increase.
Join us as we discuss the impact of data management on HPC workflows, survey some of the current
research trends and future directions, and explore real-world use cases and best practices from
organizations optimizing their data management in support of breakthrough research. The session
will explore data management for immediate computational needs as well as alternatives for
long-term data access, management, and preservation. This is an interactive session where we
invite the audience to share best practices.
Target Audience: Everyone - whether representing a large organization or just themselves. Efficient
data management is crucial for maximizing system performance and ensuring data is preserved and
accessible. This presentation delves into the challenges of managing expanding research datasets,
emphasizing the need for effective data management strategies, drawing on real-world use cases
and best practices.
Like dark matter - unseen, but everywhere.
XODB makes big data better.
Move data faster for greater discoveries with HPC.
A technical overview of strategies in data management.