Organizations often focus on their high-performance computing (HPC) cluster(s) in their quest to reduce time to insight and stay ahead of the competition. Yet this singular focus on HPC cluster infrastructure can obscure the bigger picture: the need for data processing pipelines in which HPC processing is central, yet still only one step in a larger, holistic scheme for handling the data.
The quantum of HPC processing is typically the venerable batch-oriented “job,” yet before a job can run, its input data is often copied on an ad hoc basis from an archive or other persistent storage to scratch storage on the HPC cluster. Then, when the job completes, the results must be copied back out of scratch to an archive or other persistent storage, again usually on an ad hoc basis.
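To make the pattern concrete, the sketch below shows what this ad hoc stage-in / compute / stage-out workflow often looks like in practice. It is purely illustrative: the paths, the job script name, and the use of Slurm’s sbatch command are assumptions, not a prescription for any particular site.

```python
"""Illustrative sketch of the traditional ad hoc stage-in / compute / stage-out
pattern. Paths and the job script are hypothetical; a Slurm-managed cluster
(sbatch) is assumed."""

import shutil
import subprocess
from pathlib import Path

ARCHIVE = Path("/archive/project/input_dataset")    # hypothetical persistent storage
SCRATCH = Path("/scratch/project/run_001")          # hypothetical cluster scratch space
RESULTS = Path("/archive/project/results/run_001")  # hypothetical destination for outputs

# 1. Stage in: copy input data from persistent storage to scratch.
shutil.copytree(ARCHIVE, SCRATCH / "input", dirs_exist_ok=True)

# 2. Run the batch job; sbatch --wait blocks until the job finishes.
subprocess.run(["sbatch", "--wait", "job_script.sh"], cwd=SCRATCH, check=True)

# 3. Stage out: copy results back to persistent storage and free scratch.
shutil.copytree(SCRATCH / "output", RESULTS, dirs_exist_ok=True)
shutil.rmtree(SCRATCH)
```

Each of these steps is typically performed by hand or by a one-off script, which is exactly where delays, wasted scratch capacity, and human error tend to creep in.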
This data processing approach was first employed with the earliest “stored program” computers, such as the IBM 704, and has changed remarkably little since. Today, organizations can instead build automated data processing pipelines by employing a contemporary data management solution. We’ll examine how this alternative approach can dramatically increase HPC cluster throughput, reduce human error, minimize storage costs, and even improve the reproducibility of numerical experiments.