The need for data management is ever increasing as the growth of data continues. Estimates of growth abound: from "data capacity is growing globally 40% year-on-year", to "data is doubling every 18 months", and "the growth of unstructured data is outpacing structured data".
One thing is certain: the growth of data is outstripping the growth of budgets for information technology in general, and data management in particular.
At the same time, there are many data management segments and layers, from managing the storage itself, to managing access, security, performance and structure, to managing databases and data warehouses. The area is diverse and complex.
Often, specific applications will be utilized to bring structure to a portion of an organization's data, and the rest will remain "unstructured", and will be managed in an ad-hoc fashion in files and folders. That the unstructured data is the fastest growing market portion indicates there is a significant need to fill the gap in the classes of applications that bring structure to data.
We believe that Mediaflux stands above other data management offerings.
It is not a point solution: it is designed from the ground up to manage any type of unstructured data, as well as structured data, and relationships between structured and unstructured data.
For many decades now a fundamental method for organizing data has been to store it in discrete units called files. In turn, files have been organized into directories, or folders. The volume of data, in terms of both the number of files and the size of files, makes this increasingly untenable. A new paradigm is needed.
While still able to present data as files and directories, Mediaflux is a bridge between the old world of directories and files and the new world of data and metadata.
Mediaflux stores data transparently to users, but makes any part of it rapidly discoverable by leveraging the power of metadata.
Mediaflux shortens the time between data capture and decision; it is a revolutionary product that can be integrated into existing environments incrementally, with an evolutionary approach.
Mediaflux is a multi-user platform for ingesting, storing and discovering any type of data.
Metadata is the Key
Metadata describes the data in any way desired. Metadata fragments, stored as encoded XML, are metaphorically attached to data, as illustrated in Figure 2.
Metadata is the key to the rapid discovery of data. Metadata can be:
- Automatically extracted as data is ingested – for example, geospatial co-ordinates or bounding boxes, image types and resolutions, and text can all be extracted by plug-in content analyzers. See Data Types for a complete list.
- Automatically generated – for example, revision histories and audit trails.
- User generated – existing metadata may be updated or new metadata added manually at any time. Examples are annotations, labels, tags, comments, and workflow-specific actions.
Figure 2 - Metadata fragments attached to data underpin rapid discovery
Metadata can conform to any standard or to your own customized schema. Metadata that is invalid against its schema can still be ingested, with a subsequent workflow used to review and correct it.
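As a rough illustration of the idea above, the sketch below attaches XML metadata fragments to a data item and then finds items by their metadata, using only Python's standard library. The `Asset` class, the `geo` and `annotation` element names, and the `find_assets` helper are hypothetical names chosen for this example; they are not Mediaflux's schema or API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical data item: opaque content plus attached XML metadata fragments."""
    name: str
    content: bytes
    metadata: list = field(default_factory=list)  # parsed XML elements

    def attach(self, fragment_xml: str) -> None:
        # Parsing rejects malformed XML before the fragment is attached.
        self.metadata.append(ET.fromstring(fragment_xml))


def find_assets(assets, tag, text):
    """Return assets carrying a fragment that contains <tag> with the given text."""
    return [
        a for a in assets
        if any(el.text == text for frag in a.metadata for el in frag.iter(tag))
    ]


photo = Asset("reef.jpg", b"binary image data")
# Stand-in for metadata extracted automatically at ingest (illustrative values).
photo.attach("<geo><lat>-37.81</lat><lon>144.96</lon></geo>")
# Stand-in for metadata added by a user at a later time.
photo.attach("<annotation><tag>coral</tag><comment>Needs review</comment></annotation>")

report = Asset("survey.pdf", b"binary document data")
report.attach("<annotation><tag>budget</tag></annotation>")

matches = find_assets([photo, report], "tag", "coral")
```

The content itself is never inspected during the search; discovery runs entirely over the attached fragments.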
Mediaflux automates processes and workflows, from the most simple to the most complex, to efficiently and effectively manage data through its lifecycle.
Examples include (though the possibilities are endless):
- Scheduling a reminder e-mail notice for a specific time or event
- A quality assurance process, requiring people to review and sign off on changes to data before it is published
- Managing transcoding of video into several formats. Different computers might perform different types of transcoding. There may be many computers performing the transcoding – using work lists automatically provides load-balancing.
Workflow can be utilized to automate packaging, quality assurance, and analysis processes, or process transitions may be user initiated.
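The load-balancing effect of a work list, as in the transcoding example, can be sketched with a shared queue from which each worker takes the next job whenever it is free. This is a minimal sketch, not Mediaflux's workflow engine: threads stand in for separate computers, and `run_work_list`, the job fields and the worker names are hypothetical.

```python
import queue
import threading


def run_work_list(jobs, worker_names):
    """Distribute jobs across workers through one shared work list."""
    work_list = queue.Queue()
    for job in jobs:
        work_list.put(job)

    completed = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                job = work_list.get_nowait()
            except queue.Empty:
                return  # work list drained, this worker is done
            # Stand-in for the real transcoding step.
            result = f"{job['source']} -> {job['format']}"
            with lock:
                completed.append((name, result))

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


jobs = [{"source": "clip.mov", "format": fmt} for fmt in ("mp4", "webm", "ogv")]
done = run_work_list(jobs, ["node-a", "node-b"])
```

No scheduler assigns jobs up front: a faster worker simply returns to the list sooner and takes more of the work.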
Two-tier Client Server Architecture
Mediaflux is deployed in a two-tier client server architecture, combining a client and one or more Mediaflux servers.
Mediaflux combines a multi-versioned metadata management and revision control system with many other services including geo-location, workflow, federation, replication, NFS and web serving into a single package that is simple to install, operate and administer.
Figure 3 - Mediaflux ecosystem
Mediaflux clients include the general purpose Mediaflux Desktop and Aterm, and special purpose and third party clients such as DaRIS, WildHealth, MACDDAP, Clinical Knowledge Manager, Clinical Viewer, and CAReHR.
A Large Feature Set
Mediaflux is feature-rich, providing a wide range of capabilities to address a wide range of problems.
- Discovery - Data can be quickly located with text-based or geospatial-based searches. The query engine supports free text search terms, with support for dynamic suggestions (with authorization enforced) for the completion of part words. Queries can execute local to the server to which the caller is connected, or can be distributed to any number of interconnected Mediaflux servers.
- Designed for large data - Mediaflux scales to billions of files and petabytes of data: large datasets may be packaged based on patterns defining which files to process or ignore, which to coalesce and which are related. See Data Workflow for more details.
- Replication - Data can be automatically shared to multiple systems. See Federation Services for more details.
- Versioning - Earlier versions are automatically preserved when data is updated.
- Traceable - A complete audit trail allows results to be traced back to source data.
- Integrated with tiered data stores - Data can be streamed to lowest cost tier.
- High performance - Parallel I/O for ingestion and replication.
- Auditing - All operations are captured in an audit trail.
- Access control - Flexible access control based on hierarchical (actor, control, subject) triplets, access control lists and fine-grained control down to the metadata document level. Secure Wallets store passwords, private keys or other data securely.
- Integration - Integrate with any other accessible system.
- Low cost of ownership with bounded and known costs and commercial grade support.
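To make the Discovery item above more concrete, the sketch below shows one way suggestions for part words could respect authorization: a term is suggested only if the caller holds a role allowed to discover it. The `suggest` function, the index layout and the role names are assumptions for illustration and do not describe the Mediaflux query engine.

```python
def suggest(prefix, index, caller_roles, limit=5):
    """Return completions for a part word, limited to terms the caller may see.

    `index` maps each term to the set of roles allowed to discover it.
    """
    prefix = prefix.lower()
    allowed = [
        term for term, roles in index.items()
        if term.startswith(prefix) and roles & caller_roles
    ]
    return sorted(allowed)[:limit]


# Hypothetical index of searchable terms and the roles that may see them.
index = {
    "melanoma": {"clinician"},
    "melbourne": {"clinician", "public"},
    "metadata": {"public"},
}

public_hits = suggest("mel", index, {"public"})
clinician_hits = suggest("mel", index, {"clinician"})
```

Filtering at suggestion time matters because a completion can itself reveal that restricted data exists.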
Figure 4 - Mediaflux supports federated repositories, with schemas and views
Where is Mediaflux used?
Mediaflux is applicable to a wide range of markets. See Solutions for details.
Mediaflux is deployed in a two-tier client server architecture, combining a client and one or more Mediaflux servers. Mediaflux eliminates the need to develop most application-independent infrastructure, allowing resources to be concentrated on application-specific aspects such as business logic and user interfaces. The architecture performs significantly better than other architectures where there are network connections between the business logic and data tiers.
Mediaflux has the following environmental dependencies:
- A Java Virtual Machine (JVM) version 1.6 or later
- A host environment consisting of an operating system and storage.
There are no other dependencies. Mediaflux contains its own embedded web server, object database and services such as a scheduler.
Figure 1 - Mediaflux two-tier client server architecture
Mediaflux implements a service-oriented architecture (SOA). Everything is a service. Mediaflux can be extended by installing additional plugin services. Plugin services can be installed into a live system. The plugin services are version controlled, and access controlled using standard Mediaflux features, as the plugin services are data, like any other data managed by Mediaflux.
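The "everything is a service" model can be pictured as a registry of named services to which plugins are added while the system is running. The sketch below is a generic illustration under that assumption; `ServiceRegistry`, its methods and the service names are hypothetical and are not the Mediaflux plugin interface.

```python
class ServiceRegistry:
    """Hypothetical registry in which every capability is a named service."""

    def __init__(self):
        self._services = {}

    def install(self, name, handler):
        # Plugins can be added while the registry is already in use.
        self._services[name] = handler

    def names(self):
        return sorted(self._services)

    def execute(self, name, **args):
        if name not in self._services:
            raise KeyError(f"unknown service: {name}")
        return self._services[name](**args)


registry = ServiceRegistry()
registry.install("text.upper", lambda text: text.upper())
before = registry.names()

# A plugin service installed later, without restarting anything.
registry.install("text.count", lambda text: len(text.split()))
after = registry.names()
result = registry.execute("text.count", text="everything is a service")
```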
Those services may communicate with any external system, using technologies that are appropriate for that system. For example, communication may be via JDBC, web-services, drop folders, message queues, e-mail, etc. The entire gamut of possibilities can be catered for. Similarly, external systems can make calls directly to Mediaflux via a diverse number of protocols.
Figure 2 - Mediaflux can readily communicate with external systems
The service interface, business logic and data are client independent. They can be accessed directly from the command line, from other systems or, for most users, through a user interface such as the Mediaflux Desktop. Multiple interfaces can be in operation at the same time.
Authentication and Authorisation
The server can be configured to use an LDAP repository for authentication and group/role identification. If Active Directory is used, then it is configured for Kerberos. It is possible to operate the server with a mixture of local Mediaflux domains and Active Directory/LDAP domains. The server can send e-mail.
Access control is based on hierarchical (actor, control, subject) triplets, access control lists and fine grain control to the metadata document level.
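One possible reading of hierarchical (actor, control, subject) triplets is sketched below. It assumes that an actor inherits grants from its roles and that subjects are path-like, so a grant on a parent covers its children. Both assumptions, and the names `is_permitted`, `grants` and `roles_of`, are illustrative only; the source does not specify how Mediaflux resolves the hierarchy.

```python
def is_permitted(grants, roles_of, actor, control, subject):
    """Check a request against a set of (actor, control, subject) grants.

    Assumptions for this sketch: an actor inherits the grants of its roles,
    and a grant on "/a" also covers "/a/b".
    """
    actors = {actor} | roles_of.get(actor, set())
    parts = subject.strip("/").split("/")
    # "/projects/reef" -> ["/projects/reef", "/projects", "/"]
    ancestors = ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)] + ["/"]
    return any((a, control, s) in grants for a in actors for s in ancestors)


grants = {
    ("role:researcher", "read", "/projects/reef"),
    ("alice", "modify", "/projects/reef/raw"),
}
roles_of = {"alice": {"role:researcher"}, "bob": {"role:researcher"}}

bob_can_read = is_permitted(grants, roles_of, "bob", "read", "/projects/reef/raw")
bob_can_modify = is_permitted(grants, roles_of, "bob", "modify", "/projects/reef/raw")
alice_can_modify = is_permitted(grants, roles_of, "alice", "modify", "/projects/reef/raw")
```

Here the read grant held by the role reaches both users at every level below `/projects/reef`, while the modify grant applies to one actor only.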
Communications can be limited to encrypted communications only, to specific IP address ranges, and so on. The system will automatically blacklist IP addresses that exceed configurable failure thresholds. Whitelists may also be specified. Access to services is very tightly controlled.
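The connection rules described above can be sketched as a small guard object. This is an illustration, not Mediaflux configuration: the `ConnectionGuard` class and its parameters are hypothetical, and treating whitelisted addresses as exempt from automatic blacklisting is an assumption about what the whitelist does.

```python
import ipaddress
from collections import Counter


class ConnectionGuard:
    """Hypothetical guard: encrypted-only, range-limited, with auto-blacklisting."""

    def __init__(self, allowed_networks, max_failures=3, whitelist=()):
        self.allowed = [ipaddress.ip_network(n) for n in allowed_networks]
        self.whitelist = {ipaddress.ip_address(a) for a in whitelist}
        self.max_failures = max_failures
        self.failures = Counter()
        self.blacklist = set()

    def record_failure(self, addr):
        ip = ipaddress.ip_address(addr)
        if ip in self.whitelist:
            return  # assumption: whitelisted addresses are never blacklisted
        self.failures[ip] += 1
        if self.failures[ip] > self.max_failures:
            self.blacklist.add(ip)  # threshold exceeded

    def accepts(self, addr, encrypted):
        ip = ipaddress.ip_address(addr)
        if not encrypted or ip in self.blacklist:
            return False
        return any(ip in net for net in self.allowed)


guard = ConnectionGuard(["10.0.0.0/8"], max_failures=2, whitelist=["10.0.0.5"])
for _ in range(3):
    guard.record_failure("10.1.2.3")   # exceeds the threshold of 2
    guard.record_failure("10.0.0.5")   # whitelisted, so never counted
```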
All metadata is managed using XML, to ensure maximum system and application interoperability. The inbuilt XSLT processor allows dynamic generation of any other text format such as HTML, text, etc.
Figure 3 - Anatomy of an Asset
- Aterm command line console application
- Arcitecta Desktop (optional).
Interfaces & Software Development Kits
Additionally, Mediaflux includes software development kits for creating thin and thick (including middleware) clients, transformations, content analysis and indexing, dynamic web-server pages and workflow processes.
The Mediaflux API supports the following interfaces:
- Java via Mediaflux's Java SDK
- Microsoft .NET (VB.NET, C# or any other CLI compliant language) via the Mediaflux .NET Client DLL
- Google Web Toolkit (GWT) client library for Web applications
- REST-style XML interface over HTTP/S
- SOAP Web Service interface which provides direct access to all 700+ Mediaflux services
- Additionally, any command-line or shell based client application (e.g. Perl, Python, Ruby) can be supported via the Mediaflux command line tool (Aterm).
Mediaflux is pure Java, with some optional native extensions for administration on Windows-based platforms, such as installing the server as a Windows service. Mediaflux will run on any Java 1.5+ compatible server. Mediaflux is tested and deployed on Windows 2000 / 2003 / XP / Vista / 7, Linux, Solaris, IRIX, AIX, and Mac OS X. Mediaflux can be installed on desktop PCs to support small workgroups through to high-performance computing (HPC) systems with hundreds or thousands of CPUs for data and compute intensive applications.
The specifications of a Mediaflux server system depend largely on the number of concurrent users and transaction rates. There are Mediaflux deployments on laptops that support a few users, while other deployments run on multi-processor machines to support hundreds of users.
Minimum Server Requirements
- 512MB RAM
- Pentium III (1GHz) (or equivalent)
- Oracle JRE 1.6+.
Mediaflux uses its own embedded, high-performance native XML database, XODB, for enhanced functionality, significantly higher performance than RDBs, and simpler administration and deployment. Mediaflux XODB supports typical database capabilities such as transactions, on-line backup etc. Mediaflux XODB provides even greater flexibility whilst enabling the system to scale beyond 10^9 (one billion) assets.
Mediaflux can inter-operate with any database.
Data is stored in a content store, which can be one or more of:
- The same database as the metadata (zero administration)
- A specified file-system - Windows UNC paths supported
- A hierarchical file-system.
The types of content stores can be easily extended.
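Extensible store types suggest a common interface with interchangeable back ends. The sketch below shows that pattern in general terms; the `ContentStore` interface and both implementations are hypothetical and do not represent the Mediaflux content store API.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ContentStore(ABC):
    """Hypothetical interface that every store type implements."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(ContentStore):
    """Stand-in for keeping content alongside the metadata."""

    def __init__(self):
        self._items = {}

    def put(self, key, data):
        self._items[key] = data

    def get(self, key):
        return self._items[key]


class FileSystemStore(ContentStore):
    """Stores each item as a file under a specified root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, data):
        (self.root / key).write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()


results = []
with tempfile.TemporaryDirectory() as tmp:
    # The calling code is identical regardless of the store type.
    for store in (InMemoryStore(), FileSystemStore(tmp)):
        store.put("asset-1", b"payload")
        results.append(store.get("asset-1"))
```

Adding a new store type then means adding one more class that implements the same two methods.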
Although network bandwidth requirements depend on the amount of data exchanged between a Mediaflux client and server, Mediaflux provides XML compression for lower bandwidth connections. For example, there are some applications that connect to a Mediaflux server via a mobile telephone network and satellite phone.
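The benefit of compressing XML on a slow link can be seen with a short experiment. The sketch below uses Python's `zlib` as a stand-in, since the compression scheme Mediaflux actually uses is not described here; the function names and the sample message are made up for illustration.

```python
import zlib


def compress_xml(xml_text: str) -> bytes:
    """Compress an XML message before sending it over a slow link."""
    return zlib.compress(xml_text.encode("utf-8"), 9)


def decompress_xml(payload: bytes) -> str:
    """Restore the original XML text on the receiving side."""
    return zlib.decompress(payload).decode("utf-8")


# XML is repetitive (the same tags recur), so it tends to compress well.
record = "<asset><tag>coral</tag><tag>reef</tag><tag>survey</tag></asset>"
message = "<results>" + record * 50 + "</results>"

payload = compress_xml(message)
original_size = len(message.encode("utf-8"))
compressed_size = len(payload)
```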
Mediaflux supports raw TCP/IP, HTTP and HTTPS. Typically, all network traffic is tunnelled via HTTP or encrypted HTTPS to allow Mediaflux applications to operate through most firewalls.
For further detailed information about the various components that comprise the Mediaflux software suite, the following downloads are available.
Two-page summary of Mediaflux data + metadata management platform.
Two-page overview of XODB, Mediaflux’s powerful binary XML object database.
Two-page overview of Mediaflux Federation and Replication, simplifying the management of distributed data.
Two-page summary of our Clinical Audit Research electronic Health Record (CAReHR)
Two-page summary of Arcitecta’s Mediaflux System Administration Services program. This is a contracted service by which Arcitecta's Mediaflux experts may be engaged to provide comprehensive and pro-active operational management of your Mediaflux system.
Mediaflux is a flexible data management platform designed to enable multiple workflows with any kind of digital content. These data may be in a single storage pool, or aggregated across different and often incompatible storage and data types. To accommodate different uses of the data, Mediaflux offers multiple interface choices and access protocols which accommodate different workflows, privileges and behavior within the same system.
This PDF features a story that appeared in Agricultural Science Journal.
It describes the work of the Taronga Conservation Society Australia, in detecting, diagnosing and responding to emerging disease from wild and feral animals, by tracking and documenting outbreaks and sharing the findings with other experts in the field. The Taronga Conservation Society Australia does this via the Australian Registry of Wildlife Health.
This story is provided with permission of Agricultural Science, the official journal of the Australian Institute of Agricultural Science and Technology.
Visit Agricultural Science Journal for more information.
Enhancing data management to help identify the biomarkers that provide an early indication of the onset of mental illness is important to improving patient outcomes and health benefits.
This infographic describes the research being undertaken by the Cooperative Research Centre for Mental Health (CRC) into the causes of mental illness.
The threat of disease emerging from wild and feral animals is the most significant and growing threat to Australian and international biosecurity.
This infographic talks about the work of the Taronga Conservation Society Australia, via the Australian Registry of Wildlife Health, to effectively detect, diagnose and respond to emerging diseases from free-ranging wildlife.
Working in partnership with RDSI and SGI, Arcitecta is delivering scalable and customised data-connected research platforms that give the Australian research community shared access to nationally distributed data centres, or Nodes. These Nodes already contain over 11 petabytes (11,000 terabytes) of content and are expected to grow to over 55 petabytes, funded by RDSI.
These data sets cover a broad range of specialties, from high-energy physics to the humanities, from climate change to cancer research, and much more.
This infographic describes the benefits Mediaflux brings to Australian researchers using RDSI Nodes.
Image of the poster presented by the Cooperative Research Centre for Mental Health (CRCMH) at the recent HIC 2015 conference, which covered the work being undertaken by the CRCMH in analysing data from large longitudinal cohort studies to identify the biomarkers that provide an early indication of mental illness such as Alzheimer’s Disease, Parkinson’s Disease, schizophrenia and mood disorders.
Image of the poster presented by Dr Schulz on the clinical and research services provided by The Royal Melbourne Hospital, Victorian Infectious Diseases Service to address the complex healthcare needs of patients with HIV and viral Hepatitis.
Dr Schulz described how Arcitecta’s Clinical Audit Research electronic Health Record (CAReHR) - developed in collaboration with three tertiary hospitals – has been adapted from its use in immigrant health to deliver better clinical care for people living with chronic Hepatitis and/or HIV.
University provides researchers with safe, searchable, and shareable data management platform built with Mediaflux® and SGI® InfiniteStorage
Leveraging the power of metadata to solve big data problems.
Information management is recognised as a core enabler for business and government. Studies from leading management institutes and industry analysts point to the increase in performance from companies and organisations using their information effectively, compared to those unable to derive an advantage from their enterprise knowledge.
But, as organisations grow, critical answers to key business issues are harder to get. This is because data needed to inform important decisions have to be drawn from multiple offices and locations, each with different environments and many of which are not connected to each other.
Where To Buy
Mediaflux may be purchased from Arcitecta or from our global resellers and partners.
Arcitecta contact details
The Americas +1 303.800.5999
Asia Pacific +61 3 8683 8523
Europe Middle East and Africa +61 3 8683 8523
Arcitecta resellers and partners
More information relating to SGI and Mediaflux can be found at the following locations
Copyright Arcitecta Pty Ltd 2014 – All rights reserved. Arcitecta, Mediaflux Desktop and CAReHR are trademarks of Arcitecta Pty Ltd: Mediaflux is a registered trademark of Arcitecta Pty Ltd in the U.S.A. and a trademark of Arcitecta Pty Ltd in Australia. All other trademarks are the property of their respective holders.