The Storage Resource Management (SRM) technology was initiated by the Scientific Data Management Group at Lawrence Berkeley National Laboratory (LBNL) and developed in response to the growing needs of managing large datasets on a variety of storage systems.
Dynamic storage management is essential to ensure:
prevention of data loss,
decrease of error rates of data replication, and
decrease of the analysis time by ensuring that analysis tasks have the storage space to run to completion.
Storage Resource Managers (SRMs) address these issues by coordinating storage allocation, streaming the data between sites, and enforcing secure interfaces to the storage systems (i.e., dealing with the special security requirements of each storage system at its home institution). In a production environment, using SRMs has reduced error rates of large-scale replication from 1% to 0.02% in the STAR project. Furthermore, SRMs can prevent job failures. When jobs run on clusters, some of the local disks can fill up before a job finishes, resulting in lost productivity and therefore a delay in analysis. This occurs because space was not dynamically allocated and previously used files that were no longer needed were not removed. While there are tools for dynamically allocating compute and network resources, SRMs are the only tool available for providing dynamic space reservation, guaranteeing secure file availability with lifetime support, and performing automatic garbage collection that prevents clogging of storage systems.
The SRM specification has evolved into an international de facto standard, and many projects have committed to using this technology, especially in the HEP and HENP communities, such as the Worldwide Large Hadron Collider (LHC) Computing Grid (WLCG) that supports ATLAS and CMS.
The SRM approach is to develop a uniform standard interface that allows multiple implementations by various institutions to interoperate. This approach removes the dependence on a single implementation and permits multiple groups to develop SRM systems for their specific storage resources. Hence, SRM has become crucial to the interoperation of storage systems for large-scale projects that have to manage and distribute massive amounts of data efficiently and securely. Without such a unifying technology, these projects cannot scale and are bound to fail. This problem will only grow over time as computing facilities move into the petascale regime.
Another important problem that SRMs address is storage clogging. Storage clogging is a critical problem for large-scale shared storage systems because the removal of files after they are used is not automated. This increases the cost of storage and slows the analysis and discovery process. SRMs help unclog temporary storage systems by providing lifetime management of accessed files. This capability is crucial to the efficient use of storage under cost constraints.
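The idea of lifetime management can be illustrated with a small sketch. This is not the SRM interface or any SRM implementation; the names ToyStorage, ManagedFile, put, and collect_garbage are hypothetical and chosen only for illustration. The sketch assumes that each file is stored with a lifetime and that expired files may be removed whenever space is needed for a new request.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedFile:
    name: str
    size: int          # size in bytes
    expires_at: float  # time after which the file may be removed


@dataclass
class ToyStorage:
    """Hypothetical fixed-capacity store with per-file lifetimes."""
    capacity: int
    files: dict = field(default_factory=dict)

    def used(self) -> int:
        # Total space currently occupied by stored files.
        return sum(f.size for f in self.files.values())

    def collect_garbage(self, now: float) -> list:
        # Remove every file whose lifetime has expired; return their names.
        expired = [n for n, f in self.files.items() if f.expires_at <= now]
        for n in expired:
            del self.files[n]
        return expired

    def put(self, name: str, size: int, lifetime: float, now: float) -> bool:
        # Reclaim expired space first if the request would not otherwise fit.
        if self.used() + size > self.capacity:
            self.collect_garbage(now)
        if self.used() + size > self.capacity:
            return False  # still no room: reject instead of overfilling
        self.files[name] = ManagedFile(name, size, now + lifetime)
        return True


store = ToyStorage(capacity=100)
store.put("a.dat", 60, lifetime=10, now=0)   # accepted
store.put("b.dat", 60, lifetime=10, now=5)   # rejected: a.dat is still live
store.put("b.dat", 60, lifetime=10, now=20)  # accepted: a.dat has expired
```

In this toy model the second request fails because the first file's lifetime has not yet elapsed, while the third succeeds because the expired file is removed automatically. Time is passed in explicitly as `now` only to keep the example deterministic.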
SRMs also serve as gateways to secure data access. By limiting external access to all storage systems through a standard SRM interface, one can ensure not only authenticated access, but also the enforcement of authorized access to files.
The SRM technology was highly successful in SciDAC-1 and is currently used in production in several large collaborations. SRM implementations that interoperate have been developed at LBNL, FNAL, and TJNAF, as well as at several sites in Europe. Furthermore, this technology increases the scientist’s productivity by eliminating the tedious and time-consuming tasks of managing storage, performing robust data movement, and dealing with security requirements at various storage sites.
In addition to leading the SRM standard development by coordinating with multiple institutions, the LBNL team has developed SRM systems for disk storage and mass storage systems, including HPSS. These SRMs have been used in several application domains, including multiple projects at the SDM center, Earth System Grid, the STAR experiment, and the Open Science Grid (OSG). As data sets continue to grow and become ever more complex, these projects depend on the continued development and support of the SRM implementations from LBNL. It is essential to capitalize on the SciDAC-1 successes and sustain current projects that depend on the SRM technology, to further improve and deploy SRMs in additional projects and application domains, and to continue evolving the SRM standard. Specifically, based on past experience, we have identified important features that require further development and coordination. These include sophisticated aspects of resource monitoring that can be used for performance estimation, authorization enforcement, and accounting tracking and reporting to enforce quota usage in SRMs. Another aspect that needs further development is SRMs for multi-component storage systems. Such systems, made of a combination of multiple disk arrays, parallel file systems, a reference