Monday, April 14, 2008

Storage Area Network Basics: SAN Management Overview

The storage area network (SAN) centralizes enterprise storage by interconnecting storage devices and subsystems through a dedicated high-speed network fabric, such as Fibre Channel, FICON or ESCON. A SAN can also extend beyond the local data center, connecting storage systems at remote geographic locations through WAN links like ATM or SONET. Once implemented and configured, the SAN's storage resources can be managed centrally, allowing administrators to organize, provision and allocate that storage to users or applications across the organization. Centralization also allows administrators to monitor performance, troubleshoot problems and manage the demands of storage growth. If you're new to storage area network technology, or just need to refresh the basics, this guide covers the essential concepts of configuration, provisioning, performance and capacity management, and monitoring and troubleshooting. SAN hardware vendors and product lines to look at include StoreVault, the Snap Server 700i Series and Nexsan Technologies.

RAID configuration
RAID technology serves two purposes in the disk array or server: it can improve storage I/O performance through striping, and it can bring redundancy to the RAID group through mirroring and parity techniques. When implementing RAID, it's necessary to select an appropriate RAID level and specify a RAID group size (the number of disks committed to the group). For example, use RAID 1 when redundancy is essential. It mirrors the contents of one disk to another but uses twice the number of disks. Other RAID levels protect disk groups by striping parity information across the disks in the group. RAID 5 gives up one disk's worth of capacity to parity, while RAID 6 gives up two, allowing the group to survive the simultaneous loss of two drives. RAID 6 has become more prominent in recent years due to the popularity of SATA drives, which are high-capacity drives that take longer to rebuild.
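
The capacity trade-off between these levels is easy to work out by hand, and a short Python sketch makes it concrete. The function name and the simplifications here are assumptions for illustration only: all disks are the same size, RAID 1 is treated as two-way mirrored pairs, and controller overhead and hot spares are ignored.

```python
def usable_capacity_gb(level: int, disks: int, disk_gb: float) -> float:
    """Rough usable capacity of a RAID group (simplified model)."""
    if level == 0:
        minimum, data_disks = 2, disks          # striping only, no redundancy
    elif level == 1:
        if disks % 2:
            raise ValueError("mirrored pairs need an even number of disks")
        minimum, data_disks = 2, disks // 2     # half the disks hold copies
    elif level == 5:
        minimum, data_disks = 3, disks - 1      # one disk's worth of parity
    elif level == 6:
        minimum, data_disks = 4, disks - 2      # two disks' worth of parity
    else:
        raise ValueError(f"RAID level {level} not covered by this sketch")
    if disks < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    return data_disks * disk_gb


if __name__ == "__main__":
    # Eight 750 GB disks under each level
    for level in (0, 1, 5, 6):
        print(f"RAID {level}: {usable_capacity_gb(level, 8, 750):.0f} GB usable")
```

With eight 750 GB disks, mirroring leaves half the raw capacity usable, while RAID 5 and RAID 6 give up one and two disks' worth respectively.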

Rebuild time is a serious issue when configuring RAID arrays. When a disk fails, it takes time to rebuild the failed disk's contents. During a rebuild, the RAID group is inaccessible or operates at reduced performance. As disk capacities have grown, rebuild times have become problematic. With SATA disks routinely at 750 GB and 1 TB drives available, a rebuild after a failure can take hours. Such long rebuilds expose the RAID array to a greater potential for multiple disk failures and data loss. Look for disk arrays that offer fast rebuild times and predictive fault features that can start a rebuild to a spare disk before a complete disk failure occurs.

Another issue arises when the RAID setup needs to change. Traditionally, a RAID group was a static entity once a level and group size were selected. To change a RAID level or group size, the group would have to be rebuilt from scratch using the new size and level, and then reloaded from a backup. An increasing number of RAID platforms support dynamic RAID groups, allowing administrators to change levels and group sizes on the fly.
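
The rebuild-time concern described above can be put into rough numbers. The sketch below simply divides disk capacity by a sustained rebuild rate. The 50 MB/s figure is an assumed value for illustration, not a vendor specification, and real rebuilds are usually slower because the array keeps serving I/O at the same time.

```python
def rebuild_hours(disk_gb: float, rebuild_mb_per_s: float) -> float:
    """Best-case hours to rewrite a whole disk at a sustained rate."""
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    seconds = (disk_gb * 1000) / rebuild_mb_per_s   # 1 GB taken as 1000 MB
    return seconds / 3600


if __name__ == "__main__":
    for size_gb in (750, 1000):
        # 50 MB/s is an assumed rate, chosen only to show the arithmetic
        print(f"{size_gb} GB disk: about {rebuild_hours(size_gb, 50):.1f} hours")
```

Even under this optimistic assumption the window runs to several hours, which is the period during which a second failure puts a RAID 5 group at risk.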


SAN provisioning
Centralizing storage on a SAN does not mean the entire storage environment should be accessible to every user; access must be restricted to authorized users or applications. Administrators must carve up the storage space into segments that are only accessible to specific users. This management process is known as provisioning. For example, some amount of data center storage may be provisioned for an Oracle database that might only be accessible to a purchasing department, while other space may be apportioned for personnel records accessible to the human resources department.


The major challenge with provisioning relates to storage utilization. Once space is allocated, it cannot easily be changed. Thus, administrators typically provision ample space for an application's future use. Unfortunately, storage capacity that is provisioned for one application cannot be used by another, so space that is allocated, but unused, is basically wasted until called for by the application. This need to allocate for future expansion often leads to significant storage waste on the storage area network. One way to alleviate this problem is through thin provisioning, which essentially allows an administrator to "tell" an application that some amount of storage is available but actually commit far less drive space -- expanding that storage in later increments as the application's needs increase.
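
To make the thin provisioning idea concrete, here is a toy Python model. It is not how any particular array implements the feature; the class name `ThinPool`, its methods and the 10 GB allocation increment are all hypothetical. Each volume advertises a large virtual size, but physical space is only committed from the shared pool, one increment at a time, as data is written.

```python
import math


class ThinPool:
    """Toy model of a thin-provisioned storage pool (illustrative only)."""

    def __init__(self, physical_gb: int, chunk_gb: int = 10):
        self.physical_gb = physical_gb
        self.chunk_gb = chunk_gb      # assumed allocation increment
        self.virtual = {}             # volume -> size the application sees
        self.written = {}             # volume -> GB actually written
        self.allocated = {}           # volume -> physical GB committed

    def create_volume(self, name: str, virtual_gb: int) -> None:
        # No physical space is committed at creation time.
        self.virtual[name] = virtual_gb
        self.written[name] = 0
        self.allocated[name] = 0

    def free_gb(self) -> int:
        return self.physical_gb - sum(self.allocated.values())

    def write(self, name: str, gb: int) -> None:
        new_written = self.written[name] + gb
        if new_written > self.virtual[name]:
            raise ValueError(f"{name}: write exceeds advertised size")
        # Round the committed space up to whole increments.
        needed = min(math.ceil(new_written / self.chunk_gb) * self.chunk_gb,
                     self.virtual[name])
        growth = needed - self.allocated[name]
        if growth > self.free_gb():
            raise RuntimeError("pool exhausted: add physical capacity")
        self.allocated[name] = needed
        self.written[name] = new_written


if __name__ == "__main__":
    pool = ThinPool(physical_gb=1000)
    pool.create_volume("oracle", 2000)   # advertised sizes exceed the pool
    pool.create_volume("hr", 500)
    pool.write("oracle", 25)
    print(pool.allocated, pool.free_gb())
```

The model also shows the risk that comes with the technique: because advertised capacity can exceed physical capacity, the pool itself has to be monitored and expanded before it runs out.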

Provisioning is accomplished through the use of software tools, which typically accompany major storage products. For example, EMC's Celerra NAS family includes Celerra Manager software for provisioning. Administrators should look for a provisioning tool with heterogeneous support that covers the storage platforms currently in their environment. Otherwise, IT staff will need to use multiple provisioning tools, increasing management difficulty.

SAN performance and capacity management
SAN performance can be adversely affected when storage runs low, resulting in application performance problems and service level issues. Many IT organizations guard against this threat by overbuying and overprovisioning storage, but this frequently results in wasted capital since the additional storage investment is not necessarily utilized. Organizations are embracing performance and capacity planning practices to avoid unexpected storage costs and disruptive upgrades. The goal is to predict storage needs over time and then budget capital and labor to make regular improvements to the storage infrastructure.

In practice, SAN performance and capacity planning can be extremely difficult. It's virtually impossible to predict the storage needs of an application or department over time without a careful assessment of past growth and a comprehensive evaluation of future plans. In fact, many organizations forgo the expense and effort of a formal process unless a mission-critical project or serious performance problem demands it. Organizations choosing to sustain an ongoing performance and capacity planning effort will need either comprehensive storage resource management (SRM)-type tools, or a capacity planning application.
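
As a very rough illustration of assessing past growth, the Python sketch below fits a straight line to monthly usage samples and estimates how many months remain before a given capacity is reached. The function name and sample figures are made up for the example, and a linear trend is an assumption: it says nothing about future projects, which is why the evaluation of future plans mentioned above still matters.

```python
def months_until_full(usage_gb, capacity_gb):
    """Estimate months from the last sample until capacity is reached.

    usage_gb: used capacity sampled once per month, oldest first.
    Returns None if usage is flat or shrinking.
    """
    n = len(usage_gb)
    if n < 2:
        raise ValueError("need at least two monthly samples")
    mean_x = (n - 1) / 2
    mean_y = sum(usage_gb) / n
    # Ordinary least-squares fit of usage against month index
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage_gb))
    slope = sxy / sxx                      # GB of growth per month
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (capacity_gb - intercept) / slope
    return max(0.0, crossing - (n - 1))


if __name__ == "__main__":
    samples = [100, 120, 140, 160]         # hypothetical monthly usage in GB
    print(months_until_full(samples, capacity_gb=200))
```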

SAN monitoring/troubleshooting
SAN problems can be particularly difficult to isolate because of the complex configurations and interrelationships between the servers, switches and storage platforms that populate a storage area network. A working SAN is a digital ecosystem unto itself, and seemingly innocuous changes in one place can have a catastrophic impact on another.

The best SAN troubleshooting is typically proactive and usually involves establishing a performance baseline of critical characteristics before problems ever arise. It's then a simple matter to compare current measurements against the "known good" baseline. This often reveals problems quickly and can identify performance changes that result from upgrades or reconfigurations.
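
A baseline comparison of this kind can be sketched in a few lines of Python. The metric names, sample values and the 20 percent tolerance below are assumptions for illustration; in practice the numbers would come from whatever monitoring or SRM tool is in use.

```python
def deviations(baseline, current, tolerance=0.20):
    """Return metrics whose relative change from baseline exceeds tolerance.

    baseline, current: dicts mapping metric name -> measured value.
    The result maps metric name -> fractional change (0.8 means +80%).
    """
    flagged = {}
    for metric, good in baseline.items():
        if metric not in current or good == 0:
            continue                       # nothing to compare against
        change = (current[metric] - good) / good
        if abs(change) > tolerance:
            flagged[metric] = change
    return flagged


if __name__ == "__main__":
    # Hypothetical "known good" values and a later measurement
    known_good = {"read_latency_ms": 5.0, "iops": 10000, "port_util_pct": 40}
    today = {"read_latency_ms": 9.0, "iops": 9500, "port_util_pct": 42}
    for metric, change in deviations(known_good, today).items():
        print(f"{metric}: {change:+.0%} versus baseline")
```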


Another key to effective SAN troubleshooting is a comprehensive change management policy. By tracking changes and restricting change activities to authorized IT personnel, an administrator can avoid unexpected trouble and quickly correlate help requests with recent SAN changes.
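
Correlating a help request with recent changes is straightforward once changes are logged with timestamps. The Python sketch below assumes a simple in-memory change log; the field names, sample entries and the 48-hour look-back window are hypothetical choices for the example.

```python
from datetime import datetime, timedelta


def recent_changes(change_log, incident_time, window_hours=48):
    """List logged changes made in the window before an incident, newest first."""
    start = incident_time - timedelta(hours=window_hours)
    hits = [c for c in change_log if start <= c["time"] <= incident_time]
    return sorted(hits, key=lambda c: c["time"], reverse=True)


if __name__ == "__main__":
    # Hypothetical change log entries
    log = [
        {"time": datetime(2008, 4, 10, 9, 0), "who": "alice",
         "what": "switch firmware upgrade"},
        {"time": datetime(2008, 4, 13, 22, 30), "who": "bob",
         "what": "zoning change on fabric A"},
        {"time": datetime(2008, 4, 14, 7, 15), "who": "alice",
         "what": "new LUN mapped to database server"},
    ]
    incident = datetime(2008, 4, 14, 8, 0)
    for change in recent_changes(log, incident):
        print(change["time"], change["who"], change["what"])
```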
