Backing Up to AWS: Basics of Storage Gateway Types

AWS's Storage Gateway solutions are designed to be used as backup destinations for your infrastructure. AWS offers three types of Storage Gateway: File, Volume, and Tape.

The overall process: you deploy either a local on-premises VM (Hyper-V or VMware) or a cloud-based one, which in turn runs on an AWS EC2 instance. You need to add an additional virtual disk to the Storage Gateway to cache the data before it is uploaded. The disk has to be at least 150 GB, and you can add several disks for a total of 16 TB across all of them. You can't allocate a 150 GB disk to begin with and then increase its size down the road; if you want a larger cache, you have to add a new disk.

There is an additional requirement for Volume and Tape Storage Gateways: you need "Upload Buffer" drive(s) along with the cache drives. An upload buffer drive has to be a minimum of 150 GB and a maximum of 2 TB in size.
As the name suggests, the upload buffer's purpose is straightforward: backup data from the cache drives is transferred to the upload buffer drive and then copied to AWS storage. The buffer is then refilled from the cache drive with more data, and so on.
The cache drive's purpose, on the other hand, is twofold. Besides pumping more data into the upload buffer, it keeps a cache of your most recent backup data, as much as the cache drive size allows. On a restore, the gateway checks whether the data is still available on the cache drive. If it is, you don't have to pull the data down from AWS storage, and you avoid AWS's data transfer charges (billed per GB of data retrieved).
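The disk rules above are easy to get wrong when planning a VM, so here is a minimal sketch that checks a planned layout against them. The function name and parameters are made up for illustration, sizes are treated as 1 TB = 1024 GB, and the 150 GB to 2 TB upload buffer limit is assumed to apply per drive; none of this comes from an AWS tool.

```python
# Limits as stated in the text above (assumption: 1 TB = 1024 GB).
MIN_DISK_GB = 150
MAX_CACHE_TOTAL_GB = 16 * 1024
MAX_BUFFER_DISK_GB = 2 * 1024


def check_disk_layout(cache_disks_gb, buffer_disks_gb=(), needs_upload_buffer=False):
    """Return a list of problems with a planned gateway disk layout.

    needs_upload_buffer should be True for Volume and Tape gateways.
    An empty list means the plan fits the limits described in the post.
    """
    problems = []
    if not cache_disks_gb:
        problems.append("at least one cache disk is required")
    for size in cache_disks_gb:
        if size < MIN_DISK_GB:
            problems.append(f"cache disk of {size} GB is below the {MIN_DISK_GB} GB minimum")
    if sum(cache_disks_gb) > MAX_CACHE_TOTAL_GB:
        problems.append("total cache exceeds 16 TB")
    if needs_upload_buffer:
        if not buffer_disks_gb:
            problems.append("Volume/Tape gateways need an upload buffer disk")
        for size in buffer_disks_gb:
            # Assumption: the 150 GB to 2 TB range is a per-drive limit.
            if not MIN_DISK_GB <= size <= MAX_BUFFER_DISK_GB:
                problems.append(f"upload buffer disk of {size} GB is outside 150 GB to 2 TB")
    return problems


if __name__ == "__main__":
    print(check_disk_layout([150, 500], [200], needs_upload_buffer=True))
```

Since a disk can't be grown later, a check like this is most useful before the VM is built rather than after.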

Data Storage: Compression, De-duplication, or Deltas?

The File-based Storage Gateway (NFS) doesn't make use of any compression or de-duplication mechanisms. But as per the FAQ, it "uses multipart uploads and copy put, so only changed data is uploaded to S3 which can reduce data transfer". Basically, it compares your current file with the one already uploaded and uploads only the changed bits, which is still good and should reduce both the traffic that traverses your network and the amount of data stored.
The Volume-based Storage Gateway (SAN snapshots) compresses all data prior to uploading it to AWS. This should potentially reduce your data transfer and storage charges.
The VTL (Virtual Tape Library) based gateway doesn't de-duplicate or compress data for storage.

Cache Drive Size

Amazon suggests a cache size equal to 20% of your backup data. If you recover data quite often, it might be wise to move the cache to higher-capacity, low-cost storage and increase the cache size to 30-35%. A larger local cache doesn't mean you will pay less for AWS storage. As mentioned previously, a larger cache means less time waiting for recent backup data to be pulled down from AWS, and it should incur lower data transfer charges overall. Let's not forget that it costs more to download data from AWS than to store or upload it.
Make sure to set up CloudWatch to monitor the Storage Gateway and track the relevant metrics. Over time this should help you narrow down the right cache size, and show how much data is served from the local cache versus downloaded from AWS storage when restoring new or old data.
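As a rough illustration of the sizing rule of thumb above, the sketch below turns a backup size into a suggested cache range. The function name is hypothetical; the 20% and 30-35% figures are the ones quoted in this post, and the result is only a starting point to refine with CloudWatch data.

```python
def suggested_cache_gb(backup_data_gb, restores_often=False):
    """Return a (low, high) cache size range in GB.

    20% of backup data is the baseline from the post; 30-35% is the
    post's suggestion for sites that restore frequently.
    Integer math keeps the results free of float rounding noise.
    """
    if restores_often:
        return (backup_data_gb * 30 // 100, backup_data_gb * 35 // 100)
    baseline = backup_data_gb * 20 // 100
    return (baseline, baseline)


if __name__ == "__main__":
    # Example: 10 TB of backup data, expressed here as 10,000 GB.
    print(suggested_cache_gb(10_000))
    print(suggested_cache_gb(10_000, restores_often=True))
```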

Fees and Pricing

Storage Gateway has a per-GB data transfer price associated with it. However, the per-GB price turns into a flat $125 monthly fee once you upload more than $125 worth of data in that month.
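The cap described above amounts to taking the smaller of two numbers. Here is a small sketch of that arithmetic; the function name is made up, and the per-GB rate is deliberately left as a parameter because this post does not quote one (the 0.01 in the example is a placeholder, not an AWS price).

```python
def gateway_transfer_fee(gb_uploaded, price_per_gb, monthly_cap=125.0):
    """Per-GB gateway fee for one month, capped at the flat $125 from the post."""
    return min(gb_uploaded * price_per_gb, monthly_cap)


if __name__ == "__main__":
    placeholder_rate = 0.01  # hypothetical rate, for illustration only
    print(gateway_transfer_fee(1_000, placeholder_rate))   # below the cap
    print(gateway_transfer_fee(50_000, placeholder_rate))  # hits the cap
```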

Then there are various fees that depend on the Storage Gateway type: Tape, File, or Volume. These cover data storage, the type of S3 storage, the number of requests made, the location (region) where the data is stored, data transfer (download), and archived data retrieval (if the data is archived, which costs much less than regular storage).

Accessing the Storage Gateway

There are a couple of things to keep in mind if you are looking to manage the SGs remotely. A Storage Gateway will not be visible or accessible in your AWS Storage Gateway dashboard if you try to view it from outside your network. To be more precise, the gateway will not show up in the dashboard unless you are accessing the AWS dashboard from within your company network, and the network you are in can reach the SG/backup network. This might happen if the network with the servers is heavily firewalled and isolated from the user side of the network.

The Storage Gateway must be able to reach several AWS endpoints to function properly. These are:
  • anon-cp.storagegateway.region.amazonaws.com:443
  • client-cp.storagegateway.region.amazonaws.com:443
  • proxy-app.storagegateway.region.amazonaws.com:443
  • dp-1.storagegateway.region.amazonaws.com:443
  • storagegateway.region.amazonaws.com:443

The gateway also needs access to an endpoint on CloudFront, which contains the list of regions and required endpoints for Storage Gateways.

The "region" part of each name stands for your gateway's region. If your Storage Gateway is deployed in the US West (Oregon) region, the endpoint looks like this: storagegateway.us-west-2.amazonaws.com:443.

You could allow either all AWS regions or only the ones your SG needs access to, depending on your security requirements.
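If you go the "only the regions I need" route, a firewall allow-list can be generated from the endpoint pattern above. The sketch below does that for one region and includes an optional TCP reachability check; the function names are hypothetical, and the prefix list is just the five names from this post, so verify it against the current AWS requirements page before relying on it.

```python
import socket

# Prefixes taken from the endpoint list in this post.
ENDPOINT_PREFIXES = ["anon-cp", "client-cp", "proxy-app", "dp-1"]


def required_endpoints(region):
    """Return (host, port) pairs the gateway must reach for one region."""
    hosts = [f"{prefix}.storagegateway.{region}.amazonaws.com" for prefix in ENDPOINT_PREFIXES]
    hosts.append(f"storagegateway.{region}.amazonaws.com")
    return [(host, 443) for host in hosts]


def is_reachable(host, port, timeout=3.0):
    """Best-effort check: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port in required_endpoints("us-west-2"):
        print(f"{host}:{port}")
```

Running is_reachable from the gateway's own network segment, rather than from a workstation, gives the more meaningful answer, since that is the network the firewall rules apply to.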

Monitoring the Gateway

You can monitor each gateway for a number of metrics using CloudWatch Metrics. You will need to identify the GatewayId and GatewayName before you can do so. At a minimum you should monitor how much data is served from the local cache when restoring new or old data, cache and buffer drive usage, data transferred, queued writes, working storage, and upload buffer free/used space.
You should keep an eye on the monitoring screen for at least the first half a dozen backups or so. This will help you identify any bottlenecks during your backups. Bottlenecks could be on the cache or buffer drives (the disk is not large enough) or on the network (not enough bandwidth, or throttled bandwidth).
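For anyone who prefers scripts to the monitoring screen, here is a hedged sketch of pulling one gateway metric with boto3. The helper names are mine, the gateway ID and name are placeholders, and the metric names in the comment are as I recall them from the metrics list in the references, so double-check them there. The query builder uses only the standard library; the fetch step needs the third-party boto3 package and AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Metric names worth checking against the AWS metrics list, for example:
# CacheHitPercent, CachePercentUsed, UploadBufferPercentUsed, QueuedWrites.


def metric_query(gateway_id, gateway_name, metric_name, hours=24, period_seconds=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/StorageGateway",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "GatewayId", "Value": gateway_id},
            {"Name": "GatewayName", "Value": gateway_name},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average"],
    }


def fetch_datapoints(query):
    """Run the query. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    client = boto3.client("cloudwatch")
    return client.get_metric_statistics(**query)["Datapoints"]


if __name__ == "__main__":
    # Placeholder ID and name; substitute your own gateway's values.
    print(metric_query("sgw-12345678", "my-gateway", "CacheHitPercent"))
```

A consistently low cache hit figure during restores would point toward a larger cache, while a full upload buffer would point toward the buffer disk or the uplink as the bottleneck.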

Would I use AWS to back up my data? It depends. As covered above, there are three types of Storage Gateway: File, Volume, and Tape. You could even use the Volume SG to send your hourly/daily/weekly SAN snapshots to AWS. AWS is capable of delivering any amount of storage you might need, as long as you are solvent enough to pay for it. But I doubt I would use it for my daily backups or SAN-based snapshots. It might end up costing me more, both in the time it takes to retrieve the data and in Uncle Sam's currency ($$$), than it's worth.

Given all that, I do see myself using File or VTL Storage Gateways for monthly off-site backups. Just make sure you have enough $$$ to pay for all those TBs of data. Just 100 TB of archived data on VTL (the cheapest solution) will cost you about $525 per month: $400 for the archives plus the $125 monthly Storage Gateway fee.
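To make the $525 figure easy to redo for other sizes, here is the arithmetic as a small sketch. The function name is hypothetical, and the $0.004 per GB archive rate is back-calculated from the numbers above ($400 for 100 TB, counting 1 TB as 1,000 GB); it is not a quoted AWS price, so plug in the current rate for your region.

```python
def monthly_archive_cost(archived_tb, archive_price_per_gb=0.004, gateway_fee=125.0):
    """Estimate the monthly cost of archived VTL data plus the gateway fee.

    archive_price_per_gb is an assumption derived from this post's example.
    """
    archive_cost = archived_tb * 1000 * archive_price_per_gb
    return round(archive_cost + gateway_fee, 2)


if __name__ == "__main__":
    print(monthly_archive_cost(100))
```

This leaves out retrieval and download charges, which is exactly where the bill grows if you restore often.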

 

References:

http://docs.aws.amazon.com/storagegateway/latest/userguide/AWSStorageGatewayMetricsList-common.html

http://docs.aws.amazon.com/storagegateway/latest/userguide/Main_monitoring-gateways-common.html#UsingCloudWatchConsole-common

https://aws.amazon.com/getting-started/projects/replace-tape-with-cloud/services-costs/

https://aws.amazon.com/storagegateway/faqs/

http://docs.aws.amazon.com/storagegateway/latest/userguide/Requirements.html#requirements-host

http://docs.aws.amazon.com/storagegateway/latest/userguide/Main_TapesIssues-vtl.html#creating-recovery-tape-vtl

http://docs.aws.amazon.com/storagegateway/latest/userguide/StorageGatewayConcepts.html#storage-gateway-vtl-concepts