Transferring Bulk Data to the Cloud

It’s starting to happen… media companies are beginning to move some or all of their high-value content to the public cloud. One driver is that the major public cloud vendors are gaining the respect and trust of media and other companies, big and small. Of course, there are reasons to keep some content local, including uncompressed video editing and critical real-time operations. However, some media workflows can leverage the cloud, including long-term archive, high-volume processing, media asset management and consumer streaming.

Let’s consider an example of leveraging the cloud for storage. Say a media enterprise has 100 TB of bulk data that it wants to transfer to a public cloud for archive, with occasional access. There are at least three ways to accomplish the transfer.

One method is to use the public internet; a second is to use a private network; and a third is to load the content onto physical drives and “mail” them to the cloud vendor for import. Each method is in use today, and each has a time/cost tradeoff.

For the first method, using a 1 Gbps link at 80 percent utilization, it would take about 12 days to upload the 100 TB. This is optimistic, since it assumes an ideal, continuous upload rate not throttled by the inevitable network congestion. For a private “internet bypass” network with a 10 Gbps link, the upload time drops to about 1.2 days. Private networks also offer low latency and low packet loss.
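For reference, the arithmetic behind those estimates can be sketched in a few lines of Python. This is a back-of-the-envelope calculation; the 80 percent utilization figure and decimal terabytes are the only assumptions.

    # Back-of-the-envelope upload-time estimate. Assumes a sustained link
    # utilization and ignores protocol overhead, retries and congestion,
    # so real transfers will run longer.
    def upload_days(data_tb, link_gbps, utilization=0.8):
        bits = data_tb * 1e12 * 8                 # decimal TB -> bits
        rate_bps = link_gbps * 1e9 * utilization  # effective bits per second
        return bits / rate_bps / 86400            # seconds -> days

    print(upload_days(100, 1))   # ~11.6 days on a 1 Gbps link
    print(upload_days(100, 10))  # ~1.2 days on a 10 Gbps link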

Fig. 1: Amazon’s 100 Petabyte Snowmobile truck

However, private transports can be costly and usually require a two-stage link. The first stage is the link from the customer datacenter to a “cloud access location” (e.g., 60 Hudson St. in New York City or One Wilshire in Los Angeles). This link is normally some form of private Ethernet from vendors such as AT&T, Comcast, Verizon or Level 3, and may require a long-term contract. The second stage is the link from the access location to the cloud. Examples of this link type are Amazon’s Direct Connect, Microsoft Azure’s ExpressRoute and Google’s Direct Peering. Each has a different fee structure, but most are “pay as you go” at some level up to 10 Gbps.

Once private cloud connectivity is established, it will be used for more than bulk data uploads. It is becoming the trusted method to link the datacenter with a cloud and avoid the performance and reliability issues of the public internet.

A SNOWBALL’S CHANCE

The third method for bulk data transfer to a cloud is physical drive transport (flash and hard disk). The major cloud vendors all support sending actual drives loaded with data to their cloud facility for import; however, this can be troublesome. The customer must specify, purchase, maintain, format, package, ship and track the individual drives.

In October 2015 Amazon introduced “Snowball,” designed to simplify bulk transfer. Snowball is a portable, purpose-built appliance owned by Amazon. It holds 80 TB of internal encrypted storage, includes a 10 Gigabit Ethernet port, is rugged enough to withstand a 6G jolt, and is tamper-proof and weather-resistant.

The Snowball model is a faster, more secure and more efficient way to transport bulk data. Here are the steps to perform a transfer:

  • Use the Amazon self-service page to order the appliance(s); these are shipped to the customer facility. (A scripted version of this step is sketched after this list.)
  • Connect the Snowball to your network and log in to it with the provided credentials.
  • Copy your data onto the Snowball and ship the unit(s) back to Amazon. The unit is ready to ship without any packaging; a shipping label automatically appears on its “E Ink” display.
  • When Amazon receives the unit(s), it loads the data into your S3 storage account.
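As a side note, the ordering step does not have to go through the web console; Snowball jobs can also be created through the AWS job-management API. Below is a minimal sketch using the AWS SDK for Python (boto3); the bucket, role ARN, address ID and other values are placeholders for illustration, not recommendations.

    # Minimal sketch: create a Snowball import job with boto3 instead of the
    # console. Every identifier below (bucket, role, address ID) is a
    # placeholder you would replace with your own account's values.
    import boto3

    snowball = boto3.client("snowball", region_name="us-east-1")

    response = snowball.create_job(
        JobType="IMPORT",                    # data flows from your site into S3
        Resources={"S3Resources": [
            {"BucketArn": "arn:aws:s3:::example-archive-bucket"}
        ]},
        RoleARN="arn:aws:iam::123456789012:role/example-snowball-role",
        AddressId="ADID-example-address-id", # shipping address registered with AWS
        SnowballCapacityPreference="T80",    # the 80 TB appliance
        ShippingOption="SECOND_DAY",
        Description="100 TB media archive import",
    )

    print("Snowball job created:", response["JobId"])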

There is a usage charge of only $250 per job, plus any shipping charges (e.g., FedEx). Users have up to 10 days (starting the day after delivery) to copy data to the appliance and ship it out. There is no cost to load the data from a Snowball into S3 or Glacier storage. Of course, once the data is loaded into storage, there are charges to keep it persistent.
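For a rough sense of scale, the job fee alone works out to a few dollars per terabyte on a fully loaded appliance (shipping and the ongoing storage charges are extra):

    # Effective service fee per terabyte for a fully loaded 80 TB Snowball,
    # excluding shipping and the ongoing S3/Glacier storage charges.
    job_fee_usd = 250
    capacity_tb = 80
    print(f"${job_fee_usd / capacity_tb:.2f} per TB")  # about $3.13 per TB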

Depending on your bulk data size, Snowball may be the right choice compared to uploading over the internet or a private network. However, what if you have petabytes of data to move? In that case, even a fleet of Snowballs may not be practical. Enter the Amazon Snowmobile.

Snowmobile is a new way to move massive volumes of data to the cloud, including video libraries, image repositories or even complete data center migrations. Transferring data with Snowmobile is secure, fast and cost-effective. Users can transfer 100 PB in a few weeks, compared to more than 20 years over a 1 Gbps connection or about 1,250 Snowballs. Fig. 1 shows a Snowmobile at its product release announcement in Las Vegas in November 2016. Internally, it is stuffed with servers, storage drives, IP switching gear and power distribution.
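Those figures are easy to sanity-check with the same kind of arithmetic used earlier (decimal units and 80 TB per Snowball assumed):

    # Rough check on the 100 PB claims above.
    data_tb = 100 * 1000                      # 100 PB expressed in TB

    # Years to push 100 PB through a 1 Gbps link running flat out.
    seconds = data_tb * 1e12 * 8 / 1e9
    print(seconds / (86400 * 365))            # roughly 25 years

    # Number of 80 TB Snowballs needed for the same payload.
    print(data_tb / 80)                       # 1,250 appliances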

This is major league data transfer. The 45-foot-long Snowmobile truck is driven to your site, and Amazon personnel assist in setting up the transfer. The Snowmobile website states: “A fully powered Snowmobile requires ~350 KW. Snowmobile can be connected to available utility power sources at your location if sufficient capacity is available. Otherwise, AWS can dispatch a separate generator set along with the Snowmobile if your site permits such generator use.”

So, there are many options for transferring bulk data to the cloud, from low-data-rate internet connections to the massive Snowmobile. If you face a data migration problem, factor these methods into your decision logic.

The adage “Compute where the data is” makes even more sense in a cloud environment. The more data stored in the cloud, the more related compute and distribution is likely to occur; it’s a beneficial synergy.

If you want to leverage the cloud for business operations, but your data is “stuck at home,” consider doing a bulk transfer. One added benefit of cloud storage is its long-term durability. The cloud vendor is responsible for managing the headache of replacing aging storage hardware. It’s not your worry.

Al Kovalick is the founder of Media Systems consulting in Silicon Valley and author of “Video Systems in an IT Environment (2nd ed).” He is a frequent speaker at industry events and a SMPTE Fellow. For a complete bio and contact information, visit www.theAVITbook.com.
