What does it take
to make the cloud invisible?
Put another way,
in the context of an end
user running a media-focused app,
what parameters create
an environment such that the user cannot
discern if the app is running locally or in a
remote cloud? For example, using a video
editor app and doing the classic jog/shuttle
function across the timeline, can a user tell
by the “feel of the app” whether the runtime
code is local or remote? If the app feels
local in all aspects, then the cloud is invisible
to the end user. For SaaS apps in particular
it’s good to aim for this goal; users will demand it.
For sure, it’s not easy to create an invisible
cloud environment. There are many aspects
of Quality of Service that determine
the user experience. In part one
of this series, the concepts of cloud
access and QoS across the Internet transport
chain were examined. In this column, compute,
storage, and reliability are examined;
networking is out of scope for this discussion.
Fig. 1 outlines the four domains of focus
and loosely models the Infrastructure-as-a-Service (IaaS) cloud architecture. Let’s
look at each of these four areas:
CLOUD COMPUTE QOS
In cloud-speak, a CPU “Instance” provides
a “predictable” amount of dedicated
compute capacity. It may be charged per
hour consumed or by some other means.
The Instance comprises one or more
virtual CPU cores, some DRAM, some local
hard disk storage, and I/O. Instances vary from
small to massive, with the largest offering ~100x
the power of a small Instance. Machine Images
(MIs) are preconfigured with an operating
system: Linux, Windows, or another. These
run on CPU Instances.
Instance QoS determines the expected
running performance. The two most valued
metrics are benchmark results and jitter
(execution time uncertainty). Here is an example
of compute QoS for
Benchmark A: a 10 GB
file transcode from MPEG-2
to H.264. The average execution
time running on Instance
Type X is 12 minutes.
Repeated tests running
Benchmark A over different
days and hours yield a minimum
run time of 6 minutes
and a maximum run time of
18 minutes, with a 95 percent
confidence level. The 12-minute difference
is the jitter uncertainty of the test.
Ideally, the jitter would be nearly zero
and it would be so if the instance were totally
dedicated to your execution needs. In
reality, the instance is sharing the CPU hardware
with other users in a virtualized environment.
Plus, the usage loading will change
day to day and hour to hour. So the fastest
execution time may fall during a weekend
night and the slowest time likely on Monday
morning, for example.
You are probably thinking, “This is a
problem.” True, it’s not ideal but rather a
tradeoff in a shared hardware environment.
Planning for worst-case times (18 minutes, not 6),
or adding more horsepower (faster
instances) or paying for dedicated resources
are practical ways to set acceptable limits
on the execution uncertainty. Users are advised
to run their own benchmarks since
cloud vendors rarely provide them. Without
paying for dedicated resources, strict determinism
is not easy to achieve in a cloud environment.
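For readers who want to run their own compute benchmarks, here is a minimal Python sketch of the idea: time the same job several times and report the minimum, maximum, mean, and jitter. The function names `summarize` and `benchmark` are illustrative, not part of any vendor toolkit, and jitter is taken here as the simple max-minus-min spread rather than a formal 95 percent confidence interval.

```python
import statistics
import time
from typing import Callable, Dict, List


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of run times to min, max, mean, and jitter (max - min)."""
    fastest = min(durations)
    slowest = max(durations)
    return {
        "min": fastest,
        "max": slowest,
        "mean": statistics.mean(durations),
        "jitter": slowest - fastest,
    }


def benchmark(job: Callable[[], None], runs: int = 5) -> Dict[str, float]:
    """Run `job` repeatedly, timing each run with a monotonic clock."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return summarize(durations)


if __name__ == "__main__":
    # Hypothetical run times in minutes, chosen to match the Benchmark A figures.
    print(summarize([6.0, 12.0, 18.0]))
```

To be meaningful, the runs should be spread over different days and hours, since the loading on shared hardware changes with time. A handful of back-to-back runs will likely understate the real jitter.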
CLOUD STORAGE QOS
Cloud storage is a bit of an elephant.
Why? There are so many different types and
uses of cloud storage and each has associated
QoS metrics. First, there is the storage
coupled to each Instance. Let’s skip this type
since it is Instance bound. Other types are
persistent (they live on after an Instance is closed)
and include:
• Block Storage. Provides block level storage
volumes for use with Instances. The
Instance can interact with the volume
just as it would with a local drive, formatting
it with a file system and installing applications.
• Object Storage. Read and write “unlimited” data
objects in a repository. Each stored object
is retrieved via a unique key value.
These stores can be accessed for general
file storage. Apps such as Dropbox and
Google Drive use cloud-based object storage.
OpenStack’s Swift Store is an example.
This type makes a good extension for
second tier media facility storage.
• Archive Storage. Typically, for infrequently
accessed data with retrieval times longer
than for Object Storage. Higher access
latency is traded off for less expensive
storage. Amazon’s Glacier product is an
example. This type is appropriate for offsite
archiving with excellent durability.
The QoS metrics of interest are (1) R/W
transfer rate, (2) latency for a single transaction
and (3) I/O operations per second
(IOPS). Each storage type will have different
metrics. Obtaining QoS metrics from a cloud
provider may be a challenge. Some vendors
spec IOPS and transfer rates for select levels
of storage while being silent on others.
As with CPU metrics, benchmarking is a
solid method to get a handle on storage QoS.
Try then buy. Cloud providers will hard-spec
some storage products but not all. So, users
should benchmark the rest themselves.
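As a rough illustration of a do-it-yourself storage benchmark, the Python sketch below writes a series of small blocks to a mounted volume and reports the three metrics named above. It measures writes only, and the function name `measure_write_qos`, the probe file name, and the block sizes are all assumptions made for this example. Operating system caching and network effects can skew the numbers, so treat the result as a first look and not a formal spec.

```python
import os
import tempfile
import time
from typing import Dict


def measure_write_qos(directory: str, block_size: int = 4096,
                      blocks: int = 256) -> Dict[str, float]:
    """Write `blocks` synced blocks to `directory` and report simple metrics."""
    payload = os.urandom(block_size)
    path = os.path.join(directory, "qos_probe.bin")  # hypothetical probe file
    latencies = []
    start = time.perf_counter()
    with open(path, "wb") as probe:
        for _ in range(blocks):
            t0 = time.perf_counter()
            probe.write(payload)
            probe.flush()
            os.fsync(probe.fileno())  # push the block to the storage device
            latencies.append(time.perf_counter() - t0)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    os.remove(path)
    return {
        "write_mb_per_s": block_size * blocks / elapsed / 1e6,
        "mean_latency_ms": 1000 * sum(latencies) / len(latencies),
        "write_iops": blocks / elapsed,
    }


if __name__ == "__main__":
    # A temporary directory stands in for the mounted cloud volume under test.
    with tempfile.TemporaryDirectory() as volume:
        print(measure_write_qos(volume))
```

Pointing `directory` at a mounted block storage volume and varying `block_size` gives a feel for how transfer rate and IOPS trade off. A dedicated tool such as fio will give more rigorous results.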
CLOUD RELIABILITY QOS
Compute and networking system reliability
is often measured by uptime percentage.
For example, a compute Instance’s uptime
may be quoted as 99.95 percent. This means
the Instance may be down for about 4.4
hours per year. This is on par with, or better than,
the reliability of an in-house data center.
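The downtime arithmetic is easy to check. The short Python sketch below converts an uptime percentage into allowed downtime per year, assuming a 365-day year. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def annual_downtime_hours(uptime_percent: float) -> float:
    """Return the hours per year a service may be down at a given uptime."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 99.95 percent uptime works out to roughly 4.4 hours of downtime a year.
    print(round(annual_downtime_hours(99.95), 2))
```

The same function shows how quickly each added “nine” shrinks the downtime budget: 99.99 percent uptime allows under an hour per year.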
Storage reliability is measured by availability
percentage and durability. Availability is similar
to uptime. Durability is the integrity of
the stored data. At least one cloud vendor offers
object storage with 99.999999999 percent
durability. It’s very difficult to achieve
this level of durability using in-house means.
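To put eleven nines in perspective, here is a small Python sketch. It assumes the durability figure applies per object per year, which is an assumption of this example and not something stated above. Exact fractions are used because ordinary floating-point math loses precision this close to 100 percent.

```python
from fractions import Fraction


def expected_annual_losses(object_count: int, durability_percent: str) -> Fraction:
    """Expected objects lost per year, assuming per-object annual durability."""
    loss_probability = 1 - Fraction(durability_percent) / 100
    return object_count * loss_probability


if __name__ == "__main__":
    losses = expected_annual_losses(10_000_000, "99.999999999")
    # Prints 1/10000: on average one lost object every 10,000 years.
    print(losses)
```

Under that assumption, a store holding ten million objects would expect to lose one object about every 10,000 years.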
Beyond native reliability, good design
practices may be applied to build systems
with ultra-high reliability and durability.
Methods including load balancing, mirrored
components, dual linking, error correction,
and geographic diversity are in the reliability
arsenal of good design practices in the cloud.
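A quick Python sketch shows why mirrored components help so much. It uses the standard parallel-availability formula and assumes the copies fail independently and that failover is instant and flawless. Both are simplifying assumptions for this example, and real systems will fall somewhat short of the result.

```python
def mirrored_availability(single_availability: float, copies: int) -> float:
    """Availability of `copies` independent mirrors, each with the same availability.

    The system is down only when every copy is down at the same time.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single_availability) ** copies


if __name__ == "__main__":
    # Two mirrored Instances, each quoted at 99.95 percent uptime.
    combined = mirrored_availability(0.9995, 2)
    print(f"{combined * 100:.6f} percent")
```

Under these assumptions, two mirrored 99.95 percent Instances reach about 99.999975 percent, or roughly 8 seconds of downtime per year. Geographic diversity is what makes the independence assumption more believable.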
Understanding QoS is vital for successful
use of the cloud. Maintaining target QoS
metrics will result in services and apps that
meet your business needs.
Al Kovalick is the founder of Media Systems
Consulting in Silicon Valley. He is the
author of “Video Systems in an IT Environment
(2nd ed).” He is a frequent speaker at
industry events and a SMPTE Fellow. You
can reach Al via TV Technology.