Latest from TV Technology in Cloud-processing

CPU/GPU Architectures for AI in the Cloud

karl@ivideoserver.tv (Karl Paulsen) — Thu, 03 Oct 2024 15:15:22 +0000

In this installment we will investigate how a cloud processing system is structured to address and react to artificial intelligence (AI) technologies, and how a GPU (graphics processing unit) and a CPU (central processing unit) compare architecturally and application-wise. It is important to know that GPU servers are generally employed more frequently than CPU servers in AI cloud environments, particularly for tasks related to machine learning and deep learning.

There is comparatively little parallel processing done in typical CPU operation vs. GPU operation for AI processes. A CPU-based cloud’s principal function is arithmetic and computational work for databases or ordered processes such as those in human resources, pharmaceutical or financial functions. The general-purpose compute or storage-based cloud is composed of a great many servers built primarily on CPU-type architectures, with a modest number of cores (somewhere around 4 to 64 per processor) and many general-purpose I/O (input/output) interfaces connected into the cloud network. These CPU devices are designed mainly for single-thread operations.

The primary reason GPUs are preferred for AI is that CPUs are not well-adapted for highly repetitive operations that require continual incremental changes in the core systems, such as deep learning, machine learning (ML), large language models (LLMs) or other applications aimed at AI.

While CPUs are versatile and essential for many tasks, the GPU is much more efficient for deep learning and is used in almost all multithreaded AI workflows, as will be exemplified throughout this article.

Graphics Processing Units
GPUs (the more familiar term for “graphics processing units”) are designed with thousands of cores whose primary purpose is to perform many calculations simultaneously. Functionally, this makes the GPU-based compute platform ideal for the highly parallel nature of deep learning tasks, where large matrices of data need to be processed quickly.

GPUs, when not specifically employed in AI practices, are often found in graphics functions, usually on graphics cards. The GPU is integral to modern gaming, enabling higher-quality visuals and smoother gameplay. The same goes for certain (gaming) laptops and tablet devices, where applications and performance vary widely.

A GPU is composed of many smaller and more specialized cores vs. the CPU. By working together, the GPU cores deliver massive performance when a processing task can be divided up across many cores at the same time (i.e., in parallel). This functionality is typical of graphical operations such as shading or polygon processing and replication or real-time rendering (see Fig. 1, above).
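The divide-and-process idea can be sketched in a few lines of Python. This is only an illustration of the decomposition, not GPU code: the `shade` function, the gain value and the four-worker pool are assumptions made for the example, and Python threads stand in for what would be thousands of hardware cores on a real GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(pixel: int, gain: float = 1.5) -> int:
    """Brighten one 8-bit pixel value, clamping the result to 255."""
    return min(255, int(pixel * gain))


def shade_chunk(chunk: list[int]) -> list[int]:
    """Apply the same operation to every pixel in one chunk."""
    return [shade(p) for p in chunk]


def shade_parallel(pixels: list[int], workers: int = 4) -> list[int]:
    """Split the frame into chunks and hand each chunk to a worker."""
    size = max(1, -(-len(pixels) // workers))  # ceiling division
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(shade_chunk, chunks)  # map() preserves chunk order
        return [p for chunk in results for p in chunk]


frame = [0, 40, 100, 200, 255, 12, 90, 170]  # hypothetical pixel values
print(shade_parallel(frame))
```

Because each pixel is computed independently of its neighbors, the work can be split into any number of chunks without changing the result, which is the property that lets a GPU spread the same task across all of its cores.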

Inference
One routine part of AI is its ability to make predictions, aka “inference,” which in AI means “when a trained model is used to make predictions.” The GPU is often preferred when the application requires low latency and high throughput. Real-time image recognition and natural language processing are the more common applications where GPUs are used in the cloud. The diagram in Fig. 2 depicts where inference fits in such an AI workflow.

Fig. 2: Workflow diagram for general AI applications (Image credit: Karl Paulsen)

In certain cases, CPUs may still be used for inference, as when power efficiency is more critical or when the models are not as complex. How systems in the cloud apply the solution is sometimes automatic and sometimes driven by the coding solution as defined by the user.

Deep Learning
A method in AI that teaches computers to process data in a way that is inspired by the human brain is called “deep learning.” Such models can recognize complex patterns in pictures, text, sounds and other data to produce accurate insights and predictions. Training deep-learning models requires processing vast amounts of data and adjusting millions (or even billions) of parameters, often in parallel. This capability for parallel processing is a key element in the architecture of GPUs, which in turn allows them to handle such AI tasks much faster and more efficiently than CPUs.

Although CPUs can be employed in training models, the process is significantly slower, making them less practical for training large-scale or deep-learning models.
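To make “adjusting parameters” concrete, here is a minimal sketch of a training loop in plain Python. It fits only two parameters to five made-up data points using gradient descent; the function name, learning rate and data are assumptions chosen for illustration, and a real deep-learning model would repeat the same pattern over millions or billions of parameters.

```python
def train_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Each per-sample error is independent of the others, so this
        # step is the part a GPU would compute for all samples at once.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Nudge each parameter a small step against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = train_linear(xs, ys)
print(round(w, 2), round(b, 2))
```

The loop runs the same arithmetic thousands of times with small incremental changes, which is why a serial processor takes so much longer than hardware that can evaluate all samples and parameters side by side.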

Companies like NVIDIA have developed GPUs specifically for AI workloads, such as their Tesla and A100 series, which are optimized for both training and inference. According to NVIDIA, the A100 Tensor Core GPU provides up to 20 times higher performance than the prior generation and can be partitioned into as many as seven GPU instances to dynamically adjust to shifting demands.

As mentioned earlier, general-purpose CPUs are not as specialized for AI, and their performance in these tasks often lags that of GPUs.

Ecosystem Support
The AI software ecosystem, including frameworks like TensorFlow and PyTorch and NVIDIA’s CUDA platform, is heavily optimized for GPU acceleration, making it easier to achieve performance gains; as such, GPUs will be heavily deployed in cloud-centric implementations. In smaller-scale applications these frameworks can run on CPUs, but they don’t usually offer the same level of performance as when running on GPUs.
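In practice, code written against these frameworks commonly checks for a GPU at startup and falls back to the CPU when none is present. The sketch below assumes PyTorch as the framework; `pick_device` is a hypothetical helper name, and the fallback for a missing PyTorch install is added only so the example runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency, assumed for this example
    except ImportError:
        return "cpu"  # no framework installed, so stay on the CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The same model code then runs on either processor type; only the speed differs.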

Cost Efficiency and Growth
Despite their higher cost per unit compared to CPUs, GPUs can be more cost-effective for AI workloads since they can process tasks more quickly, which can lead to lower overall costs, especially in large-scale AI operations. For some specific AI tasks or smaller-scale projects, CPUs may still be cost-effective, but in large-scale AI applications they generally offer lower performance for the same cost.
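The arithmetic behind that claim is simple: what matters is the hourly rate multiplied by the hours needed to finish the job. The rates and run times below are invented purely for illustration and are not quotes from any cloud provider.

```python
def job_cost(hourly_rate: float, hours: float) -> float:
    """Total cost of one job: price per hour times hours to finish."""
    return hourly_rate * hours


def breakeven_speedup(gpu_rate: float, cpu_rate: float) -> float:
    """Speedup the GPU must deliver before it becomes the cheaper option."""
    return gpu_rate / cpu_rate


# Hypothetical figures: the GPU server costs six times more per hour
# but finishes the same training job 20 times faster.
gpu_total = job_cost(hourly_rate=3.00, hours=10)
cpu_total = job_cost(hourly_rate=0.50, hours=200)
print(gpu_total, cpu_total, breakeven_speedup(3.00, 0.50))
```

With these assumed numbers the GPU run costs 30 against 100 for the CPU run. Any speedup above the break-even factor of 6 favors the GPU, and anything below it favors the CPU, which is the situation for lightweight or poorly parallelized work.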

As AI and ML applications grow, demand for GPU servers has also increased, particularly in AI-focused cloud services provided by companies like Google Cloud, AWS and Azure. Other vendors, such as Oracle, tailor their cloud solutions toward data processing for business operations.

Lightweight models, whose tasks are less computationally intensive, routinely run on simpler CPU-based solutions. Besides such lightweight implementations, often deployed “on-prem” or in a local datacenter, general-purpose computing may employ CPUs, which are essential for tasks that require versatility or are not easily parallelized.

A general-purpose computer is one that, given the application and required time, should be able to perform the most common computing tasks. Desktops, notebooks, smartphones and tablets are all examples of general-purpose computers. This is generally NOT how a cloud is engineered, structured or utilized.

Costs and Performance Ratios
In scenarios where budget is a concern and the workload doesn’t require the power of GPUs, CPUs can be a practical choice. However, in a similar way to Moore’s Law, the price-to-performance ratio for GPUs keeps improving.

Per epochai.org: “… of [some] 470 models of graphics processing units (GPUs) released between 2006 and 2021, the amount of floating-point operations/second per $ (hereafter FLOP/s per $) has doubled every ~2.5 years.” So stand by … we may see the cost of GPUs vs. CPUs changing which kinds of processors are deployed into which kinds of compute devices sooner rather than later.
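To put the quoted doubling time in perspective, the growth over any period is 2 raised to the power of (years / 2.5). The short sketch below works that out; the function name is hypothetical, and the calculation assumes the doubling rate holds steady across the whole period.

```python
def growth_factor(years: float, doubling_time: float = 2.5) -> float:
    """Multiplier in FLOP/s per $ after `years` at a fixed doubling time."""
    return 2.0 ** (years / doubling_time)


# 2006 to 2021 is 15 years, or six doublings at ~2.5 years each.
print(growth_factor(15))  # 64.0
print(growth_factor(5))   # 4.0
```

In other words, under that trend a dollar spent on GPU compute in 2021 bought roughly 64 times the floating-point throughput it did in 2006.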

While CPU servers are still widely used in AI clouds for certain “compute-centric” single thread tasks, GPU servers are more commonly employed for the most demanding AI workloads, especially those involving deep learning, due to their superior parallel processing capabilities and efficiency. Given what we hear and read on almost every form of media today, AI will absolutely be impacting what we do in the future and where we do it as well.

Media Processing in the Cloud? The Big Problem On Everyone’s Mind

Ken Haren — Wed, 06 Feb 2019 20:45:30 +0000

Over the last few months, the narrative elements surrounding the needs, concerns and technology plans of media companies have started to converge into common themes. One very clear message is that media companies are struggling with the constant growth in the number of outlets they are required to deliver content to.

While, on the face of it, this growth is a good thing, a very real issue is that each new outlet requires that some portion (often a very large portion) of their back catalog be repurposed in some way to make it suitable for the outlet in question. More and more of an organization’s technology investment is directed at creating a media supply chain that offers greater flexibility in throughput and efficiencies in processing. This doesn’t always directly refer to pricing; the efficiency may present itself in a reduced time to market in provisioning a new outlet for use. For many companies, though not all, the logical solution is to move the media supply chain up into “the cloud” and make use of the well-understood advantages that such architectures promise.

This migration has already taken place for many organizations, and others are in process. This is particularly true for those companies who obtain most, if not all, of their program masters from external production companies. In these scenarios, the original masters are already located “in the cloud,” so the ingest operations and the master storage are actually already in the cloud. Now they need to have the rest of the content supply chain follow the media.

After spending years looking into the relative pros and cons of several supply chain scenarios, we find that there is one simple truth: for any viable solution, you want to put the processing where the media is. If a company’s source material and delivery destination are in the cloud, then it doesn’t make any sense to process the material anywhere but in the cloud. The scalability of cloud processing, along with the idea of consumption pricing for the infrastructure and processing, makes this a “no-brainer” decision, especially in the mind of the CFO, for whom consumption pricing and the ability to move the expense over to the op-ex budget are like catnip.

But there is a subtlety to be considered here: if you need to perform some of your media processing “on the ground,” the cost to download the media back to an on-prem system, or even over to some other media company (the so-called “egress charges”), can be substantial. Don’t forget that we are often talking about high-bandwidth mezzanine files, which preclude the use of significant compression on the material. The cost of transfer back to the facility has to be factored into the overall solution’s cost/benefit equation. Again, to be clear, you need to have the processing where the media is.
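A back-of-the-envelope estimate shows how quickly egress adds up. Every figure in this sketch is an assumption made for illustration: the per-gigabyte rate, the file size and the title count are not taken from any provider’s price list or any real catalog.

```python
def egress_cost(titles: int, gb_per_title: float, rate_per_gb: float) -> float:
    """Estimated charge for downloading a batch of masters from the cloud."""
    return titles * gb_per_title * rate_per_gb


# Hypothetical batch: 500 mezzanine masters of 100 GB each,
# at an assumed egress rate of 0.08 (currency units) per GB.
cost = egress_cost(titles=500, gb_per_title=100, rate_per_gb=0.08)
print(f"{cost:,.0f}")
```

That charge applies before any processing has been done, and it recurs every time the same masters are pulled down again for another outlet.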

It's NOT just about transcoding

While it is true that transcoding/packaging of the media is a substantial part of the process of prepping for a new outlet, it is far from the only issue. In any on-premise workflow, graphics are added/removed/altered, dialog is replaced, legislative advisories are added/removed/replaced, promos are inserted, branding snipes (animated graphic elements) are added, and much more. In many cases, these additions and alterations are performed by an automated “bag and tag” edit function running largely autonomously.

What media companies really need is the ability to have all of the tools that they use on-premise to create a property available to them in the cloud—including the workflow automation engine that ties all of these processes together into an efficient, cohesive whole. Transcoding alone simply does not suffice. For example, more and more outlets require IMF packages as the mechanism of delivery. These are not simply transcoded copies of the original master (which may have been made many years ago), but program segments that require significant processing in order to create the multiple components that make up an IMF deliverable. It’s not just transcoding!

It's also not just an application running on a virtual machine

The simplest approach to “cloudifying” a processing solution for many customers is simply to spin up a number of virtual machines on some cloud platform and install instances of the monolithic application that they’ve been using on-prem on those machines. While it is indeed a simple way to get started in a cloud-based solution, it fundamentally forfeits two of the most favorable aspects of cloud-based compute: on-the-fly scalability and pay-as-you-go pricing.

Such an approach only offers the same method of scalability as the on-prem solution: purchase of sufficient permanent licenses to cover your greatest throughput needs. That is just not tenable in any real-world scenario. An intelligent processing platform should ideally be based on a microservices architecture, so that the individual actions in any workflow can be scaled through standard cloud management means (of course, the automation engine must also be “cloud aware” for this to be achievable).
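As a rough sketch of what per-action scaling means, the snippet below sizes each service from its own queue rather than sizing one monolithic application for the peak. The service names, the jobs-per-replica figure and the replica limits are all hypothetical, and a real deployment would delegate this calculation to the cloud platform’s own autoscaling tools.

```python
import math


def desired_replicas(queued_jobs: int, jobs_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 50) -> int:
    """Replicas needed to drain one service's queue, within set limits."""
    wanted = math.ceil(queued_jobs / jobs_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical queue depths for three independent workflow actions.
queue_depth = {"transcode": 120, "caption": 3, "package": 0}
plan = {service: desired_replicas(jobs, jobs_per_replica=10)
        for service, jobs in queue_depth.items()}
print(plan)
```

Here the busy transcode service grows to 12 replicas while the idle packaging service drops to zero, so the bill follows the work actually queued rather than the licenses purchased.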

Do you still need on-premise media processing?

Many discussions on cloud-based media processing seem to be making the point that the only solution moving forward is a 100 percent cloud-based architecture. This is simply not the case. There are a number of scenarios where cloud-based processing—and particularly processing hosted by a public cloud provider—is not preferable or even feasible. There are data ownership provisos in many source agreements that prevent material from being housed on a public cloud platform. There are also some scenarios in which an on-prem platform can actually be more cost effective than a cloud approach—mainly those where the “run rate” business is well known, and there is less need for “bursts” of processing.

For many media organizations, the solution is to go for a hybrid approach—cover the run-rate business, or a significant portion of it, with on-prem processing—but with a “safety blanket” capability to process in the cloud where it makes most sense (process where the source material is), or when the company has a burst of work which cannot be fulfilled with on-prem processing within some pre-determined time constraint.
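The routing rule described here can be written down in a few lines. This is a simplified sketch under stated assumptions: the function and parameter names are hypothetical, times are in hours, and a real automation engine would also weigh the contractual and cost constraints discussed earlier.

```python
def choose_site(source_in_cloud: bool, onprem_backlog_hours: float,
                job_hours: float, deadline_hours: float) -> str:
    """Pick where to run a job under the hybrid policy."""
    # Process where the source material is.
    if source_in_cloud:
        return "cloud"
    # Burst to the cloud when on-prem cannot finish within the deadline.
    if onprem_backlog_hours + job_hours > deadline_hours:
        return "cloud"
    # Otherwise the run-rate business stays on the ground.
    return "on-prem"


print(choose_site(False, onprem_backlog_hours=2, job_hours=3, deadline_hours=8))
print(choose_site(False, onprem_backlog_hours=7, job_hours=3, deadline_hours=8))
print(choose_site(True, onprem_backlog_hours=0, job_hours=1, deadline_hours=8))
```

The first job fits the on-prem schedule and stays there, the second would miss its deadline and bursts to the cloud, and the third runs in the cloud simply because that is where its master already lives.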

Many CFOs find this approach to be attractive too, as it makes pretty strong financial sense. Indeed, this is the approach that other industries have adopted as they’ve made the transition to cloud-based processing. The secret here, though, is to ensure that both on-prem and cloud-based workflows can offer all of the same capabilities with no exceptions and hopefully with the same interface. Once again, it’s not only about transcoding. ALL of the processing steps and options need to be available in both scenarios for this approach to be successful.

Solutions where you want them, when you need them

The choice of where the media supply chain should be located is, as previously stated, largely predicated on the location of the media to be processed. This will naturally vary from company to company—and may actually vary within an individual company based on the details of source masters and the company’s strategic and tactical goals for present and future operations. I believe that the hybrid approach is the one that will make sense to many organizations and I would encourage companies to seek solutions with this level of flexibility as they consider the challenges that lie ahead.