As I explained in a previous blog post, GPUs have massively accelerated the evolution of artificial intelligence.
However, building a GPU server is not that easy, and failing to create an appropriate infrastructure can hurt training time. If you use GPUs, you should know that there are two ways to connect them to the motherboard so that they can communicate with the other components (network, CPU, storage devices): PCI Express (PCIe) and SXM2. We will talk about SXM2 in the future. Today, we will focus on PCI Express, because its performance depends strongly on the choice of adjacent hardware such as the PCIe bus or the CPU.
This is a major element to consider in deep learning: any time the GPUs spend waiting for data to load is wasted compute time, so the bandwidth between the GPUs and the other components is a key bottleneck in most deep learning training contexts.
PCIe lanes are used to communicate between PCIe devices, or between a PCIe device and the CPU. A lane is composed of two differential signalling pairs: one for inbound traffic and one for outbound traffic. Both directions have the same bandwidth, so a lane can send and receive at the same time. Lane communications are similar to network Layer 1 communications – it’s all about transferring bits as fast as possible through electrical wires! However, the technique used for a PCIe link is a bit different, as the link is composed of N lanes (written xN). In our previous example N=16, but it could be any power of 2 from 1 to 16 (1/2/4/8/16).
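To give a feel for what the lane count means in practice, here is a rough sketch that estimates the bandwidth of an xN link in one direction. The signalling rates and line encodings are the commonly published figures for PCIe generations 1 to 4. The function name `link_bandwidth_gb_s` is made up for this post, and the result ignores packet and protocol overhead, so real-world throughput will be somewhat lower.

```python
# Raw signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
# Gen1/Gen2 use 8b/10b encoding, Gen3/Gen4 use 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}

# Link widths discussed in this post.
VALID_WIDTHS = (1, 2, 4, 8, 16)


def link_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Approximate bandwidth of an xN link, per direction, in GB/s.

    Hypothetical helper: it only accounts for line encoding, not for
    packet headers or flow control.
    """
    if generation not in PCIE_GENERATIONS:
        raise ValueError(f"unknown PCIe generation: {generation}")
    if lanes not in VALID_WIDTHS:
        raise ValueError(f"unsupported lane count: {lanes}")
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # One transfer carries one bit per lane; divide by 8 to get bytes.
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    for lanes in VALID_WIDTHS:
        bandwidth = link_bandwidth_gb_s(3, lanes)
        print(f"Gen3 x{lanes}: {bandwidth:.2f} GB/s per direction")
```

Since the bandwidth scales linearly with the number of lanes, a Gen3 x8 link carries roughly half of what a Gen3 x16 link can.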
The Physical Layer (PL) is responsible for negotiating with the other device the terms and conditions for receiving the raw packets (PLPs, for Physical Layer Packets), i.e. the lane width and the frequency. You should be aware that the link will only use as many lanes as the device with fewer lanes supports. This is why choosing the appropriate CPU is so important. CPUs can only manage a limited number of lanes, so pairing a nice GPU that has 16 PCIe lanes with a CPU that offers only 8 is as efficient as throwing away half your money because it doesn’t fit in your wallet.
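The outcome of that negotiation can be modelled in a few lines. This is a deliberately simplified sketch, not the actual link training procedure: it assumes both sides simply settle on the highest width and generation they have in common. The `PciePort` class and the `negotiate` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PciePort:
    """Hypothetical description of one end of a PCIe link."""
    name: str
    max_lanes: int       # e.g. 16 for an x16 GPU
    max_generation: int  # e.g. 3 for PCIe Gen3


def negotiate(a: PciePort, b: PciePort) -> tuple[int, int]:
    """Return the (lanes, generation) both ends can support.

    Simplifying assumption: the link settles on the smaller value of
    each capability. Real link training involves more steps.
    """
    lanes = min(a.max_lanes, b.max_lanes)
    generation = min(a.max_generation, b.max_generation)
    return lanes, generation


if __name__ == "__main__":
    gpu = PciePort("GPU", max_lanes=16, max_generation=3)
    cpu_slot = PciePort("CPU slot", max_lanes=8, max_generation=3)

    lanes, generation = negotiate(gpu, cpu_slot)
    used = lanes / gpu.max_lanes
    print(f"Negotiated link: Gen{generation} x{lanes} "
          f"({used:.0%} of the GPU's lanes in use)")
```

On a Linux machine, `lspci -vv` reports the advertised capability (`LnkCap`) and the negotiated state (`LnkSta`) of each device, which is a quick way to check whether a GPU is really running at the width you paid for.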