This paper presents PyTorch-Direct, a GPU-centric data access paradigm with a novel circular shift indexing optimization for GNN training, reducing training time, CPU utilization, and power consumption. #PyTorch #DeepLearning
PyTorch-Direct introduces a new class of tensor called the "unified tensor." While a unified tensor resides in host memory, its elements can be accessed directly by the GPUs, as if they resided in GPU memory.
To support seamless migration of applications from the original PyTorch to PyTorch-Direct, the authors present a programming interface for declaring unified tensors that is consistent with the existing PyTorch GPU tensor declaration mechanism.
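The irregular host-memory access pattern that unified tensors target can be sketched as follows. This is a minimal illustration, not PyTorch-Direct's actual API: numpy stands in for the tensor library, and `baseline_gather` is a hypothetical name for the conventional gather-then-copy step.

```python
import numpy as np

# Host-resident node feature table, accessed irregularly during
# GNN mini-batch sampling.
features = np.random.rand(10_000, 64).astype(np.float32)

def baseline_gather(feature_table, node_ids):
    """Baseline PyTorch approach: the CPU gathers the sampled rows
    into a contiguous staging buffer, which is then copied to the
    GPU as a whole."""
    return feature_table[node_ids]  # CPU gather (+ host-to-device copy)

# With a unified tensor, the GPU would instead dereference the sampled
# rows of feature_table directly over the interconnect, eliminating the
# CPU-side gather and the staging copy sketched above.
node_ids = np.random.randint(0, 10_000, size=256)
batch = baseline_gather(features, node_ids)
assert batch.shape == (256, 64)
```

The sketch shows why the CPU is on the critical path in the baseline: every sampled mini-batch pays for a host-side gather before any data reaches the GPU.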
"With PyTorch-Direct, the time spent for accessing irregular data structures in host memory is reduced on average by 47.1% compared to the baseline PyTorch approach."
PyTorch-Direct can also speed up end-to-end GNN training by up to 1.62x, depending on the GNN architecture and input graph. Furthermore, by reducing the CPU workload, PyTorch-Direct cuts system power consumption during GNN training by 12.4% to 17.5%.
Researchers have proposed a new class of accelerators named Self-Adaptive Reconfigurable Arrays (SARA), which comprise both a reconfigurable array and a hardware unit capable of determining an optimized configuration for the array at runtime.
They also propose a neural network called ADAPTNET, which recommends an array configuration and dataflow for the current layer's parameters.
An integrated custom hardware unit, ADAPTNETX, runs ADAPTNET at runtime and reconfigures the array, making the entire accelerator self-sufficient.
The SARA accelerator implementation (SAGAR) is capable of providing the same mapping flexibility as a collection of 1024 4×4 arrays working as a distributed system while achieving 3.5x more power efficiency and 3.2x higher compute density than the baseline.
Here is the first list of AMD patents in 2021. Along with this list, I am preparing some articles (yes, there will be several) to give the set presented here a proper analysis before the end of this month. Stay tuned!
Follow the thread! 🦊
Patent: GPU cache management based on locality type detection - AMD
Finally, after so many setbacks, here's a new list of AMD patents, bringing AMD's latest developments in CPU, GPU, package and more. More details will come soon in the next articles. (3/4 - 4/4)
Patent: Activation Function Functional Block for Electronic Devices - AMD
Patent: System and method for scheduling instructions in a multithread simd architecture with a fixed number of registers - AMD