Multi-pass distributed data shuffle
Published: 14-09-2022
Author(s): Hossein Ahmadi, Mohsen VAKILIAN
Assignee: Google LLC
Abstract: A system and method for repartitioning data in a distributed network. The method may include executing, by one or more processors (210), a first pass of a data set from a plurality of first sources (400) to a plurality of first sinks (410), each first sink collecting data from one or more of the first sources, and executing, by the one or more processors, a second pass of the data set from a plurality of second sources (420) to a plurality of second sinks (430), each one of the plurality of first sinks corresponding to one of the plurality of second sources, and each second sink collecting data from one or more of the second sources. Executing the first and second passes causes the data set to be repartitioned such that one or more second sinks collect data that originated from two or more of the first sources.
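The abstract does not say how records are routed in each pass, so the following is only a minimal in-memory sketch of one possible two-pass scheme, assuming a hypothetical two-level grouping: the final sinks are split into `num_groups` groups of `group_size` sinks each. In the first pass a record goes to the first sink that owns its group. In the second pass each first sink acts as a second source and delivers records only to final sinks within its own group. The names `target_sink`, `first_pass`, `second_pass`, `multi_pass_shuffle`, `num_groups` and `group_size`, as well as the CRC32-based partitioning, are illustrative assumptions and do not come from the patent.

```python
import zlib
from typing import Hashable, List, Tuple

Record = Tuple[Hashable, object]  # (key, value)


def target_sink(key: Hashable, num_final_sinks: int) -> int:
    """Hypothetical final partitioning: a stable CRC32 hash of the key."""
    return zlib.crc32(str(key).encode()) % num_final_sinks


def first_pass(first_sources: List[List[Record]], num_groups: int,
               group_size: int) -> List[List[Record]]:
    """Route each record to the first sink that owns its group of final sinks.

    Each first sink may collect data from one or more first sources, and each
    first source fans out to at most num_groups destinations.
    """
    num_final = num_groups * group_size
    first_sinks: List[List[Record]] = [[] for _ in range(num_groups)]
    for source in first_sources:
        for key, value in source:
            group = target_sink(key, num_final) // group_size
            first_sinks[group].append((key, value))
    return first_sinks


def second_pass(second_sources: List[List[Record]], num_groups: int,
                group_size: int) -> List[List[Record]]:
    """Each first sink is reused as a second source (one-to-one correspondence).

    A second source only delivers to final sinks inside its own group, so it
    fans out to at most group_size destinations.
    """
    num_final = num_groups * group_size
    second_sinks: List[List[Record]] = [[] for _ in range(num_final)]
    for group, source in enumerate(second_sources):
        for key, value in source:
            dest = target_sink(key, num_final)
            # Invariant from the first pass: the record belongs to this group.
            assert dest // group_size == group
            second_sinks[dest].append((key, value))
    return second_sinks


def multi_pass_shuffle(first_sources: List[List[Record]], num_groups: int,
                       group_size: int) -> List[List[Record]]:
    """Repartition first_sources into num_groups * group_size final sinks."""
    intermediate = first_pass(first_sources, num_groups, group_size)
    return second_pass(intermediate, num_groups, group_size)
```

For example, with 4 first sources and `num_groups=2, group_size=3`, each first source sends to at most 2 first sinks and each second source sends to at most 3 second sinks, whereas a single-pass shuffle would have every source send to all 6 sinks. Under this assumed scheme, a final sink typically ends up holding records that originated from several first sources, which is the outcome the abstract describes.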
Multi-Pass Distributed Data Shuffle
Patent number: US20240295979A1. Authors: Mohsen VAKILIAN, Hossein Ahmadi. Owner: Google LLC. Publication date: 2024-09-05.