
Ab Initio Replicate vs Broadcast

8/8/2005
By ITtoolbox Popular Q&A Team for ITtoolbox as adapted from Abinitio-L discussion group

Summary: What is the difference between Replicate and Broadcast?

Full Article:

Disclaimer: Contents are not reviewed for correctness and are not endorsed or recommended by ITtoolbox or any vendor. Popular Q&A contents include summarized information from ITtoolbox Abinitio-L discussion unless otherwise noted.

1. Adapted from a response by Srinivas on Sunday, June 05, 2005

Broadcast and Replicate are similar components, but Replicate is generally used to increase component parallelism, emitting multiple straight flows to separate pipelines. Broadcast is used to increase data parallelism by feeding records to fan-out or all-to-all flows.

2. Adapted from a response by rajaravisankar_adari on Monday, June 06, 2005

Replicate is an older component than Broadcast. You can use Broadcast as a join component, whereas you can't use Replicate as a join. By default, Replicate uses a straight flow and Broadcast uses a fan-out or all-to-all flow. Broadcast is used for data parallelism, whereas Replicate is used for component parallelism.

3. Adapted from a response by azaman on Monday, June 06, 2005

Replicate supports component parallelism:

    Input File -------> Replicate --------> Format ----> Output File
                            |
                            |
                            ---------> Rollup -------> Output File

Broadcast supports data parallelism:

    Input File1 (MF) -----------------> JOIN -----------> Output File
                                         ^
                                         |
                                         |
    Input File2 (Serial) ---> Broadcast --

Input File2 is a serial file that is joined with a multifile (MF), Input File1, without being partitioned. The Broadcast component writes its data to all partitions of Input File1, creating an implicit fan-out flow.

4. Adapted from a response by Remediator on Monday, June 06, 2005

The short answer is that Replicate copies a flow while Broadcast multiplies it. Broadcast is a partitioner, whereas Replicate is a simple flow-copy mechanism. Replicate appears in over 90% of all AI (Ab Initio) graphs (across all implementations worldwide), whereas Broadcast appears in less than 1% of all graphs. You won't see any difference between the two until you start using data parallelism; then confusing them will go south rather quickly.

Here's an experiment: use a simple serial input file, followed by a Broadcast, then a 4-way multifile output file component. If you run the graph with, say, 100 records from the input file, it will create 400 records in the output file: 100 records for each flow partition encountered. If you had used a Replicate, it would have read and written 100 records.
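The record counts in the experiment above can be checked with a small simulation. This is only a sketch in plain Python, not Ab Initio code: the names `replicate`, `broadcast`, `record_count`, and `run_experiment` are hypothetical, and a flow is modeled (as an assumption) as a list of partitions, each partition being a list of records.

```python
from typing import Dict, List

Record = Dict[str, int]
Flow = List[List[Record]]  # assumed model: one inner list per partition


def replicate(flow: Flow, out_ports: int = 1) -> List[Flow]:
    """Copy the whole flow once per output port; partition count is unchanged."""
    return [[list(part) for part in flow] for _ in range(out_ports)]


def broadcast(flow: Flow, target_partitions: int) -> Flow:
    """Send every input record to every partition of the downstream layout."""
    all_records = [rec for part in flow for rec in part]
    return [list(all_records) for _ in range(target_partitions)]


def record_count(flow: Flow) -> int:
    """Total number of records across all partitions of a flow."""
    return sum(len(part) for part in flow)


def run_experiment(n_records: int = 100, partitions: int = 4) -> Dict[str, int]:
    # A serial file is modeled as a flow with a single partition.
    serial_input: Flow = [[{"id": i} for i in range(n_records)]]

    # Broadcast into a 4-way layout: each partition receives a full copy.
    broadcast_out = broadcast(serial_input, partitions)

    # Replicate with one output port: a single copy of the same flow.
    (replicate_out,) = replicate(serial_input, out_ports=1)

    return {
        "broadcast": record_count(broadcast_out),
        "replicate": record_count(replicate_out),
    }


if __name__ == "__main__":
    print(run_experiment())  # {'broadcast': 400, 'replicate': 100}
```

In this model the two functions differ in what the multiplier means: `replicate` multiplies the number of downstream pipelines (component parallelism), while `broadcast` multiplies the records across the partitions of one downstream flow (data parallelism).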
