🔥 Leaked Details on GPT-4 🔍

Hold on to your seats, folks! Leaked information about GPT-4 has surfaced, and it's mind-blowing! Get ready to dive into the exciting world of advanced AI models. 🚀

🔹 𝗣𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀 𝗰𝗼𝘂𝗻𝘁: GPT-4 boasts an impressive size, with over 10 times the parameters of GPT-3. Rumour has it that it's equipped with approximately 1.8 trillion parameters spread across 120 layers. That's massive!

🔹 𝗠𝗶𝘅𝘁𝘂𝗿𝗲 𝗼𝗳 𝗘𝘅𝗽𝗲𝗿𝘁𝘀 (𝗠𝗼𝗘): OpenAI has cleverly implemented a mixture-of-experts model to keep costs reasonable. GPT-4 reportedly uses 16 experts, each comprising around 111 billion parameters for the MLP. Two of these experts are routed to on each forward pass, ensuring efficient processing.

🔹 𝗠𝗼𝗘 𝗥𝗼𝘂𝘁𝗶𝗻𝗴: While advanced routing algorithms are often discussed in the literature, OpenAI's approach to MoE routing for the current GPT-4 model is supposedly relatively simple but effective.
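To make "simple but effective" routing concrete, here is a minimal, purely illustrative sketch of top-2 gating in plain Python. The 16-expert count comes from the leak; the function names and logit values are hypothetical, not OpenAI's actual implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route_top2(router_logits):
    """Pick the 2 highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    total = sum(probs[i] for i in top2)
    return [(i, probs[i] / total) for i in top2]

# 16 router logits, one per expert, as in the leaked 16-expert setup
logits = [0.1, 2.0, -1.0, 0.5] + [0.0] * 12
gates = route_top2(logits)  # two (expert_index, gate_weight) pairs summing to 1
```

Each token's output is then a weighted sum of the two selected experts' MLP outputs, so only 2 of the 16 expert MLPs run per token.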

🔹 𝗦𝗵𝗮𝗿𝗲𝗱 𝗣𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀 𝗳𝗼𝗿 𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: GPT-4 benefits from approximately 55 billion shared parameters for attention, contributing to its powerful capabilities.
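As a quick sanity check, the leaked numbers are roughly self-consistent — 16 experts at ~111B each plus ~55B of shared attention parameters lands near the rumoured 1.8 trillion total:

```python
# Consistency check on the leaked parameter counts (illustrative arithmetic)
experts = 16
params_per_expert = 111e9      # leaked MLP parameters per expert
shared_attention = 55e9        # leaked shared attention parameters

total_params = experts * params_per_expert + shared_attention   # ~1.83e12, i.e. ~1.8T
active_params = 2 * params_per_expert + shared_attention        # ~2.77e11, i.e. ~280B
```

The "active" figure — 2 routed experts plus the shared attention weights — also matches the ~280 billion parameters per forward pass reported in the leak.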

🔹 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆: Incredibly, each forward pass for GPT-4's inference stage only utilizes around 280 billion parameters and 560 TFLOPs. This is in contrast to the 1.8 trillion parameters and 3,700 TFLOPs required by a purely dense model. OpenAI has found an efficient way to deliver exceptional performance.

🔹 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗗𝗮𝘁𝗮𝘀𝗲𝘁: GPT-4 has been trained on an extensive dataset of approximately 13 trillion tokens. It's important to note that this count includes multiple epochs over parts of the data, so the tokens are not all unique.

🔹 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗖𝗼𝘀𝘁𝘀: OpenAI's training compute for GPT-4 amounts to a staggering ~2.15e25 FLOPs. This training was conducted over 90 to 100 days using around 25,000 A100s, resulting in an estimated cost of $63 million for this run alone.
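Those numbers hang together under some plausible assumptions. Taking A100 BF16 peak throughput of 312 TFLOP/s and ~34% model FLOPs utilization (both assumptions on my part, not part of the leak):

```python
# Back-of-envelope check on the leaked training figures
a100_peak_flops = 312e12     # A100 BF16 peak, FLOP/s (from the datasheet)
mfu = 0.34                   # assumed model FLOPs utilization
gpus = 25_000
days = 95                    # midpoint of the leaked 90-100 day range

total_flops = gpus * a100_peak_flops * mfu * days * 86_400  # ~2.2e25, near the leaked ~2.15e25

gpu_hours = gpus * days * 24                # ~57 million A100-hours
cost_per_gpu_hour = 63e6 / gpu_hours        # ~$1.10 per A100-hour implied by the $63M estimate
```

An implied ~$1.10 per A100-hour is in the range of large-scale amortized cluster pricing, which makes the $63M estimate plausible.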

🔹 𝗧𝗿𝗮𝗱𝗲𝗼𝗳𝗳𝘀 𝘄𝗶𝘁𝗵 𝗠𝗼𝗘: OpenAI made several tradeoffs in the implementation of the Mixture of Experts. While research has shown that using 64 to 128 experts can achieve lower loss, OpenAI opted for 16 experts for the sake of generalization and convergence. It's a delicate balancing act.

🔹 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗖𝗼𝘀𝘁: Inference with GPT-4 comes at a higher cost compared to the 175 billion parameter Davinci model. An estimate suggests around $0.0049 per 1,000 tokens for 128 A100s and $0.0021 per 1,000 tokens for 128 H100s. These costs assume high utilization and substantial batch sizes.
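Assuming roughly $1 per A100-hour (my assumption, not from the leak), the $0.0049 per 1K tokens figure implies a cluster-wide throughput in this ballpark:

```python
# Back-of-envelope: throughput implied by the leaked $0.0049 / 1K tokens on 128 A100s
gpus = 128
dollars_per_gpu_hour = 1.0   # assumed amortized A100 cost
cost_per_1k_tokens = 0.0049  # leaked figure

cluster_cost_per_hour = gpus * dollars_per_gpu_hour
tokens_per_hour = cluster_cost_per_hour / cost_per_1k_tokens * 1_000
tokens_per_second = tokens_per_hour / 3600   # ~7,300 tokens/s across the cluster
```

That throughput only holds at the high utilization and large batch sizes the estimate assumes; at low utilization, the per-token cost rises sharply.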

These are just a few highlights from the leaked details on GPT-4. The advancements in parameters, training, and capabilities are truly astonishing. As we eagerly await official announcements, the future of AI continues to excite and inspire us all.

What are your thoughts on this? Let me know in the comments!