By Nate Gentile
Get instant insights and key takeaways from this YouTube video by Nate Gentile.
Liang Wenfeng's Journey and High-Flyer
📌 Liang Wenfeng started as an engineering student at Zhejiang University in China but pivoted to finance because of its better earning potential, forming a study group to explore financial markets.
💻 He applied his engineering skills to quant trading, moving from manual trading to mathematical models and computer-driven trading systems.
💰 In 2016 he founded High-Flyer, an investment firm whose trading decisions are made entirely by automated systems; it grew to manage roughly $8 billion in assets and became one of the top four quant firms in China.
DeepSeek AI Launch and Core Models
🚀 Liang leveraged High-Flyer's capital to found DeepSeek AI (Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.) in July 2023, building on the firm's work applying AI to finance.
🤖 In late 2024 and early 2025, DeepSeek launched two Large Language Models (LLMs): DeepSeek V3 (December 2024) and DeepSeek R1 (January 2025).
📊 DeepSeek V3 posted strong results on benchmarks such as MMLU (Massive Multitask Language Understanding) and DROP (Discrete Reasoning Over Paragraphs), outperforming models like Claude 3.5 Sonnet and GPT-4o in certain aspects.
Cost Efficiency and Open Source Strategy
💲 DeepSeek V3's API costs approximately $1.10 per million output tokens, roughly a tenth of GPT-4o's standard rate of $10 per million output tokens (see the quick arithmetic after this list).
🔓 A major disruption: DeepSeek released both V3 and R1 as fully open-source models, with weights free to download and run on private infrastructure, in contrast with proprietary models like GPT and Gemini.
📈 The open releases garnered over one million downloads in the first week, primarily from technical users with the necessary infrastructure (e.g., running the full R1 takes about 16 Nvidia A100 GPUs, roughly half a million dollars of hardware).
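A quick back-of-the-envelope check on those cost and hardware figures. The 50-million-token workload is a made-up example, and the prices, parameter counts, and GPU counts are the ones quoted in this summary rather than official rate cards:

```python
# Illustrative arithmetic only; prices, parameter counts, and GPU counts are the
# figures quoted in this summary, and the workload size is hypothetical.

OUTPUT_TOKENS = 50_000_000                      # example workload: 50M output tokens

deepseek_v3_rate = 1.10 / 1_000_000             # ~$1.10 per million output tokens
gpt4o_rate = 10.00 / 1_000_000                  # ~$10 per million output tokens

print(f"DeepSeek V3 API cost: ${OUTPUT_TOKENS * deepseek_v3_rate:,.2f}")
print(f"GPT-4o API cost:      ${OUTPUT_TOKENS * gpt4o_rate:,.2f}")
print(f"Price ratio:          {gpt4o_rate / deepseek_v3_rate:.1f}x")

# Why self-hosting R1 takes a rack of data-center GPUs: the weights alone are huge.
total_params = 671e9                            # R1's total parameter count
weights_gb = total_params * 1 / 1e9             # ~1 byte per parameter at FP8
pooled_gpu_memory_gb = 16 * 80                  # 16 x A100 80 GB
print(f"Weights ≈ {weights_gb:.0f} GB vs {pooled_gpu_memory_gb} GB of pooled GPU memory")
```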
Technological Advancements in DeepSeek Models
🧠 DeepSeek's performance and efficiency stem largely from its Mixture of Experts (MoE) architecture, in which a router activates only a few specialized sub-networks ("experts") for each token instead of running the entire model (a minimal routing sketch follows this list).
📉 The MoE architecture in R1 (671 billion total parameters) activates only about 37 billion parameters per token, drastically cutting the compute per inference compared with densely activated models like GPT.
⚙️ Efficiency was further boosted by a mixed-precision training framework built around FP8, which shrinks memory footprint and training time by storing and computing many values in lower precision while strategically keeping higher precision where it matters (a mixed-precision sketch also follows this list).
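To make the MoE idea concrete, here is a minimal, illustrative top-k routing layer in PyTorch. The class name, sizes, and routing scheme are invented for the example; DeepSeek's actual MoE layers are far larger and more sophisticated, but the principle is the same: only the chosen experts run for each token.

```python
# Minimal sketch of Mixture-of-Experts routing (illustrative, not DeepSeek's code).
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts)          # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (n_tokens, dim)
        weights, picked = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # only top_k experts run per token
            chosen = picked[:, slot]
            for e, expert in enumerate(self.experts):
                mask = chosen == e
                if mask.any():                           # skip experts no token selected
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoE()(tokens).shape)                           # torch.Size([10, 64])
```

Each token touches only 2 of the 8 expert MLPs here, which is the same reason R1 can hold 671B parameters on disk while only ~37B of them do work per token.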
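And a sketch of the mixed-precision idea. True FP8 training needs dedicated kernels and H-series tensor cores, so this example stands in with PyTorch's bfloat16 autocast on CPU: the bulky matrix multiplies run in low precision while the master weights stay in FP32, the same trade-off DeepSeek pushes further with FP8.

```python
# Mixed-precision pattern, illustrated with bfloat16 autocast as a stand-in for FP8
# (real FP8 training requires dedicated kernels and H-series GPU tensor cores).
import torch
import torch.nn as nn

layer = nn.Linear(1024, 1024)                    # master weights stay in full FP32
x = torch.randn(32, 1024)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = layer(x)                                 # the matmul itself runs in low precision

print(layer.weight.dtype, y.dtype)               # torch.float32 torch.bfloat16

# The payoff is memory: bytes needed just to hold 671B parameters at each precision.
for name, bytes_per_param in [("FP32", 4), ("BF16/FP16", 2), ("FP8", 1)]:
    print(f"{name}: {671e9 * bytes_per_param / 1e9:.0f} GB")
```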
DeepSeek R1 and Reasoning Breakthrough
🤔 DeepSeek R1 excels at complex reasoning tasks (like multi-step logic puzzles) by using an automated Reinforcement Learning (RL) process, avoiding the costly human-feedback loop (RLHF) that OpenAI relies on.
🤖 R1 achieves reasoning capabilities comparable to OpenAI's specialized reasoning model, o1, by iteratively refining answers based on automated feedback that scores how close the output is to a known, deterministic correct answer (see the reward-function sketch after this list).
🛠️ While DeepSeek models are stronger at industrial, logical, and scientific tasks, GPT models remain better for work that calls for human-like creativity, warmth, and storytelling.
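A tiny sketch of the kind of automated, rule-based reward described above: because math and logic problems have a single verifiable answer, a program can score each attempt against the known solution with no human raters. The \boxed{...} convention, the function names, and the 0/0.1/1.0 scoring are illustrative assumptions, not DeepSeek's actual reward code.

```python
# Rule-based reward for RL on reasoning tasks (illustrative; not DeepSeek's implementation).
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the contents of the last \\boxed{...}, a common convention for final answers."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def reward(completion: str, gold_answer: str) -> float:
    """Deterministic score: 1.0 if the extracted answer matches the known solution."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0        # unparseable output earns nothing
    return 1.0 if answer == gold_answer else 0.1   # small credit for following the format

print(reward(r"Adding the legs gives \boxed{42}", "42"))   # 1.0
print(reward("I think the answer is 42", "42"))            # 0.0
```

During RL training, scores like these are fed back so the model gradually favors chains of reasoning that end in verifiably correct answers.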
Overcoming Hardware Restrictions (The H800 Challenge)
🛑 DeepSeek achieved its training success despite US export restrictions that prevent the sale of high-end Nvidia H100 GPUs (used to train models like GPT) in China.
🇨🇳 DeepSeek trained its models on 2,048 Nvidia H800 GPUs (the export-compliant version sold in China), which have significantly lower inter-GPU communication speeds (300 GB/s vs. 600-900 GB/s) and lower memory bandwidth than the US versions.
💻 To work around the slow links, DeepSeek wrote complex custom software in low-level PTX (Nvidia's assembly-like GPU language) rather than relying only on standard CUDA, heavily compressing the data sent between GPUs to squeeze the most out of the limited bandwidth.
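The summary does not detail DeepSeek's PTX kernels, so the snippet below only illustrates the general bandwidth-saving idea in plain PyTorch: quantize a tensor to 8-bit plus one scale before sending it over a slow link, and restore it on the other side, cutting the bytes transferred by about 4x versus FP32.

```python
# Generic illustration of compressing inter-GPU traffic (not DeepSeek's PTX-level code):
# send int8 values plus a single FP32 scale instead of full-precision floats.
import torch

def compress(t: torch.Tensor):
    """Symmetric per-tensor quantization to int8 with one scale factor."""
    scale = t.abs().max() / 127.0
    q = (t / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def decompress(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

activations = torch.randn(4096, 4096)                    # pretend this must cross GPUs
q, scale = compress(activations)
restored = decompress(q, scale)

print(f"original bytes:   {activations.nelement() * activations.element_size():,}")
print(f"compressed bytes: {q.nelement() * q.element_size():,}")
print(f"max abs error:    {(activations - restored).abs().max().item():.4f}")
```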
Key Points & Insights
➡️ DeepSeek represents a major competitive shift by offering models equal or superior to leading US models, available at a fraction of the cost and released as Open Source.
➡️ Software optimization (MoE, FP8, PTX customization) can drastically offset hardware limitations, evidenced by DeepSeek's success using restricted H800 GPUs to achieve near-H100 performance levels.
➡️ The company monetizes its free models by selling managed access through its API, betting that many large organizations will pay for convenience rather than shoulder the significant infrastructure cost (around $500k of hardware) of running the models themselves.
Full video URL: youtube.com/watch?v=RFoEDLmLKpo
Duration: 37:06