By Nate Gentile
Get instant insights and key takeaways from this YouTube video by Nate Gentile.
Liang Wenfeng's Journey and High-Flyer
📌 Liang Wenfeng started as an engineering student at Zhejiang University in China but pivoted to finance because of its better earning potential, forming a study group to explore financial markets.
💻 He applied his engineering skills to quant trading, moving from manual trading to mathematical models and computer-driven trading systems.
💰 In 2016 he founded High-Flyer, an investment firm whose trading decisions are made entirely by automated systems; it grew to manage roughly $8 billion in assets and became one of the top four quant firms in China.
DeepSeek AI Launch and Core Models
🚀 Liang leveraged High-Flyer's capital to found DeepSeek AI (Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.) in July 2023, building on the firm's work applying AI to finance.
🤖 In late 2024 and early 2025, DeepSeek launched two Large Language Models (LLMs): DeepSeek V3 (December 2024) and DeepSeek R1 (January 2025).
📊 DeepSeek V3 posted strong results on benchmarks such as MMLU (Massive Multitask Language Understanding) and DROP (Discrete Reasoning Over Paragraphs), outperforming models like Claude 3.5 Sonnet and GPT-4o in certain aspects.
Cost Efficiency and Open Source Strategy
💲 DeepSeek V3's API costs approximately $1.10 per million output tokens, roughly a tenth of GPT-4o's standard rate of $10 per million output tokens (see the quick arithmetic after this list).
🔓 A major disruption: DeepSeek released both V3 and R1 as fully open-source models, with weights free to download and run on private infrastructure, in contrast with proprietary models like GPT and Gemini.
📈 The open releases garnered over one million downloads in the first week, primarily from technical users with the necessary infrastructure (e.g., running the full R1 takes about 16 Nvidia A100 GPUs, roughly half a million dollars of hardware).
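A quick back-of-the-envelope check on those cost and hardware figures. The 50-million-token workload is a made-up example, and the prices, parameter counts, and GPU counts are the ones quoted in this summary rather than official rate cards:

```python
# Illustrative arithmetic only; prices, parameter counts, and GPU counts are the
# figures quoted in this summary, and the workload size is hypothetical.

OUTPUT_TOKENS = 50_000_000                      # example workload: 50M output tokens

deepseek_v3_rate = 1.10 / 1_000_000             # ~$1.10 per million output tokens
gpt4o_rate = 10.00 / 1_000_000                  # ~$10 per million output tokens

print(f"DeepSeek V3 API cost: ${OUTPUT_TOKENS * deepseek_v3_rate:,.2f}")
print(f"GPT-4o API cost:      ${OUTPUT_TOKENS * gpt4o_rate:,.2f}")
print(f"Price ratio:          {gpt4o_rate / deepseek_v3_rate:.1f}x")

# Why self-hosting R1 takes a rack of data-center GPUs: the weights alone are huge.
total_params = 671e9                            # R1's total parameter count
weights_gb = total_params * 1 / 1e9             # ~1 byte per parameter at FP8
pooled_gpu_memory_gb = 16 * 80                  # 16 x A100 80 GB
print(f"Weights ≈ {weights_gb:.0f} GB vs {pooled_gpu_memory_gb} GB of pooled GPU memory")
```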
Technological Advancements in DeepSeek Models
🧠 DeepSeek's performance and efficiency stem largely from its Mixture of Experts (MoE) architecture, in which a router activates only a few specialized sub-networks ("experts") for each token instead of running the entire model (a minimal routing sketch follows this list).
📉 The MoE architecture in R1 (671 billion total parameters) activates only about 37 billion parameters per token, drastically cutting the compute per inference compared with densely activated models like GPT.
⚙️ Efficiency was further boosted by a mixed-precision training framework built around FP8, which shrinks memory footprint and training time by storing and computing many values in lower precision while strategically keeping higher precision where it matters (a mixed-precision sketch also follows this list).
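To make the MoE idea concrete, here is a minimal, illustrative top-k routing layer in PyTorch. The class name, sizes, and routing scheme are invented for the example; DeepSeek's actual MoE layers are far larger and more sophisticated, but the principle is the same: only the chosen experts run for each token.

```python
# Minimal sketch of Mixture-of-Experts routing (illustrative, not DeepSeek's code).
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts)          # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (n_tokens, dim)
        weights, picked = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # only top_k experts run per token
            chosen = picked[:, slot]
            for e, expert in enumerate(self.experts):
                mask = chosen == e
                if mask.any():                           # skip experts no token selected
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoE()(tokens).shape)                           # torch.Size([10, 64])
```

Each token touches only 2 of the 8 expert MLPs here, which is the same reason R1 can hold 671B parameters on disk while only ~37B of them do work per token.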
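And a sketch of the mixed-precision idea. True FP8 training needs dedicated kernels and H-series tensor cores, so this example stands in with PyTorch's bfloat16 autocast on CPU: the bulky matrix multiplies run in low precision while the master weights stay in FP32, the same trade-off DeepSeek pushes further with FP8.

```python
# Mixed-precision pattern, illustrated with bfloat16 autocast as a stand-in for FP8
# (real FP8 training requires dedicated kernels and H-series GPU tensor cores).
import torch
import torch.nn as nn

layer = nn.Linear(1024, 1024)                    # master weights stay in full FP32
x = torch.randn(32, 1024)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = layer(x)                                 # the matmul itself runs in low precision

print(layer.weight.dtype, y.dtype)               # torch.float32 torch.bfloat16

# The payoff is memory: bytes needed just to hold 671B parameters at each precision.
for name, bytes_per_param in [("FP32", 4), ("BF16/FP16", 2), ("FP8", 1)]:
    print(f"{name}: {671e9 * bytes_per_param / 1e9:.0f} GB")
```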
DeepSeek R1 and Reasoning Breakthrough
🤔 DeepSeek R1 excels at complex reasoning tasks (like multi-step logic puzzles) by using an automated Reinforcement Learning (RL) process, avoiding the costly human-feedback loop (RLHF) that OpenAI relies on.
🤖 R1 achieves reasoning capabilities comparable to OpenAI's specialized reasoning model, o1, by iteratively refining answers based on automated feedback that scores how close the output is to a known, deterministic correct answer (see the reward-function sketch after this list).
🛠️ While DeepSeek models are stronger at industrial, logical, and scientific tasks, GPT models remain better for work that calls for human-like creativity, warmth, and storytelling.
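A tiny sketch of the kind of automated, rule-based reward described above: because math and logic problems have a single verifiable answer, a program can score each attempt against the known solution with no human raters. The \boxed{...} convention, the function names, and the 0/0.1/1.0 scoring are illustrative assumptions, not DeepSeek's actual reward code.

```python
# Rule-based reward for RL on reasoning tasks (illustrative; not DeepSeek's implementation).
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the contents of the last \\boxed{...}, a common convention for final answers."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def reward(completion: str, gold_answer: str) -> float:
    """Deterministic score: 1.0 if the extracted answer matches the known solution."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0        # unparseable output earns nothing
    return 1.0 if answer == gold_answer else 0.1   # small credit for following the format

print(reward(r"Adding the legs gives \boxed{42}", "42"))   # 1.0
print(reward("I think the answer is 42", "42"))            # 0.0
```

During RL training, scores like these are fed back so the model gradually favors chains of reasoning that end in verifiably correct answers.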
Overcoming Hardware Restrictions (The H800 Challenge)
🛑 DeepSeek achieved its training success despite US export restrictions that prevent the sale of high-end Nvidia H100 GPUs (used to train models like GPT) in China.
🇨🇳 DeepSeek trained its models on 2,048 Nvidia H800 GPUs (the export-compliant version sold in China), which have significantly lower inter-GPU communication speeds (300 GB/s vs. 600-900 GB/s) and lower memory bandwidth than the US versions.
💻 To work around the slow links, DeepSeek wrote complex custom software in low-level PTX (Nvidia's assembly-like GPU language) rather than relying only on standard CUDA, heavily compressing the data sent between GPUs to squeeze the most out of the limited bandwidth.
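The summary does not detail DeepSeek's PTX kernels, so the snippet below only illustrates the general bandwidth-saving idea in plain PyTorch: quantize a tensor to 8-bit plus one scale before sending it over a slow link, and restore it on the other side, cutting the bytes transferred by about 4x versus FP32.

```python
# Generic illustration of compressing inter-GPU traffic (not DeepSeek's PTX-level code):
# send int8 values plus a single FP32 scale instead of full-precision floats.
import torch

def compress(t: torch.Tensor):
    """Symmetric per-tensor quantization to int8 with one scale factor."""
    scale = t.abs().max() / 127.0
    q = (t / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def decompress(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

activations = torch.randn(4096, 4096)                    # pretend this must cross GPUs
q, scale = compress(activations)
restored = decompress(q, scale)

print(f"original bytes:   {activations.nelement() * activations.element_size():,}")
print(f"compressed bytes: {q.nelement() * q.element_size():,}")
print(f"max abs error:    {(activations - restored).abs().max().item():.4f}")
```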
Key Points & Insights
➡️ DeepSeek represents a major competitive shift by offering models equal or superior to leading US models, available at a fraction of the cost and released as Open Source.
➡️ Software optimization (MoE, FP8, PTX customization) can drastically offset hardware limitations, evidenced by DeepSeek's success using restricted H800 GPUs to achieve near-H100 performance levels.
➡️ The company monetizes its free models by selling managed access through its API, betting that many large organizations will pay for convenience rather than shoulder the significant infrastructure cost (around $500k of hardware) of running the models themselves.
Full video URL: youtube.com/watch?v=RFoEDLmLKpo
Duration: 37:06