TMTPOST -- Nvidia Corporation shares closed around 16.9% lower on Monday, their worst day since March 16, 2020. The plunge wiped out nearly $600 billion of Nvidia's market value, shattering the record for the biggest one-day drop in market capitalization among U.S.-listed companies, a record the artificial intelligence (AI) chip giant itself had set last September. U.S. AI-related stocks suffered a bloodbath amid a shock from Chinese AI startup DeepSeek.
DeepSeek's mobile application jumped to the No. 1 spot in app stores this weekend, dethroning OpenAI's ChatGPT as the most downloaded free app on Apple's U.S. App Store. On iOS, DeepSeek became the No. 1 free app in the U.S. App Store and 51 other countries on Monday, according to mobile app analytics firm Appfigures. DeepSeek's surge in the app stores followed its AI models going viral on the U.S. social media platform X last weekend.
What astonished Silicon Valley is that it took just $5.58 million for DeepSeek to train its V3 large language model (LLM). The startup claimed it used 2,048 Nvidia H800 chips, a downgraded version of Nvidia's H100 designed to comply with U.S. export restrictions. DeepSeek spent only 2.6 million H800 GPU hours on a model that benchmarks favorably against Meta's, whereas Meta's compute budget for the Llama 3 model family could have trained DeepSeek-V3 at least 15 times over.
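The figures above imply a rough rental rate and wall-clock training time. A back-of-envelope sketch, using only the numbers reported in this article (the derived rates are our own rounding, not DeepSeek's disclosures):

```python
# Back-of-envelope check of DeepSeek's reported V3 training cost.
# All inputs are the article's claimed figures: $5.58M total cost,
# 2.6 million H800 GPU hours, 2,048 H800 chips.
total_cost_usd = 5.58e6
gpu_hours = 2.6e6
num_gpus = 2048

# Implied rental rate per GPU-hour if the whole budget went to compute.
cost_per_gpu_hour = total_cost_usd / gpu_hours

# Implied wall-clock duration if all 2,048 GPUs ran continuously in parallel.
wall_clock_days = gpu_hours / num_gpus / 24

print(f"Implied cost per H800 hour: ${cost_per_gpu_hour:.2f}")   # ~$2.15
print(f"Implied wall-clock training time: {wall_clock_days:.0f} days")  # ~53 days
```

The ~$2 per GPU-hour figure is roughly in line with bulk cloud rental pricing, which is why analysts treated the headline cost as a compute-rental estimate rather than a total program cost.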
DeepSeek earlier this month released the open-source DeepSeek-R1, a reasoning model that it claims delivers performance comparable to leading offerings like OpenAI's o1 at a fraction of the cost. Several third-party tests have found that DeepSeek actually outperforms OpenAI's latest model. R1 contains 671 billion parameters, and its "distilled" versions range in size from 1.5 billion to 70 billion parameters. The full R1 is available through DeepSeek's API at prices 90%-95% cheaper than o1.
DeepSeek’s reasoning model is “one of the most amazing and impressive breakthroughs I’ve ever seen—and as open source, a profound gift to the world,” investor Marc Andreessen said Friday on X. Andreessen, who runs influential Silicon Valley venture capital firm Andreessen Horowitz, described R1 as "AI's Sputnik moment," a reference to when the Soviet Union launched the first artificial Earth satellite in 1957, kicking off the Space Race.
Developers on Hugging Face have created more than 500 derivative models of R1 that have racked up 2.5 million downloads in total -- five times the number of downloads the official R1 has gotten, Clem Delangue, the CEO of the platform, said in a post on X.
An Nvidia spokesperson called DeepSeek “an excellent AI advancement”. “DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant,” the person told CNBC on Monday.
DeepSeek's inference requires significant numbers of Nvidia graphics processing units (GPUs) and high-performance networking, according to Nvidia comments quoted by Reuters. The company also stressed that DeepSeek used approved GPU versions designed for the Chinese market, countering claims of potential export violations.
DeepSeek's implications for AI training compute puncture some of the capex euphoria that followed major commitments from Stargate and Meta last week, commented brokerage firm Jefferies. Given DeepSeek's performance comparable to GPT-4o at a fraction of the computing power, Jefferies said there are potential negative implications for the builders: pressure on AI players to justify ever-increasing capex plans could ultimately lead to a lower trajectory for data center revenue and profit growth.
Another Wall Street firm, Nomura, believes DeepSeek's efficiency in training LLMs may raise concerns about reduced hardware demand, while the market may grow more anxious about the return on large AI investments if no meaningful revenue streams materialize in the near term.
However, Citigroup questioned the notion that DeepSeek's achievements were made without advanced GPUs, whether to fine-tune the final model or to build the underlying LLMs it was distilled from. Citi does not expect leading AI companies to move away from more advanced GPUs, which offer more attractive $/TFLOPs at scale.
Source: TMTPOST