DeepSeek finally released a new model and paper. Because this DeepSeek-OCR release is a bit different from what everyone expected, and because DeepSeek releases are generally a big deal, I wanted to write a brief explainer of what it is all about.
In short, they explore how vision encoders can improve the efficiency of LLMs at processing and compressing textual information. The takeaway is that rendering text as images and feeding those images to the model results in more efficient compression than working with the text directly.
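To make the core idea concrete, here is a minimal sketch (not DeepSeek-OCR's actual pipeline) of the first step: rendering a chunk of text as a page image, which a vision encoder can then compress into a comparatively small number of vision tokens. The font, page width, and wrapping below are arbitrary choices of mine.

```python
# Minimal sketch, not DeepSeek-OCR's pipeline: render text onto a white "page"
# image that a vision encoder could consume. Layout parameters are arbitrary.
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_text_to_image(text: str, width: int = 1024, margin: int = 16,
                         line_height: int = 18) -> Image.Image:
    font = ImageFont.load_default()
    lines = textwrap.wrap(text, width=100)
    height = 2 * margin + line_height * len(lines)
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((margin, margin + i * line_height), line, fill="black", font=font)
    return img

page = render_text_to_image("An image can say more than a thousand words. " * 50)
page.save("page.png")  # this rendered page, not the raw text, becomes the model input
```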
My first intuition was that this sounds very inefficient and shouldn't work as well as using text tokenizers (or alternatives like the Byte Latent Transformer) to prepare the input. It actually reminded me of a line of research I saw years ago, where researchers represented 3D molecules as 3D inputs or 2D images for ConvNets instead of using graph neural networks; intuitively, that shouldn't work well and should be prone to overfitting.
In the case of DeepSeek-OCR, why even try such an approach? I imagine it started as a curiosity, but then it may have turned into an interesting idea for long-context scaling in LLMs, i.e., how to make long contexts cheaper by using vision tokens and representations. (An image can say more than a thousand words, but who would have thought that an image of text can say 1000 words more efficiently!)
In any case, this DeepSeek-OCR approach turns out to be surprisingly efficient. In particular, they found that at a fixed decoding precision of 97% (i.e., measuring how well the model can compress information into a latent representation and then reconstruct it), the vision-based version needed 10 times fewer vision tokens than the corresponding number of text tokens. In other words, the rendered-image version can compress the information 10x better than the plain-text version.
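The 10x figure is simply the ratio of token counts at matched reconstruction quality. A back-of-the-envelope illustration, with made-up round numbers chosen only to match that ratio:

```python
# Illustrative arithmetic only; the token counts below are invented round numbers.
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token effectively stands in for."""
    return text_tokens / vision_tokens

# e.g., a page that takes ~1,000 text tokens but can be reconstructed at ~97%
# precision from ~100 vision tokens corresponds to a 10x compression ratio.
print(compression_ratio(1000, 100))  # 10.0
```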
How does it differ from other VLLM architectures?
- They don't use a monolithic ViT as the encoder; instead, they fuse local and global vision features through a clever 16x convolutional compressor, which handles high-resolution inputs efficiently in terms of both memory and token counts (see the sketch after this list).
- They are (to the best of my knowledge) the first to use an MoE model as the decoder.
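To illustrate the first point, here is a rough sketch of what a 16x token compressor can look like. This is not DeepSeek-OCR's actual module; it assumes "16x" refers to the token count, i.e., each spatial side is reduced 4x by two stride-2 convolutions, so 4,096 patch features become 256 before they reach the global encoder and the decoder.

```python
# Rough sketch of a 16x token compressor, not DeepSeek-OCR's actual code.
# Two stride-2 convolutions halve each spatial side twice (4x per side), so the
# number of flattened tokens drops by 16x.
import torch
import torch.nn as nn

class ConvCompressor(nn.Module):
    def __init__(self, dim_in: int = 256, dim_out: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim_in, dim_out // 2, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim_out // 2, dim_out, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the local, high-resolution encoder
        return self.net(x)  # (B, dim_out, H/4, W/4)

feats = torch.randn(1, 256, 64, 64)   # 64 * 64 = 4096 patch "tokens"
compressed = ConvCompressor()(feats)
print(compressed.shape)               # torch.Size([1, 1024, 16, 16]) -> 256 tokens
```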
I think it's an interesting, refreshing approach, and the twist here is that it works surprisingly well.
However, I don't think that visual representations of text will solve the limitations of LLMs. And while it is popular to dislike text tokenizers like BPE, image representations are messy as well: one has to deal with aspect ratios, resolutions, cropping, color intensity variations, brightness levels, and so on. Still, it's an interesting idea. And if this approach is already this efficient with regular black-and-white text, I am curious to see the compression ratios once we add syntax highlighting to code.
Regarding code, this may be an interesting alternative for storing contextual information, as whitespace and subword tokenization remain challenges for traditional tokenizers, especially when working with code that uses many custom variable names that may not be well represented in the vocabulary and therefore have to be broken down into many individual subword tokens.
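As a quick illustration of that tokenizer pain point, here is how a standard BPE tokenizer splits an unusual identifier into many subword tokens. This uses OpenAI's open-source tiktoken with the cl100k_base vocabulary purely as an example; DeepSeek's own tokenizer will split things differently.

```python
# Illustration of subword fragmentation, using tiktoken's cl100k_base vocabulary
# as a stand-in; exact splits differ across tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
identifier = "my_custom_quaternion_interpolation_helper_v2"  # hypothetical variable name
token_ids = enc.encode(identifier)
print(len(token_ids))                        # one identifier -> many tokens
print([enc.decode([t]) for t in token_ids])  # the individual subword pieces
```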
Overall, encoding text in images is still such an esoteric concept that I am surprised it works this well (and maybe it only makes sense for very long documents or special domains like OCR or code, not general language modeling).
(PS: Personally, I expected the DeepSeek team to follow up with a V4 model using the sparse attention mechanism they recently tried in V3.2, but maybe that's still forthcoming. Now, after reading this paper, perhaps V4 is going to be a VLLM.)