Docling + DeepSeek-OCR integration (local only) + Hugging Face local path

2026. 1. 8. 17:42

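[Docling library]
1. A library that converts documents such as PDF and PPT into structured formats such as Markdown and JSON (developed by IBM, MIT license)
2. Docling's built-in OCR options include three engines: RapidOCR, EasyOCR, and Tesseract
3. Of these, I was using EasyOCR
4. The accuracy was not good enough, so I wanted to hook up DeepSeek-OCR
5. Looking at the library's options, DeepSeek-OCR could only be used by launching a vLLM server and calling it through an API (as of the 2026.1 version)
6. I searched Hugging Face for the most popular open-source OCR model -> DeepSeek-OCR it is!
7. So I modified the library internals
- Added DeepSeekOcrOptions and the model class
- Made it possible to download the model in advance and point to its path, so that it also works on an air-gapped network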
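
For context, the stock EasyOCR setup from items 2 and 3 can be sketched roughly as below. This is a minimal sketch against Docling's public API as I understand it, not the code from my project; the helper names build_easyocr_converter and pdf_to_markdown and the language list are assumptions made up for illustration. The Docling imports sit inside the function so the snippet can be read and loaded without Docling installed.

```python
from typing import List, Optional


def build_easyocr_converter(languages: Optional[List[str]] = None):
    """Build a Docling converter that runs EasyOCR on PDF pages.

    Hypothetical helper for illustration; requires Docling to be installed.
    """
    # Imported lazily so that defining this function does not need Docling.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        EasyOcrOptions,
        PdfPipelineOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True
    # The language codes are an assumption; adjust them to your documents.
    pipeline_options.ocr_options = EasyOcrOptions(lang=languages or ["ko", "en"])

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert one PDF to Markdown with the EasyOCR-backed converter."""
    converter = build_easyocr_converter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()
```

Swapping the OCR engine then comes down to assigning a different options object to ocr_options, which is why the change in this post centers on a new options class.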

[Reference] - Git: https://github.com/rafaeltuelho/docling

[Where the library was modified]

[Newly added]
- docling/models/deepseek_ocr_model.py

[Modified]
- docling/models/defaults.py
- docling/datamodel/pipeline_options.py

# NOTE: added section
# Assumes ClassVar, List, Literal, Optional, ConfigDict and OcrOptions
# are already available in pipeline_options.py.
class DeepSeekOcrOptions(OcrOptions):
    """Options for the DeepSeek OCR engine.

    DeepSeek-OCR is a Vision-Language Model (VLM) based OCR engine that uses
    transformer models for document understanding and text extraction.
    See: https://github.com/deepseek-ai/DeepSeek-OCR

    Device Support:
    - CUDA (NVIDIA GPU): Optimal performance with flash_attention_2 and bfloat16.
      Uses the official 'deepseek-ai/DeepSeek-OCR' model.
    - MPS (Apple Silicon M1/M2/M3/M4): Supported via MPS-compatible model fork.
      Requires PyTorch 2.7.0+ for the aten::_upsample_bicubic2d_aa operator.
      Automatically switches to 'Dogacel/DeepSeek-OCR-Metal-MPS' model with
      float16 precision and eager attention.
    - CPU: Not supported. Use EasyOcrOptions, TesseractOcrOptions, or
      RapidOcrOptions for CPU-only environments.

    Example:
        >>> from docling.datamodel.pipeline_options import DeepSeekOcrOptions
        >>> # Basic usage - auto-detects CUDA or MPS
        >>> options = DeepSeekOcrOptions()
        >>>
        >>> # Custom prompt for specific OCR tasks
        >>> options = DeepSeekOcrOptions(prompt="<image>\\nConvert to markdown.")
    """

    kind: ClassVar[Literal["deepseekocr"]] = "deepseekocr"

    # DeepSeek-OCR is multilingual by default, no specific language configuration needed
    lang: List[str] = []

    # Model configuration
    # Default is the official CUDA model; MPS users will auto-switch to MPS-compatible fork
    repo_id: str = "deepseek-ai/DeepSeek-OCR"

    # Local folder holding a pre-downloaded model (for air-gapped environments)
    local_model_path: Optional[str] = None

    # Prompt sent along with the image
    prompt: str = "<image>\nFree OCR."

    # Image processing parameters
    base_size: int = 1024
    image_size: int = 640
    crop_mode: bool = True

    # Basic generation parameters
    max_new_tokens: int = 4096
    temperature: float = 0.0
    do_sample: bool = False

    # Whether to allow the custom model code shipped with the model repository
    trust_remote_code: bool = True

    # Attention implementation:
    # - "flash_attention_2": accelerated attention (not available without a GPU)
    # - "eager": for Mac devices (MPS)
    # - "sdpa": currently not supported
    attn_implementation: Optional[str] = None

    model_config = ConfigDict(
        extra="forbid",
        protected_namespaces=(),
    )
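
The device rules described in the docstring can be read as a small lookup table. The sketch below only restates those rules; it is not the fork's actual implementation, and DeviceProfile and select_profile are hypothetical names made up for illustration. The values themselves are copied from the docstring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Settings the docstring associates with one device type (hypothetical type)."""

    repo_id: str
    dtype: str
    attn_implementation: str


def select_profile(device: str) -> DeviceProfile:
    """Map a device name to the model, precision and attention settings."""
    if device == "cuda":
        # NVIDIA GPU: official model, bfloat16, flash attention.
        return DeviceProfile(
            "deepseek-ai/DeepSeek-OCR", "bfloat16", "flash_attention_2"
        )
    if device == "mps":
        # Apple Silicon: MPS-compatible fork, float16, eager attention.
        return DeviceProfile(
            "Dogacel/DeepSeek-OCR-Metal-MPS", "float16", "eager"
        )
    # CPU is documented as unsupported for this engine.
    raise ValueError(
        f"DeepSeek-OCR is not supported on {device!r}; "
        "use EasyOCR, Tesseract or RapidOCR instead"
    )
```

When attn_implementation is left as None in the options, a lookup like this is presumably what fills in the value for the detected device.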

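[Caution]

Until DeepSeek-OCR is officially supported, installing with pip install docling will not work, because that installs the upstream release without these changes.

Push the modified library to Git and update requirements.txt so that it is installed from there.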

[To do]

Update the Dockerfile and prepare the Jenkins build

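Loading the model from a Hugging Face local path

The internal logic contained something like the following.

What it does: simply fetch the model from Hugging Face.

What I wanted: download the model ahead of time and load it from a local path.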
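
The idea can be sketched as a small resolver: if a local folder is configured, pass that to the loader, otherwise fall back to the repository id. This is a minimal sketch under my own assumptions, not the fork's code; resolve_model_source and load_model are hypothetical names. It relies on the fact that from_pretrained in transformers accepts either a Hub repository id or a path to a local directory.

```python
from pathlib import Path
from typing import Optional


def resolve_model_source(repo_id: str, local_model_path: Optional[str] = None) -> str:
    """Return what to hand to from_pretrained: a local folder or a Hub repo id."""
    if local_model_path is None:
        # No local copy configured: the loader will fetch from Hugging Face.
        return repo_id
    path = Path(local_model_path).expanduser()
    if not path.is_dir():
        # Fail early instead of silently falling back to a network download.
        raise FileNotFoundError(f"local_model_path is not a directory: {path}")
    return str(path)


def load_model(repo_id: str, local_model_path: Optional[str] = None):
    """Load the model with transformers (imported lazily; must be installed)."""
    from transformers import AutoModel

    source = resolve_model_source(repo_id, local_model_path)
    # trust_remote_code is needed because the model ships its own modeling code.
    return AutoModel.from_pretrained(source, trust_remote_code=True)
```

Raising an error for a missing folder is a deliberate choice here: on an air-gapped network a silent fallback to the Hub would only fail later with a less obvious network error.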

[Method 1] Pull just the DeepSeek files out of the Hugging Face cache folder!

- Hugging Face cache path: C:\Users\<username>\.cache\huggingface

- The model folder should be inside the hub folder
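
To make the cache layout concrete, here is a minimal sketch that locates a model's snapshot folders inside the hub cache. It assumes the default cache location (no HF_HOME override) and the usual models--<org>--<name>/snapshots/<revision> layout; the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def hub_cache_dir(home: Optional[Path] = None) -> Path:
    """Default Hugging Face hub cache location under the user's home folder."""
    return (home or Path.home()) / ".cache" / "huggingface" / "hub"


def repo_cache_folder(repo_id: str) -> str:
    """Folder name the hub cache uses for a model repository."""
    return "models--" + repo_id.replace("/", "--")


def find_snapshots(repo_id: str, hub_dir: Path) -> List[Path]:
    """List the snapshot folders (one per downloaded revision) for a model."""
    snapshots = hub_dir / repo_cache_folder(repo_id) / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())
```

For example, find_snapshots("deepseek-ai/DeepSeek-OCR", hub_cache_dir()) lists the downloaded revisions. Note that for models loaded with trust_remote_code, transformers keeps the downloaded Python code in a separate modules folder next to hub, so the snapshot folder alone is not the whole picture.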

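[Limitation]

Sometimes extra folders named modules and transformers get created, as in the DeepSeek case.

My thought was: couldn't I just bundle these together and point the path at them?

=> Result: total failure

[Solution] Download everything into a single folder and point the path at it

From here,

like this!

Then use it like this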
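
A minimal sketch of that flow, under stated assumptions: download_model uses snapshot_download from huggingface_hub with local_dir so that all repository files land in one plain folder, check_model_dir is a hypothetical sanity check, and build_ocr_options assumes the DeepSeekOcrOptions class with the local_model_path field shown earlier in this post (it exists only in the modified library, not in upstream Docling). The target folder name is up to you.

```python
from pathlib import Path

REPO_ID = "deepseek-ai/DeepSeek-OCR"


def download_model(target_dir: str, repo_id: str = REPO_ID) -> str:
    """Download every file of the repository into one folder.

    Run this once on a machine with internet access, then copy the folder
    to the air-gapped machine. Requires huggingface_hub to be installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=target_dir)


def check_model_dir(model_dir: str) -> Path:
    """Fail early if the folder does not look like a downloaded model."""
    path = Path(model_dir)
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"config.json not found in {path}")
    return path


def build_ocr_options(model_dir: str):
    """Create OCR options that point at the local folder (modified Docling only)."""
    # Assumption: this class exists only in the modified library from this post.
    from docling.datamodel.pipeline_options import DeepSeekOcrOptions

    return DeepSeekOcrOptions(local_model_path=str(check_model_dir(model_dir)))
```

On the air-gapped side only check_model_dir and build_ocr_options are needed. Setting HF_HUB_OFFLINE=1 in the environment before Python starts is a common extra safeguard against accidental network calls.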
