[Python] Using pyannote + whisper

2024. 6. 4. 16:46 · Python

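[Why I looked into this]

1. I was using whisper to run STT (Speech To Text) on mp3 files uploaded from the web.

2. I wondered whether it would be nice to identify the speakers as well, so I looked into it.

3. The footprint is not very large and the code is not very long, so I decided to combine the two.

4. Fortunately, a library for this already existed.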

[Installing the libraries and getting ready]

1. pip install pyannote.audio

2. Open the two links below and accept the conditions on each page

https://huggingface.co/pyannote/segmentation-3.0

https://huggingface.co/pyannote/speaker-diarization-3.1

3. Create a personal access token (New Token -> give it any name you like)

hf.co/settings/tokens

4. Clone the pyannote-whisper repository

git clone https://github.com/yinruiqing/pyannote-whisper.git
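
For reference, here is a small sketch of how the token from step 3 and the repository cloned in step 4 could be wired into a script. The `HF_TOKEN` variable name, the `read_hf_token` and `add_repo_to_path` helpers, and the assumption that the repo was cloned into `./pyannote-whisper` are illustrative choices, not something either library requires.

```python
import os
import sys
from pathlib import Path


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face token from an environment variable.

    The variable name is an assumption; any name works as long as
    the script and the shell agree on it.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your Hugging Face access token")
    return token


def add_repo_to_path(repo_dir: str = "pyannote-whisper") -> Path:
    """Put the cloned repo (assumed location) at the front of sys.path."""
    repo = Path(repo_dir).resolve()
    if str(repo) not in sys.path:
        sys.path.insert(0, str(repo))
    return repo
```

The returned token can then be passed wherever the script expects the key, which keeps it out of the source file, and `pyannote_whisper` becomes importable once `add_repo_to_path()` has run.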

[Code]

import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text
from datetime import datetime

def log_time(label):
    # Return the label together with the current timestamp
    return f'{label}: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}'

# pyannote: load the diarization pipeline
# (replace "huggingface_primary_key" with the token issued in step 3)
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="huggingface_primary_key")

# Load the whisper model
model = whisper.load_model("small")
model_load_time = log_time('MODEL_LOAD_TIME')

# Run whisper
model_start_time = log_time('MODEL_START')
asr_result = model.transcribe("ko_s2.wav")
model_end_time = log_time('MODEL_END & PIPELINE_START')

# Run pyannote
diarization_result = pipeline("ko_s2.wav")
pipeline_end_time = log_time('PIPELINE_END')

# Merge the results
final_result = diarize_text(asr_result, diarization_result)
merge_time = log_time('MERGE')

# Save to a txt file
with open("Result_test.txt", "w", encoding="utf-8") as f:
    # Write the timing information
    f.write(f'[INFO]\n{model_load_time}\n{model_start_time}\n{model_end_time}\n{pipeline_end_time}\n{merge_time}\n')

    # STT + speaker diarization results
    for seg, spk, sent in final_result:
        f.write(f'[{seg.start:.2f} --> {seg.end:.2f}] {spk} : {sent}\n')
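
To make the merge step less of a black box, here is a simplified sketch of the general idea behind combining the two results: each transcribed segment gets the speaker whose turns overlap it the most. This is not the actual `diarize_text` implementation. `assign_speakers` and the tuple layouts are hypothetical and only illustrate the overlap rule.

```python
from collections import defaultdict


def assign_speakers(asr_segments, speaker_turns):
    """Label each ASR segment with the speaker that overlaps it the most.

    asr_segments:  list of (start, end, text) tuples   (assumed layout)
    speaker_turns: list of (start, end, speaker) tuples (assumed layout)
    Returns a list of (start, end, speaker, text); speaker is None
    when no diarization turn overlaps the segment.
    """
    labeled = []
    for seg_start, seg_end, text in asr_segments:
        overlap_by_speaker = defaultdict(float)
        for turn_start, turn_end, speaker in speaker_turns:
            # Length of the intersection of the two time intervals
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > 0:
                overlap_by_speaker[speaker] += overlap
        if overlap_by_speaker:
            best = max(overlap_by_speaker, key=overlap_by_speaker.get)
        else:
            best = None
        labeled.append((seg_start, seg_end, best, text))
    return labeled


if __name__ == "__main__":
    asr = [(0.0, 2.0, "hello"), (2.0, 5.0, "hi there")]
    turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 5.0, "SPEAKER_01")]
    for start, end, spk, text in assign_speakers(asr, turns):
        print(f"[{start:.2f} --> {end:.2f}] {spk} : {text}")
```

The second segment overlaps both speakers, and it goes to the one covering more of it. A segment that spans a real speaker change is still forced onto a single label, which is one limitation of merging at segment level.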

Result

[Things to think about]

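With whisper alone, processing took about half the length of the mp3 file.

Adding speaker diarization (pyannote) increases the processing time by however long the diarization itself takes.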
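
To compare these costs, it may help to measure each stage as elapsed seconds and divide by the audio length. The sketch below is one way to do that. `timed` and `real_time_factor` are hypothetical helpers, and the 30 s / 60 s figures are made-up numbers that simply mirror the "half the file length" observation.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Store the elapsed seconds of the enclosed block in timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def real_time_factor(elapsed_seconds, audio_seconds):
    """Processing time divided by audio length (0.5 = half the file length)."""
    return elapsed_seconds / audio_seconds


if __name__ == "__main__":
    timings = {}
    with timed("example_stage", timings):
        time.sleep(0.05)  # stand-in for model.transcribe(...) or pipeline(...)
    print(timings)
    # Made-up numbers: 30 s of processing for a 60 s file
    print(real_time_factor(30.0, 60.0))
```

Wrapping the whisper call and the pyannote call in separate `timed` blocks would show how much each one contributes to the total.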

I could shorten the processing time, or handle requests with threads or a queue, but if the number of users grows, the AWS server probably won't be able to keep up...
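
One possible shape for the queue idea: a single worker thread takes jobs off a bounded queue, so only one audio file is processed at a time no matter how many requests arrive. `start_worker`, the `(job_id, path)` job format, and the `process` callback (which would wrap the whisper + pyannote calls) are assumptions for illustration.

```python
import queue
import threading


def start_worker(process, job_queue, results):
    """Start one background thread that handles queued jobs one at a time.

    process:   callable taking an audio path (hypothetical wrapper around
               the transcription + diarization steps)
    job_queue: queue.Queue of (job_id, path) tuples; None stops the worker
    results:   dict filled with job_id -> result (or the raised exception)
    """
    def run():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: shut the worker down
                job_queue.task_done()
                break
            job_id, path = job
            try:
                results[job_id] = process(path)
            except Exception as exc:  # keep the worker alive on bad input
                results[job_id] = exc
            finally:
                job_queue.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    jobs = queue.Queue(maxsize=8)  # bounded, so producers block when full
    results = {}
    worker = start_worker(lambda path: f"processed {path}", jobs, results)
    jobs.put((1, "a.mp3"))
    jobs.put((2, "b.mp3"))
    jobs.put(None)
    jobs.join()
    print(results)
```

This caps memory and CPU use on the server rather than making any single file faster, so waiting time still grows with the number of users.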
