[sLLM Local Project] 3. Analyzing Meta's LLM Guidelines Through Notebook llama (feat. How Meta Intends llama to Be Used)

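* As with the last post, I had help from o1-preview here, but this is not a macro-generated post.

* This post is a record of running the Step-1 code from Notebook llama, so it runs a bit long.

* Feel free to read only the conclusion in section 5 (5. Implications of Notebook llama and code execution results).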

1. What is Notebook llama?

Meta has announced Notebook llama. Or rather, I'm not sure whether to call it an announcement or a share: they released something called Notebook llama on GitHub. The GitHub link is below.

https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/NotebookLlama/README.md


The official Notebook llama website is below, but there isn't much on it.

https://www.notebookllama.ai/


There are two main keywords: llama and podcast. It is a shared guide on using llama to produce podcasts.

Looking further into the GitHub repo, Notebook llama is organized as follows.

llama-recipes > recipes > quickstart > Notebookllama Files

Looking further into the README.md file, there is the following diagram.

llama-recipes > recipes > quickstart > Notebookllama > README.md

Step 1) PDF pre-processing. llama turns the pdf file into clean txt. (using llama3.2:1B)

Step 2) Transcript writing. llama writes a podcast transcript from the txt file. (this time using llama3.1:70B)

Step 3) Transcript update. The transcript is rewritten to be "More Dramatic". (llama3.1:8B)

Step 4) Text to audio. (Step 4 uses parler-tts and bark/suno rather than llama.)

In short, it is a guide, built on local LLMs, for automatically producing a podcast (both audio and transcript) from a PDF on some topic. Realistically, Step 1 is the one I can try myself. Because it is called Notebook llama, I assumed it was a llama that runs on a laptop, but that seems to be true only of Step 1. 😢

2. A look at Notebook llama - Step 1

On GitHub this step is the file Step-1 PDF-Pre-Processing-Logic.ipynb. Since it is Python, let's start with the libraries.

import PyPDF2
from typing import Optional
import os
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

from tqdm.notebook import tqdm
import warnings

First, there's PyPDF2, which I covered in part 2 of this project. The code uses PyPDF2 to read PDFs,
Optional for type hints in Python,
os for working with files on the operating system,
torch for running the deep learning model,
tqdm for progress bars in a Jupyter notebook (*added: I later switched to from tqdm import tqdm),
and warnings for handling Python warnings.

The libraries I'm less familiar with are

Hugging Face's Accelerator, AutoModelForCausalLM, and AutoTokenizer.

Searching for the code that uses Accelerator and transformers (AutoModelForCausalLM, AutoTokenizer), we find:

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_MODEL,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    device_map=device,
)
tokenizer = AutoTokenizer.from_pretrained(DEFAULT_MODEL, use_safetensors=True)
model, tokenizer = accelerator.prepare(model, tokenizer)

This is where those libraries are used.

For Accelerator, see https://huggingface.co/docs/accelerate/en/package_reference/accelerator

For AutoModelForCausalLM and AutoTokenizer, see https://huggingface.co/docs/transformers/en/model_doc/auto

I haven't used Hugging Face much, so I'll start by signing up for it.

3. Using llama3.2:1B from Hugging Face

Running the code from GitHub as-is produces the following error.

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
File c:\Python311\Lib\site-packages\huggingface_hub\utils\_http.py:406, in hf_raise_for_status(response, endpoint_name)
    405 try:
--> 406     response.raise_for_status()
    407 except HTTPError as e:

File c:\Python311\Lib\site-packages\requests\models.py:1024, in Response.raise_for_status(self)
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/resolve/main/config.json

The above exception was the direct cause of the following exception:

GatedRepoError                            Traceback (most recent call last)
File c:\Python311\Lib\site-packages\transformers\utils\hub.py:389, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    387 try:
    388     # Load from URL or cache if already cached
--> 389     resolved_file = hf_hub_download(
    390         path_or_repo_id,
    391         filename,
    392         subfolder=None if len(subfolder) == 0 else subfolder,
    393         repo_type=repo_type,
    394         revision=revision,
...
    414         "`token=<your_token>`"
    415     ) from e

OSError: You are trying to access a gated repo.
Make sure to request access at https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct and pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`.
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

Sign up for Hugging Face and look up the Llama 3.2 1B model we're going to use.

https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct


Agreement sign before llama3.2:1B Hugging face Screen

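Click Expand to review and access, scroll down, and fill in the Agreement with your name, date of birth, and so on.

After you click Submit, a message says that you will be granted access under the Agreement.

Under Settings - Gated Repositories in your account, you can check the approval status for the repo.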

https://huggingface.co/settings/gated-repos


Now let's create a personal access token. Do not share your token with anyone! (Under Permissions, I checked only Read for Repositories.)

https://huggingface.co/settings/tokens


Adding the token and running the code may well work as-is, but I modified the code as follows.

+ 4. Debugging

I ran into two main errors.

1. Hugging Face token configuration error

[Original code]

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_MODEL,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    device_map=device,
)
tokenizer = AutoTokenizer.from_pretrained(DEFAULT_MODEL, use_safetensors=True)
model, tokenizer = accelerator.prepare(model, tokenizer)

[Updated code]

token = 'paste your generated token here'
accelerator = Accelerator()  # The Accelerator must be created first for the code below to be recognized.

model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_MODEL,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    device_map=device,
    token=token,  # `token` replaces the deprecated `use_auth_token` argument
)
tokenizer = AutoTokenizer.from_pretrained(
    DEFAULT_MODEL,
    use_safetensors=True,
    token=token,
)

model, tokenizer = accelerator.prepare(model, tokenizer)
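Since the token should never be shared, one option is to keep it out of the source file altogether. The following is only a minimal sketch, not part of the Meta notebook: it assumes the token has been exported in an environment variable named HF_TOKEN, and the helper name load_hf_token is hypothetical.

```python
import os


def load_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the Hugging Face token stored in an environment variable.

    Raises a clear error instead of silently passing an empty token.
    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Hugging Face token."
        )
    return token


# Hypothetical usage: call load_hf_token() once and pass the returned value
# wherever the model and tokenizer loading code expects the token.
```

With this approach the notebook or script can be shared without exposing the token.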

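2. Path configuration error

[Updated code]

base_dir = os.path.dirname(os.path.abspath(__file__))  # directory of the current script (__file__ is only defined when run as a .py script, not in a notebook cell)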
resources_dir = os.path.join(base_dir, 'resources')

# pdf_path = './resources/2402.13116v3.pdf'
pdf_filename = '2402.13116v4_small.pdf'
pdf_path = os.path.join(resources_dir, pdf_filename)
DEFAULT_MODEL = "meta-llama/Llama-3.2-1B-Instruct"
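Because __file__ is not defined inside a Jupyter notebook cell, the path code above only works when it is run as a script. As a rough sketch under that assumption, the helpers below (resolve_base_dir and resource_path are hypothetical names, not from the Meta notebook) fall back to the current working directory when no script path is given.

```python
import os


def resolve_base_dir(script_file=None):
    """Return the directory that resource paths are resolved against.

    Pass __file__ when running as a .py script; leave it as None in a
    notebook, where the current working directory is used instead.
    """
    if script_file:
        return os.path.dirname(os.path.abspath(script_file))
    return os.getcwd()


def resource_path(filename, script_file=None, resources_dirname="resources"):
    """Build the full path to a file inside the resources folder."""
    return os.path.join(resolve_base_dir(script_file), resources_dirname, filename)


# In a script:   pdf_path = resource_path("2402.13116v4_small.pdf", __file__)
# In a notebook: pdf_path = resource_path("2402.13116v4_small.pdf")
```

In a notebook this assumes the kernel was started in the folder that contains resources.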

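I believe it ran correctly once I resolved the issues above. The pdf file on GitHub is heavy, though, so for testing try keeping only page 1.

5. Implications of Notebook llama and code execution results

So where can this be used, and what does it mean?

I think its main significance is as an official guide, with official prompts, in which Meta shows how to use llama3.2:1B and its other models.

Let's look at the prompt.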

SYS_PROMPT = """
You are a world class text pre-processor, here is the raw data from a PDF, please parse and return it in a way that is crispy and usable to send to a podcast writer.
The raw data is messed up with new lines, Latex math and you will see fluff that we can remove completely. Basically take away any details that you think might be useless in a podcast author's transcript.
Remember, the podcast could be on any topic whatsoever so the issues listed above are not exhaustive
Please be smart with what you remove and be creative ok?
Remember DO NOT START SUMMARIZING THIS, YOU ARE ONLY CLEANING UP THE TEXT AND RE-WRITING WHEN NEEDED
Be very smart and aggressive with removing details, you will get a running portion of the text and keep returning the processed text.
PLEASE DO NOT ADD MARKDOWN FORMATTING, STOP ADDING SPECIAL CHARACTERS THAT MARKDOWN CAPATILISATION ETC LIKES
ALWAYS start your response directly with processed text and NO ACKNOWLEDGEMENTS about my questions ok?
Here is the text:
"""

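It opens right away with "You are a world class text pre-processor".

This is interesting for more than the fact that it flatters the LLM. I had assumed that, logically, flowery language and adjectives are unnecessary when instructing an LLM, yet the official GitHub guide includes this phrase. I think that points to two things.

1) It assigns the role of text pre-processor and, at the same time, specifies the best-performing text pre-processor.

2) Given how the llama models are designed, specific or additional definitions, mentions, and wording are at least not harmful to the LLM's performance.

Above all, even when an sLLM like llama3.2:1B is instructed with a prose-style prompt, a longer prompt may actually help it follow the instructions.

What about the settings that control how much the model's output varies? There are several ways to constrain an LLM's output, but the two most common are temperature and top_p. Let's look at the code.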

def process_chunk(text_chunk, chunk_num):
    """Process a chunk of text and return both input and output for verification"""
    conversation = [
        {"role": "system", "content": SYS_PROMPT},
        {"role": "user", "content": text_chunk},
    ]
    
    prompt = tokenizer.apply_chat_template(conversation, tokenize=False)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    
    with torch.no_grad():
        output = model.generate(
            **inputs,
            temperature=0.7,
            top_p=0.9,
            max_new_tokens=512
        )

temperature is 0.7 and top_p is 0.9. I can't say these are always the optimal values, but they can at least be read as reasonable values for running the bundled prompt. max_new_tokens is also set to 512.
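To make the two settings concrete, here is a toy sketch of what temperature and top_p do to a next-token distribution. It is an illustration in plain Python with made-up scores, not the implementation inside transformers, and the function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose total probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalised survivors


# Toy scores for four candidate tokens (made-up numbers).
logits = [2.0, 1.0, 0.5, -1.0]
probs = apply_temperature(logits, temperature=0.7)
candidates = top_p_filter(probs, top_p=0.9)
```

With these made-up scores only the two highest-scoring tokens survive the top_p cut, and sampling then happens among those survivors.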

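Where can this example code be used?

Through this code, Meta seems to be saying that llama3.2:1B can at least serve as a so-called 'text processor' that cleanly extracts txt from a PDF. In my previous post I tried to use it as a PDF summarizer, and this can also be read as saying that it is an sLLM capable of about that much work.

However, the fact that llama3.2:1B is used only in Step-1, and not in Step-2, Step-3, or Step-4, also says that the number of parameters needed depends on the role the user wants llama to play. My main question was when, and for which requirements, the model size should differ, and at minimum it can be split as follows.

Step 1) PDF pre-processing - llama3.2:1B

Step 2) Generating/writing a fair amount of text - llama3.1:70B

Step 3) Updating/revising text - llama3.1:8B

Step 4) Converting text to audio - parler-tts and bark/suno // this is not a job for an LLM.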
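The split above can also be written down as a small lookup table. This is just a sketch for illustration: the PipelineStep class and the model_for helper are hypothetical names, and the model strings are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineStep:
    number: int
    task: str
    model: str
    is_llm: bool


# Models as listed in the steps above.
NOTEBOOK_LLAMA_STEPS = [
    PipelineStep(1, "PDF pre-processing", "llama3.2:1B", True),
    PipelineStep(2, "text generation/writing", "llama3.1:70B", True),
    PipelineStep(3, "text update/revision", "llama3.1:8B", True),
    PipelineStep(4, "text to audio", "parler-tts and bark/suno", False),
]


def model_for(step_number):
    """Look up which model a given step uses."""
    for step in NOTEBOOK_LLAMA_STEPS:
        if step.number == step_number:
            return step.model
    raise KeyError(f"unknown step: {step_number}")
```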

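So this code, which plays the Step 1 role, is an LLM program that amounts to:

1) take a PDF,

2) make it suitable to hand to a podcast writer,

3) clean up the text and return it,

4) but do not include any reply to the prompt itself in the text.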

+ Here is the result of running the code with only page 1 of the pdf file kept.

Preview of final processed text:

BEGINNING:
Tao Shen4, Reynold Cheng1, Jinyang Li1, Can Xu5, Dacheng Tao6, Tianyi Zhou2

1The University of Hong Kong2University of Maryland3Microsoft 4University of Technology Sydney5Peking University6The University of Sydney shawnxxh,chongyangtao,hishentao }@gmail.com {minglii,tianyi }@umd.edu ckcheng@cs.hku.hk   
utility in model compression and self-improvement
Our survey is meticulously structured around three foundational pillars: algorithm, skill, and verticalization    
providing a comprehensive examination of knowledge discovery mechanisms, the enhancement of specific cognitive abilities
and their practical implications across diverse fields
Crucially, the survey navigates the interaction between data augmentation and knowledge discovery
illustrating how data augmentation emerges as a powerful paradigm within the knowledge discovery framework        
to bolster large language models' performance
By leveraging data augmentation to generate context-rich, skill-specific training data
antly, we firmly advocate for compliance with the legal terms that regulate the use of LLMs, ensuring ethical and lawful application of knowledge distillation
ilities. These advancements lie in their emergent abilities, where models display capabilities beyond explicit training objectives, enabling diverse tasks with remarkable proficiency. They excel in understanding and generation, driving applications from creative to complex problem-soantly, we firmly advocate for compliance with the legal terms that regulate the use of LLMs, ensuring ethical and lawful application of knowledge distillation
ilities. These advancements lie in their emergent abilities, where models display capabilities beyond explicit training objectives, enabling diverse tasks with remarkable proficiency. They excel in understanding and generation, driving applications from creative to complex problem-solving.
s, particularly in light of the advantages offered by open-source models. A significant drawback is their limited accessibility and higher cost, making them inaccessible to individuals and smaller organizations. In terms of data privacy and security, using these proprietary models frequently entails sending sensitive data to external servers, raising concerns about data privacy and security.
ir proprietary nature, which can limit their use in certain applications. In contrast, open-source models like LLaMA and Mistral offer several advantages.

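6. Further personal notes

The pdf file provided on GitHub is about the length of a research paper, yet processing just page 1 took an hour.
It would be worth checking why going through Hugging Face was so slow: whether the Hugging Face side was the bottleneck, or whether my local computer was too slow at splitting and processing the text chunks. (Since from_pretrained downloads the weights and runs the model on the local machine, the local hardware is the more likely cause.)
If I adapt the code above to run on ollama, I could probably turn it into the PDF translation summarizer I want just by changing the prompt a little.
- >> I later ran it on ollama as well. Notes:
- A 1-page pdf takes about 5 minutes on an Intel laptop with 16GB RAM. Running the llama model through the Hugging Face code does seem to take much longer.
- I used temperature 0.7, top_p 0.9, and max tokens 512. (The Hugging Face code calls this max_new_tokens.)
- The Hugging Face model produced output that looked like a podcast transcript, as in the BEGINNING part above.
- The ollama output, however, did not look much like a transcript. Even within AI, LLMs are hard.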

A Survey on Knowledge Distillation of Large Language Models 
Xiaohan Xu1, Ming Li2, Chongyang Tao3, Tao Shen4, Reynold Cheng1, Jinyang Li1, Can Xu5, Dacheng Tao6, Tianyi Zhou2 

The University of Hong Kong2University of Maryland3Microsoft 4University of Technology Sydney5Peking University6The University of Sydney {shawnxxh,chongyangtao,hishentao }@gmail.com {minglii,tianyi }@umd.edu ckcheng@cs.hku.hk Abstract —In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT -4, to their open-source counterparts like LLaMA and Mistral. Additionally, as open-source LLMs flourish, KD plays a crucial role in both compressing these models, and facilitating their self-improvement by employing themselves as teachers. This paper presents a comprehensive survey of KD's role within the realm of LLM, highlighting its critical function in imparting knowledge to large language models
advanced knowledge to smaller models and its utility in model compression and self-improvement

our survey is meticulously structured around three foundational pillars algorithm skill and verticalization providing a comprehensive examination of knowledge mechanisms the enhancement of specific cognitive abilities and their practical implications across diverse fields
crucially the survey navigates the interaction between data augmentation and knowledge domains illustrating how data augmentation emerges as a powerful paradigm within the knowledge framework to bolster large language models' performance by leveraging data augmentation to generate context-rich skill-specific training data
kd transcends traditional boundaries enabling open-source models to approximate contextual adeptness ethical alignment and deep semantic insights characteristic of their proprietary counterparts
distillation and proposing future research directions. By bridging the gap between proprietary and open-source Large Language Models (LLMs), this survey underscores the potential for more accessible, efficient, and powerful AI solutions. Most importantly, we firmly advocate for compliance with the legal terms that regulate the use of Large Language Models (LLMs), ensuring ethical and lawful application of knowledge distilled from these models. An associated Github repository is available at https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs.

...

kept technical terms (LLMs, models, parameters) and removed unnecessary phrases (e.g., "rich knowledge" in #2)

reorganized sentence structure for clarity
removed repeated words (e.g., "capabilities", "proficiency") to improve readability
redefine our interaction with technology despite remarkable capabilities of proprietary LLMs like GPT4 and Gemini they are not without shortcomings particularly when viewed in light of advantages offered by open-source models
a significant drawback is their limited accessibility higher cost (OpenAI et al 2023) these proprietary models often come with substantial usage fees and restricted access making them less attainable for individuals and smaller organizations
in terms of data privacy security using these proprietary LLMs frequently entails sending sensitive data to external servers which raises concerns about data privacy and security
this aspect is especially critical for users handling confidential information moreover the general purpose design of proprietary LLMs while powerful may not always align with specific needs of niche applications
constraints of accessibility cost and adaptability thus
present significant challenges in leveraging the full potential of proprietary LLMs
in contrast to proprietary LLMs open-source models like llaMA and mistral bring several notable advantages
one of the primary benefits being a more transparent and publicly available architecture 
this shift towards open-source model development has led to increased collaboration and community-driven improvement
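The ollama code itself was not included in this post. As a rough sketch of how the ollama run described in section 6 could be set up, the following builds a request for ollama's /api/chat endpoint with the same temperature and top_p, using num_predict as the counterpart of max_new_tokens. The model tag llama3.2:1b, the default local URL, and the function names are assumptions for illustration.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint


def build_chat_payload(system_prompt, text_chunk, model="llama3.2:1b"):
    """Build an ollama /api/chat request mirroring the settings in this post."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text_chunk},
        ],
        "stream": False,  # ask for one complete JSON response
        "options": {
            "temperature": 0.7,
            "top_p": 0.9,
            "num_predict": 512,  # ollama's cap on generated tokens
        },
    }


def clean_chunk(system_prompt, text_chunk, url=OLLAMA_CHAT_URL):
    """Send one chunk to a running ollama server and return the cleaned text."""
    body = json.dumps(build_chat_payload(system_prompt, text_chunk)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

Calling clean_chunk(SYS_PROMPT, text_chunk) requires an ollama server running locally with the model already pulled.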
