
i've made these changes, however ...

diff --git a/app/config.py b/app/config.py
index 51356a0..4c1f224 100644
--- a/app/config.py
+++ b/app/config.py
@@ -24,6 +24,7 @@ class LLMSettings(BaseModel):
         None,
         description="Maximum input tokens to use across all requests (None for unlimited)",
     )
+    hf_tokenizer_id: Optional[str] = Field(None, description="HuggingFace model_id to apply the chat template locally.")
     temperature: float = Field(1.0, description="Sampling temperature")
     api_type: str = Field(..., description="AzureOpenai or Openai")
     api_version: str = Field(..., description="Azure Openai version if AzureOpenai")
diff --git a/app/llm.py b/app/llm.py
index 18a13af..c15a7e4 100644
--- a/app/llm.py
+++ b/app/llm.py
@@ -71,6 +71,11 @@ class LLM:
         except KeyError:
             # If the model is not in tiktoken's presets, use cl100k_base as default
             self.tokenizer = tiktoken.get_encoding("cl100k_base")
+        if llm_config.hf_tokenizer_id is not None:
+            import transformers
+            self.hf_tokenizer = transformers.AutoTokenizer.from_pretrained(llm_config.hf_tokenizer_id)
+        else:
+            self.hf_tokenizer = None

         if self.api_type == "azure":
             self.client = AsyncAzureOpenAI(
@@ -252,6 +257,12 @@ class LLM:
                 "model": self.model,
                 "messages": messages,
             }
+            if self.hf_tokenizer is None:
+                params["messages"] = messages
+                client_api_completions = self.client.chat.completions
+            else:
+                params["prompt"] = self.hf_tokenizer.apply_chat_template(messages, tokenize=False)
+                client_api_completions = self.client.completions

             if self.model in REASONING_MODELS:
                 params["max_completion_tokens"] = self.max_tokens
@@ -265,7 +276,7 @@
                 # Non-streaming request
                 params["stream"] = False
-                response = await self.client.chat.completions.create(**params)
+                response = await client_api_completions.create(**params)

                 if not response.choices or not response.choices[0].message.content:
                     raise ValueError("Empty or invalid response from LLM")
@@ -279,7 +290,7 @@
             self.update_token_count(input_tokens)

             params["stream"] = True
-            response = await self.client.chat.completions.create(**params)
+            response = await client_api_completions.create(**params)

i don't expect them to work. i expect the two api endpoints to be too different, and there are maybe a couple of other concerns. i'm thinking it makes sense to configure an endpoint and run it; that gives a direct look at the issue without having to cross-reference things (rough probe script at the bottom of these notes).

1622 the last working api i used is hardcoded in one of the zinc binaries. i'd pull it out of there and put it in openmanus

1629 ummmm ummm ummm i'm out of space to install all its gpu dependencies for local evaluation. gotta disable those.

1650
(Pdb) print(params["prompt"])
<|begin▁of▁sentence|>You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, or web browsing, you can handle it all.<|User|>hi<|User|>You can interact with the computer using PythonExecute, save important content and information files through FileSaver, open browsers with BrowserUseTool, and retrieve information using GoogleSearch.
PythonExecute: Execute Python code to interact with the computer system, data processing, automation tasks, etc.
FileSaver: Save files locally, such as txt, py, html, etc.
BrowserUseTool: Open, browse, and use web browsers.If you open a local HTML file, you must provide the absolute path to the file.
WebSearch: Perform web information retrieval
Terminate: End the current interaction when the task is complete or when you need additional information from the user. Use this tool to signal that you've finished addressing the user's request or need clarification before proceeding further.
Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps. Always maintain a helpful, informative tone throughout the interaction. If you encounter any limitations or need more details, clearly communicate this to the user before terminating. <|Assistant|>
.......................

1840 well, i'm not finding any information on the tool calling formats for deepseek. their public api has tool calling, with a caveat that it doesn't work well, and a handful of libraries have patched it on with prompt tuning, but it's not supported directly

my sambanova 405b access is limited despite being temporarily free. there's some interest in figuring out deepseek's tool calling prompt by probing their api, but maybe not the thing atm

oops! _would need a free tool calling model to make most task agents plug-and-play!_

now maybe the quickest way to work around this would be to look at what format the system expects, and just tell a model to output in that format
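
roughly what i have in mind, as an untested sketch: wrap the tool schemas and an output-format instruction into the system prompt, then parse the model's plain-text reply back into openai-style tool_call dicts for the agent loop. the <tool_call> tag convention and the helper names here are my own invention, not anything deepseek or openmanus defines.

import json
import re
import uuid

# hypothetical tag format i'm choosing here, not something deepseek documents
TOOL_CALL_INSTRUCTIONS = (
    "When you want to use a tool, reply with exactly one block of the form:\n"
    "<tool_call>\n"
    '{"name": "<tool name>", "arguments": {...}}\n'
    "</tool_call>\n"
    "Otherwise reply normally in plain text."
)

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def inject_tool_instructions(messages: list[dict], tools: list[dict]) -> list[dict]:
    """Append the tool schemas plus the output-format instruction to the system message.

    `tools` is assumed to be the openai-style list of {"type": "function", "function": {...}} dicts.
    """
    tool_descriptions = json.dumps([t["function"] for t in tools], indent=2)
    extra = f"\n\nAvailable tools:\n{tool_descriptions}\n\n{TOOL_CALL_INSTRUCTIONS}"
    patched = [dict(m) for m in messages]
    if patched and patched[0].get("role") == "system":
        patched[0]["content"] = (patched[0].get("content") or "") + extra
    else:
        patched.insert(0, {"role": "system", "content": extra.strip()})
    return patched


def parse_tool_calls(text: str) -> list[dict]:
    """Turn <tool_call> blocks in a plain completion into openai-style tool_call dicts."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        payload = json.loads(match.group(1))
        calls.append(
            {
                "id": f"call_{uuid.uuid4().hex[:8]}",
                "type": "function",
                "function": {
                    "name": payload["name"],
                    # openai-style tool calls carry arguments as a json string
                    "arguments": json.dumps(payload.get("arguments", {})),
                },
            }
        )
    return calls

something like inject_tool_instructions() would run over messages before apply_chat_template, and the tool path would run parse_tool_calls() over the completion text to fake the tool_calls field. whether a model that isn't tool-tuned follows the format reliably is the open question.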
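
and re: "configure an endpoint and run it" from earlier, this little standalone probe is roughly what i mean. base_url, the api key, and the model id are placeholders for whatever server and hf_tokenizer_id i end up pointing it at; the main difference from the chat path is that the completions endpoint takes a prompt string (and will probably choke on a leftover "messages" key) and hands back choices[0].text rather than choices[0].message.content.

# untested probe script, placeholders marked: render the chat locally with the hf
# tokenizer, hit a plain completions endpoint, and eyeball what comes back.
import asyncio

from openai import AsyncOpenAI
from transformers import AutoTokenizer

BASE_URL = "http://localhost:8000/v1"  # placeholder: whatever server i stand up
MODEL_ID = "deepseek-ai/DeepSeek-V3"   # placeholder: whatever hf_tokenizer_id i configure


async def main():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    client = AsyncOpenAI(base_url=BASE_URL, api_key="EMPTY")

    messages = [
        {"role": "system", "content": "You are OpenManus, an all-capable AI assistant."},
        {"role": "user", "content": "hi"},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)  # same thing as the pdb dump above, rendered outside openmanus

    response = await client.completions.create(
        model=MODEL_ID, prompt=prompt, max_tokens=256, temperature=0.0
    )
    # completions gives .text; chat.completions gives .message.content
    print(response.choices[0].text)


asyncio.run(main())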