also, the --doc mode of or2.py is really useful: it grounds the model on one or more files or URLs, like this:
python3 or2.py --stream --doc file_one.txt --doc https://url2.html
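under the hood there's nothing magic about --doc; a rough sketch of the idea (this is not the actual or2.py source, the function names are mine): each --doc argument gets read from disk, or fetched if it looks like a URL, and packed into the grounding system prompt as a numbered document, so the <co:N> citations have something to point at.

import urllib.request

GROUNDING_PROMPT = (
    "# You are a conversational AI assistant that is provided a list of "
    "documents and a user query to answer based on information from the "
    "documents. You should always use grounded information in your "
    "responses, only answering from what you can cite in the documents. "
    "Cite all facts from the documents using <co: doc_id></co> tags."
)

def load_doc(source):
    # fetch URLs; read anything else as a local file
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            return resp.read().decode("utf-8", errors="replace")
    with open(source) as f:
        return f.read()

def build_system_prompt(doc_sources):
    # number the documents so <co:doc_id> citations can refer back to them
    parts = [GROUNDING_PROMPT]
    for doc_id, source in enumerate(doc_sources):
        parts.append(f"## document {doc_id}: {source}\n{load_doc(source)}")
    return "\n\n".join(parts)

anyway, here's a real session: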
$ python3 or2.py --stream --doc /etc/fstab
{'model': 'nousresearch/hermes-3-llama-3.1-405b:free', 'max_tokens': 1024, 'temperature': 0.0, 'top_p': 0.0, 'top_k': 1, 'min_p': 1.0, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'repetition_penalty': 0.0}
# You are a conversational AI assistant that is provided a list of documents and a user query to answer based on information from the documents. You should always use grounded information in your responses, only answering from what you can cite in the documents. Cite all facts from the documents using <co: doc_id></co> tags.
this is my /etc/fstab file what do you think of it
Based on the provided /etc/fstab file, here are a few observations:
- The root filesystem is mounted on <co:0>/dev/mapper/rhel-root</co> using the xfs filesystem with default options and a systemd device timeout of 0.
- The boot partition is mounted on <co:0>/boot</co> using the ext4 filesystem with default options, and is identified by UUID <co:0>2e70de21-0f23-407d-8fc1-b3cadc8ca7f8</co>.
- There is a swap partition mounted on <co:0>/dev/mapper/rhel-swap</co> with default options and a systemd device timeout of 0.
- An NTFS Windows partition is mounted read-only at <co:0>/mnt/Windows</co> using the UUID <co:0>38202E12202DD7A0</co>.
- The /media/sd directory is mounted using the UUID <co:0>d33e4d1c-9ede-4849-8eea-45bf638d366e</co> with default options and the nofail option to prevent errors from halting the boot process.
- Several other entries are commented out, including some network mounts and additional local partitions or image files.
Overall, the fstab file looks reasonably configured, with a mix of local and remote filesystems, and appropriate use of UUIDs to identify partitions. The nofail option is used in a few places to ensure the system boots even if those filesystems have issues. </INNER_MONOLOGUE> Cited Documents: 0
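(side note: those <co:N></co> tags are easy to pull out mechanically if you want to post-process the answers. a rough sketch; the regex is mine, not anything from or2.py:)

import re

CO_TAG = re.compile(r"<co:\s*(\d+)>(.*?)</co>", re.DOTALL)

def extract_citations(text):
    # returns [(doc_id, cited_text), ...] in order of appearance
    return [(int(doc_id), cited) for doc_id, cited in CO_TAG.findall(text)]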
log saved to 2024-12-04T18:52:00.219245.json

[user@localhost 2024-11-22-2]$ python3 or2.py --stream --doc /etc/fstab --doc 2024-12-04T18\:52\:00.219245.json
{'model': 'nousresearch/hermes-3-llama-3.1-405b:free', 'max_tokens': 1024, 'temperature': 0.0, 'top_p': 0.0, 'top_k': 1, 'min_p': 1.0, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'repetition_penalty': 0.0}
# You are a conversational AI assistant that is provided a list of documents and a user query to answer based on information from the documents. You should always use grounded information in your responses, only answering from what you can cite in the documents. Cite all facts from the documents using <co: doc_id></co> tags.
this is my /etc/fstab file and a json log of a conversation with you regarding it. what do you think of these documents?
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
The provided documents include the contents of an /etc/fstab file and a JSON log of a conversation where the assistant analyzes the fstab file. The fstab file shows the configured filesystems on a Linux system, including the root filesystem on <co:0>/dev/mapper/rhel-root</co>, a boot partition mounted on <co:0>/boot</co>, and a swap partition on <co:0>/dev/mapper/rhel-swap</co>. It also has a read-only NTFS mount for a Windows partition and an entry for <co:0>/media/sd</co> using a UUID. The JSON log shows a conversation where the user asks the assistant to analyze the provided fstab file. The assistant does so and provides details on each of the mounted filesystems, noting the use of UUIDs, the nofail option, and that some entries are commented out. The assistant concludes the fstab file looks reasonably configured. Cited Documents: 0 The answer is: 0
log saved to 2024-12-04T18:54:02.998461.json

[user@localhost 2024-11-22-2]$ python3 or2.py --stream --doc /etc/fstab --doc 2024-12-04T18\:52\:00.219245.json --doc 2024-12-04T18\:54\:02.998461.json
{'model': 'nousresearch/hermes-3-llama-3.1-405b:free', 'max_tokens': 1024, 'temperature': 0.0, 'top_p': 0.0, 'top_k': 1, 'min_p': 1.0, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'repetition_penalty': 0.0}
# You are a conversational AI assistant that is provided a list of documents and a user query to answer based on information from the documents. You should always use grounded information in your responses, only answering from what you can cite in the documents. Cite all facts from the documents using <co: doc_id></co> tags.
Here is an /etc/fstab file and two json conversation logs regarding the file and the logs. Note that in the conversations, the agent responds with some strange matter at the end, despite the temperature of the model being set to 0. Does this look like a misbehavior to you?
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'Prompt tokens limit exceeded: 4911 > 1483. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 680. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
The provided documents include the contents of an /etc/fstab file and a JSON log of a conversation where the assistant analyzes the fstab file. The fstab file shows the configured filesystems on a Linux system, including the root filesystem on <co:0>/dev/mapper/rhel-root</co>, a boot partition mounted on <co:0>/boot</co>, and a swap partition on <co:0>/dev/mapper/rhel-swap</co>. It also has a read-only NTFS mount for a Windows partition and an entry for <co:0>/media/sd</co> using a UUID. The JSON log shows a conversation where the user asks the assistant to analyze the provided fstab file. The assistant does so and provides details on each of the mounted filesystems, noting the use of UUIDs, the nofail option, and that some entries are commented out. The assistant concludes the fstab file looks reasonably configured. Cited Documents: 0 The answer is: 0
But do you see how the assistant wrongly gave some trailing tokens at the end of their messages, such as 'Cited Documents: 0'?
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 680. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
Yes, I see that in the JSON conversation logs, the assistant's responses include some extraneous information at the end, such as "Cited Documents: 0" and "The answer is: 0". This additional text does not belong to the content of the assistant's message and should not have been included. It appears to be a glitch or misbehavior in how the assistant's response was generated or formatted in the log.
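(aside: the trailing junk is regular enough that it could be scrubbed before logging. a hypothetical filter, not something or2.py actually does; the patterns are just guesses from the outputs above:)

import re

# artifacts the model keeps appending despite not being asked to
TRAILING_ARTIFACTS = re.compile(
    r"(?:\s*</INNER_MONOLOGUE>"
    r"|\s*Cited Documents:\s*[\d,\s]*"
    r"|\s*The answer is:\s*\S*)+\s*$"
)

def strip_artifacts(text):
    # drop any run of known artifacts at the very end of the completion
    return TRAILING_ARTIFACTS.sub("", text).rstrip()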
In general, I've been getting nondeterministic responses from the assistant using these settings. Do you see the data parameters at the tops of the logs where the temperature is set to 0? Do you think the assistant model is behaving correctly?
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1195. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1882. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
Yes, I see in the parameters at the top of both conversation logs that the temperature is set to 0, which should result in deterministic outputs from the model.
<co:1> "url": "https://openrouter.ai/api/v1/chat/completions", "model": "nousresearch/hermes-3-llama-3.1-405b:free", "max_tokens": 1024, "temperature": 0.0, "top_p": 0.0, "top_k": 1, "min_p": 1.0, "frequency_penalty": 0.0, "presence_penalty": 0.0, "repetition_penalty": 0.0 </co>
<co:2> "url": "https://openrouter.ai/api/v1/chat/completions", "model": "nousresearch/hermes-3-llama-3.1-405b:free", "max_tokens": 1024, "temperature": 0.0, "top_p": 0.0, "top_k": 1, "min_p": 1.0, "frequency_penalty": 0.0, "presence_penalty": 0.0, "repetition_penalty": 0.0 </co>
Given the temperature setting of 0, the model should produce the same output for the same input and context. If you are observing nondeterministic responses, it suggests there may be an issue with how the model is being run or how the outputs are being processed. The extraneous text at the end of the assistant's messages also indicates some kind of glitch or misbehavior.
So in summary, based on the temperature setting, I don't believe the model is behaving as expected. The root cause could be an issue with the model itself, the inference code, or post-processing of the outputs. Some troubleshooting would be needed to isolate and resolve the problem. Cited Documents: 1, 2
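(for reference, the params dict printed at the top of each log is essentially the request body, and the "url" field the assistant cited shows where it goes. a standalone non-streaming sketch of that call; the real script streams, and this is a reconstruction, not the or2.py source:)

import json, urllib.request

def complete(messages, api_key):
    # exact sampling settings from the logs above; temperature 0 with
    # top_k 1 should pin the model to greedy (deterministic) decoding
    payload = {
        "model": "nousresearch/hermes-3-llama-3.1-405b:free",
        "max_tokens": 1024,
        "temperature": 0.0,
        "top_p": 0.0,
        "top_k": 1,
        "min_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "repetition_penalty": 0.0,
        "messages": messages,
    }
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())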
I'm rotating between 4 API keys when using this model, and I'm getting nondeterministic errors from the api service, such as these for all the same submitted data:<<EOF
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1195. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1882. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
after failing 3 times with a different error each time, it then succeeds with the 4th key without any complaints. This doesn't look right either, does it?
EOF
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 680. To increase, visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
No, those errors do not seem correct based on the consistent input you are providing. If you are sending the exact same request with the same input data, you should receive consistent responses and errors (if any) back from the API. The fact that you are getting different token limit exceeded errors on different attempts suggests there may be an issue on the server side with how your requests are being handled or how the token counting is being performed.
Normally, API rate limits and token quotas are calculated on a per-account or per-API-key basis. So it's possible that your 4 different API keys could have different limits, which might explain the inconsistent behavior. However, the specific errors you shared (prompt tokens limit exceeded, max_tokens limit exceeded) should be based on the input itself, not the API key used.
In summary, the inconsistent errors across multiple API keys for identical inputs do seem suspicious and may point to a backend issue. I recommend reaching out to the API provider's support to report this behavior and get clarification on the expected behavior and any known issues. They may be able to investigate the logs for your requests and identify if there is a problem on their end.
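(the rotation itself is about as simple as it sounds; a sketch with invented names, taking the single-key complete() from the earlier sketch as a parameter, again not the actual or2.py code:)

import urllib.error

def complete_with_rotation(messages, api_keys, complete):
    # try each key once, in order; the logs above show three keys failing
    # with three *different* 402 errors before the fourth succeeds
    errors = []
    for key in api_keys:
        try:
            return complete(messages, key)
        except urllib.error.HTTPError as e:
            body = e.read().decode(errors="replace")
            print("!", body)  # mirrors the '! {...}' lines in the logs
            errors.append(body)
    raise RuntimeError(f"all {len(api_keys)} keys failed: {errors}")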