
Karl Semich wrote:
so i was working on 'zinc' -- Zinc Is Not Cline https://github.com/karl3wm/zinc (very very slowly of course) and things got intense and i stopped :s
basically it needs boost; you make a build dir and build it with cmake, and that makes "chat" and "complete" binaries that can do chatting or completion with llama 3.1 405b on sambanova (hardcoded right now)
i'm not sure yet how to add preservation of configuration data to it. anyway, i looked up the llama prompt formats and made prompts (scripts) that use llama to help build it; with short input it's kind of somewhat comparable to cline? claude?
the scripts are in scripts/, so right now you can do things like `echo Update this file to have lots of comments about walruses. | python3 scripts/llama_file_flat.py src/configuration.cpp | build/complete | tee src/update_of_configuration.cpp.txt` and llama will act like cline, read the file, and output a new file. but it's all manual right now. it's possibly a [sad] point to stop in that respect
but yeah, it shows how easy it would be to make something like cline
Did I mention that after I posted here about Cline and Roo Cline, Roo Cline was renamed to Roo Code within a day? It's now outpaced Cline. Meanwhile I've been poking very very slowly at ZinC -- https://github.com/karl3wm/zinc ! It's ... funny ... and intense ... to work on. Kind of a new space. Disorienting.

Today I didn't work on it. I specifically avoided it, because I hadn't been sleeping and eating much the prior two days. I spent a lot of today sleeping. I also moved forward on rescuing my stolen car, which was slated to be crushed at a scrapyard with all of my worldly possessions inside (long story short). That's now on hold, but mostly it wasn't my doing. So because I haven't been working on ZinC, I thought I'd talk about it instead. Sadly this will unsettle the ZinC energy, but it's somewhat fun!

ZinC is still at https://github.com/karl3wm/zinc . Right now the main binary is 'chatabout', but of course these binaries and scripts are just tests and attempts to bootstrap a little recursion into the development. With `chatabout` (which likely has a bug right now, who knows, but one could try `mkdir build; cd build; cmake ..; make; ./chatabout`), it detects a couple formats of shell expansion ("$(cmd)" and "$(<file)") and converts them to a text-based citation format with all the content embedded, so you can reference files and command outputs while talking with the language model. Of course I just made this quickly to meet small goals. (... and the python and bash scripts went farther and let the language model write files ...) I won't be running an example here because I'm ZinC-fasting today. Abstinence. Oh, the current bug with chatabout is that it crashes if you type in a url that doesn't return a 200 status code, because it doesn't catch the http error that is thrown.

Let's talk about some of my silly development struggles. I'm silly partly because it's just a "fun project", so I make funny decisions. One of them was the API provider. Over last weekend the sambanova endpoint started reporting errors that my account was disabled. This was a bit of a weird one. I contacted support and got a reply on Monday that they had changed their system over the weekend to turn free accounts into paid accounts with an introductory token gift. My account didn't match what I expected, different balance than I was told (.. different usage history than I thought I should have ..) ... it worked again, but it was funny ...

Right now I've been using Targon instead of SambaNova. Targon's one of those pseudo-blockchain businesses that call themselves blockchain but have a ton of off-chain infrastructure. So [of course?] it's quite enticing in the [boss/ai/ml] mindset. Targon presently has a timing-related bug where requests terminate with an error mid-stream before completing. At least, my streaming requests do. To handle this more robustly I ended up changing chatabout to craft the prompt by hand using the openai-style completion endpoint rather than the openai-style chat endpoint. Then, when a request was interrupted midstream, I could append the tokens received so far and continue it automatically. This made it behave nice and robustly! There are still some small errors on my end, but because of the retry behavior they turn into performance issues rather than crashes.

Let's quote that code! The chatabout source is at https://github.com/karl3wm/zinc/blob/main/cli/chatabout.cpp . Remember it's just a quick script; my hope was to integrate all its parts into reusable components. Seems to be slow doing that ...
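(Before the quoted region, since I mentioned the "$(cmd)" / "$(<file)" expansion: here's a rough sketch of the idea. This is not the actual chatabout code -- the citation text format below is made up, the parse is naive and won't handle nested parentheses, and it leans on POSIX popen -- it's just to show how little machinery the feature needs.)

    #include <array>
    #include <cstdio>
    #include <fstream>
    #include <memory>
    #include <sstream>
    #include <string>

    // Read a whole file into a string.
    static std::string slurp_file(std::string const& path)
    {
        std::ifstream f(path);
        std::stringstream ss;
        ss << f.rdbuf();
        return ss.str();
    }

    // Capture the stdout of a shell command (POSIX popen).
    static std::string run_command(std::string const& cmd)
    {
        std::array<char, 4096> buf;
        std::string out;
        std::unique_ptr<FILE, int(*)(FILE*)> pipe(popen(cmd.c_str(), "r"), pclose);
        while (pipe && fgets(buf.data(), buf.size(), pipe.get()))
            out += buf.data();
        return out;
    }

    // Replace each $(...) in the user's message with a plain-text "citation"
    // that embeds the referenced content, so the model can read it inline.
    // "$(<path)" embeds a file; "$(cmd)" embeds a command's output.
    std::string expand_citations(std::string msg)
    {
        size_t pos = 0;
        while ((pos = msg.find("$(", pos)) != std::string::npos) {
            size_t end = msg.find(')', pos);
            if (end == std::string::npos) break;
            std::string inner = msg.substr(pos + 2, end - pos - 2);
            bool is_file = !inner.empty() && inner[0] == '<';
            std::string name = is_file ? inner.substr(1) : inner;
            std::string body = is_file ? slurp_file(name) : run_command(name);
            std::string cite = "[" + std::string(is_file ? "file " : "output of ") + name + "]\n"
                             + body + "\n[end of " + name + "]\n";
            msg.replace(pos, end - pos + 1, cite);
            pos += cite.size();
        }
        return msg;
    }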
Hmm, the file at that link doesn't look like the latest version ... I thought I had dropped the completion tokens ... Anyway, here's the region where I added the retry loop, roughly lines 176 to 219:
Log::log(zinc::span<StringViewPair>({ {"role", "user"}, {"content", msg}, }));
This is just local logging to a file; I haven't added blockchain logging at this time. I was thinking that if I observed myself doing that, I could make it optional.
messages.emplace_back(HodgePodge::Message{.role="user", .content=move(msg)});
Here you can see the introduction of a HodgePodge class, which I made quickly to band-aid the Targon issue. HodgePodge has a more fleshed-out Message structure than my OpenAI class, so as to handle the message parts that the large language model prompt generation templates use. The first line of the file now has "#include <zinc/hodgepodge.hpp>" inserted.
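I won't paste hodgepodge.hpp here (and I'm not looking at it today), but judging only from how it's used in this file, Message is roughly shaped like the sketch below. The optional fields are hypothetical, just to illustrate the kind of extra "message parts" a prompt template might want; they may not match what's actually in zinc.

    #include <optional>
    #include <string>

    // Guess at the shape of HodgePodge::Message, inferred only from the
    // .role / .content usage quoted in this mail.  The optional fields are
    // hypothetical examples of extra parts a chat template might consume.
    struct Message {
        std::string role;                 // "system" / "user" / "assistant"
        std::string content;              // text of the turn
        std::optional<std::string> name;  // hypothetical: named participant
        bool prefix = false;              // hypothetical: marks a partial turn to be continued
    };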
// it might be nice to terminate the request if more data is found on stdin, append the data, and retry
// or otherwise provide for the user pasting some data then commenting on it or hitting enter a second time or whatnot
Doesn't look like I'm likely to prioritize that, unsure. Maybe !
prompt = HodgePodge::prompt_deepseek3(messages, "assistant" != messages.back().role);
Here's where it generates the prompt. HodgePodge::prompt_deepseek3 generates the prompt that the official DeepSeek V3 tokenizer uses for its chat interfaces. It converts a sequence of message objects into a prompt string for the model to append to.
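For a sense of what that looks like, here's a minimal sketch (not the zinc code). The DeepSeek V3 template tokens are written from memory -- BOS, then <｜User｜> / <｜Assistant｜> markers, with <｜end▁of▁sentence｜> closing each finished assistant turn -- so check the exact spellings against the official tokenizer config. The boolean mirrors the `"assistant" != messages.back().role` argument above: when true, the assistant marker is appended, and the caller can then do `prompt + msg` and let the completion endpoint continue the partial message.

    #include <string>
    #include <vector>

    struct Msg { std::string role, content; };  // stand-in for HodgePodge::Message

    // Hypothetical illustration of a DeepSeek-V3-style chat template, not the
    // real HodgePodge::prompt_deepseek3.  Token spellings are from memory.
    std::string build_prompt(std::vector<Msg> const& msgs, bool add_assistant)
    {
        std::string prompt = "<｜begin▁of▁sentence｜>";
        for (auto const& m : msgs) {
            if (m.role == "system")
                prompt += m.content;                 // system text sits right after BOS
            else if (m.role == "user")
                prompt += "<｜User｜>" + m.content;
            else if (m.role == "assistant")
                prompt += "<｜Assistant｜>" + m.content + "<｜end▁of▁sentence｜>";
        }
        if (add_assistant)
            prompt += "<｜Assistant｜>";              // caller appends any partial text after this
        return prompt;
    }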
msg.clear();
It stores the assistant message in a std::string msg, appending to it as it streams in.
cerr << endl << "assistant: " << flush;
I use std::cerr for the chat label so that if the output is piped or redirected it doesn't include this decoration.
do {
    try {
        //for (auto&& part : client.chat(messages)) {
This commented-out line is the old code, from when I just used the chat interface on the server. Some LLMs let you prompt the assistant in an unbroken manner using the chat interface, but right now I'm using DeepSeek V3, and their default prompt template doesn't allow this; it appends end-of-sentence tokens and such to the provided input, preventing the model from resuming where it left off.
        std::string finish_reason;
        for (auto&& part : client.complete(prompt + msg)) {
Here I'm calling `complete` instead of `chat` with the raw prompt with any existing message content appended to it. As `msg` extends, the prompt extends too, as if it were still inside the generation loop on the server.
            msg += part;
            cout << part << flush;
            try {
                finish_reason = part.data["finish_reason"].string();
            } catch (std::out_of_range const&) {}
        }
        retry_assistant = finish_reason == "stop" ? false : true;
Here's the new normal catch-and-retry handling. (i have experienced internal interruption. sending email as-is)
    } catch (std::runtime_error const& e) {
        retry_assistant = true;
        cerr << "<..." << e.what() << "...reconnecting...>" << flush;
    /*} catch (std::system_error const& e) {
        if (e.code() == std::errc::resource_unavailable_try_again) {
            retry_assistant = true;
            cerr << "<...reconnecting...>" << flush;
        } else {
            throw;
        }*/
    }
    Log::log(zinc::span<StringViewPair>({ {"role", "assistant"}, {"content", msg}, }));
} while (retry_assistant);
messages.emplace_back(HodgePodge::Message{.role="assistant", .content=move(msg)});
msg.clear();