Personally, I like it. However, I like being able to comment and upvote more. At the same time, I'd be reluctant, to say the least, to hand over my login credentials. It could be quite cool to see this turned into a FOSS RES-style browser extension. Or maybe even a commercial product. I already paid for the HACK app.
We were disappointed to discover that, yes, Voyage, Cohere, and Jina all train on the data of their API customers by default.
Voyage's terms say:
> you grant Voyage AI (and its successors and assigns) a worldwide, irrevocable, perpetual, royalty-free, fully paid-up, right and license to use, copy, reproduce, distribute, prepare derivative works of, display and perform the Customer Content: ... (iii) to train, improve, and otherwise further develop the Service (such as by training the artificial intelligence models we use).
Cohere's terms say:
> YOU GRANT US A ... RIGHT TO ... USE ... ANY DATA ... TO ... IMPROVE AND ENHANCE THE COHERE SOLUTION AND OUR OTHER OFFERINGS AND BENCHMARK THE FOREGOING, INCLUDING BY SHARING API DATA AND FINETUNING DATA WITH THIRD PARTIES ...
Jina's terms say:
> Jina AI shall, subject to applicable mandatory data protection requirements, be entitled to retain data uploaded to the Jina AI Systems or otherwise provided by the Customer or collected by Jina AI in the course of providing the Services and to use such data in anonymized/pseudonymized format for its business purposes including to improve its artificial intelligence applications.
In my experience, maintaining a very popular software library, supporting open source, and blogging have all contributed to my success. And as someone who is now a founder seeking like-minded, highly skilled engineers, those are key signals of an attractive hire.
I can understand, though, that in a work environment where management is unlikely to be able to retain highly skilled talent, you may want 'low-profile' workers who aren't going to have as many competitors chasing after them...
Further to @dust42, BERT is an encoder, GPT is a decoder, and T5 is an encoder-decoder.
Encoder-decoders are not in vogue.
Encoders are favored for classification, extraction (eg, NER and extractive QA) and information retrieval.
Decoders are favored for text generation, summarization and translation.
Recent research (see, eg, the Ettin paper: https://arxiv.org/html/2507.11412v1 ) seems to confirm the previous understanding that encoders are indeed better for “encoder tasks” and vice versa.
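To make that task split concrete, here is a tiny sketch using Hugging Face pipelines; the specific checkpoints are just common public models picked for illustration, not a recommendation.

    # Illustrative only: an encoder checkpoint doing classification and a
    # decoder checkpoint doing generation, via Hugging Face pipelines.
    from transformers import pipeline

    # Encoder (BERT-style) used for classification:
    classifier = pipeline("text-classification",
                          model="distilbert-base-uncased-finetuned-sst-2-english")
    print(classifier("The service was terminated without notice."))

    # Decoder (GPT-style) used for text generation:
    generator = pipeline("text-generation", model="gpt2")
    print(generator("The service was terminated without notice because",
                    max_new_tokens=20))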
Fundamentally, both are transformers and so an encoder could be turned into a decoder or a decoder could be turned into an encoder.
The design difference comes down to bidirectional (ie, all tokens can attend to all other tokens) versus autoregressive attention (ie, the current token can only attend to the previous tokens).
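A minimal sketch of that difference, assuming single-head scaled dot-product attention over toy tensors (shapes and values are illustrative only); note that the causal variant is exactly the "mask the logits before the softmax" trick mentioned further down:

    import torch
    import torch.nn.functional as F

    seq_len, dim = 4, 8
    q = k = v = torch.randn(seq_len, dim)
    scores = q @ k.T / dim**0.5  # (seq, seq) attention logits

    # Bidirectional (encoder-style): every token attends to every other token.
    bidirectional = F.softmax(scores, dim=-1) @ v

    # Autoregressive (decoder-style): mask out future positions before the
    # softmax so token i can only attend to tokens 0..i.
    causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    causal = F.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1) @ v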
You can use an encoder-style architecture with decoder-style output heads on top for denoising-diffusion-style mask/blank filling.
They seem to be somewhat more expensive on short sequences than GPT-style decoder-only models when you batch them: the decoder-only model needs fewer passes over the content, and until sequence length blows up your KV cache throughput cost, fewer passes are cheaper.
But in situations that don't get request batching, or where the context length is so heavy that you'd rather exploit memory locality in the attention computation, you'd benefit from diffusion-mode decoding.
A nice side effect of the diffusion mode is that its natural reliance on the bidirectional attention from the encoder layers provides much more flexible (and, critically, context-aware) understanding. As mentioned, later words can easily modulate earlier words, as with "bank [of the river]"/"bank [in the park]"/"bank [got robbed]", or the classic of these days: telling an agent it did something wrong and expecting it to learn in-context from the mistake. In practice, decoder-only models basically just get polluted by that, so you have to rewind the conversation, because the later correction has literally no way of backwards-affecting the problematic tokens.
That said, the recent surge in training "reasoning" models to utilize thinking tokens that often get cut out of further conversation context, all via a reinforcement learning process that's not merely RLHF/preference-conditioning, is actually quite related:
discrete denoising diffusion models can be trained with an RL-like scheme during pretraining, where each training step is given the outcome goal and a masked version of it as the input query, and the model is trained to manage the work done across the individual denoising steps on its own until it eventually produces the outcome goal, crucially without prescribing any order for filling in the masked tokens or how many to fill in each step.
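For intuition, here is a rough sketch of the inference-time counterpart of that idea: confidence-based iterative unmasking with an off-the-shelf bidirectional MLM. The model choice and the one-token-per-step schedule are assumptions for illustration, not how any particular diffusion LM is actually trained or decoded.

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

    text = f"The robbers fled the {tok.mask_token} after the {tok.mask_token} was triggered."
    ids = tok(text, return_tensors="pt")["input_ids"]

    with torch.no_grad():
        while (ids == tok.mask_token_id).any():
            probs = model(input_ids=ids).logits.softmax(-1)  # (1, seq, vocab)
            conf, pred = probs.max(-1)                        # best token per position
            conf[ids != tok.mask_token_id] = -1.0             # only consider masked slots
            pos = conf.argmax()                               # most confident masked slot
            ids[0, pos] = pred[0, pos]                        # fill it, then iterate

    print(tok.decode(ids[0], skip_special_tokens=True))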
Until we got highly optimized decoder implementations, decoder prefill was often even implemented with the same code as an encoder, just masking the attention logits with a causal mask before the softmax so that tokens could not attend to future tokens.
Over the past couple of months, we, a team of Aussie legal and AI experts, have been working on building a new type of legal AI company — a company that, instead of trying to automate legal jobs, is trying to automate legal tasks.
We want to make lawyers’ lives easier, not replace them.
We’ve been laser-focused on building small and efficient yet still highly accurate, specialized models for some of the most time-consuming and mundane legal tasks lawyers have to perform. Stuff like running through a thousand contracts just to locate any clauses that would allow you to get out early.
We just finished training our first set of models, focused on document and clause classification, probably the most common problem we see come up. Our benchmarks show our models to be far more accurate and also more efficient than their closest general-purpose competitors.
Today, we’re making those models publicly available via the Isaacus API, the world’s first legal AI API.
Our models don’t require any finetuning because they’re zero-shot classifiers — you give them a description of what you’re looking for (for example, “This is a confidentiality clause.”) and they pop out a classification score.
Because our models are so small, which they have to be in order to process reams of legal data at scale, they can sometimes be a bit sensitive to prompts. To help with that, however, we’ve preoptimized an entire library of prompts, including what we call universal templates, which let you plug in your own arbitrary descriptions of what you’re looking for.
We’ve made our prompt library available via the Isaacus Query Language or IQL. Another world first — it’s a brand-new AI query language designed specifically for using AI models to analyze documents.
You can invoke query templates using the format “{IS <query_template_name>}”. You can also chain queries together using Boolean and mathematical operators, like so: “{This is a confidentiality clause.} AND {IS unilateral clause}”.
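For illustration only, here is a rough sketch of how chained queries like that could be combined, assuming each sub-query yields a 0-1 confidence score; the classify() stub, the min() semantics for AND, and the 0.5 threshold are all assumptions of this sketch, not a description of how IQL actually evaluates expressions.

    def classify(text: str, query: str) -> float:
        # Hypothetical stand-in for a zero-shot classification call that
        # would return a confidence score between 0 and 1.
        return 0.9 if "confidential" in text.lower() else 0.1

    def score_and(text: str, *queries: str) -> float:
        # One plausible AND semantics: the chained score is the weakest sub-score.
        return min(classify(text, q) for q in queries)

    clause = "Each party shall keep the other party's Confidential Information secret."
    score = score_and(clause, "This is a confidentiality clause.", "This is a unilateral clause.")
    print(score >= 0.5)  # 0.5 used here as an example decision threshold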
We think our API is pretty neat and we hope you will too.
This is just the beginning for us — over the course of this year, we’re planning on releasing text extraction and embedding models as well as a second generation of our Kanon legal foundational model.
MessagePack can encode rows as well; you just need to manage linking the keys during deserialization. In fact, it can encode arbitrary binary data without needing base64 the way JSON does.
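As a small illustration of that point (the payload is just an example): MessagePack carries raw bytes directly, while JSON needs a base64 detour.

    import base64
    import json
    import msgpack

    row = {"id": 7, "thumbnail": b"\x89PNG\r\n\x1a\n..."}  # arbitrary binary blob

    packed = msgpack.packb(row)      # bytes go in as-is
    restored = msgpack.unpackb(packed)
    assert restored["thumbnail"] == row["thumbnail"]

    # JSON has no bytes type, so the blob has to take a base64 round trip instead.
    as_json = json.dumps({"id": 7, "thumbnail": base64.b64encode(row["thumbnail"]).decode()})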
Although MessagePack is definitely not a drop-in replacement for JSON, it is certainly extremely useful.
Unlike JSON, you can’t just open a MessagePack file in Notepad or vim and have it make sense. It’s often not human readable. So using MessagePack to store config files probably isn’t a good idea if you or your users will ever need to read them for debugging purposes.
But as a format for something like IPC or high-performance, low-latency communication in general, MessagePack brings serious improvements over JSON.
I recently had to build an inference server that needed to be able to communicate with an API server with minimal latency.
I started with gRPC and protobuf since that’s what everyone recommends, but after a lot of benchmarking, I found a much faster approach: serving MessagePack over HTTP with a Litestar Python server (it’s much faster than FastAPI), using msgspec for super-fast MessagePack encoding and ormsgpack for super-fast decoding.
Not sure how this beat protobuf and gRPC, but it did. Perhaps the Python gRPC implementation is just slow; it was still faster than JSON over HTTP, however.
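For flavor, a minimal sketch of that setup: the /infer route, the payload shape, and run_model() are made up for the example, while the Litestar, msgspec, and ormsgpack calls are real APIs.

    import msgspec
    import ormsgpack
    from litestar import Litestar, Request, Response, post

    def run_model(inputs: list[str]) -> list[float]:
        # Hypothetical stand-in for the actual inference call.
        return [float(len(x)) for x in inputs]

    @post("/infer")
    async def infer(request: Request) -> Response:
        # Decode the MessagePack request body with ormsgpack.
        payload = ormsgpack.unpackb(await request.body())
        # Encode the MessagePack response with msgspec.
        return Response(
            content=msgspec.msgpack.encode({"outputs": run_model(payload["inputs"])}),
            media_type="application/x-msgpack",
        )

    app = Litestar(route_handlers=[infer])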