Quick Reads (Under 5 Minutes)

Curated Datasets for Language Models

Green shoots in the development of curated model training datasets built from open-source, licensed content, from Wired: https://lnkd.in/eqEDEnNA

“Fairly Trained announced that it has awarded its first certification (https://lnkd.in/eFaStpGh) for a large language model built without copyright infringement…

[The model is called] KL3M and was developed by Chicago-based legal tech consultancy startup 273 Ventures, using a curated training dataset of legal, financial, and regulatory documents… for ‘risk-averse’ clients like law firms.

On Wednesday, researchers released what they claim is the largest available AI dataset for language models composed purely of public domain content. Common Corpus, as it is called… has been posted to the open source AI platform Hugging Face” (https://lnkd.in/ePaWxsQX)

Technology: AI, Open Source

Leveraging LLM for Codebase Migration

In my research project using LLM assistance to convert a codebase from one language to another, I was able to incorporate the existing React front end into the new codebase in about a day. This involved copying it across, creating separate builds for the front and back ends, and updating dependencies to work with a current version of Node, then dealing with the resulting configuration, dependency, scoping, and circular dependency issues.

The LLM assistance was a huge accelerator. I am very rusty with webpack and unfamiliar with React. That said, I had to climb enough of the learning curve to ask good questions before I unlocked the results I needed. And it was a co-worker who suggested I monorepo the front and back ends.

Technology: LLM, React

Enhancing AI Project Backlog Graphics

Reworked my example of working with AI assistance to build a project backlog. Made the graphic more legible.

Product Management: AI, Project Management

Enhancing Generative AI Interactions

This insight, that iterative, multi-agent interactions drive better output from generative AI systems, is profound. I appreciate the thought that how we use these technologies, rather than step-change improvements in the individual models, will represent the short-term leap forward.

Pulling it even more to the now, think about this approach in how you interact in threaded chat discussions. Rather than simply asking the assistant to do the thing, build up context first: ask framing questions and reason with it over how to approach the problem and where to focus, then ask it to generate the output.

(simple example https://lnkd.in/e-xsDMri)
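A minimal sketch of that staged approach, assuming a chat client that accepts the full message history on every turn. The `Respond` type and `runStagedConversation` are hypothetical names for illustration, not any vendor's API, and the call is synchronous only to keep the sketch short.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical stand-in for a model call: receives the whole thread so far.
type Respond = (history: ChatMessage[]) => string;

// Sends prompts one at a time so each answer becomes context for the next.
function runStagedConversation(prompts: string[], respond: Respond): ChatMessage[] {
  const history: ChatMessage[] = [];
  for (const prompt of prompts) {
    history.push({ role: "user", content: prompt });
    // Pass a copy so the responder cannot mutate the thread.
    history.push({ role: "assistant", content: respond(history.slice()) });
  }
  return history;
}

// Framing and approach come first; the request for output comes last.
const stagedPrompts = [
  "Here is the problem and its constraints. What would you ask before starting?",
  "Given those answers, how should we approach it and where should we focus?",
  "Using the approach we agreed on, generate the output.",
];

// Fake responder that only reports how much context it was given.
const fakeRespond: Respond = (history) => "saw " + history.length + " messages";

const thread = runStagedConversation(stagedPrompts, fakeRespond);
```

By the time the final prompt is sent, the responder sees the framing and approach turns as well as the request itself, which is the point of building up context before asking for the output.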

Technology: Generative AI, Iterative Learning

Understanding Retrieval Augmented Generation

I know Neal and am an advocate for his team’s platform, eyelevel.

I love their explanation of what Retrieval Augmented Generation (RAG) does:

“RAG exists for two key reasons:

• To allow you to inject new information into a pre-trained language model

• To minimize hallucination by simplifying problems”

It’s that second motivation that gets lost in all the exuberance over expanding context windows.

Creating an observable set of verifiable, relevant, and correct inputs will lead to more accurate and safer outputs. (shocking, I know)

As responsible technologists, we need to maintain our accountability through better transparency and traceability in the outputs generated by autonomous and semi-autonomous systems.

I also love the observation that addressing the problem of hallucinations is “not magic, it’s hard”.
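To make the "observable set of verifiable inputs" idea concrete, here is a toy sketch of the retrieval half of RAG. It assumes a naive keyword-overlap score in place of a real embedding search, and every name in it (`SourceDoc`, `retrieve`, `buildPrompt`) is hypothetical rather than taken from eyelevel or any library. What matters is that each passage keeps its source id, so the prompt can be inspected and the answer traced back.

```typescript
interface SourceDoc {
  id: string;
  text: string;
}

interface RetrievedPassage extends SourceDoc {
  score: number;
}

// Lowercase, split on non-alphanumerics, drop very short tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2);
}

// Score each document by how many distinct question terms it contains.
function retrieve(question: string, docs: SourceDoc[], topK: number): RetrievedPassage[] {
  const terms = tokenize(question).filter((t, i, all) => all.indexOf(t) === i);
  return docs
    .map((doc) => {
      const docTerms = tokenize(doc.text);
      const score = terms.filter((t) => docTerms.indexOf(t) !== -1).length;
      return { id: doc.id, text: doc.text, score };
    })
    .filter((p) => p.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Build a prompt that narrows the problem to the retrieved, cited passages.
function buildPrompt(question: string, passages: RetrievedPassage[]): string {
  const context = passages.map((p) => "[" + p.id + "] " + p.text).join("\n");
  return [
    "Answer using only the sources below and cite their ids.",
    "If the sources do not contain the answer, say so.",
    "",
    context,
    "",
    "Question: " + question,
  ].join("\n");
}

const docs: SourceDoc[] = [
  { id: "policy-1", text: "Passwords must be at least 12 characters long." },
  { id: "policy-2", text: "Backups run nightly at 02:00 UTC." },
];

const question = "How long must passwords be?";
const passages = retrieve(question, docs, 3);
const prompt = buildPrompt(question, passages);
```

Only `policy-1` shares terms with the question, so it is the only passage that reaches the prompt. A smaller, cited context like this is easier to verify than a large context window filled with everything available.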

Technology: RAG, technology

Concerns Over Password Restrictions

Why would a site restrict password length below what a strong password generator produces? “Cannot exceed 20 characters.” This is for an (“AI-powered”) security service promoted to help me protect my infra. Legacy system integrations? Outdated policies? Both very reassuring.

Technology: security, passwords

The Impact of De-Anonymization on Reviews

I believe it's crucial to recognize how smaller companies often have an edge in identifying who wrote reviews, especially when they can access social channels on platforms like Glassdoor. This raises important questions about the social and technical implications of de-anonymization. We need to think critically about how this affects transparency and trust in feedback.

Culture: de-anonymization, workplace culture

Refactoring Mailer.ts for Flexibility

Learned my lesson. Oh, Claude 3… “I want to refactor mailer.ts to accept two optional arguments, apikey and transport. If it receives the apikey it will create the sendgrid transport, otherwise it will use the transport passed in. I then want my test to create a mock object that can act as the transport and test all the expectations against that so that I don’t have to mock nodemailer.”
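A minimal sketch of the shape that prompt describes, not the actual mailer.ts. `createMailer`, `MailTransport`, `MockTransport`, and `createSendgridTransport` are hypothetical names, and the SendGrid factory is a stub so the example carries no dependencies. The `MailTransport` interface assumes only a promise-returning `sendMail` method, which is the style of call nodemailer transports are used with.

```typescript
interface MailMessage {
  from: string;
  to: string;
  subject: string;
  text: string;
}

// The only thing the mailer needs from a transport.
interface MailTransport {
  sendMail(message: MailMessage): Promise<unknown>;
}

// Hypothetical stub standing in for the code that builds the real SendGrid transport.
function createSendgridTransport(apiKey: string): MailTransport {
  if (apiKey.trim() === "") {
    throw new Error("empty API key");
  }
  return {
    sendMail: async () => {
      throw new Error("SendGrid transport is not wired up in this sketch");
    },
  };
}

// Two optional arguments: an API key wins, otherwise use the injected transport.
function createMailer(apiKey?: string, transport?: MailTransport) {
  let resolved: MailTransport;
  if (apiKey) {
    resolved = createSendgridTransport(apiKey);
  } else if (transport) {
    resolved = transport;
  } else {
    throw new Error("createMailer needs either an apiKey or a transport");
  }
  return {
    send: (message: MailMessage) => resolved.sendMail(message),
  };
}

// Test double that records every message instead of sending it.
class MockTransport implements MailTransport {
  readonly sent: MailMessage[] = [];

  async sendMail(message: MailMessage): Promise<unknown> {
    this.sent.push(message);
    return { accepted: [message.to] };
  }
}

const mockTransport = new MockTransport();
const mailer = createMailer(undefined, mockTransport);
void mailer.send({
  from: "noreply@example.com",
  to: "user@example.com",
  subject: "Welcome",
  text: "Hello!",
});
```

Because the transport is injected, the test asserts against `mockTransport.sent` and never has to mock the nodemailer module itself. In a Jest suite, the same checks would be ordinary `expect` calls on the recorded messages.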

Technology: refactoring, testing

Using LLMs for Project Backlogs

I try to use LLM chat assistants responsibly: using the speed of the LLM to reduce repetitive work, organize content, and help generate insights, while ensuring people review and are ultimately accountable for the content. Here’s an example of how I build a software project backlog, in a more readable version of this flow: https://lnkd.in/e_d76NkG

Product Management: LLM, project management

Navigating Code Coverage Challenges

Chaotic day. In the quiet moments, I worked on tests. As I close coverage gaps, I find bugs in the newly converted code. Valuable as that is, it doesn’t verify that behavior matches the original code. So great prep for an AI-assisted rewrite is beefing up test coverage in the old code first.
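One way to pin the new code to the old behavior is a golden-fixture check. This sketch assumes you can capture input/output pairs from the original code and replay them against the converted code. The `formatInvoiceNumber` function and the fixture values are invented for illustration, with a deliberate bug to show the kind of divergence that coverage alone would not catch.

```typescript
interface GoldenCase<I, O> {
  name: string;
  input: I;
  expected: O; // captured from the ORIGINAL implementation, not written by hand
}

// Replays each captured case against the converted code and reports divergences.
function checkAgainstGolden<I, O>(
  candidate: (input: I) => O,
  cases: GoldenCase<I, O>[],
): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = candidate(c.input);
    if (JSON.stringify(actual) !== JSON.stringify(c.expected)) {
      failures.push(
        c.name + ": expected " + JSON.stringify(c.expected) + ", got " + JSON.stringify(actual),
      );
    }
  }
  return failures;
}

// Hypothetical converted function. It zero-pads to six digits but
// silently truncates longer numbers, which the original did not.
function formatInvoiceNumber(sequence: number): string {
  return "INV-" + ("000000" + sequence).slice(-6);
}

// Hypothetical fixtures recorded from the old system.
const goldenCases: GoldenCase<number, string>[] = [
  { name: "short number", input: 42, expected: "INV-000042" },
  { name: "seven digits", input: 1234567, expected: "INV-1234567" },
];

const failures = checkAgainstGolden(formatInvoiceNumber, goldenCases);
```

Here a unit test written only against the new code could cover every line of `formatInvoiceNumber` and still pass. The seven-digit fixture fails only because its expected value came from the original behavior.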

Technology: testing, software development

Boosting Test Coverage with AI Assistance

A day where I could barely string 30 minutes together. I raised test coverage from 75% to 96%. 50 additional test cases. Fixtures. Refactoring. Bug fixing. And remember, this is my second week with Node.js and Jest in six years. So the AI assistance is significantly increasing my speed.

Technology: Node.js, Jest