The Teknoids Website

Category: work

Notes on better search 8/18/2023

Goal: better, more focused search for www.cali.org. In general, the plan is to scrape the site into a vector database, enable embeddings of the vector DB in Llama 2, and provide API endpoints to search and find things. Hints and pointers. Llama2-webui –… Continue Reading →

Configuring Jupyter Notebook in Windows Subsystem Linux (WSL2) | by Cristian Saavedra Desmoineaux | Towards Data Science

Here’s a great quick start guide to getting Jupyter Notebook and Lab up and running with the Miniconda environment in WSL2 running Ubuntu. When you’re finished walking through the steps you’ll have a great data science space up and running… Continue Reading →

Demystifying Text Data with the unstructured Python Library | Saeed Esmaili

In the world of data, textual data stands out as being particularly complex. It doesn’t fall into neat rows and columns like numerical data does. As a side project, I’m in the process of developing my own personal AI assistant… Continue Reading →

AI Reading List 7/6/2023

What I’m reading today. Researchers from Peking University Introduce ChatLaw: An Open-Source Legal Large Language Model with Integrated External Knowledge Bases — This includes links to the article and Github repo Why Embeddings Usually Outperform TF-IDF: Exploring the Power of… Continue Reading →

AI Reading List 7/5/2023

The longer holiday weekend edition. Opportunities and Risks of LLMs for Scalable Deliberation with Polis — Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with… Continue Reading →

AI Reading List 6/28/2023

What I’m reading today. Semantic Search with Few Lines of Code — Use the sentence transformers library to implement a semantic search engine in minutes Choosing the Right Embedding Model: A Guide for LLM Applications — Optimizing LLM Applications with… Continue Reading →

AI Reading List 6/27/2023

What I’m reading today. How Unstructured and LlamaIndex can help bring the power of LLM’s to your own data All You Need to Know to Build Your First LLM App — A Step-by-Step Tutorial to Document Loaders, Embeddings, Vector Stores… Continue Reading →

An interesting approach to pruning large language models

Large language models (LLMs) are notoriously huge and expensive to work with. An LLM requires a lot of specialized hardware to train and manipulate. We’ve seen efforts to transform and quantize the models that result in smaller footprints and models… Continue Reading →

Colarusso: Sample Notebook for Extracting Data from OCRed PDFs Using Regex and LLMs

One can use this notebook to build a pipeline to parse and extract data from OCRed PDF files. Warning: When using LLMs for entity extraction, be sure to perform extensive quality control. They are very susceptible to distracting language (latching… Continue Reading →

vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention

LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast… Continue Reading →


© 2024 Teknoids — Powered by WordPress
