Datapipes

Learn how to use LLMs and create datasets
with simple and reproducible notebooks.

✨ Datafast - NEW! ✨

A convenient way to create datasets

📊 Quickstart: Text Classification

Learn how to create text classification datasets with Datafast.

Open in Colab
🔗 Datafast GitHub Repository

Visit the Datafast GitHub repository for more information and resources.

Visit GitHub

How to use LLMs as a beginner

Work in progress, more guides coming...

🚀 How to use an OpenAI Chat model

Quick start guide to implementing OpenAI Chat models in your project.

Open in Colab
🧠 How to use an Anthropic Claude model

A brief intro to using Anthropic’s Claude for chat or text completions.

Open in Colab
🤖 How to use a Google Gemini model

Learn how to integrate Google’s Gemini into your workflow.

Open in Colab

Structured Output

Work in progress, more guides coming...

📝 Structured Output with OpenAI

Generate structured JSON or XML responses using OpenAI’s chat models.

Open in Colab
📝 Structured Output with Anthropic

Harness Anthropic’s Claude for structured data generation.

Open in Colab
📝 Structured Output with Google Gemini

Leverage Google’s Gemini for consistent, structured outputs.

Open in Colab

LLM Dataset Creation

Work in progress, more guides coming...

🔧 Simple Question Generation with Distilabel and OpenAI

Create a quick question-generating pipeline using Distilabel + OpenAI.

Open in Colab
🔧 Getting Started with Genstruct7B

Build your own text generation pipeline with the Genstruct7B model.

Open in Colab
🔧 Create a self-instruct pipeline using Distilabel + OpenAI

Create a self-instruct pipeline using Distilabel + OpenAI.

Open in Colab
🔧 Create a Text Classification Dataset (fluff detection) and publish it

We use the FREE Gemini API, an existing dataset of personas, and dynamic prompts to generate a diverse dataset and publish it to your Hugging Face Hub.

Open in Colab
📚 Everything you need to know to work with the 🤗 Datasets library

The most relevant pieces of the Hugging Face Datasets library documentation, with a focus on text data handling and processing.

Open in Colab

LLM Evaluation

Work in progress, more guides coming...

Automated Metrics

📊 Evaluation 101

Intro to basic metrics for evaluating language model outputs.

Open in Colab
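
As a taste of what "basic metrics" can look like, here is a minimal sketch of two common ones: exact match and token-level F1. This is an illustration only, not code taken from the notebook; the function names (`normalize`, `exact_match`, `token_f1`) and the lowercase-and-split normalization are assumptions made for this example.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumed normalization for this sketch: lowercase and split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    # True when both texts contain the same tokens in the same order.
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    # Harmonic mean of token precision and recall, counting repeated tokens.
    pred = normalize(prediction)
    ref = normalize(reference)
    if not pred or not ref:
        # Two empty texts count as a perfect match; one empty text scores 0.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match is strict and suits short, closed-form answers; token F1 gives partial credit when a model output overlaps with the reference without matching it word for word.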

Dataset Creation and Filtering

Learn how to create and filter datasets for your specific needs

🎯 Semantically Filter Existing Datasets

Learn how to filter existing datasets to kickstart domain-specific projects.

Open in Colab