Lesson 3 of 5
Capture & Organize Workflow
Estimated time: 8 minutes
Your notes apps are connected — but what about everything else? The article you read on your phone, the podcast insight during your commute, the random idea at 2 AM. In this lesson, you'll build capture pipelines so nothing valuable slips through the cracks.
The Capture Pipeline
  Input Sources             Processing             Knowledge Base
┌───────────────┐        ┌───────────────┐        ┌───────────────┐
│ Chat message  │        │ Extract text  │        │               │
│ Web clip      │        │ Identify type │        │ Chunked       │
│ PDF upload    │───────>│ Add metadata  │───────>│ Embedded      │
│ Voice note    │        │ Auto-tag      │        │ Tagged        │
│ Email forward │        │ Chunk & embed │        │ Linked        │
│ Screenshot    │        │               │        │               │
└───────────────┘        └───────────────┘        └───────────────┘
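To make the processing column concrete, here is a minimal Python sketch of the stages in order. It is an illustration under stated assumptions, not OpenClaw's implementation: `CapturedItem`, `KEYWORD_TAGS`, and every function name are hypothetical, tagging is reduced to a keyword lookup, and the embedding step is left out because it depends on a model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical keyword-to-tag map; a real system would use rules or a model.
KEYWORD_TAGS = {"marketing": "marketing", "email": "email", "quarterly": "business"}


@dataclass
class CapturedItem:
    raw: str                      # input exactly as it arrived
    source: str                   # e.g. "chat", "web-clip", "pdf"
    text: str = ""
    content_type: str = "note"
    metadata: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)
    chunks: list = field(default_factory=list)


def extract_text(item):
    # Collapse runs of whitespace into single spaces.
    item.text = " ".join(item.raw.split())
    return item


def identify_type(item):
    # Very rough type detection based on how the text starts.
    if item.text.startswith(("http://", "https://")):
        item.content_type = "link"
    elif item.text.startswith('"'):
        item.content_type = "quote"
    return item


def add_metadata(item):
    item.metadata = {
        "source": item.source,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "word_count": len(item.text.split()),
    }
    return item


def auto_tag(item):
    lowered = item.text.lower()
    item.tags = sorted({tag for kw, tag in KEYWORD_TAGS.items() if kw in lowered})
    return item


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def chunk(item):
    item.chunks = chunk_words(item.text)
    return item


def run_pipeline(raw, source):
    item = CapturedItem(raw=raw, source=source)
    for stage in (extract_text, identify_type, add_metadata, auto_tag, chunk):
        item = stage(item)
    return item
```

Calling `run_pipeline("Remember: send marketing emails on Tuesday", "chat")` returns an item with normalized text, a metadata dictionary, tags, and a list of chunks that would then be embedded and stored.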
Quick Capture via Chat
The fastest way to save something is to tell the bot directly. No context-switching, no opening another app.
You: Remember: the best time to send marketing emails
is Tuesday 10am according to the HubSpot study

Bot: Saved to your knowledge base.
Tags: #marketing #email #research
Source: manual capture (chat)

You: Save this quote: "The best way to predict the future
is to invent it" - Alan Kay

Bot: Saved quote by Alan Kay.
Tags: #quotes #innovation
Source: manual capture (chat)

Trigger words that activate capture: remember, save this, note that, capture, store.
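As a rough illustration of trigger-word detection, the Python sketch below checks whether a message opens with one of the listed phrases and returns the text to save. The lesson does not say how the bot actually detects triggers, so matching only at the start of the message is an assumption, and `parse_capture` is a hypothetical name.

```python
import re

# Trigger phrases taken from the list above.
TRIGGERS = ("remember", "save this", "note that", "capture", "store")

# Assumption: a trigger only counts when it opens the message.
_TRIGGER_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(t) for t in TRIGGERS) + r")\b[\s:,-]*",
    re.IGNORECASE,
)


def parse_capture(message):
    """Return the text to save, or None if the message is not a capture request."""
    match = _TRIGGER_RE.match(message)
    if not match:
        return None
    content = message[match.end():].strip()
    return content or None
```

With this approach, "Remember: the best time to send marketing emails is Tuesday 10am" yields the text after the colon, while a question such as "What did I save about email?" is left alone because no trigger opens the message.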
Bulk Capture
After a meeting or conference, just dump everything into chat: "Remember these key takeaways from today's product meeting: 1) We're targeting Q3 for launch, 2) Budget approved for $50k, 3) Sarah is the new PM." The bot saves it all as one structured entry.
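One way to picture "one structured entry" is a title plus a list of items. The Python sketch below splits a numbered message that way. The function name `parse_takeaways` and the output shape are assumptions for illustration; the lesson does not specify how the bot structures the entry.

```python
import re


def parse_takeaways(message):
    """Split 'intro: 1) a, 2) b' into a title and a list of items."""
    title, _, body = message.partition(":")
    # Split on numbered markers such as "1)" or "2)".
    pieces = re.split(r"\s*\d+\)\s*", body)
    # Drop empty pieces and trailing separators.
    items = [p.strip(" ,.") for p in pieces if p.strip(" ,.")]
    return {"title": title.strip(), "items": items}
```

Applied to the meeting example above, this gives a title ("Remember these key takeaways from today's product meeting") and three separate items, one per numbered point.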
Web Clipping
Save articles and web pages without leaving your browser.
Install the OpenClaw Web Clipper from your browser's extension store. When you find something worth saving:
- Click the OpenClaw icon in your toolbar
- Choose: Full page, Selection, or Simplified article
- Add optional tags or notes
- Click Save to Second Brain
The extension extracts clean text (stripping ads and navigation), adds it to your knowledge base, and syncs within seconds.
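To show what "extracting clean text" can involve, here is a small Python sketch built on the standard library's `html.parser`. It is not how the Web Clipper works internally (the lesson does not say); the list of skipped tags and the names `CleanTextExtractor` and `extract_clean_text` are assumptions, and real clippers typically use more elaborate readability heuristics.

```python
from html.parser import HTMLParser

# Assumed set of elements that rarely hold article content.
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer"}


class CleanTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return "\n".join(self._parts)


def extract_clean_text(html):
    parser = CleanTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Feeding a page through `extract_clean_text` keeps headings and paragraphs and drops navigation bars, footers, and scripts.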
openclaw extensions install web-clipper
PDF and Document Upload
Research papers, reports, ebooks — PDFs contain some of the most valuable knowledge, and they're notoriously hard to search later.
documents:
  watch_folders:
    - path: "~/Documents/Research"
      auto_index: true
      file_types: [pdf, docx, epub]
    - path: "~/Downloads"
      auto_index: false       # Only index when manually triggered
      file_types: [pdf]
  pdf_processing:
    ocr: true                 # Handle scanned PDFs
    extract_images: false     # Skip image extraction
    table_extraction: true    # Convert tables to structured text
    max_pages: 500            # Safety limit

You: [Attaches quarterly-report-q4.pdf]
Index this report

Bot: Processing "Quarterly Report Q4 2024" (42 pages)...

Indexed successfully:
  42 pages processed
  156 chunks created
  Key topics: revenue growth, customer acquisition,
  product roadmap, hiring plan
  Tags: #business #quarterly-report #2024

Sample queries you could ask:
  "What was Q4 revenue growth?"
  "What's on the product roadmap?"
  "How many new hires are planned?"

Auto-Tagging and Organization
You should not have to manually organize everything. Configure auto-tagging rules so content is categorized as it arrives.
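As a mental model for rule-based tagging and collections, the Python sketch below applies a list of rules to an item and then checks which collections the resulting tags qualify for. It is a simplification under stated assumptions: `Rule`, `apply_rules`, and `collections_for` are hypothetical names, and rules are expressed as required field values rather than the query syntax used in OpenClaw's configuration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict   # field name -> value the item must have
    tags: list         # tags to add when every condition holds


def matches(rule, item):
    return all(item.get(key) == value for key, value in rule.conditions.items())


def apply_rules(item, rules):
    """Return the item's tags plus tags from every matching rule, without repeats."""
    tags = list(item.get("tags", []))
    for rule in rules:
        if matches(rule, item):
            for tag in rule.tags:
                if tag not in tags:
                    tags.append(tag)
    return tags


def collections_for(tags, collections):
    """Return collection names whose required tags are all present."""
    return [name for name, required in collections.items()
            if set(required) <= set(tags)]


rules = [
    Rule({"source": "notion", "workspace": "Work"}, ["work"]),
    Rule({"source": "web-clip", "domain": "arxiv.org"}, ["research", "academic"]),
]
collections = {"Work Projects": ["work", "project"]}
```

Under this model, an item only joins "Work Projects" once it carries both the `work` and `project` tags, which mirrors an AND condition on tags.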
auto_tagging:
  enabled: true
  ai_tags: true        # Let AI suggest tags based on content
  max_ai_tags: 5       # Limit auto-generated tags

  rules:
    - match: "source:notion AND workspace:Work"
      tags: [work]
    - match: "source:obsidian AND path:Books/**"
      tags: [book-notes, reading]
    - match: "content contains 'quarterly'"
      tags: [business, reports]
    - match: "source:web-clip AND domain:arxiv.org"
      tags: [research, academic]

  collections:         # Group related content
    - name: "Conference Notes"
      auto_add: "tag:conference"
    - name: "Book Summaries"
      auto_add: "tag:book-notes"
    - name: "Work Projects"
      auto_add: "tag:work AND tag:project"

[You clip an article from hbr.org]

Bot: Saved "Why Digital Transformations Fail" (hbr.org)

Auto-tags applied:
  #business (rule: domain hbr.org)
  #digital-transformation (AI-suggested)
  #management (AI-suggested)
  #strategy (AI-suggested)
Added to collection: "Work Projects" (matched: business + strategy)

For audio/video content, use OpenClaw's transcript pipeline:
You: Index this podcast: https://youtube.com/watch?v=example

Bot: Fetching transcript for "Lex Fridman #412 - Sam Altman"...

Transcript indexed:
  Duration: 2h 34m
  12,400 words across 84 chunks
  Key topics: AGI timeline, safety research,
  compute scaling, OpenAI governance
  Tags: #podcast #ai #interviews

You can now ask questions like:
  "What did Sam Altman say about AGI timelines on Lex Fridman?"

Pairs well with the YouTube & Podcast Factory course.
When the same content arrives from multiple sources (you clip an article AND someone shares it in your Notion workspace), OpenClaw deduplicates:
- Content hashing detects identical or near-identical text
- URL matching catches the same web page saved twice
- Fuzzy matching identifies paraphrased duplicates (>90% similarity)
Duplicates are merged, keeping metadata from both sources. You'll never see the same quote twice in search results.
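The three checks above can be sketched in Python with the standard library. This is an illustration, not OpenClaw's implementation: the function names and the item shape (a dict with `text`, `url`, `tags`, `source`) are assumptions. Note also that `difflib.SequenceMatcher` measures character-level overlap, so it catches near-identical text but is only a stand-in for whatever similarity measure detects true paraphrases; the 0.9 threshold mirrors the ">90% similarity" figure.

```python
import hashlib
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def normalize_text(text):
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def content_hash(text):
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


def normalize_url(url):
    # Ignore scheme, "www." prefix, trailing slash, query string, and fragment.
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def is_duplicate(a, b, threshold=0.9):
    # 1. URL matching
    if a.get("url") and b.get("url") and normalize_url(a["url"]) == normalize_url(b["url"]):
        return True
    # 2. Content hashing
    if content_hash(a["text"]) == content_hash(b["text"]):
        return True
    # 3. Fuzzy matching (character-level similarity)
    ratio = SequenceMatcher(None, normalize_text(a["text"]), normalize_text(b["text"])).ratio()
    return ratio > threshold


def merge(a, b):
    """Keep one copy of the content and combine metadata from both sources."""
    merged = dict(a)
    merged["tags"] = sorted(set(a.get("tags", [])) | set(b.get("tags", [])))
    merged["sources"] = sorted({a["source"], b["source"]})
    merged["url"] = a.get("url") or b.get("url")
    return merged
```

The checks run from cheapest to most expensive: URL and hash comparisons are fast lookups, while fuzzy matching compares full texts and is only needed when the first two find nothing.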