Thursday, 29 May 2025
Convert HTML content to Markdown using LLM and Ollama
https://www.glukhov.org/post/2025/05/html-to-markdown-using-llm/
ReaderLM-v2
I have also tried the next version of this model, reader-lm-v2. ReaderLM-v2 is built on Qwen2.5-1.5B-Instruct. I can confirm that it works, but the conversion is rather slow…
Can you imagine a 500KB HTML page that you need to extract text from? Maybe that is 100,000 tokens? Or let it be even just 10k tokens.
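To put a rough number on that guess, here is a minimal sketch that estimates a token count from a page size. It assumes about 4 bytes per token, which is only a common rule of thumb and not a measurement of the ReaderLM-v2 tokenizer; HTML full of tags and attributes may tokenize quite differently. The function name `estimate_tokens` is made up for this example.

```shell
#!/bin/bash
# Rough token estimate from a byte count.
# ASSUMPTION: ~4 bytes per token (rule of thumb, not measured for ReaderLM-v2).
estimate_tokens() {
  local bytes="$1"
  local bytes_per_token="${2:-4}"   # optional second argument overrides the ratio
  echo $(( bytes / bytes_per_token ))
}

# A 500KB page, counted as 500 * 1024 bytes
estimate_tokens $(( 500 * 1024 ))   # prints 128000
```

For a real file, the byte count can be taken with `wc -c < prompt.html` and passed to the same function.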
I took a sample page of 121KB, and the conversion time on my PC was ~1 sec.
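If you want to repeat this kind of measurement on your own machine, a small wrapper like the one below may help. It is only a sketch: the helper name `time_cmd` is hypothetical, it reports whole seconds of wall-clock time, and it relies on `date +%s`, which is available on common Linux and macOS systems.

```shell
#!/bin/bash
# Run any command and report its wall-clock time (whole seconds) on stderr.
# The command's own stdout is left untouched, so it can still be redirected.
time_cmd() {
  local start end rc
  start="$(date +%s)"
  "$@"
  rc=$?
  end="$(date +%s)"
  echo "elapsed: $(( end - start ))s" >&2
  return "$rc"   # pass through the exit status of the wrapped command
}

# Usage (hypothetical): time_cmd ollama run "$MODEL" "$PROMPT" > response.md
```

The shell's built-in `time` keyword gives finer resolution; the wrapper is just convenient when the timing should go to stderr next to the saved output.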
Calling Ollama from the Command Line
#!/bin/bash
MODEL="milkey/reader-lm-v2:latest"
INPUT_FILE="prompt.html"
OUTPUT_FILE="response.md"
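# Read file content as prompt.
# printf turns \n into real newlines; inside a double-quoted string bash would keep it as a literal backslash-n.
PROMPT="$(printf 'Extract the main content from the given HTML and convert it to Markdown format.\nhtml:\n%s' "$(cat "$INPUT_FILE")")"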
# Call Ollama and save the response
ollama run "$MODEL" "$PROMPT" > "$OUTPUT_FILE"
echo "Ollama response saved to $OUTPUT_FILE"
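For bigger pages, passing the whole HTML as one command-line argument may run into the operating system's limit on argument length (on Linux a single argument is typically capped at 128 KiB). A possible variant of the script above is to pipe the prompt through stdin instead. This is a sketch under the assumption that `ollama run` reads the prompt from stdin when stdin is not a terminal; the function names `build_prompt` and `convert_html` are made up for this example.

```shell
#!/bin/bash
# Write the prompt to stdout: instruction line, "html:" line, then the file content.
build_prompt() {
  local input_file="$1"
  printf 'Extract the main content from the given HTML and convert it to Markdown format.\nhtml:\n'
  cat "$input_file"
}

# Convert one HTML file to Markdown by piping the prompt into ollama.
# ASSUMPTION: `ollama run MODEL` takes its prompt from stdin when input is piped.
convert_html() {
  local model="$1" input_file="$2" output_file="$3"
  if [ ! -r "$input_file" ]; then
    echo "Cannot read $input_file" >&2
    return 1
  fi
  build_prompt "$input_file" | ollama run "$model" > "$output_file"
}

# Usage, with the same names as the script above:
# convert_html "milkey/reader-lm-v2:latest" prompt.html response.md
```

Keeping the prompt construction in its own function also makes it easy to inspect exactly what is sent to the model, for example with `build_prompt prompt.html | head`.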