Thursday, 29 May 2025

Convert HTML content to Markdown using LLM and Ollama

https://www.glukhov.org/post/2025/05/html-to-markdown-using-llm/

ReaderLM-v2

I have tried the next version of this model, reader-lm-v2. ReaderLM-v2 is built on Qwen2.5-1.5B-Instruct. I can confirm that it works, but the conversion is rather slow. Imagine a 500KB HTML page that you need to extract the text from. That could be 100,000 tokens, and even 10k tokens is a lot of work for the model. I took a sample page of 121KB, and the conversion time on my PC was ~1sec.

Calling Ollama Commandline

    #!/bin/bash
    MODEL="milkey/reader-lm-v2:latest"
    INPUT_FILE="prompt.html"
    OUTPUT_FILE="response.md"

    # Read file content as prompt (printf turns \n into real newlines)
    PROMPT="$(printf 'Extract the main content from the given HTML and convert it to Markdown format.\nhtml:\n%s' "$(cat "$INPUT_FILE")")"

    # Call Ollama and save the response
    ollama run "$MODEL" "$PROMPT" > "$OUTPUT_FILE"

    echo "Ollama response saved to $OUTPUT_FILE"
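The script above passes the whole page to ollama as a single command-line argument. Operating systems limit how long an argument can be (on Linux a single argument is commonly capped at about 128 KiB), so a page the size of the 500KB example may fail with "Argument list too long". A minimal sketch of one way around this is to pipe the prompt in on standard input instead. It assumes that `ollama run` reads the prompt from stdin when input is piped; the function names `build_prompt` and `convert_html` are hypothetical helpers, not part of Ollama.

```shell
#!/bin/bash
# Sketch only: build_prompt and convert_html are hypothetical helper names.
MODEL="milkey/reader-lm-v2:latest"

# Print the instruction followed by the HTML file's content, with real newlines.
build_prompt() {
    local input_file="$1"
    printf 'Extract the main content from the given HTML and convert it to Markdown format.\nhtml:\n'
    cat "$input_file"
}

# Feed the prompt to the model on stdin rather than as an argument
# (assumption: ollama run reads a piped prompt), so the page size is
# not bounded by the argument-length limit.
convert_html() {
    local input_file="$1" output_file="$2"
    build_prompt "$input_file" | ollama run "$MODEL" > "$output_file"
}

# Usage (requires Ollama to be running with the model already pulled):
#   convert_html prompt.html response.md
```

Keeping the prompt construction in its own function makes it easy to inspect exactly what the model receives, for example with `build_prompt prompt.html | head`, before spending time on a conversion.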
