DocVert User Guide¶
What is DocVert?¶
Imagine you have a PDF report or a Word (DOCX) file from work. DocVert extracts the content and converts it into a clean text file that preserves the document structure (headings, tables, lists).
The output is saved as Markdown (.md) — a simple text format you can open with any text editor (Notepad, TextEdit, VS Code, etc.).
When is DocVert useful?
- PDF text is garbled when you try to copy-paste it
- You need to convert Word files into a format other systems can use
- You have many documents to convert at once
Before You Start¶
What is a "Terminal"?¶
It's a program where you type commands as text instead of clicking buttons. Don't worry — you'll just copy and paste the commands from this guide. Almost no typing required.
How to copy and paste¶
| OS | Copy | Paste |
|---|---|---|
| Mac | Cmd + C | Cmd + V |
| Windows (WSL terminal) | Select text to auto-copy | Right-click |
| Linux | Ctrl + Shift + C | Ctrl + Shift + V |
Important: In a terminal, Ctrl + C does NOT copy; it stops the running program! On Linux/WSL, always use Ctrl + Shift + C to copy.
After pasting a command¶
Press Enter to run it.
When asked for a password¶
Some commands start with sudo, which asks for your password. When you type the password, nothing shows on screen — no dots, no asterisks. This is normal. Just type it and press Enter.
Which installation method?¶
| Your computer | Go to |
|---|---|
| Mac | Mac Installation |
| Windows | Windows Installation |
| Linux | Linux Installation |
| No internet (air-gapped) | Offline Installation |
About Docker Desktop: There is also a Docker-based installation method, but Docker Desktop can hold 2-4GB+ of RAM for as long as it is running in the background. This is heavy for casual users, so this guide recommends native installation without Docker on Mac, Windows, and Linux. Docker is only used for the offline/air-gapped scenario.
Mac Installation¶
Step 1: Open Terminal¶
- Press Cmd + Space (a search bar appears in the center of the screen).
- Type Terminal.
- Click on Terminal in the results.
A window with a blinking cursor on a dark (or white) background opens. This is where you'll paste all commands.
Step 2: Install Developer Tools¶
Paste this and press Enter:
xcode-select --install
What happens: A popup appears. Click "Install" and wait 5-10 minutes.
If it says "already installed": That's fine. Move to the next step.
Step 3: Install Homebrew¶
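Homebrew is a tool that makes it easy to install programs on Mac. Paste this and press Enter:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"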
What happens: Installation progress is shown. It may ask for your Mac login password.
After it finishes: You'll see ==> Next steps: at the bottom. You must copy and run the commands shown there. They usually look like:
echo >> ~/.zprofile
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
Verify it worked:
brew --version
If you see Homebrew 4.x.x, it's working.
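Optional: if the "Next steps" commands are run more than once, the same line is appended to ~/.zprofile each time. The sketch below shows one way to avoid duplicates. The name append_once is a hypothetical helper made up for this guide; it is not part of Homebrew or DocVert.

```shell
# Hypothetical helper (not provided by Homebrew or DocVert):
# append a line to a file only if that exact line is not already in it.
append_once() {
    file=$1
    line=$2
    # grep -F: treat the line as plain text, -x: match the whole line, -q: print nothing
    grep -Fxq -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example use (safe to repeat):
#   append_once ~/.zprofile 'eval "$(/opt/homebrew/bin/brew shellenv)"'
```

Running the example line twice leaves a single copy of the line in the file.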
Step 4: Install Required Programs¶
brew install poppler tesseract libmagic
What happens: Programs are downloaded and installed (1-3 minutes).
What are these?
- poppler: extracts text from PDF files
- tesseract: reads text from images (OCR)
- libmagic: detects file types automatically
Step 5: Install uv¶
curl -LsSf https://astral.sh/uv/install.sh | sh
After this, close Terminal completely and reopen it (required!).
What is uv? It automatically installs and manages the Python environment DocVert needs. You do NOT need to install Python yourself.
Step 6: Download DocVert¶
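git clone https://github.com/seonghobae/docvert.git
What you see: Git prints its download progress, starting with Cloning into 'docvert'...
Now enter the downloaded folder:
cd docvert
What does cd mean? "Change directory": it moves you into the docvert folder.
Install the required libraries:
uv sync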
What happens: Libraries are downloaded (2-5 minutes the first time).
Step 7: Verify Installation¶
uv run python -m docvert.cli.main --help
If successful, you see the DocVert help text.
Installation complete! Go to How to Use DocVert.
Windows Installation¶
On Windows, you first install WSL (Windows Subsystem for Linux) — an official Microsoft feature that lets you run Linux inside Windows.
Step 1: Install WSL¶
- Click the Start button (the Windows icon on the taskbar).
- Type PowerShell.
- Right-click on "Windows PowerShell" → "Run as Administrator".
- Click "Yes" when asked to allow changes.
- In the blue PowerShell window, paste this and press Enter: wsl --install
- Restart your computer when prompted.
After restart: An Ubuntu window opens automatically. Set up:
- Username: Type any lowercase name (e.g., myname)
- Password: Type any password. Nothing shows on screen; that's normal. Type it and press Enter. Type it again to confirm.
Requires Windows 10 (version 2004+) or Windows 11. Check: Start → Settings → System → About → "Windows specifications"
Step 2: Install Required Programs¶
In the Ubuntu terminal (black window):
Enter the password you set in Step 1 (nothing shows — normal).
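Step 3: Install uv¶
curl -LsSf https://astral.sh/uv/install.sh | sh
After this, close the Ubuntu terminal completely and reopen it.
Step 4: Download DocVert¶
git clone https://github.com/seonghobae/docvert.git
cd docvert
uv sync
Step 5: Verify¶
uv run python -m docvert.cli.main --help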
If you see the help text, go to How to Use DocVert.
Finding Your Windows Files from WSL¶
| Windows path | WSL path |
|---|---|
| C:\Users\John\Desktop\ | /mnt/c/Users/John/Desktop/ |
| C:\Users\John\Documents\ | /mnt/c/Users/John/Documents/ |
| D:\Work\ | /mnt/d/Work/ |
Don't know your username? Open File Explorer and look at C:\Users\.
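The pattern in the table (drive letter becomes /mnt/ plus the lowercase letter, backslashes become forward slashes) can be sketched as a small shell function. The name win_to_wsl is a hypothetical helper for illustration only; WSL itself also includes a wslpath tool for this job.

```shell
# Hypothetical helper (not part of DocVert): convert a Windows path to its WSL form.
win_to_wsl() {
    # First character is the drive letter; lowercase it ("C" becomes "c")
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    # Everything after "C:", with each backslash replaced by a forward slash
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Single quotes keep the backslashes exactly as typed
win_to_wsl 'C:\Users\John\Desktop\'    # prints /mnt/c/Users/John/Desktop/
win_to_wsl 'D:\Work\'                  # prints /mnt/d/Work/
```

Always wrap the Windows path in single quotes when passing it to a command, otherwise the shell treats the backslashes as special characters.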
Linux Installation¶
Ubuntu / Debian¶
Verify: uv run python -m docvert.cli.main --help
Fedora / RHEL / CentOS¶
Then same as Ubuntu (curl ... uv → git clone ... → uv sync).
Offline Installation¶
For secure networks with no internet access. Requires Docker or Podman on the target machine.
What you need¶
- One computer WITH internet
- USB drive (4GB+ recommended)
- Docker or Podman on the air-gapped machine
On the internet-connected computer¶
- Open https://github.com/seonghobae/docvert/releases in a browser.
- Find the latest version at the top.
- Download ALL files named docvert-offline-release.tar.gz.part-aa, part-ab, part-ac, etc.
- Copy them to a USB drive.
On the air-gapped machine¶
# Go to where the files are
cd /path/to/usb/files
# Combine split files into one
cat docvert-offline-release.tar.gz.part-* > docvert-offline-release.tar.gz
# Extract
tar -xzvf docvert-offline-release.tar.gz
cd docvert-offline-release
# Install into Docker
docker load -i docvert-offline.tar.gz
Verify: docker images | grep docvert — if you see docvert, it worked!
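If a part file is missing or was damaged while copying, the combined archive will not extract. As a hedged sketch, the combine step can be paired with an integrity check before extracting. The function name combine_parts is hypothetical; gzip -t is the standard gzip self-test.

```shell
# Hypothetical helper: join split parts, then confirm the result is a readable gzip archive.
combine_parts() {
    prefix=$1    # e.g. docvert-offline-release.tar.gz.part-
    output=$2    # e.g. docvert-offline-release.tar.gz
    # The shell expands the * in sorted order (part-aa, part-ab, ...)
    cat "$prefix"* > "$output" || return 1
    # gzip -t reads the whole file and fails if it is truncated or corrupt
    if gzip -t "$output" 2>/dev/null; then
        echo "OK: $output is complete"
    else
        echo "ERROR: $output is damaged; copy the parts again" >&2
        return 1
    fi
}

# Example use:
#   combine_parts docvert-offline-release.tar.gz.part- docvert-offline-release.tar.gz
```

If the check reports an error, the usual cause is a part that was not downloaded or did not finish copying to the USB drive.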
Using it offline¶
docker run --rm -v "$(pwd)":/data docvert:offline convert /data/report.pdf --output-dir /data/results
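In the command above, -v "$(pwd)":/data makes your current folder visible inside the container as /data, which is why ./report.pdf is written as /data/report.pdf. The sketch below only prints the command for a given file so you can check it before running it. The name offline_convert_cmd is a hypothetical helper, and it assumes the file sits in the folder you are currently in.

```shell
# Hypothetical helper: print (not run) the offline convert command for one file.
# Assumes the file is in the current folder, because only that folder is mounted at /data.
offline_convert_cmd() {
    # Keep just the file name: ./report.pdf becomes report.pdf
    file=$(basename "$1")
    printf 'docker run --rm -v "%s":/data docvert:offline convert /data/%s --output-dir /data/results\n' \
        "$(pwd)" "$file"
}

# Example use:
#   offline_convert_cmd ./report.pdf
```

Copy the printed line and paste it into the terminal to run the conversion.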
How to Use DocVert¶
Convert a single PDF file¶
Scenario: You have report.pdf on your Desktop and want to convert it.
On Mac:
cd ~/Desktop
uv run python -m docvert.cli.main convert ./report.pdf --output-dir ./results
On Windows (WSL):
cd /mnt/c/Users/YourName/Desktop
uv run python -m docvert.cli.main convert ./report.pdf --output-dir ./results
What each part means:
| Part | Meaning |
|---|---|
| uv run python -m docvert.cli.main | Run the DocVert program |
| convert | "Convert one file" |
| ./report.pdf | The file to convert |
| --output-dir ./results | Where to save the output (auto-created) |
If successful:
Convert a Word (DOCX) file¶
Convert all files in a folder¶
batch automatically finds and converts ALL PDF and DOCX files in the folder, including subfolders.
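To preview which files a folder-wide conversion would pick up, you can list them first. This is a sketch using the standard find command, not DocVert's own logic; list_convertible is a hypothetical name, and matching upper-case extensions such as .PDF is an assumption.

```shell
# Hypothetical helper: list every PDF and DOCX file in a folder, including subfolders.
list_convertible() {
    # -type f: files only; -iname: match the extension in any letter case
    find "$1" -type f \( -iname '*.pdf' -o -iname '*.docx' \) | sort
}

# Example use: show what is in the current folder
#   list_convertible .
```

If a file you expected is missing from the list, check its extension and that you are in the right folder.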
What you get after conversion¶
results/
├── report.md ← converted text file (open with any text editor)
├── report.conversion.json ← conversion details (parser used, warnings)
└── report.assets/ ← images extracted from the document
├── image_0.png
└── image_1.png
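The naming in the tree follows the input file name with its extension removed. As a sketch under that assumption (inferred from the report.pdf example only), the expected paths for any input can be printed like this; expected_outputs is a hypothetical helper, not a DocVert command.

```shell
# Hypothetical helper: print the paths a conversion is expected to create,
# assuming outputs are named after the input file as in the tree above.
expected_outputs() {
    input=$1
    outdir=$2
    # Drop the folder part, then the final extension: ./report.pdf becomes report
    name=$(basename "$input")
    stem=${name%.*}
    printf '%s\n' "$outdir/$stem.md" "$outdir/$stem.conversion.json" "$outdir/$stem.assets"
}

expected_outputs ./report.pdf ./results
# prints:
#   ./results/report.md
#   ./results/report.conversion.json
#   ./results/report.assets
```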
Troubleshooting¶
"command not found: uv"¶
Close Terminal completely and reopen. If that doesn't work:
"command not found: git"¶
- Mac: Run xcode-select --install again
- Ubuntu/Debian: sudo apt install -y git
- Fedora/RHEL: sudo dnf install -y git
"pdfinfo not found"¶
- Mac: brew install poppler
- Ubuntu/Debian: sudo apt install -y poppler-utils
"tesseract is not installed"¶
- Mac: brew install tesseract
- Ubuntu/Debian: sudo apt install -y tesseract-ocr
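Rather than waiting for each error to appear, you can check all of these programs at once. The sketch below uses the shell's built-in command -v test; missing_tools is a hypothetical helper name, and the list of programs is taken from the error messages in this section.

```shell
# Hypothetical helper: print the name of each listed program that the terminal cannot find.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the program is on the PATH
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Programs named in the troubleshooting entries above
missing_tools uv git pdfinfo tesseract
```

If nothing is printed, every program was found. Any name that is printed matches one of the entries above, which shows the command to install it.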
Do I need to install Python?¶
No. uv downloads it automatically.
How do I update DocVert?¶
"No such file or directory" when I type cd docvert¶
DocVert is probably in your home folder:
cd ~/docvert