Exporting Habr articles to HTML and Markdown

Before me, four articles had already been written about exporting Habr articles to FB2 and PDF.

I wanted to get the source of my own articles. Since I use the old editor on Habr and write in markdown, the main goal was to get the markdown source. HTML came along as a by-product, because that is the format in which I fetch the articles from Habr; I then turn it into md with the markdownify library and a fair amount of fiddling.
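The script relies on the markdownify library for the HTML to markdown step. To illustrate what such a conversion involves, here is a minimal stand-in sketch that uses only Python's standard html.parser module. It is not markdownify's API and not the script's actual code; the names MiniMarkdown and html_to_markdown are hypothetical, and only headings, paragraphs, bold, italic, links and images are handled.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter (illustration only, not markdownify)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # pieces of the Markdown output
        self.href = None   # target of the <a> tag that is currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.href = attrs.get("href") or ""
            self.out.append("[")
        elif tag == "img":
            alt = attrs.get("alt") or ""
            src = attrs.get("src") or ""
            self.out.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a" and self.href is not None:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html):
    """Convert a small HTML fragment to Markdown text."""
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

Real article markup (code blocks, tables, nested lists, spoilers) needs far more cases than this, which is presumably where a dedicated library plus manual post-processing comes in.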

How to use

The script is written in Python. Download it from GitHub, install the dependencies, and run it:

apt-get install python3-lxml
git clone https://github.com/dvjdjvu/habrArticleSrcDownloader
cd habrArticleSrcDownloader
pip3 install -r requirements.txt
./src/main.py jessy_james

Replace jessy_james with the name of the desired user. You can take it from the link to the user's profile.
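As a small illustration, the name can be pulled out of a profile link programmatically. The sketch below assumes that profile links have the form https://habr.com/ru/users/NAME/ (an assumption about the URL layout, not something the script is known to parse); the function name username_from_profile_url is hypothetical.

```python
from urllib.parse import urlparse


def username_from_profile_url(url):
    """Return the path segment that follows 'users' in a profile URL.

    Assumes a layout like https://habr.com/ru/users/NAME/ (not verified
    against the script itself).
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "users" not in parts:
        raise ValueError(f"not a profile URL: {url}")
    idx = parts.index("users")
    if idx + 1 >= len(parts):
        raise ValueError(f"no user name in URL: {url}")
    return parts[idx + 1]
```

The returned value is what would be passed to ./src/main.py as its argument.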

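After launch, the output looks like this (the log is in Russian: "Скачивается" means "downloading", "Директория ... создана" means "directory ... created", and "Статья ... сохранена" means "article ... saved"):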

./src/main.py jessy_james
[info]: Скачивается: C/C++ из Python (ctypes) на Android
[info]: Директория: 16 C C++ из Python (ctypes) на Android создана
[info]: Директория: picture создана
[info]: Статья: C C++ из Python (ctypes) на Android сохранена
[info]: Скачивается: Своя docking station для ноутбука
[info]: Директория: 15 Своя docking station для ноутбука создана
[info]: Директория: picture создана
[info]: Статья: Своя docking station для ноутбука сохранена
[info]: Скачивается: Tango Controls hdbpp-docker
[info]: Директория: 14 Tango Controls hdbpp-docker создана
[info]: Директория: picture создана

...

[info]: Скачивается: Игрушка ГАЗ-66 на пульте управления. Часть 2
[info]: Директория: 2 Игрушка ГАЗ-66 на пульте управления. Часть 2 создана
[info]: Директория: picture создана
[info]: Статья: Игрушка ГАЗ-66 на пульте управления. Часть 2 сохранена
[info]: Скачивается: Игрушка ГАЗ-66 на пульте управления. Часть 1
[info]: Директория: 1 Игрушка ГАЗ-66 на пульте управления. Часть 1 создана
[info]: Директория: picture создана
[info]: Статья: Игрушка ГАЗ-66 на пульте управления. Часть 1 сохранена

Articles are downloaded from the newest to the oldest.
The resulting directory hierarchy is as follows.

A folder named article is created, inside it a folder with the username, and inside that one folder per article, named after the article. Each article folder contains two files (.md and .html) and a folder with the article's pictures.
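The layout described above can be sketched with pathlib. This is an illustration under assumptions, not the script's code: the function names and the file names article.md and article.html are hypothetical, while the numeric prefix, the picture folder, and the replacement of "/" in titles follow what the log output shows.

```python
from pathlib import Path


def safe_name(title):
    """Replace path separators so the title can serve as a folder name."""
    return title.replace("/", " ").strip()


def create_layout(root, username, titles_newest_first):
    """Create article/<username>/<N title>/picture for every title.

    Titles are expected newest first, so the newest article gets the
    highest number, matching the order seen in the log.
    """
    user_dir = Path(root) / "article" / username
    total = len(titles_newest_first)
    folders = []
    for offset, title in enumerate(titles_newest_first):
        number = total - offset
        folder = user_dir / f"{number} {safe_name(title)}"
        (folder / "picture").mkdir(parents=True, exist_ok=True)
        # File names are an assumption; the article only says ".md and .html".
        (folder / "article.md").touch()
        (folder / "article.html").touch()
        folders.append(folder)
    return folders
```

Calling create_layout(".", "jessy_james", titles) with the titles listed newest first would produce folders numbered from len(titles) down to 1.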

To check the result, take the contents of the generated markdown file and paste them into the article editor.

It works almost as I wanted ;) There are some problems with displaying images (the HTML source displays them correctly); I'll try to fix them.

Thank you for your attention.
