Basics for a simple Slack bot which crawls websites

Writing a Slack bot which pushes content to a workspace is quite simple and fast to do. So if there is a regularly changing piece of information on the internet that your team needs updates on, a bot is an easy way to deliver it. In my case this is the weekly food plan.

I decided to go with requests-html, tinydb and of course slackclient. Please note that requests-html needs at least Python 3.6.

I’ll walk you through the important parts of the code on GitHub.

First, set up a SlackClient and an HTMLSession:

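import os
from requests_html import HTMLSession
from slackclient import SlackClient
slack_token = os.environ["SLACK_BOT_TOKEN"]  # bot token is read from the environment
sc = SlackClient(slack_token)
session = HTMLSession()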

Next, fetch the HTML page and select the interesting parts:

r = session.get('https://tuerantuer.de/cafe/wochenplan/')
yummyImages = r.html.find(".site-content", first=True).find('img[class*=wp-image-]')

Note: Make sure you have permission to crawl the page!

Note: This only works if the page content is rendered on the server, not by JavaScript in the browser!
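To make the selection step more concrete: the selector img[class*=wp-image-] matches every img element whose class attribute contains the substring wp-image-. The following is a minimal sketch of that matching rule using only Python's standard library instead of requests-html. The names ImageCollector and collect_image_urls are hypothetical, the sketch ignores the .site-content scoping, and it additionally resolves relative src values against the page URL, which the original script does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect src URLs of <img> tags whose class contains `needle`."""

    def __init__(self, base_url, needle="wp-image-"):
        super().__init__()
        self.base_url = base_url
        self.needle = needle
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        css_class = attributes.get("class") or ""
        src = attributes.get("src")
        # Same idea as the CSS selector img[class*=wp-image-]
        if self.needle in css_class and src:
            # Resolve relative paths against the page URL (an addition
            # of this sketch, not part of the original script).
            self.urls.append(urljoin(self.base_url, src))


def collect_image_urls(html, base_url):
    """Return the matching image URLs in document order."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    return parser.urls
```

Calling collect_image_urls(page_source, page_url) on the downloaded HTML would return a list of image URLs comparable to the src attributes collected from yummyImages.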

The last step is to post the content to Slack:

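# CHANNEL and MESSAGE must be defined beforehand (target channel and message text)
for yummyImage in yummyImages:
    imageUrl = yummyImage.attrs['src']

    # Post one message per image, with the image as an attachment
    result = sc.api_call(
        "chat.postMessage",
        channel=CHANNEL,
        text=MESSAGE,
        attachments=[{
            "fallback": "Wochenplan from Cafe TaT",
            "image_url": imageUrl
        }]
    )

    # Slack signals success or failure in the "ok" field of the response
    if not result["ok"]:
        print("Failed to send message to Slack")
        print(result)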
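If the bot grows beyond a few lines, it can help to separate building the message from sending it, so the payload can be checked without talking to Slack. The sketch below is one possible way to do that. build_message and describe_failure are hypothetical helper names, not part of slackclient; the only assumption about Slack is that a failed response carries "ok": false and usually an "error" field.

```python
def build_message(channel, text, image_url, fallback="Wochenplan from Cafe TaT"):
    """Return keyword arguments for a chat.postMessage call with one image."""
    return {
        "channel": channel,
        "text": text,
        "attachments": [{"fallback": fallback, "image_url": image_url}],
    }


def describe_failure(result):
    """Return an error description, or None if Slack reported success."""
    if result.get("ok"):
        return None
    return "Failed to send message to Slack: {}".format(
        result.get("error", "unknown error")
    )
```

With these helpers, the call inside the loop could be written as sc.api_call("chat.postMessage", **build_message(CHANNEL, MESSAGE, imageUrl)), and the response passed to describe_failure to decide whether anything needs to be printed.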

Run this script as an hourly cron job to post updates! There is no need to use webhooks, as the bot only pushes to Slack.
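One thing to keep in mind with an hourly job: the script as shown posts every image it finds on each run, so the same plan would appear again every hour. The walkthrough does not show how tinydb is used; a plausible role for it, and this is an assumption, is to remember which images have already been posted. The sketch below illustrates that idea with a plain JSON file from the standard library rather than tinydb. The function names and the file path are hypothetical.

```python
import json
from pathlib import Path


def load_seen(path):
    """Return the set of image URLs recorded in the JSON file at `path`."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text(encoding="utf-8")))


def filter_new(urls, seen):
    """Return the URLs not in `seen`, in order and without duplicates."""
    new = []
    for url in urls:
        if url not in seen and url not in new:
            new.append(url)
    return new


def save_seen(path, seen):
    """Write the set of seen URLs back to disk as a sorted JSON list."""
    Path(path).write_text(json.dumps(sorted(seen)), encoding="utf-8")
```

A run would then load the seen set, post only the URLs returned by filter_new, and save the updated set afterwards, so that an unchanged page produces no new messages.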