
Reddit Monitoring with Python

Leon Wei

Introduction:

Reddit is the second-most popular website in the United States, with more than 300 million unique visitors per month.

It's also one of the most trafficked sites on the internet and has become an important part of online marketing strategy for brands across industries.

This article will show you how to programmatically set up a keyword monitor service on Reddit using Python and the PRAW library.

You will also learn how to set up this web service using Django and run it on your local machine to monitor Reddit and save the leads into a database automatically.

Don't have time to follow along?

All the code can be found here:

https://github.com/theleonwei/reddit_bot

All of the code and step-by-step instructions assume that you are on a Mac.


 

Table of contents:

  1. Setting up the Django Project with the CookieCutter template
  2. Register a Reddit application and install the PRAW library
  3. Keyword monitoring with regular expression
  4. Persisting data to a Postgres database
  5. Leads report view
  6. Schedule a cron job to check Reddit periodically
  7. How does this compare to Segue

 

Setting up the Django Project with the CookieCutter template

Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.

To learn more, you can visit the project's GitHub repository:

https://github.com/cookiecutter/cookiecutter-django

Step 1: install cookiecutter.

Open up your favorite terminal app (mine is iTerm2), and install the latest cookiecutter.

pip install "cookiecutter>=1.7.0"

 

Step 2: start a Django project.

cookiecutter https://github.com/cookiecutter/cookiecutter-django

 

Follow the instructions, answer the questions, and set up the project; here are my choices:

 

project_name [My Awesome Project]: Reddit Bot
project_slug [reddit_bot]:
description [Behold My Awesome Project!]: My awesome Reddit Bot Project
author_name [Daniel Roy Greenfeld]: Leon W
domain_name [example.com]:
email [leon-w@example.com]:
version [0.1.0]:
Select open_source_license:
1 - MIT
2 - BSD
3 - GPLv3
4 - Apache Software License 2.0
5 - Not open source
Choose from 1, 2, 3, 4, 5 [1]: 5
timezone [UTC]: US/Pacific
windows [n]:
use_pycharm [n]: y
use_docker [n]:
Select postgresql_version:
1 - 14
2 - 13
3 - 12
4 - 11
5 - 10
Choose from 1, 2, 3, 4, 5 [1]:
Select cloud_provider:
1 - AWS
2 - GCP
3 - None
Choose from 1, 2, 3 [1]:
Select mail_service:
1 - Mailgun
2 - Amazon SES
3 - Mailjet
4 - Mandrill
5 - Postmark
6 - Sendgrid
7 - SendinBlue
8 - SparkPost
9 - Other SMTP
Choose from 1, 2, 3, 4, 5, 6, 7, 8, 9 [1]:
use_async [n]: n
use_drf [n]: n
Select frontend_pipeline:
1 - None
2 - Django Compressor
3 - Gulp
Choose from 1, 2, 3 [1]:
use_celery [n]: n
use_mailhog [n]: n
use_sentry [n]: n
use_whitenoise [n]: n
use_heroku [n]: y
Select ci_tool:
1 - None
2 - Travis
3 - Gitlab
4 - Github
Choose from 1, 2, 3, 4 [1]:
keep_local_envs_in_vcs [y]: n
debug [n]: n
 [SUCCESS]: Project initialized, keep up the good work!

 

Step 3: Install dependencies

cd reddit_bot; ls

 

These are the files and directories that we have so far.

 

Procfile  locale  reddit_bot  setup.cfg
README.md  manage.py  requirements  utility
config  merge_production_dotenvs_in_dotenv.py requirements.txt
docs  pytest.ini  runtime.txt

 

Step 3.1: Create a virtual environment.
reddit_bot ➤ python3 -m venv ./venv

 

Afterward, you should see a newly created venv folder.

 

Procfile  locale reddit_bot  setup.cfg
README.md  manage.py requirements  utility
config  merge_production_dotenvs_in_dotenv.py requirements.txt  venv
docs  pytest.ini  runtime.txt

 

Step 3.2: Activate the virtual environment and install all dependencies.
source venv/bin/activate

 

Notice there is a (venv) prompt, which means you have successfully activated the virtual environment.

Note: if you have not installed a Postgres database on your Mac, you must install it first.

Check this article on Postgres installation on Mac for more details.

Install the dependencies from the requirements/local.txt file (these differ slightly from the production requirements because they add tools for debugging and testing).

(venv) reddit_bot ➤ pip install -r requirements/local.txt

 

Step 3.3: Create a local database reddit_bot
(venv) reddit_bot ➤ createdb reddit_bot

 

After that, start the Django web server for testing.

 

(venv) reddit_bot ➤ python manage.py runserver

 

You will probably see something like the following.

 

Watching for file changes with StatReloader
INFO 2022-09-17 11:28:04,131 autoreload 17789 4335895936 Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).

You have 28 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): account, admin, auth, contenttypes, sessions, sites, socialaccount, users.
Run 'python manage.py migrate' to apply them.
September 17, 2022 - 11:28:04
Django version 3.2.15, using settings 'config.settings.local'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
[17/Sep/2022 11:28:16] "GET / HTTP/1.1" 200 13541
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/toolbar.css HTTP/1.1" 200 11815
[17/Sep/2022 11:28:16] "GET /static/css/project.css HTTP/1.1" 200 228
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/print.css HTTP/1.1" 200 43
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/toolbar.js HTTP/1.1" 200 12528
[17/Sep/2022 11:28:16] "GET /static/js/project.js HTTP/1.1" 200 45
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/utils.js HTTP/1.1" 200 4479
[17/Sep/2022 11:28:16] "GET /static/images/favicons/favicon.ico HTTP/1.1" 200 8348

 

Since the database is brand new, we need to initialize it with some built-in Django tables.

 

(venv) reddit_bot ➤ python manage.py migrate

 

Once the migration is completed, restart the server.

 

(venv) reddit_bot ➤ python manage.py runserver

 

Open your favorite browser (I am using the latest Chrome) and visit localhost:8000; you should see the project's default home page.

Congratulations on finishing setting up your local Django server; next, let's set up our Reddit account and install the Reddit API library: PRAW.

 

Register a Reddit application and install the PRAW library

We assume you already have a Reddit account. If not, simply visit https://reddit.com and create one, then come back.

To use the API, you must first register an application of the appropriate type on Reddit.

To do so, visit https://www.reddit.com/prefs/apps/

Note: I sometimes run into page redirect issues when visiting the above page; if that happens to you, try visiting the following instead:

https://old.reddit.com/prefs/apps/

Scroll down and click the "create another app..." button.

 

For the name of your app, anything should be fine.

For the app type, choose script, since we are building something that runs in the backend.

For the redirect uri, enter http://localhost:8000

Then click the create app button to finish this step.

There are two tokens that we will need for our service to run: 

1. client id: the one beneath personal use script

2. client secret: the one to the right of secret

 

Finally, we need to install the PRAW library and try to connect with Reddit using the secret keys from the last step.

(venv) reddit_bot ➤ pip install praw

 

Next, we append PRAW to the dependencies (otherwise, the service won't work when we deploy to production):

(venv) reddit_bot ➤ pip freeze | grep praw >> requirements/base.txt

 

It's a common best practice not to save sensitive information such as your app secret tokens in the git repository, so let's create a new file .env at the root of your project so we can access the app secret on the local machine.

(venv) reddit_bot ➤ vim .env

 

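The file defines three variables: SECRET, CLIENT_ID, and REDDIT_USERAGENT (the names that the Django settings read below). Write each one as export NAME=value so that source .env makes it available to the Django process.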

 

Replace the secret and the client_id with yours from the last step.

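The user agent identifies your script to Reddit. Reddit's API rules ask for a unique, descriptive value rather than a browser's user agent string; PRAW's documentation suggests the format <platform>:<app ID>:<version string> (by u/<Reddit username>). Put your value into the .env file.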

Next, make sure you add .env into the list of files that will not be checked into the git repository by editing the .gitignore file.

 

(venv) reddit_bot ➤ vim .gitignore

 

Add .env to the file. After that, run the following in your terminal to load the environment variables:

(venv) reddit_bot ➤ source .env

 

We also need to load the variables into Django's settings.

Open up the config/settings/local.py file (I am using PyCharm) and add the following:

# Reddit settings:
REDDIT_SECRET=env('SECRET')
REDDIT_CLIENT_ID=env('CLIENT_ID')
REDDIT_USERAGENT=env('REDDIT_USERAGENT')
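
Each env(...) call looks the name up in the process environment, and with django-environ (which Cookiecutter Django uses) a variable that is missing and has no default raises an error at startup. As a rough pre-flight check, the sketch below lists which of the three names are unset. The names missing_vars and REQUIRED_VARS are hypothetical helpers for illustration, not part of the project.

```python
import os

# The three names read by the settings above.
REQUIRED_VARS = ("SECRET", "CLIENT_ID", "REDDIT_USERAGENT")


def missing_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


# Example with a made-up environment that lacks the user agent.
example_env = {"SECRET": "xxx", "CLIENT_ID": "yyy"}
print(missing_vars(example_env))  # ['REDDIT_USERAGENT']
```

Calling missing_vars() with no argument checks the real environment of the current process, which is a quick way to confirm that source .env worked.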

 


 

In the terminal, launch the Django console:

(venv) reddit_bot ➤ python manage.py shell

 

In [1]: from django.conf import settings

In [2]: import praw

In [3]: reddit = praw.Reddit(
   ...:     client_id=settings.REDDIT_CLIENT_ID,
   ...:     client_secret=settings.REDDIT_SECRET,
   ...:     user_agent=settings.REDDIT_USERAGENT,
   ...: )

If you don't receive any error message, it means you have successfully created a Reddit instance through PRAW.

Next, let's run a simple task to check the connection.

In [4]: for submission in reddit.subreddit("marketing").hot(limit=10):
   ...:     print(submission.title)
   ...:
New Job Listings
Sorry if this isn't allowed, but I recently created a subreddit focused on the business side of art, and would love for people to go there and share their knowledge and experiences.
I read privacy and policies of Tiktok, IG and Other Platforms. Here’s what I learned about Social Media Platforms!
Has anyone actually worked with an impressive agency?
How to bring first customers to shop?
Facebook ad numbers don't ad up
Beginning my career in marketing, looking to go in to an Agency
I hate digital marketing - help me find a new role
Where to start for ecom store?
Making a website

 

If you see something similar to the above, congratulations: you've successfully connected with Reddit's official API and retrieved the top 10 hot posts from r/marketing!


 

Keyword monitoring with regular expression

Suppose you run a Facebook ad agency and your target customers are new to Facebook ads. Wouldn't it be nice if you could respond to someone who has questions about Facebook ads on Reddit?

Chiming in and joining a conversation on Reddit will help:

1. Establish your reputation as a Facebook ads expert;

2. Spread the word about your service to the world's largest online community and drive quality traffic to your website;

3. If you get enough votes, your response may become a backlink to improve your SEO.

For this article, we will show you how to find any posts whose title contains the word 'facebook'.

Of course, you can continue optimizing this matching rule and develop your own solution.
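
As one possible refinement, here is a minimal sketch that matches several keywords at once and only as whole words, so that a keyword does not fire inside a longer word. The build_pattern helper, the second keyword, and the second sample title are assumptions made up for illustration.

```python
import re


def build_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    escaped = (re.escape(k) for k in keywords)  # treat keywords as literal text
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


pattern = build_pattern(["facebook", "fb ads"])

titles = [
    "Facebook ad numbers don't ad up",
    "Best FB ads budget for a new store?",  # hypothetical title
    "Making a website",
]
matches = [t for t in titles if pattern.search(t)]
print(matches)
```

Escaping each keyword with re.escape keeps characters such as "+" or "." in a keyword from being read as regular expression syntax.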

For a more advanced matching algorithm, feel free to check out Segue's Reddit lead generation engine based on a state-of-the-art NLP semantics search.

Again, we first open up a Django console.

(venv) reddit_bot ➤ python manage.py shell

 

Inside of the Django console:

 

import praw
from django.conf import settings

import re # new, the python regular expression library

keyword = "facebook"

reddit = praw.Reddit(
     client_id=settings.REDDIT_CLIENT_ID,
     client_secret=settings.REDDIT_SECRET,
     user_agent=settings.REDDIT_USERAGENT,
)

for submission in reddit.subreddit("marketing").hot(limit=100): # we are searching 100 hottest posts on the marketing subreddit
    if re.search(keyword, submission.title, re.IGNORECASE): # new, notice we are ignoring the case sensitivity
        print(submission.title)
   

 

Your results may differ from mine (as we ran this search on September 18th, 2022).

 

In [2]: for submission in reddit.subreddit("marketing").hot(limit=100):
   ...:     if re.search(keyword, submission.title, re.IGNORECASE): # new, notice we are ignoring the case sensitivity
   ...:         print(submission.title)
   ...:
Facebook ad numbers don't ad up
Facebook & Instagram Ads campaign Setup
Measuring the impact of Facebook
How many interests is too many Interest - Facebook Ads

 

We've found four discussions about Facebook without too much work. How awesome is that!

Let's save those results and other metadata such as URLs, post date, and content into a database so we don't lose them.

 


 

Persisting data to a Postgres database

Let's first create a new app:

# In the root dir of your project
(venv) reddit_bot ➤ django-admin startapp reddit

 

Note: we also need to move the newly created app to the reddit_bot subdirectory. This step is important because of the way Cookiecutter Django structures our project.

 

mv reddit ./reddit_bot

 

Next, let's update the apps.py config file.

The default name attribute is "reddit"; we need to update it to "reddit_bot.reddit" since we have moved the app from the root directory to the subdirectory.

 

And don't forget to include this app in the base.py config file.
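
A minimal sketch of that change, assuming the generated config/settings/base.py keeps project apps in a LOCAL_APPS list as Cookiecutter Django projects usually do (your generated file may look slightly different):

```python
# config/settings/base.py (fragment; only the LOCAL_APPS list is shown)
LOCAL_APPS = [
    "reddit_bot.users",
    # Your stuff: custom apps go here
    "reddit_bot.reddit",  # new: the app we just moved under reddit_bot/
]
```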


 

Now let's open up the models.py file and add our first class.

class Lead(models.Model):
    post_id = models.CharField(max_length=10) # Original post id
    title = models.TextField()
    content = models.TextField()
    posted_at = models.DateTimeField()
    url = models.URLField(max_length=500)

 


 

In the command line, let's create the migration for the new model and apply it.

(venv) reddit_bot ➤ python manage.py makemigrations
Migrations for 'reddit':
  reddit_bot/reddit/migrations/0001_initial.py
    - Create model Lead

(venv) reddit_bot ➤ python manage.py migrate
Operations to perform:
  Apply all migrations: account, admin, auth, contenttypes, reddit, sessions, sites, socialaccount, users
Running migrations:
  Applying reddit.0001_initial... OK

 

Next, let's create a command line script to execute the keyword matching and save the results into the Lead model.

Let's call this script lead_finder.py and put it under the reddit/management/commands folder.

First we need to create the two folders:

# Move to the reddit app directory
(venv) reddit_bot ➤ cd reddit_bot/reddit
(venv) reddit ➤ mkdir management
(venv) reddit ➤ mkdir management/commands

 

# Inside reddit_bot/reddit_bot/reddit/management/commands/lead_finder.py file
import datetime as DT
import re

import praw
from django.conf import settings
from django.core.management.base import BaseCommand
from django.utils import timezone

from reddit_bot.reddit.models import Lead

KEYWORD = "facebook"
SUBREDDIT = "marketing"

reddit = praw.Reddit(
    client_id=settings.REDDIT_CLIENT_ID,
    client_secret=settings.REDDIT_SECRET,
    user_agent=settings.REDDIT_USERAGENT,
)


def convert_to_ts(unix_time):
    """Convert Reddit's created_utc (seconds since the epoch) to an aware UTC datetime."""
    try:
        # Passing tz avoids depending on the machine's local timezone.
        return DT.datetime.fromtimestamp(unix_time, tz=DT.timezone.utc)
    except (OverflowError, OSError, TypeError, ValueError):
        print(f"Converting utc failed for {unix_time}")
        return None


def populate_lead(keyword, subreddit):
    # Scan the 100 hottest posts and save each title match we have not stored yet.
    for submission in reddit.subreddit(subreddit).hot(limit=100):
        if re.search(keyword, submission.title, re.IGNORECASE):
            if not Lead.objects.filter(post_id=submission.id).exists():
                Lead.objects.create(post_id=submission.id,
                                    title=submission.title,
                                    url=submission.permalink,
                                    content=submission.selftext,
                                    posted_at=convert_to_ts(submission.created_utc))


class Command(BaseCommand):
    help = "Populating leads"

    def handle(self, *args, **kwargs):
        self.stdout.write(f"Populating leads at {timezone.now()}")
        try:
            populate_lead(KEYWORD, SUBREDDIT)
        except Exception as e:
            current_time = timezone.now().strftime("%X")
            self.stdout.write(self.style.ERROR(f"Populating leads failed at {current_time} because {e}"))
        else:
            # Report success only when no exception was raised.
            self.stdout.write(self.style.SUCCESS(f"Successfully populated new leads at {timezone.now()}"))

 

Some explanation:

  1. This script has 3 parts. The convert_to_ts function converts a UNIX timestamp into a timezone-aware datetime; Reddit stores the time a post was created as a number of seconds since the epoch.
  2. The populate_lead function uses the same logic as in our last section and saves a lead only if no row with the same post_id exists yet. Note that the Lead model does not declare post_id as a primary key or unique field, so the script performs this check itself with Lead.objects.filter.
  3. Lastly, we created a Command class so that we can execute populate_lead from the command line. There are other ways to execute a script on the command line, but this way is more in the Django style, in my opinion.
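
To make the first two parts concrete without needing Django or a Reddit connection, here is a small stand-alone sketch of the same idea: convert the timestamp, match the keyword, and skip ids that were already seen. The helper names to_aware_utc and new_leads, the in-memory seen set (standing in for the database lookup), and the sample posts are all assumptions for illustration.

```python
import re
from datetime import datetime, timezone
from types import SimpleNamespace


def to_aware_utc(unix_time):
    """Turn seconds since the epoch into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


def new_leads(submissions, keyword, seen_ids):
    """Return dicts for keyword matches whose id is not in seen_ids yet."""
    leads = []
    for s in submissions:
        if s.id in seen_ids:
            continue  # already stored on an earlier run
        if re.search(keyword, s.title, re.IGNORECASE):
            leads.append({
                "post_id": s.id,
                "title": s.title,
                "posted_at": to_aware_utc(s.created_utc),
            })
            seen_ids.add(s.id)
    return leads


# Hypothetical stand-ins for PRAW submissions (only the attributes used here).
posts = [
    SimpleNamespace(id="a1", title="Facebook ads question", created_utc=1663372800),
    SimpleNamespace(id="b2", title="Making a website", created_utc=1663376400),
]
seen = set()
first_run = new_leads(posts, "facebook", seen)
second_run = new_leads(posts, "facebook", seen)
print(first_run[0]["posted_at"].isoformat())  # 2022-09-17T00:00:00+00:00
print(len(first_run), len(second_run))  # 1 0
```

The second run returns nothing because the matching id is already in the seen set, which mirrors how repeated runs of the command avoid saving the same post twice.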

Finally, we can try to execute the script and populate some leads.

(venv) reddit_bot ➤ python manage.py lead_finder
Populating leads at 2022-09-18 17:50:35.045981+00:00
Successfully populated new leads at 2022-09-18 17:50:36.558052+00:00

 

Let's open up the Django console to verify the results are saved successfully.

 

from reddit_bot.reddit.models import Lead

for lead in Lead.objects.all():
        print(f'''title: {lead.title}\nposted_at:{lead.posted_at}\nurl: {lead.url}\n''')
   

 

title: Facebook ad numbers don't ad up
posted_at:2022-09-17 21:14:24+00:00
url: /r/marketing/comments/xgxv9c/facebook_ad_numbers_dont_ad_up/

title: Facebook & Instagram Ads campaign Setup
posted_at:2022-09-17 08:01:38+00:00
url: /r/marketing/comments/xgghl9/facebook_instagram_ads_campaign_setup/

title: Measuring the impact of Facebook
posted_at:2022-09-16 20:49:14+00:00
url: /r/marketing/comments/xg2kbi/measuring_the_impact_of_facebook/

title: How many interests is too many Interest - Facebook Ads
posted_at:2022-09-16 01:32:28+00:00
url: /r/marketing/comments/xfdwsg/how_many_interests_is_too_many_interest_facebook/

 

Here we go. All four leads were persisted successfully!


 

Leads report view

It's cool we can see the data in the console, but it will be easier if we can view the leads in a table from a browser.

Inside the views.py file, let's create a ListView.

# inside reddit_bot/reddit_bot/reddit/views.py

from django.views.generic import ListView

from .models import Lead

# Create your views here.
class LeadView(ListView):
    model = Lead
    template_name = 'lead_list.html'

lead_view = LeadView.as_view()

We also need to create an HTML file, lead_list.html, inside a new directory called templates under the reddit app.

{% extends 'base.html' %}

    {% block content %}
        <table class="table table-striped">
        <thead>
        <tr>
        <th scope="col">ID</th>
        <th scope="col">Title</th>
        <th scope="col">Posted At</th>
        <th scope="col">Content</th>
        </tr>
        </thead>
        <tbody>

        {% for lead in object_list %}
            <tr>
            <th scope="row">{{ lead.post_id }}</th>
            <td><a href="https://reddit.com{{ lead.url }}"> {{ lead.title }}</a></td>
            <td>{{ lead.posted_at }}</td>
            <td>{{ lead.content }}</td>
            </tr>
        {% endfor %}

        </tbody>
        </table>

    {% endblock %}

Next, we need to add a URL path to access this view.

Create a new file: urls.py under the reddit app.

# inside reddit_bot/reddit_bot/reddit/urls.py

from django.urls import path

from reddit_bot.reddit.views import lead_view

app_name = "reddit"

urlpatterns = [

    path("leads/", view=lead_view, name="leads"),

]

 

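Finally, we must include the reddit app's urls.py at the project level. In config/urls.py, add an entry under the reddit/ prefix to urlpatterns, for example path("reddit/", include("reddit_bot.reddit.urls", namespace="reddit")).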

 

Final step: open your browser and visit http://localhost:8000/reddit/leads/ to check the page.

 


And if you click on the title, you will be redirected to the Reddit post page, where you can engage with your target customers. How cool is that!


 

Schedule a cron job to check Reddit periodically

We are almost done. If you are like me and prefer to automate your tasks, how about scheduling the job to run automatically?

And that's super easy. 

In the terminal, type crontab -e and press Enter.

Add the following line (you will need to edit the path of the reddit_bot Django project)

1 * * * * cd ~/reddit_bot; source venv/bin/activate; source .env; python manage.py lead_finder >/tmp/stdout.log 2>/tmp/stderr.log

It will run every hour at one minute past the hour. For example, if it is now 11:35 am, the job will next run at 12:01 pm, then at 1:01 pm, and so on.
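
To double-check the schedule arithmetic, here is a small sketch that computes when an hourly entry such as "1 * * * *" fires next. The next_run helper is hypothetical and only models this one kind of schedule, not cron syntax in general.

```python
from datetime import datetime, timedelta


def next_run(now, minute=1):
    """Next time an hourly cron entry like '1 * * * *' fires strictly after now."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)  # this hour's slot has already passed
    return candidate


print(next_run(datetime(2022, 9, 18, 11, 35)))  # 2022-09-18 12:01:00
print(next_run(datetime(2022, 9, 18, 11, 0)))   # 2022-09-18 11:01:00
```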

Of course, you can change the schedule to whatever works best for you. The following tool can help you customize your cron expression.

https://crontab.guru/

 


 

How does this compare to Segue?

The two biggest differences are:

  1. Segue offers a more robust and productionized service;
  2. Segue uses a much more advanced NLP engine to semantically identify potential leads. 

To learn more: check out our free lead generation tool: 

Reddit Lead Generation Tool

And if you are interested in using Segue to grow your business, feel free to request a demo, and we will be happy to meet you and walk you through our product.

 



First published on Sept. 16, 2022
