Automatic bot posting from Reddit to Youtube — Text to video
Objective:
When I was browsing Youtube, I found a lot of videos that simply read out some of the comments from the top posts on AskReddit. A text-to-speech voice would read each comment aloud while an image of the comment was shown on screen. So I thought that would be a fun project to try and replicate with automation!
Thought process:
espeak — this is the unix utility that creates a .wav file from a text file; alternatively, it can read from standard input, e.g. echo "This is some piece of text" | espeak
Need some way of generating an image with the text in a file or something and adding the username. Trying to keep it as similar to reddit’s design as possible. My initial thought process was to generate a template that you would put the text and the user into and create an image of that.
Amalgamate the images and the sound together to create a video using ffmpeg.
What I initially started to do:
So one big mistake of this entire project was trying to scrape the JSON information from Reddit without using the PRAW (Python Reddit API Wrapper) module. This led to me continually getting 429 HTTP status codes (Too Many Requests). PRAW takes care of all the request throttling for you internally, making sure you won't hit this limit.
Content:
First we had to get the two pieces of content:
Authenticate yourself with PRAW
reddit = praw.Reddit(client_id='', client_secret='',
                     user_agent='The name of your bot')
submission = ""
url = ""
Get the post (hottest post on askreddit that wasn’t the sticky post by the mod)
for submission in reddit.subreddit('askreddit').hot(limit=2):
    if submission.stickied:
        continue
    else:
        title = submission.title
        submission = submission
        url = str(submission.url)
        div_id = "t3_" + str(submission.id)
        f = open(dirpath + "title/title.txt", "w")
        f.write(title)
        f.close()
        image_gen.generate(url, div_id, "title", "title")
        break
Explanation: reddit.subreddit('askreddit').hot(limit=2) gets the top two posts in the hot section of the subreddit 'askreddit'. The reason for getting two is the next check, if submission.stickied: if a post is stickied, we skip it. This makes sure that we get a post that hasn't been pinned by the moderators. From there, we extract the information stored in the submission that we will need later (title, div_id of the submission, url of the post). It also sets submission = submission so that we can use the global variable submission in our next loop, and then writes the title text to a text file. I will explain image_gen.generate(args) below.
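The sticky-skipping logic can be illustrated with stand-in objects (FakePost here is a hypothetical stub, not a real PRAW class):

```python
# Hypothetical stand-ins for PRAW submissions, just to show the skip logic:
# fetching two hot posts guarantees a non-sticky post is still available
# even when the first slot is taken by a moderator announcement.
class FakePost:
    def __init__(self, title, stickied):
        self.title = title
        self.stickied = stickied

posts = [FakePost("Mod announcement", True),
         FakePost("What's your story?", False)]
chosen = next(p for p in posts if not p.stickied)
print(chosen.title)  # -> What's your story?
```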
- Get the post’s comments (hottest comments on the top hottest post without any sticky comments)
top_level_comments = list(submission.comments)
for i in range(11):
    if top_level_comments[i].stickied:
        continue
    else:
        print(top_level_comments[i])
        div_id = "t1_" + str(top_level_comments[i])
        text = top_level_comments[i].body
        # strip markdown link URLs, keeping only the [link text]
        text = re.sub("](.*)", "]", text)
        f = open(dirpath + "comments/" + str(i) + ".txt", "w")
        f.write(text)
        f.close()
        image_gen.generate(url, div_id, str(i), "images")
Explanation: This performs a similar operation to the loop above. It cycles through the comments of the submission selected in the previous loop and first checks that each comment isn't stickied. For instance, on serious threads in askreddit the AutoModerator bot adds a stickied comment, so this script would otherwise pick up that generic comment as one of the comments; this check lets us skip it. If the comment isn't stickied, the script assigns the div_id and text, and writes the text to a new file.
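One line worth pausing on is the re.sub call: it replaces everything from the first closing bracket onward with "]", which drops markdown link URLs (note it also drops any text after the bracket on that line). A quick sketch of its effect, with a made-up comment:

```python
import re

comment = "Read [this thread](https://example.com) before answering"
# Everything from the first "]" to the end of the line is replaced
# with a bare "]", so the URL (and trailing text) disappears.
cleaned = re.sub("](.*)", "]", comment)
print(cleaned)  # -> Read [this thread]
```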
Image Generation:
This was the part I had to be a little creative about. Reddit has a nice architecture whereby the comment identifier (the HTML div id="" attribute) is a value you can easily derive from the PRAW library. As you can see from the two code snippets above, it's built from the "submission" and "top_level_comments[i]" objects. So I felt that the selenium framework would be a good choice for this, as it has a screenshot feature and lets me select an element by id.
Here is the code for that:
from selenium import webdriver
import os
import sys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

dirpath = "/root/reddit_comment_reader/"

def generate(url, div_id, prefix, folder):
    print(div_id)
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument("--window-size=1920,1080")
    chrome_options.add_argument("disable-infobars")
    chrome_options.add_argument("--disable-extensions")
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver",
                              options=chrome_options)
    print(url)
    driver.get(url)
    try:
        # dismiss the cookie-consent banner
        cookies = driver.find_elements_by_tag_name("form")
        cookies[1].submit()
        # expand the "VIEW DISCUSSION X MORE COMMENTS" button if present
        if driver.find_element_by_css_selector("button._2JBsHFobuapzGwpHQjrDlD"):
            driver.find_element_by_css_selector("button._2JBsHFobuapzGwpHQjrDlD").click()
        # wait (up to 20 s) for the target div, then screenshot it
        image = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.ID, div_id)))
        image.screenshot(dirpath + folder + '/' + prefix + '.png')
    finally:
        driver.quit()
There are a few things to bear in mind for this code, as I hit some problems when playing around with selenium for this task. The first was that image_gen.py executed and seemed successful until I had a look at the images: they all had a "Please accept the cookies" banner all over them. I had to address this issue first, which is where the following code comes into play:
cookies = driver.find_elements_by_tag_name("form")
cookies[1].submit()
Next, I found that after 5 comments there is a button that says something like "VIEW DISCUSSION X MORE COMMENTS". I had to press this in order to reach the rest of the comments I had div_ids for, which is addressed by the following code:
if driver.find_element_by_css_selector("button._2JBsHFobuapzGwpHQjrDlD"):
driver.find_element_by_css_selector("button._2JBsHFobuapzGwpHQjrDlD").click()
Then I had an issue that I'm sure a lot of people have faced with selenium when targeting an ID: the page doesn't load quickly enough, the code executes before the div you're targeting exists, and selenium throws an error that no element was found. Therefore I used the following code:
image = WebDriverWait(driver,20).until(EC.presence_of_element_located((By.ID,div_id)))
This will wait up to 20 seconds for the div it's looking for, polling repeatedly until the element appears (and raising a timeout error if it never does).
It will take the screenshot and save it as a png.
❗️Finally, the driver will close, I found that I had a few instances of chrome open on my small virtual machine, so always make sure to close your driver❗️
In terms of the Chrome options, make sure that you start the browser headless and give it a large window size so that you get good picture quality. The rest of the options are more generic settings.
The end result of running these Python files is a set of text files and images across the title, comments, and images directories. title will contain "title.txt" and "title.png"; comments will contain "0.txt" through "9.txt"; and images will have "0.png" through "8.png". If you have more, that's fine too! 😁
That’s everything from on the python side done 🥳
Bash:
The first script that we have in the bash folder is textToSpeech.sh, this is probably the most important part of our project as the majority of the logic is in here.
cd "$dirpath/comments/"
for i in {0..10}
do
    espeak -s 200 -g 12 -f $i.txt --stdout | ffmpeg -i - -ar 44100 -ac 2 -ab 192k -f mp3 $i.mp3
    time=$(mp3info -p "%S" $i.mp3)
    echo $time
    ffmpeg -y -framerate 1/$time -start_number 0 -i $dirpath/images/$i.png -i $i.mp3 \
        -vf 'scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1' \
        -c:v libx264 -r 25 -pix_fmt yuv420p -c:a aac -strict experimental -shortest \
        -max_muxing_queue_size 9999 $dirpath/videos/$i.mp4
done

cd $dirpath/title
espeak -s 200 -g 12 -f title.txt --stdout | ffmpeg -i - -ar 44100 -ac 2 -ab 192k -f mp3 title.mp3
time=$(mp3info -p "%S" title.mp3)
ffmpeg -y -framerate 1/$time -start_number 1 -i title.png -i title.mp3 \
    -vf 'scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1' \
    -c:v libx264 -r 25 -pix_fmt yuv420p -c:a aac -strict experimental -shortest \
    -max_muxing_queue_size 9999 $dirpath/videos/title.mp4

cd $dirpath/videos/
ffmpeg -f concat -i inputs.txt -c copy output.mp4
Explanation: So immediately, there is a for loop that cycles through 0–10, which is the reason behind the naming convention of our files in comments and images. espeak is the command used to turn text into wav audio: -s sets the speed in words per minute, -g the pause between words (in units of 10 ms), and -f the file to read from. The output goes to standard out, which is then piped (|) to ffmpeg, which with the given parameters turns it into an mp3 file! mp3info lets us get the number of seconds the audio runs for, which is important for the next step: setting the frame rate to 1/$time makes sure the video clip of that comment runs for exactly as long as the audio file. The next command then combines each comment's audio (0.mp3) and image (0.png) into an mp4 named 0.mp4, cycling through all the comments and images (1.mp3/1.png, 2.mp3/2.png, etc.). The commands after the loop, up until cd $dirpath/videos/, create the same kind of video from title.txt and title.png for the title of the askreddit post.
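The 1/$time frame-rate trick can be sanity-checked with a little arithmetic (the 14-second duration here is made up for illustration):

```python
from fractions import Fraction

# If mp3info reports a 14-second clip, ffmpeg is given -framerate 1/14:
# with a single input image, that one frame stays on screen for the
# full 14 seconds, matching the audio length exactly.
duration = 14                      # seconds, as reported by mp3info -p "%S"
framerate = Fraction(1, duration)  # ffmpeg accepts the literal "1/14"
seconds_per_frame = 1 / framerate
print(seconds_per_frame)  # -> 14
```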
Then the last line will merge all the title and comment videos (title.mp4, 1.mp4, 2.mp4, etc) together to create output.mp4, the contents of inputs.txt is as follows:
file 'title.mp4'
file '0.mp4'
file '1.mp4'
file '2.mp4'
file '3.mp4'
file '4.mp4'
file '5.mp4'
file '6.mp4'
file '7.mp4'
file '8.mp4'
file '9.mp4'
file '10.mp4'
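inputs.txt is hand-written here, but since the clip names follow the loop's naming convention, it could just as easily be generated. A small sketch (it assumes you run it from $dirpath/videos, where the clips live):

```python
# Sketch: build inputs.txt for ffmpeg's concat demuxer from the known
# clip naming scheme (title.mp4 followed by 0.mp4 .. 10.mp4).
clips = ["title"] + [str(i) for i in range(11)]
with open("inputs.txt", "w") as f:
    for clip in clips:
        f.write(f"file '{clip}.mp4'\n")
```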
As far as reddit → video goes, that's it all done! The only thing left to do is upload it to youtube!
I use the command-line tool youtube-upload for this, so the code is as follows:
title=$(cat $dirpath/title/title.txt)
echo $title
youtube-upload --title="$title" --client-secrets="/path/to/secret.json" $dirpath/videos/output.mp4
It will also get the title of the askreddit post and use this as the title for the youtube video.
Apart from this, I just have two other scripts, more for housekeeping:
run.sh
python3 $dirpath/python/get_sources.py
bash $dirpath/bash/textToSpeech.sh
bash $dirpath/bash/yt_uploader.sh
bash $dirpath/bash/cleanup.sh
cleanup.sh
cd $dirpath/comments
rm *
cd $dirpath/videos
rm *.mp4
cd $dirpath/title
rm *
cd $dirpath/images
rm *
That’s it!
For additional automation, you can check out my article on systemd to get this process acting like a bot, continually carrying out the run operation every X minutes/hours/days for you!
Improvements to be added:
- Add pauses between comments and the title page.