Advanced FeedParser for Medium
Welcome to the fourth blog in the RSS FeedParser tutorial series. In the previous blog we made an RSS FeedParser for Medium, where we got blog details such as the title, link, date of publishing and more using Python. In this blog we will be taking a step further and doing some advanced parsing for Medium using the datetime library. We will be getting the articles published within a specific time range.
So let's get started.
from datetime import date, datetime
import feedparser

url = "https://aryanirani123.medium.com/feed"
blog_feed = feedparser.parse(url)
content = blog_feed.entries
We are going to start out by doing the same things as before. First we import date and datetime from the datetime module, which we will use for the date conversions and comparisons, followed by the feedparser library.
Next we declare the URL of the feed that we are going to parse, parse it with feedparser.parse, and get the list of blog entries from the blog_feed.entries attribute.
compare_blog_time = "2022-05-01 07:18:01"
format = "%Y-%m-%d %H:%M:%S"
compare_blog_time1 = datetime.strptime(compare_blog_time, format)
print(type(compare_blog_time1))
Here I have declared the time that I will compare the blogs against, stored as a string. After that I have declared a format string that describes how this time is written (year, month, day, hours, minutes and seconds), so that Python knows how to read it.
A string cannot be compared directly with the blog dates, so I convert it into a datetime object using the datetime.strptime function. Inside this function I pass the time string followed by the format as parameters.
After the conversion, I print the type of the new variable to check that it really is a datetime.
On running the above code, the output is <class 'datetime.datetime'>, which shows that the variable has been converted to the datetime type successfully.
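To check the conversion on its own, without the feed, here is a minimal sketch. The names time_format and cutoff are my own choices for this illustration and are not part of the code above.

```python
from datetime import datetime

# The comparison time as a plain string, same value as in the blog.
compare_blog_time = "2022-05-01 07:18:01"

# Format describing how the string is laid out.
# (time_format is an illustrative name that avoids shadowing the built-in format.)
time_format = "%Y-%m-%d %H:%M:%S"

# strptime turns the string into a datetime object.
cutoff = datetime.strptime(compare_blog_time, time_format)

print(type(cutoff))
print(cutoff.year, cutoff.month, cutoff.day)

# Formatting the datetime back with the same format gives the original string.
print(cutoff.strftime(time_format) == compare_blog_time)
```

If the string and the format do not match, strptime raises a ValueError, so this is also a quick way to test a format before using it on real data.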
newformat = "%a, %d %b %Y %H:%M:%S %Z"
for blog in content:
    if datetime.strptime(blog.published, newformat) > compare_blog_time1:
        print(blog.title)
        print(blog.published)
Now that we have everything set up, it’s time to get the blogs according to the specific date that we want. To do that we are going to start out by opening a for loop that iterates through all the blogs. Our aim is to print all the blogs that were published after the 1st of May 2022.
Next we are going to use an if statement to check whether the blog’s date is later than the sample date variable.
To compare the blog’s date with the sample date, we first have to convert the blog’s date into a datetime as well. To do that we use the datetime.strptime function, inside which we pass the date of the current blog from the blog.published attribute, followed by the format in which the feed writes its dates.
The if statement checks whether the blog was published after 1st May 2022. If it was, the title of the blog and the date it was published are printed out.
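To see the conversion and the comparison for a single date, here is a small sketch. The value of sample_published is made up for this illustration (it only imitates the way the feed writes its dates), and feed_format, cutoff, published_at and is_newer are illustrative names.

```python
from datetime import datetime

# Format of the dates in the feed, same as newformat in the blog.
# Note: %a and %b expect English day and month names under the default locale.
feed_format = "%a, %d %b %Y %H:%M:%S %Z"

# The comparison time, built directly as a datetime.
cutoff = datetime(2022, 5, 1, 7, 18, 1)

# Made-up example of a published date string (not taken from the real feed).
sample_published = "Mon, 02 May 2022 10:30:00 GMT"

# %Z accepts the "GMT" text, but the result carries no timezone info,
# so it can be compared with the cutoff, which has no timezone info either.
published_at = datetime.strptime(sample_published, feed_format)

is_newer = published_at > cutoff
print(published_at, is_newer)
```

Both sides of the comparison have to be datetime objects. Comparing the raw string from the feed with a datetime would raise a TypeError.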
Our code is complete, so let's go ahead and run it.
On running the code, you can see that all the blogs published after 1st May 2022 have been printed out. This shows that both the date conversions and the comparison have worked successfully.
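The same idea can be extended to get the blogs between two times. The sketch below is only one possible way to do it: the function entries_between and the sample entries are hypothetical, made up so that the code runs without a network connection. With a real feed you would pass blog_feed.entries instead of sample_entries.

```python
from datetime import datetime
from types import SimpleNamespace

FEED_FORMAT = "%a, %d %b %Y %H:%M:%S %Z"


def entries_between(entries, start, end):
    """Return the entries published after start and up to and including end."""
    selected = []
    for entry in entries:
        published_at = datetime.strptime(entry.published, FEED_FORMAT)
        if start < published_at <= end:
            selected.append(entry)
    return selected


# Stand-in data that imitates feed entries (titles and dates are made up).
sample_entries = [
    SimpleNamespace(title="Older post", published="Fri, 15 Apr 2022 09:00:00 GMT"),
    SimpleNamespace(title="In range", published="Tue, 10 May 2022 12:00:00 GMT"),
    SimpleNamespace(title="Too new", published="Wed, 01 Jun 2022 08:00:00 GMT"),
]

start = datetime(2022, 5, 1, 7, 18, 1)
end = datetime(2022, 5, 31, 23, 59, 59)

for entry in entries_between(sample_entries, start, end):
    print(entry.title)
    print(entry.published)
```

Here only the entry titled "In range" is printed, because it is the only one whose date falls between start and end.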
That is all for this blog. I hope you have understood how to get the blogs published after a specific date using the FeedParser module and the datetime module in Python.
Feel free to reach out to me if you have any issues/feedback at aryanirani123@gmail.com.