Scraping Billboard and IMDB using Python Selenium

July 01, 2020

Selenium is an open-source browser automation tool, used mainly for testing web applications. It lets us open a browser from a script and control it programmatically.

Here we will use Selenium to load web pages and then scrape data from them.

In this article, we will scrape the Billboard Hot 100 songs and the IMDB Top chart.

Downloads

For more details on using Selenium, read my previous article, GETTING STARTED WITH PYTHON SELENIUM.

Requirements

pip install bs4

pip install selenium

Scraping Billboard

# import necessary modules
from selenium import webdriver
from bs4 import BeautifulSoup
import time

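# Set the location of your WebDriver
# driver = webdriver.Firefox()  # for Firefox
# Note: newer Selenium 4 releases no longer accept executable_path; pass a Service object there instead
driver = webdriver.Chrome(executable_path=r"D:\Softwares\chromedriver_win32\chromedriver.exe")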

# Set URL
driver.get('https://www.billboard.com/charts/hot-100')

songs = []
artists = []

# Scrape content
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
# Each chart entry is an <li> holding one span for the song and one for the artist
for item in soup.find_all('li', attrs={'class': 'chart-list__element'}):
    song = item.find('span', 'chart-element__information__song')
    artist = item.find('span', 'chart-element__information__artist')
    songs.append(song.text)
    artists.append(artist.text)

# Keep the browser open for 10 seconds, then end the session
time.sleep(10)
driver.quit()

# Convert to a dictionary (songs that share a title overwrite each other)
tracks = dict(zip(songs, artists))

# Print the dictionary line by line
for key, value in tracks.items():
    print(key, '-', value)
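
The parsing loop above can be tried without opening a browser. The following is a minimal sketch, assuming markup shaped like the class names used in the script. The HTML snippet, the placeholder song and artist names, and the helper parse_tracks are all made up for illustration. It keeps the results as a list of tuples rather than a dictionary, because a dictionary keyed by song title keeps only one entry when two songs share a title, and it skips any row where find() returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used in the script above.
# The song and artist names are placeholders, not real chart data.
SAMPLE_HTML = """
<ol>
  <li class="chart-list__element">
    <span class="chart-element__information__song">Song A</span>
    <span class="chart-element__information__artist">Artist X</span>
  </li>
  <li class="chart-list__element">
    <span class="chart-element__information__song">Song A</span>
    <span class="chart-element__information__artist">Artist Y</span>
  </li>
  <li class="chart-list__element">
    <span class="chart-element__information__song">Song B</span>
  </li>
</ol>
"""


def parse_tracks(html):
    """Return a list of (song, artist) tuples in page order."""
    soup = BeautifulSoup(html, 'html.parser')
    tracks = []
    for item in soup.find_all('li', class_='chart-list__element'):
        song = item.find('span', class_='chart-element__information__song')
        artist = item.find('span', class_='chart-element__information__artist')
        # find() returns None when an element is missing, so skip incomplete rows
        if song is None or artist is None:
            continue
        tracks.append((song.get_text(strip=True), artist.get_text(strip=True)))
    return tracks


tracks = parse_tracks(SAMPLE_HTML)
for position, (song, artist) in enumerate(tracks, start=1):
    print(position, song, '-', artist)
```

With a live page, the same function could be called as parse_tracks(driver.page_source), provided the site still uses these class names.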

Scraping IMDB

# import necessary modules
from selenium import webdriver
from bs4 import BeautifulSoup
import time

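# Set the location of your WebDriver
# driver = webdriver.Firefox()  # for Firefox
# Note: newer Selenium 4 releases no longer accept executable_path; pass a Service object there instead
driver = webdriver.Chrome(executable_path=r"D:\Softwares\chromedriver_win32\chromedriver.exe")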

# Set URL
driver.get('https://www.imdb.com/chart/top/')
movies = []
rating = []

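# Scrape content
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
# Movie titles are the links inside the "titleColumn" cells
for cell in soup.find_all('td', attrs={'class': 'titleColumn'}):
    movie = cell.find('a')
    movies.append(movie.text)
# Ratings are the <strong> values inside the "imdbRating" cells
for cell in soup.find_all('td', attrs={'class': 'imdbRating'}):
    rate = cell.find('strong')
    rating.append(rate.text)

# Keep the browser open for 10 seconds, then end the session
time.sleep(10)
driver.quit()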

# Pair each movie with its rating (both lists are in page order)
imdb = dict(zip(movies, rating))

# Print one movie per line
for key, value in imdb.items():
    print(key, ':', value)
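
Because the titles and ratings are collected in two separate loops, zip() will silently drop the extra items if one list ends up shorter than the other. The following is a rough sketch of one way to guard against that and to save the result as CSV text. The helper names pair_rows and to_csv and the sample values are hypothetical, not part of the original script.

```python
import csv
import io


def pair_rows(titles, ratings):
    """Pair titles with ratings, refusing to guess if the counts differ."""
    if len(titles) != len(ratings):
        raise ValueError(
            f'got {len(titles)} titles but {len(ratings)} ratings'
        )
    return list(zip(titles, ratings))


def to_csv(rows):
    """Serialize (title, rating) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['title', 'rating'])
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values, not real IMDB data
rows = pair_rows(['Movie A', 'Movie B'], ['9.0', '8.5'])
print(to_csv(rows))
```

In the script above, the call would be pair_rows(movies, rating), and the returned text could be written to a file with the usual open() and write().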

Visit BASIC WEB SCRAPING WITH PYTHON BS4 AND URLLIB for more scraping techniques.

For more sample code using Selenium with Python, visit here.
