Scraping Billboard and IMDB Using Python Selenium
Selenium is an open-source browser automation tool, widely used for testing web applications. It lets a script open a browser and drive it the way a user would.
Here we will use Selenium to load web pages and BeautifulSoup to extract data from the HTML they return.
In this article, we will scrape the Billboard Hot 100 songs and the IMDB Top Chart.
Downloads
For more details on using Selenium, read my previous article, GETTING STARTED WITH PYTHON SELENIUM.
Requirements
pip install bs4
pip install selenium
Scraping Billboard
# import necessary modules
from selenium import webdriver
from bs4 import BeautifulSoup
import time
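# Set the location of your WebDriver
# driver = webdriver.Firefox()  # for Firefox
# Note: executable_path is the Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
driver = webdriver.Chrome(executable_path=r"D:\Softwares\chromedriver_win32\chromedriver.exe")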
# Set URL
driver.get('https://www.billboard.com/charts/hot-100')
songs = []
artists = []
# Scrape content
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
for a in soup.find_all('li', attrs={'class': 'chart-list__element'}):
    song = a.find('span', 'chart-element__information__song')
    artist = a.find('span', 'chart-element__information__artist')
    songs.append(song.text)
    artists.append(artist.text)
time.sleep(10)
driver.close()
# convert to dictionary
tracks = dict(zip(songs, artists))
# Print dictionary line by line
for key, value in tracks.items():
    print(key, ' - ', value)
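One thing to watch in the snippet above: `dict(zip(songs, artists))` keeps only one entry per song title, so two chart entries that share a title would collapse into one. A minimal sketch of an alternative, assuming the same `songs` and `artists` lists; the sample titles and the helper name `pair_tracks` are made up for illustration:

```python
def pair_tracks(songs, artists):
    """Pair each song with its artist, keeping chart order and duplicate titles."""
    # strip() removes the surrounding whitespace that scraped text often carries
    return [(s.strip(), a.strip()) for s, a in zip(songs, artists)]


# Hypothetical sample data standing in for the scraped lists
songs = ["Song A", " Song B ", "Song A"]
artists = ["Artist 1", "Artist 2", "Artist 3"]

tracks = pair_tracks(songs, artists)
for position, (song, artist) in enumerate(tracks, start=1):
    print(position, song, "-", artist)
```

A list of tuples also preserves the chart position, which a dictionary keyed by title does not express directly.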
Scraping IMDB
# import necessary modules
from selenium import webdriver
from bs4 import BeautifulSoup
import time
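# Set the location of your WebDriver
# driver = webdriver.Firefox()  # for Firefox
# Note: executable_path is the Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
driver = webdriver.Chrome(executable_path=r"D:\Softwares\chromedriver_win32\chromedriver.exe")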
# Set URL
driver.get('https://www.imdb.com/chart/top/')
movies = []
rating = []
# Scrape content
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
for a in soup.find_all('td', attrs={'class': 'titleColumn'}):
    movie = a.find('a')
    movies.append(movie.text)
for a in soup.find_all('td', attrs={'class': 'imdbRating'}):
    rate = a.find('strong')
    rating.append(rate.text)
time.sleep(10)
driver.close()
imdb = dict(zip(movies, rating))
for key, value in imdb.items():
    print(key, ' : ', value)
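The two loops above fill `movies` and `rating` independently, so a row without a rating would shift every later pairing. A hedged sketch of reading both cells from the same table row instead; the HTML string is a hand-written stand-in that reuses the `titleColumn` and `imdbRating` class names from the code above, not IMDB's actual markup, and `parse_chart` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Hand-written sample; assumed to resemble the table layout the code above targets
SAMPLE_HTML = """
<table>
  <tr>
    <td class="titleColumn">1. <a href="#">Movie One</a> (1994)</td>
    <td class="imdbRating"><strong>9.2</strong></td>
  </tr>
  <tr>
    <td class="titleColumn">2. <a href="#">Movie Two</a> (1972)</td>
    <td class="imdbRating"></td>
  </tr>
</table>
"""


def parse_chart(html):
    """Return (title, rating) pairs, one per table row; rating is None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        title_cell = row.find("td", class_="titleColumn")
        link = title_cell.find("a") if title_cell is not None else None
        if link is None:
            continue  # skip header rows or rows without a title link
        rating_cell = row.find("td", class_="imdbRating")
        strong = rating_cell.find("strong") if rating_cell is not None else None
        rating = strong.text.strip() if strong is not None else None
        results.append((link.text.strip(), rating))
    return results


chart = parse_chart(SAMPLE_HTML)
for title, rating in chart:
    print(title, ":", rating)
```

With the live page, the same function could be called on `driver.page_source`, provided the page still uses these class names.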
Visit BASIC WEB SCRAPING WITH PYTHON BS4 AND URLLIB for more scraping techniques.
For more sample code using Selenium with Python, visit here.