Understanding the Script
Load URLs from CSV: The script reads URLs from urls.csv. Ensure the file is in the same directory as the script.
csv_filename = 'urls.csv'
urls = load_urls_from_csv(csv_filename)
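The body of load_urls_from_csv is not shown here. A minimal sketch of what such a loader might look like follows, assuming one URL per row in the first column; skipping rows that do not start with http:// or https:// (which also drops a header row) is an assumption of this sketch, not confirmed behaviour of the script.

```python
import csv


def load_urls_from_csv(csv_filename):
    """Return the URLs found in the first column of a CSV file.

    Assumption: one URL per row, first column. Rows that are empty or
    do not look like an http(s) URL (e.g. a header) are skipped.
    """
    urls = []
    # newline="" is the mode the csv module recommends for reading files.
    with open(csv_filename, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue  # blank line
            url = row[0].strip()
            if url.startswith(("http://", "https://")):
                urls.append(url)
    return urls
```

With this version, a stray header row or a trailing blank line in urls.csv is skipped rather than passed on to the browser.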
Accept Cookies: The script attempts to click common cookie consent buttons using XPaths.
def accept_cookies(driver):
    # Common consent button XPaths
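The list of XPaths is elided above. One possible shape for the function is sketched below; the entries in CONSENT_XPATHS are illustrative guesses rather than the script's actual list, and the boolean return value is an addition of this sketch. The locator string "xpath" is the value behind Selenium's By.XPATH, used here so the snippet needs no import.

```python
# Illustrative XPaths only (assumptions); the script's real list is not shown.
CONSENT_XPATHS = [
    "//button[@id='onetrust-accept-btn-handler']",
    "//button[contains(., 'Accept')]",
    "//button[contains(., 'I agree')]",
]


def accept_cookies(driver):
    """Click the first visible consent button; return True if one was clicked."""
    for xpath in CONSENT_XPATHS:
        # find_elements returns an empty list (no exception) when nothing matches.
        for button in driver.find_elements("xpath", xpath):
            try:
                if button.is_displayed():
                    button.click()
                    return True
            except Exception:
                # The button may be stale or covered by an overlay; try the next one.
                continue
    return False
```

Using find_elements rather than find_element means a page without a consent banner simply falls through to return False instead of raising.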
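Sanitize Filename: Converts a URL into a valid filename by replacing every character other than letters, digits, underscores, and hyphens with an underscore, then truncating the result to 255 characters.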
import re

def sanitize_filename(url):
    filename = re.sub(r'[^a-zA-Z0-9_\-]', '_', url)
    return filename[:255]  # Limit length
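A short worked example may make the behaviour concrete. It repeats the function so the snippet runs on its own; the example URLs are made up for illustration.

```python
import re


def sanitize_filename(url):
    filename = re.sub(r'[^a-zA-Z0-9_\-]', '_', url)
    return filename[:255]  # Limit length


# Every ':', '/', '.', '?' and '=' becomes an underscore.
print(sanitize_filename("https://example.com/a?b=1"))
# https___example_com_a_b_1

# Different URLs can collapse to the same name, so one screenshot
# may overwrite another.
print(sanitize_filename("https://example.com/a?b")
      == sanitize_filename("https://example.com/a/b"))
# True

# Long URLs are cut to 255 characters.
print(len(sanitize_filename("https://example.com/" + "x" * 300)))
# 255
```

Note that the 255-character cap applies before any extension such as .png is appended, so the final filename can be slightly longer than 255 characters.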
Take Screenshot: Opens each URL, handles cookies, and saves a screenshot.
def take_screenshot(url, output_path):
    # Selenium setup and screenshot logic
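The Selenium setup is elided above. The sketch below shows one way the steps could fit together, assuming headless Chrome and a fixed window size. The driver and on_page_loaded parameters are hypothetical additions that are not in the original signature: the first allows reusing an existing browser session, and the second is a hook where a function such as accept_cookies could be passed in.

```python
def take_screenshot(url, output_path, driver=None, on_page_loaded=None):
    """Open url, optionally run a hook, and save a PNG to output_path.

    Returns True if Selenium reports the screenshot was written.
    `driver` and `on_page_loaded` are hypothetical extras for this sketch.
    """
    owns_driver = driver is None
    if owns_driver:
        # Imported lazily so the function can be used with a supplied driver.
        from selenium import webdriver

        options = webdriver.ChromeOptions()
        options.add_argument("--headless")  # assumption: no visible window
        options.add_argument("--window-size=1280,1024")  # assumption
        driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        if on_page_loaded is not None:
            on_page_loaded(driver)  # e.g. dismiss a cookie banner
        return driver.save_screenshot(output_path)
    finally:
        # Only close browsers this function started itself.
        if owns_driver:
            driver.quit()
```

The try/finally ensures the browser is closed even when a page fails to load, which matters when the function is called in a loop over many URLs.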