Startec


🚀 Introducing ✨ Bose Framework - The Swiss Army Knife for Bot Developers 🤖

May 24, 16:40 · 9 min read


Bot Development is Tough.

Bot detectors like Cloudflare stand ready to defend websites from our bots. Configuring Selenium with ChromeOptions to specify the driver path, profile, user agent, and window size is cumbersome, and a nightmare on Windows. Debugging bot crashes via logs is hard. How do you solve these pain points without sacrificing speed and a smooth development experience?

Enter Bose. Bose is a bot development framework designed specifically to give bot developers the best possible developer experience. Powered by Selenium, it offers a range of features that simplify bot development. To the best of our knowledge, Bose is the first bot development framework of its kind.

Getting Started

Clone Starter Template

git clone https://github.com/omkarcloud/bose-starter my-bose-project


Then change into that directory, install dependencies, and start the project:

cd my-bose-project
python -m pip install -r requirements.txt
python main.py


The first run will take some time because it downloads the Chrome driver executable; subsequent runs will be fast.

Core features

  1. Adds powerful methods that make working with Selenium a lot easier.
  2. Follows best practices to avoid bot detection by Cloudflare and PerimeterX.
  3. Saves the HTML, a screenshot, and the run details of each task run to enable easy debugging.
  4. Provides utility components to write scraped data as JSON, CSV, and Excel files.
  5. Automatically downloads and initializes the correct Chrome driver.
  6. Fast and developer-friendly.

Usage

Say you want to start scraping a website. If you were using bare Selenium, you would have to handle the imperative tasks of opening and closing the driver like this:

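from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver_path = 'path/to/chromedriver'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path=driver_path))
try:
    driver.get('https://www.example.com')
finally:
    driver.quit()  # close the browser even if an error occurs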


With the Bose Framework, however, you can take a declarative, structured approach. You only need to write the following code, and Bose will take care of creating the driver, passing it to the Task's run method, and closing the driver:

from bose import BaseTask

class Task(BaseTask):
    def run(self, driver):
        # Bose creates the driver, passes it in here, and closes it afterwards
        driver.get('https://www.example.com')

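To make the difference concrete, here is a minimal sketch of the lifecycle the framework takes over: a runner creates the driver, hands it to the task's run method, and always closes it afterwards. It is an illustration under our own assumptions, not Bose's implementation. SimpleTask, run_task, driver_factory, and FakeDriver are hypothetical names, and FakeDriver stands in for a real Selenium driver so the sketch runs without a browser.

```python
from typing import Callable, Protocol


class Driver(Protocol):
    """The two driver methods this sketch relies on."""

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


class SimpleTask:
    """Hypothetical base class: subclasses only implement run()."""

    def run(self, driver: Driver) -> None:
        raise NotImplementedError


def run_task(task: SimpleTask, driver_factory: Callable[[], Driver]) -> None:
    """Create a driver, pass it to task.run(), and always close it."""
    driver = driver_factory()
    try:
        task.run(driver)
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a Selenium driver that records what happened."""

    def __init__(self) -> None:
        self.visited = []  # URLs passed to get()
        self.closed = False

    def get(self, url: str) -> None:
        self.visited.append(url)

    def quit(self) -> None:
        self.closed = True


class Task(SimpleTask):
    def run(self, driver: Driver) -> None:
        driver.get('https://www.example.com')


fake = FakeDriver()
run_task(Task(), lambda: fake)
print(fake.visited, fake.closed)  # ['https://www.example.com'] True
```

The try/finally in run_task is the bookkeeping the bare Selenium version makes you repeat in every script; moving it into a single runner is what lets a task contain only its scraping logic.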

Configuration

In bare Selenium, configuring options such as the profile, user agent, or window size requires writing a lot of code, as shown below:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

driver_path = 'path/to/chromedriver.exe'
options = Options()
# Use a separate Chrome profile directory
profile_path = '1'
options.add_argument(f'--user-data-dir={profile_path}')
# Override the user agent
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
options.add_argument(f'--user-agent={user_agent}')
# Set the window size
window_width = 1280
window_height = 720
options.add_argument(f'--window-size={window_width},{window_height}')
driver = webdriver.Chrome(service=Service(executable_path=driver_path), options=options)


The Bose Framework, on the other hand, simplifies this by encapsulating the browser configuration in the browser_config property of the Task, as shown below:

from bose import BaseTask, BrowserConfig, UserAgent, WindowSize

class Task(BaseTask):
    browser_config = BrowserConfig(
        user_agent=UserAgent.user_agent_106,
        window_size=WindowSize.window_size_1280_720,
        profile=1,
    )

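Conceptually, a declarative config like this only has to be translated into the same Chrome switches used in the bare Selenium example. The sketch below shows one way to do that translation. BrowserSettings and to_chrome_arguments are hypothetical names, and this is not how Bose implements BrowserConfig internally.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BrowserSettings:
    """Hypothetical declarative config, loosely modeled on BrowserConfig."""

    user_agent: Optional[str] = None
    window_size: Optional[Tuple[int, int]] = None  # (width, height)
    profile: Optional[str] = None

    def to_chrome_arguments(self) -> List[str]:
        """Translate the settings into Chrome command-line switches."""
        args = []
        if self.profile is not None:
            args.append(f'--user-data-dir={self.profile}')
        if self.user_agent is not None:
            args.append(f'--user-agent={self.user_agent}')
        if self.window_size is not None:
            width, height = self.window_size
            args.append(f'--window-size={width},{height}')
        return args


settings = BrowserSettings(
    user_agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    window_size=(1280, 720),
    profile='1',
)
for arg in settings.to_chrome_arguments():
    print(arg)
```

Each returned string could be passed to Options.add_argument(), so the task declares what it wants while the framework owns how Chrome is configured.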

Exception handling

Exceptions are common when using Selenium. In bare Selenium, if an exception occurs, the driver automatically closes, leaving you with only logs to debug.

In Bose, when an exception occurs in a scraping task, the browser remains open instead of immediately closing. This allows you to see the live browser state at the moment the exception occurred, which greatly helps in debugging.

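A minimal sketch of this keep-the-browser-open behavior, assuming a driver object that exposes Selenium's page_source, save_screenshot(), and quit(): on failure it writes error.log, page.html, and final.png, then blocks until you press Enter before closing the browser. run_with_debugging and its parameters are hypothetical, and wait_for_user is injectable so the function can be tested without a terminal. Bose saves its debug files on every run; this sketch only does so on failure, to keep it short.

```python
import traceback
from pathlib import Path


def run_with_debugging(task_run, driver, output_dir, wait_for_user=input):
    """Run task_run(driver); on failure, save debug files and keep the browser open.

    task_run stands in for a Task's run method; wait_for_user blocks until the
    developer has finished inspecting the live browser.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    try:
        task_run(driver)
    except Exception:
        # Capture the failure while the page is still loaded.
        (output_dir / 'error.log').write_text(traceback.format_exc(), encoding='utf-8')
        (output_dir / 'page.html').write_text(driver.page_source, encoding='utf-8')
        driver.save_screenshot(str(output_dir / 'final.png'))
        # Leave the browser open until the developer is done looking at it.
        wait_for_user('Task failed. Inspect the browser, then press Enter to close it.')
        raise
    finally:
        # Runs after a normal return, or after wait_for_user() on failure.
        driver.quit()
```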

Debugging

Web scraping is often fraught with errors, such as incorrect selectors or pages that fail to load. When debugging with raw Selenium, you may have to sift through logs to identify the issue. Fortunately, Bose makes debugging simple by storing information about each run.

After each run, a directory is created in tasks/ containing three files, plus error.log if the task crashed:

task_info.json

It contains information about the task run, such as how long the task ran, the IP details of the task, and the user agent, window_size, and profile used to execute the task.


final.png

This is the screenshot captured just before the driver was closed.


page.html

This is the HTML source captured just before the driver was closed. It is very useful for finding out why your selectors failed to select elements.


error.log

If your task crashed due to an exception, Bose also stores error.log, which contains the error that caused the crash. This is very helpful for debugging.

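Because every run leaves these files behind, a small helper can pull up the latest run without clicking through folders. This is a sketch under assumptions: it relies only on the tasks/ folder and the file names described above, and since the naming of the run directories is not specified here, it picks the most recently modified one. latest_run and summarize_run are hypothetical helpers, and no particular keys inside task_info.json are assumed.

```python
import json
from pathlib import Path

# File names described above; error.log only exists when the task crashed.
EXPECTED_FILES = ('task_info.json', 'final.png', 'page.html', 'error.log')


def latest_run(tasks_dir='tasks'):
    """Return the most recently modified run directory inside tasks_dir."""
    runs = [p for p in Path(tasks_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f'no task runs found in {tasks_dir!r}')
    return max(runs, key=lambda p: p.stat().st_mtime)


def summarize_run(run_dir):
    """Report which debug files exist and load task_info.json if present."""
    run_dir = Path(run_dir)
    present = {name: (run_dir / name).exists() for name in EXPECTED_FILES}
    info_path = run_dir / 'task_info.json'
    info = None
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding='utf-8'))
    return present, info
```

From the project root, summarize_run(latest_run()) returns which debug files exist along with the parsed task_info.json, so checking present['error.log'] tells you whether the last run crashed.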

Outputting Data

After scraping, we need to store the data in either JSON or CSV format. Typically, this involves writing a significant amount of imperative code like this:

import csv
import json

def write_json(data, filename):
    with open(filename, 'w', encoding='utf-8') as fp:
        json.dump(data, fp, indent=4)

def write_csv(data, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = data[0].keys()  # get the fieldnames from the first dictionary
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()  # write the header row
        writer.writerows(data)  # write each row of data

data = [
    {
        "text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d",
        "author": "Albert Einstein"
    },
    {
        "text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d",
        "author": "J.K. Rowling"
    }
]
write_json(data, "data.json")
write_csv(data, "data.csv")


Bose simplifies this by encapsulating it in the Output module, which handles reading and writing data.

To use it, call the write method for the type of file you want to save. All data is saved in the output/ folder. See the following code for reference:

from bose import Output

data = [
    {
        "text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d",
        "author": "Albert Einstein"
    },
    {
        "text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d",
        "author": "J.K. Rowling"
    }
]
Output.write_json(data, "data.json")
Output.write_csv(data, "data.csv")

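For comparison, here is a rough sketch of what an output helper like this might do: create the output/ folder on demand and write the file there. It is not Bose's Output implementation. One deliberate difference from the bare version above: this write_csv collects field names from every row, because csv.DictWriter raises ValueError when a row has a key that is missing from fieldnames. write_json, write_csv, and the output_dir parameter here are hypothetical.

```python
import csv
import json
from pathlib import Path


def write_json(data, filename, output_dir='output'):
    """Write data as indented JSON to output_dir/filename."""
    path = Path(output_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'w', encoding='utf-8') as fp:
        # ensure_ascii=False keeps curly quotes readable instead of \u201c escapes
        json.dump(data, fp, indent=4, ensure_ascii=False)
    return path


def write_csv(data, filename, output_dir='output'):
    """Write a list of dicts to output_dir/filename as CSV."""
    path = Path(output_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # Collect keys from every row, in first-seen order, so a row with an
    # extra key does not make DictWriter raise ValueError.
    fieldnames = list(dict.fromkeys(key for row in data for key in row))
    with open(path, 'w', newline='', encoding='utf-8') as fp:
        writer = csv.DictWriter(fp, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)  # missing keys are written as empty cells
    return path
```

Rows that lack some key get an empty cell, which is DictWriter's default restval.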

Undetected Driver

Ultrafunkamsterdam created a patched ChromeDriver (undetected-chromedriver) with excellent support for bypassing major bot detection systems such as Distil, DataDome, Cloudflare, and others.

Recognizing the importance of bypassing bot detection, Bose provides built-in support for Ultrafunkamsterdam’s Undetected Driver.

Using the Undetected Driver in Bose Framework is as simple as passing the use_undetected_driver option to the BrowserConfig, like so:

from bose import BaseTask, BrowserConfig

class Task(BaseTask):
    browser_config = BrowserConfig(use_undetected_driver=True)


LocalStorage

Just as modern browsers have local storage, Bose incorporates the same concept into its framework.

You can import the LocalStorage object from Bose to persist data across browser runs, which is extremely useful when scraping large amounts of data.

The data is stored in a file named local_storage.json in the root directory of your project. Here's how you can use it:

from bose import LocalStorage
LocalStorage.set_item("pages", 5)
print(LocalStorage.get_item("pages"))

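The idea behind LocalStorage can be sketched as a small key/value store that rewrites a JSON file on every set_item. JsonFileStorage is a hypothetical class that mirrors the set_item/get_item calls shown above; it is not Bose's implementation, and it assumes keys are strings and values are JSON-serializable.

```python
import json
from pathlib import Path


class JsonFileStorage:
    """Hypothetical key/value store persisted to a JSON file (not Bose's code)."""

    def __init__(self, path='local_storage.json'):
        self.path = Path(path)

    def _load(self):
        # A missing file simply means nothing has been stored yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding='utf-8'))

    def set_item(self, key, value):
        data = self._load()
        data[key] = value
        # Rewrite the whole file so the value survives process restarts.
        self.path.write_text(json.dumps(data, indent=4), encoding='utf-8')

    def get_item(self, key, default=None):
        return self._load().get(key, default)
```

Because a fresh instance reads the same file, a scraper restarted after a crash could call get_item('pages') to resume from the last saved page.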

Boss Driver

The driver you receive in the run method of the Task is an extended version of the Selenium driver that adds powerful methods to make working with Selenium much easier. Some of the popular methods Bose adds to the Selenium driver are:

  1. get_by_current_page_referrer(link, wait=None): simulates a visit that appears as if you arrived at the page by clicking a link, creating more natural and less detectable browsing behavior.
  2. js_click(element): clicks an element using JavaScript, bypassing interceptions (ElementClickInterceptedException) caused by pop-ups or alerts.
  3. get_cookies_and_local_storage_dict(): returns a dictionary containing "cookies" and "local_storage".
  4. add_cookies_and_local_storage_dict(site_data): adds both cookies and local storage data to the current website.
  5. organic_get(link, wait=None): visits Google first and then visits the "link", making the visit less detectable.
  6. local_storage: returns an instance of the LocalStorage module for interacting with the browser's local storage in an easy-to-use manner.
  7. save_screenshot(filename=None): saves a screenshot of the current web page to a file in the tasks/ directory.
  8. short_random_sleep() and long_random_sleep(): sleep for a random amount of time, either between 2 and 4 seconds (short) or between 6 and 9 seconds (long).
  9. get_element_or_* (e.g. get_element_or_none, get_element_or_none_by_selector, get_element_by_id, get_element_or_none_by_text_contains): find web elements on the page based on different criteria. They return the web element if it exists, or None if it doesn't.
  10. is_in_page(target, wait=None, raise_exception=False): checks whether the browser is on the specified page.
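Two of these helpers are simple enough to sketch with the standard library and a plain Selenium call. This is an approximation under assumptions, not Bose's code: the sleep ranges come from the descriptions above, js_click uses WebDriver's standard execute_script() method, and the injectable sleep parameter is a hypothetical addition that exists only so the functions can be tested without actually waiting.

```python
import random
import time


def short_random_sleep(sleep=time.sleep):
    """Pause for a random 2-4 seconds and return the chosen duration."""
    duration = random.uniform(2, 4)
    sleep(duration)
    return duration


def long_random_sleep(sleep=time.sleep):
    """Pause for a random 6-9 seconds and return the chosen duration."""
    duration = random.uniform(6, 9)
    sleep(duration)
    return duration


def js_click(driver, element):
    """Click an element from JavaScript instead of a native click.

    The click is dispatched by the page's own JavaScript, so an overlay
    covering the element does not trigger ElementClickInterceptedException.
    """
    # Selenium passes element to the script as arguments[0].
    driver.execute_script('arguments[0].click();', element)
```

Randomized pauses avoid the perfectly regular timing that makes scripted browsing easy to spot, while js_click trades a realistic mouse click for reliability when pop-ups get in the way.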

Bose is an excellent framework that simplifies the boring parts of Selenium and web scraping.

We wish you the best of luck. Happy bot development with the Bose Framework!

Learn More

To learn about Bose Bot Development Framework in detail, read the Bose docs at https://www.omkar.cloud/bose/


If the Bose Framework helped you with bot development, please take a moment to star the repository. Starring it helps other developers discover the repository and supports fellow bot developers. Dhanyawad (thank you) 🙏! Vande Mataram!

