Description

Python is well suited to retrieving data from varied, heterogeneous sources, which makes it a strong choice for building a knowledge base through scraping. Scraping consists of extracting targeted information from a set of resources, such as websites or REST APIs.
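
As a rough illustration of what "extracting targeted information" can look like, here is a minimal sketch that uses only the standard library. The HTML snippet, the class name TitleAndLinkParser and the choice of tags to collect are assumptions made for this example; they are not taken from the course material, and a real scraper would download the page over HTTP first.

```python
from html.parser import HTMLParser

# Hypothetical page content, embedded here so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <h2 class="title">Course A</h2>
  <a href="/courses/a">Details</a>
  <h2 class="title">Course B</h2>
  <a href="/courses/b">Details</a>
</body></html>
"""


class TitleAndLinkParser(HTMLParser):
    """Collect the text of <h2> elements and the href of <a> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.links = []
        self._in_h2 = False  # True while the parser is inside an <h2> element

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Keep only non-empty text found inside an <h2>.
        if self._in_h2 and data.strip():
            self.titles.append(data.strip())


parser = TitleAndLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)
print(parser.links)
```

Libraries such as Beautiful Soup offer a more convenient interface for the same job; the standard-library parser is used here only to keep the sketch dependency-free.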

The Scraping Python training shows how to build such a program, starting with a hand-written crawler and then moving on to more advanced technologies and full automation of the process.

Who is this training for?

This training is aimed at programmers who are already comfortable with Python, have medium-sized projects under their belt, and want to build their own tools to expand the pool of data they can draw on.

Prerequisites

To take this Scraping Python course, you must be comfortable with the latest version of the Python language. Participants must be able to write complex scripts independently and know how to use the language ecosystem (pip, virtualenv, etc.).

Training objectives

  • Master web data manipulation with Python
  • Understand the technical and ethical issues of scraping
  • Know the different methods used to retrieve, process and store data
  • Master existing technologies to choose the solution adapted to your acquisition needs

Training program

      • Browse the file system
      • Handle encoding properly
      • Read and write files
      • Parse JSON
      • CSV and XML generators
      • Reminder about the HTTP protocol
      • Simple queries with Requests
      • Storing data with SQLAlchemy
      • Parsing HTML with Beautiful Soup
      • Threads and GIL
      • Using multiple cores with multiprocessing
      • Asynchronous I/O programming
      • Performance and ethics
      • Using a form of cache: disk, RAM and redis
      • Introduce a random delay
      • The robots.txt file
      • Authentication and tokens
      • Anatomy of a REST API
      • Clean retry
      • Manage rate limiting
      • Error management
      • Application logging
      • Example with a handmade Twitter client
      • Introduction to the basic mechanics of the Scrapy framework
      • Using Selenium by hand
      • Using Scrapy and Selenium together
    • 475
    • 28 h
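
Three items from the training program (the robots.txt file, introducing a random delay, and clean retry) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the course's own code: the robots.txt content, the bot name "demo-bot", the helper names build_robot_rules, polite_delay and fetch_with_retry, and the choice to retry on OSError are all hypothetical.

```python
import random
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robot_rules(robots_txt):
    """Parse robots.txt text into a rule set that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Wait a random time between requests so the target server is not hammered."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def fetch_with_retry(fetch, url, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponential backoff.

    `fetch` is any callable supplied by the caller (an assumption of this
    sketch); the last failure is re-raised once all attempts are used up.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


rules = build_robot_rules(ROBOTS_TXT)
print(rules.can_fetch("demo-bot", "https://example.com/courses/a"))
print(rules.can_fetch("demo-bot", "https://example.com/private/x"))
```

Passing the sleep function as a parameter is a design choice made for this sketch: it lets the delay and retry logic be exercised without actually waiting.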
