Description
Python is well suited to retrieving data from varied, heterogeneous sources, which makes it a strong choice for building a knowledge base through scraping. Scraping consists of extracting targeted information from a set of resources, such as websites or REST APIs.
The Scraping Python training shows how to build such a program, starting with a hand-written crawler and then moving on to more advanced tools and full automation of the process.
Who is this training for?
This training is aimed at programmers who are already comfortable with Python, have medium-sized projects under their belt, and want to build their own tools to expand the pool of data they can draw on.
Prerequisites
To take this Scraping Python course, you must be comfortable with the latest version of the Python language. Participants must be able to write complex scripts independently and know how to use the language's ecosystem (pip, virtualenv, etc.).
Training objectives
Training program
- The basics of batch processing (scraping)
- Browse the file system
- Handle encoding properly
- Read and write files
- Parse JSON
- CSV and XML generators
- Data browsing on the web
- Reminder about the HTTP protocol
- Simple requests with Requests
- Storing data with SQLAlchemy
- Parsing HTML with Beautiful Soup
- Performance issues
- Threads and GIL
- Using multiple cores with multiprocessing
- Asynchronous I/O programming
- Performance and ethics
- Using a form of cache: disk, RAM and redis
- Introduce a random delay
- The robots.txt file
- Professional APIs
- Authentication and tokens
- Anatomy of a REST API
- Clean retry
- Manage rate limiting
- Error management
- Application logging
- Example with a handmade Twitter client
- Industrialize crawling
- Introduction to the basic mechanics of the framework
- Using Selenium by hand
- Using Scrapy and Selenium together
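
To give a feel for the "Performance and ethics" topics in the program, here is a minimal sketch that is not taken from the course material. It uses only the Python standard library to check URLs against robots.txt rules and to compute a random delay between requests. The robots.txt content, the `example.com` URLs, the `my-crawler` user agent, and the helper names `build_parser`, `allowed_urls`, and `polite_delay` are all assumptions made up for illustration. The rules are parsed from an in-memory string so that the sketch runs without network access.

```python
import random
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, kept in memory so no network call is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from the text of the file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(base: float = 1.0, jitter: float = 0.5) -> float:
    """Return a delay in seconds drawn uniformly from [base, base + jitter]."""
    return base + random.uniform(0.0, jitter)


parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/index.html",         # hypothetical, allowed
    "https://example.com/private/data.html",  # hypothetical, disallowed
]
to_fetch = allowed_urls(parser, "my-crawler", urls)

for url in to_fetch:
    delay = polite_delay()
    # A real crawler would call time.sleep(delay) here before fetching the URL.
    print(f"would wait {delay:.2f}s, then fetch {url}")
```

In a real crawler, the parser would usually be pointed at the site's own robots.txt with `set_url()` and `read()`, and the computed delay would be passed to `time.sleep()` before each request.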