Description
The continued growth of digital data within businesses and public organizations has given rise to the concept of “Big Data”. This term refers to the management and preservation of vast amounts of data, and the potential value they represent. This seminar addresses the specific challenges linked to Big Data as well as possible technical solutions for the management and processing of these masses of data. These solutions involve a break from traditional analysis methods due to the large quantity of data to be processed.
Who is this training for?
IS Directors, IS Managers, Project Managers, Architects, Consultants or any person required to participate in a Big Data project.
Prerequisites
Basic knowledge of technical architectures.
Training objectives
Training program
- Introduction
- The origins of Big Data: a world of digital data, e-Health, chronology.
- A definition by the four Vs: the provenance of data.
- A break with the past: changes in quantity, quality, habits.
- The value of data: a change in importance.
- Data as a raw material.
- The fourth paradigm of scientific discovery.
- Big Data: processing, from acquisition to result
- The sequence of operations.
- Acquisition.
- Data collection: crawling, scraping.
- Event stream management (Complex Event Processing, CEP).
- Indexing the incoming stream.
- Integration with old data.
- Data quality: a fifth V?
- The different types of processing: search, learning (machine learning, transactional, data mining).
- Other sequencing models: Amazon, e-Health.
- One or more data repositories? From Hadoop to in-memory.
- From sentiment (tone) analysis to knowledge discovery.
- Relationships between Cloud and Big Data
- The architectural model of public and private Clouds.
- XaaS services.
- The objectives and advantages of Cloud architectures.
- Infrastructure.
- The similarities and differences between Cloud and Big Data.
- Storage clouds.
- Classification, security and confidentiality of data.
- Structure as a classification criterion: unstructured, structured, semi-structured.
- Classification according to life cycle: temporary or permanent data, active archives.
- Difficulties in terms of security: increase in volumes, distribution.
- Potential solutions.
- Introduction to Open Data
- The philosophy of open data and the objectives.
- The opening up of public data.
- The difficulties of implementation.
- The essential characteristics of open data.
- Areas of application.
- The expected benefits.
- Hardware for storage architectures
- Servers, disks and networks; the use of SSDs; the importance of network infrastructure.
- Cloud architectures and more traditional architectures.
- The advantages and difficulties.
- The TCO.
- Power consumption: servers (IPNM), disks (MAID).
- Object storage: principle and advantages.
- Object storage compared to traditional NAS and SAN storage.
- Software architecture.
- Levels at which storage management is implemented.
- Software-Defined Storage.
- Centralized architecture (Hadoop File System).
- Peer-to-peer architecture and mixed architecture.
- Interfaces and connectors: S3, CDMI, FUSE, etc.
- Future of other storage (NAS, SAN) compared to object storage.
- Data protection
- Preservation over time in the face of increases in volume.
- Backup, online or local? The traditional archive and the active archive.
- Links with hierarchical storage management: the future of magnetic tapes.
- Multi-site replication.
- The degradation of storage media.
- Processing methods and fields of application
- Classification of analysis methods according to data volume and processing power.
- Hadoop: the MapReduce processing model.
- The Hadoop ecosystem: Hive, Pig.
- The difficulties of Hadoop.
- OpenStack and the Ceph data manager.
- Complex Event Processing: an example?
- From BI to Big Data.
- Decision support and transactional processing revisited: NoSQL databases.
- Typology and examples.
- Data ingestion and indexing.
- Two examples: Splunk and Logstash.
- Open source crawlers.
- Search and analysis: Elasticsearch.
- Learning: Mahout.
- In-memory.
- Visualization: real-time or not, in the Cloud (Bime); comparison of QlikView, TIBCO Spotfire and Tableau.
- A general architecture of data mining via Big Data.
- Use cases through examples and conclusion
- Anticipation: user needs in businesses, equipment maintenance.
- Security: people, fraud detection (postal, taxes), the network.
- Recommendation.
- Marketing analyses and impact analyses.
- Path analyses (user journeys).
- Video content distribution.
- Big Data for the automotive industry? For the oil industry?
- Should we embark on a Big Data project?
- What future for data?
- Governance of data storage: role and recommendations, the data scientist, the skills required for a Big Data project.