Description
Talend's data integration platform extends its capabilities to Big Data technologies such as Hadoop (HDFS, HBase, HCatalog, Hive and Pig) and the NoSQL databases Cassandra and MongoDB. This course will give you the foundations needed to make proper use of the Talend components designed to communicate with Big Data systems.
Who is this training for?
Data managers, architects, business intelligence consultants.
Prerequisites
Experience using the Talend Open Studio for Data Integration tool, or the skills acquired during the "Talend Open Studio, implementing data integration" training.
Training objectives
Training program
- Presentation of Talend Open Studio for Big Data
- Big Data challenges: the 3V model, use cases.
- The Hadoop ecosystem (HDFS, MapReduce, HBase, Hive, Pig...)
- Unstructured data and NoSQL databases.
- TOS for Big Data versus TOS for Data Integration.
- Practical work: Installation/configuration of TOS for Big Data and a Hadoop cluster (Cloudera or Hortonworks), verification of proper operation.
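As a point of reference, the "verification of proper operation" step can also be reproduced from the command line. The sketch below simply lists the HDFS root and round-trips a small test file; the paths are placeholders and it assumes the Hadoop CLI is available on an edge node of the Cloudera/Hortonworks cluster.

```python
# Quick HDFS sanity check, assuming the `hdfs` CLI is available
# (e.g. on an edge node of the Cloudera/Hortonworks sandbox).
import subprocess

def run(*cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# List the root of the distributed filesystem.
run("hdfs", "dfs", "-ls", "/")

# Round-trip a small test file to confirm read/write access.
run("hdfs", "dfs", "-mkdir", "-p", "/tmp/tos_check")
run("hdfs", "dfs", "-put", "-f", "/etc/hosts", "/tmp/tos_check/hosts")
run("hdfs", "dfs", "-cat", "/tmp/tos_check/hosts")
```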
- Data integration in a cluster and NoSQL databases
- Definition of Hadoop cluster connection metadata.
- Connection to a MongoDB, Neo4j, Cassandra or HBase database and data export.
- Simple data integration with a Hadoop cluster.
- Capture tweets (via extension components) and import them directly into HDFS.
- Practical work: Read tweets and store them as files in HDFS, analyze the frequency of the themes covered and store the result in HBase (see the sketch below).
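Outside the Studio's graphical components (tHDFSOutput, tHBaseOutput, etc.), the same flow can be sketched in plain Python to make the idea concrete. The WebHDFS URL, HBase Thrift host, table and column family names below are hypothetical, and the hdfs and happybase client libraries stand in for the Talend components.

```python
# Hypothetical sketch of the practical work: push raw tweets to HDFS,
# count themes (hashtags), and store the counts in HBase.
# Assumes a WebHDFS endpoint and an HBase Thrift server are reachable.
import json
from collections import Counter

import happybase                      # HBase client over Thrift
from hdfs import InsecureClient       # WebHDFS client

tweets = [
    {"text": "Loving #bigdata and #hadoop"},
    {"text": "#bigdata pipelines with Talend"},
]

# 1) Store the raw tweets as a file in HDFS.
hdfs_client = InsecureClient("http://namenode:9870", user="etl")
hdfs_client.write(
    "/data/tweets/batch_001.json",
    data="\n".join(json.dumps(t) for t in tweets),
    encoding="utf-8",
    overwrite=True,
)

# 2) Count the themes (here, simply the hashtags).
themes = Counter(
    word.lstrip("#").lower()
    for t in tweets
    for word in t["text"].split()
    if word.startswith("#")
)

# 3) Persist the counts in an HBase table (one row per theme).
hbase = happybase.Connection("hbase-thrift-host")
table = hbase.table("tweet_themes")
for theme, count in themes.items():
    table.put(theme.encode(), {b"stats:count": str(count).encode()})
hbase.close()
```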
- Import/Export with Sqoop
- Use Sqoop to import, export and update data between RDBMS systems and HDFS.
- Partial and incremental table imports/exports.
- Import/export a SQL database to and from HDFS.
- Storage formats in Big Data (Avro, Parquet, ORC, etc.).
- Practical work: Migrate relational tables to HDFS and vice versa (see the sketch below).
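Sqoop itself is driven from the command line; the hedged sketch below shows what a full import, an incremental import and an export might look like when wrapped in a Python script. The JDBC URL, credentials, table names and directories are placeholders.

```python
# Illustrative Sqoop calls (placeholder connection details), wrapped in
# Python for scripting. Each call shells out to the regular `sqoop` CLI.
import subprocess

JDBC_URL = "jdbc:mysql://dbhost:3306/sales"   # hypothetical source RDBMS

def sqoop(*args):
    subprocess.run(["sqoop", *args], check=True)

# Full import of one relational table into HDFS, stored as Parquet.
sqoop(
    "import",
    "--connect", JDBC_URL,
    "--username", "etl", "--password-file", "/user/etl/.db_password",
    "--table", "orders",
    "--target-dir", "/data/sales/orders",
    "--as-parquetfile",
)

# Incremental import: only rows whose key exceeds the last value seen.
sqoop(
    "import",
    "--connect", JDBC_URL,
    "--username", "etl", "--password-file", "/user/etl/.db_password",
    "--table", "orders",
    "--target-dir", "/data/sales/orders",
    "--incremental", "append",
    "--check-column", "order_id",
    "--last-value", "150000",
)

# Export the (possibly transformed) HDFS data back to a relational table.
sqoop(
    "export",
    "--connect", JDBC_URL,
    "--username", "etl", "--password-file", "/user/etl/.db_password",
    "--table", "orders_agg",
    "--export-dir", "/data/sales/orders_agg",
)
```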
- Perform manipulations on the data
- Presentation of the Pig framework and its Pig Latin language.
- Talend's main Pig components, Pig flow design.
- Development of UDF routines.
- Practical work: Identify website usage trends by analyzing its logs (see the sketch below).
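To give an idea of what the Pig components generate under the hood, here is a hedged sketch of an equivalent Pig Latin script, launched from Python with the standard `pig` launcher. The log layout, paths and field names are assumptions.

```python
# Hypothetical Pig Latin equivalent of the log-analysis exercise: count
# hits per URL in a web server log and keep the most visited pages.
# The log layout (space-separated ip / timestamp / url) is an assumption.
import subprocess

PIG_SCRIPT = """
logs   = LOAD '/data/logs/access.log'
         USING PigStorage(' ')
         AS (ip:chararray, ts:chararray, url:chararray);
by_url = GROUP logs BY url;
hits   = FOREACH by_url GENERATE group AS url, COUNT(logs) AS nb_hits;
ranked = ORDER hits BY nb_hits DESC;
STORE ranked INTO '/data/logs/top_pages';
"""

with open("site_trends.pig", "w") as f:
    f.write(PIG_SCRIPT)

# Submit to the cluster ("pig -x local site_trends.pig" would run it locally).
subprocess.run(["pig", "-f", "site_trends.pig"], check=True)
```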
- Architecture and best practices in a Hadoop cluster
- Design efficient storage in Hadoop.
- Data lake versus data warehouse: do you have to choose?
- Hadoop and the disaster recovery plan (PRA) in the event of a major incident.
- Automate your workflows.
- Practical work: Create your data lake and automate its operation.
- Analyze and store your data with Hive
- Hive connection and schema metadata.
- The HiveQL language.
- Hive flow design, query execution.
- Implement Hive's ELT components.
- Practical work: Store the evolution of a stock price in HBase, then consolidate this flow with Hive to materialize its hour-by-hour evolution for a given day (see the sketch below).
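As an illustration of the HiveQL side of this exercise, the hourly consolidation could look like the sketch below. The stock_quotes table, its columns, the HiveServer2 host and the chosen date are hypothetical, and the PyHive client is used here in place of Talend's Hive components.

```python
# Hedged sketch of the hourly consolidation in HiveQL, executed through
# the PyHive client instead of Talend's tHiveConnection/tHiveRow components.
# The stock_quotes table and HiveServer2 host are assumptions.
from pyhive import hive

conn = hive.Connection(host="hiveserver2-host", port=10000, username="etl")
cursor = conn.cursor()

# Materialize the hour-by-hour evolution of one stock for a given day.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS stock_hourly AS
    SELECT
        symbol,
        to_date(quote_ts) AS quote_day,
        hour(quote_ts)    AS quote_hour,
        avg(price)        AS avg_price,
        max(price)        AS max_price,
        min(price)        AS min_price
    FROM stock_quotes
    WHERE symbol = 'ACME' AND to_date(quote_ts) = '2024-03-01'
    GROUP BY symbol, to_date(quote_ts), hour(quote_ts)
""")

# Read back the consolidated rows, hour by hour.
cursor.execute("SELECT * FROM stock_hourly ORDER BY quote_hour")
for row in cursor.fetchall():
    print(row)
conn.close()
```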