
Description

Talend's data integration platform extends its capabilities to Big Data technologies such as Hadoop (HDFS, HBase, HCatalog, Hive and Pig) and the NoSQL databases Cassandra and MongoDB. This course gives you the grounding needed to make proper use of the Talend components designed to communicate with Big Data systems.

Who is this training for?

Data managers, architects, business intelligence consultants.

Prerequisites

Experience with the Talend Open Studio for Data Integration tool, or the skills acquired in the "Talend Open Studio, implementing data integration" course.

Training objectives

  • Read and write data to HDFS, HBase and HCatalog
  • Perform transformation jobs using Pig and Hive
  • Use Sqoop to ease the migration of relational databases into Hadoop
  • Adopt good practices and design flexible and robust information systems

Training program

      • Big Data issues: the 3V model, use cases.
      • The Hadoop ecosystem (HDFS, MapReduce, HBase, Hive, Pig...)
      • Unstructured data and NoSQL databases.
      • TOS for Big Data versus TOS for Data Integration.
      • Practical work: Installation/configuration of TOS for Big Data and a Hadoop cluster (Cloudera or Hortonworks), verification of proper operation.
      • Definition of Hadoop cluster connection metadata.
      • Connection to a MongoDB, Neo4j, Cassandra or HBase database and data export (a MongoDB sketch follows the program).
      • Simple data integration with a Hadoop cluster.
      • Capturing tweets (extension components) and importing them directly into HDFS.
      • Practical work: read tweets and store them as files in HDFS, analyze how often each topic occurs and store the result in HBase (see the HDFS/HBase sketch after the program).
      • Use Sqoop to import, export and update data between RDBMS systems and HDFS (see the Sqoop sketch after the program).
      • Partial and incremental table imports and exports.
      • Import/export a SQL database from and to HDFS.
      • Storage formats in Big Data (Avro, Parquet, ORC, etc.).
      • Practical work: migrate relational tables to HDFS and back.
      • Presentation of the Pig framework and its Pig Latin language.
      • Talend's main Pig components, Pig flow design.
      • Development of UDF routines.
      • Practical work: identify usage trends on a website by analyzing its logs (see the Pig sketch after the program).
      • Design efficient storage in Hadoop.
      • Data lake versus data warehouse: do you have to choose?
      • Hadoop and the disaster recovery plan (PRA) in the event of a major incident.
      • Automate your workflows.
      • Practical work: Create your data lake and automate its operation.
      • Hive connection and schema metadata.
      • The HiveQL language.
      • Hive flow design, query execution.
      • Implement Hive's ELT components.
      • Practical work: store the price history of a stock in HBase, then consolidate that flow with Hive to materialize its hour-by-hour evolution for a given day (see the HiveQL sketch after the program).
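
The MongoDB connection and export step is assembled from Talend's MongoDB components in the studio. As a rough illustration of what such an export boils down to, here is a minimal Java sketch using the official MongoDB driver; the connection string, database and collection names are hypothetical.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class MongoExport {
        public static void main(String[] args) {
            // Hypothetical connection string, database and collection names.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> orders =
                        client.getDatabase("shop").getCollection("orders");
                // Stream every document and print it as JSON; a real export
                // would write to a file or push the rows into HDFS instead.
                for (Document doc : orders.find()) {
                    System.out.println(doc.toJson());
                }
            }
        }
    }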
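
For the tweet lab, the job Talend generates ultimately calls the standard Hadoop client APIs. A minimal sketch of the two operations involved, writing a file into HDFS and storing an aggregated count in HBase, might look as follows; the namenode address, table and column names are assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TweetsToHadoop {
        public static void main(String[] args) throws Exception {
            // Write raw tweets to HDFS (namenode address is hypothetical).
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/data/tweets/batch-001.json"))) {
                out.writeBytes("{\"text\": \"hello big data\"}\n");
            }

            // Store an aggregated theme count in HBase (table and column names assumed).
            Configuration hbaseConf = HBaseConfiguration.create();
            try (Connection hbase = ConnectionFactory.createConnection(hbaseConf);
                 Table themes = hbase.getTable(TableName.valueOf("tweet_themes"))) {
                Put row = new Put(Bytes.toBytes("bigdata"));
                row.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("count"), Bytes.toBytes(42L));
                themes.put(row);
            }
        }
    }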
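
Sqoop itself is a command-line tool, which Talend wraps in dedicated components. The sketch below simply launches an incremental import from a hypothetical MySQL database into HDFS; the JDBC URL, credentials, table and column names are placeholders.

    import java.io.IOException;

    public class SqoopIncrementalImport {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Incremental "append" import: only rows whose id exceeds the
            // recorded last value are copied. All names below are hypothetical.
            ProcessBuilder sqoop = new ProcessBuilder(
                    "sqoop", "import",
                    "--connect", "jdbc:mysql://db-host/shop",
                    "--username", "etl", "--password-file", "/user/etl/.pwd",
                    "--table", "orders",
                    "--target-dir", "/data/orders",
                    "--incremental", "append",
                    "--check-column", "id",
                    "--last-value", "0",
                    "-m", "4");                   // four parallel map tasks
            sqoop.inheritIO();                    // forward Sqoop's console output
            int exit = sqoop.start().waitFor();
            System.out.println("sqoop exited with code " + exit);
        }
    }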
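
Talend's Pig components generate Pig Latin behind the scenes. A sketch of an equivalent log analysis, driven from Java through Pig's PigServer API, is shown below; the log path and its tab-separated layout are assumptions.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class LogTrends {
        public static void main(String[] args) throws Exception {
            // Local mode for testing; ExecType.MAPREDUCE would run on the cluster.
            PigServer pig = new PigServer(ExecType.LOCAL);
            // Hypothetical access log: one tab-separated line per hit (url, user, ts).
            pig.registerQuery("logs = LOAD '/data/access_log' "
                    + "AS (url:chararray, user:chararray, ts:long);");
            pig.registerQuery("by_url = GROUP logs BY url;");
            pig.registerQuery("hits = FOREACH by_url GENERATE group AS url, COUNT(logs) AS n;");
            pig.registerQuery("top = ORDER hits BY n DESC;");
            // Materialize the result; a UDF routine could be registered with
            // pig.registerFunction(...) before being used in a query.
            pig.store("top", "/data/log_trends");
        }
    }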
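
Finally, Hive flows boil down to HiveQL statements sent to a HiveServer2 instance. The sketch below runs the hourly consolidation from the stock-price lab through the standard Hive JDBC driver; the host, table, columns and date are invented for the example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HourlyStockPrices {
        public static void main(String[] args) throws Exception {
            // HiveServer2 endpoint and credentials are hypothetical.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://hive-host:10000/default", "etl", "");
                 Statement stmt = conn.createStatement()) {
                // quotes(symbol, price, ts) is an assumed table fed from HBase,
                // with ts holding Unix timestamps in seconds.
                ResultSet rs = stmt.executeQuery(
                        "SELECT hour(from_unixtime(ts)) AS h, avg(price) AS avg_price "
                      + "FROM quotes "
                      + "WHERE symbol = 'ACME' AND to_date(from_unixtime(ts)) = '2023-01-16' "
                      + "GROUP BY hour(from_unixtime(ts)) "
                      + "ORDER BY h");
                while (rs.next()) {
                    System.out.printf("%02d:00  %.2f%n", rs.getInt("h"), rs.getDouble("avg_price"));
                }
            }
        }
    }
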
    Duration: 14 h

