
Description

The Apache Hadoop platform makes it easier to create distributed applications. This course will help you understand its architecture and give you the knowledge needed to install, configure and administer a Hadoop cluster. You will also learn how to optimize it and maintain it over time.

Who is this training for?

Hadoop cluster administrators, developers.

Prerequisites

Training objectives

  • Discover the concepts and issues related to Hadoop
  • Understand how the platform and its components work
  • Install the platform and manage it
  • Optimize the platform

Training program

      • Big Data challenges and contributions of the Hadoop framework.
      • Presentation of the Hadoop architecture.
      • Description of the main components of the Hadoop platform.
      • Presentation of the main market distributions and complementary tools (Cloudera, MapR, Dataiku...).
      • Advantages/disadvantages of the platform.
      • Hadoop Distributed File System (HDFS) working principles.
      • MapReduce working principles (see the word-count sketch at the end of this program).
      • Design of a "typical" cluster.
      • Hardware selection criteria.
      • Practical work: configuration of the Hadoop cluster.
      • Deployment type.
      • Installation of Hadoop.
      • Installation of other components (Hive, Pig, HBase, Flume...).
      • Practical work: installation of a Hadoop platform and its main components.
      • Management of Hadoop cluster nodes.
      • TaskTracker, JobTracker for MapReduce.
      • Management of tasks via schedulers.
      • Management of logs.
      • Using a cluster manager.
      • Practical work: listing jobs, queue status, job status, task management, access to the web UI.
      • Import of external data (files, relational databases) into HDFS.
      • Handling of HDFS files (see the HDFS sketch at the end of this program).
      • Practical work: importing external data with Flume, querying relational databases with Sqoop.
      • Authorization and security management.
      • Recovery from NameNode failure (MRv1).
      • NameNode high availability (MRv2/YARN).
      • Practical work: configuring service-level authorization (SLA) and an access control list (ACL).
      • Monitoring (Ambari, Ganglia...).
      • Benchmarking/profiling of a cluster.
      • The Apache GridMix and Vaidya tools.
      • Choosing the block size.
      • Other tuning options: use of compression, memory configuration, etc. (see the tuning sketch at the end of this program).
      • Practical work: getting familiar with the cluster monitoring and optimization commands.
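
To make the MapReduce working principles listed in the program more concrete, here is a minimal word-count job written against the standard org.apache.hadoop.mapreduce API. It is a sketch for illustration only, not part of the course material; the class name and the input/output paths passed on the command line are placeholders.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Minimal MapReduce job: counts word occurrences in text files stored in HDFS.
    public class WordCount {

        // Map phase: emit (word, 1) for every token of the input split.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts received for each word after the shuffle.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation before the shuffle
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory, must not exist yet
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }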
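
The data-import and HDFS file-handling items are covered in the course with tools such as Flume, Sqoop and the hdfs dfs command line; as a complement, the sketch below uses the Java FileSystem API to copy a local file into HDFS and list the target directory. The NameNode address and the file paths are assumptions chosen for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Copies a local file into HDFS and lists the target directory.
    public class HdfsImportExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed NameNode address; on a real cluster this comes from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");

            try (FileSystem fs = FileSystem.get(conf)) {
                // Equivalent to: hdfs dfs -put (local source, HDFS destination; paths are illustrative).
                fs.copyFromLocalFile(new Path("/tmp/sales.csv"), new Path("/data/raw/sales.csv"));

                // Equivalent to: hdfs dfs -ls /data/raw
                for (FileStatus status : fs.listStatus(new Path("/data/raw"))) {
                    System.out.printf("%s\t%d bytes\treplication=%d%n",
                            status.getPath(), status.getLen(), status.getReplication());
                }
            }
        }
    }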
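
For the block-size, compression and memory tuning items, the following sketch shows, with assumed values, how such settings can be passed through the Hadoop Configuration API when submitting a job; on a real cluster the same keys are normally set cluster-wide in hdfs-site.xml and mapred-site.xml.

    import org.apache.hadoop.conf.Configuration;

    // Passes a few common tuning settings through the Configuration API (values are illustrative).
    public class TuningExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Block size for newly written files: 256 MB instead of the default.
            conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

            // Compress intermediate map output to reduce shuffle traffic.
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.set("mapreduce.map.output.compress.codec",
                    "org.apache.hadoop.io.compress.SnappyCodec");

            // Memory requested for map and reduce containers, in MB (YARN-era property names).
            conf.setInt("mapreduce.map.memory.mb", 2048);
            conf.setInt("mapreduce.reduce.memory.mb", 4096);

            // A Job built from this Configuration (as in the word-count sketch) picks these settings up.
            System.out.println("dfs.blocksize = " + conf.get("dfs.blocksize"));
        }
    }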
Duration: 28 h

