Big Data with Apache Hadoop
Duration : 4 Days (09.00 – 16.00)
Venue & Price : www.purnamaacademy.com
Registration : www.purnamaacademy.com
Description :
Hadoop is an open source, Java-based framework (or platform) released under the Apache license that supports applications running on Big Data. Its design builds on two technologies published by Google: MapReduce and the Google File System (GFS).
Hadoop was originally developed by Doug Cutting and Mike Cafarella in 2005; Cutting joined Yahoo in 2006, where development continued. Hadoop is named after a toy elephant that belonged to Doug Cutting's child.
Key points about Hadoop :
1. Hadoop is an open source, Java-based framework/platform
2. Hadoop is released under the Apache license
3. Hadoop supports applications that run on Big Data
4. Hadoop was developed by Doug Cutting and Mike Cafarella
5. Hadoop builds on Google MapReduce and Google File System (GFS) technology
Hadoop is best suited to handling large volumes of data, whether structured, semi-structured, or unstructured. Hadoop replicates data across several computers in a cluster, so if one computer goes down or has a problem, the data can still be processed from one of the other computers that is still running.
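As a rough illustration of the replication idea described above, the following Python sketch spreads copies of each data block over several machines and then checks which blocks stay readable after a failure. It is not HDFS code: the node names, the simple round-robin placement, and the helper functions `place_blocks` and `readable_blocks` are assumptions made for this example, and real HDFS applies its own rack-aware placement policy. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin).

    Assumption: round-robin placement is used only to keep the sketch short.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Pick consecutive nodes, wrapping around the end of the cluster.
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


# Hypothetical four-node cluster holding six blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(num_blocks=6, nodes=nodes, replication=3)

# With one node down, every block still has a live replica.
survivors = readable_blocks(placement, failed_nodes={"node2"})
print(len(survivors) == len(placement))  # prints True
```

Data is lost only when every node holding a block's replicas fails at the same time, which is why a higher replication factor trades disk space for availability.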
Topics include:
Introduction to Hadoop and Big Data:
• What is Big Data?
• What are the challenges for processing big data?
• What technologies support big data?
• What is Hadoop?
• Why Hadoop?
• History of Hadoop
• Use cases of Hadoop
• RDBMS vs Hadoop
• When to use and when not to use Hadoop
• Ecosystem tour
• Vendor comparison
• Hardware Recommendations & Statistics
HDFS: Hadoop Distributed File System:
• Significance of HDFS in Hadoop
• Features of HDFS
• 5 daemons of Hadoop
• Data Storage in HDFS: introduction to blocks, data replication
• Accessing HDFS: CLI (Command Line Interface) and admin commands, Java-based approach
• Fault tolerance
• Download Hadoop
• Installation and set-up of Hadoop, including the start-up and shut-down process
• HDFS Federation
MapReduce:
• MapReduce Story
• MapReduce Architecture
• How MapReduce works
• Developing MapReduce
• MapReduce Programming Model
• Creating Input and Output Formats in MapReduce Jobs
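To give a feel for the programming model covered in this module, here is a minimal word-count sketch in plain Python. It only imitates the map, shuffle, and reduce steps in a single process and does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` and the two input lines are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    # Keys are sorted before reducing, so the result is in alphabetical order.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


counts = word_count(["big data with hadoop", "hadoop stores big data"])
print(counts)
# prints {'big': 2, 'data': 2, 'hadoop': 2, 'stores': 1, 'with': 1}
```

In a real cluster the mappers and reducers run in parallel on different machines, and the framework handles the grouping step; the developer typically writes only the map and reduce functions.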
PIG:
• Introduction to Apache Pig
• MapReduce vs. Apache Pig
• SQL vs. Apache Pig
• Different data types in Pig
• Modes of Execution in Pig
• Grunt shell
• Loading data
• Exploring Pig
• Pig Latin commands
HIVE:
• Hive introduction
• Hive architecture
• Hive vs RDBMS
• HiveQL and the shell
• Managing tables (external vs managed)
• Data types and schemas
• Partitions and buckets
HBASE:
• Architecture and schema design
• HBase vs. RDBMS
• HMaster and Region Servers
• Column Families and Regions
• Write pipeline
• Read pipeline
• HBase commands
Flume
SQOOP
Participants : System Architect, Database Administrator, IT Developer, IT Manager, CTO, CIO