A cut-and-paste of some resources on Hadoop Developer Certification.
http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html
"To clear this test you need to have a very good understanding of the flow of data in Hadoop, i.e. how the files are stored and read. You should be able to visualize on how the MapReduce programs interact with data and how they process them as key-value pairs."
QUESTION NO: 1
When is the earliest point at which the reduce method of a given Reducer can be called?
A. As soon as at least one mapper has finished processing its input split.
B. As soon as a mapper has emitted at least one record.
C. Not until all mappers have finished processing all records.
D. It depends on the InputFormat used for the job.
Answer: C
Explanation: In a MapReduce job, reducers do not start executing the reduce method until all map tasks have completed. Reducers begin copying intermediate key-value pairs from the mappers as soon as they are available, but the programmer-defined reduce method is called only after all of the mappers have finished.
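This is worth seeing in configuration terms, because "reducers starting" and "reduce() running" are different events. A minimal sketch (the slow-start property name below is from Hadoop 1.x-era configs; newer releases spell it mapreduce.job.reduce.slowstart.completedmaps):

import org.apache.hadoop.conf.Configuration;

public class SlowStartDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Reducers may be *launched* before all maps finish so that their copy
    // (shuffle) phase overlaps the map phase; the reduce() calls themselves
    // still wait for every map. This threshold only controls the early launch:
    // here, reducers are scheduled once 80% of the maps are done.
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
    System.out.println(conf.getFloat("mapred.reduce.slowstart.completed.maps", 0.05f));
  }
}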
QUESTION NO: 2
Which describes how a client reads a file from HDFS?
A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
D. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.
Answer: C
Explanation: Client communication with HDFS happens through the Hadoop HDFS API. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file on HDFS. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives. Client applications can then talk directly to a DataNode, once the NameNode has provided the location of the data.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How the Client communicates with HDFS?
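As a concrete illustration of answer C, a client-side read goes through the FileSystem API. A minimal sketch (the input path comes from the command line; the NameNode address is taken from the usual core-site.xml):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadDemo {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();  // fs.defaultFS (the NameNode) read from core-site.xml
    FileSystem fs = FileSystem.get(conf);
    // open() asks the NameNode for block locations; the returned stream then
    // reads each block directly from a DataNode, as answer C describes.
    FSDataInputStream in = fs.open(new Path(args[0]));
    try {
      IOUtils.copyBytes(in, System.out, 4096, false);  // stream the file to stdout
    } finally {
      IOUtils.closeStream(in);
    }
  }
}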
QUESTION NO: 3
You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
A. Combiner <Text, IntWritable, Text, IntWritable>
B. Mapper <Text, IntWritable, Text, IntWritable>
C. Reducer <Text, Text, IntWritable, IntWritable>
D. Reducer <Text, IntWritable, Text, IntWritable>
E. Combiner <Text, Text, IntWritable, IntWritable>
Answer: D
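A note on why D is right: a combiner runs reduce-style logic on map output, so it uses the Reducer type with parameters ordered <KEYIN, VALUEIN, KEYOUT, VALUEOUT>. In the old org.apache.hadoop.mapred API, Reducer is an interface you implement; in the newer org.apache.hadoop.mapreduce API it is a class you extend, but the type ordering is the same. A minimal sketch of a summing combiner:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input and output types must both match the map output types, since the
// combiner runs between the map and reduce phases.
public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));  // same types out as in
  }
}

It is registered on the job with job.setCombinerClass(SumCombiner.class).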
QUESTION NO: 4
Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
E. mapred
Answer: D
Explanation: Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
Reference: http://hadoop.apache.org/common/docs/r0.20.1/streaming.html (Hadoop Streaming, second sentence)
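The referenced page's own minimal example looks like the following (the jar location varies by release, so treat the path as illustrative):

hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
  -input myInputDirs \
  -output myOutputDir \
  -mapper /bin/cat \
  -reducer /usr/bin/wc

Here /bin/cat acts as an identity mapper and /usr/bin/wc as the reducer; any executable that reads lines on stdin and writes key/value lines to stdout can fill either role.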
QUESTION NO: 5
How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
Answer: A
Explanation: Reducer has 3 primary phases:
1. Shuffle
The Reducer copies the sorted output from each Mapper using HTTP across the network.
2. Sort
The framework merge-sorts Reducer inputs by keys (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
SecondarySort
To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce. (A sketch of such a grouping comparator follows the reference below.)
3. Reduce
In this phase the reduce(Object, Iterable, Context) method is called for each <key, (collection of values)> in the sorted inputs.
The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object).
The output of the Reducer is not re-sorted.
Reference: org.apache.hadoop.mapreduce, Class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
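The grouping comparator mentioned under SecondarySort above can be sketched as follows. This is a hypothetical example that encodes a composite key as "naturalKey<TAB>secondaryKey" inside a Text; real jobs more often define a custom WritableComparable, but the grouping idea is the same:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups reduce() calls by the natural key only, while the full composite key
// still drives the sort order, so values arrive ordered by the secondary key.
public class NaturalKeyGroupingComparator extends WritableComparator {
  protected NaturalKeyGroupingComparator() {
    super(Text.class, true);  // true: create key instances for deserialization
  }

  @Override
  public int compare(WritableComparable a, WritableComparable b) {
    String naturalA = a.toString().split("\t", 2)[0];
    String naturalB = b.toString().split("\t", 2)[0];
    return naturalA.compareTo(naturalB);
  }
}

It would be registered with job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class), alongside a partitioner that also looks only at the natural key.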