0) Environment preparation
(1) Modify the IP addresses
(2) Configure the mapping between hostnames and IP addresses (see the sketch after this list)
(3) Turn off the firewall
(4) Configure passwordless SSH login
(5) Install the JDK and configure the environment variables
(6) Configure the ZooKeeper cluster
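For reference, steps (2) and (4) might look like the minimal sketch below. The hostnames match the ones used in yarn-site.xml later in this post; the IP addresses are placeholders and must be replaced with your own.
# /etc/hosts on every node (IP addresses below are placeholders)
192.168.1.111 bigdata111
192.168.1.112 bigdata112
192.168.1.113 bigdata113
# Passwordless SSH: generate a key pair once, then copy the public key to every node
ssh-keygen -t rsa
ssh-copy-id bigdata111
ssh-copy-id bigdata112
ssh-copy-id bigdata113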
1) Cluster planning
hadoop102         hadoop103         hadoop104
NameNode          NameNode
JournalNode       JournalNode       JournalNode
DataNode          DataNode          DataNode
ZK                ZK                ZK
ResourceManager   ResourceManager
NodeManager       NodeManager       NodeManager
2) Specific configuration
(1) yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Declare the addresses of the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>bigdata111</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>bigdata112</value>
    </property>
    <!-- Specify the address of the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>bigdata111:2181,bigdata112:2181,bigdata113:2181</value>
    </property>
    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store the ResourceManager state information in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>
(2) Synchronize the configuration to the other nodes (a sample command is shown below)
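One simple way to do this is to copy the Hadoop configuration directory to the other nodes with scp. The installation path below is an example; adjust it to your own directory.
# Run on the node where yarn-site.xml was edited (path is an example)
scp -r /opt/module/hadoop-2.7.2/etc/hadoop/ bigdata112:/opt/module/hadoop-2.7.2/etc/
scp -r /opt/module/hadoop-2.7.2/etc/hadoop/ bigdata113:/opt/module/hadoop-2.7.2/etc/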
3) Start HDFS
(1) On each JournalNode host, enter the following command to start the JournalNode service:
sbin/hadoop-daemon.sh start journalnode
(2) On [nn1], format the NameNode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
(3) On [nn2], synchronize the metadata from nn1:
bin/hdfs namenode -bootstrapStandby
(4) Start [nn2]:
sbin/hadoop-daemon.sh start namenode
(5) Start all DataNodes:
sbin/hadoop-daemons.sh start datanode
(6) Switch [nn1] to Active
bin/hdfs haadmin -transitionToActive nn1
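To verify the transition, the HA state of both NameNodes can be queried (nn1 and nn2 are the NameNode ids configured for this cluster):
bin/hdfs haadmin -getServiceState nn1    # should report: active
bin/hdfs haadmin -getServiceState nn2    # should report: standby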
4) Start Yarn
(1) Execute on bigdata111:
sbin/start-yarn.sh
(2) Execute on bigdata112:
sbin/yarn-daemon.sh start resourcemanager
(3) View service status
bin/yarn rmadmin -getServiceState rm1
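The other ResourceManager can be checked the same way; while rm1 is active, rm2 should report standby. The active ResourceManager's web UI is served on port 8088 by default (assuming the default port has not been changed).
bin/yarn rmadmin -getServiceState rm2    # should report: standby while rm1 is active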
Congratulations, the YARN HA setup is now complete.
Extension: HDFS Federation architecture design
- Limitations of the NameNode architecture
(1) Namespace limitations
All metadata is kept in the NameNode's memory, so the number of objects (files + blocks) a single NameNode can store is limited by the heap size of the NameNode's JVM. A 50 GB heap can hold roughly 200 million objects, which supports about 4,000 DataNodes and 12 PB of storage (assuming an average file size of 40 MB). With the rapid growth of data, storage requirements keep rising: the capacity of a single DataNode has grown from 4 TB to 36 TB, cluster sizes have grown to 8,000 DataNodes, and storage demand has risen from 12 PB to well over 100 PB.
(2) Isolation problems
Because HDFS has only a single NameNode, it cannot isolate programs from one another, so an experimental program running on HDFS can easily affect every other program running on the cluster.
(3) Performance bottleneck
Because the HDFS architecture has only a single NameNode, the throughput of the entire HDFS file system is limited by the throughput of that one NameNode.
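HDFS Federation addresses these limitations by running several independent NameNodes, each managing its own part of the namespace, while all of them share the same pool of DataNodes for block storage. As an illustration only (the nameservice ids, hostnames, and ports below are assumptions, not part of the cluster configured above), a federated hdfs-site.xml might declare two nameservices like this:
<configuration>
    <!-- Two independent nameservices, each served by its own NameNode -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1,ns2</value>
    </property>
    <!-- NameNode for namespace ns1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1</name>
        <value>bigdata111:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1</name>
        <value>bigdata111:50070</value>
    </property>
    <!-- NameNode for namespace ns2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns2</name>
        <value>bigdata112:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns2</name>
        <value>bigdata112:50070</value>
    </property>
</configuration>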
——— Stay hungry, keep learning
Jackson_MVP