Setting Up HBase on a Hadoop Cluster
This post walks through building an HBase environment on top of an existing Hadoop cluster.
The whole process is fairly straightforward.
1. Download the HBase release tarball and extract it
cp hbase-0.20.6.tar.gz /opt/hadoop/
cd /opt/hadoop/
tar zxvf hbase-0.20.6.tar.gz
ln -s hbase-0.20.6 hbase
2. Edit hbase-env.sh: set JAVA_HOME, move the log directory, and let HBase manage ZooKeeper
export JAVA_HOME=/opt/java/jdk
export HBASE_LOG_DIR=/opt/log/hbase
export HBASE_MANAGES_ZK=true
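The step-2 edits can be scripted. The sketch below uses a demo directory (./hbase-demo/conf) as a stand-in for the real conf directory (/opt/hadoop/hbase/conf in this article's layout), so you can see the result without touching a live install:

```shell
# Append the step-2 environment overrides to hbase-env.sh.
# ./hbase-demo/conf is a stand-in path; substitute your real conf dir.
HBASE_CONF=./hbase-demo/conf
mkdir -p "$HBASE_CONF"
cat >> "$HBASE_CONF/hbase-env.sh" <<'EOF'
export JAVA_HOME=/opt/java/jdk
export HBASE_LOG_DIR=/opt/log/hbase
export HBASE_MANAGES_ZK=true
EOF
# Show what was added.
grep '^export' "$HBASE_CONF/hbase-env.sh"
```

HBASE_MANAGES_ZK=true means the HBase scripts start and stop ZooKeeper for you, which matches the hbase.zookeeper.* settings in step 3.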
3. Edit hbase-site.xml to configure HBase
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://zw-hadoop-master:9000/hbase</value>
  <description>The directory shared by region servers.</description>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  <description>The mode the cluster will be in. Possible values are
    false: standalone and pseudo-distributed setups with managed ZooKeeper
    true: fully-distributed with unmanaged ZooKeeper quorum (see hbase-env.sh)
  </description>
</property>
<property>
  <name>hbase.master</name>
  <value>zw-hadoop-master:60000</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zw-hadoop-slave225,zw-hadoop-slave226,zw-hadoop-slave227</value>
  <description>Comma separated list of servers in the ZooKeeper quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed
    modes of operation. For a fully-distributed setup, this should be set
    to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set
    in hbase-env.sh this is the list of servers which we will start/stop
    ZooKeeper on.
  </description>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/opt/log/zookeeper</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
  </description>
</property>
Notes on these settings:
- hbase.rootdir is HBase's root directory on HDFS; the hostname must be the host running the HDFS NameNode
- hbase.cluster.distributed set to true marks this as a fully distributed HBase cluster
- hbase.master is the host:port of the HBase master (host:port only, no hdfs:// scheme)
- hbase.zookeeper.quorum lists the ZooKeeper hosts; an odd number of servers (3, 5, or 7) is recommended
4. Edit the regionservers file to list the region server hosts; the same list as Hadoop's slaves file will do
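Step 4 is a single copy. Demo below with stand-in conf directories and the slave hostnames used in this article:

```shell
# Reuse Hadoop's slaves list as HBase's regionservers list.
# hadoop-conf/ and hbase-conf/ are stand-ins for the real conf dirs.
mkdir -p hadoop-conf hbase-conf
printf 'zw-hadoop-slave225\nzw-hadoop-slave226\nzw-hadoop-slave227\n' > hadoop-conf/slaves
cp hadoop-conf/slaves hbase-conf/regionservers
cat hbase-conf/regionservers
```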
5. Start and stop HBase
/opt/hadoop/hbase/bin/start-hbase.sh
/opt/hadoop/hbase/bin/stop-hbase.sh
HBase runs a single active master by default, but additional masters can be started:
/opt/hadoop/hbase/bin/hbase-daemon.sh start master
These extra masters stay idle; only when the active master goes down
will one of them take over the master role.
HBase also ships a simple web UI for checking cluster status:
http://10.10.71.1:60010/master.jsp
http://10.10.71.1:60030/regionserver.jsp
http://10.10.71.1:60010/zk.jsp
hi, does HBase require Hadoop to be installed first? I installed HBase directly on Ubuntu 11.04 and it failed to start:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /127.0.0.1:51193 after attempts=1
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:355)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:965)
at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:606)
at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:541)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:920)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1189)
at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:432)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:389)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:193)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy7.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
... 12 more
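The root cause in the trace above is "Connection refused": nothing was listening on the port the master dialed. A quick probe is enough to confirm that (a sketch: 51193 is the ephemeral port from this particular trace, and /dev/tcp is a bash-only feature):

```shell
# Probe whether anything is listening on the port from the stack trace.
# 51193 comes from this trace; /dev/tcp/HOST/PORT is a bash feature.
PORT=51193
if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
  MSG="port $PORT is accepting connections"
else
  MSG="port $PORT refused connection: no region server is listening there"
fi
echo "$MSG"
```

If the port is closed, the region server process never came up; check its log for the underlying failure (commonly a missing or unreachable HDFS in distributed mode).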
zeyuan says: Sorry for the late reply; I have been busy these past few days. The fourth figure is taken from "HBase Architecture 101 - Write-ahead-Log". According to the HBase 0.90 code, there is one HLog per region server, not one per region, so that figure is wrong; the article text itself is not. This design improves HBase's write performance, and is described in detail in the Google Bigtable paper:

Commit-log implementation: If we kept the commit log for each tablet in a separate log file, a very large number of files would be written concurrently in GFS. Depending on the underlying file system implementation on each GFS server, these writes could cause a large number of disk seeks to write to the different physical log files. In addition, having separate log files per tablet also reduces the effectiveness of the group commit optimization, since groups would tend to be smaller. To fix these issues, we append mutations to a single commit log per tablet server, co-mingling mutations for different tablets in the same physical log file [18, 20].

If a region server fails unexpectedly, recovery first splits its HLog by region and then distributes the pieces to the machines now serving those regions. Thank you very much for your attention; discussion is welcome.
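The recovery step described in the reply (one commingled HLog per server, split into per-region edit files before reassignment) can be illustrated with a toy simulation; the "region,edit" record format below is made up for the demo:

```shell
# Toy model of HLog splitting: one commingled log per server, split
# into per-region files on recovery. Record format "region,edit" is
# invented here; real HLog entries are binary key/value edits.
printf 'regionA,put1\nregionB,put2\nregionA,put3\n' > hlog.txt
awk -F, '{ print $2 >> ("recovered-" $1 ".edits") }' hlog.txt
cat recovered-regionA.edits   # regionA's edits, still in log order
```

Each recovered-*.edits file then travels to whichever server is assigned that region, preserving per-region write order even though the original log interleaved regions.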