Hadoop (single node):
1. vim /etc/hosts
192.168.8.201 hadoop

2. Create the hadoop user
useradd hadoop

Password for hadoop: 123

3. Install the JDK
[root@h201 ~]# vim /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH
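
A quick sanity check after this (standard commands, just to confirm the JDK path is correct):

[root@h201 ~]# source /etc/profile
[root@h201 ~]# java -version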

4. Set up SSH keys (passwordless login)
[hadoop@h201 ~]$ ssh-keygen -t rsa
[hadoop@h201 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h201
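
To confirm passwordless login works (plain ssh, nothing extra needed):

[hadoop@h201 ~]$ ssh h201 hostname    # should return immediately without asking for a password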

5. Install Hadoop and set the environment variables
[hadoop@h201 hadoop]$ cp hadoop-2.6.0.tar.gz /home/hadoop

[hadoop@h201 ~]$ vi .bash_profile
HADOOP_HOME=/home/hadoop/hadoop-2.6.0
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME HADOOP_CONF_DIR PATH

[hadoop@h201 ~]$ source .bash_profile
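
(Step 6 is not recorded in these notes; it presumably covers extracting the tarball and editing core-site.xml. A minimal sketch, assuming the NameNode address is hdfs://hadoop:9000, which is the address hbase.rootdir points at further down:)

[hadoop@h201 ~]$ tar -zxvf hadoop-2.6.0.tar.gz
[hadoop@h201 hadoop]$ vi core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:9000</value>
  <description>Default filesystem URI (assumed; matches the hbase.rootdir value used later).</description>
</property>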

7. Edit hdfs-site.xml
Create the local data directories first:
mkdir -p /home/hadoop/data/dfs/name
mkdir -p /home/hadoop/data/dfs/data
mkdir -p /home/hadoop/data/dfs/namesecondary

Then edit the configuration file:
[hadoop@h201 hadoop]$ vi hdfs-site.xml


<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hadoop:50090</value>
  <description>The secondary namenode http server address and port.</description>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/data/dfs/name</value>
  <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hadoop/data/dfs/data</value>
  <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>

<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///home/hadoop/data/dfs/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

8. Edit mapred-site.xml

[hadoop@h201 hadoop]$ cp mapred-site.xml.template mapred-site.xml


<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
</property>

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
The property mapreduce.framework.name sets the runtime framework used to execute MapReduce jobs. It defaults to local and must be changed to yarn here.


9. Edit yarn-site.xml
[hadoop@h201 hadoop]$ vi yarn-site.xml


<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop</value>
  <description>The hostname of the RM.</description>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>Shuffle service that needs to be set for Map Reduce applications.</description>
</property>

10. Edit hadoop-env.sh
[hadoop@h201 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0

11. Edit the slaves file
[hadoop@h201 hadoop]$ vi slaves
localhost

12. Verification:

Format the NameNode:
[hadoop@h201 hadoop-2.6.0]$ bin/hdfs namenode -format
[hadoop@h201 hadoop-2.6.0]$ sbin/start-all.sh

[hadoop@h201 hadoop-2.6.0]$ jps
7054 SecondaryNameNode
7844 Jps
7318 NameNode
7598 ResourceManager

[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -ls /
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir /aaa
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /home/hadoop
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /home/hive
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /home/hbase
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /home/spark
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /home/flink/checkpoints
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /tmp/hadoop
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /tmp/hive
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /tmp/hbase
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -mkdir -p /tmp/spark
[hadoop@hadoop hadoop-2.6.0]$ bin/hadoop fs -chmod -R 777 /tmp

Hive (configuration changes):
hive-site.xml — just search online (Baidu) for a full sample; a minimal sketch follows below.
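
The notes do not include the file itself, so here is a minimal sketch assuming a MySQL metastore (the mysql-connector-java jar under hive-2.3.3/lib later in these notes suggests one); the database name, user, and password are placeholders:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>placeholder user</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive123</value>
    <description>placeholder password</description>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/home/hive</value>
    <description>matches the HDFS directory created in step 12</description>
</property>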

HBase (configuration changes):
hbase-site.xml

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop:9000/home/hbase</value>
    <description>Where the HRegion servers store their data, i.e. the HBase data directory on HDFS.</description>
</property>

<property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Number of replicas for HLog and HFile data. It must not exceed the number of HDFS DataNodes; in pseudo-distributed mode there is only one DataNode, so set it to 1.</description>
</property>

<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
   

<property>
    <name>zookeeper.session.timeout</name>
    <value>1200000</value>
</property>

<property>
    <name>hbase.regionserver.handler.count</name>
    <value>50</value>
</property>

<property>
    <name>hbase.client.write.buffer</name>
    <value>8388608</value>
</property>

<property>
    <name>mapreduce.task.timeout</name>
    <value>1200000</value>
</property>

<property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>600000</value>
</property>

<property>
    <name>hbase.rpc.timeout</name>
    <value>600000</value>
</property>
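
To start and check HBase afterwards (a sketch, assuming $HBASE_HOME points at the HBase install directory and HBase manages its own ZooKeeper):

$HBASE_HOME/bin/start-hbase.sh
$HBASE_HOME/bin/hbase shell
status     # run inside the shell; should report one live region server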

Spark (configuration changes):
spark-defaults.conf
spark.master spark://hadoop:7077
spark.default.parallelism 6
spark.driver.memory 2g
spark.executor.memory 2g
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.sql.shuffle.partitions 1
spark.kryoserializer.buffer.max 1g
spark.kryoserializer.buffer 1g

spark.executor.extraClassPath /home/hadoop/hive-2.3.3/lib/mysql-connector-java-8.0.13.jar:/home/hadoop/hive-2.3.3/lib/hive-hbase-handler-2.3.3.jar:/home/hadoop/hbase-2.1.1/lib/hbase-client-2.1.1.jar:/home/hadoop/hbase-2.1.1/lib/hbase-server-2.1.1.jar:/home/hadoop/hbase-2.1.1/lib/hbase-common-2.1.1.jar:/home/hadoop/hbase-2.0.2/lib/hbase-protocol-shaded-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/hbase-protocol-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/htrace-core-3.2.0-incubating.jar:/home/hadoop/hbase-2.0.2/lib/htrace-core4-4.2.0-incubating.jar:/home/hadoop/hbase-2.0.2/lib/metrics-core-3.2.1.jar:/home/hadoop/hbase-2.0.2/lib/hbase-hadoop2-compat-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/hbase-hadoop-compat-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/guava-11.0.2.jar:/home/hadoop/hbase-2.0.2/lib/protobuf-java-2.5.0.jar

spark.driver.extraClassPath /home/hadoop/hive-2.3.3/lib/mysql-connector-java-8.0.13.jar:/home/hadoop/hive-2.3.3/lib/hive-hbase-handler-2.3.3.jar:/home/hadoop/hbase-2.1.1/lib/hbase-client-2.1.1.jar:/home/hadoop/hbase-2.1.1/lib/hbase-server-2.1.1.jar:/home/hadoop/hbase-2.1.1/lib/hbase-common-2.1.1.jar:/home/hadoop/hbase-2.0.2/lib/hbase-protocol-shaded-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/hbase-protocol-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/htrace-core-3.2.0-incubating.jar:/home/hadoop/hbase-2.0.2/lib/htrace-core4-4.2.0-incubating.jar:/home/hadoop/hbase-2.0.2/lib/metrics-core-3.2.1.jar:/home/hadoop/hbase-2.0.2/lib/hbase-hadoop2-compat-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/hbase-hadoop-compat-2.0.2.jar:/home/hadoop/hbase-2.0.2/lib/guava-11.0.2.jar:/home/hadoop/hbase-2.0.2/lib/protobuf-java-2.5.0.jar

spark-env.sh

export JAVA_HOME=/usr/local/jdk1.8.0
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HBASE_HOME=/home/hadoop/hbase-1.3.1
export HBASE_CONF=$HBASE_HOME/conf
export HIVE_HOME=/home/hadoop/hive-2.3.3
export SPARK_HOME=/home/hadoop/spark-2.1.1
export SPARK_MASTER_IP=hadoop
export SPARK_WORKER_MEMORY=1G
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_EXECUTOR_INSTANCES=1
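
To bring up the standalone cluster and check that the settings are picked up (a sketch, assuming $SPARK_HOME matches the path above):

$SPARK_HOME/sbin/start-all.sh                                # starts the master and one worker
$SPARK_HOME/bin/spark-shell --master spark://hadoop:7077     # the master web UI at http://hadoop:8080 should show the worker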

Flink (configuration changes):
Download: https://mirrors.bfsu.edu.cn/apache/flink/flink-1.12.5/flink-1.12.5-bin-scala_2.11.tgz
You can refer to the materials on our network drive.

flink-streaming-platform-web (configuration changes):
This one has only a single configuration file:
application.properties
(it is just the MySQL JDBC connection; a sketch follows below)
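
The notes do not include the file contents; a minimal sketch using the standard Spring Boot datasource keys (database name, user, and password are placeholders — check the project's own sample file for the exact keys):

# assumed Spring Boot datasource settings; values are placeholders
spring.datasource.url=jdbc:mysql://hadoop:3306/flink_web?useSSL=false&characterEncoding=UTF-8
spring.datasource.username=root
spring.datasource.password=123456
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver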

All of these configuration files can be downloaded from our network drive:
http://oneindex.iegum.com

Now for the key part: Flink.
How it works:
The sum principle is X = X + Y, where X lives in memory (the running result) and Y comes from Kafka. Every aggregate function follows this same pattern; in other words, at any moment we are only ever combining two values.

Let me explain using Flink SQL.
Flink can map the stream data in Kafka onto a table, say table a.
Take Alibaba's real-time dashboard as an example to make this easier to picture:
suppose table a has a column total holding the transaction amount.
Alibaba's data is massive; computing one day's total from the SQL point of view would be:
select sum(total) from a;
With a data volume that large, the resources would never be enough.
With the streaming approach it turns into this instead:
we ingest table a into Kafka, map it in Flink as the table kafka_a, and then bring in the result table total_a.
The statement becomes:
select sum(total) from (select total from kafka_a union all select total from total_a) tmp

On the first run Flink reads the result table total_a once, unions it with the total values from kafka_a, computes the sum, and keeps the result in memory.
Whenever more data streams in from Kafka after that, the computation is simply the value held in memory plus the incoming total from kafka_a.
In effect the result table is read only once and later updates never touch disk, so it is extremely fast. Flink really is a blessing for small companies: even a modest setup can process data at high speed.
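
To make the table mapping concrete, here is a minimal Flink SQL sketch of the pattern described above. The topic, broker address, and MySQL table here are assumptions for illustration, not taken from the notes (Flink 1.12 Kafka and JDBC connectors):

-- map the Kafka stream as kafka_a (topic and broker are assumed)
CREATE TABLE kafka_a (
  total DECIMAL(16, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'a_topic',
  'properties.bootstrap.servers' = 'hadoop:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- map the result table total_a stored in MySQL (database, user, password are assumed)
CREATE TABLE total_a (
  total DECIMAL(16, 2)
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://hadoop:3306/flink_web',
  'table-name' = 'total_a',
  'username' = 'root',
  'password' = '123456'
);

-- the running sum described above
select sum(total) from (select total from kafka_a union all select total from total_a) tmp;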

I have put the diagrams on our website; later I will explain everything directly with the data-processing model.